Learning Path

Site Reliability Engineer Learning Path

Follow this curated path to build your ability to maintain reliable services at scale with Datadog.

Through hands-on courses, you’ll learn how to interpret application performance, monitor infrastructure and networking in real time, and implement SLO-driven strategies so that you can quickly detect, analyze, and resolve system-wide issues.

This path is designed for Site Reliability Engineers (SREs) and other roles tasked with optimizing service uptime and performance.

You’ll learn how to do the following:

Track service health by interpreting key application performance metrics (e.g., request volume, error rate, and latency)

Investigate infrastructure metrics by filtering, grouping, and visualizing data with tags

Gain deep visibility into network performance to identify bottlenecks and failures

Establish effective SLIs, set realistic SLO targets, and manage SLOs

Centralize and manage incident response

Getting Started with Incident Management

Learn how to manage incidents using Datadog Incident Management. By the end of this course, you'll know how to set up Incident Management, detect and declare incidents, and guide your team through resolution.

Getting Started with APM Metrics & Traces

Monitor service health and performance with Application Performance Monitoring (APM). Explore traces to understand requests and interactions between services. Track key metrics to understand trends that impact system behavior and user experience.

Getting Started with Infrastructure and Cloud Network Monitoring

Learn how to analyze metrics, visualize network and infrastructure performance, and troubleshoot issues effectively in this introduction to using Datadog’s Infrastructure and Cloud Network Monitoring (CNM).

Getting Started with Service Level Objectives (SLOs)

In this course, you’ll deepen your understanding of Service Level Objectives (SLOs) and gain hands-on experience using them to solve issues in a web application. This course builds on the concepts covered in Understanding Service Level Objectives.

Leave feedback about your experience in our Learning Path Survey.

Complete all courses in the path to earn your Credly badge.
