Site Reliability Engineer Learning Path

Follow this curated path to strengthen your ability to maintain reliable services at scale with Datadog.

Through hands-on courses, you’ll learn to interpret application performance, monitor infrastructure and networks in real time, and implement SLO-driven strategies, so you can quickly detect, analyze, and resolve system-wide issues.

This path is designed for Site Reliability Engineers (SREs) and other roles tasked with optimizing service uptime and performance.

You’ll learn how to do the following:

Track service health by interpreting key application performance metrics (e.g., request volume, error rate, and latency)
Investigate infrastructure metrics by filtering, grouping, and visualizing data with tags
Gain deep visibility into network performance to identify bottlenecks and failures
Define effective Service Level Indicators (SLIs), set realistic SLO targets, and manage SLOs
Centralize and manage incident response

Getting Started with APM Metrics & Traces

Monitor service health and performance with Application Performance Monitoring (APM). Explore traces to understand individual requests and the interactions between services. Track key metrics to identify trends that affect system behavior and user experience.

Getting Started with Infrastructure and Cloud Network Monitoring

This introduction to Datadog Infrastructure Monitoring and Cloud Network Monitoring (CNM) teaches you how to analyze metrics, visualize network and infrastructure performance, and troubleshoot issues effectively.

Getting Started with Service Level Objectives (SLOs)

In this course, you’ll deepen your understanding of Service Level Objectives (SLOs) and gain hands-on experience using them to resolve issues in a web application. It builds on the concepts covered in the Understanding Service Level Objectives course.

Introduction to Incident Management

In this course, you’ll learn how to manage incidents by working through a hands-on example in Datadog Incident Management. You’ll also learn how to use Slack to communicate incident status to your team effectively.

Leave feedback about your experience in our Learning Path Survey.

Complete all courses in the path to earn your Credly badge.