Back to Glossary
DevOps

What is SRE (Site Reliability Engineering)?

Google's approach to applying software engineering practices to operations problems.

Site Reliability Engineering (SRE) is a discipline that applies software engineering principles to operations. Created at Google, SRE teams use code and automation (instead of manual processes) to manage production systems. Core SRE concepts: SLOs/SLIs/SLAs, error budgets, toil reduction, blameless postmortems, and on-call management. SREs write software to automate operational work. The goal: reduce toil, improve reliability, and create a virtuous cycle of reliability investment.

Deep Dive Guide

sre will absorb devops roles

Test your knowledge of SRE (Site Reliability Engineering) and 130 other DevOps concepts