Error Budget

The maximum amount of unreliability a service can have within a given period, derived from the SLO.

An error budget is the complement of an SLO (100% minus the target): it quantifies the maximum amount of unreliability a service is allowed within a measurement period. If your SLO targets 99.9% availability over 30 days, your error budget is 0.1% of that period. Thirty days is 43,200 minutes, so the budget is 43.2 minutes of downtime.
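The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the function name `error_budget` and its parameters are assumptions made for the example.

```python
from datetime import timedelta


def error_budget(slo_percent: float, window: timedelta) -> timedelta:
    """Return the downtime allowed by an availability SLO over a window.

    Hypothetical helper: the budget is the window multiplied by the
    fraction of time the SLO permits the service to be unavailable.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be greater than 0 and at most 100")
    allowed_unreliability = (100 - slo_percent) / 100
    return window * allowed_unreliability


budget = error_budget(99.9, timedelta(days=30))
print(round(budget.total_seconds() / 60, 1))  # 43.2
```

The same function shows how quickly the budget shrinks as the target tightens: a 99.99% SLO over the same 30 days leaves roughly 4.3 minutes.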

Error budgets are a key concept in Site Reliability Engineering (SRE) that help teams balance reliability with feature velocity. When the error budget is healthy, teams can ship features and take risks. When the error budget is nearly exhausted, teams should prioritize stability: slowing deployments, investing in reliability work, or rolling back risky changes.
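Whether a budget counts as "healthy" depends on how much of it is left. The sketch below computes the remaining fraction from observed downtime. The function name and the 25% "nearly exhausted" threshold are assumptions for illustration; each team sets its own cut-off.

```python
def budget_remaining(slo_percent: float, window_minutes: float,
                     downtime_minutes: float) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has already been missed.
    """
    budget_minutes = window_minutes * (100 - slo_percent) / 100
    if budget_minutes <= 0:
        raise ValueError("a 100% SLO leaves no error budget to spend")
    return 1 - downtime_minutes / budget_minutes


# Hypothetical threshold: treat less than 25% remaining as "nearly exhausted".
NEARLY_EXHAUSTED = 0.25

remaining = budget_remaining(99.9, 30 * 24 * 60, downtime_minutes=10.8)
status = "nearly exhausted" if remaining < NEARLY_EXHAUSTED else "healthy"
print(f"{remaining:.0%} of budget left, status: {status}")
```

With 10.8 minutes of downtime against a 43.2-minute budget, about three quarters of the budget remains, so the team can keep shipping.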

Error budget policies define what happens when the budget is exhausted: deployment freezes, mandatory reliability sprints, or escalation to leadership. This framework replaces subjective debates about "how reliable is reliable enough" with data-driven decisions. Monitoring tools like Hyperping provide the availability data needed to track error budget consumption in real time.
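A policy of this kind can be written down as a simple lookup from budget consumption to an agreed response. The sketch below is only an illustration: the thresholds and the `policy_action` name are hypothetical, and the consumption figure is assumed to come from whatever monitoring data the team already collects.

```python
def policy_action(consumed_fraction: float) -> str:
    """Map the consumed share of the error budget to an agreed response.

    Thresholds are illustrative assumptions, not a recommended standard.
    """
    if consumed_fraction < 0:
        raise ValueError("consumed_fraction cannot be negative")
    if consumed_fraction >= 1.0:
        # Budget exhausted: the strictest responses apply.
        return "freeze deployments and escalate to leadership"
    if consumed_fraction >= 0.75:
        # Budget nearly gone: shift effort toward stability.
        return "schedule a mandatory reliability sprint"
    return "ship normally"


for consumed in (0.30, 0.80, 1.10):
    print(f"{consumed:.0%} consumed: {policy_action(consumed)}")
```

Writing the thresholds down ahead of time is the point: the response to an exhausted budget is agreed before the incident, not negotiated during it.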