MTBF (Mean Time Between Failures)

The average time between consecutive failures of a repairable system.

MTBF, or Mean Time Between Failures, measures the average elapsed time between one failure and the next for a system that is repaired and returned to service. It is a key reliability metric for repairable systems like web services, databases, and infrastructure.

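MTBF is calculated as total uptime divided by the number of failures. For example, if a service accumulated 720 hours of uptime and experienced 2 failures, its MTBF is 360 hours. Some sources instead measure MTBF from the start of one failure to the start of the next; under that convention MTBF = MTTF + MTTR, so the figure includes repair time as well as operational time. State which convention you use when reporting MTBF, because the two give different numbers whenever repairs take a meaningful amount of time.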
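The uptime-based calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a Hyperping API: the function name `mtbf_hours` and its parameters are hypothetical, and it assumes you already know the total uptime and the failure count for the period.

```python
def mtbf_hours(total_uptime_hours: float, failure_count: int) -> float:
    """Return MTBF as total uptime divided by the number of failures.

    Hypothetical helper for illustration only. Uses the uptime-based
    convention, so repair time is not included in the result.
    """
    if failure_count <= 0:
        # With no failures in the period there is no interval to average.
        raise ValueError("MTBF is undefined without at least one failure")
    return total_uptime_hours / failure_count


# The example from the text: 720 hours of uptime, 2 failures.
print(mtbf_hours(720, 2))  # 360.0
```

Raising an error for zero failures is a design choice for this sketch; a reporting tool might instead show "no failures observed" for that period.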

A higher MTBF indicates a more reliable system. Teams improve MTBF by investing in redundancy, implementing chaos engineering practices, using canary deployments to catch issues early, and monitoring proactively with tools like Hyperping to detect degradation before it becomes a full outage.

Related Terms

MTTF (Mean Time to Failure)
The average time a non-repairable system or component operates before it fails.
MTTR (Mean Time to Recover)
The average time it takes to restore a system or service after a failure or incident.
Availability
The proportion of time a system is functional and accessible, often expressed as a percentage.
Five Nines (99.999% Uptime)
A reliability standard allowing only about 5 minutes and 15 seconds of downtime per year.
Chaos Engineering
The practice of intentionally injecting failures into a system to test its resilience and uncover we...
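As a rough check on the Five Nines figure in the list above, the yearly downtime allowance follows directly from the availability percentage. The sketch below assumes a 365-day year; `downtime_budget_seconds` is a hypothetical helper name used only for illustration.

```python
def downtime_budget_seconds(availability_percent: float,
                            period_days: float = 365.0) -> float:
    """Return the downtime allowed in a period for a given availability target.

    Hypothetical helper for illustration; assumes a 365-day year by default.
    """
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - availability_percent / 100)


budget = downtime_budget_seconds(99.999)
minutes, seconds = divmod(budget, 60)
# Comes out to a little over 5 minutes and 15 seconds per year.
print(f"{int(minutes)} min {seconds:.1f} s")
```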
