Root Cause Analysis (RCA)

A systematic investigation technique used to identify the fundamental cause of an incident, not just its symptoms.

Root cause analysis (RCA) looks past the symptoms of an incident or problem to establish why it occurred. The goal is to implement preventive measures that stop the same failure from recurring, rather than only restoring service.

Common RCA techniques include the "Five Whys" (repeatedly asking "why" until you reach the fundamental cause), fishbone diagrams (categorizing potential causes by type), timeline analysis (mapping events chronologically to identify the trigger), and fault tree analysis (modeling how combinations of failures led to the incident).
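As an illustration of timeline analysis, the sketch below sorts incident events chronologically and picks the most recent change that preceded the first symptom. It is a minimal example under stated assumptions, not a standard algorithm: the `Event` type, the `"change"`/`"symptom"` labels, the `likely_trigger` function, and the sample events are all hypothetical. The result is a candidate trigger to investigate, not a confirmed root cause.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    at: datetime
    kind: str  # assumed labels: "change" (deploy, config edit) or "symptom" (alert, error spike)
    note: str


def likely_trigger(events):
    """Return the most recent change that happened before the first symptom, or None."""
    ordered = sorted(events, key=lambda e: e.at)
    first_symptom = next((e for e in ordered if e.kind == "symptom"), None)
    if first_symptom is None:
        return None
    changes_before = [
        e for e in ordered if e.kind == "change" and e.at < first_symptom.at
    ]
    return changes_before[-1] if changes_before else None


# Hypothetical incident log, deliberately out of order.
events = [
    Event(datetime(2024, 1, 10, 14, 5), "symptom", "5xx error rate alert"),
    Event(datetime(2024, 1, 10, 9, 30), "change", "database minor version upgrade"),
    Event(datetime(2024, 1, 10, 13, 58), "change", "connection pool size lowered"),
    Event(datetime(2024, 1, 10, 14, 20), "change", "rollback of pool size"),
]

trigger = likely_trigger(events)
print(trigger.note if trigger else "no candidate trigger found")
```

Proximity in time is only a heuristic. Earlier changes, such as the database upgrade in this sample, may still be contributing factors and deserve a "why" of their own.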

In software engineering, root causes often fall into categories like code defects, configuration errors, capacity issues, dependency failures, human errors in operations, or process gaps. Modern SRE practice emphasizes that most incidents have multiple contributing factors rather than a single root cause, and the most impactful improvements often come from addressing systemic issues (like lack of monitoring coverage or insufficient testing) rather than the immediate trigger.
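One way to keep an analysis from collapsing into a single "root cause" is to record every contributing factor with its category and whether it is systemic. The sketch below assumes a hypothetical `Factor` record and `summarize` helper. The category names are taken from the paragraph above, and the sample incident is invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Categories named in the text above.
CATEGORIES = {
    "code defect",
    "configuration error",
    "capacity issue",
    "dependency failure",
    "human error",
    "process gap",
}


@dataclass(frozen=True)
class Factor:
    description: str
    category: str
    systemic: bool  # True for an underlying weakness, False for the immediate trigger


def summarize(factors):
    """Count factors per category and list the descriptions of the systemic ones."""
    for f in factors:
        if f.category not in CATEGORIES:
            raise ValueError(f"unknown category: {f.category}")
    by_category = Counter(f.category for f in factors)
    systemic = [f.description for f in factors if f.systemic]
    return by_category, systemic


# Hypothetical incident with one trigger and two systemic factors.
factors = [
    Factor("connection pool size lowered in production", "configuration error", False),
    Factor("no alert on pool saturation", "process gap", True),
    Factor("config changes skip staging", "process gap", True),
]

by_category, systemic = summarize(factors)
print(dict(by_category))
print(systemic)
```

Listing the systemic factors separately makes it easier to prioritize follow-up work on monitoring or testing gaps instead of only reverting the trigger.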


Related Terms

Post-Mortem (Incident Review)
A structured review conducted after an incident to identify root causes and prevent recurrence.
Blameless Postmortem
An incident review process that focuses on systemic improvements rather than individual fault.
Incident Management
The process of detecting, responding to, resolving, and learning from service disruptions.
