Observability

The ability to understand the internal state of a system by examining its external outputs: logs, metrics, and traces.

Observability is the ability to understand what is happening inside a system based on the data it produces externally. It is built on three pillars: metrics (numerical measurements over time, like CPU usage or request latency), logs (timestamped records of discrete events), and traces (end-to-end records of requests as they flow through distributed systems).

Observability goes beyond traditional monitoring. While monitoring tells you "is the system up?" and "is the metric above the threshold?", observability lets you ask arbitrary questions about system behavior: "Why are requests from EU users slower than US users?" or "What changed between yesterday and today that caused error rates to spike?"

Building observable systems requires instrumenting applications with structured logs, exporting metrics to time-series databases, implementing distributed tracing, and making this data queryable. External monitoring tools like Hyperping complement internal observability by providing an outside-in view of service availability and performance as users experience it.