Failover

Switching to a backup system, automatically or manually, when the primary system fails, so that the service keeps running.

Failover is the process of automatically or manually switching operations from a failed primary system to a standby (secondary) system. The goal is to maintain service continuity with minimal disruption when the primary system experiences a failure.

Failover can be automatic (triggered by health checks detecting a failure) or manual (initiated by an operator). Automatic failover is faster but requires careful configuration to avoid false triggers. The time a failover takes, from detecting the failure to the standby serving traffic, counts directly against your RTO (Recovery Time Objective).
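One common way to reduce false triggers is to require several consecutive failed health checks before failing over. The sketch below illustrates that idea only. The class name `FailoverDetector`, the threshold of 3, and the 10-second interval are assumptions chosen for illustration, not settings of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class FailoverDetector:
    """Triggers failover only after several consecutive failed health checks."""

    failure_threshold: int = 3      # assumed: consecutive failures required
    check_interval_s: float = 10.0  # assumed: seconds between health checks
    consecutive_failures: int = 0
    failed_over: bool = False

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True if failover should trigger now."""
        if healthy:
            # Any successful check resets the streak, so a single blip is ignored.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if not self.failed_over and self.consecutive_failures >= self.failure_threshold:
            self.failed_over = True
            return True
        return False

    def worst_case_detection_s(self) -> float:
        """Approximate upper bound on detection time, ignoring check timeouts."""
        return self.failure_threshold * self.check_interval_s


detector = FailoverDetector()
# One isolated failure, a recovery, then a sustained outage.
results = [detector.record(h) for h in [False, True, False, False, False, False]]
```

A higher threshold makes false triggers less likely but lengthens detection, and detection time is part of the total failover time that has to fit within the RTO.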

Common failover architectures include active-passive (a standby system takes over when the primary fails), active-active (multiple systems share the load and absorb each other's traffic during failures), and DNS-based failover (DNS records are updated to point to backup servers). Monitoring plays a critical role in failover: tools like Hyperping can detect when a primary endpoint goes down and verify that the failover endpoint is serving correctly.
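The difference between active-passive and active-active comes down to which systems receive traffic for a given health state. A minimal sketch follows, assuming a dictionary that maps hypothetical node names to their latest health-check result. The function names are illustrative and not taken from any library.

```python
from typing import Dict, List


def active_passive_targets(health: Dict[str, bool], primary: str, standby: str) -> List[str]:
    """All traffic goes to the primary; the standby serves only if the primary is down."""
    if health.get(primary, False):
        return [primary]
    if health.get(standby, False):
        return [standby]
    return []  # neither node is healthy: a full outage


def active_active_targets(health: Dict[str, bool], nodes: List[str]) -> List[str]:
    """Every healthy node serves traffic; survivors absorb a failed node's share."""
    return [node for node in nodes if health.get(node, False)]


# Hypothetical health state: node "a" has failed, "b" and "c" are up.
health = {"a": False, "b": True, "c": True}
passive_result = active_passive_targets(health, primary="a", standby="b")
active_result = active_active_targets(health, nodes=["a", "b", "c"])
```

In a DNS-based setup, the same kind of decision would determine which addresses the DNS record points to. That step is left out here because it depends on the DNS provider.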


Related Terms

Redundancy
The duplication of critical system components to increase reliability and eliminate single points of...
Availability
The proportion of time a system is functional and accessible, often expressed as a percentage.
RTO (Recovery Time Objective)
The maximum acceptable duration of time a service can be offline after a disaster or failure before ...
Load Balancing
The distribution of incoming network traffic across multiple servers to ensure no single server is o...
Health Check
An endpoint or process that verifies whether a service or its dependencies are functioning correctly...
