Auto-Scaling

The automatic adjustment of compute resources based on current demand to maintain performance and control costs.

Auto-scaling is the ability of a system to automatically increase or decrease its compute resources (servers, containers, functions) based on current demand. When traffic spikes, auto-scaling adds capacity to maintain performance. When traffic drops, it removes capacity to reduce costs.

Auto-scaling can be reactive (triggered when a metric crosses a threshold, such as CPU usage exceeding 70%) or predictive (based on historical patterns, such as scaling up before expected peak hours). Most cloud platforms support auto-scaling for virtual machines, containers (for example, the Kubernetes Horizontal Pod Autoscaler, or HPA), and serverless functions, which scale automatically by design.
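To make the reactive case concrete, here is a minimal Python sketch of a proportional scaling rule, modeled on the formula the Kubernetes HPA documentation describes (desired replicas = ceil(current replicas × current metric / target metric)). The function name, the replica bounds, and the 10% tolerance band are illustrative assumptions, not settings taken from any particular platform.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return a replica count that moves the per-replica metric toward its target.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("current_replicas must be >= 1 and target_metric > 0")

    # How far the observed metric is from the target (1.0 means on target).
    ratio = current_metric / target_metric

    # Inside the tolerance band, keep the current size to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        # Proportional rule: scale the replica count by the same ratio.
        desired = math.ceil(current_replicas * ratio)

    # Never go below the floor or above the ceiling.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 70% target.
    print(desired_replicas(4, 90, 70))
```

With a 70% CPU target, four replicas averaging 90% would grow to six, while four replicas averaging 72% stay put because the reading falls inside the tolerance band. A predictive policy could feed a forecast value into the same rule instead of the live metric.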

Auto-scaling relies on accurate monitoring data to make scaling decisions. If metrics are delayed or inaccurate, the system may scale too late or too aggressively. External monitoring with Hyperping complements auto-scaling by verifying that the user-facing experience remains acceptable during scaling events, because added capacity does not help if the fault lies in the load balancer or the application layer.
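As a rough illustration of what an outside-in check involves, the Python sketch below probes a public URL and judges a batch of probe results against latency and success thresholds. It uses only the standard library and is not Hyperping's API; the names `ProbeResult`, `probe`, and `experience_acceptable`, and the default thresholds, are assumptions made for this example.

```python
import http.client
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeResult:
    ok: bool          # True if the endpoint answered with a 2xx/3xx status
    latency_s: float  # wall-clock time for the request, in seconds


def probe(url, timeout_s=5.0):
    """Request the URL once from outside the system and time the response."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Covers HTTP error statuses, connection failures, and timeouts.
        ok = False
    return ProbeResult(ok=ok, latency_s=time.monotonic() - start)


def experience_acceptable(results, max_latency_s=1.0, min_success_ratio=0.99):
    """Judge a window of probes: enough of them must be both successful and fast."""
    if not results:
        return False  # no data is treated as "not verified"
    good = sum(1 for r in results if r.ok and r.latency_s <= max_latency_s)
    return good / len(results) >= min_success_ratio
```

Running `probe` at a fixed interval during a scale-out and passing the collected results to `experience_acceptable` would show whether users actually benefited from the new capacity, independently of the internal metrics that triggered the scaling.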

Related Terms

Load Balancing

The distribution of incoming network traffic across multiple servers to ensure no single server is o...

Throughput

The rate at which a system processes requests or data, typically measured in requests per second.

Availability

The proportion of time a system is functional and accessible, often expressed as a percentage.

