Auto-scaling is the ability of a system to automatically increase or decrease its compute resources (servers, containers, functions) based on current demand. When traffic spikes, auto-scaling adds capacity to maintain performance. When traffic drops, it removes capacity to reduce costs.
Auto-scaling can be reactive (triggered by a metric crossing a threshold, such as CPU usage exceeding 70%) or predictive (based on historical patterns, such as scaling up before expected peak hours). Most cloud platforms support auto-scaling for virtual machines, for containers through an orchestrator (for example, the Kubernetes Horizontal Pod Autoscaler, or HPA), and for serverless functions, which scale automatically as part of the platform.
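As a rough illustration of reactive scaling, the sketch below applies the proportional rule documented for the Kubernetes HPA: desired replicas = ceil(current replicas × current metric / target metric). The function name, the replica bounds, and the sample numbers are assumptions made for this example. The 10% tolerance mirrors the HPA's documented default, but real autoscalers add further safeguards (stabilization windows, pod readiness handling) that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return a replica count using a proportional, HPA-style rule.

    All parameter names and defaults are illustrative assumptions.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the metric is close enough to the target,
    # which avoids flapping on small fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 70% target.
    print(desired_replicas(4, 90, 70))
    # 4 replicas averaging 35% CPU against a 70% target.
    print(desired_replicas(4, 35, 70))
```

With a 70% CPU target, four replicas averaging 90% would grow to six, and the same four replicas averaging 35% would shrink to two. The upper bound keeps a runaway metric from scaling the fleet without limit.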
Auto-scaling relies on accurate monitoring data to make scaling decisions. If metrics are delayed or inaccurate, the system may scale too late or too aggressively. External monitoring with Hyperping complements auto-scaling by verifying that the user-facing experience remains acceptable during scaling events, because adding capacity does not help if the load balancer or application layer has a bug.
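One way to limit the damage from delayed metrics is to refuse to act on samples that are too old. The sketch below is a minimal illustration of that idea, not any particular platform's behavior: `MetricSample`, `scaling_action`, the 70% threshold, and the 60-second freshness limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class MetricSample:
    value: float      # e.g. average CPU utilization in percent
    timestamp: float  # Unix time at which the sample was collected


def scaling_action(sample, now, threshold=70.0, max_age_seconds=60.0):
    """Decide what to do with a metric sample (hypothetical policy)."""
    age = now - sample.timestamp
    # A stale sample says little about current load, so hold the
    # current capacity instead of scaling on outdated information.
    if age > max_age_seconds:
        return "hold"
    return "scale_up" if sample.value > threshold else "no_change"


if __name__ == "__main__":
    fresh = MetricSample(value=85.0, timestamp=1000.0)
    print(scaling_action(fresh, now=1030.0))   # sample is 30 seconds old
    print(scaling_action(fresh, now=1200.0))   # sample is 200 seconds old
```

A guard like this only covers delayed data. It cannot tell whether users are actually being served well, which is the gap an external check is meant to fill.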