Kubernetes HorizontalPodAutoscaler (HPA)

The HorizontalPodAutoscaler (HPA), also written Horizontal Pod Autoscaler, is a Kubernetes API resource and controller that automatically adjusts the number of pod replicas for a workload based on observed demand. It scales horizontally: it adds pods when a metric such as CPU utilization, memory usage, or a custom application metric rises above its target, and removes pods when the metric falls back below it. HPA can target any workload that exposes the scale subresource, most commonly Deployments, ReplicaSets, and StatefulSets, and it reads metrics through the Kubernetes metrics APIs; CPU and memory data are typically supplied by Metrics Server. For example, if an API Deployment has a target average CPU utilization of 60% and traffic spikes during business hours, HPA can increase replicas from 3 to 10, then scale back down once load subsides. HPA helps keep applications responsive while limiting spend on idle capacity, but it depends on accurate metrics, sensible resource requests (utilization targets are calculated as a percentage of each pod's requests), and workloads that can safely run multiple replicas.
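The scaling decision in the example can be approximated with the ratio rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric). The Python sketch below is only an illustration of that arithmetic, not the controller's actual code. The function name, its parameters, and the replica bounds of 3 and 10 are assumptions chosen to match the example. The 10% tolerance mirrors the documented default. The real controller also applies stabilization windows and handles unready pods and missing metrics, which this sketch ignores.

```python
def desired_replicas(current_replicas, current_pct, target_pct,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA ratio rule for one utilization metric.

    Hypothetical helper for illustration only. Utilization values are
    whole-number percentages of the pods' CPU requests.
    """
    # Skip scaling when utilization is within the tolerance band
    # around the target (10% by default).
    if abs(current_pct - target_pct) <= tolerance * target_pct:
        return current_replicas

    # Integer ceiling of current_replicas * current_pct / target_pct,
    # computed without floating-point division.
    raw = -(-(current_replicas * current_pct) // target_pct)

    # Clamp to the configured bounds (assumed here to be 3 and 10).
    return max(min_replicas, min(max_replicas, raw))


# Traffic spike: 3 replicas averaging 200% CPU against a 60% target.
print(desired_replicas(3, 200, 60, min_replicas=3, max_replicas=10))   # 10
# Quiet period: 10 replicas averaging 15% CPU.
print(desired_replicas(10, 15, 60, min_replicas=3, max_replicas=10))   # 3
# Within tolerance: 63% against a 60% target leaves the count unchanged.
print(desired_replicas(5, 63, 60, min_replicas=3, max_replicas=10))    # 5
```

Because the result is clamped, even a much larger spike would stop at the configured maximum, which is why a sensible upper bound matters for cost control.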

DevOps Glossary