Circuit Breaker

A circuit breaker is a resilience pattern that temporarily stops calls to a failing dependency, such as an API, database, queue, or microservice. In practical terms, it keeps calls to one unhealthy dependency from tying up the caller's threads, connections, and other resources, which helps prevent cascading failures across the rest of your system.

What a circuit breaker does

A circuit breaker watches calls to a dependency and tracks failure signals such as timeouts, connection errors, HTTP 5xx responses, or rejected requests. When failures pass a configured threshold, the circuit breaker “opens” and blocks new calls for a short period.

Instead of waiting for a failing service to time out repeatedly, your application can fail fast, return a fallback response, serve cached data, or degrade the feature safely.
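As a rough illustration of failing fast with a cached fallback, the sketch below skips the live call while the circuit is open and serves the last known value if one exists. The names fetch_with_fallback, fetch, cache, and circuit_is_open are hypothetical and chosen for this example only; they do not come from any particular library.

```python
def fetch_with_fallback(fetch, cache, key, circuit_is_open):
    """Return (value, source), where source is 'live', 'cached', or 'unavailable'."""
    if not circuit_is_open():
        try:
            value = fetch(key)
            cache[key] = value          # remember the last good value
            return value, "live"
        except Exception:
            pass                        # fall through to the fallback path
    if key in cache:
        return cache[key], "cached"     # degrade to possibly stale data
    return None, "unavailable"          # nothing cached: report a clear error
```

Whether stale data is acceptable depends on the feature, so a real fallback would usually also check how old the cached value is.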

How it works

Most circuit breakers use three states:

  • Closed: Calls flow normally. The circuit breaker records successes and failures.
  • Open: Calls are blocked immediately because the dependency appears unhealthy.
  • Half-open: A limited number of test calls are allowed through. If they succeed, the circuit closes. If they fail, it opens again.
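
The three states can be sketched as a small state machine. This is a minimal, single-threaded illustration and not a production implementation: it opens after a number of consecutive failures (a simplification of the rate-based thresholds described later) and allows one trial call at a time in the half-open state. The class name, parameter names, and default values are assumptions made for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal three-state breaker; names and defaults are illustrative."""

    def __init__(self, failure_threshold=3, open_seconds=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.open_seconds = open_seconds            # how long to stay open
        self.clock = clock                          # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("dependency appears unhealthy")
            self.state = "half_open"  # open period elapsed: let a trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.state = "closed"

    def _on_failure(self):
        self.failures += 1
        # A failed trial call reopens immediately; otherwise open at the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would wrap each dependency call, for example breaker.call(charge_card, order), and handle CircuitOpenError by returning a fallback. Here charge_card and order are placeholders.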

Teams usually configure circuit breakers with settings such as:

  • Failure threshold: For example, open the circuit after 50% of requests fail over a rolling 30-second window.
  • Timeout: For example, treat calls taking longer than 2 seconds as failures.
  • Open duration: For example, block calls for 10 seconds before trying half-open test requests.
  • Minimum request count: For example, require at least 20 requests before calculating a failure rate.
  • Fallback behavior: Return cached data, a default value, an error response, or a reduced feature set.
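
To make these settings concrete, the sketch below collects the example values into one object and evaluates a failure rate over a rolling time window. BreakerSettings and RollingWindow are hypothetical names for this illustration; real libraries expose similar options under their own names and may use count-based windows instead.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerSettings:
    failure_rate_threshold: float = 0.5   # open when 50% of requests fail
    window_seconds: float = 30.0          # rolling window length
    call_timeout_seconds: float = 2.0     # slower calls count as failures
    open_seconds: float = 10.0            # time to stay open before half-open
    minimum_requests: int = 20            # below this, do not evaluate the rate


class RollingWindow:
    """Tracks (timestamp, failed) outcomes inside a time window."""

    def __init__(self, settings):
        self.settings = settings
        self.outcomes = deque()

    def record(self, now, duration_seconds, raised_error):
        # A call fails if it raised or exceeded the configured timeout.
        failed = raised_error or duration_seconds > self.settings.call_timeout_seconds
        self.outcomes.append((now, failed))

    def should_open(self, now):
        # Drop outcomes that have aged out of the window.
        cutoff = now - self.settings.window_seconds
        while self.outcomes and self.outcomes[0][0] < cutoff:
            self.outcomes.popleft()
        total = len(self.outcomes)
        if total < self.settings.minimum_requests:
            return False  # too few samples to trust the failure rate
        failures = sum(1 for _, failed in self.outcomes if failed)
        return failures / total >= self.settings.failure_rate_threshold
```

The minimum request count matters most on low-traffic paths, where a single failed call out of two would otherwise look like a 50% failure rate.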

Common use cases

  • Microservices: Stop one slow service from exhausting threads, connections, or worker pools in other services.
  • Third-party APIs: Protect your application when payment, email, identity, or analytics providers are unavailable.
  • Databases and caches: Avoid repeated expensive calls to an overloaded datastore.
  • Event-driven systems: Pause or redirect work when downstream consumers or brokers are failing.
  • Edge and API gateways: Fail fast before requests reach unhealthy upstream services.

Simple example

Assume a checkout service calls a payment provider. The provider starts timing out, and each request waits 5 seconds before failing. Without a circuit breaker, checkout workers pile up waiting for responses. Eventually, the whole checkout path may become unavailable.

With a circuit breaker, the checkout service detects the high timeout rate and opens the circuit. New payment attempts fail quickly with a controlled response, such as “Payment is temporarily unavailable. Please try again.” After an open duration of 30 seconds, the circuit breaker moves to half-open and allows a few test requests. If they succeed, the circuit closes and normal traffic resumes. If they fail, the circuit opens again for another 30 seconds.
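
The difference in wasted waiting time can be estimated with a small simulation. It makes several simplifying assumptions that are not part of the example itself: attempts are handled one after another, the circuit opens after 5 consecutive timeouts, and every attempt arrives while the provider is still down and before the open period ends. The function name and constants are hypothetical.

```python
TIMEOUT_SECONDS = 5.0     # from the example: each failing call waits 5 seconds
FAILURE_THRESHOLD = 5     # assumption: open after 5 consecutive timeouts
FALLBACK = "Payment is temporarily unavailable. Please try again."


def simulate_outage(attempts, use_breaker):
    """Return (worker_seconds_spent_waiting, fallback_responses) during an outage."""
    waited = 0.0
    fallbacks = 0
    consecutive_failures = 0
    for _ in range(attempts):
        if use_breaker and consecutive_failures >= FAILURE_THRESHOLD:
            fallbacks += 1          # circuit is open: answer with FALLBACK, no call made
            continue
        waited += TIMEOUT_SECONDS   # provider is down, so the call times out
        consecutive_failures += 1
    return waited, fallbacks
```

Under these assumptions, 100 attempts cost 500 worker-seconds of waiting without a breaker and 25 with one, with the remaining 95 attempts receiving the fallback message immediately.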

Benefits

  • Limits cascading failures: A failing dependency is less likely to take down unrelated parts of the system.
  • Reduces resource exhaustion: Threads, connections, CPU, and memory are not tied up by repeated slow calls.
  • Improves recovery behavior: Unhealthy services get time to recover instead of receiving constant retry traffic.
  • Supports graceful degradation: Applications can return cached data, partial responses, or clear errors.

Tradeoffs and limitations

  • Bad thresholds can hurt availability: If thresholds are too sensitive, the circuit may open during brief spikes. If they are too loose, failures may spread before protection starts.
  • Fallbacks need careful design: Returning stale prices, stale permissions, or incomplete account data can create product or security issues.
  • It does not fix the root cause: A circuit breaker contains failure, but the dependency still needs debugging and recovery.
  • It needs observability: You should track circuit state, failure rates, latency, fallback usage, and recovery attempts.
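
As one possible shape for the observability point, the sketch below keeps in-memory counters for calls, failures, fallbacks, and state transitions, and logs each transition. BreakerMetrics and its method names are hypothetical; a real system would more likely export these values to its existing metrics and alerting stack.

```python
import logging
from collections import Counter

logger = logging.getLogger("circuit_breaker")


class BreakerMetrics:
    """In-memory counters for one breaker; illustrative only."""

    def __init__(self, name):
        self.name = name
        self.counts = Counter()
        self.state = "closed"

    def record_call(self, succeeded, latency_seconds):
        self.counts["calls"] += 1
        if not succeeded:
            self.counts["failures"] += 1
        self.counts["latency_ms_total"] += round(latency_seconds * 1000)

    def record_fallback(self):
        self.counts["fallbacks"] += 1

    def record_transition(self, new_state):
        # State changes are rare and important, so log each one.
        logger.warning("breaker %s: %s -> %s", self.name, self.state, new_state)
        self.counts[f"transitions_to_{new_state}"] += 1
        self.state = new_state

    def failure_rate(self):
        calls = self.counts["calls"]
        return self.counts["failures"] / calls if calls else 0.0
```

Alerting on transitions to the open state, and on a rising fallback count, tends to surface dependency problems earlier than waiting for user reports.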

Circuit breaker vs retry vs timeout

  • Timeout: Sets a maximum wait time for one call. For example, stop waiting after 1 second.
  • Retry: Tries the same operation again after a failure. Retries should use backoff and jitter so that they do not send synchronized bursts of extra load to a dependency that is already struggling.
  • Circuit breaker: Stops new calls for a period when a dependency is likely unhealthy.

These patterns often work together. A service might use a 1-second timeout, retry once with backoff, then let the circuit breaker open if many calls still fail.
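
One way the three patterns might be layered is sketched below, assuming the operation enforces its own timeout and raises an exception on failure. SimpleBreaker and call_with_resilience are hypothetical names, the breaker is a deliberately tiny consecutive-failure version, and the delay values are placeholders.

```python
import random
import time


class SimpleBreaker:
    """Tiny consecutive-failure breaker, just enough to show the composition."""

    def __init__(self, failure_threshold=5, open_seconds=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # After the open period, allow trial calls through again.
        return self.clock() - self.opened_at >= self.open_seconds

    def record(self, succeeded):
        if succeeded:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()


def call_with_resilience(operation, breaker, timeout_seconds=1.0, retries=1,
                         base_delay=0.2, sleep=time.sleep):
    """operation(timeout_seconds) must enforce the timeout itself and raise on failure."""
    last_error = None
    for attempt in range(retries + 1):
        if not breaker.allow():
            raise RuntimeError("circuit open: failing fast")
        try:
            result = operation(timeout_seconds)
        except Exception as error:
            breaker.record(succeeded=False)
            last_error = error
            if attempt < retries:
                # Exponential backoff with full jitter before the next attempt.
                sleep(random.uniform(0, base_delay * (2 ** attempt)))
        else:
            breaker.record(succeeded=True)
            return result
    raise last_error
```

In this sketch every failed attempt, including retries, counts toward the breaker's threshold, so aggressive retry settings open the circuit sooner. Some libraries let you choose whether the breaker wraps the retries or the retries wrap the breaker, and the two orderings behave differently.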

Related tools and implementations

Circuit breaker support exists in many service meshes, gateways, and application libraries. Examples include Resilience4j for Java, Polly for .NET, Hystrix in older Java systems (Netflix has placed it in maintenance mode), Envoy circuit breaking and outlier detection, Istio traffic policies, and Spring Cloud CircuitBreaker.

The best implementation depends on where you want the protection to live: inside application code, at the sidecar proxy, in an API gateway, or at the service mesh layer.