Reliability

DevOps glossary terms in the Reliability category.

  • Incident Management

    A coordinated process for detecting, prioritizing, resolving, and learning from service outages and other unplanned disruptions so that systems return to normal operation quickly.

    Reliability

  • Service Level Agreement (SLA)

    A contract that defines expected service uptime, performance, and support response times between a provider and a customer.

    Reliability

  • Circuit Breaker

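    Circuit Breaker is a pattern that stops sending calls to a failing service for a cooldown period and fails fast instead, which reduces cascading failures during outages. After the cooldown, a trial call checks whether the service has recovered.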

    Reliability
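
A minimal sketch of the circuit breaker pattern described above, assuming a simple consecutive-failure threshold and a fixed cooldown. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are illustrative and do not come from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # cooldown in seconds
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still cooling down: fail fast without touching the service.
                raise CircuitOpenError("circuit open; call skipped")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile` function, and treat `CircuitOpenError` as a signal to use a fallback.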

  • Dead Letter Queue (DLQ)

    Dead Letter Queue (DLQ) is a separate queue that holds messages that could not be processed or delivered (for example, after exhausting their retries), isolating them for later inspection or reprocessing.

    Reliability
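
The following in-memory sketch illustrates the idea, assuming a fixed retry limit per message. The function name `process_with_dlq` and the `max_attempts` parameter are hypothetical; real message brokers configure dead-lettering in their own ways.

```python
from collections import deque


def process_with_dlq(messages, handler, max_attempts=3):
    """Run handler on each message; park messages that keep failing in a DLQ."""
    main_queue = deque((msg, 1) for msg in messages)  # (message, attempt number)
    dead_letters = []
    while main_queue:
        msg, attempt = main_queue.popleft()
        try:
            handler(msg)
        except Exception as exc:
            if attempt >= max_attempts:
                # Out of retries: isolate the message with context for inspection.
                dead_letters.append(
                    {"message": msg, "error": repr(exc), "attempts": attempt}
                )
            else:
                # Requeue at the back so other messages are not blocked.
                main_queue.append((msg, attempt + 1))
    return dead_letters
```

Keeping the error and attempt count alongside each dead-lettered message makes later triage easier than storing the payload alone.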

  • Error Budget

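    The amount of downtime or errors a service can accumulate over a given period before it violates its service level objective (SLO). For example, a 99.9% availability SLO leaves an error budget of 0.1%.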

    Reliability
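
As a worked illustration of the arithmetic, the sketch below converts an availability SLO into minutes of allowed downtime. The 30-day window and the function names are assumptions chosen for the example.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Minutes of downtime allowed by an availability SLO over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


def budget_remaining_fraction(slo_percent, downtime_minutes, window_days=30):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_percent, window_days)
    if budget == 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return (budget - downtime_minutes) / budget


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(99.9), 1))
```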

  • gRPC Deadline

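    A gRPC deadline is a per-RPC time limit, set by the client, after which the call fails with a DEADLINE_EXCEEDED status so that clients and servers stop waiting on a request that can no longer complete in time.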

    Reliability
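
The sketch below models the deadline idea in plain Python without using the gRPC library, so the `Deadline` and `DeadlineExceeded` names are illustrative stand-ins rather than gRPC APIs. It shows a single absolute expiry time shared across steps, with each step receiving only the time that remains. In the Python gRPC library itself, a client typically sets the limit by passing `timeout=` (in seconds) when invoking a stub method.

```python
import time


class DeadlineExceeded(Exception):
    """Stand-in for gRPC's DEADLINE_EXCEEDED status."""


class Deadline:
    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._clock = clock
        # Store an absolute expiry so the budget shrinks as work proceeds.
        self._expires_at = clock() + timeout_seconds

    def remaining(self):
        return max(0.0, self._expires_at - self._clock())

    def check(self):
        if self.remaining() <= 0.0:
            raise DeadlineExceeded("deadline exceeded")


def handle_request(deadline, steps):
    """Run each step in order, checking the shared deadline before each one."""
    results = []
    for step in steps:
        deadline.check()
        # Each step is told how much time is left, not the original timeout.
        results.append(step(deadline.remaining()))
    return results
```

Passing the remaining time downstream, rather than a fresh timeout per hop, keeps a chain of calls from outliving the original caller's patience.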

  • Uptime

    The percentage of time a system or service is up, running, and available to users.

    Reliability
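
As a small illustration, uptime can be computed from the total time in a measurement period and the downtime observed in it. The function name and the choice of minutes as the unit are assumptions for this example.

```python
def uptime_percent(total_minutes, downtime_minutes):
    """Percentage of the measurement period during which the service was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return 100 * (total_minutes - downtime_minutes) / total_minutes


# 10 minutes of downtime in a 1000-minute period is 99.0% uptime.
print(uptime_percent(1000, 10))
```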

  • Chaos Engineering

    The practice of deliberately injecting failures into a system in a controlled way to observe how it responds, then fixing the weaknesses found so it stays reliable under stress.

    Reliability
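
A toy sketch of fault injection, one common chaos engineering technique, assuming a wrapper that makes a dependency fail at a configurable rate. The names `with_fault_injection` and `fetch_with_fallback` are hypothetical, and real experiments are usually run with dedicated tooling and a limited blast radius.

```python
import random


def with_fault_injection(func, failure_rate, rng=None):
    """Wrap func so that a fraction of calls raise an injected ConnectionError."""
    rng = rng if rng is not None else random.Random()

    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return func(*args, **kwargs)

    return wrapper


def fetch_with_fallback(fetch, fallback_value):
    """Code under test: it should degrade gracefully when the dependency fails."""
    try:
        return fetch()
    except ConnectionError:
        return fallback_value
```

Running the code under test against the wrapped dependency shows whether the fallback path actually works before a real outage exercises it.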