Reliability

DevOps glossary terms in the Reliability category.

  • Chaos Engineering

    Chaos Engineering is the practice of deliberately injecting controlled failures into a system to observe how it behaves and to improve its reliability.

    Reliability

  • Error Budget

    The amount of downtime or errors a service can accumulate over a given period while still meeting its reliability target, the service level objective (SLO).

    Reliability

  • Incident Management

    A coordinated process for detecting, prioritizing, fixing, and learning from service outages or other unplanned problems so that systems return to normal quickly.

    Reliability

  • Service Level Agreement (SLA)

    A contract that defines expected service uptime, performance, and support response times between a provider and a customer.

    Reliability

  • Dead Letter Queue (DLQ)

    A Dead Letter Queue (DLQ) is a queue that holds messages that could not be delivered or processed, isolating them for later inspection or retry.

    Reliability

  • Circuit Breaker

    A circuit breaker is a pattern that temporarily stops calls to a failing service to reduce cascading failures during outages.

    Reliability

  • gRPC Deadline

    A gRPC deadline is a per-RPC time limit set by the client. Once it passes, the call fails with DEADLINE_EXCEEDED and the services involved can stop working on the request.

    Reliability

  • Uptime

    The percentage of time a system or service is up, running, and available to users.

    Reliability
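
The Uptime and Error Budget entries above both come down to simple arithmetic over a time window. The following is a minimal sketch of that arithmetic, assuming availability is measured purely as time up versus time down. The function names `uptime_percent` and `error_budget` are illustrative and do not come from any particular tool.

```python
from datetime import timedelta


def uptime_percent(period: timedelta, downtime: timedelta) -> float:
    """Percentage of the period during which the service was available."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if downtime < timedelta(0) or downtime > period:
        raise ValueError("downtime must be between zero and the period")
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * ((period - downtime) / period)


def error_budget(slo_percent: float, period: timedelta) -> timedelta:
    """Downtime allowed within the period while still meeting the SLO."""
    if not 0.0 <= slo_percent <= 100.0:
        raise ValueError("slo_percent must be between 0 and 100")
    return period * (1.0 - slo_percent / 100.0)


def budget_remaining(slo_percent: float, period: timedelta,
                     downtime: timedelta) -> timedelta:
    """Unspent error budget; a negative result means the SLO was missed."""
    return error_budget(slo_percent, period) - downtime
```

For example, a 99.9% SLO over a 30-day window allows roughly 43 minutes of downtime, so `budget_remaining(99.9, timedelta(days=30), timedelta(minutes=20))` leaves a little over 23 minutes unspent.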
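
The Circuit Breaker entry above can be illustrated with a small state machine. This is a sketch under stated assumptions, not a production implementation: the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all hypothetical, and the code is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast without touching the service.
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            trial_failed = self.opened_at is not None
            if trial_failed or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for any function that calls the remote service, and treat `CircuitOpenError` as a signal to use a fallback instead of waiting on a service that is known to be failing.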
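
The Dead Letter Queue entry above can be sketched with in-memory queues. This example assumes a simple retry-count policy; real message brokers configure dead-lettering in their own ways, and the names `process_with_dlq` and `max_attempts` are hypothetical.

```python
from collections import deque


def process_with_dlq(messages, handler, max_attempts=3):
    """Run handler on each message; return the messages that were dead-lettered.

    A message whose handler raises is re-queued until it has been tried
    max_attempts times, then moved to the dead letter list with the last error.
    """
    main_queue = deque((msg, 1) for msg in messages)
    dead_letters = []
    while main_queue:
        msg, attempt = main_queue.popleft()
        try:
            handler(msg)
        except Exception as exc:
            if attempt >= max_attempts:
                # Give up: isolate the message for later inspection or retry.
                dead_letters.append((msg, repr(exc)))
            else:
                main_queue.append((msg, attempt + 1))
    return dead_letters
```

Keeping the error alongside each dead-lettered message makes later inspection easier, and moving the message aside means a single bad message cannot block the rest of the queue.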