Monitoring and Logging

Service Level Indicator (SLI)
A quantitative measure of the service as users actually experience it, such as request success rate, response latency, or error rate, often expressed as the proportion of good events to valid events.
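As a minimal sketch, the code below computes two common SLIs from request records. It assumes each request is recorded as a status code and a latency in milliseconds, and that any status below 500 counts as "good". The `Request` record, the 300 ms threshold and the function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the user
    latency_ms: float  # end-to-end response time in milliseconds


def availability_sli(requests: list[Request]) -> float:
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not requests:
        return 1.0  # assumption: no traffic counts as meeting the target
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float = 300.0) -> float:
    """Fraction of requests answered within threshold_ms."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return fast / len(requests)


if __name__ == "__main__":
    sample = [Request(200, 120), Request(200, 450), Request(503, 80), Request(404, 95)]
    print(f"availability: {availability_sli(sample):.2%}")
    print(f"latency <= 300 ms: {latency_sli(sample):.2%}")
```

Here a 404 counts as good because it is a client error. Whether that is the right choice depends on the service.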
Log Rotation
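Automatically closing the active log file when it reaches a size or age limit, starting a new one, and compressing and eventually deleting old files according to a retention policy. This keeps log storage bounded and keeps individual files small enough to search during troubleshooting.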
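Python's standard library handles the rotation part with `logging.handlers.RotatingFileHandler`. The sketch below also gzips each rotated file, using the handler's `namer`/`rotator` hooks. The 1 KiB size limit, the three backups and the logger name are arbitrary values chosen for illustration. Production setups often rotate by time instead, or delegate the job to an external tool such as logrotate.

```python
import gzip
import logging
import logging.handlers
import os
import shutil


def gz_namer(default_name: str) -> str:
    # Rotated files become app.log.1.gz, app.log.2.gz, ...
    return default_name + ".gz"


def gz_rotator(source: str, dest: str) -> None:
    # Called on rollover: compress the just-closed log into dest, then remove it.
    with open(source, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.remove(source)


def make_rotating_logger(path: str, name: str = "app",
                         max_bytes: int = 1024, backups: int = 3) -> logging.Logger:
    # Roll over once the active file would exceed max_bytes; keep at most
    # `backups` compressed files and delete anything older (retention).
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.namer = gz_namer
    handler.rotator = gz_rotator
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_rotating_logger("app.log")
    for i in range(200):
        log.info("handled request %d", i)
```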
Observability
Observability is the ability to infer a system's internal state from the data it emits, chiefly logs, metrics, and traces, so that performance and reliability issues can be diagnosed quickly.
Elasticsearch
Distributed search and analytics engine for indexing, querying, and aggregating large datasets in near real time.
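Clients index and search documents through Elasticsearch's REST API. The standard-library sketch below builds a `_search` request but does not send it. The request combines a full-text `match` query, a time-range filter and a `terms` aggregation. The index name `app-logs`, the fields `message`, `@timestamp` and `service`, and the node address `http://localhost:9200` are all assumptions for illustration. The sketch also assumes `service` is mapped as a `keyword` field so that it can be aggregated.

```python
import json
import urllib.request


def build_error_search(base_url: str = "http://localhost:9200",
                       index: str = "app-logs") -> urllib.request.Request:
    """Build a _search request: which services logged 'timeout' in the last 15 minutes?"""
    body = {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {
            "bool": {
                "must": [{"match": {"message": "timeout"}}],                # full-text match
                "filter": [{"range": {"@timestamp": {"gte": "now-15m"}}}],  # date math
            }
        },
        "aggs": {
            "by_service": {"terms": {"field": "service", "size": 10}}  # top 10 services
        },
    }
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_error_search()
    # Sending requires a reachable cluster (and credentials if security is enabled):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["aggregations"]["by_service"]["buckets"])
    print(req.get_method(), req.full_url)
```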
eBPF
Linux kernel technology for running small, sandboxed programs inside the kernel, checked by a verifier before they load, to trace, measure, and sometimes control system and network behavior.
OpenTelemetry
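OpenTelemetry is an open-source observability framework that standardizes how applications and services generate, collect, and export traces, metrics, and logs. It provides vendor-neutral APIs, SDKs, and a wire protocol (OTLP).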
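When a request crosses a service boundary, OpenTelemetry's default propagator carries trace context in the W3C Trace Context `traceparent` HTTP header. This is how spans emitted by different services end up in the same trace. The standard-library sketch below builds and parses that header by hand, purely to show its `version-traceid-parentid-flags` format. In a real service the OpenTelemetry SDK's propagators do this work. The function names are hypothetical, and the sketch skips checks the specification requires, such as rejecting all-zero IDs.

```python
import re
import secrets

# W3C Trace Context, version 00: version-trace_id-parent_id-trace_flags (lowercase hex)
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace ID and 8-byte span (parent) ID."""
    flags = "01" if sampled else "00"  # bit 0 of trace_flags means "sampled"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Continue an incoming trace: keep trace ID and flags, mint a new span ID."""
    m = TRACEPARENT_RE.match(incoming)
    if m is None:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    trace_id, _parent_id, flags = m.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    incoming = new_traceparent()
    outgoing = child_traceparent(incoming)  # header to send to the next service
    print(incoming)
    print(outgoing)
```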
Prometheus
Prometheus is an open-source monitoring and alerting toolkit that periodically scrapes metrics from services over HTTP, stores them as time series, and evaluates PromQL queries and alerting rules against them.
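Prometheus collects metrics by scraping an HTTP endpoint, conventionally `/metrics`, that returns plain text in its exposition format. Services normally use an official client library for this. The standard-library sketch below instead renders one labeled counter by hand, only to show what a scrape returns. The metric name `http_requests_total`, its labels and port 8000 are illustrative assumptions. The sketch also assumes label values need no escaping.

```python
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter keyed by (method, code) label values.
REQUESTS: Counter = Counter()


def render_metrics() -> str:
    """Render REQUESTS in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, code), value in sorted(REQUESTS.items()):
        lines.append(f'http_requests_total{{method="{method}",code="{code}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    REQUESTS[("get", "200")] += 3
    REQUESTS[("post", "500")] += 1
    HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

A Prometheus server configured to scrape this target would store each labeled line as its own time series.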
Prometheus Recording Rule
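A recording rule is a Prometheus rule that periodically evaluates a PromQL expression and saves the result as a new time series. Expensive or frequently used queries are thus precomputed, and dashboards and alerts can read them cheaply.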
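In Prometheus, a recording rule is declared in a YAML rule file as a `record:` name plus an `expr:` PromQL expression inside a rule group, and the server re-evaluates it on every evaluation interval. To show the mechanics without a running server, the sketch below emulates one evaluation of `job:http_requests:rate5m = sum by (job) (rate(http_requests_total[5m]))` in plain Python. Its `rate` is deliberately simplified: it ignores counter resets and Prometheus's extrapolation to the window boundaries. The sample data layout and the `recorded` store are assumptions.

```python
from collections import defaultdict
from typing import Optional

# Equivalent rule-file entry (YAML), for reference:
#   groups:
#     - name: example
#       rules:
#         - record: job:http_requests:rate5m
#           expr: sum by (job) (rate(http_requests_total[5m]))

# Hypothetical layout: each raw series is (labels, [(unix_ts, counter_value), ...]),
# with samples sorted by timestamp.
Sample = tuple[float, float]
Series = tuple[dict[str, str], list[Sample]]


def simple_rate(samples: list[Sample], now: float, window: float) -> Optional[float]:
    """Per-second increase over [now - window, now]; no reset handling or extrapolation."""
    pts = [(t, v) for t, v in samples if now - window <= t <= now]
    if len(pts) < 2:
        return None  # a rate needs at least two samples in the window
    (t0, v0), (t1, v1) = pts[0], pts[-1]
    return (v1 - v0) / (t1 - t0)


def evaluate_recording_rule(raw: list[Series], now: float,
                            window: float = 300.0) -> dict[str, float]:
    """One evaluation of sum by (job) (rate(http_requests_total[5m]))."""
    per_job: dict[str, float] = defaultdict(float)
    for labels, samples in raw:
        r = simple_rate(samples, now, window)
        if r is not None:
            per_job[labels["job"]] += r
    return dict(per_job)


# Recorded output: a new series per job, one point appended per evaluation.
recorded: dict[tuple[str, str], list[Sample]] = defaultdict(list)


def record(raw: list[Series], now: float) -> None:
    for job, value in evaluate_recording_rule(raw, now).items():
        recorded[("job:http_requests:rate5m", job)].append((now, value))
```

Queries against `job:http_requests:rate5m` then read a handful of precomputed points instead of recomputing the rate over every raw series.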
Grafana Tempo
Open-source distributed tracing backend from Grafana Labs that stores request traces and lets you search them, so you can find where across services a request is slow or failing.
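A trace is a tree of spans. A common way to find where a trace is slow is to compute each span's self time, meaning its duration minus the time covered by its direct children, and look at the largest. The sketch below does that over an in-memory list of spans. The `Span` fields are a simplified, hypothetical shape, not Tempo's storage format or API. The sketch also assumes that a span's children do not overlap each other in time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    name: str
    duration_ms: float


def self_times(spans: list[Span]) -> dict[str, float]:
    """Duration of each span minus the summed durations of its direct children."""
    child_total: dict[str, float] = {s.span_id: 0.0 for s in spans}
    for s in spans:
        if s.parent_id is not None and s.parent_id in child_total:
            child_total[s.parent_id] += s.duration_ms
    # Clamp at 0 in case concurrent children overlap and are overcounted.
    return {s.span_id: max(0.0, s.duration_ms - child_total[s.span_id]) for s in spans}


def slowest_span(spans: list[Span]) -> Span:
    """Return the span that spent the most time in its own work."""
    st = self_times(spans)
    return max(spans, key=lambda s: st[s.span_id])


if __name__ == "__main__":
    trace_spans = [
        Span("1", None, "gateway", "GET /checkout", 900.0),
        Span("2", "1", "cart", "load cart", 100.0),
        Span("3", "1", "payments", "charge card", 700.0),
        Span("4", "3", "payments-db", "INSERT payment", 50.0),
    ]
    hot = slowest_span(trace_spans)
    print(hot.service, hot.name)
```

Looking at raw duration alone would point at the gateway root span. Subtracting child time shows that most of the latency is spent inside the payments service itself.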