OpenTelemetry Collector
The OpenTelemetry Collector is an open source service that receives, processes, and exports telemetry data such as traces, metrics, and logs. In practical terms, it acts as a vendor-neutral pipeline between your applications, infrastructure, and observability backends.
The Collector is part of OpenTelemetry, often called OTel, which standardizes how teams generate and move telemetry data. Instead of sending data directly from every service to a monitoring vendor, you can send it to the Collector first, then control routing, filtering, batching, enrichment, and export behavior in one place.
What the OpenTelemetry Collector does
The OpenTelemetry Collector helps teams manage telemetry data before it reaches tools such as Prometheus, Grafana, Jaeger, Tempo, Datadog, New Relic, Honeycomb, Elasticsearch, or cloud observability services.
Common tasks include:
- Receiving telemetry: Accepts traces, metrics, and logs from applications, agents, SDKs, and infrastructure components.
- Processing data: Adds metadata, drops noisy data, batches records, samples traces, filters attributes, and transforms telemetry.
- Exporting telemetry: Sends data to one or more backends using supported protocols and exporters.
- Reducing vendor lock-in: Keeps instrumentation mostly independent from the observability backend.
- Centralizing control: Lets platform and SRE teams manage telemetry routing and policy through configuration.
How it works
The Collector is configured as one or more pipelines, each handling a single signal type (traces, metrics, or logs). Each pipeline defines how telemetry enters, how it is processed, and where it is sent.
A typical pipeline has three main parts:
- Receivers: Define where telemetry comes from. Examples include OTLP, Prometheus, Jaeger, Zipkin, host metrics, Kubernetes events, and file logs.
- Processors: Modify or control telemetry in transit. Examples include batch, memory limiter, resource detection, attributes, transform, filter, and tail sampling processors.
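- Exporters: Define where telemetry goes. Examples include OTLP, Prometheus remote write, debug (the replacement for the deprecated logging exporter), Elasticsearch, Kafka, and vendor-specific exporters. Jaeger accepts OTLP natively, so recent Collector releases send data to it through the OTLP exporter rather than a dedicated Jaeger exporter.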
The most common protocol used with the Collector is OTLP, the OpenTelemetry Protocol. Applications instrumented with OpenTelemetry SDKs often send telemetry to the Collector over OTLP using gRPC (port 4317 by default) or HTTP (port 4318 by default).
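To make the receiver, processor, and exporter structure concrete, the sketch below mirrors the shape of a minimal Collector configuration as a Python dictionary. The real Collector reads this structure from a YAML file. The backend address, the memory limit value, and the helper `undefined_components` are assumptions made up for this illustration and are not part of the Collector itself.

```python
# A Python dict mirroring the shape of a minimal Collector YAML config.
# Endpoint and limit values are placeholders chosen for this illustration.
config = {
    "receivers": {
        "otlp": {
            "protocols": {
                "grpc": {"endpoint": "0.0.0.0:4317"},
                "http": {"endpoint": "0.0.0.0:4318"},
            }
        }
    },
    "processors": {
        "memory_limiter": {"check_interval": "1s", "limit_mib": 512},
        "batch": {},
    },
    "exporters": {
        "otlp": {"endpoint": "backend.example.com:4317"},  # hypothetical backend
    },
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["otlp"],
            }
        }
    },
}


def undefined_components(cfg):
    """Return (pipeline, section, name) for each pipeline reference with no definition.

    Hypothetical helper: it only illustrates that a pipeline refers to
    components by name and that each name must be defined in its section.
    """
    missing = []
    for pipeline_name, pipeline in cfg["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            for component in pipeline.get(section, []):
                if component not in cfg.get(section, {}):
                    missing.append((pipeline_name, section, component))
    return missing


print(undefined_components(config))
```

The point of the sketch is the indirection: components are defined once at the top level, and the `service.pipelines` section wires them together by name, so the same exporter or processor can be reused across several pipelines.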
Deployment patterns
You can run the OpenTelemetry Collector in different ways depending on your environment and traffic volume.
- Agent mode: Runs close to the workload, often as a DaemonSet in Kubernetes or as a local service on a VM. This pattern is useful for collecting host metrics, container logs, and local application telemetry.
- Gateway mode: Runs as a centralized service that receives telemetry from many applications or agents. This pattern is useful for routing data to several backends, enforcing shared processing rules, and reducing direct outbound connections from workloads.
- Sidecar mode: Runs next to a specific application container. This is less common but can work when a service needs isolated telemetry handling.
Many production setups combine agent and gateway modes. For example, a Kubernetes DaemonSet may collect node-level data and forward it to a gateway Collector that applies sampling, batching, and export rules.
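One detail matters when agents forward to several gateway replicas that make per-trace decisions such as tail sampling: every span of a trace needs to reach the same replica. The sketch below shows the idea with a simple hash of the trace ID. It is a simplified illustration, not the algorithm used by any Collector component; in a real deployment this job is usually handled by a component such as the load-balancing exporter from the contrib distribution. The gateway addresses and the function `pick_gateway` are hypothetical.

```python
import hashlib


def pick_gateway(trace_id: str, gateways: list[str]) -> str:
    """Map a trace ID to one gateway so all spans of that trace land together."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(gateways)
    return gateways[index]


# Hypothetical gateway replica addresses.
gateways = ["gateway-0:4317", "gateway-1:4317", "gateway-2:4317"]

spans = [
    {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "name": "GET /cart"},
    {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "name": "SELECT cart"},
    {"trace_id": "00f067aa0ba902b7a3ce929d0e0e4736", "name": "POST /checkout"},
]

for span in spans:
    print(span["name"], "->", pick_gateway(span["trace_id"], gateways))
```

Plain modulo hashing reshuffles most traces whenever the number of replicas changes, which is one reason production components tend to use consistent hashing instead.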
Common use cases
- Standardizing telemetry pipelines: Send data from many services through one consistent path instead of configuring every service separately.
- Migrating observability tools: Export the same telemetry to a current backend and a new backend during a migration.
- Reducing telemetry cost: Drop low-value attributes, sample traces, filter noisy logs, and batch data before export.
- Adding infrastructure context: Enrich telemetry with Kubernetes namespace, pod name, node name, cloud region, service name, or deployment environment.
- Improving reliability: Buffer, batch, and retry exports in the Collector so applications do not have to handle slow or unavailable backends themselves.
- Routing by environment or service: Send production traces to one backend, staging metrics to another, or security-relevant logs to a separate destination.
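The routing use case can be pictured as an ordered list of rules, where the first matching rule decides the destination. The sketch below is only a model of that idea. In the Collector itself, routing is expressed in configuration, for example through separate pipelines or a routing component, not through Python code. The destination names, the `category` attribute, and the function `route` are assumptions made for this illustration.

```python
def route(record, rules, default):
    """Return the destination of the first rule whose predicate matches the record."""
    for matches, destination in rules:
        if matches(record):
            return destination
    return default


# Hypothetical rules: (predicate, destination name).
rules = [
    (lambda r: r["signal"] == "logs"
        and r["attributes"].get("category") == "security",
     "security-archive"),
    (lambda r: r["signal"] == "traces"
        and r["attributes"].get("deployment.environment") == "production",
     "tracing-backend-prod"),
    (lambda r: r["signal"] == "metrics"
        and r["attributes"].get("deployment.environment") == "staging",
     "metrics-backend-staging"),
]

record = {
    "signal": "traces",
    "attributes": {"deployment.environment": "production", "service.name": "cart"},
}
print(route(record, rules, default="default-backend"))
```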
Simple example
Assume you run 40 microservices on Kubernetes. Each service emits traces through the OpenTelemetry SDK. Instead of configuring each service to send traces directly to your observability vendor, you configure each service to send OTLP data to a Collector service inside the cluster.
The Collector can then:
- Add Kubernetes metadata such as namespace, pod, and container name.
- Batch spans to reduce network overhead.
- Apply tail sampling so only useful traces are exported, such as slow requests or failed requests.
- Send traces to Tempo for engineering teams and selected data to another backend for longer retention.
This keeps application instrumentation simple. Your platform team can change export destinations or sampling rules without redeploying every application.
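The steps in this example can be modeled in a few lines of Python. This is a toy simulation of the logic, assuming spans are plain dictionaries. It does not use any Collector API, and the 500 ms threshold, the sample spans, the metadata values, and all function names are made up for illustration.

```python
from collections import defaultdict

SLOW_MS = 500  # hypothetical latency threshold for "slow" requests


def enrich(span, metadata):
    """Return a copy of the span with metadata merged into its resource attributes."""
    return {**span, "resource": {**span.get("resource", {}), **metadata}}


def group_by_trace(spans):
    """Group spans by trace ID, since tail sampling decides per complete trace."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    return traces


def keep_trace(trace_spans):
    """Keep a trace if any of its spans failed or exceeded the latency threshold."""
    return any(
        s["status"] == "error" or s["duration_ms"] > SLOW_MS for s in trace_spans
    )


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


spans = [
    {"trace_id": "a", "name": "GET /cart", "duration_ms": 40, "status": "ok"},
    {"trace_id": "a", "name": "SELECT cart", "duration_ms": 12, "status": "ok"},
    {"trace_id": "b", "name": "POST /checkout", "duration_ms": 900, "status": "ok"},
    {"trace_id": "c", "name": "GET /item", "duration_ms": 30, "status": "error"},
    {"trace_id": "c", "name": "SELECT item", "duration_ms": 8, "status": "ok"},
]
k8s = {"k8s.namespace.name": "shop", "k8s.pod.name": "cart-7d9f"}

enriched = [enrich(s, k8s) for s in spans]
sampled = [
    s
    for trace in group_by_trace(enriched).values()
    if keep_trace(trace)
    for s in trace
]
export_batches = list(batches(sampled, 2))

print(len(sampled), "spans kept in", len(export_batches), "batches")
# prints "3 spans kept in 2 batches"
```

Trace `a` is dropped because it is fast and successful, while the slow trace `b` and the failed trace `c` are kept whole, including the healthy child span of `c`. Keeping or dropping entire traces is what distinguishes tail sampling from filtering individual spans.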
Key components and related concepts
- OTLP: The OpenTelemetry Protocol used to send telemetry between SDKs, Collectors, and backends.
- OpenTelemetry SDK: Code library used inside applications to generate traces, metrics, and logs.
- Instrumentation: The process of adding telemetry generation to application code, frameworks, runtimes, or libraries.
- Trace sampling: A method for deciding which traces to keep or drop. The Collector supports processors such as probabilistic sampling and tail sampling.
- Resource attributes: Metadata that describes where telemetry came from, such as service.name, deployment.environment, cloud.region, and k8s.namespace.name.
- Collector distribution: A packaged build of the Collector that includes a chosen set of receivers, processors, exporters, and extensions.
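As a rough illustration of the trace sampling entry above, probabilistic sampling can be thought of as mapping each trace ID to a number between 0 and 1 and keeping the trace if that number falls below the configured rate. The sketch below is a simplified model and does not reproduce the hashing scheme of the Collector's probabilistic sampling processor. The function name `keep` and the use of SHA-256 are assumptions for this example.

```python
import hashlib


def keep(trace_id: str, percentage: float) -> bool:
    """Decide deterministically whether to keep a trace at the given sampling rate."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < percentage / 100.0


trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(keep(t, 25.0) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, every span of a trace gets the same answer regardless of which process evaluates it. Unlike tail sampling, the decision cannot take latency or errors into account.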
OpenTelemetry Collector vs. OpenTelemetry SDK
The OpenTelemetry SDK runs inside your application or runtime. It creates telemetry data, such as spans for HTTP requests or metrics for request duration.
The OpenTelemetry Collector runs outside the application. It receives that telemetry, processes it, and sends it to the right destination.
A common setup is:
- An application uses an OpenTelemetry SDK or auto-instrumentation agent.
- The application sends OTLP telemetry to the Collector.
- The Collector enriches, batches, samples, and routes the data.
- The Collector exports the data to one or more observability backends.
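On the application side, pointing an SDK at the Collector is often done through the standard OpenTelemetry environment variables rather than code changes. The sketch below builds such a set of variables. The variable names are defined by the OpenTelemetry specification, while the Collector hostname, the service name, and the helper `sdk_environment` are hypothetical values chosen for this example.

```python
def sdk_environment(service_name, collector_host, environment, use_grpc=True):
    """Build environment variables that point an OpenTelemetry SDK at a Collector."""
    port = 4317 if use_grpc else 4318  # default OTLP ports for gRPC and HTTP
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{collector_host}:{port}",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc" if use_grpc else "http/protobuf",
        "OTEL_RESOURCE_ATTRIBUTES": f"deployment.environment={environment}",
    }


# Hypothetical in-cluster Collector address.
env = sdk_environment("cart", "otel-collector.observability.svc", "production")
for name, value in env.items():
    print(f"{name}={value}")
```

With this split, the application only needs to know where the Collector is. Backend addresses, credentials, and sampling rules stay in the Collector configuration.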
Benefits
- Vendor-neutral pipeline: You can change backends with fewer application-level changes.
- Centralized configuration: Teams can manage telemetry behavior through Collector config rather than scattered service settings.
- Better control over data volume: Sampling, filtering, and batching help control noise and cost.
- Flexible routing: The same telemetry stream can be sent to different destinations based on data type, service, or environment.
- Operational consistency: Platform teams can apply common metadata, limits, and export rules across services.
Tradeoffs and limitations
- It adds another component to operate: You need to deploy, monitor, scale, and upgrade the Collector.
- Bad configuration can drop data: Filters, sampling rules, queue limits, or memory limits can remove telemetry if configured incorrectly.
- It does not replace instrumentation: Applications still need SDKs, agents, or integrations to generate useful telemetry.
- High-cardinality data still needs care: The Collector can filter or transform attributes, but teams should still avoid unbounded labels such as user IDs in metrics.
- Scaling depends on workload: Trace-heavy systems, large log volumes, and complex processors may require multiple Collector replicas and careful resource limits.
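The high-cardinality point is easiest to see with arithmetic. In the worst case, the number of distinct time series for one metric is the product of the number of values each label can take. The label names and counts below are invented for illustration, and the result is an upper bound, not a prediction for any real system.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on distinct time series for one metric: product of label value counts."""
    return prod(label_cardinalities.values())


# Hypothetical label value counts.
bounded = {"service": 40, "http.method": 5, "status_class": 5}
with_user_id = {**bounded, "user.id": 100_000}

print(max_series(bounded))       # prints 1000
print(max_series(with_user_id))  # prints 100000000
```

Adding a single unbounded label multiplies the worst-case series count by the number of users, which is why such attributes are better dropped from metrics or kept on traces and logs instead.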
When to use it
Use the OpenTelemetry Collector when you want a standard telemetry pipeline across services, clusters, and observability tools. It is especially useful for Kubernetes platforms, microservice environments, multi-cloud setups, and teams that need to control telemetry volume before it reaches paid backends.
For a small application with one backend and low telemetry volume, direct export from the SDK may be enough. As the system grows, the Collector usually becomes the cleaner place to manage routing, enrichment, filtering, and sampling.