What Is OpenTelemetry? Observability Standard Explained Simply
OpenTelemetry (OTel) is the open standard for collecting traces, metrics, and logs. Learn what it is, why it matters, and how to start using it.
Every time you look at a Grafana dashboard or chase a slow API request across 10 microservices, you're depending on observability data. OpenTelemetry is the standard that collects that data — without locking you into any vendor.
The Problem Before OpenTelemetry
Before OTel, each signal had its own tooling: the Jaeger client SDK for traces, a Prometheus client library for metrics, maybe Fluentd for logs. Every service ended up carrying several SDKs, several configs, and several agents.
If you switched from Jaeger to Zipkin, you rewrote all your instrumentation code.
OpenTelemetry solves this by being a single, vendor-neutral SDK that collects traces, metrics, and logs, and exports to any backend you choose.
What Is OpenTelemetry?
OpenTelemetry (OTel) is:
- A standard — defines how observability data is collected and formatted
- An SDK — libraries for every major language to instrument your code
- A Collector — an agent/proxy that receives, processes, and exports telemetry
- A protocol — OTLP (OpenTelemetry Protocol) for sending data
It's a CNCF project, hosted by the same foundation as Kubernetes, Prometheus, and Helm.
The Three Pillars
Traces
A trace follows a request as it moves through your system.
User request → API Gateway → Auth Service → User Service → DB
     |              |             |              |         |
   Span 1         Span 2        Span 3         Span 4    Span 5
 50ms total        5ms           15ms           20ms      8ms
Each step is a span. All spans with the same trace ID form a trace. This is how you find which service is slow.
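To make the span and trace relationship concrete, here is a minimal sketch in plain Python. It is a toy model, not the OTel SDK: the `Span` class, the `is_root` flag, and the trace ID `abc123` are made up for illustration, and the durations come from the diagram above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str          # shared by every span that belongs to one request
    name: str              # the service or operation this span covers
    duration_ms: float
    is_root: bool = False  # the root span covers the whole request


def group_by_trace(spans):
    """Group spans into traces, keyed by trace ID."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def slowest_child(trace):
    """Return the non-root span with the longest duration."""
    children = [s for s in trace if not s.is_root]
    return max(children, key=lambda s: s.duration_ms)


# Durations from the diagram; "abc123" is a made-up trace ID.
spans = [
    Span("abc123", "User request", 50, is_root=True),
    Span("abc123", "API Gateway", 5),
    Span("abc123", "Auth Service", 15),
    Span("abc123", "User Service", 20),
    Span("abc123", "DB", 8),
]

traces = group_by_trace(spans)
print(slowest_child(traces["abc123"]).name)  # User Service
```

A tracing backend does the same grouping for you and draws the result as a waterfall, but the idea is identical: same trace ID, compare span durations.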
Metrics
Numerical measurements over time: CPU usage, request count, error rate, latency histogram.
http_requests_total{method="GET", status="200"} 1523
http_request_duration_seconds{quantile="0.99"} 0.234
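Both kinds of measurement can be sketched in a few lines of standard-library Python. This is an illustration of the concepts only, not the OTel metrics API: `record_request` and `percentile` are hypothetical helpers, and the latency samples are invented.

```python
from collections import Counter

# Counter keyed by (metric name, label pairs), similar in spirit to a
# labeled counter instrument.
request_counts = Counter()


def record_request(method: str, status: str) -> None:
    """Increment the request counter for one label combination."""
    key = ("http_requests_total", (("method", method), ("status", status)))
    request_counts[key] += 1


def percentile(samples, pct: int):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


for _ in range(3):
    record_request("GET", "200")
record_request("GET", "500")

# 100 invented latencies: 0.001s, 0.002s, ..., 0.100s
latencies = [i / 1000 for i in range(1, 101)]
p99 = percentile(latencies, 99)

print(request_counts)
print(p99)
```

Real SDKs usually record latencies into histogram buckets rather than keeping every sample, which is cheaper but makes percentiles approximate.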
Logs
Text records of events. OTel can attach trace IDs to logs so you can correlate "this log line came from this trace."
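The correlation idea can be sketched with nothing but the standard `logging` module. This is a simplified stand-in for what OTel's logging integration does: the `current_trace_id` context variable and `TraceIdFilter` class are hypothetical names, and the trace ID is an example value.

```python
import contextvars
import io
import logging

# Holds the trace ID of the request currently being handled.
# In a real app the OTel SDK tracks this; here it is a plain context variable.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only annotate it


stream = io.StringIO()  # capture output so it is easy to inspect
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("otel_log_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")  # example trace ID
logger.info("user lookup failed")
print(stream.getvalue().strip())
```

Once every log line carries a trace ID, a log backend can jump from an error message straight to the trace that produced it.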
How It Works
Your App (with OTel SDK)
↓
OpenTelemetry Collector
↓
Backends:
- Traces → Jaeger / Tempo / Datadog
- Metrics → Prometheus / Datadog / New Relic
- Logs → Loki / Elasticsearch / Datadog
The Collector is the key piece — it decouples your app from backends. Change backend? Update Collector config, not app code.
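Here is a toy model of that decoupling, assuming nothing about the real Collector's internals. The `Collector` class, the `ROUTES` table, and `handle_request` are all hypothetical; the point is only that the application function never names a backend.

```python
from collections import defaultdict

# Hypothetical stand-in for the Collector's pipeline config:
# signal type -> list of backend names.
ROUTES = {
    "traces": ["tempo"],
    "metrics": ["prometheus"],
    "logs": ["loki"],
}


class Collector:
    """Toy collector: receives telemetry and fans it out per the routing table."""

    def __init__(self, routes):
        self.routes = routes
        self.delivered = defaultdict(list)  # backend name -> items received

    def receive(self, signal: str, item: dict) -> None:
        for backend in self.routes.get(signal, []):
            self.delivered[backend].append(item)


def handle_request(collector: Collector) -> None:
    """Application code: it only knows about the collector, never a backend."""
    collector.receive("traces", {"span": "GET /users/42"})


before = Collector(ROUTES)
handle_request(before)

# Switching the trace backend is a config change; handle_request is untouched.
after = Collector({**ROUTES, "traces": ["jaeger"]})
handle_request(after)

print(dict(before.delivered))
print(dict(after.delivered))
```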
Quick Start: Instrument a Python App
pip install fastapi \
    opentelemetry-sdk \
    opentelemetry-instrumentation-fastapi \
    opentelemetry-exporter-otlp

# main.py
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Name the service so backends can tell it apart ("user-api" is a placeholder)
resource = Resource.create({"service.name": "user-api"})

# Set up tracing: batch spans and send them to the Collector over OTLP/gRPC
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://otel-collector:4317")
)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)  # Auto-instruments all routes

tracer = trace.get_tracer(__name__)

@app.get("/users/{user_id}")
def get_user(user_id: str):
    # Manual child span for the part of the request you care about
    with tracer.start_as_current_span("get-user-from-db") as span:
        span.set_attribute("user.id", user_id)
        # ... your code
        return {"id": user_id}

FastAPIInstrumentor automatically creates a span for every request, so you don't have to wrap each endpoint manually.
The OpenTelemetry Collector
The Collector runs as a sidecar or standalone deployment:
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
  resource:
    attributes:
      - key: service.environment
        value: production
        action: insert

exporters:
  # Recent Collector releases no longer ship a dedicated jaeger exporter;
  # Jaeger accepts OTLP directly on port 4317.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
  # Loki 3.x ingests OTLP natively, replacing the deprecated loki exporter.
  otlphttp/loki:
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]

One Collector config routes all telemetry to the right backends.
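The batch processor in the config above is worth a closer look, because batching is what keeps export traffic manageable. The sketch below shows the general idea (flush when the batch is full or when the oldest item has waited too long). It is not the Collector's implementation; `BatchProcessor`, `max_batch_size`, and the injected `clock` are hypothetical names chosen for this example.

```python
import time


class BatchProcessor:
    """Buffer items and flush when the batch is full or the timeout elapses."""

    def __init__(self, export, max_batch_size=3, timeout_s=1.0, clock=time.monotonic):
        self._export = export            # callable that receives a list of items
        self._max_batch_size = max_batch_size
        self._timeout_s = timeout_s
        self._clock = clock              # injectable so the demo needs no sleeping
        self._buffer = []
        self._oldest_at = None           # arrival time of the oldest buffered item

    def add(self, item):
        if not self._buffer:
            self._oldest_at = self._clock()
        self._buffer.append(item)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()                 # size trigger

    def tick(self):
        """Call periodically; flushes once the oldest item has waited timeout_s."""
        if self._buffer and self._clock() - self._oldest_at >= self._timeout_s:
            self.flush()                 # timeout trigger

    def flush(self):
        if self._buffer:
            self._export(list(self._buffer))
            self._buffer.clear()


now = [0.0]                              # fake clock, advanced by hand
exported = []
processor = BatchProcessor(exported.append, clock=lambda: now[0])

for item in ("a", "b", "c"):
    processor.add(item)                  # third item fills the batch
processor.add("d")                       # sits in the buffer
now[0] = 1.5                             # 1.5s later, past the 1s timeout
processor.tick()

print(exported)  # [['a', 'b', 'c'], ['d']]
```

The timeout bounds how stale telemetry can get during quiet periods, while the size limit bounds memory during bursts.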
Deploy Collector on Kubernetes
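apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:latest  # pin a version in production
          # Assumes the ConfigMap key is otel-collector-config.yaml
          args: ["--config=/etc/otelcol/otel-collector-config.yaml"]
          ports:
            - containerPort: 4317  # OTLP gRPC
            - containerPort: 4318  # OTLP HTTP
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol
      volumes:
        - name: config
          configMap:
            name: otel-collector-config

A DaemonSet runs one collector per node, so pods can send data to a collector on their own node instead of crossing the network to another one. To keep that traffic node-local, expose the ports with hostPort or front the DaemonSet with a Service that sets internalTrafficPolicy: Local.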
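On the application side, each pod needs to know where its node-local collector lives. One possible approach, sketched below, is to resolve the endpoint from environment variables. `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OTel variable; `NODE_IP` is a hypothetical variable name that you would populate yourself (for example from the pod's `status.hostIP` via the Kubernetes Downward API), and the localhost fallback is an assumption for local development.

```python
import os


def resolve_otlp_endpoint(env=None) -> str:
    """Pick the OTLP gRPC endpoint for this process.

    Order of preference:
    1. OTEL_EXPORTER_OTLP_ENDPOINT, the standard OTel variable
    2. NODE_IP (hypothetical name), pointing at the node-local collector
    3. localhost, as a fallback for local development
    """
    env = os.environ if env is None else env

    explicit = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if explicit:
        return explicit

    node_ip = env.get("NODE_IP")
    if node_ip:
        return f"http://{node_ip}:4317"

    return "http://localhost:4317"


print(resolve_otlp_endpoint({"NODE_IP": "10.0.0.5"}))
```

The returned string can be passed as the `endpoint` argument of the exporter, so the same image works on every node without rebuilding.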
Auto-Instrumentation on Kubernetes
The OTel Operator can auto-instrument pods without code changes:
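# Install the operator (requires cert-manager in the cluster)
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Annotate the pod template to enable auto-instrumentation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"  # or inject-java, inject-nodejs, inject-dotnet

The operator uses an init container to inject the OTel auto-instrumentation libraries, so your app gets tracing without any code changes. The annotation must sit on the pod template, not on the Deployment's own metadata, and the namespace needs an Instrumentation resource that tells the operator where to send the data.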
OTel vs Prometheus
| | OpenTelemetry | Prometheus |
|---|---|---|
| Data types | Traces + Metrics + Logs | Metrics only |
| Collection | Push (OTLP) | Pull (scrape) |
| Language SDKs | All major languages | Client libraries |
| Backend | Any (via exporters) | Prometheus + Grafana |
| Adoption | Growing fast | Established |
They're complementary. Many teams use Prometheus for metrics but OTel for traces and logs. OTel can also export metrics in Prometheus format.
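To show what "Prometheus format" means in practice, here is a rough sketch that renders a counter in the text form Prometheus scrapes. `render_sample` and `render_counter` are hypothetical helpers, and the sketch skips details a real exporter handles, such as escaping special characters in label values.

```python
def render_sample(name: str, labels: dict, value) -> str:
    """Render one series as a line of Prometheus text format."""
    label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    series = f"{name}{{{label_text}}}" if label_text else name
    return f"{series} {value}"


def render_counter(name: str, series) -> str:
    """Render a counter with a TYPE line followed by one line per label set."""
    lines = [f"# TYPE {name} counter"]
    for labels, value in series:
        lines.append(render_sample(name, labels, value))
    return "\n".join(lines) + "\n"


output = render_counter(
    "http_requests_total",
    [
        ({"method": "GET", "status": "200"}, 1523),
        ({"method": "GET", "status": "500"}, 7),  # invented value
    ],
)
print(output)
```

This is why the two tools coexist easily: an app can push OTLP to the Collector, and the Collector can expose the same metrics on an endpoint for Prometheus to pull.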
Summary
OTel SDK → instruments your code (traces, metrics, logs)
OTel Collector → receives, processes, exports
OTLP → the wire protocol between them
Backends → where data lives (Jaeger, Tempo, Prometheus, Loki)
OpenTelemetry is becoming the standard way to instrument cloud-native apps. If you're setting up a new service in 2026, start with OTel — you'll be able to switch backends without touching your app code.
KodeKloud Observability Course — covers Prometheus, Grafana, OpenTelemetry, and distributed tracing with hands-on labs.
Grafana Cloud — free tier that accepts OTLP data natively. The fastest way to see your traces and metrics without running your own backends.