Grafana Loki: The Complete Log Aggregation Guide for DevOps Engineers (2026)
Grafana Loki is the Prometheus-inspired log aggregation system built for Kubernetes. This guide covers architecture, installation, LogQL queries, and production best practices.
If you're already using Prometheus for metrics and Grafana for dashboards, Loki is the missing piece of your observability stack.
Loki is Grafana's log aggregation system — designed to work exactly like Prometheus but for logs. Instead of scraping metrics, it collects log streams and indexes only their labels (not their content), which makes it dramatically cheaper to run than Elasticsearch.
This guide covers everything: what Loki is, how it works, how to deploy it in Kubernetes, and how to query logs effectively with LogQL.
What Is Grafana Loki?
Loki was built at Grafana Labs in 2018 with a specific design philosophy: index the labels, not the log content.
Traditional log aggregation systems like Elasticsearch index every word in every log line. That makes search fast but storage expensive — often 10–20x the raw log volume.
Loki only indexes metadata labels (like app=nginx, namespace=production). The actual log content is stored compressed and queried on demand. This makes Loki:
- 10x cheaper to run than Elasticsearch at scale
- Native to Kubernetes — Pod labels become log labels automatically
- Unified with Grafana — switch between metrics and logs in the same dashboard
- Prometheus-compatible — same label model, familiar query language
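To make the label-vs-content split concrete, here is a hypothetical log entry as Loki stores it. Only the small label set is indexed; the line itself sits in a compressed chunk that is scanned at query time:

```
# Indexed (tiny): the stream's label set
{app="nginx", namespace="production", pod="nginx-7d4b9"}

# Not indexed (compressed, scanned only when a query selects this stream):
2026-01-15T10:32:01Z GET /api/users 500 12ms
```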
Loki Architecture
Understanding Loki's components helps you deploy and debug it effectively.
```
┌──────────┐    ┌─────────────┐    ┌────────────────┐
│ Promtail │───▶│ Distributor │───▶│    Ingester    │
│ (agent)  │    │             │    │ (write buffer) │
└──────────┘    └─────────────┘    └────────┬───────┘
                                            │
┌──────────┐    ┌─────────────┐    ┌────────▼───────┐
│ Grafana  │◀───│   Querier   │◀───│  Object Store  │
│          │    │             │    │ (S3/GCS/Azure) │
└──────────┘    └─────────────┘    └────────────────┘
```
Key components:
| Component | Role |
|---|---|
| Promtail | Agent that runs on each node, tails log files, and pushes to Loki |
| Distributor | Receives log streams, validates, and routes to ingesters |
| Ingester | Buffers logs in memory and flushes to object storage |
| Querier | Executes LogQL queries against object storage |
| Object Store | S3, GCS, Azure Blob, or local filesystem for log chunks |
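The distributor's write path is plain HTTP, which makes the stream model easy to see by hand. A sketch that builds a valid push payload locally — the endpoint URL and the `app="demo"` labels are assumptions for illustration, and the actual `curl` is left commented so the snippet runs without a cluster:

```shell
# Build a Loki push payload by hand: the "stream" object holds the label set,
# and "values" holds [timestamp_ns, log_line] pairs for that stream.
TS="$(date +%s)000000000"   # Loki's push API expects nanosecond epoch timestamps
PAYLOAD=$(printf '{"streams":[{"stream":{"app":"demo","namespace":"default"},"values":[["%s","hello from curl"]]}]}' "$TS")
echo "$PAYLOAD"

# Against a real Loki (port-forward or in-cluster), you would send it with:
# curl -s -H "Content-Type: application/json" \
#      -X POST http://localhost:3100/loki/api/v1/push -d "$PAYLOAD"
```

Every agent (Promtail included) is ultimately doing a batched version of this same request.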
Installing Loki with Helm
The recommended way to run Loki in Kubernetes is with the official Helm chart.
Add the Grafana Helm repository
```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```

Create a values file for Loki
```yaml
# loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem   # use s3 in production

# Single binary mode (good for small clusters)
deploymentMode: SingleBinary

singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1
      memory: 1Gi

# Disable components not needed in single binary mode
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
```

Install Loki
```shell
helm install loki grafana/loki \
  --namespace monitoring \
  --create-namespace \
  -f loki-values.yaml
```

Installing Promtail (Log Collector)
Promtail runs as a DaemonSet on every node, tailing pod logs and shipping them to Loki.
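Under the hood, Promtail discovers pods through the Kubernetes API and turns pod metadata into Loki stream labels via relabeling. The Helm chart generates this configuration for you; the following is only a simplified sketch of the shape of such a scrape config, not the chart's exact output:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod   # discover pods via the Kubernetes API
    relabel_configs:
      # promote the pod's "app" label to the Loki stream label "app"
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      # promote the namespace to a stream label
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
```

This is why the LogQL examples later in this guide can select on `app` and `namespace` without any per-application setup.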
```yaml
# promtail-values.yaml
config:
  clients:
    - url: http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push
  snippets:
    pipelineStages:
      - cri: {}   # parse CRI log format (Kubernetes default)
```

```shell
helm install promtail grafana/promtail \
  --namespace monitoring \
  -f promtail-values.yaml
```

Verify Promtail is running on all nodes:
```shell
kubectl get daemonset promtail -n monitoring
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail --tail=20
```

Connecting Loki to Grafana
Add Loki as a data source in Grafana:
- Open Grafana → Configuration → Data Sources → Add data source
- Select Loki
- Set URL: http://loki.monitoring.svc.cluster.local:3100
- Click Save & Test
Or use a ConfigMap to provision it automatically:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  loki.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.monitoring.svc.cluster.local:3100
        isDefault: false
        jsonData:
          maxLines: 1000
```

LogQL: Querying Logs
LogQL is Loki's query language — it's similar to PromQL but designed for log streams.
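Every LogQL query follows the same pipeline shape: a stream selector, then optional line filters, parsers, and label filters, optionally wrapped in a metric function. A hypothetical query annotated stage by stage (the `app="api"` labels and field names are illustrative):

```logql
sum by (pod) (                # aggregate per-stream rates by pod
  rate(                       # per-second rate of matching lines
    {app="api"}               # stream selector: hits the label index
      |= "error"              # line filter: scans chunk contents
      | json                  # parser: extracts JSON fields as labels
      | status >= 500         # label filter on a parsed field
    [5m]                      # range window
  )
)
```

The sections below build up each of these stages in turn.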
Basic log stream selector
```logql
# All logs from the nginx app in production
{app="nginx", namespace="production"}

# All logs from any pod with the error label
{level="error"}

# Logs from a specific container
{container="api-server", namespace="default"}
```

Filter expressions
```logql
# Lines containing "error"
{app="nginx"} |= "error"

# Lines NOT containing "health"
{app="api"} != "/health"

# Lines matching a regex
{app="backend"} |~ "status=5[0-9]{2}"

# Multiple filters chained
{app="nginx", namespace="production"} |= "error" != "404"
```

Parsing structured logs (JSON)
Most modern apps log in JSON. Loki can parse it:
```logql
# Parse JSON and filter by field
{app="api"} | json | level="error"

# Extract specific fields
{app="api"} | json | line_format "{{.level}}: {{.message}}"

# Filter by parsed field
{app="backend"} | json | status_code >= 500
```

Log rate queries (metrics from logs)
```logql
# Per-second rate of error lines, averaged over the last minute
rate({app="api"} |= "error" [1m])

# Count of 5xx errors per pod over the last 5 minutes
sum by (pod) (
  count_over_time({app="nginx"} | json | status >= 500 [5m])
)
```

Useful Loki Dashboards in Grafana
Error rate panel
```logql
sum(rate({namespace="production"} |= "error" [5m])) by (app)
```

Use this as a time-series panel to see which app generates the most errors.
Log volume by namespace
```logql
sum(rate({namespace=~".+"} [5m])) by (namespace)
```

Recent errors (table panel)
```logql
{namespace="production"} |= "error" | json | line_format "{{.timestamp}} [{{.app}}] {{.message}}"
```

Production Configuration: Using S3 as Storage
For production, use object storage instead of local filesystem:
```yaml
# loki-production-values.yaml
loki:
  auth_enabled: true
  storage:
    type: s3
    s3:
      region: us-east-1
      bucketnames: my-loki-logs
      s3forcepathstyle: false
  storageConfig:
    aws:
      s3: s3://my-loki-logs
      region: us-east-1
    boltdb_shipper:
      active_index_directory: /loki/index
      shared_store: s3
      cache_location: /loki/cache
    filesystem:
      directory: /loki/chunks

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/loki-s3-role
```

Create the IAM policy for Loki to write to S3:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-loki-logs",
        "arn:aws:s3:::my-loki-logs/*"
      ]
    }
  ]
}
```

Log Retention Policy
Set retention so old logs are automatically deleted:
```yaml
loki:
  limits_config:
    retention_period: 30d   # delete logs older than 30 days
  compactor:
    retention_enabled: true
    working_directory: /loki/compactor
```

Alerting on Logs with Loki
Loki's ruler component evaluates alerting rules written in LogQL, using the same rule format as Prometheus:
```yaml
# Create a PrometheusRule for Loki alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-alerts
  namespace: monitoring
spec:
  groups:
    - name: loki-log-alerts
      interval: 1m
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate({namespace="production"} |= "error" [5m])) > 10
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High error rate detected in production logs"
            description: "More than 10 errors/sec for 2 minutes"
        - alert: CriticalException
          expr: |
            count_over_time({namespace="production"} |= "CRITICAL" [1m]) > 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "Critical exception in production"
```

Loki vs Elasticsearch: When to Choose What
| Factor | Loki | Elasticsearch |
|---|---|---|
| Cost | Low (label index only) | High (full-text index) |
| Search flexibility | Label + filter based | Full-text search |
| Setup complexity | Simple | Complex |
| Kubernetes-native | Yes | No (needs config) |
| Full-text search | No | Yes |
| Good for | Kubernetes logs, structured logs | Audit logs, compliance, complex search |
Use Loki when: You're on Kubernetes, your logs are structured (JSON), and cost matters.
Use Elasticsearch when: You need full-text search, compliance/audit log retention, or complex aggregations across unstructured data.
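One nuance worth noting: "no full-text search" doesn't mean Loki can't search log contents at all. It can grep any line with a filter expression; it just performs the scan at query time instead of maintaining a full-text index, so you should always bound the search with labels and a time range. A sketch (the request ID `req-4f2a` is a made-up example):

```logql
# Brute-force content search: scans every production stream's chunks
# in the selected time range, so keep the selector and range tight.
{namespace="production"} |~ "req-4f2a"
```

For occasional needle-in-haystack lookups this is fine; if such searches dominate your workload, that's the signal Elasticsearch fits better.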
Learn More
Want to go deeper on observability — Prometheus, Grafana, Loki, OpenTelemetry, and production monitoring — KodeKloud's hands-on DevOps courses walk you through real setups with actual Kubernetes clusters. No toy examples.
If you're deploying Loki on a cloud VPS or managed Kubernetes, DigitalOcean's managed Kubernetes is one of the most cost-effective ways to get started — clean UI, simple scaling, and great docs.
Summary
Grafana Loki gives you Kubernetes-native log aggregation at a fraction of the cost of Elasticsearch. With Promtail collecting logs automatically from every pod, LogQL for powerful queries, and native Grafana integration, it completes your observability stack alongside Prometheus and Tempo.
Start with single-binary mode for small clusters, use S3 for production storage, and set a 30-day retention policy. You'll have production-grade logging running within an hour.