Grafana Loki: The Complete Log Aggregation Guide for DevOps Engineers (2026)
Grafana Loki is the Prometheus-inspired log aggregation system built for Kubernetes. This guide covers architecture, installation, LogQL queries, and production best practices.
If you're already using Prometheus for metrics and Grafana for dashboards, Loki is the missing piece of your observability stack.
Loki is Grafana's log aggregation system — designed to work exactly like Prometheus but for logs. Instead of scraping metrics, it collects log streams and indexes only their labels (not their content), which makes it dramatically cheaper to run than Elasticsearch.
This guide covers everything: what Loki is, how it works, how to deploy it in Kubernetes, and how to query logs effectively with LogQL.
What Is Grafana Loki?
Loki was built at Grafana Labs in 2018 with a specific design philosophy: index the labels, not the log content.
Traditional log aggregation systems like Elasticsearch index every word in every log line. That makes search fast but storage expensive — often 10–20x the raw log volume.
Loki only indexes metadata labels (like app=nginx, namespace=production). The actual log content is stored compressed and queried on demand. This makes Loki:
- 10x cheaper to run than Elasticsearch at scale
- Native to Kubernetes — Pod labels become log labels automatically
- Unified with Grafana — switch between metrics and logs in the same dashboard
- Prometheus-compatible — same label model, familiar query language
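To make the label-vs-content split concrete, here is a hypothetical log entry as Loki stores it. Only the small label set is indexed; the line itself sits in a compressed chunk that is scanned at query time:

```
# Indexed (tiny): the stream's label set
{app="nginx", namespace="production", pod="nginx-7d4b9"}

# Not indexed (compressed, scanned only when a query selects this stream):
2026-01-15T10:32:01Z GET /api/users 500 12ms
```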
Loki Architecture
Understanding Loki's components helps you deploy and debug it effectively.
```
┌──────────┐    ┌─────────────┐    ┌────────────────┐
│ Promtail │───▶│ Distributor │───▶│    Ingester    │
│ (agent)  │    │             │    │ (write buffer) │
└──────────┘    └─────────────┘    └────────┬───────┘
                                            │
┌──────────┐    ┌─────────────┐    ┌────────▼───────┐
│ Grafana  │◀───│   Querier   │◀───│  Object Store  │
│          │    │             │    │ (S3/GCS/Azure) │
└──────────┘    └─────────────┘    └────────────────┘
```
Key components:
| Component | Role |
|---|---|
| Promtail | Agent that runs on each node, tails log files, and pushes to Loki |
| Distributor | Receives log streams, validates, and routes to ingesters |
| Ingester | Buffers logs in memory and flushes to object storage |
| Querier | Executes LogQL queries against object storage |
| Object Store | S3, GCS, Azure Blob, or local filesystem for log chunks |
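The distributor's write path is plain HTTP, which makes the stream model easy to see by hand. A sketch that builds a valid push payload locally — the endpoint URL and the `app="demo"` labels are assumptions for illustration, and the actual `curl` is left commented so the snippet runs without a cluster:

```shell
# Build a Loki push payload by hand: the "stream" object holds the label set,
# and "values" holds [timestamp_ns, log_line] pairs for that stream.
TS="$(date +%s)000000000"   # Loki's push API expects nanosecond epoch timestamps
PAYLOAD=$(printf '{"streams":[{"stream":{"app":"demo","namespace":"default"},"values":[["%s","hello from curl"]]}]}' "$TS")
echo "$PAYLOAD"

# Against a real Loki (port-forward or in-cluster), you would send it with:
# curl -s -H "Content-Type: application/json" \
#      -X POST http://localhost:3100/loki/api/v1/push -d "$PAYLOAD"
```

Every agent (Promtail included) is ultimately doing a batched version of this same request.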
Installing Loki with Helm
The recommended way to run Loki in Kubernetes is with the official Helm chart.
Add the Grafana Helm repository
```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```

Create a values file for Loki
```yaml
# loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem   # use s3 in production

# Single binary mode (good for small clusters)
deploymentMode: SingleBinary

singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1
      memory: 1Gi

# Disable components not needed in single binary mode
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
```

Install Loki
```shell
helm install loki grafana/loki \
  --namespace monitoring \
  --create-namespace \
  -f loki-values.yaml
```

Installing Promtail (Log Collector)
Promtail runs as a DaemonSet on every node, tailing pod logs and shipping them to Loki.
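Under the hood, Promtail discovers pods through the Kubernetes API and turns pod metadata into Loki stream labels via relabeling. The Helm chart generates this configuration for you; the following is only a simplified sketch of the shape of such a scrape config, not the chart's exact output:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod   # discover pods via the Kubernetes API
    relabel_configs:
      # promote the pod's "app" label to the Loki stream label "app"
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      # promote the namespace to a stream label
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
```

This is why the LogQL examples later in this guide can select on `app` and `namespace` without any per-application setup.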
```yaml
# promtail-values.yaml
config:
  clients:
    - url: http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push
  snippets:
    pipelineStages:
      - cri: {}   # parse CRI log format (Kubernetes default)
```

```shell
helm install promtail grafana/promtail \
  --namespace monitoring \
  -f promtail-values.yaml
```

Verify Promtail is running on all nodes:
```shell
kubectl get daemonset promtail -n monitoring
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail --tail=20
```

Connecting Loki to Grafana
Add Loki as a data source in Grafana:
- Open Grafana → Configuration → Data Sources → Add data source
- Select Loki
- Set URL: http://loki.monitoring.svc.cluster.local:3100
- Click Save & Test
Or use a ConfigMap to provision it automatically:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  loki.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.monitoring.svc.cluster.local:3100
        isDefault: false
        jsonData:
          maxLines: 1000
```

LogQL: Querying Logs
LogQL is Loki's query language — it's similar to PromQL but designed for log streams.
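Every LogQL query follows the same pipeline shape: a stream selector, then optional line filters, parsers, and label filters, optionally wrapped in a metric function. A hypothetical query annotated stage by stage (the `app="api"` labels and field names are illustrative):

```logql
sum by (pod) (                # aggregate per-stream rates by pod
  rate(                       # per-second rate of matching lines
    {app="api"}               # stream selector: hits the label index
      |= "error"              # line filter: scans chunk contents
      | json                  # parser: extracts JSON fields as labels
      | status >= 500         # label filter on a parsed field
    [5m]                      # range window
  )
)
```

The sections below build up each of these stages in turn.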
Basic log stream selector
```logql
# All logs from the nginx app in production
{app="nginx", namespace="production"}

# All logs from any pod with the error label
{level="error"}

# Logs from a specific container
{container="api-server", namespace="default"}
```

Filter expressions
```logql
# Lines containing "error"
{app="nginx"} |= "error"

# Lines NOT containing "health"
{app="api"} != "/health"

# Lines matching a regex
{app="backend"} |~ "status=5[0-9]{2}"

# Multiple filters chained
{app="nginx", namespace="production"} |= "error" != "404"
```

Parsing structured logs (JSON)
Most modern apps log in JSON. Loki can parse it:
```logql
# Parse JSON and filter by field
{app="api"} | json | level="error"

# Extract specific fields
{app="api"} | json | line_format "{{.level}}: {{.message}}"

# Filter by parsed field
{app="backend"} | json | status_code >= 500
```

Log rate queries (metrics from logs)
```logql
# Per-second rate of error lines, averaged over the last minute
rate({app="api"} |= "error" [1m])

# Count of 5xx errors per pod over the last 5 minutes
sum by (pod) (
  count_over_time({app="nginx"} | json | status >= 500 [5m])
)
```

Useful Loki Dashboards in Grafana
Error rate panel
```logql
sum(rate({namespace="production"} |= "error" [5m])) by (app)
```

Use this as a time-series panel to see which app generates the most errors.
Log volume by namespace
```logql
sum(rate({namespace=~".+"} [5m])) by (namespace)
```

Recent errors (table panel)
```logql
{namespace="production"} |= "error" | json | line_format "{{.timestamp}} [{{.app}}] {{.message}}"
```

Production Configuration: Using S3 as Storage
For production, use object storage instead of local filesystem:
```yaml
# loki-production-values.yaml
loki:
  auth_enabled: true
  storage:
    type: s3
    s3:
      region: us-east-1
      bucketnames: my-loki-logs
      s3forcepathstyle: false
  storageConfig:
    aws:
      s3: s3://my-loki-logs
      region: us-east-1
    boltdb_shipper:
      active_index_directory: /loki/index
      shared_store: s3
      cache_location: /loki/cache
    filesystem:
      directory: /loki/chunks

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/loki-s3-role
```

Create the IAM policy for Loki to write to S3:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-loki-logs",
        "arn:aws:s3:::my-loki-logs/*"
      ]
    }
  ]
}
```

Log Retention Policy
Set retention so old logs are automatically deleted:
```yaml
loki:
  limits_config:
    retention_period: 30d   # delete logs older than 30 days
  compactor:
    retention_enabled: true
    working_directory: /loki/compactor
```

Alerting on Logs with Loki
Loki's ruler component evaluates alerting rules written in LogQL, using the same rule format as Prometheus:
```yaml
# Create a PrometheusRule for Loki alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-alerts
  namespace: monitoring
spec:
  groups:
    - name: loki-log-alerts
      interval: 1m
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate({namespace="production"} |= "error" [5m])) > 10
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High error rate detected in production logs"
            description: "More than 10 errors/sec for 2 minutes"
        - alert: CriticalException
          expr: |
            count_over_time({namespace="production"} |= "CRITICAL" [1m]) > 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "Critical exception in production"
```

Loki vs Elasticsearch: When to Choose What
| Factor | Loki | Elasticsearch |
|---|---|---|
| Cost | Low (label index only) | High (full-text index) |
| Search flexibility | Label + filter based | Full-text search |
| Setup complexity | Simple | Complex |
| Kubernetes-native | Yes | No (needs config) |
| Full-text search | No | Yes |
| Good for | Kubernetes logs, structured logs | Audit logs, compliance, complex search |
Use Loki when: You're on Kubernetes, your logs are structured (JSON), and cost matters.
Use Elasticsearch when: You need full-text search, compliance/audit log retention, or complex aggregations across unstructured data.
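One nuance worth noting: "no full-text search" doesn't mean Loki can't search log contents at all. It can grep any line with a filter expression; it just performs the scan at query time instead of maintaining a full-text index, so you should always bound the search with labels and a time range. A sketch (the request ID `req-4f2a` is a made-up example):

```logql
# Brute-force content search: scans every production stream's chunks
# in the selected time range, so keep the selector and range tight.
{namespace="production"} |~ "req-4f2a"
```

For occasional needle-in-haystack lookups this is fine; if such searches dominate your workload, that's the signal Elasticsearch fits better.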
Learn More
Want to go deeper on observability — Prometheus, Grafana, Loki, OpenTelemetry, and production monitoring — KodeKloud's hands-on DevOps courses walk you through real setups with actual Kubernetes clusters. No toy examples.
If you're deploying Loki on a cloud VPS or managed Kubernetes, DigitalOcean's managed Kubernetes is one of the most cost-effective ways to get started — clean UI, simple scaling, and great docs.
Summary
Grafana Loki gives you Kubernetes-native log aggregation at a fraction of the cost of Elasticsearch. With Promtail collecting logs automatically from every pod, LogQL for powerful queries, and native Grafana integration, it completes your observability stack alongside Prometheus and Tempo.
Start with single-binary mode for small clusters, use S3 for production storage, and set a 30-day retention policy. You'll have production-grade logging running within an hour.