
Grafana Loki: The Complete Log Aggregation Guide for DevOps Engineers (2026)

Grafana Loki is the Prometheus-inspired log aggregation system built for Kubernetes. This guide covers architecture, installation, LogQL queries, and production best practices.

DevOpsBoys · Mar 14, 2026 · 6 min read

If you're already using Prometheus for metrics and Grafana for dashboards, Loki is the missing piece of your observability stack.

Loki is Grafana's log aggregation system — designed to work exactly like Prometheus but for logs. Instead of scraping metrics, it collects log streams and indexes only their labels (not their content), which makes it dramatically cheaper to run than Elasticsearch.

This guide covers everything: what Loki is, how it works, how to deploy it in Kubernetes, and how to query logs effectively with LogQL.


What Is Grafana Loki?

Loki was built at Grafana Labs in 2018 with a specific design philosophy: index the labels, not the log content.

Traditional log aggregation systems like Elasticsearch index every word in every log line. That makes search fast but storage expensive — often 10–20x the raw log volume.

Loki only indexes metadata labels (like app=nginx, namespace=production). The actual log content is stored compressed and queried on demand. This makes Loki:

  • Far cheaper to run than Elasticsearch at scale, since there is no full-text index to build and store
  • Native to Kubernetes — Pod labels become log labels automatically
  • Unified with Grafana — switch between metrics and logs in the same dashboard
  • Prometheus-compatible — same label model, familiar query language
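To make the label model concrete, here is a sketch of a push to Loki's standard HTTP push API (the $LOKI_URL address and the labels are placeholder assumptions). Only the stream labels are indexed; the log line itself is just stored and scanned at query time:

```bash
# Build a push payload by hand to see Loki's data model:
# a "stream" is one unique label set; "values" are [timestamp_ns, line] pairs.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"   # placeholder endpoint

PAYLOAD=$(cat <<EOF
{
  "streams": [
    {
      "stream": { "app": "nginx", "namespace": "production" },
      "values": [ [ "$(date +%s)000000000", "GET /index.html 200" ] ]
    }
  ]
}
EOF
)
echo "$PAYLOAD"

# Uncomment to push to a real Loki instance:
# curl -s -X POST "$LOKI_URL/loki/api/v1/push" \
#   -H 'Content-Type: application/json' \
#   -d "$PAYLOAD"
```

Every unique combination of label values creates a new stream, which is why high-cardinality labels (like request IDs) should stay in the log line, not in the labels.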

Loki Architecture

Understanding Loki's components helps you deploy and debug it effectively.

┌─────────────────────────────────────────────────────────┐
│                    Loki Architecture                    │
│                                                         │
│  ┌──────────┐    ┌─────────────┐    ┌───────────────┐   │
│  │ Promtail │───▶│ Distributor │───▶│   Ingester    │   │
│  │ (agent)  │    │             │    │ (write buffer)│   │
│  └──────────┘    └─────────────┘    └───────┬───────┘   │
│                                             │           │
│  ┌──────────┐    ┌─────────────┐    ┌───────▼───────┐   │
│  │ Grafana  │◀───│   Querier   │◀───│ Object Store  │   │
│  │          │    │             │    │(S3/GCS/Azure) │   │
│  └──────────┘    └─────────────┘    └───────────────┘   │
└─────────────────────────────────────────────────────────┘

Key components:

  • Promtail: agent that runs on each node, tails log files, and pushes them to Loki
  • Distributor: receives log streams, validates them, and routes them to ingesters
  • Ingester: buffers recent logs in memory and flushes them to object storage
  • Querier: executes LogQL queries against ingesters (recent data) and object storage
  • Object store: S3, GCS, Azure Blob, or local filesystem holding compressed log chunks

Installing Loki with Helm

The recommended way to run Loki in Kubernetes is with the official Helm chart.

Add the Grafana Helm repository

bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Create a values file for Loki

yaml
# loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem    # use s3 in production
 
# Single binary mode (good for small clusters)
deploymentMode: SingleBinary
 
singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1
      memory: 1Gi
 
# Disable components not needed in single binary mode
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0

Install Loki

bash
helm install loki grafana/loki \
  --namespace monitoring \
  --create-namespace \
  -f loki-values.yaml

Installing Promtail (Log Collector)

Promtail runs as a DaemonSet on every node, tailing pod logs and shipping them to Loki. Note that Grafana has deprecated Promtail in favor of Grafana Alloy; Promtail still works and is shown here for simplicity, but consider Alloy for new deployments.

yaml
# promtail-values.yaml
config:
  clients:
    - url: http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push
 
  snippets:
    pipelineStages:
      - cri: {}    # parse CRI log format (Kubernetes default)

bash
helm install promtail grafana/promtail \
  --namespace monitoring \
  -f promtail-values.yaml

Verify Promtail is running on all nodes:

bash
kubectl get daemonset promtail -n monitoring
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail --tail=20
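Promtail's pipeline stages can also filter logs before they are shipped, which cuts both ingest volume and storage. As a sketch (the drop stage is standard Promtail; the /health pattern is a placeholder for whatever noise your apps emit):

```yaml
# promtail-values.yaml (excerpt) — drop noisy lines before they reach Loki
config:
  snippets:
    pipelineStages:
      - cri: {}
      - drop:
          expression: ".* /health.*"   # placeholder pattern; match your own noise
```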

Connecting Loki to Grafana

Add Loki as a data source in Grafana:

  1. Open Grafana → Configuration → Data Sources → Add data source
  2. Select Loki
  3. Set URL: http://loki.monitoring.svc.cluster.local:3100
  4. Click Save & Test

Or use a ConfigMap to provision it automatically:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  loki.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.monitoring.svc.cluster.local:3100
        isDefault: false
        jsonData:
          maxLines: 1000

LogQL: Querying Logs

LogQL is Loki's query language — it's similar to PromQL but designed for log streams.

Basic log stream selector

logql
# All logs from the nginx app in production
{app="nginx", namespace="production"}
 
# All logs from streams labeled level="error"
{level="error"}
 
# Logs from a specific container
{container="api-server", namespace="default"}

Filter expressions

logql
# Lines containing "error"
{app="nginx"} |= "error"
 
# Lines NOT containing "health"
{app="api"} != "/health"
 
# Lines matching a regex
{app="backend"} |~ "status=5[0-9]{2}"
 
# Multiple filters chained
{app="nginx", namespace="production"} |= "error" != "404"

Parsing structured logs (JSON)

Most modern apps log in JSON. Loki can parse it:

logql
# Parse JSON and filter by field
{app="api"} | json | level="error"
 
# Extract specific fields
{app="api"} | json | line_format "{{.level}}: {{.message}}"
 
# Filter by parsed field
{app="backend"} | json | status_code >= 500
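Not everything logs JSON; for key=value output (common from Go services), LogQL's logfmt parser works the same way:

```logql
# Parse logfmt (key=value) lines and filter by a parsed label
{app="api"} | logfmt | level="error"
```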

Log rate queries (metrics from logs)

logql
# Error rate per minute
rate({app="api"} |= "error" [1m])
 
# Count of 5xx errors per pod
sum by (pod) (
  count_over_time({app="nginx"} | json | status >= 500 [5m])
)
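Beyond counting lines, range aggregations can unwrap a numeric field from parsed logs and summarize it, for example request latency. This assumes your JSON logs carry a numeric duration_ms field, which is a placeholder name:

```logql
# p99 request latency per pod, computed from log lines
quantile_over_time(0.99,
  {app="api"} | json | unwrap duration_ms [5m]
) by (pod)
```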

Useful Loki Dashboards in Grafana

Error rate panel

logql
sum(rate({namespace="production"} |= "error" [5m])) by (app)

Use this as a time-series panel to see which app generates the most errors.

Log volume by namespace

logql
sum(rate({namespace=~".+"} [5m])) by (namespace)

Recent errors (table panel)

logql
{namespace="production"} |= "error" | json | line_format "{{.timestamp}} [{{.app}}] {{.message}}"

Production Configuration: Using S3 as Storage

For production, use object storage instead of local filesystem:

yaml
# loki-production-values.yaml
# (older charts used a separate storageConfig/boltdb_shipper block;
#  current charts take storage settings under loki.storage)
loki:
  auth_enabled: true
  storage:
    type: s3
    bucketNames:
      chunks: my-loki-logs
      ruler: my-loki-logs
      admin: my-loki-logs
    s3:
      region: us-east-1
      s3ForcePathStyle: false

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/loki-s3-role

Create the IAM policy for Loki to write to S3:

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-loki-logs",
        "arn:aws:s3:::my-loki-logs/*"
      ]
    }
  ]
}

Log Retention Policy

Set retention so old logs are automatically deleted:

yaml
loki:
  limits_config:
    retention_period: 30d    # delete logs older than 30 days

  compactor:
    retention_enabled: true
    delete_request_store: s3    # required by Loki 3.x when retention is enabled
    working_directory: /loki/compactor
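Retention can also vary per stream: for example, keep dev logs for a week while everything else keeps the default 30 days. The retention_stream block is standard Loki limits config; the namespace selector is an example:

```yaml
loki:
  limits_config:
    retention_period: 30d
    retention_stream:
      - selector: '{namespace="dev"}'
        priority: 1
        period: 7d
```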

Alerting on Logs with Loki

Loki ships a ruler component that evaluates Prometheus-style alerting rules whose expressions are LogQL. Note that these rules must be loaded into Loki's ruler, not into Prometheus: Prometheus cannot evaluate LogQL, so a PrometheusRule resource picked up by the Prometheus Operator will not work. With the Helm chart, one common approach is a ConfigMap consumed by the chart's rules sidecar (check your chart's sidecar.rules settings for the exact label it watches):

yaml
# ConfigMap with alerting rules for the Loki ruler
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-alerting-rules
  namespace: monitoring
  labels:
    loki_rule: "1"    # default label watched by the chart's rules sidecar
data:
  rules.yaml: |
    groups:
      - name: loki-log-alerts
        interval: 1m
        rules:
          - alert: HighErrorRate
            expr: |
              sum(rate({namespace="production"} |= "error" [5m])) > 10
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: "High error rate detected in production logs"
              description: "More than 10 matching lines/sec for 2 minutes"

          - alert: CriticalException
            expr: |
              count_over_time({namespace="production"} |= "CRITICAL" [1m]) > 0
            labels:
              severity: critical
            annotations:
              summary: "Critical exception in production"

Loki vs Elasticsearch: When to Choose What

Factor             | Loki                              | Elasticsearch
-------------------|-----------------------------------|----------------------------------------
Cost               | Low (label index only)            | High (full-text index)
Search flexibility | Label + filter based              | Full-text search
Setup complexity   | Simple                            | Complex
Kubernetes-native  | Yes                               | No (needs configuration)
Full-text search   | No                                | Yes
Good for           | Kubernetes logs, structured logs  | Audit logs, compliance, complex search

Use Loki when: You're on Kubernetes, your logs are structured (JSON), and cost matters.

Use Elasticsearch when: You need full-text search, compliance/audit log retention, or complex aggregations across unstructured data.


Learn More

Want to go deeper on observability (Prometheus, Grafana, Loki, OpenTelemetry, and production monitoring)? KodeKloud's hands-on DevOps courses walk you through real setups on actual Kubernetes clusters. No toy examples.

If you're deploying Loki on a cloud VPS or managed Kubernetes, DigitalOcean's managed Kubernetes is one of the most cost-effective ways to get started — clean UI, simple scaling, and great docs.


Summary

Grafana Loki gives you Kubernetes-native log aggregation at a fraction of the cost of Elasticsearch. With Promtail collecting logs automatically from every pod, LogQL for powerful queries, and native Grafana integration, it completes your observability stack alongside Prometheus and Tempo.

Start with single-binary mode for small clusters, use S3 for production storage, and set a 30-day retention policy. You'll have production-grade logging running within an hour.
