
What Are Taints and Tolerations in Kubernetes? (2026)

Taints and tolerations control which pods can run on which nodes. Here's how they work, why you need them, and real examples for GPU nodes, spot instances, and dedicated workloads.

DevOpsBoys · May 2, 2026 · 4 min read

Taints and tolerations let you control which pods are allowed to run on which nodes. They're the way Kubernetes says "this node is reserved for special workloads only."


The Simple Explanation

Think of a taint as a "no entry" sign on a node. By default, no new pods will be scheduled onto a tainted node.

A toleration is like a special pass. A pod with the right toleration can ignore the taint and run on that node.


Why You Need Them

Common use cases:

  • GPU nodes should only run ML training jobs, not random web servers
  • Spot/preemptible instances should only run fault-tolerant workloads
  • Nodes with specialized hardware (fast SSD, large memory) should be reserved for specific services
  • System components like CNI plugins, logging agents, and GPU operators need to run on every node including "restricted" ones

Without taints, any pod could be scheduled on any node. A big ML training job could end up on your API server's node and starve it of resources.


How to Taint a Node

bash
# Syntax: kubectl taint nodes <node-name> <key>=<value>:<effect>
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
 
# Taint all nodes in a node group (with a label selector)
kubectl taint nodes -l node-type=gpu gpu=true:NoSchedule

The three effects:

  • NoSchedule — new pods without the toleration won't be scheduled here
  • PreferNoSchedule — the scheduler tries to avoid placing pods here, but will if there's no other option
  • NoExecute — new pods without the toleration aren't scheduled, and existing pods without it are evicted (immediately, unless they tolerate the taint with a tolerationSeconds grace period — see the sketch below)
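
A NoExecute taint can also be tolerated for a limited time via tolerationSeconds: the pod keeps running for that many seconds after the taint appears, then gets evicted. A minimal sketch using node.kubernetes.io/unreachable, a taint Kubernetes itself applies when a node drops off the network:

yaml
# Keep running on an unreachable node for 5 minutes, then evict
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300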

How to Tolerate a Taint (Pod Side)

yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: pytorch/pytorch:latest
    resources:
      limits:
        nvidia.com/gpu: "1"

This pod has a toleration that matches the taint gpu=true:NoSchedule. It can now be scheduled on the GPU node.
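
To confirm where the pod actually landed, check its assigned node (pod name taken from the manifest above):

bash
# The NODE column should show one of your GPU nodes
kubectl get pod ml-training-job -o wide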

Operator types:

  • Equal — key AND value must match
  • Exists — only key must match (value is ignored)
yaml
# Match any taint with key "gpu" regardless of value
tolerations:
- key: "gpu"
  operator: "Exists"
  effect: "NoSchedule"

Real Example 1: GPU Nodes

bash
# Taint GPU nodes so only GPU workloads run there
kubectl taint nodes \
  gpu-node-1 gpu-node-2 gpu-node-3 \
  nvidia.com/gpu=true:NoSchedule
yaml
# GPU workload — tolerates the taint
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
spec:
  template:
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: inference
        image: myml:latest
        resources:
          limits:
            nvidia.com/gpu: "1"

Without the taint, a regular web app pod could be scheduled onto a GPU node, wasting expensive GPU hardware it would never use.
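
Conversely, a pod without the toleration stays Pending when the only free capacity is on tainted nodes. The scheduler's events explain why (pod name is illustrative; message paraphrased from recent Kubernetes versions):

bash
kubectl describe pod web-app | grep -A3 Events
# Warning  FailedScheduling  ...  0/3 nodes are available:
#   3 node(s) had untolerated taint {nvidia.com/gpu: true}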


Real Example 2: Spot Instances

bash
# Mark spot instances with a taint
kubectl taint nodes spot-node-1 spot=true:NoSchedule
yaml
# Batch job that tolerates spot instances
spec:
  tolerations:
  - key: spot
    operator: Equal
    value: "true"
    effect: NoSchedule
  # Also handle sudden eviction gracefully
  terminationGracePeriodSeconds: 30

Critical services (databases, payment APIs) don't get this toleration, so they never land on spot instances, which can be reclaimed with as little as two minutes' notice.
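
As a complete manifest, a fault-tolerant batch Job that opts into spot capacity could look like this (names and image are illustrative):

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report
spec:
  backoffLimit: 3            # retry if a spot node disappears mid-run
  template:
    spec:
      restartPolicy: OnFailure
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: spot
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: report
        image: myorg/report-gen:latest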


Real Example 3: System DaemonSets

System-level DaemonSets (Fluentd, node exporters, CNI plugins) need to run on every node — including tainted ones. They use broad tolerations:

yaml
# Tolerate everything — run on all nodes
tolerations:
- operator: Exists
  effect: NoSchedule
- operator: Exists
  effect: NoExecute
- operator: Exists
  effect: PreferNoSchedule

This is why system pods like kube-proxy or aws-node carry very permissive tolerations when you inspect them.
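
You can check this yourself — on most clusters kube-proxy runs as a DaemonSet in kube-system:

bash
# Print the tolerations kube-proxy's pods carry
kubectl -n kube-system get daemonset kube-proxy \
  -o jsonpath='{.spec.template.spec.tolerations}'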


Taints vs Node Affinity — What's the Difference?

People often confuse taints with node affinity. They work together but do opposite things:

Taint — repels pods FROM a node. The node says "I don't want random pods."

Node Affinity — attracts pods TO a node. The pod says "I want to run on nodes with this label."

For GPU workloads, you need both:

  • Taint GPU nodes so non-GPU pods don't land there
  • Use node affinity/selector to make sure GPU pods land on GPU nodes
yaml
spec:
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  nodeSelector:
    nvidia.com/gpu: "true"    # also positively select GPU nodes

A toleration alone allows the pod onto GPU nodes but doesn't guarantee it lands there; the node selector ensures it does.
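
If you need richer matching than an exact label, the same positive selection can be written as node affinity (same assumed label key as above):

yaml
spec:
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu
            operator: In
            values: ["true"]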


Removing a Taint

bash
# Add a minus at the end to remove the taint
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule-
 
# Verify
kubectl describe node gpu-node-1 | grep Taints

Quick Reference

bash
# View taints on all nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
 
# View taints on a specific node
kubectl describe node my-node | grep Taints
 
# Add a taint
kubectl taint nodes my-node key=value:NoSchedule
 
# Remove a taint
kubectl taint nodes my-node key=value:NoSchedule-
 
# Add taint to all nodes with a label
kubectl taint nodes -l node-role=worker dedicated=backend:NoSchedule