
What Are Taints and Tolerations in Kubernetes? (2026)

Taints and tolerations control which pods can run on which nodes. Here's how they work, why you need them, and real examples for GPU nodes, spot instances, and dedicated workloads.

DevOpsBoys · May 2, 2026 · 4 min read

Taints and tolerations let you control which pods are allowed to run on which nodes. They're the way Kubernetes says "this node is reserved for special workloads only."


The Simple Explanation

Think of a taint as a "no entry" sign on a node. By default, no new pods will be scheduled onto a tainted node.

A toleration is like a special pass. A pod with the right toleration can ignore the taint and run on that node.


Why You Need Them

Common use cases:

  • GPU nodes should only run ML training jobs, not random web servers
  • Spot/preemptible instances should only run fault-tolerant workloads
  • Nodes with specialized hardware (fast SSD, large memory) should be reserved for specific services
  • System components like CNI plugins, logging agents, and GPU operators need to run on every node including "restricted" ones

Without taints, any pod could be scheduled on any node. A big ML training job could end up on your API server's node and starve it of resources.


How to Taint a Node

bash
# Syntax: kubectl taint nodes <node-name> <key>=<value>:<effect>
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
 
# Taint all nodes in a node group (with a label selector)
kubectl taint nodes -l node-type=gpu gpu=true:NoSchedule

The three effects:

  • NoSchedule — new pods without the toleration won't be scheduled here
  • PreferNoSchedule — the scheduler tries to avoid placing pods here, but will if there's no other option
  • NoExecute — new pods without the toleration aren't scheduled, and existing pods without it are evicted (immediately, unless they tolerate the taint with a tolerationSeconds grace period — see the sketch below)
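
A NoExecute taint can also be tolerated for a limited time via tolerationSeconds: the pod keeps running for that many seconds after the taint appears, then gets evicted. A minimal sketch using node.kubernetes.io/unreachable, a taint Kubernetes itself applies when a node drops off the network:

yaml
# Keep running on an unreachable node for 5 minutes, then evict
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300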

How to Tolerate a Taint (Pod Side)

yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: pytorch/pytorch:latest
    resources:
      limits:
        nvidia.com/gpu: "1"

This pod has a toleration that matches the taint gpu=true:NoSchedule. It can now be scheduled on the GPU node.
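
To confirm where the pod actually landed, check its assigned node (pod name taken from the manifest above):

bash
# The NODE column should show one of your GPU nodes
kubectl get pod ml-training-job -o wide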

Operator types:

  • Equal — key AND value must match
  • Exists — only key must match (value is ignored)
yaml
# Match any taint with key "gpu" regardless of value
tolerations:
- key: "gpu"
  operator: "Exists"
  effect: "NoSchedule"

Real Example 1: GPU Nodes

bash
# Taint GPU nodes so only GPU workloads run there
kubectl taint nodes \
  gpu-node-1 gpu-node-2 gpu-node-3 \
  nvidia.com/gpu=true:NoSchedule
yaml
# GPU workload — tolerates the taint
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
spec:
  template:
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: inference
        image: myml:latest
        resources:
          limits:
            nvidia.com/gpu: "1"

Without the taint, a regular web app pod could be scheduled onto a GPU node, wasting expensive GPU hardware it would never use.
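
Conversely, a pod without the toleration stays Pending when the only free capacity is on tainted nodes. The scheduler's events explain why (pod name is illustrative; message paraphrased from recent Kubernetes versions):

bash
kubectl describe pod web-app | grep -A3 Events
# Warning  FailedScheduling  ...  0/3 nodes are available:
#   3 node(s) had untolerated taint {nvidia.com/gpu: true}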


Real Example 2: Spot Instances

bash
# Mark spot instances with a taint
kubectl taint nodes spot-node-1 spot=true:NoSchedule
yaml
# Batch job that tolerates spot instances
spec:
  tolerations:
  - key: spot
    operator: Equal
    value: "true"
    effect: NoSchedule
  # Also handle sudden eviction gracefully
  terminationGracePeriodSeconds: 30

Critical services (databases, payment APIs) don't get this toleration, so they never land on spot instances, which can be reclaimed with as little as two minutes' notice.
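
As a complete manifest, a fault-tolerant batch Job that opts into spot capacity could look like this (names and image are illustrative):

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report
spec:
  backoffLimit: 3            # retry if a spot node disappears mid-run
  template:
    spec:
      restartPolicy: OnFailure
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: spot
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: report
        image: myorg/report-gen:latest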


Real Example 3: System DaemonSets

System-level DaemonSets (Fluentd, node exporters, CNI plugins) need to run on every node — including tainted ones. They use broad tolerations:

yaml
# Tolerate everything — run on all nodes
tolerations:
- operator: Exists
  effect: NoSchedule
- operator: Exists
  effect: NoExecute
- operator: Exists
  effect: PreferNoSchedule

This is why system pods like kube-proxy or aws-node carry very permissive tolerations when you inspect them.
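
You can check this yourself — on most clusters kube-proxy runs as a DaemonSet in kube-system:

bash
# Print the tolerations kube-proxy's pods carry
kubectl -n kube-system get daemonset kube-proxy \
  -o jsonpath='{.spec.template.spec.tolerations}'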


Taints vs Node Affinity — What's the Difference?

People often confuse taints with node affinity. They work together but do opposite things:

Taint — repels pods FROM a node. The node says "I don't want random pods."

Node Affinity — attracts pods TO a node. The pod says "I want to run on nodes with this label."

For GPU workloads, you need both:

  • Taint GPU nodes so non-GPU pods don't land there
  • Use node affinity/selector to make sure GPU pods land on GPU nodes
yaml
spec:
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  nodeSelector:
    nvidia.com/gpu: "true"    # also positively select GPU nodes

A toleration alone allows the pod onto GPU nodes but doesn't guarantee it lands there; the node selector ensures it does.
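
If you need richer matching than an exact label, the same positive selection can be written as node affinity (same assumed label key as above):

yaml
spec:
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu
            operator: In
            values: ["true"]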


Removing a Taint

bash
# Add a minus at the end to remove the taint
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule-
 
# Verify
kubectl describe node gpu-node-1 | grep Taints

Quick Reference

bash
# View taints on all nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
 
# View taints on a specific node
kubectl describe node my-node | grep Taints
 
# Add a taint
kubectl taint nodes my-node key=value:NoSchedule
 
# Remove a taint
kubectl taint nodes my-node key=value:NoSchedule-
 
# Add taint to all nodes with a label
kubectl taint nodes -l node-role=worker dedicated=backend:NoSchedule