
Kubernetes Job / CronJob Not Completing — Causes and Fixes (2026)

Kubernetes Job stuck in Running, CronJob never triggering, or pods completing but Job shows Failed? Here are the real causes and fixes for each scenario.

DevOpsBoys · Apr 16, 2026 · 5 min read

Jobs and CronJobs behave differently from regular Deployments. When they break, the error messages are often misleading. Here's every common failure mode and how to fix it.

Quick Diagnosis

bash
# Check Job status
kubectl get jobs -n my-namespace
kubectl describe job my-job -n my-namespace
 
# Check pods created by the Job
kubectl get pods -n my-namespace --selector=job-name=my-job
 
# Check CronJob status
kubectl get cronjobs -n my-namespace
kubectl describe cronjob my-cronjob -n my-namespace
 
# Recent jobs created by the CronJob (they're named <cronjob-name>-<timestamp>;
# there is no cronjob=... label by default, so filter by name)
kubectl get jobs -n my-namespace | grep my-cronjob

Cause 1: Pod Exits with Non-Zero Code → Job Marked Failed

A Job succeeds only when its pod exits with code 0. Any other exit code = failure.

Symptom:

bash
kubectl get jobs
# NAME      COMPLETIONS   DURATION   AGE
# my-job    0/1           5m         5m   ← never completes

Check pod logs:

bash
kubectl logs -n my-namespace -l job-name=my-job
# (add --previous only if restartPolicy: OnFailure restarted the container in place)
# Error: connection refused to postgres:5432

Fix: Find the real error in the logs. Common causes:

  • App bug (unhandled exception)
  • Missing environment variable
  • Database/dependency not ready

Also make sure your entrypoint propagates failures instead of masking them (appending && exit 0 is redundant at best):

yaml
# set -e aborts on the first failing command, so the pod exits non-zero on error
command: ["sh", "-c", "set -e; python run_job.py"]
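A subtle variant of this: a multi-command `sh -c` script reports the exit code of its *last* command, so an earlier failure can be silently masked. Easy to reproduce locally with plain sh, no cluster needed:

```shell
# Without set -e, the last command's status wins and earlier failures are masked:
sh -c 'false; true' && echo "exit 0: Job would look successful"

# With set -e, the first failure aborts the script with its own exit code:
sh -c 'set -e; false; true' || echo "exit $?: Job correctly fails"
```

This is exactly the behavior the Job controller sees: only the container's final exit code counts.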

Cause 2: backoffLimit Exhausted — Job Keeps Retrying

By default backoffLimit is 6: the Job retries the pod up to 6 times before marking itself Failed.

bash
kubectl describe job my-job
# Warning  BackoffLimitExceeded  Job has reached the specified backoff limit
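Those retries back off exponentially (10s, 20s, 40s, ..., capped at six minutes, per the Kubernetes docs), so an exhausted limit takes a while to surface. A rough sum of the default six retries:

```shell
# Retry delay doubles from 10s each attempt, capped at 360s (6 minutes)
total=0
d=10
for attempt in 1 2 3 4 5 6; do
  if [ "$d" -gt 360 ]; then d=360; fi
  total=$((total + d))
  d=$((d * 2))
done
echo "six retries wait roughly ${total}s before BackoffLimitExceeded"
```

So expect 10+ minutes of waiting (on top of the pods' own runtime) before a default Job finally reports Failed.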

Fix 1: Raise backoffLimit above the default of 6 for transient failures:

yaml
spec:
  backoffLimit: 10    # more retries than the default of 6
  template:
    spec:
      restartPolicy: Never   # IMPORTANT: Never or OnFailure, not Always

Fix 2: Fix the underlying error — retrying a broken job wastes resources.

Fix 3: For jobs that should retry on failure but not indefinitely:

yaml
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 300   # kill after 5 minutes regardless

Cause 3: Wrong restartPolicy

Jobs require restartPolicy: Never or restartPolicy: OnFailure. Using Always (the Deployment default) is rejected at apply time:

bash
kubectl apply -f job.yaml
# Error: Job.batch "my-job" is invalid: 
# spec.template.spec.restartPolicy: Unsupported value: "Always"

Fix:

yaml
spec:
  template:
    spec:
      restartPolicy: Never      # pod won't restart — Job creates new pod
      # OR
      restartPolicy: OnFailure  # pod restarts in-place (cheaper)

Use Never when you need to inspect failed pod logs. Use OnFailure for simple retry behavior.


Cause 4: CronJob Suspended

bash
kubectl get cronjob my-cronjob
# NAME         SCHEDULE    SUSPEND   ACTIVE
# my-cronjob   */5 * * * *   True      0      ← suspended!

Fix:

bash
kubectl patch cronjob my-cronjob -p '{"spec":{"suspend":false}}'

Cause 5: CronJob Timezone Issues

CronJob schedules are evaluated by the kube-controller-manager, which runs in UTC on almost every cluster, not in your local timezone.

Symptom: Job triggers at wrong time.

bash
# Quick sanity check: cluster components and containers almost always run in UTC
kubectl run tz-check --rm -it --restart=Never --image=busybox -- date

Fix: Set explicit timezone (Kubernetes 1.27+):

yaml
spec:
  schedule: "0 9 * * *"
  timeZone: "Asia/Kolkata"     # IST; fires at 9 AM IST

For older clusters, convert manually: 9 AM IST = 3:30 AM UTC → schedule: "30 3 * * *"
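For one-off conversions, GNU date (coreutils) can do the arithmetic for you; this assumes a GNU userland (macOS ships BSD date, which uses different flags):

```shell
# What is 9 AM Asia/Kolkata in UTC? (IST = UTC+5:30, no DST)
date -u -d 'TZ="Asia/Kolkata" 2026-01-01 09:00' +%H:%M
# → 03:30, i.e. schedule: "30 3 * * *"
```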


Cause 6: startingDeadlineSeconds Missed

If a CronJob misses its scheduled time (cluster was down, CronJob was suspended), it uses startingDeadlineSeconds to decide whether to catch up.

yaml
spec:
  schedule: "0 * * * *"
  startingDeadlineSeconds: 300    # only start if within 5 minutes of scheduled time
  # if cluster was down for 2 hours, jobs for those hours are SKIPPED

Symptom: Jobs missing after cluster downtime.

Fix: Set startingDeadlineSeconds based on your tolerance:

  • unset (default): the controller counts every missed schedule since the last run; past 100 missed runs it gives up with "too many missed start times" and starts nothing
  • 300: skip the run if it can't be started within 5 minutes of its scheduled time
  • avoid values below ~10 seconds: the controller only checks every 10s, so a tiny deadline can cause every run to be skipped
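The skip decision itself is just an age check against the deadline. A sketch with illustrative numbers (variable names are mine, not the controller's):

```shell
# How the startingDeadlineSeconds check plays out (illustrative epoch seconds)
scheduled=1000                   # second the run was due
now=1400                         # controller notices 400s later
deadline=300                     # startingDeadlineSeconds

if [ $((now - scheduled)) -le "$deadline" ]; then
  decision="start job"
else
  decision="skip run"            # 400s late exceeds the 300s deadline
fi
echo "$decision"
```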

Cause 7: concurrencyPolicy — Jobs Overlapping or Being Skipped

yaml
spec:
  concurrencyPolicy: Forbid    # skip new run if previous is still running
  # Allow   → run multiple jobs concurrently (default)
  # Replace → kill old job, start new one

Symptom with Forbid: scheduled runs are silently skipped while the previous job is still active; once enough missed runs pile up you may see:

bash
kubectl describe cronjob my-cronjob
# Warning: Cannot determine if job needs to be started. Too many missed start times.

Fix: Use Replace if old job hangs:

yaml
spec:
  concurrencyPolicy: Replace
  activeDeadlineSeconds: 600   # kill job after 10 minutes

Cause 8: Job Completes but Pods Are Deleted Too Fast

By default, completed Job pods are kept for debugging. But if ttlSecondsAfterFinished is set too low:

yaml
spec:
  ttlSecondsAfterFinished: 10   # pods deleted 10s after job finishes

You run kubectl logs and get "pod not found."

Fix: Increase TTL or remove it:

yaml
spec:
  ttlSecondsAfterFinished: 3600   # keep pods for 1 hour after completion

Cause 9: Parallel Jobs — Not Enough Completions

For parallel jobs, completions and parallelism must be set correctly:

yaml
spec:
  completions: 10      # need 10 successful pods total
  parallelism: 3       # run 3 at a time
  backoffLimit: 5

Symptom: Job stuck at "5/10" completions — some pods failing, consuming backoff retries.

Fix: Check which pods are failing: kubectl get pods -l job-name=my-job and look for Error/OOMKilled pods.
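If you are on Kubernetes 1.31+ (where Job podFailurePolicy went GA), you can also stop infrastructure disruptions from eating your backoff retries. A sketch; it requires restartPolicy: Never, and the container name and exit code below are hypothetical:

```yaml
spec:
  completions: 10
  parallelism: 3
  backoffLimit: 5
  podFailurePolicy:
    rules:
    - action: Ignore             # node drain/preemption doesn't count against backoffLimit
      onPodConditions:
      - type: DisruptionTarget
    - action: FailJob            # treat exit code 42 as unrecoverable: fail fast, no retries
      onExitCodes:
        containerName: worker    # hypothetical container name
        operator: In
        values: [42]
```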


Full Working Job Example

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 600       # fail after 10 min
  ttlSecondsAfterFinished: 3600    # keep pods 1 hour
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: my-app:v1.2
        command: ["python", "manage.py", "migrate"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: DATABASE_URL
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

Full Working CronJob Example

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 0 * * *"
  timeZone: "Asia/Kolkata"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  startingDeadlineSeconds: 300
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 1800
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: my-cleanup:v1.0   # pin a tag; :latest makes runs unreproducible
            command: ["python", "cleanup.py"]

Debug Checklist

bash
# Job not completing?
kubectl describe job <name>           # check Events and Status
kubectl logs -l job-name=<name>       # app errors
kubectl get pods -l job-name=<name>   # pod states
 
# CronJob not triggering?
kubectl describe cronjob <name>       # check SUSPEND, last schedule
kubectl get events --field-selector reason=FailedCreate
 
# Check CronJob history (jobs are named <cronjob>-<timestamp>)
kubectl get jobs | grep <name>
