Kubernetes Job / CronJob Not Completing — Causes and Fixes (2026)
Kubernetes Job stuck in Running, CronJob never triggering, or pods completing but Job shows Failed? Here are the real causes and fixes for each scenario.
Jobs and CronJobs behave differently from regular Deployments, and when they break, the error messages are often misleading. Here are the most common failure modes and how to fix each one.
Quick Diagnosis
```shell
# Check Job status
kubectl get jobs -n my-namespace
kubectl describe job my-job -n my-namespace

# Check pods created by the Job
kubectl get pods -n my-namespace --selector=job-name=my-job

# Check CronJob status
kubectl get cronjobs -n my-namespace
kubectl describe cronjob my-cronjob -n my-namespace

# Jobs created by a CronJob are named <cronjob>-<timestamp>
kubectl get jobs -n my-namespace | grep my-cronjob
```

Cause 1: Pod Exits with Non-Zero Code → Job Marked Failed
A Job succeeds only when its pod exits with code 0. Any other exit code = failure.
Symptom:
```shell
kubectl get jobs
# NAME     COMPLETIONS   DURATION   AGE
# my-job   0/1           5m         5m    ← never completes
```

Check pod logs:
```shell
kubectl logs -n my-namespace -l job-name=my-job --previous
# Error: connection refused to postgres:5432
```

Fix: Find the real error in logs. Common causes:
- App bug (unhandled exception)
- Missing environment variable
- Database/dependency not ready
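One subtle case is a multi-step shell script: the shell exits with the status of the last command, so an earlier failed step can be silently masked and the Job wrongly marked Complete. A minimal sketch of the difference `set -e` makes:

```shell
# Without set -e, the failed step is masked by the later echo,
# so the container exits 0 and the Job "succeeds".
sh -c 'false; echo cleanup'
echo "without set -e: exit=$?"      # exit=0 — failure masked

# With set -e, the first failing step aborts with its own code,
# so the Job is correctly marked Failed.
sh -c 'set -e; false; echo cleanup' || rc=$?
echo "with set -e: exit=${rc:-0}"   # exit=1 — failure propagated
```

If your job script runs several commands, start it with `set -e` so any failure surfaces as a non-zero exit code.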
The container's exit code is whatever the last command exits with:

```yaml
# The script's exit code becomes the pod's exit code —
# make sure it is 0 on success
command: ["sh", "-c", "python run_job.py"]
```

Cause 2: backoffLimit Exhausted — Job Keeps Retrying
By default backoffLimit is 6 — the Job retries up to 6 times before marking itself Failed.

```shell
kubectl describe job my-job
# Warning  BackoffLimitExceeded  Job has reached the specified backoff limit
```

Fix 1: Increase backoffLimit for transient failures:

```yaml
spec:
  backoffLimit: 10             # default is 6; raise it for flaky dependencies
  template:
    spec:
      restartPolicy: Never     # IMPORTANT: Never or OnFailure, not Always
```

Fix 2: Fix the underlying error — retrying a broken job wastes resources.
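Note that retries are not immediate: per the Kubernetes docs, the controller backs off exponentially, starting at 10s and doubling up to a 6-minute cap. A rough sketch of the wait before each retry (illustrative only, not the controller's code):

```shell
# Approximate back-off delay before each retry: 10s, doubling,
# capped at 360s (6 minutes)
delay=10
for retry in 1 2 3 4 5 6; do
  if [ "$delay" -gt 360 ]; then delay=360; fi   # cap at 6 minutes
  echo "retry $retry after ~${delay}s"
  delay=$((delay * 2))
done
```

This is why a Job with the default backoffLimit of 6 can sit "Running" for 10+ minutes before finally flipping to Failed.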
Fix 3: For jobs that should retry on failure but not indefinitely:
```yaml
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 300   # kill the Job after 5 minutes regardless
```

Cause 3: Wrong restartPolicy
Jobs require restartPolicy: Never or restartPolicy: OnFailure. Using Always (Deployment default) causes an error.
```shell
kubectl apply -f job.yaml
# Error: Job.batch "my-job" is invalid:
# spec.template.spec.restartPolicy: Unsupported value: "Always"
```

Fix:
```yaml
spec:
  template:
    spec:
      restartPolicy: Never       # pod won't restart — Job creates a new pod
      # OR
      restartPolicy: OnFailure   # pod restarts in-place (cheaper)
```

Use Never when you need to inspect failed pod logs. Use OnFailure for simple retry behavior.
Cause 4: CronJob Suspended
```shell
kubectl get cronjob my-cronjob
# NAME         SCHEDULE      SUSPEND   ACTIVE
# my-cronjob   */5 * * * *   True      0    ← suspended!
```

Fix:

```shell
kubectl patch cronjob my-cronjob -p '{"spec":{"suspend":false}}'
```

Cause 5: CronJob Timezone Issues
CronJob schedules run in the cluster timezone (UTC by default), not your local timezone.
Symptom: Job triggers at wrong time.
Schedules without an explicit timeZone are evaluated by the kube-controller-manager, which runs in UTC on virtually all clusters.

```shell
# Check whether an explicit timezone is set;
# empty output means the schedule is interpreted as UTC
kubectl get cronjob my-cronjob -o jsonpath='{.spec.timeZone}'
```

Fix: Set an explicit timezone (Kubernetes 1.27+):
```yaml
spec:
  schedule: "0 9 * * *"
  timeZone: "Asia/Kolkata"   # IST — runs at 9 AM IST
```

For older clusters, convert manually: 9 AM IST = 3:30 AM UTC → schedule: "30 3 * * *"
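You can sanity-check the manual conversion with GNU date (a sketch; assumes GNU coreutils and the tzdata package are installed):

```shell
# What is 09:00 in Asia/Kolkata (UTC+5:30, no DST) in UTC?
date -u -d 'TZ="Asia/Kolkata" 09:00' '+%H:%M UTC'
# → 03:30 UTC
```

Because IST has no daylight saving time, this offset is stable year-round; for DST timezones the UTC schedule shifts twice a year, which is exactly why the 1.27+ timeZone field is preferable.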
Cause 6: startingDeadlineSeconds Missed
If a CronJob misses its scheduled time (cluster was down, CronJob was suspended), it uses startingDeadlineSeconds to decide whether to catch up.
```yaml
spec:
  schedule: "0 * * * *"
  startingDeadlineSeconds: 300   # only start if within 5 minutes of the scheduled time
  # if the cluster was down for 2 hours, runs for those hours are SKIPPED
```

Symptom: Jobs missing after cluster downtime.
Fix: Set startingDeadlineSeconds based on your tolerance:
- Unset (default): no deadline — the controller starts missed runs, but gives up with an error if more than 100 schedules were missed
- 300: skip any run that cannot start within 5 minutes of its scheduled time
- Avoid values below 10 seconds — the controller checks CronJobs roughly every 10 seconds, so a smaller deadline can cause every run to be skipped
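The controller's decision boils down to a comparison like this (illustrative values and logic, not the controller's actual code):

```shell
deadline=300    # startingDeadlineSeconds
late_by=7200    # the run is 2 hours past its scheduled time (cluster was down)
if [ "$late_by" -le "$deadline" ]; then
  echo "start the missed run"
else
  echo "skip it: ${late_by}s late exceeds the ${deadline}s deadline"
fi
```

For a cleanup job that is idempotent, skipping is usually fine; for billing or report jobs, pick a deadline long enough to cover your realistic downtime window.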
Cause 7: concurrencyPolicy — Jobs Overlapping or Being Skipped
```yaml
spec:
  concurrencyPolicy: Forbid   # skip the new run if the previous one is still running
  # Allow   → run multiple jobs concurrently (default)
  # Replace → kill the old job, start the new one
```

Symptom with Forbid: CronJob scheduled but not running:
```shell
kubectl describe cronjob my-cronjob
# Warning: Cannot determine if job needs to be started. Too many missed start times.
```

Fix: Use Replace if the old job hangs:
```yaml
spec:
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      activeDeadlineSeconds: 600   # kill the job after 10 minutes
```

Cause 8: Job Completes but Pods Are Deleted Too Fast
By default, completed Job pods are kept for debugging. But if ttlSecondsAfterFinished is set too low:
```yaml
spec:
  ttlSecondsAfterFinished: 10   # pods deleted 10s after the job finishes
```

You run kubectl logs and get "pod not found."
Fix: Increase TTL or remove it:
```yaml
spec:
  ttlSecondsAfterFinished: 3600   # keep pods for 1 hour after completion
```

Cause 9: Parallel Jobs — Not Enough Completions
For parallel jobs, completions and parallelism must be set correctly:
```yaml
spec:
  completions: 10    # need 10 successful pods in total
  parallelism: 3     # run 3 at a time
  backoffLimit: 5
```

Symptom: Job stuck at "5/10" completions — some pods failing, consuming backoff retries.
Fix: Check which pods are failing: kubectl get pods -l job-name=my-job and look for Error/OOMKilled pods.
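Back-of-the-envelope: with these settings the Job runs pods in "waves", and a healthy run needs at least ceil(completions / parallelism) of them (a sketch with the values from the example above):

```shell
completions=10
parallelism=3
waves=$(( (completions + parallelism - 1) / parallelism ))   # ceiling division
echo "at most $parallelism pods at once, at least $waves waves to finish"
# each failed pod adds an extra pod and burns one backoffLimit retry
```

If the Job stalls well past that many waves, the failed pods are the culprit, not the parallelism settings.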
Full Working Job Example
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 600      # fail after 10 min
  ttlSecondsAfterFinished: 3600   # keep pods 1 hour
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: my-app:v1.2
          command: ["python", "manage.py", "migrate"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: DATABASE_URL
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```

Full Working CronJob Example
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 0 * * *"
  timeZone: "Asia/Kolkata"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  startingDeadlineSeconds: 300
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 1800
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: my-cleanup:latest
              command: ["python", "cleanup.py"]
```

Debug Checklist
```shell
# Job not completing?
kubectl describe job <name>            # check Events and Status
kubectl logs -l job-name=<name>        # app errors
kubectl get pods -l job-name=<name>    # pod states

# CronJob not triggering?
kubectl describe cronjob <name>        # check SUSPEND, last schedule
kubectl get events --field-selector reason=FailedCreate

# Check CronJob history (jobs are named <cronjob>-<timestamp>)
kubectl get jobs | grep <name>
```

Resources
- KodeKloud Kubernetes Course — Job and CronJob hands-on labs
- CKA Exam on Udemy — Jobs are tested on CKA