Kubernetes CronJob Running Duplicate or Concurrent Jobs: How to Fix It
Kubernetes CronJob running the same job multiple times? Getting duplicate executions or jobs running concurrently when they shouldn't? Here are the fixes.
Duplicate CronJob executions are one of the more insidious Kubernetes bugs — your job runs twice, data gets processed twice, and errors cascade. Here's how to diagnose and fix it.
Why CronJobs Duplicate
Reason 1: concurrencyPolicy Allows It (Default Behavior)
By default, concurrencyPolicy: Allow means if a job hasn't finished when the next schedule fires, both run simultaneously.
With a daily schedule such as 0 2 * * * (2 AM), a backup that takes 2 hours finishes at 4 AM and never overlaps the next run. Overlap happens when a run outlasts the schedule interval: if one day's backup hangs or slows down and is still going at 2 AM the next day, a second job starts alongside it.
Reason 2: Controller Restarts
When the Kubernetes controller-manager restarts, it can re-evaluate missed or in-progress jobs and fire them again.
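Reason 3: Multiple Controller-Manager Instances
In HA clusters, several kube-controller-manager instances run behind leader election, so only one is active at a time. During a leader failover, however, race conditions can occasionally cause duplicate job creation.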
Fix 1: Set concurrencyPolicy: Forbid
This prevents a new job from starting if the previous one hasn't finished:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid  # ← key fix
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:latest

With Forbid, if the 2 AM job is still running at 2 AM the next day, the new trigger is skipped (not queued, skipped entirely).
If you would rather have the new run take over than be skipped, use Replace. Note that it does not wait for the old job to finish; it terminates it:
concurrencyPolicy: Replace  # kills the running job and starts a new one

The three options:
Allow (default): multiple concurrent executions allowed
Forbid: skip the new execution if the previous one is still running
Replace: cancel the running job and start a new one
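To make the difference between the three policies concrete, here is a toy simulation. It is only a sketch of the behaviour described above, not the controller's actual logic; the function name `simulate` and the hour-based timeline are assumptions made for illustration.

```python
def simulate(policy, triggers, duration):
    """Toy model of concurrencyPolicy (hypothetical helper, times in hours).

    Returns one (trigger_time, action, runs_in_progress) tuple per trigger.
    """
    active_until = []  # end times of runs that are still in progress
    events = []
    for now in triggers:
        # Forget runs that have already finished by this trigger.
        active_until = [end for end in active_until if end > now]
        if active_until and policy == "Forbid":
            # A previous run is still going: skip this trigger entirely.
            events.append((now, "skipped", len(active_until)))
            continue
        if active_until and policy == "Replace":
            # Cancel whatever is running, then start fresh.
            active_until = []
            action = "replaced"
        else:
            # Allow, or nothing was running: just start a new run.
            action = "started"
        active_until.append(now + duration)
        events.append((now, action, len(active_until)))
    return events


if __name__ == "__main__":
    # A daily trigger (every 24 hours) with a run that takes 30 hours.
    for policy in ("Allow", "Forbid", "Replace"):
        print(policy, simulate(policy, [0, 24, 48], 30))
```

With a 30-hour run on a 24-hour schedule, Allow ends up with two runs in progress at once, Forbid skips the second trigger, and Replace restarts the work at every trigger.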
Fix 2: Add Idempotency to Your Job
Even with Forbid, controller restarts can cause double execution. Make your job idempotent so running it twice has the same effect as once.
# Python example: check if already processed before doing work
import os
from datetime import datetime

import redis

redis_client = redis.Redis(host="redis")


def main():
    # Fall back to today's date when JOB_DATE is not set
    job_date = os.environ.get("JOB_DATE", datetime.now().strftime("%Y-%m-%d"))
    lock_key = f"daily-backup:lock:{job_date}"

    # Try to acquire the lock atomically (expires in 24 hours)
    acquired = redis_client.set(lock_key, "running", nx=True, ex=86400)
    if not acquired:
        print(f"Job for {job_date} already running or completed. Skipping.")
        return

    try:
        run_backup(job_date)  # your backup logic, defined elsewhere
        redis_client.set(lock_key, "completed", ex=86400)
    except Exception:
        redis_client.delete(lock_key)  # allow retry on failure
        raise


if __name__ == "__main__":
    main()

Pass the date as an environment variable:
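jobTemplate:
  spec:
    template:
      spec:
        containers:
        - name: backup
          image: backup-tool:latest
          # Kubernetes does not run shell substitution in env values, so
          # compute the date in a shell wrapper when the container starts.
          # Assumes the image has /bin/sh and the script is at /app/backup.py.
          command: ["/bin/sh", "-c"]
          args: ["JOB_DATE=$(date +%Y-%m-%d) exec python /app/backup.py"]

Fix 3: Check for Stuck/Zombie Jobs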
Sometimes jobs appear "running" but the pods are gone. Kubernetes still counts them:
# List Jobs from this CronJob (assumes the jobTemplate labels them app=daily-backup)
kubectl get jobs -l app=daily-backup -n production
# Check if pods are actually running
kubectl get pods -l job-name=daily-backup-12345 -n production
# Delete a stuck job manually (won't affect the CronJob schedule)
kubectl delete job daily-backup-12345 -n production

Fix 4: startingDeadlineSeconds
If your cluster was down or the controller restarted and missed a schedule, Kubernetes may start a late catch-up run once it recovers. Control this with startingDeadlineSeconds:
spec:
  schedule: "0 2 * * *"
  startingDeadlineSeconds: 3600  # only allow starting within 1 hour of schedule time
  concurrencyPolicy: Forbid

Without startingDeadlineSeconds, a missed run has no cutoff: if the cluster comes back hours after the 2 AM slot, the controller can still start a catch-up job for the most recent missed slot, however late. With startingDeadlineSeconds: 3600, a missed job is skipped if more than 1 hour has passed since its scheduled time.
Important: If startingDeadlineSeconds is set too low, jobs may never run. The Kubernetes documentation warns about values under 10 seconds, because the controller only checks periodically. Choose a value long enough to ride out brief control-plane hiccups but shorter than the schedule interval, as with the 1-hour deadline on the daily schedule above.
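The deadline rule can be pictured as a simple time comparison. The sketch below is a simplified model, not the controller's real code; `should_start` is a hypothetical helper, and treating a missing deadline as "no cutoff" is an assumption based on the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_start(scheduled: datetime, now: datetime,
                 starting_deadline_seconds: Optional[int]) -> bool:
    """Return True if a run scheduled at `scheduled` may still start at `now`."""
    if now < scheduled:
        return False  # the slot has not arrived yet
    if starting_deadline_seconds is None:
        return True  # no deadline configured: a late start is still allowed
    # Too late once the deadline window after the scheduled time has passed.
    return now - scheduled <= timedelta(seconds=starting_deadline_seconds)


if __name__ == "__main__":
    slot = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)  # the 2 AM run
    print(should_start(slot, slot + timedelta(minutes=30), 3600))  # within 1 hour
    print(should_start(slot, slot + timedelta(minutes=90), 3600))  # deadline passed
    print(should_start(slot, slot + timedelta(hours=6), None))     # no deadline set
```

With a 3600-second deadline, a controller that recovers 30 minutes after the 2 AM slot would still start the run, while one that recovers 90 minutes late would skip it.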
Fix 5: Add Job-Level Uniqueness with Labels
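If you're seeing two identical jobs from the same CronJob trigger (rare but possible in some cluster configurations), add a label to the job template so every Job from this CronJob is easy to list and compare, and cap the runtime so a stray duplicate cannot run indefinitely: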
jobTemplate:
  metadata:
    labels:
      cronjob-name: daily-backup
  spec:
    # activeDeadlineSeconds kills the job if it runs too long
    activeDeadlineSeconds: 7200  # 2 hours max
    template:
      spec:
        restartPolicy: OnFailure
        containers:
        - name: backup
          image: backup-tool:latest

Debugging Current State
# See recent job history (assumes Jobs carry the label app=daily-backup)
kubectl get jobs -l app=daily-backup --sort-by=.metadata.creationTimestamp
# Check if CronJob is firing on schedule
kubectl describe cronjob daily-backup | grep -A 20 "Events"
# See the last schedule time and the configured schedule expression
kubectl get cronjob daily-backup -o jsonpath='{.status.lastScheduleTime}'
kubectl get cronjob daily-backup -o jsonpath='{.spec.schedule}'
# Check for currently active jobs
kubectl get cronjob daily-backup -o jsonpath='{.status.active}'

Complete Fixed CronJob Spec
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
  namespace: production
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid        # prevent concurrent runs
  startingDeadlineSeconds: 3600    # skip if can't start within 1 hour
  successfulJobsHistoryLimit: 7    # keep 1 week of successful job history
  failedJobsHistoryLimit: 3        # keep 3 failed job records
  jobTemplate:
    spec:
      activeDeadlineSeconds: 7200  # kill job if it runs > 2 hours (stuck protection)
      backoffLimit: 2              # retry failed pods up to 2 times
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:v2.1.0
            resources:
              requests:
                memory: "256Mi"
                cpu: "100m"
              limits:
                memory: "512Mi"
                cpu: "500m"

Monitoring CronJobs
Add Prometheus alerts for CronJob health:
- alert: CronJobNotRunning
  expr: time() - kube_cronjob_status_last_schedule_time{cronjob="daily-backup"} > 90000
  annotations:
    summary: "CronJob daily-backup hasn't run in 25 hours"
- alert: CronJobFailed
  expr: kube_job_status_failed > 0
  for: 5m
  annotations:
    summary: "Job {{ $labels.job_name }} has failed pods"

Key takeaway: concurrencyPolicy: Forbid + startingDeadlineSeconds + idempotent job logic = reliable CronJob execution.