Kubernetes CronJob Running Duplicate or Concurrent Jobs: How to Fix It

Kubernetes CronJob running the same job multiple times? Getting duplicate executions or jobs running concurrently when they shouldn't? Here are the fixes.

DevOpsBoys · 5 min read

Duplicate CronJob executions are one of the more insidious Kubernetes bugs — your job runs twice, data gets processed twice, and errors cascade. Here's how to diagnose and fix it.

Why CronJobs Duplicate

Reason 1: concurrencyPolicy Allows It (Default Behavior)

By default, concurrencyPolicy: Allow means if a job hasn't finished when the next schedule fires, both run simultaneously.

Take a daily backup on the schedule 0 2 * * * (2 AM). A run that takes 2 hours finishes at 4 AM and never overlaps with the next one. But if a run hangs, or slows down enough to still be active at 2 AM the next day, the controller starts a second job alongside it.

Reason 2: Controller Restarts

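When the Kubernetes controller-manager restarts, the CronJob controller re-evaluates schedules it may have missed and can start a run late or, in rare cases, start one twice. The Kubernetes documentation only promises that a Job is created approximately once per scheduled time.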

Reason 3: Multiple Controller-Manager Instances

HA clusters run several controller-manager instances, with leader election keeping only one of them active at a time. During a leader handover, race conditions can cause duplicate job creation.

Fix 1: Set concurrencyPolicy: Forbid

This prevents a new job from starting if the previous one hasn't finished:

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid    # ← key fix
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: backup-tool:latest

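With Forbid, if the 2 AM job is still running at 2 AM the next day, the new trigger is skipped rather than queued. One caveat from the Kubernetes documentation: when the previous run finishes, startingDeadlineSeconds is still taken into account, so a run that was held back may start late if it is still within its deadline.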

If you would rather have the new run take over from the old one instead of being skipped:

yaml
  concurrencyPolicy: Replace  # kills the running job and starts a new one

The three options:

  • Allow (default) — multiple concurrent executions allowed
  • Forbid — skip new execution if previous is still running
  • Replace — cancel the running job and start a new one
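The difference between the three policies is easiest to see on a timeline. The following is a minimal simulation, not the controller's actual logic: `Run` and `simulate` are hypothetical names, time is counted in whole hours, and the Forbid branch ignores any late catch-up after the previous run finishes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    start: int                       # hour at which the run started
    end: int                         # hour at which it would finish if left alone
    killed_at: Optional[int] = None  # set when Replace cancels the run


def simulate(policy: str, triggers: List[int], duration: int) -> List[Run]:
    """Replay schedule triggers (in hours) under one concurrencyPolicy."""
    runs: List[Run] = []
    for t in triggers:
        active = [r for r in runs if r.killed_at is None and r.start <= t < r.end]
        if active and policy == "Forbid":
            continue                 # previous run still active: skip this trigger
        if active and policy == "Replace":
            for r in active:
                r.killed_at = t      # cancel the running job
        runs.append(Run(start=t, end=t + duration))
    return runs


# Daily 2 AM triggers on three consecutive days, with runs that take 30 hours
for policy in ("Allow", "Forbid", "Replace"):
    runs = simulate(policy, triggers=[2, 26, 50], duration=30)
    print(policy, [(r.start, r.killed_at) for r in runs])
```

With 30-hour runs, Allow ends up with overlapping jobs, Forbid drops the second day's trigger, and Replace starts every trigger but cancels the two earlier runs before they finish.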

Fix 2: Add Idempotency to Your Job

Even with Forbid, controller restarts can cause double execution. Make your job idempotent so running it twice has the same effect as once.

python
# Python example: check if already processed before doing work
import os
from datetime import datetime

import redis

redis_client = redis.Redis(host="redis")


def run_backup(job_date):
    """Placeholder for the actual backup logic."""
    raise NotImplementedError


def main():
    # One lock per calendar day; JOB_DATE can override the date for manual re-runs
    job_date = os.environ.get("JOB_DATE") or datetime.now().strftime("%Y-%m-%d")
    lock_key = f"daily-backup:lock:{job_date}"

    # SET NX succeeds only for the first caller; the key expires after 24 hours
    acquired = redis_client.set(lock_key, "running", nx=True, ex=86400)

    if not acquired:
        print(f"Job for {job_date} already running or completed. Skipping.")
        return

    try:
        run_backup(job_date)
        redis_client.set(lock_key, "completed", ex=86400)
    except Exception:
        redis_client.delete(lock_key)  # allow retry on failure
        raise


if __name__ == "__main__":
    main()
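The locking logic above can be exercised without a Redis server. The sketch below swaps in a small in-memory stand-in; `FakeRedis` and `run_once` are hypothetical names, not part of redis-py, and the stand-in mimics only the `set(..., nx=True)` and `delete` calls used here while ignoring expiry.

```python
class FakeRedis:
    """In-memory stand-in for the two Redis calls the job uses (expiry is ignored)."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, nx=False, ex=None):
        # With nx=True, refuse to overwrite an existing key, like SET ... NX
        if nx and key in self.store:
            return None
        self.store[key] = value
        return True

    def delete(self, key):
        self.store.pop(key, None)


def run_once(client, job_date, work):
    """Run work(job_date) only if no other run holds or has completed the lock."""
    lock_key = f"daily-backup:lock:{job_date}"
    if not client.set(lock_key, "running", nx=True, ex=86400):
        return False                      # duplicate execution: skip
    try:
        work(job_date)
        client.set(lock_key, "completed", ex=86400)
        return True
    except Exception:
        client.delete(lock_key)           # release the lock so a retry can run
        raise


client = FakeRedis()
calls = []
first = run_once(client, "2024-01-15", calls.append)
second = run_once(client, "2024-01-15", calls.append)   # simulated duplicate trigger
print(first, second, calls)
```

The second call returns without doing any work, which is the behaviour you want from a duplicate trigger. One limitation carries over from the original pattern: a pod that is killed mid-run never releases its lock, so the key blocks retries until it expires.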

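Kubernetes does not run env values through a shell, so value: "$(date +%Y-%m-%d)" would reach the container as that literal string. Compute the date when the container starts instead. The fragment below assumes the image ships /bin/sh and that the script lives at /app/backup.py (a hypothetical path):

yaml
jobTemplate:
  spec:
    template:
      spec:
        containers:
          - name: backup
            image: backup-tool:latest
            command: ["/bin/sh", "-c"]
            args:
              # $$ escapes Kubernetes' own $(VAR) expansion, so the shell sees $(date ...)
              - |
                export JOB_DATE="$$(date +%Y-%m-%d)"
                exec python /app/backup.py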

Fix 3: Check for Stuck/Zombie Jobs

Sometimes jobs appear "running" but the pods are gone. Kubernetes still counts them:

bash
# List all Jobs from this CronJob (assumes the Jobs carry an app=daily-backup label)
kubectl get jobs -l app=daily-backup -n production
 
# Check if pods are actually running
kubectl get pods -l job-name=daily-backup-12345 -n production
 
# Delete a stuck job manually (won't affect the CronJob schedule)
kubectl delete job daily-backup-12345 -n production
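The manual check above can be automated. The sketch below is one possible approach, assuming you have already parsed the output of `kubectl get jobs -o json` and `kubectl get pods -o json` into dictionaries; `find_jobs_without_pods` is a hypothetical helper, and it relies on the `job-name` pod label used in the commands above.

```python
def find_jobs_without_pods(jobs: dict, pods: dict) -> list:
    """Return names of Jobs that report active pods although no pod carries their job-name label."""
    pod_owners = {
        pod["metadata"].get("labels", {}).get("job-name")
        for pod in pods.get("items", [])
    }
    stuck = []
    for job in jobs.get("items", []):
        name = job["metadata"]["name"]
        active = job.get("status", {}).get("active", 0)
        if active > 0 and name not in pod_owners:
            stuck.append(name)
    return stuck


# Hand-written sample data shaped like kubectl's JSON output
sample_jobs = {"items": [
    {"metadata": {"name": "daily-backup-12345"}, "status": {"active": 1}},
    {"metadata": {"name": "daily-backup-12346"}, "status": {"active": 1}},
    {"metadata": {"name": "daily-backup-12344"}, "status": {"succeeded": 1}},
]}
sample_pods = {"items": [
    {"metadata": {"name": "daily-backup-12346-abcde",
                  "labels": {"job-name": "daily-backup-12346"}}},
]}
print(find_jobs_without_pods(sample_jobs, sample_pods))
```

Treat the result as a list of candidates to inspect, not a list to delete blindly: a Job whose pod is about to be created can briefly look the same.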

Fix 4: startingDeadlineSeconds

If your cluster was down or the controller restarted and missed a schedule, Kubernetes may try to catch up and start the missed job late, at a time you did not plan for. Control this with startingDeadlineSeconds:

yaml
spec:
  schedule: "0 2 * * *"
  startingDeadlineSeconds: 3600   # only allow starting within 1 hour of schedule time
  concurrencyPolicy: Forbid

Without startingDeadlineSeconds, a missed run has no deadline: if your cluster was down for 3 days and comes back up, the controller can start a catch-up job for the most recent missed schedule right away, whatever the time of day. With startingDeadlineSeconds: 3600, a missed job is skipped if more than 1 hour has passed since its scheduled time.

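Important: The CronJob controller checks schedules roughly every 10 seconds, so a startingDeadlineSeconds value below 10 can cause jobs to never run. For most cases, pick a value well above your typical start-up delay but below the schedule interval, like the 1 hour used for the daily job here.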
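The deadline rule can be written down as a small predicate. This is a simplified model of the behaviour described above, not the controller's code, and `may_still_start` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional


def may_still_start(scheduled: datetime, now: datetime,
                    starting_deadline_seconds: Optional[int]) -> bool:
    """Return True if a run scheduled at `scheduled` is still allowed to start at `now`."""
    if now < scheduled:
        return False                          # not due yet
    if starting_deadline_seconds is None:
        return True                           # no deadline: a late start is always allowed
    return now - scheduled <= timedelta(seconds=starting_deadline_seconds)


scheduled = datetime(2024, 1, 15, 2, 0)
print(may_still_start(scheduled, datetime(2024, 1, 15, 2, 30), 3600))   # 30 minutes late
print(may_still_start(scheduled, datetime(2024, 1, 15, 3, 30), 3600))   # 90 minutes late
print(may_still_start(scheduled, datetime(2024, 1, 18, 9, 0), None))    # days late, no deadline
```

With a 1-hour deadline, the run that is 30 minutes late is still started and the one that is 90 minutes late is dropped. Without a deadline, even the run that is days late is started.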

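Fix 5: Label Your Jobs and Cap Their Runtime

If you're seeing two identical jobs from the same CronJob trigger (rare but possible in some cluster configurations), make them easy to spot and limit the damage. The label below is static, so it does not make each execution unique, but it gives you a selector for listing every Job this CronJob created, and activeDeadlineSeconds bounds how long a stray run can live: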

yaml
jobTemplate:
  metadata:
    labels:
      cronjob-name: daily-backup
  spec:
    # activeDeadlineSeconds kills the job if it runs too long
    activeDeadlineSeconds: 7200  # 2 hours max
    template:
      spec:
        restartPolicy: OnFailure
        containers:
          - name: backup
            image: backup-tool:latest

Debugging Current State

bash
# See recent job history (assumes the Jobs carry an app=daily-backup label)
kubectl get jobs -l app=daily-backup --sort-by=.metadata.creationTimestamp
 
# Check if CronJob is firing on schedule
kubectl describe cronjob daily-backup | grep -A 20 "Events"
 
# See the last schedule time and the configured schedule expression
kubectl get cronjob daily-backup -o jsonpath='{.status.lastScheduleTime}'
kubectl get cronjob daily-backup -o jsonpath='{.spec.schedule}'
 
# Check for currently active jobs
kubectl get cronjob daily-backup -o jsonpath='{.status.active}'
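When comparing job history against the schedule, it helps to know which scheduled time a given Job belongs to. In recent Kubernetes versions the CronJob controller appears to name Jobs `<cronjob-name>-<scheduled time in minutes since the Unix epoch>`. The sketch below relies on that naming, which is an implementation detail rather than a documented guarantee, so verify it on your cluster before depending on it. `scheduled_time_from_job_name` is a hypothetical helper.

```python
from datetime import datetime, timezone


def scheduled_time_from_job_name(job_name: str) -> datetime:
    """Decode the numeric suffix of a CronJob-created Job name as minutes since the Unix epoch."""
    suffix = job_name.rsplit("-", 1)[1]
    return datetime.fromtimestamp(int(suffix) * 60, tz=timezone.utc)


# Build a name the same way for a known schedule time, then decode it again
slot = datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc)
name = f"daily-backup-{int(slot.timestamp()) // 60}"
print(name, "->", scheduled_time_from_job_name(name).isoformat())
```

If the decoded time of a Job is far from its creationTimestamp, the run was started late, which points at a catch-up after downtime rather than a normal trigger.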

Complete Fixed CronJob Spec

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
  namespace: production
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid        # prevent concurrent runs
  startingDeadlineSeconds: 3600    # skip if can't start within 1 hour
  successfulJobsHistoryLimit: 7    # keep 1 week of successful job history
  failedJobsHistoryLimit: 3        # keep 3 failed job records
  jobTemplate:
    spec:
      activeDeadlineSeconds: 7200  # kill job if it runs > 2 hours (stuck protection)
      backoffLimit: 2              # retry failed pods up to 2 times
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: backup-tool:v2.1.0
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "100m"
                limits:
                  memory: "512Mi"
                  cpu: "500m"

Monitoring CronJobs

Add Prometheus alerts for CronJob health:

yaml
- alert: CronJobNotRunning
  expr: time() - kube_cronjob_status_last_schedule_time{cronjob="daily-backup"} > 90000
  annotations:
    summary: "CronJob daily-backup hasn't run in 25 hours"
 
- alert: CronJobFailed
  expr: kube_job_status_failed > 0
  for: 5m
  annotations:
    summary: "Job {{ $labels.job_name }} has failed pods"

Key takeaway: concurrencyPolicy: Forbid + startingDeadlineSeconds + idempotent job logic = reliable CronJob execution.
