Topic 12

Jobs and CronJobs

Batch · Scheduled

Not every workload runs forever. A Job runs one or more Pods until they complete successfully, then stops — for migrations, batch processing, and one-off tasks. A CronJob creates Jobs on a schedule, the Kubernetes answer to cron.

These are the controllers for work that has an end. The subtle parts are how they handle parallelism and retries, and the genuinely sharp edges of running cron in a distributed system.

Jobs: Run to Completion

A Job creates Pods and tracks them until a set number succeed. completions is how many successful runs you need; parallelism is how many Pods may run at once. With both at 1 you get a single task; raise parallelism for a fan-out of independent work items. The Job is complete when the required number of Pods exit successfully.

A parallel Job

apiVersion: batch/v1
kind: Job
metadata:
  name: import
spec:
  completions: 10         # need 10 successful runs
  parallelism: 3          # up to 3 at a time
  backoffLimit: 4         # give up after 4 failures
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: import
          image: importer:1.0
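
For contrast, a one-off task can leave completions and parallelism at their default of 1. The sketch below assumes a hypothetical migration image (migrator:1.0) and Job name (db-migrate); neither comes from the example above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate              # hypothetical name, for illustration only
spec:
  # completions and parallelism are omitted, so both default to 1:
  # one Pod runs, and the Job is complete when it exits successfully.
  backoffLimit: 4
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: migrate
          image: migrator:1.0   # hypothetical image
```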

Retries and Restart Policy

Jobs must use restartPolicy: OnFailure or Never. Always is not allowed: a Pod that always restarts never finishes, so the Job could never complete, and the API rejects it. backoffLimit caps how many times the Job retries a failing Pod before it is marked failed. It defaults to 6, so set it deliberately rather than relying on the default. A ttlSecondsAfterFinished setting cleans up finished Jobs automatically so they don't accumulate.
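
As a rough sketch, the three settings might be combined like this. The Job name and the specific values (2 retries, a one-hour TTL) are illustrative assumptions, not recommendations.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: import-once                 # hypothetical name
spec:
  backoffLimit: 2                   # mark the Job failed after 2 retries
  ttlSecondsAfterFinished: 3600     # delete the Job and its Pods 1 hour after it finishes
  template:
    spec:
      restartPolicy: Never          # each retry is a new Pod; failed Pods remain for inspection
      containers:
        - name: import
          image: importer:1.0
```

With restartPolicy: Never, a failure produces a fresh Pod rather than an in-place container restart, which tends to make failed attempts easier to inspect. OnFailure restarts the container inside the same Pod instead.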

CronJobs: Jobs on a Schedule

A CronJob wraps a Job template with a cron schedule. Each firing creates a new Job. The schedule is evaluated in a configurable time zone, and you control what happens when runs overlap with concurrencyPolicy: Allow (the default, overlapping runs), Forbid (skip the new run if the previous is still going), or Replace (kill the old, start the new).

A CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"        # 02:00 every day
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: reporter:1.0

The Sharp Edges of Cron

CronJobs are best-effort, not guaranteed. If the control plane is down or busy at the scheduled moment, a run can be missed; startingDeadlineSeconds controls how late a missed run may still start. Long-running jobs with concurrencyPolicy: Allow can pile up on top of each other. And history limits matter: successfulJobsHistoryLimit and failedJobsHistoryLimit (3 and 1 by default) decide how many finished Jobs and their Pods are kept, so raising them too far clutters the namespace, while setting them to 0 leaves nothing to inspect after a failure.
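
One way to guard against these edges is to state each setting explicitly. The sketch below extends the nightly-report example; the time zone, the 10-minute deadline, and the history values are assumptions chosen for illustration, and the timeZone field assumes a cluster version that supports it.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"             # 02:00 every day, in the zone below
  timeZone: "Etc/UTC"               # explicit zone instead of the controller's local time
  startingDeadlineSeconds: 600      # a missed run may still start up to 10 minutes late
  concurrencyPolicy: Forbid         # skip a run if the previous one is still active
  successfulJobsHistoryLimit: 3     # keep the last 3 successful Jobs
  failedJobsHistoryLimit: 1         # keep the last failed Job for debugging
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: reporter:1.0
```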

Job vs Deployment

Job — runs to completion and stops; success is defined by exit code. For batch and one-off tasks.

Deployment — runs forever, restarting Pods to maintain a count. For services that should never "finish."

Common Mistakes

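Leaving restartPolicy unset or at Always on a Job; the Pod default is Always, which the API rejects for Jobs because such a Pod could never reach completion.
Relying on the default backoffLimit of 6 without checking it suits the workload, so a broken Job retries more times than intended.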
Leaving concurrencyPolicy at Allow for slow jobs, so runs overlap and stack up.
Never setting ttlSecondsAfterFinished on standalone Jobs, or raising CronJob history limits too far, so finished Jobs and Pods accumulate in the namespace.
Assuming CronJob fires are guaranteed and exactly on time, rather than best-effort with possible misses.

Best Practices

Use restartPolicy: OnFailure or Never for Jobs, and set a sensible backoffLimit.
Set ttlSecondsAfterFinished and history limits so completed work cleans itself up.
Choose concurrencyPolicy: Forbid for jobs that must not overlap.
Make job logic idempotent — a retried or duplicated run should be safe.
Set the CronJob time zone explicitly and use startingDeadlineSeconds to bound missed runs.

Related

Deployment: the run-forever counterpart
Argo Workflows: richer DAG-based batch orchestration on Kubernetes
Cloud batch / schedulers: AWS Batch, Cloud Scheduler, EventBridge Scheduler

Knowledge Check

What restartPolicy must a Job use, and why?

OnFailure or Never — Always would restart forever and the Job could never complete
Always — so the kubelet retries any Pod that fails part-way
Any value works, because the restartPolicy field has no effect at all on how a Job behaves
Only Never is permitted; OnFailure is rejected by the API

What does concurrencyPolicy: Forbid do for a CronJob?

Skips a scheduled run if the previous run is still active
Runs every overlapping scheduled job side by side in parallel
Prevents the CronJob from ever firing more than twice total
Forbids triggering the Job manually with kubectl create

Why might a CronJob miss a scheduled run entirely?

CronJobs are best-effort; if the control plane is unavailable at the scheduled time a run can be missed
CronJobs are hard-capped at one run per calendar day, so any extra slots in the cron expression are silently dropped
Runs are only ever missed when the Job's parallelism field is set above one
The schedule field is purely advisory and is never actually enforced
