Topic 14

StatefulSets

Stateful · Data

A StatefulSet runs Pods that need stable identity and their own persistent storage: databases, message queues, and clustered systems where the replicas are not interchangeable. Where a Deployment's Pods are anonymous and disposable, a StatefulSet's Pods are named and ordered, and each keeps its own disk.

It is the right tool for stateful workloads and the wrong tool for almost everything else. Most applications should be stateless Deployments backed by an external or managed data store; a StatefulSet is for when the state genuinely lives in the Pods.

Stable Identity

StatefulSet Pods get stable, predictable names (db-0, db-1, db-2) that persist across rescheduling. Paired with a headless Service, each Pod also gets a stable DNS name, so db-0 is always reachable at the same hostname even after it restarts on a new node with a new IP address. Clustered systems rely on this: a database replica needs to know it is replica 0 and find replica 1 at a fixed name.
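
A minimal sketch of a headless Service for such a StatefulSet, assuming the name db, the label app: db, and PostgreSQL's default port 5432 (illustrative values, not prescribed by Kubernetes). Setting clusterIP: None is what makes the Service headless. With it, each Pod gets a DNS name of the form <pod>.<service>.<namespace>.svc.<cluster-domain>, for example db-0.db.default.svc.cluster.local in the default namespace with the default cluster domain.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db            # assumed name; must equal the StatefulSet's spec.serviceName
spec:
  clusterIP: None     # headless: no virtual IP, DNS resolves straight to the Pod IPs
  selector:
    app: db           # assumed label; must match the Pod template labels
  ports:
    - name: postgres
      port: 5432      # assumed port (PostgreSQL default)
```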

Per-Pod Storage

db-0: bound to PVC data-db-0, its own disk.
db-1: bound to PVC data-db-1, its own disk.
db-2: bound to PVC data-db-2; the disk follows the Pod identity on reschedule.

A StatefulSet uses volumeClaimTemplates to give each Pod its own PersistentVolumeClaim. db-0 gets its own disk, db-1 another, and that disk follows the Pod identity — when db-0 reschedules, it reattaches its own volume, not a fresh one. This per-replica persistence is the core of what a StatefulSet provides and what a Deployment cannot.

A StatefulSet with per-Pod storage

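# Note: this starts three independent PostgreSQL instances.
# Replication between them is not configured by the StatefulSet.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db          # the headless Service
  replicas: 3              # creates db-0, db-1, db-2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db            # must match spec.selector
    spec:
      containers:
        - name: db
          image: postgres:17
          env:
            # The postgres image refuses to initialize without a superuser password.
            # "db-credentials" is a hypothetical Secret; create it separately.
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
            # Keep the data directory in a subdirectory of the mount, because
            # some volumes carry a lost+found directory at their root.
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data   # refers to the volumeClaimTemplate below
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:    # one PVC per Pod: data-db-0, data-db-1, data-db-2
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi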

Ordered Operations

StatefulSets deploy, scale, and update in order. By default, Pods are created 0, 1, 2, each waiting for the previous one to be Running and Ready, and they are removed in reverse order. Rolling updates also proceed one Pod at a time, from the highest ordinal down to the lowest. A partition in the update strategy limits a rollout to Pods whose ordinal is greater than or equal to the partition value, that is, only the highest-numbered Pods, which is useful for canarying a database upgrade. This ordering is deliberate: clustered systems often require a stable bootstrap sequence.
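
As a sketch of the partition mechanism, the fragment below would sit under a StatefulSet's spec. Assuming three replicas, a partition of 2 means only db-2 picks up a changed Pod template, while db-0 and db-1 stay on the old revision until the partition is lowered. The value 2 is an illustrative choice.

```yaml
spec:
  podManagementPolicy: OrderedReady   # the default: create and delete Pods one at a time, in order
  updateStrategy:
    type: RollingUpdate               # the default strategy
    rollingUpdate:
      partition: 2                    # only Pods with ordinal >= 2 are updated
```

Lowering the partition to 1 and then 0 continues the rollout to the remaining Pods once the canary looks healthy.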

When Not to Use One

A StatefulSet is not a database — it is scaffolding for running one. It does not handle replication, failover, or backups; your software or an operator must. For most teams the better answer to "we need a database" is a managed service or an operator (Topic 39), not a hand-rolled StatefulSet. And note that scaling down does not delete the PersistentVolumeClaims by default — the data is kept on purpose, which surprises people expecting cleanup.
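
If retaining the claims is not what you want, newer Kubernetes versions expose a persistentVolumeClaimRetentionPolicy field on the StatefulSet spec. A hedged sketch, assuming your cluster version supports the field (check the documentation for your release); both settings default to Retain.

```yaml
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete    # delete the PVCs of Pods removed by a scale-down
    whenDeleted: Retain   # keep all PVCs if the StatefulSet itself is deleted (the default)
```

Choosing Delete trades the safety net for automatic cleanup, so it only makes sense when the data can be rebuilt or is backed up elsewhere.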

StatefulSet vs Deployment

StatefulSet — stable identity, ordered operations, per-Pod persistent storage. For stateful, clustered systems.

Deployment — anonymous, interchangeable, disposable Pods. For stateless applications — which should be most of them.

Common Mistakes

Using a StatefulSet for a stateless app because it "feels more robust" — it just adds ordering overhead.
Expecting the StatefulSet to handle replication, failover, or backups — that is the application's or an operator's job.
Forgetting the required headless Service, so Pods get no stable DNS identity.
Assuming scale-down deletes the PVCs; by default the data volumes are retained.
Ignoring the ordered rollout during operations and being surprised when Pods update one at a time.

Best Practices

Reserve StatefulSets for workloads that truly need stable identity and per-Pod storage.
Prefer a managed database or a mature operator over a hand-built StatefulSet for production data.
Always pair a StatefulSet with its headless Service for stable per-Pod DNS.
Plan backups explicitly — the StatefulSet keeps the disks, not your recovery story.
Use update partitions to canary changes to a clustered system one replica at a time.

Related

Operators: encode the real run-a-database logic on top (Topic 39)
PersistentVolumeClaims: the per-Pod storage StatefulSets provision (Topic 17)
Managed databases: usually the better answer than self-running state

Knowledge Check

What does a StatefulSet provide that a Deployment does not?

Stable per-Pod identity and persistent per-Pod storage that follows each Pod
Automatic built-in database replication and failover handling between its replicas
Faster rolling updates that replace all Pods at once
Guaranteed placement of exactly one Pod per node

What happens to the PersistentVolumeClaims when you scale a StatefulSet down?

They are retained by default — the data is kept, not deleted
They are deleted immediately to free up the underlying storage
They are converted into ConfigMaps holding the same data
They are relocated into the cluster's default namespace

Why is a hand-rolled StatefulSet usually not the best way to run a production database?

It provides identity and storage but not replication, failover, or backups — a managed service or operator does
StatefulSets cannot mount persistent volumes for their Pods
StatefulSets are hard-capped at a single replica, so a database can never run more than one Pod for high availability
Databases must always be deployed as a DaemonSet instead
