Topic 75

Docker Swarm

Orchestration · Multi-host

Docker Swarm is Docker's built-in multi-host orchestrator: docker swarm init on one host and docker swarm join on the rest turn a set of Docker hosts into one cluster you drive with the same CLI you already know. It schedules services across nodes, wires them together over overlay networks (the multi-host extension of the overlay driver from Chapter 7), and ships secrets to containers, all without installing anything beyond Docker.

It is simpler than Kubernetes and largely superseded by it, and the honest framing matters more than the feature list. Swarm is the lowest-friction step past one host for a team already all-in on Docker — and, for most teams in 2025, also a path the wider ecosystem has stopped walking.

From Single Host to Cluster

Running docker swarm init on one node and docker swarm join on the others forms a cluster of managers, which keep cluster state via Raft consensus, and workers, which run the tasks. The same docker CLI now schedules across all of them. For a team that already lives in the Docker CLI, that is the shortest move from one machine to several, with no new tool to learn.

The shape of a swarm

Manager nodes

keep cluster state via a Raft quorum

Worker nodes

run the work the managers schedule

Services

declared desired state, e.g. N replicas

Tasks

individual containers placed on workers

Forming a Swarm and declaring a replicated service

$ docker swarm init --advertise-addr 10.0.0.1
# prints a `docker swarm join --token ...` line to run on each worker

$ docker network create -d overlay driftwood-net
$ docker secret create db_password ./db_password.txt

$ docker service create \
    --name driftwood-web \
    --replicas 3 \
    --network driftwood-net \
    --secret db_password \
    registry.driftwood.example/driftwood/web:1.4.0

$ docker service ls

That one service create declares desired state — three replicas of driftwood/web on the overlay network, with a secret mounted at runtime — and Swarm keeps it true, rescheduling tasks onto surviving nodes if one dies. No docker run loop, no manual placement.

Services and Desired State

Instead of docker run, you declare a service with a replica count, and Swarm keeps that many tasks running, rescheduling them if a node fails. This is declarative desired-state scheduling — the same idea Kubernetes is built on — scoped to Swarm's smaller feature set. You state what should be true; the scheduler works to make it so, rather than you issuing imperative start and stop commands per container.
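As a rough sketch of what "declaring rather than commanding" looks like at the CLI, the transcript below changes the desired replica count for the driftwood-web service created earlier and then drains a node to watch Swarm reschedule its tasks. The node name worker-2 is a hypothetical placeholder; substitute a name from docker node ls.

```console
$ docker service scale driftwood-web=5                 # desired state: 3 -> 5 replicas
$ docker service ps driftwood-web                      # list tasks and the node each runs on
$ docker node update --availability drain worker-2     # worker-2 is a hypothetical node name
$ docker service ps driftwood-web                      # tasks from the drained node now run elsewhere
$ docker node update --availability active worker-2    # let the node accept tasks again
```

None of these commands starts or stops a specific container. Each one changes what should be true and leaves placement to the scheduler.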

Overlay Networks Across Hosts

Swarm extends the overlay network driver so a service on node A reaches a service on node B by name, with load balancing built in. This is the concrete payoff of the overlay concept introduced as single-host plumbing in Chapter 7: the same VXLAN-based encapsulation, now spanning the cluster, with each service name resolving to a virtual IP that load-balances across the live replicas. For traffic arriving from outside the cluster, the ingress routing mesh accepts a published port on every node and forwards it to a live replica.

Secrets and Configs, Built In

docker secret and docker config distribute sensitive files and configuration to the tasks that need them, mounted in at runtime — the multi-host version of the Compose and runtime secrets from Chapter 10, with no external system to stand up. The secret is encrypted in the Raft store and delivered only to nodes running a task that requested it, never baked into the service image.
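Inside a task, a secret appears as a file, by default at /run/secrets/ followed by the secret's name. The sketch below shows one way an entrypoint script might read db_password from that file instead of from an environment variable. The read_secret helper and the SECRETS_DIR override are hypothetical names introduced for illustration; only the /run/secrets default comes from Docker.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of a Swarm secret file.
# SECRETS_DIR is an assumed override for local testing; Swarm mounts
# secrets under /run/secrets by default.
read_secret() {
    secret_file="${SECRETS_DIR:-/run/secrets}/$1"
    if [ ! -r "$secret_file" ]; then
        echo "secret '$1' not found at $secret_file" >&2
        return 1
    fi
    cat "$secret_file"
}

# Typical use in an entrypoint (left commented so the sketch has no side effects):
#   DB_PASSWORD="$(read_secret db_password)" || exit 1
```

Reading the file at startup keeps the value out of the image and out of docker inspect output, which matches the discipline described above.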

The Honest Take — When (and When Not)

Swarm is genuinely simpler than Kubernetes and reasonable for a small cluster run by a team already on Docker Compose. But the ecosystem, the momentum, and the hiring pool have moved to Kubernetes. The operators, managed offerings, and tooling that teams want to adopt increasingly target Kubernetes alone.

Choose Swarm only when its simplicity is the deciding factor and you accept being on a path most tooling no longer targets. For anything that will grow, needs broad integration, or has to be staffed long-term, starting on Kubernetes is the call — which is exactly the boundary the next topic crosses and names.

Swarm vs Kubernetes

Docker Swarm — built into Docker, learned in an afternoon by anyone who knows the Docker CLI, with overlay networking and secrets out of the box. Choose it for a small cluster of a handful of nodes where the team is already on Docker and operational simplicity beats ecosystem depth.

Kubernetes — a far larger system with self-healing, rich scheduling, a vast ecosystem, and the industry's tooling and hiring behind it. Choose it for anything that must scale, integrate broadly, or be staffed long-term. Both consume the same OCI driftwood/web image.

Common Mistakes

Choosing Swarm in 2025 and later for a workload that will grow or needs broad ecosystem integration, then hitting the wall where the tools, operators, and hires you want only target Kubernetes.
Confusing docker compose (single host) with docker stack deploy (Swarm, multi-host): the same Compose file runs under Swarm with different semantics for networks, secrets, and replicas. For example, docker stack deploy ignores the build key and expects a prebuilt image.
Running a single-manager Swarm in production — losing that one manager loses the cluster's Raft quorum and control plane; managers need an odd number, 3 or 5, for fault tolerance.
Assuming Swarm overlay networking needs no firewall work — the overlay and control planes use specific gossip and VXLAN ports that must be open between nodes, or services silently fail to reach each other.
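For the firewall point above, the ports in question are, by default, TCP 2377 (cluster management, to managers), TCP and UDP 7946 (node-to-node gossip), and UDP 4789 (VXLAN overlay traffic). The sketch below is a dry run that only prints firewall rules for one peer node. It assumes the hosts use ufw, which the source does not specify; the print_swarm_rules helper and the peer address 10.0.0.2 are hypothetical.

```shell
#!/bin/sh
# Default ports Swarm nodes must reach on each other:
#   2377/tcp            cluster management (to managers)
#   7946/tcp, 7946/udp  node-to-node gossip
#   4789/udp            VXLAN overlay data traffic
SWARM_PORTS="2377/tcp 7946/tcp 7946/udp 4789/udp"

# Hypothetical dry-run helper: print one ufw rule per port for a peer address.
# It only echoes the commands; review them, then run them with root privileges.
print_swarm_rules() {
    peer="$1"
    for entry in $SWARM_PORTS; do
        port="${entry%/*}"     # text before the slash
        proto="${entry#*/}"    # text after the slash
        echo "ufw allow proto $proto from $peer to any port $port"
    done
}

# Assumed peer address, chosen to sit beside the 10.0.0.1 manager used earlier.
print_swarm_rules 10.0.0.2
```

Run it once per peer, or adapt the echoed line to whatever firewall the hosts actually use. Limiting each rule to known peer addresses keeps the control plane off the public internet.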

Best Practices

Reach for Swarm only on a small, stable cluster where the team is already fluent in Docker and the simplicity is the deciding factor — otherwise start on Kubernetes.
Run an odd number of manager nodes, 3 for most clusters, so Raft keeps quorum through a single manager failure.
Declare services with explicit replica counts and resource limits so Swarm's scheduler places and reschedules them deterministically.
Distribute sensitive values with docker secret, never baked into the service image, keeping the Chapter 10 secrets discipline intact across the cluster.
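The odd-number advice follows from majority arithmetic: with N managers, Raft needs floor(N/2) + 1 of them reachable, so the cluster tolerates N minus that many failures. A small sketch of the calculation, with function names chosen here for illustration:

```shell
#!/bin/sh
# Majority needed for Raft to make progress with N managers.
quorum() { echo $(( $1 / 2 + 1 )); }

# Manager failures the cluster survives while keeping quorum.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Going from 3 to 4 managers raises the quorum from 2 to 3 but still tolerates only one failure, so an even count adds cost without adding fault tolerance.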

Comparable tools

Kubernetes: the dominant orchestrator the ecosystem has moved to
HashiCorp Nomad: a simpler scheduler in Swarm's spirit but more capable
Docker Compose: the single-host sibling Swarm extends to many hosts

Knowledge Check

What does declaring a service in Swarm give you over single-host docker run?

Declarative desired state — a replica count Swarm keeps true, rescheduling tasks onto surviving nodes if one fails
It repackages the image into a Swarm-specific manifest format optimized for faster cluster-wide startup
It encrypts all the container-to-container traffic that plain docker run otherwise leaves in plaintext on the wire
It removes the need to run a Docker daemon on each node, since the managers run every task remotely

How do Swarm overlay networks relate to the overlay driver from Chapter 7?

They extend the same overlay driver across hosts, so a service reaches another by name with built-in load balancing
They replace the overlay driver entirely with a wholly different cross-host tunneling mechanism built only for Swarm clusters
They require every container to be assigned its own public, internet-routable IP address before it can talk to peers
They only work on a single host, confined to one machine exactly like the default bridge network is

Why must a production Swarm run an odd number of manager nodes?

Managers keep cluster state via Raft, which needs a quorum — an odd count keeps quorum through a single manager failure
An odd count lets Swarm distribute scheduled tasks evenly across the worker nodes without ever leaving any single worker sitting idle
An odd number is required to balance a service's replicas evenly across every node in the cluster
The cluster overlay network only forms and stays routable when there is an odd number of manager nodes present

When is Swarm the right call rather than a dead end?

On a small, stable cluster where a Docker-fluent team values simplicity over the broader Kubernetes ecosystem
For any workload expected to grow large and integrate broadly with the wider tooling and operator ecosystem
When you want the largest possible hiring pool and the deepest catalog of third-party operators and managed offerings
Whenever you need the image itself to stay portable to other container runtimes and orchestrators later
