Topic 03

Cluster Architecture

Architecture · Control plane

A cluster is made of two kinds of machines that work together: a control plane that decides what should happen, and worker nodes that do the work. Everything you do goes through one component — the API server — which is what makes the whole system feel consistent.

Knowing which component does what is the difference between debugging a cluster and guessing at it. When a Pod will not schedule, when state seems wrong, when the API is slow, the cause lives in a specific place. This topic gives you the map.

Control Plane and Worker Nodes

The control plane holds the desired state, makes decisions, and runs the reconcile loops. The worker nodes run your containers and report back. In a managed cluster the provider runs the control plane for you and you only see the nodes; in a self-managed cluster you run both. Either way the split is the same, and the rule of thumb is the same: you talk to the control plane, and it makes the nodes act.

Control plane

API server · etcd · scheduler · controller-manager. Decides and stores desired state.

Worker nodes

kubelet · kube-proxy · container runtime. Run the Pods and report status.

The API Server

The API server is the front door to the cluster. Everything talks to it: kubectl, the controllers, the kubelets, and any tool you add. It exposes the Kubernetes REST API, authenticates and authorizes every request, runs admission checks, and is the only component that reads and writes etcd directly. Because all traffic funnels through one validated gateway, the cluster behaves the same no matter who is asking or where it runs.
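The ordering of that gateway can be sketched as a toy model. Everything here is hypothetical: `ToyAPIServer`, the `allowed` permission table, the defaulting rule, and the key layout are illustrations, not the real API server's code or etcd schema. What it tries to show is the sequence (authenticate, authorize, admit and validate, then persist) and the fact that only this one class ever writes to the store.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    user: Optional[str]  # None models a caller that failed authentication
    verb: str
    obj: dict


class ToyAPIServer:
    """Hypothetical model of the API server's request pipeline, not the real code."""

    def __init__(self, store: dict, allowed: dict[str, set[str]]):
        self.store = store      # stands in for etcd; only this class writes to it
        self.allowed = allowed  # hypothetical permission table: user -> allowed verbs

    def handle(self, req: Request) -> str:
        # 1. Authentication: is the caller known?
        if req.user is None:
            return "401 Unauthorized"
        # 2. Authorization: may this caller perform this verb?
        if req.verb not in self.allowed.get(req.user, set()):
            return "403 Forbidden"
        # 3. Admission and validation (simplified): fill a default, then reject bad objects.
        obj = dict(req.obj)
        obj.setdefault("namespace", "default")
        if "kind" not in obj or "name" not in obj:
            return "422 Unprocessable: kind and name are required"
        # 4. Persist: the only place the backing store is written.
        key = f"/registry/{obj['kind']}/{obj['namespace']}/{obj['name']}"  # illustrative layout
        self.store[key] = obj
        return "201 Created"
```

A request that fails any gate never reaches the store, so every client that later reads the store sees only objects that passed the same checks.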

It is also stateless and horizontally scalable. You can run several API server replicas behind a load balancer; they all read and write the same etcd. That is the first piece of a highly available control plane.
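Why statelessness makes scaling easy can be sketched in a few lines, assuming a shared dict stands in for etcd and a round-robin iterator stands in for the load balancer. `Replica`, `send_put`, and `send_get` are hypothetical names. Because no replica keeps cluster state of its own, a write through one replica is immediately visible through any other.

```python
import itertools


class Replica:
    """Hypothetical stateless API server replica: it holds no cluster state itself."""

    def __init__(self, name: str, store: dict):
        self.name = name
        self.store = store  # shared by every replica; stands in for etcd

    def put(self, key: str, value: str) -> str:
        self.store[key] = value
        return self.name  # report which replica served the write

    def get(self, key: str):
        return self.store.get(key)


etcd: dict = {}
replicas = [Replica(f"apiserver-{i}", etcd) for i in range(3)]
balancer = itertools.cycle(replicas)  # round-robin: one simple load-balancing policy


def send_put(key: str, value: str) -> str:
    return next(balancer).put(key, value)


def send_get(key: str):
    return next(balancer).get(key)
```

Starting fresh, `send_put("/registry/pods/default/web", "Running")` is served by `apiserver-0` and the following `send_get` by `apiserver-1`; because the replicas share one store, it makes no difference which one answers.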

etcd: The Cluster's Memory

etcd is a consistent, distributed key-value store, and it holds all cluster state: every object that currently exists in the cluster, including its current status. If etcd is lost and not backed up, the cluster's desired state is gone. It uses the Raft consensus algorithm and needs a quorum (a majority) of members to accept writes. That is why production runs an odd number of members, typically three or five: the cluster keeps accepting writes as long as a majority survives, and adding a member to an odd-sized cluster raises the quorum without letting it survive any more failures.
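The quorum rule is simple arithmetic. A majority of n members is floor(n/2) + 1, and the cluster tolerates losing everyone else. The helper names below are illustrative.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")

# 1 members: quorum 1, tolerates 0 failure(s)
# 2 members: quorum 2, tolerates 0 failure(s)
# 3 members: quorum 2, tolerates 1 failure(s)
# 4 members: quorum 3, tolerates 1 failure(s)
# 5 members: quorum 3, tolerates 2 failure(s)
# 6 members: quorum 4, tolerates 2 failure(s)
# 7 members: quorum 4, tolerates 3 failure(s)
```

Going from three members to four raises the quorum from two to three but still tolerates only one failure, so even sizes add cost without adding resilience.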

etcd is sensitive to disk and network latency; slow storage under etcd shows up as a slow, flaky control plane. Treat it as the database it is: give it fast disks, monitor it, and back it up. etcd operation and backup are covered in their own topic in the cluster-operations chapter.

Scheduler and Controllers

The scheduler watches for Pods that have no node assigned and picks one for each, based on resource requests, constraints, and policies. It does not start the Pod; it only records the decision by binding the Pod to the chosen node, which sets the Pod's spec.nodeName. How it filters and scores nodes is its own topic in the scheduling chapter.
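A simplified filter-then-score pass can be sketched as follows, assuming hypothetical `Node` and `Pod` records that track only CPU and memory. The real scheduler runs many filter and score plugins; the single scoring rule here (prefer the node with the most capacity left over) is just one plausible policy. Note that `schedule` only records a node name on the Pod and starts nothing.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int   # unreserved CPU, in millicores
    free_mem_mi: int  # unreserved memory, in MiB


@dataclass
class Pod:
    name: str
    cpu_m: int        # CPU request, in millicores
    mem_mi: int       # memory request, in MiB
    node_name: Optional[str] = None  # unset until the scheduler records a choice


def schedule(pod: Pod, nodes: list[Node]) -> Optional[str]:
    # Filter: keep only nodes with enough unreserved capacity for the requests.
    feasible = [n for n in nodes
                if n.free_cpu_m >= pod.cpu_m and n.free_mem_mi >= pod.mem_mi]
    if not feasible:
        return None  # no fit: the Pod stays Pending

    # Score: prefer the node with the most capacity left over (one possible policy).
    def score(n: Node) -> float:
        return (n.free_cpu_m - pod.cpu_m) / 1000 + (n.free_mem_mi - pod.mem_mi) / 1024

    best = max(feasible, key=score)
    # Record the decision and reserve the capacity; nothing is started here.
    pod.node_name = best.name
    best.free_cpu_m -= pod.cpu_m
    best.free_mem_mi -= pod.mem_mi
    return best.name
```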

The controller-manager runs the controllers — the reconcile loops that are the heart of Kubernetes. The Deployment controller, the ReplicaSet controller, the node controller, and many more each watch their objects and act to close the gap between desired and actual. When you create a Deployment, it is a controller, not the API server, that brings the Pods into being.
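The reconcile pattern can be sketched as a function that compares desired state with observed state and acts on the difference. `reconcile_once` and the `web-N` naming are hypothetical; a real ReplicaSet controller watches the API, uses owner references, and chooses which Pods to delete more carefully.

```python
from __future__ import annotations
import itertools

_pod_ids = itertools.count()  # hypothetical source of unique Pod name suffixes


def reconcile_once(desired: int, observed: list[str]) -> list[str]:
    """One pass of a simplified ReplicaSet-style reconcile loop.

    Compares desired state (a replica count) with observed state (the Pods that
    exist) and returns the Pod list after acting on the difference. It does not
    care how the gap arose: a crash, a deletion, or a changed replica count all
    look the same.
    """
    pods = list(observed)
    while len(pods) < desired:  # too few: create replacements
        pods.append(f"web-{next(_pod_ids)}")
    while len(pods) > desired:  # too many: remove the surplus
        pods.pop()
    return pods
```

Calling it again on a converged state changes nothing, which is what makes it safe for the controller-manager to run such loops continuously.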

Component	Responsibility
API server	The single front door; auth, admission, and the only component that reads and writes etcd
etcd	Consistent store of all cluster state
Scheduler	Assigns unscheduled Pods to nodes
Controller-manager	Runs the reconcile loops that drive desired state

The Worker Node

Each node runs three things. The kubelet is the agent that takes the Pods assigned to its node, tells the container runtime to start them, runs their health probes, and reports status back to the API server. The container runtime (such as containerd or CRI-O) actually runs the containers. kube-proxy programs the node's networking so traffic to a Service reaches the right Pods. Workloads live here; the control plane only decides.
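A simplified kubelet sync pass can be sketched like this, assuming a hypothetical `FakeRuntime` in place of a real container runtime and a plain dict of Pod-to-node assignments in place of the API. It shows the division of labor: the kubelet acts only on Pods assigned to its own node and reports their status; it makes no scheduling decisions.

```python
from __future__ import annotations


class FakeRuntime:
    """Hypothetical stand-in for a container runtime such as containerd."""

    def __init__(self) -> None:
        self.running: set[str] = set()

    def start(self, pod: str) -> None:
        self.running.add(pod)

    def stop(self, pod: str) -> None:
        self.running.discard(pod)


def kubelet_sync(node: str, assignments: dict[str, str], runtime: FakeRuntime) -> dict[str, str]:
    """One simplified kubelet sync pass for a single node.

    `assignments` maps Pod name -> node name, as recorded by the scheduler.
    Returns the status this node would report back to the API server.
    """
    mine = {pod for pod, assigned in assignments.items() if assigned == node}
    for pod in mine - runtime.running:   # assigned here but not running: start it
        runtime.start(pod)
    for pod in runtime.running - mine:   # running here but no longer assigned: stop it
        runtime.stop(pod)
    return {pod: "Running" for pod in sorted(mine) if pod in runtime.running}
```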

The practical consequence: you should almost never SSH to a node to fix something. If a Pod is unhealthy, you change desired state through the API and let the kubelet reconcile. Nodes are cattle — interchangeable and replaceable — not pets you nurse back to health.

Anatomy of a Request

kubectl (submit object) → API server (auth, admission, store) → etcd (persist desired state) → controller (ReplicaSet → Pods) → scheduler (assign a node) → kubelet (start containers)

Tie it together with one kubectl apply -f deployment.yaml. kubectl sends the object to the API server, which authenticates it, validates it, runs admission, and stores it in etcd. The Deployment controller sees a new Deployment and creates a ReplicaSet; the ReplicaSet controller creates Pods. The scheduler sees Pods with no node and assigns each one. The kubelet on each chosen node sees its new Pod and tells the runtime to start the containers. Status flows back up the same path. No component did more than its one job, and the loop keeps running afterward.
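The same path can be simulated end to end in a few dozen lines. Everything is hypothetical and heavily simplified: a dict stands in for the API server and etcd, each function stands in for one component, and every component reads and writes only that shared store. Scaling updates, real scheduling, and container startup are omitted.

```python
from __future__ import annotations
import itertools

# A dict stands in for the API server + etcd; every component goes through it.
store: dict[str, dict] = {}
_pod_seq = itertools.count()  # hypothetical unique suffixes for Pod names


def kubectl_apply(name: str, replicas: int) -> None:
    store[f"deployment/{name}"] = {"replicas": replicas}


def deployment_controller() -> None:
    # Creates a ReplicaSet for each Deployment (creation only; updates omitted).
    for key, dep in list(store.items()):
        if key.startswith("deployment/"):
            name = key.split("/", 1)[1]
            store.setdefault(f"replicaset/{name}", {"replicas": dep["replicas"]})


def replicaset_controller() -> None:
    # Creates Pods, tagged with their owner, until each ReplicaSet has its desired count.
    for key, rs in list(store.items()):
        if key.startswith("replicaset/"):
            name = key.split("/", 1)[1]
            owned = [p for k, p in store.items() if k.startswith("pod/") and p["owner"] == name]
            for _ in range(rs["replicas"] - len(owned)):
                store[f"pod/{name}-{next(_pod_seq)}"] = {"owner": name, "node": None, "phase": "Pending"}


def scheduler(nodes: list[str]) -> None:
    # Assigns every unscheduled Pod; naive round-robin instead of filter-and-score.
    unscheduled = [k for k, p in store.items() if k.startswith("pod/") and p["node"] is None]
    for i, key in enumerate(unscheduled):
        store[key]["node"] = nodes[i % len(nodes)]


def kubelet(node: str) -> None:
    # Starts Pods bound to this node and writes their status back to the store.
    for key, pod in store.items():
        if key.startswith("pod/") and pod["node"] == node and pod["phase"] == "Pending":
            pod["phase"] = "Running"  # stand-in for asking the runtime to start containers


def run_control_loops(nodes: list[str]) -> None:
    deployment_controller()
    replicaset_controller()
    scheduler(nodes)
    for node in nodes:
        kubelet(node)
```

`kubectl_apply("web", 3)` followed by `run_control_loops(["node-a", "node-b"])` leaves three Running Pods spread across both nodes, and running the loops a second time changes nothing, because each loop acts only on the gap between desired and actual state.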

Control plane vs data plane

Control plane decides and remembers: API server, etcd, scheduler, controllers. Losing it stops changes, scheduling, and controller-driven self-healing, though Pods already running on healthy nodes keep running.

Data plane (nodes) executes: kubelet, runtime, kube-proxy. This is where your workloads actually run and serve traffic.

Common Mistakes

SSHing to a node to restart or patch a Pod instead of changing desired state through the API and letting the kubelet reconcile.
Running a single control-plane node in production, so etcd and the API server are a single point of failure.
Running an even number of etcd members, which raises the quorum size without adding fault tolerance: four members tolerate one failure, exactly like three.
Scheduling application workloads onto control-plane nodes and starving etcd or the API server under load.
Ignoring etcd disk latency until the whole control plane turns slow and flaky.

Best Practices

Run a highly available control plane — at least three etcd members on fast, dedicated disks — for any cluster that matters.
Back up etcd on a schedule and rehearse the restore; it is the cluster's only durable memory.
Treat nodes as replaceable: drain and replace rather than repair in place.
Keep workloads off control-plane nodes (taint them) so application load cannot degrade the control plane.
Learn the request path; most control-plane debugging is figuring out which component in that chain is stuck.

Related
EKS / GKE / AKS: managed control planes, where the provider runs the API server and etcd for you
kubeadm: the tool that bootstraps a self-managed control plane (Topic 49)

Knowledge Check

Which component actually assigns a Pod to a node?

The scheduler — it watches for unassigned Pods and records a node choice on each
The kubelet on the target node, choosing itself when it has spare capacity
The API server, picking a node during its admission and validation phase
The Deployment controller, which binds each replica to a node as it creates it

Where is all cluster state stored?

etcd, a consistent key-value store that only the API server reads and writes
On each node's local disk, aggregated on demand by the API server when queried
In the scheduler's memory, persisted to disk at each checkpoint interval
In the kubelet's config files, synced across nodes by the control plane

Why is SSHing into a node to restart a Pod an anti-pattern?

Desired state lives in the API; a manual change drifts from it and the kubelet may undo or conflict with it
Nodes refuse all SSH access by default, so the connection itself cannot even be established in the first place
It permanently deletes the Pod's logs and event history from the control plane
kube-proxy intercepts and blocks all node-level shell access for security reasons
