Probes
Kubernetes decides whether a container is healthy and ready by running probes against it. There are three — readiness, liveness, and startup — and each gates something different. Configured well, they make deployments safe and self-healing; configured badly, they turn a healthy app into a restart loop.
Probes are deceptively simple and a frequent source of self-inflicted outages. The key is understanding precisely what each one controls, because the failure modes differ sharply.
Readiness: Gating Traffic
A readiness probe answers "can this Pod serve requests right now?" While it fails, the Pod is removed from its Service's endpoints, so no traffic is sent to it — but the Pod is not restarted. This is what makes rolling updates safe: a new Pod receives traffic only once it reports ready. Readiness is also the right tool for a Pod that is temporarily busy or waiting on a dependency — it sheds traffic without being killed.
spec:
  containers:
    - name: app
      image: my-app:1.0
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30  # allow up to ~5 min to start
        periodSeconds: 10
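To make the rolling-update point concrete, here is a minimal Deployment sketch. The name, labels, image tag, and replica count are hypothetical, and the strategy values are one reasonable choice rather than a recommendation. It assumes the app serves /ready on port 8080, as in the example above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1             # bring up one extra Pod at a time
      maxUnavailable: 0       # keep all desired replicas available during the rollout
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:1.1   # hypothetical new version
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
```

With maxUnavailable set to 0, the rollout removes an old Pod only after a new one has passed its readiness probe, so the probe is what actually paces the update.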
Liveness: Gating Restarts
A liveness probe answers "is this container still working, or wedged?" When it fails failureThreshold times in a row, the kubelet restarts the container. This recovers from deadlocks and stuck processes that never crash on their own. The danger: if the liveness probe checks something it shouldn't, such as a slow dependency or the same heavy logic as readiness, a transient blip restarts a perfectly healthy container, and under load every replica can restart at once. Liveness should test only that the process itself is alive.
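One way to keep the two checks apart is to give each probe its own endpoint. The sketch below assumes the /healthz handler does no dependency calls and simply returns 200 when the process can answer, while /ready is free to check dependencies. Those handler behaviors are assumptions about the app, not something Kubernetes enforces, and the timing values are illustrative.

```yaml
spec:
  containers:
    - name: app
      image: my-app:1.0
      livenessProbe:
        httpGet:
          path: /healthz        # assumed cheap and dependency-free
          port: 8080
        periodSeconds: 10
        timeoutSeconds: 2
        failureThreshold: 3     # restart only after 3 consecutive failures
      readinessProbe:
        httpGet:
          path: /ready          # assumed to check dependencies such as a database
          port: 8080
        periodSeconds: 5
        timeoutSeconds: 2
        failureThreshold: 2     # stop receiving traffic after 2 consecutive failures
```

If a dependency goes down under this layout, /ready fails and the Pod sheds traffic, but /healthz keeps passing and nothing is restarted.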
Startup: Protecting Slow Boots
A startup probe exists for apps that take a long time to initialize. Until it succeeds, the liveness and readiness probes are held back, so a slow boot is not mistaken for a wedged process and killed mid-startup. Once the startup probe succeeds, the other two take over; if it never succeeds within failureThreshold × periodSeconds, the kubelet kills the container and it is restarted according to the Pod's restart policy. Without a startup probe, you would have to loosen the liveness probe's timing for the container's entire lifetime just to accommodate startup. The startup probe lets you keep liveness tight for the running state.
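The timing trade-off can be worked through with concrete numbers. The sketch below assumes a hypothetical app that may need up to two minutes to boot; the image name, paths, and thresholds are illustrative, not recommendations.

```yaml
spec:
  containers:
    - name: app
      image: my-app:1.0          # hypothetical image
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 24     # startup budget: 24 x 5 s = 120 s
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 3      # running state: restart after roughly 3 x 5 s = 15 s of failures
      # Without the startupProbe, covering a 120 s boot with liveness alone would need
      # something like failureThreshold: 24 on the livenessProbe, so a wedged process
      # would also take about 120 s to detect.
```

The startup probe stops running once it has succeeded, so its generous budget applies only to boot and never to the running state.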
Probe Types and Tuning
Probes can be HTTP GET, TCP socket, exec (run a command inside the container), or gRPC. Each is tuned with the same fields: initialDelaySeconds, periodSeconds, timeoutSeconds, failureThreshold, and successThreshold. Exec probes are the most expensive, since each run starts a new process in the container, so prefer HTTP or gRPC where possible. The cardinal rule is to keep liveness and readiness distinct: readiness can depend on dependencies, liveness must not.
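For reference, the four handler types look like this side by side. This is a sketch only: the Pod name, container names, images, ports, and the /tmp/ready file are all hypothetical, and a real Pod would rarely mix all four. The gRPC handler assumes the app implements the standard gRPC health checking protocol and that the cluster is recent enough to support built-in gRPC probes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-types-demo          # hypothetical name
spec:
  containers:
    - name: http-app
      image: my-app:1.0
      readinessProbe:
        httpGet:                  # passes on a 2xx or 3xx response
          path: /ready
          port: 8080
        initialDelaySeconds: 5    # wait before the first probe
        periodSeconds: 5          # how often to probe
        timeoutSeconds: 2         # how long one probe may take
        failureThreshold: 3       # consecutive failures before "not ready"
        successThreshold: 1       # consecutive successes before "ready" again
    - name: tcp-app
      image: my-tcp-app:1.0
      readinessProbe:
        tcpSocket:                # passes if the port accepts a connection
          port: 5432
    - name: grpc-app
      image: my-grpc-app:1.0
      readinessProbe:
        grpc:                     # calls the gRPC health checking service
          port: 9090
    - name: exec-app
      image: my-worker:1.0
      readinessProbe:
        exec:                     # passes if the command exits with status 0
          command: ["cat", "/tmp/ready"]
        periodSeconds: 30         # longer interval to limit the cost of exec
```

Note that successThreshold can be raised above 1 only for readiness; liveness and startup probes require it to be 1.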
Summary
- Readiness: fails → removed from Service endpoints (no traffic), not restarted. Gates traffic.
- Liveness: fails → container restarted. Gates recovery from wedged processes.
- Startup: runs first; holds back the other two so a slow boot isn't killed. Gates startup.
Common Pitfalls
- Pointing the liveness probe at a dependency, so a dependency blip restarts healthy containers.
- Using the same heavy check for liveness and readiness, causing restart storms under load.
- No readiness probe, so rolling updates send traffic to Pods that haven't finished starting.
- No startup probe for a slow-booting app, so liveness kills it mid-initialization.
- Expensive exec probes on a tight interval, adding load and false failures.
Best Practices
- Keep liveness minimal: test only that the process itself is alive, never external dependencies.
- Use readiness to gate traffic and to shed load while a Pod is busy or waiting on a dependency.
- Add a startup probe for slow-booting apps so you can keep liveness tight for the running state.
- Prefer HTTP or gRPC probes over exec; tune thresholds to the app's real behavior.
- Always define readiness before relying on rolling updates to be zero-downtime.
Knowledge Check
What happens when a readiness probe fails?
- The Pod is removed from its Service's endpoints (no traffic) but not restarted
- The container is restarted in place right away by the kubelet on that same node
- The Pod is evicted and rescheduled onto another node that has free capacity
- Nothing happens — readiness is purely advisory
Why is pointing a liveness probe at an external dependency dangerous?
- A dependency blip makes liveness fail and restarts healthy containers, potentially all at once under load
- Liveness probes are simply forbidden from making any outbound network call to an external dependency at all
- It permanently removes the Pod from the Service's endpoint list
- It silently disables the container's readiness probe
What does a startup probe protect against?
- A slow-initializing app being killed by the liveness probe before it finishes booting
- Traffic reaching the Pod before its Service DNS name resolves
- The scheduler placing the Pod on a node that lacks capacity
- The Pod exceeding its configured memory limit and being OOM-killed during a traffic spike