Events and Debugging
When something is wrong in a cluster, there is a fast, systematic path from symptom to cause using a handful of built-in tools: events, describe, logs, and exec or ephemeral debug containers. Most Kubernetes problems announce their cause if you know where to look.
The difference between minutes and hours of debugging is method, not cleverness. This topic is the first-response toolkit and a repeatable order to apply it in.
Events and describe First
Events are the cluster's running commentary — scheduling decisions, image pulls, probe failures, evictions. kubectl describe on an object shows its current state, conditions, and recent events in one view, which is where most causes are visible: "FailedScheduling: insufficient cpu," "Failed to pull image," "Liveness probe failed." Always describe before guessing. Note that events are short-lived (about an hour by default), so capture them while the problem is fresh.
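As a minimal sketch of that first step, the helper below lists only the events that involve one Pod, oldest first. The function name `pod_events` and the Pod name `web-xyz` are illustrative assumptions; `--field-selector` and `--sort-by` are standard `kubectl get` flags.

```shell
# Hypothetical helper: show the events for a single Pod, oldest first.
# Usage: pod_events <pod-name> [namespace]
pod_events() {
  kubectl get events \
    --namespace "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}
```

Calling `pod_events web-xyz` gives the same event stream that `kubectl describe pod web-xyz` prints at the bottom of its output, without the rest of the object dump.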
Logs, Including Previous
kubectl logs shows a container's stdout and stderr; kubectl logs --previous shows the prior instance's output, which is what you need for a crash loop, because the current container may be too young to have logged the failure. Combined with the events from describe, logs usually pin down an application-level failure. For a Pod with multiple containers, specify the container with -c.
    kubectl get pod web-xyz -o wide       # status, node, restarts
    kubectl describe pod web-xyz          # events, conditions, reasons
    kubectl logs web-xyz --previous       # crashed container's last words
    # ephemeral container to poke around
    kubectl debug -it web-xyz \
      --image=busybox --target=web
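For a Pod with more than one container, the log commands need a container name. The sketch below lists the names and then reads the previous instance of one of them; the function names and the `--tail=50` cutoff are assumptions chosen for illustration.

```shell
# Hypothetical helper: print the container names defined in a Pod.
# Usage: pod_containers <pod-name>
pod_containers() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].name}'
}

# Hypothetical helper: last 50 log lines from the previous (crashed)
# instance of one container.
# Usage: crash_logs <pod-name> <container-name>
crash_logs() {
  kubectl logs "$1" --container "$2" --previous --tail=50
}
```

For example, `pod_containers web-xyz` followed by `crash_logs web-xyz web` narrows a crash loop down to one container's final output.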
Getting Inside a Container
kubectl exec runs a command in a running container, which is useful when the image ships a shell. Many hardened or distroless images have none, so kubectl debug attaches a temporary ephemeral container to the running Pod, using a debugging image of your choice. The ephemeral container shares the Pod's namespaces, and with --target it also joins the named container's process namespace, so you can inspect a minimal production container without baking debugging tools into it.
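The two approaches can be sketched side by side. The function names, the Pod and container names, and the default `busybox` image are assumptions; the flags are the ones `kubectl exec` and `kubectl debug` provide.

```shell
# Hypothetical helper: open a shell in a container whose image has one.
# Usage: exec_shell <pod-name> <container-name>
exec_shell() {
  kubectl exec -it "$1" --container "$2" -- sh
}

# Hypothetical helper: attach an ephemeral debug container that targets
# an existing container, for images that ship no shell.
# Usage: debug_shell <pod-name> <container-name> [debug-image]
debug_shell() {
  kubectl debug -it "$1" --image="${3:-busybox}" --target="$2" -- sh
}
```

If `exec_shell web-xyz web` fails because the image has no `sh`, `debug_shell web-xyz web` is the fallback. An ephemeral container cannot be removed once added; it stays in the Pod spec until the Pod is deleted.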
A Catalog of Common Failures
A few patterns cover most incidents, and each has a tell.
- CrashLoopBackOff: the container keeps exiting; check logs --previous.
- ImagePullBackOff: bad image name, tag, or registry credentials; check events.
- Pending: the scheduler cannot place the Pod; describe shows insufficient resources or an unsatisfiable constraint.
- OOMKilled in the container status: the container exceeded its memory limit, so either the limit is too low or the application uses more memory than expected.
- CreateContainerConfigError: a referenced ConfigMap or Secret is missing.
The method is constant: describe for events and reason, logs --previous for the application's own account, then act.
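The catalog can be written down as a lookup, shown here as a rough sketch rather than a complete triage tool. Both function names are hypothetical, `<pod>` is a placeholder left for the reader, and the hints simply restate the signatures listed above.

```shell
# Hypothetical helper: map a status or reason to the first thing to check.
# Usage: next_step <reason>
next_step() {
  case "$1" in
    CrashLoopBackOff)
      echo "kubectl logs <pod> --previous" ;;
    ImagePullBackOff)
      echo "kubectl describe pod <pod>  # image name, tag, registry credentials" ;;
    Pending)
      echo "kubectl describe pod <pod>  # resources, scheduling constraints" ;;
    OOMKilled)
      echo "kubectl describe pod <pod>  # memory limit vs. actual usage" ;;
    CreateContainerConfigError)
      echo "kubectl describe pod <pod>  # missing ConfigMap or Secret" ;;
    *)
      echo "kubectl describe pod <pod>  # start with events and conditions" ;;
  esac
}

# Hypothetical helper: print each container's name, current waiting reason,
# and last termination reason, tab-separated, one container per line.
# Usage: pod_reasons <pod-name>
pod_reasons() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

Running `pod_reasons web-xyz` and passing a reported reason to `next_step` turns the status into a concrete first command.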
describe / events — why Kubernetes did (or couldn't do) something — scheduling, pulls, probes. Start here.
logs (--previous) — what the application itself said, including the crashed instance.
exec / debug — get inside to inspect live; ephemeral debug containers work even on shell-less images.
- Guessing at causes instead of reading the events that describe already shows.
- Forgetting --previous on a CrashLoopBackOff, so you read a container too young to have logged the failure.
- Trying to exec into a distroless image with no shell instead of using an ephemeral debug container.
- Missing short-lived events because you didn't capture them while the problem was fresh.
- Ignoring the container status reason (OOMKilled, ConfigError) that names the cause directly.

- Debug describe-first: read the object's events and conditions before changing anything.
- Use logs --previous for crash loops to see the failing instance's output.
- Use ephemeral debug containers (kubectl debug) for hardened or distroless images.
- Learn the common failure signatures so the status itself points you to the cause.
- Capture events promptly; ship them to the logging backend if you need them beyond their short TTL.
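One simple way to capture events before they expire is to dump them to a timestamped file during the incident. This is a sketch under stated assumptions: the function name `capture_events` and the file naming scheme are hypothetical, and shipping the file to a logging backend is left out.

```shell
# Hypothetical helper: save a namespace's current events as JSON in the
# working directory and print the file name.
# Usage: capture_events [namespace]
capture_events() {
  ns="${1:-default}"
  out="events-$ns-$(date -u +%Y%m%dT%H%M%SZ).json"
  kubectl get events --namespace "$ns" -o json > "$out"
  echo "$out"
}
```

For example, `capture_events prod` writes a file such as `events-prod-<timestamp>.json` that can be attached to an incident record after the events themselves have aged out of the cluster.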
Knowledge Check
What should you check first when a Pod is misbehaving?
- kubectl describe — its events, conditions, and reasons usually name the cause
- The raw etcd contents holding the object's spec and status
- The node's BIOS and firmware boot logs read over its IPMI out-of-band console
- The Prometheus dashboard's aggregate CPU and memory panels for the namespace
Why use kubectl logs --previous on a CrashLoopBackOff?
- The current container may be too young to have logged the failure; --previous shows the crashed instance's output
- It aggregates and shows the combined logs from every Pod and sidecar container across the entire namespace all at once
- It restarts the container so the next attempt logs cleanly
- It is the only supported way to capture a container's stdout
How do you inspect a running distroless container with no shell?
- Attach an ephemeral debug container with kubectl debug, sharing the Pod's namespaces
- kubectl exec into it and launch an interactive /bin/sh session
- Rebuild the image with a shell and busybox tools baked in, then redeploy it to debug live
- Read its mounted filesystem layers directly out of etcd