Events and Debugging
Topic 48

When something is wrong in a cluster, there is a fast, systematic path from symptom to cause using a handful of built-in tools: events, describe, logs, and exec or ephemeral debug containers. Most Kubernetes problems announce their cause if you know where to look.

The difference between minutes and hours of debugging is method, not cleverness. This topic is the first-response toolkit and a repeatable order to apply it in.

Events and describe First

Events are the cluster's running commentary: scheduling decisions, image pulls, probe failures, evictions. kubectl describe on an object shows its current state, conditions, and recent events in one view, which is where most causes are visible: "FailedScheduling: insufficient cpu," "Failed to pull image," "Liveness probe failed." Always describe before guessing. Note that events are short-lived (the API server keeps them for one hour by default), so capture them while the problem is fresh.
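Because events expire, it can help to save the ones for a single Pod as soon as you notice a problem. The sketch below assumes a POSIX-style shell with kubectl on the PATH. The function name capture_events and the file-naming scheme are hypothetical. It uses the standard kubectl get events field selector on involvedObject and sorts by creation time.

```shell
# capture_events (hypothetical helper): save one Pod's events to a
# timestamped file before the API server's event TTL removes them.
# Usage: capture_events <namespace> <pod>
capture_events() {
  local ns="$1" pod="$2"
  local out="events-${pod}-$(date +%Y%m%d-%H%M%S).txt"
  # Keep only events whose involvedObject is this Pod, oldest first.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=${pod}" \
    --sort-by=.metadata.creationTimestamp > "$out"
  # Print the file name so the caller knows where the snapshot went.
  echo "$out"
}
```

For example, capture_events default web-xyz writes a file you can attach to an incident ticket even after the live events are gone.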

Logs, Including Previous

kubectl logs shows a container's stdout and stderr. kubectl logs --previous shows the prior instance's logs, which is what you need for a crash loop: the current container may be too young to have logged the failure. Combined with the events from describe, logs usually pin down an application-level failure. For a Pod with multiple containers, name the container with -c.
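In a multi-container Pod, the container flag and --previous are usually combined. The minimal wrapper below is a sketch. The name crash_logs and the 100-line tail are arbitrary choices, not a convention. If you don't know which container failed, kubectl logs also accepts --all-containers=true.

```shell
# crash_logs (hypothetical helper): print what a crashed container wrote
# before it exited.
# Usage: crash_logs <pod> <container>
crash_logs() {
  local pod="$1" container="$2"
  # -c selects the container in a multi-container Pod.
  # --previous reads the last terminated instance, not the restarted one.
  # --timestamps and --tail keep the output short and easy to line up
  # against the event times shown by kubectl describe.
  kubectl logs "$pod" -c "$container" --previous --timestamps --tail=100
}
```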

The first-response sequence
kubectl get pod web-xyz -o wide        # status, node, restarts
kubectl describe pod web-xyz           # events, conditions, reasons
kubectl logs web-xyz --previous        # crashed container's last words
# ephemeral container to poke around (targets the container named "web")
kubectl debug -it web-xyz --image=busybox --target=web

Getting Inside a Container

kubectl exec runs a command in a running container, which is useful when the image has a shell. Many hardened or distroless images have no shell. For those, kubectl debug adds a temporary ephemeral container to the running Pod, using a debugging image of your choice. That container shares the Pod's network namespace and, with --target, the process namespace of the named container. This lets you inspect a minimal production container without baking debugging tools into it.
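The choice between exec and debug can be scripted. This is a sketch under stated assumptions. The helper name shell_into is hypothetical, busybox is just one possible debug image, and a failed probe for any reason (not only a missing shell) falls back to an ephemeral container.

```shell
# shell_into (hypothetical helper): open a shell in a container. Use
# kubectl exec when the image has /bin/sh; otherwise attach an ephemeral
# busybox container that targets the container, so its processes are visible.
# Usage: shell_into <pod> <container>
shell_into() {
  local pod="$1" container="$2"
  # Probe for a shell without starting an interactive session.
  if kubectl exec "$pod" -c "$container" -- /bin/sh -c 'exit 0' >/dev/null 2>&1; then
    kubectl exec -it "$pod" -c "$container" -- /bin/sh
  else
    kubectl debug -it "$pod" --image=busybox --target="$container" -- sh
  fi
}
```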

A Catalog of Common Failures

A few patterns cover most incidents, and each has a tell:
  • CrashLoopBackOff: the container keeps exiting; check logs --previous.
  • ImagePullBackOff: bad image name, tag, or registry credentials; check events.
  • Pending: the scheduler can't place the Pod; describe shows insufficient resources or an unsatisfiable constraint.
  • OOMKilled in the container status: the memory limit is too low.
  • CreateContainerConfigError: a referenced ConfigMap or Secret is missing.
The method is constant: describe for events and reason, logs --previous for the application's own account, then act.
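The catalog above can be written as a small lookup, paired with a jsonpath query that pulls the relevant reasons out of a live Pod. Both function names are hypothetical and the advice strings simply restate the catalog. ErrImagePull, the reason that usually precedes ImagePullBackOff, is added as an assumption. The jsonpath fields (status.phase, containerStatuses[].state.waiting.reason, lastState.terminated.reason) are standard Pod status fields. Unset fields print as empty.

```shell
# next_step (hypothetical helper): map a status reason to the first check.
next_step() {
  case "$1" in
    CrashLoopBackOff)
      echo "logs --previous: read the crashed instance's output" ;;
    ImagePullBackOff|ErrImagePull)
      echo "describe: check image name, tag, and registry credentials in events" ;;
    Pending)
      echo "describe: look for FailedScheduling (resources or constraints)" ;;
    OOMKilled)
      echo "raise the memory limit or reduce the app's memory use" ;;
    CreateContainerConfigError)
      echo "describe: find the missing ConfigMap or Secret" ;;
    *)
      echo "describe: read events and conditions" ;;
  esac
}

# pod_reasons (hypothetical helper): print the Pod phase, then one line per
# container with its name, waiting reason, and last termination reason.
pod_reasons() {
  kubectl get pod "$1" -o jsonpath='{.status.phase}{"\n"}{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

For example, a Pod whose container shows OOMKilled as its last termination reason leads straight to next_step OOMKilled, with no guessing in between.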

describe vs logs vs exec/debug

describe / events — why Kubernetes did (or couldn't do) something, such as scheduling, pulls, and probes. Start here.

logs (--previous) — what the application itself said, including the crashed instance.

exec / debug — get inside to inspect live; ephemeral debug containers work even on shell-less images.

Common Mistakes
  • Guessing at causes instead of reading the events that describe already shows.
  • Forgetting --previous on a CrashLoopBackOff, so you read a container too young to have logged the failure.
  • Trying to exec into a distroless image with no shell instead of using an ephemeral debug container.
  • Missing short-lived events because you didn't capture them while the problem was fresh.
  • Ignoring the container status reason (OOMKilled, CreateContainerConfigError) that names the cause directly.
Best Practices
  • Debug describe-first: read the object's events and conditions before changing anything.
  • Use logs --previous for crash loops to see the failing instance's output.
  • Use ephemeral debug containers (kubectl debug) for hardened or distroless images.
  • Learn the common failure signatures so the status itself points you to the cause.
  • Capture events promptly; ship them to the logging backend if you need them beyond their short TTL.
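Until events are forwarded to a real logging backend, a crude stopgap is to stream them into a file. This sketch is not a substitute for proper event shipping. The helper name archive_events and the default file name are hypothetical. It relies on kubectl get events with --watch and JSON output, and it runs until interrupted.

```shell
# archive_events (hypothetical helper): append every cluster event, as JSON,
# to a local file so it outlives the event TTL. Runs until interrupted.
# Usage: archive_events [file]
archive_events() {
  local out="${1:-events.json}"
  kubectl get events --all-namespaces --watch -o json >> "$out"
}
```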
Related
  • Probes — probe failures show up in events (Topic 29)
  • Logging / metrics / tracing — the deeper layers behind first-response (Topics 45-47)
  • Cloud troubleshooting consoles — managed Pod/node insight as analogs

Knowledge Check

What should you check first when a Pod is misbehaving?

  • kubectl describe — its events, conditions, and reasons usually name the cause
  • The raw etcd contents holding the object's spec and status
  • The node's BIOS and firmware boot logs read over its IPMI out-of-band console
  • The Prometheus dashboard's aggregate CPU and memory panels for the namespace

Why use kubectl logs --previous on a CrashLoopBackOff?

  • The current container may be too young to have logged the failure; --previous shows the crashed instance's output
  • It aggregates and shows the combined logs from every Pod and sidecar container across the entire namespace all at once
  • It restarts the container so the next attempt logs cleanly
  • It is the only supported way to capture a container's stdout

How do you inspect a running distroless container with no shell?

  • Attach an ephemeral debug container with kubectl debug, sharing the Pod's namespaces
  • kubectl exec into it and launch an interactive /bin/sh session
  • Rebuild the image with a shell and busybox tools baked in, then redeploy it to debug live
  • Read its mounted filesystem layers directly out of etcd
