Chapter Nine
Observability
Knowing what your cluster is doing — logging, metrics and monitoring with Prometheus, distributed tracing with OpenTelemetry, and a practical approach to events and debugging.
A cluster you cannot see into is a cluster you cannot operate. Observability is the difference between knowing why a deployment failed and guessing.
This chapter covers the three pillars of observability as they appear in Kubernetes: logs, metrics, and traces. It uses Prometheus and Grafana for metrics, OpenTelemetry for tracing, and the events-and-kubectl workflow for debugging the failures you are likely to meet first.
Topics in This Chapter
Topic 45
Logging
Container stdout, node-level collection, and shipping to a backend. Why Kubernetes ships no built-in log storage, so logs on a node do not outlive the Pod or the node, and what fills the gap.
Topic 46
Metrics and Monitoring
The Prometheus model — scraping, PromQL, alerting — and Grafana on top. The metrics that actually predict trouble.
Topic 47
Tracing and OpenTelemetry
Following a request across services, and the OpenTelemetry standard that unifies instrumentation. Where tracing earns its overhead.
Topic 48
Events and Debugging
The first-response toolkit — events, describe, logs, exec — and a systematic path from a failing Pod to the root cause.