Chapter Nine

Observability

Knowing what your cluster is doing — logging, metrics and monitoring with Prometheus, distributed tracing with OpenTelemetry, and a practical approach to events and debugging.

4 topics

A cluster you cannot see into is a cluster you cannot operate. Observability is the difference between knowing why a deployment failed and guessing.

This chapter covers the three pillars as they appear in Kubernetes — logs, metrics, and traces — with Prometheus and Grafana for metrics, OpenTelemetry for tracing, and the events-and-kubectl workflow you use to debug the failures that show up first.

Topics in This Chapter

Topic 45

Logging

Container stdout and stderr, node-level collection, and shipping to a backend. Why Kubernetes provides no long-term log storage of its own, and what fills the gap.

Logs, Collection

Topic 46

Metrics and Monitoring

The Prometheus model — scraping, PromQL, alerting — and Grafana on top. The metrics that actually predict trouble.

Metrics, Alerting

Topic 47

Tracing and OpenTelemetry

Following a request across services, and the OpenTelemetry standard that unifies instrumentation. Where tracing earns its overhead.

Tracing, Telemetry

Topic 48

Events and Debugging

The first-response toolkit — events, describe, logs, exec — and a systematic path from a failing Pod to the root cause.

Debugging, Operations