Topic 24

Service Mesh


A service mesh adds a layer of network capability above what Kubernetes provides: mutual TLS between services, fine-grained traffic control, retries and timeouts, and deep observability — without changing application code. It does this by routing all service-to-service traffic through proxies it manages.

A mesh is powerful but not free. It adds latency, resource cost, and operational complexity, and many clusters do not need one. The honest question is always whether the problems it solves are problems you actually have.

What Problems a Mesh Solves

A mesh centralizes cross-cutting network concerns that would otherwise be reimplemented in every service: encryption in transit (mTLS with automatic certificate rotation), traffic management (canary splits, retries, timeouts, circuit breaking), and observability (per-request metrics, traces, and a service dependency map). Because it works at the infrastructure layer, every service gets these uniformly, regardless of language.
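To make two of these concerns concrete, here is a minimal Python sketch of a weighted canary split and a bounded retry, the kind of logic a mesh proxy applies on the application's behalf. It is an illustration only, not how any particular mesh implements them; the names `pick_backend`, `call_with_retries`, the version labels, and the 90/10 weights are all hypothetical.

```python
import random
from collections import Counter


def pick_backend(weights, rng):
    """Pick a backend version with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def call_with_retries(call, attempts=3):
    """Run `call`, retrying on ConnectionError up to `attempts` tries in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of tries: surface the last failure


# Hypothetical split: about 90% of calls to stable, 10% to the canary.
split = {"v1-stable": 90, "v2-canary": 10}
rng = random.Random(0)  # seeded so repeated runs give the same sequence
counts = Counter(pick_backend(split, rng) for _ in range(10_000))
canary_share = counts["v2-canary"] / 10_000

# A flaky upstream (hypothetical) that fails twice, then answers.
failures = iter([True, True, False])


def flaky_call():
    if next(failures):
        raise ConnectionError("upstream reset")
    return "200 OK"


result = call_with_retries(flaky_call, attempts=3)
```

Without a mesh, each service would carry its own version of this logic in its own language; with a mesh, the same behavior is declared once and applied by the proxies.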

Data Plane and Control Plane

Control plane: configures the proxies and issues the certificates that make mTLS work.

Data plane: proxies intercept every connection, carry the traffic, and gather per-request telemetry.

A mesh splits into a data plane — the proxies that actually carry traffic — and a control plane that configures them and issues certificates. The proxies intercept every connection in and out of a workload, which is how the mesh applies policy and gathers telemetry without the application participating. The control plane is where you declare intent (this route, this policy); the data plane enforces it.
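The division of labor can be sketched as a toy model in Python. This is a simplification under stated assumptions, not any real mesh's API: the classes `Policy`, `Proxy`, and `ControlPlane`, their fields, and the rule that traffic passes when no policy is declared are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Declared intent for one destination service (hypothetical fields)."""
    timeout_seconds: float
    require_mtls: bool


@dataclass
class Proxy:
    """Data plane: enforces whatever configuration it last received."""
    workload: str
    policies: dict = field(default_factory=dict)

    def allow(self, destination, peer_has_cert):
        policy = self.policies.get(destination)
        if policy is None:
            return True  # assumption of this model: no policy means pass through
        return peer_has_cert or not policy.require_mtls


@dataclass
class ControlPlane:
    """Control plane: stores intent and pushes it to every registered proxy."""
    proxies: list = field(default_factory=list)
    policies: dict = field(default_factory=dict)

    def register(self, proxy):
        self.proxies.append(proxy)
        proxy.policies = dict(self.policies)  # new proxy gets current config

    def declare(self, destination, policy):
        self.policies[destination] = policy
        for proxy in self.proxies:
            proxy.policies = dict(self.policies)  # push the update to all proxies


control_plane = ControlPlane()
checkout_proxy = Proxy("checkout")
control_plane.register(checkout_proxy)

allowed_before = checkout_proxy.allow("payments", peer_has_cert=False)
control_plane.declare("payments", Policy(timeout_seconds=2.0, require_mtls=True))
allowed_after = checkout_proxy.allow("payments", peer_has_cert=False)
allowed_with_cert = checkout_proxy.allow("payments", peer_has_cert=True)
```

Note that the application never appears in this model: intent is declared on the control plane, and the proxy's behavior changes without the workload being redeployed or modified.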

Sidecar vs Ambient

The traditional model injects a proxy sidecar into every Pod (Istio's classic mode, Linkerd). It is proven and granular, but a proxy per Pod multiplies resource use and puts a proxy on both the client and the server side of every call. Newer ambient or sidecar-less architectures move the proxy to a per-node component for L4 and add L7 proxies only where needed, cutting overhead. The choice is a trade-off between granularity and cost.

Model	Proxy placement	Trade-off
Sidecar	One proxy per Pod	Granular, proven; higher resource and latency cost
Ambient / sidecar-less	Per-node L4, L7 where needed	Lower overhead; newer, fewer features in places
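The scaling difference in the table can be shown with back-of-the-envelope arithmetic. The Python sketch below uses made-up figures (500 Pods, 20 nodes, 100 millicores and 64 MiB per sidecar, and so on) purely to show how the totals scale; they are placeholders, not measurements of any mesh, so substitute numbers observed in your own cluster.

```python
def sidecar_overhead(pods, cpu_m_per_proxy, mem_mib_per_proxy):
    """One proxy per Pod: overhead grows with the number of Pods."""
    return pods * cpu_m_per_proxy, pods * mem_mib_per_proxy


def ambient_overhead(nodes, cpu_m_per_node_proxy, mem_mib_per_node_proxy,
                     l7_proxies=0, cpu_m_per_l7=0, mem_mib_per_l7=0):
    """Per-node L4 proxy plus L7 proxies only where needed."""
    cpu = nodes * cpu_m_per_node_proxy + l7_proxies * cpu_m_per_l7
    mem = nodes * mem_mib_per_node_proxy + l7_proxies * mem_mib_per_l7
    return cpu, mem


# Hypothetical fleet: all figures below are illustrative placeholders.
sidecar_cpu_m, sidecar_mem_mib = sidecar_overhead(
    pods=500, cpu_m_per_proxy=100, mem_mib_per_proxy=64)

ambient_cpu_m, ambient_mem_mib = ambient_overhead(
    nodes=20, cpu_m_per_node_proxy=200, mem_mib_per_node_proxy=128,
    l7_proxies=5, cpu_m_per_l7=100, mem_mib_per_l7=64)

print(f"sidecar: {sidecar_cpu_m} mCPU, {sidecar_mem_mib} MiB")
print(f"ambient: {ambient_cpu_m} mCPU, {ambient_mem_mib} MiB")
```

The point is the shape of the formulas rather than the specific totals: sidecar cost tracks the Pod count, while the per-node model tracks the node count plus however many L7 proxies the workloads actually need.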

The Cost, and When to Adopt

A mesh adds a network hop and CPU/memory for the proxies, makes debugging harder (there is now a proxy in the path of every call), and is a substantial operational commitment. Adopt one when you have concrete needs it answers: mandatory mTLS across many services, sophisticated traffic shifting, or uniform L7 observability across a large fleet. For a handful of services, NetworkPolicy plus application-level TLS and tracing is usually enough. A mesh is a tool for scale and policy uniformity, not a default.
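The latency side of that cost compounds along a call chain, which a short calculation makes visible. This Python sketch assumes a sequential chain of service-to-service calls and, for the sidecar case, two proxy traversals per call (one on the caller's side, one on the callee's). The 500 microsecond per-traversal figure and the chain length are hypothetical placeholders, not benchmarks.

```python
def added_latency_us(calls_in_chain, proxies_per_call, per_proxy_us):
    """Extra latency from proxy traversals along a sequential call chain."""
    return calls_in_chain * proxies_per_call * per_proxy_us


# Hypothetical request that fans through 4 sequential service-to-service calls.
chain_length = 4
sidecar_extra_us = added_latency_us(chain_length, proxies_per_call=2, per_proxy_us=500)
no_mesh_extra_us = added_latency_us(chain_length, proxies_per_call=0, per_proxy_us=500)

print(f"added by sidecars: {sidecar_extra_us} us over {chain_length} calls")
```

A per-proxy cost that looks negligible on one call is multiplied by the depth of the chain, so it is worth measuring on your deepest request path before rolling out.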

Sidecar vs ambient mesh

Sidecar — a proxy in every Pod — granular and battle-tested, at higher per-Pod resource and latency cost.

Ambient / sidecar-less — per-node L4 with L7 only where needed — lower overhead, newer, the emerging direction.

Common Mistakes

Adopting a mesh before having concrete problems it solves, paying the complexity for little gain.
Underestimating sidecar resource cost — a proxy per Pod adds up across a large fleet.
Assuming a mesh replaces NetworkPolicy; they operate at different layers and complement each other.
Ignoring the added debugging difficulty of a proxy sitting between every service call.
Treating mTLS as automatic everywhere without verifying the mesh actually enforces it.

Best Practices

Adopt a mesh for concrete needs — fleet-wide mTLS, advanced traffic shifting, uniform L7 observability.
Prefer ambient/sidecar-less modes to cut overhead when your features fit.
Keep NetworkPolicy for L3/L4 segmentation; let the mesh handle L7 — they layer.
Budget for the proxy resource cost and the operational learning curve before rolling out.
For a few services, start with NetworkPolicy plus app-level TLS and tracing instead of a mesh.

Related

Network Policies: L3/L4 segmentation the mesh complements (Topic 23)
Sidecars: the injection mechanism for the data plane (Topic 15)
API gateway: north-south entry vs the mesh's east-west traffic

Knowledge Check

What does a service mesh add beyond core Kubernetes networking?

mTLS, traffic management (retries, canary), and L7 observability — without app code changes
Pod scheduling and horizontal autoscaling driven by CPU and memory pressure across the fleet
Persistent block storage and volume provisioning for stateful services
Container image building and pushing to the registry

What is the trade-off of the sidecar mesh model versus ambient?

Sidecars are granular and proven but add a proxy (and cost/latency) to every Pod; ambient lowers overhead
Sidecars are cheaper to run per Pod but offer weaker mTLS guarantees than the ambient per-node proxy model
Ambient still requires an injected proxy in every container of the Pod
There is no measurable resource difference between the two models

When is adopting a service mesh most justified?

When you need fleet-wide mTLS, advanced traffic shifting, or uniform L7 observability across many services
For any cluster running more than one Pod, since every cross-Pod call always benefits from mesh-managed routing
Whenever you already enforce any NetworkPolicy rules
Only on single-node clusters with no cross-node traffic
