CNI and Network Plugins
Topic 22

The Container Network Interface (CNI) is the standard Kubernetes uses to set up Pod networking. A CNI plugin is what actually implements the flat network model: it assigns each Pod an IP and arranges routes between nodes. Calico, Cilium, Flannel, and the cloud providers' VPC-native plugins are all CNI implementations.

The plugin is a foundational choice: it determines how traffic flows, whether NetworkPolicy is enforced, how much overhead the network adds, and how the cluster scales. It is invisible when it works and the first suspect when Pods can't talk.

What CNI Does

When the kubelet creates a Pod, it asks the container runtime (containerd or CRI-O, for example) to set up the Pod sandbox, and the runtime calls the CNI plugin to configure that Pod's network namespace: allocate an IP, create the virtual interface, and program routes so the Pod can reach others. The CNI spec is small and runtime-agnostic, which is why the same plugins work across container runtimes. Without a plugin installed, the runtime has nothing to call, the node reports NotReady, and Pods never get networking. That is the classic stuck-on-startup symptom.
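
To make that call more concrete, the sketch below builds what is handed to a plugin for an ADD operation under the CNI spec: a JSON network configuration on stdin plus a set of CNI_* environment variables. The network name, the choice of the reference "bridge" and "host-local" plugins, the subnet, the container ID, and the netns path are all made-up placeholders, and nothing is actually executed.

```python
import json


def build_cni_add_request(container_id, netns_path, ifname="eth0"):
    """Return (env, stdin_json) for a hypothetical CNI ADD call."""
    # Network configuration the plugin reads from stdin.
    # All values here are illustrative, not taken from a real cluster.
    net_conf = {
        "cniVersion": "1.0.0",
        "name": "example-net",
        "type": "bridge",  # name of the plugin binary to run
        "ipam": {"type": "host-local", "subnet": "10.244.1.0/24"},
    }
    # Parameters the plugin reads from its environment.
    env = {
        "CNI_COMMAND": "ADD",            # ADD, DEL, CHECK, ...
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns_path,         # the Pod's network namespace
        "CNI_IFNAME": ifname,            # interface to create inside it
        "CNI_PATH": "/opt/cni/bin",      # where plugin binaries are looked up
    }
    return env, json.dumps(net_conf, indent=2)


env, stdin_json = build_cni_add_request("abc123", "/var/run/netns/example")
print(env["CNI_COMMAND"], env["CNI_IFNAME"])
print(stdin_json)
```

On success the plugin answers on stdout with a JSON result describing the interface and the IP it assigned; a matching DEL call tears the setup down when the Pod goes away.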

Overlay vs Native Routing

Plugins take two broad approaches to cross-node traffic. An overlay encapsulates Pod packets (for example in VXLAN) and tunnels them between nodes. It works on almost any underlying network but adds encapsulation overhead. Native (or routed) networking gives Pods IPs that the underlying network routes directly, often via BGP or the cloud's own routing. That means lower overhead and better performance, but it requires the underlying network to cooperate.

Approach          How                                          Trade-off
Overlay (VXLAN)   Encapsulate and tunnel between nodes         Works anywhere; encapsulation overhead
Native / routed   Underlying network routes Pod IPs directly   Faster; needs network cooperation (BGP/cloud)
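
The encapsulation overhead can be put into numbers. The sketch below assumes VXLAN over an IPv4 underlay with no VLAN tag and no encryption, so each Pod packet is wrapped in an outer IPv4 header, a UDP header, a VXLAN header, and the inner Ethernet header. The function names are illustrative, and only byte overhead is modelled, not the CPU cost of encapsulating.

```python
# Bytes added around each inner IPv4 packet by VXLAN on an IPv4 underlay.
VXLAN_OVERHEAD = {
    "outer_ipv4": 20,
    "outer_udp": 8,
    "vxlan_header": 8,
    "inner_ethernet": 14,
}


def pod_mtu(underlay_mtu):
    """Largest inner IP packet that fits in one underlay packet."""
    return underlay_mtu - sum(VXLAN_OVERHEAD.values())


def payload_efficiency(underlay_mtu):
    """Fraction of a full-size underlay packet that is the Pod's own packet."""
    return pod_mtu(underlay_mtu) / underlay_mtu


for mtu in (1500, 9000):
    print(mtu, pod_mtu(mtu), round(payload_efficiency(mtu), 3))
```

With a standard 1500-byte underlay the Pod-facing MTU comes out at 1450 bytes, which is why overlay plugins usually lower the Pod interface MTU. Jumbo frames on the underlay shrink the relative cost.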

The Common Plugins

Calico offers routed networking and mature NetworkPolicy enforcement. Cilium uses eBPF in the Linux kernel for high-performance networking, policy, and deep observability, and has become a popular default. Flannel is a simple overlay that is easy to start with, but it does not enforce NetworkPolicy on its own. The big clouds also ship their own VPC-native plugins (the AWS VPC CNI, Azure CNI, GKE's dataplane) that give Pods real VPC IPs.

IPAM and Exhaustion

The plugin also handles IP address management (IPAM), meaning how Pod IPs are allocated. This is where cloud VPC-native plugins can bite: when each Pod consumes a real VPC IP, a large cluster can exhaust the subnet, and new Pods then cannot get an address and fail to start. Mitigations include prefix delegation, larger subnets, or overlay modes that decouple Pod IPs from the VPC. Plan IPAM for scale, because changing the CNI on a live cluster is disruptive.
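
A quick back-of-the-envelope check shows how fast a subnet fills up. The sketch below assumes each node and each Pod takes one address from the same subnet, and that the cloud reserves a fixed number of addresses per subnet (five is the AWS figure; treat it as an assumption elsewhere). The function name is hypothetical, and addresses that a plugin pre-allocates per node are ignored, so real capacity is usually lower.

```python
import ipaddress


def pod_ip_capacity(subnet_cidr, node_count, reserved=5):
    """Rough upper bound on Pod IPs left in a subnet shared with nodes."""
    subnet = ipaddress.ip_network(subnet_cidr)
    usable = subnet.num_addresses - reserved  # addresses the cloud keeps back
    return max(usable - node_count, 0)        # each node needs one itself


for cidr in ("10.0.0.0/24", "10.0.0.0/20"):
    print(cidr, pod_ip_capacity(cidr, node_count=10))
```

Under these assumptions a /24 with 10 nodes leaves room for at most 241 Pods, while a /20 leaves 4081, which is the kind of gap that subnet sizing has to account for before the cluster grows.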

Overlay vs native-routing CNI

Overlay — encapsulates traffic; runs on any network; pays an encapsulation cost. Simplest to deploy.

Native / routed — routes Pod IPs directly for lower overhead, but depends on BGP or cloud routing support.

Common Mistakes
  • Forgetting to install a CNI plugin and wondering why every Pod hangs in ContainerCreating.
  • Choosing an overlay and ignoring its throughput/latency overhead for network-heavy workloads.
  • Hitting VPC IP exhaustion with a cloud-native CNI on a large cluster and not planning IPAM.
  • Picking a plugin that does not enforce NetworkPolicy, then writing policies that are silently ignored.
  • Trying to swap the CNI on a running production cluster without planning for the disruption.

Best Practices
  • Install and verify the CNI plugin as the first step of cluster setup.
  • Choose a plugin that enforces NetworkPolicy if you intend to use policies (Topic 23).
  • Prefer native/routed or eBPF dataplanes for performance-sensitive clusters when the network supports it.
  • Plan IPAM and subnet sizing up front on VPC-native plugins to avoid IP exhaustion.
  • Treat the CNI as a long-term decision — migrating it later is disruptive.
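
One simple way to check that a plugin is installed on a node is to look for its network configuration. The sketch below assumes the conventional config directory /etc/cni/net.d and the usual .conf, .conflist, and .json suffixes; the function name is hypothetical, and the directory can be configured differently on a given runtime.

```python
from pathlib import Path

CNI_CONF_SUFFIXES = {".conf", ".conflist", ".json"}


def find_cni_configs(conf_dir="/etc/cni/net.d"):
    """Return CNI config file names in conf_dir, sorted lexically."""
    directory = Path(conf_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.suffix in CNI_CONF_SUFFIXES
    )


configs = find_cni_configs()
if configs:
    # Runtimes commonly use the first config in lexical order.
    print("CNI config found; first in lexical order:", configs[0])
else:
    print("No CNI config found; Pod networking is likely not set up on this node")
```

An empty result on a node usually goes together with the node reporting NotReady and Pods stuck before their containers start.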

Related
  • The network model: the contract CNI fulfills (Topic 20)
  • Network Policies: enforced (or not) by the chosen plugin (Topic 23)
  • Cloud CNIs: AWS VPC CNI, Azure CNI, GKE dataplane

Knowledge Check

What is the role of a CNI plugin?

  • It implements the network model — assigning each Pod an IP and routing traffic between nodes
  • It scores and schedules each Pod onto the best-fit node based on resource requests and affinity rules
  • It answers in-cluster DNS queries, resolving Service names to their ClusterIPs
  • It persists all network and cluster state to a replicated key-value store

What is the main trade-off of an overlay (VXLAN) CNI versus native routing?

  • Overlay works on almost any network but adds encapsulation overhead; native routing is faster but needs network cooperation
  • Overlay is faster on every underlay but only works on networks that already run BGP peering between all participating nodes
  • Only overlay mode can enforce NetworkPolicy, because native routing skips the iptables hooks entirely
  • There is no measurable difference in throughput or latency between the two modes

Why can a VPC-native CNI cause Pods to fail to start on a large cluster?

  • Each Pod consumes a real VPC IP, so the subnet can run out of addresses
  • It encrypts every Pod IP before assignment, and the added crypto step slows allocation until the scheduler times out
  • It disables CoreDNS resolution for every newly created Pod
  • It supports only a single node per cluster by design
