Capacity Planning and Growth
Networks that are not planned for growth fail in predictable, embarrassing ways: the subnet runs out of IPs mid-deploy, the conntrack table fills and starts dropping connections, the uplink saturates at peak, the NAT gateway exhausts its ephemeral ports. None of these is a surprise — each is a finite resource that was sized for today and quietly consumed by tomorrow. Capacity planning means sizing the address space, the state tables, and the bandwidth for where you will be, with enough headroom that growth is a config change rather than a migration.
The resources below have very different cost curves when you hit the wall. Running low on bandwidth is a procurement problem solved in days. Running out of conntrack entries is a sysctl change. Running out of address space after you have deployed into it is a renumbering project measured in weeks, which is why address planning has to happen first and with the most headroom.
Address-Space Planning
Size the CIDR for the lifetime of the deployment, not its launch. A /24 gives 254 usable hosts — fine until autoscaling, sidecars, and per-pod IPs push you past it, and now you are renumbering a live environment. The renumber is the expensive part: every static reference, peering range, firewall rule, and on-prem route keyed to the old block has to change at once, and overlapping CIDRs make peering refuse outright (CIDR planning, Topic 13).
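The sizing arithmetic behind that warning can be sketched in a few lines. This is a minimal illustration, not a planning tool: the function names and the `growth_factor` parameter are hypothetical, and the "+2" assumes only the network and broadcast addresses are unusable (cloud providers often reserve a few more per subnet).

```python
import math


def usable_hosts(prefix_len: int) -> int:
    """Usable IPv4 hosts in a prefix, excluding network and broadcast."""
    return 2 ** (32 - prefix_len) - 2


def prefix_for_hosts(hosts_needed: int, growth_factor: float = 1.0) -> int:
    """Longest prefix (smallest block) that still fits the projected host count.

    growth_factor is an assumed multiplier for lifetime growth, e.g. 10.0
    if per-pod IPs and autoscaling are expected to multiply today's count.
    """
    target = math.ceil(hosts_needed * growth_factor)
    # Need 2**host_bits >= target + 2; bit_length gives that without floats.
    host_bits = max(2, (target + 1).bit_length())
    return 32 - host_bits


if __name__ == "__main__":
    today = 200
    print("launch size:   /%d" % prefix_for_hosts(today))
    print("lifetime size: /%d" % prefix_for_hosts(today, growth_factor=10.0))
```

Sizing for launch lands on a /24; sizing the same deployment for a tenfold lifetime lands on a /21, which is the gap that later becomes a renumbering project.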
Plan top-down from a large aggregate and hand out non-overlapping blocks with deliberate slack — reserve more than you need per region and per environment so expansion is carving from reserved space, not squeezing between live ranges. The cost of an over-sized private block is zero; the cost of an under-sized one is a renumbering project. A worked allocation, sized for growth from the start:
# one /16 per region, subnetted with room to grow, not packed tight
10.0.0.0/16        # region us-east (65,534 hosts of headroom)
  10.0.0.0/20      # prod (4,094 hosts), room for autoscale + sidecars
  10.0.16.0/20     # staging (4,094 hosts)
  10.0.32.0/20     # reserved: future tier, do not allocate yet
10.1.0.0/16        # region eu-west: separate aggregate, no overlap
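A plan like the one above is easy to validate mechanically before anything is deployed into it. The sketch below uses Python's standard `ipaddress` module; the `check_plan` helper and the dictionary layout are illustrative assumptions, not part of any tooling the lesson describes.

```python
import ipaddress


def check_plan(aggregate, blocks):
    """Validate child blocks against a regional aggregate.

    Raises ValueError if a block falls outside the aggregate or two blocks
    overlap. Returns the number of addresses still unallocated.
    """
    nets = list(blocks.items())
    for name, net in nets:
        if not net.subnet_of(aggregate):
            raise ValueError(f"{name} ({net}) is outside {aggregate}")
    for i, (name_a, net_a) in enumerate(nets):
        for name_b, net_b in nets[i + 1:]:
            if net_a.overlaps(net_b):
                raise ValueError(f"{name_a} overlaps {name_b}")
    allocated = sum(net.num_addresses for _, net in nets)
    return aggregate.num_addresses - allocated


us_east = ipaddress.ip_network("10.0.0.0/16")
eu_west = ipaddress.ip_network("10.1.0.0/16")
us_east_blocks = {
    "prod": ipaddress.ip_network("10.0.0.0/20"),
    "staging": ipaddress.ip_network("10.0.16.0/20"),
    "reserved": ipaddress.ip_network("10.0.32.0/20"),
}

if __name__ == "__main__":
    free = check_plan(us_east, us_east_blocks)
    print("unallocated addresses in us-east:", free)
    print("regions overlap:", us_east.overlaps(eu_west))
```

Running a check of this kind in review or CI would catch an overlapping range while it is still a one-line fix.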
State-Table Capacity
Connection scale is capped by state tables you rarely think about until they fill. A stateful firewall or NAT tracks every flow in a conntrack table, and when the entry count reaches nf_conntrack_max the kernel drops packets for new connections. The application sees no error, only intermittent failures under load; the clearest signal is the kernel log message "nf_conntrack: table full, dropping packet" (conntrack saturation, Topic 49). The default ceiling is derived from system memory and is often in the tens to hundreds of thousands of entries, which a busy proxy or a SYN flood can exhaust in seconds.
Two related limits sit beside it. Ephemeral ports, the source-port pool a host or NAT draws from for outbound connections, number roughly 28,000 by default on Linux (the range 32768 to 60999). The pool applies per combination of source address, destination address, and destination port, so a NAT fronting many clients to one backend can run out and refuse new flows (NAT mappings, Topic 15). A load balancer also has its own per-instance connection ceiling. Size all three for peak concurrent connections with headroom, and graph utilization so you see the curve before it hits the ceiling.
# headroom check: count vs max should leave generous slack at peak
sysctl net.netfilter.nf_conntrack_max            # e.g. 262144, the ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count   # current in-use entries
# widen the ephemeral pool if a busy NAT/proxy nears exhaustion
sysctl -w net.ipv4.ip_local_port_range="10000 65535"
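The same headroom check can be turned into numbers suitable for graphing. The sketch below is illustrative: the helper names and the 80% warning threshold are assumptions, and the /proc paths are the file equivalents of the sysctl keys shown above, which exist only on a Linux host with the conntrack module loaded.

```python
from pathlib import Path

PROC_MAX = "/proc/sys/net/netfilter/nf_conntrack_max"
PROC_COUNT = "/proc/sys/net/netfilter/nf_conntrack_count"
WARN_AT = 0.80  # assumed alert threshold, not a kernel or vendor default


def utilization(count: int, maximum: int) -> float:
    """Fraction of a finite table or pool currently in use."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return count / maximum


def port_pool_size(low: int, high: int) -> int:
    """Number of ports in an inclusive ip_local_port_range."""
    return high - low + 1


def conntrack_utilization_from_proc():
    """Read live values if the conntrack files are present, else return None."""
    try:
        count = int(Path(PROC_COUNT).read_text().split()[0])
        maximum = int(Path(PROC_MAX).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None
    return utilization(count, maximum)


if __name__ == "__main__":
    print("default port pool:", port_pool_size(32768, 60999))
    print("widened port pool:", port_pool_size(10000, 65535))
    live = conntrack_utilization_from_proc()
    if live is not None and live >= WARN_AT:
        print(f"conntrack at {live:.0%} of nf_conntrack_max")
```

Widening the range from the Linux default to 10000-65535 roughly doubles the pool, which buys time but does not change the shape of the growth curve.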
Bandwidth and Headroom
Provision links to a utilization target, not to capacity. A link planned to run at 90% average has no room for the microbursts that a one-second or one-minute average hides: a link that averages 40% utilization over a minute can be 100% saturated for the 50-millisecond burst that actually queues and drops packets. Target 50–70% average on critical links so bursts have somewhere to go, and measure at sub-second granularity if you want to see them at all.
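A synthetic trace makes the averaging effect concrete. The numbers below are invented for illustration (a 37% baseline with a 50-millisecond line-rate burst once per second); the function names are hypothetical and the samples are per-millisecond utilization in whole percent.

```python
def mean(samples):
    """Average utilization over the whole trace."""
    return sum(samples) / len(samples)


def peak_window_utilization(samples, window):
    """Highest average utilization over any contiguous window of samples."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    running = sum(samples[:window])
    peak = running
    for i in range(window, len(samples)):
        # Slide the window one sample to the right.
        running += samples[i] - samples[i - window]
        peak = max(peak, running)
    return peak / window


# One minute of per-millisecond samples: 37% baseline, and the first
# 50 ms of every second at 100% (the microburst).
trace = [100 if ms % 1000 < 50 else 37 for ms in range(60_000)]

if __name__ == "__main__":
    print("one-minute average:   %.2f%%" % mean(trace))
    print("worst 1-second window: %.2f%%" % peak_window_utilization(trace, 1000))
    print("worst 50-ms window:    %.2f%%" % peak_window_utilization(trace, 50))
```

In this trace both the one-minute and the one-second views report about 40%, while the 50-millisecond view shows the link fully saturated, which is why the measurement interval matters as much as the target.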
Headroom is also a function of the growth curve. Linear growth gives you time to react; exponential growth can eat a comfortable margin between two capacity reviews. Track the trend, not just the current number, and order the next increment of capacity when the curve, not the present utilization, says you will hit the target. Reading only the current number is the same mistake as planning for the average while microbursts saturate the link: a summary figure hides the behavior that actually causes the outage.
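The difference between the two growth shapes is easy to quantify. The following is a rough sketch under simple assumptions (a constant step or a constant compounding rate per period); the function names, the lead time, and the review interval are all hypothetical inputs.

```python
import math


def periods_until_linear(current: float, target: float, step: float) -> float:
    """Periods until utilization reaches target, growing by a fixed step each period."""
    if current >= target:
        return 0.0
    if step <= 0:
        return math.inf
    return (target - current) / step


def periods_until_exponential(current: float, target: float, rate: float) -> float:
    """Periods until utilization reaches target, compounding at `rate` per period."""
    if current >= target:
        return 0.0
    if rate <= 0:
        return math.inf
    return math.log(target / current) / math.log(1 + rate)


def should_order_now(periods_left: float, lead_time: float, review_interval: float) -> bool:
    """Order if the target would be hit before the next review plus delivery."""
    return periods_left <= lead_time + review_interval


if __name__ == "__main__":
    linear = periods_until_linear(40, 70, step=2)             # +2 points per month
    compound = periods_until_exponential(40, 70, rate=0.10)   # +10% per month
    for label, months in (("linear", linear), ("exponential", compound)):
        print(label, round(months, 1), "months; order now:",
              should_order_now(months, lead_time=3, review_interval=3))
```

Starting from the same 40% utilization with a 70% target, the linear case leaves 15 months while the compounding case leaves under six, so with a three-month lead time and quarterly reviews only the second one needs an order placed today.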
Scaling Patterns
When a single box or link reaches its ceiling, you either scale up or scale out. Networking favors scaling out: a tier of load balancers fronting horizontally-scaled backends, ECMP spreading flows across parallel paths, and anycast (Topic 23) distributing one address across many sites all add capacity by adding parallel units rather than buying a bigger one. Horizontal tiers also remove the single big box as a failure domain.
Regional expansion is the largest increment — a second region multiplies capacity and shrinks blast radius, but only if the address plan left room for it and the data layer can span it. The signal to re-architect rather than add more units is when a single dimension stops scaling linearly: a stateful component every flow must traverse, a shared lock, a global table. Treat these as cloud quotas in spirit — limits to plan against, raised deliberately, never discovered at peak.
Scale up means a bigger box or a fatter link — more vCPUs on the NAT instance, a 100 Gbps uplink replacing a 10 Gbps one. It is simple and needs no rearchitecting, but it has a hard ceiling (the largest instance, the fastest port), it does nothing for the single-failure-domain problem, and the upgrade is often a disruptive cutover.
Scale out means more parallel units: more backends behind a load balancer, more ECMP paths, more anycast sites. Capacity grows in increments with no fixed ceiling and the failure domain shrinks, but it demands stateless or carefully-sharded design and a load-distribution layer that itself scales. In networking, scale out is the default; scale up is the stopgap until you can scale out.
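The core idea behind ECMP-style distribution, hashing each flow's 5-tuple so that one flow always takes one path while many flows spread across all of them, can be sketched as follows. This is a simplified model: real routers use their own hardware hash functions, SHA-256 is only a convenient stand-in, and the `pick_path` name and the sample addresses are hypothetical.

```python
import hashlib
from collections import Counter


def pick_path(flow, n_paths: int) -> int:
    """Map a flow 5-tuple to one of n_paths parallel links or backends.

    flow is (src_ip, dst_ip, protocol, src_port, dst_port). The same tuple
    always hashes to the same path, so packets of one flow stay in order.
    """
    if n_paths <= 0:
        raise ValueError("n_paths must be positive")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths


# 10,000 synthetic flows from one client to one service, differing by source port.
flows = [("10.0.0.5", "10.0.16.9", "tcp", 20_000 + i, 443) for i in range(10_000)]
spread = Counter(pick_path(flow, 4) for flow in flows)

if __name__ == "__main__":
    for path in sorted(spread):
        print("path", path, "flows", spread[path])
```

Because distribution is per flow rather than per packet, a single very large flow still lands on one path, which is one reason the scale-out layer needs many flows to balance well.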
Common Pitfalls
- Sizing a subnet with no growth headroom (Topic 13): a /24 that fits today's hosts fills as autoscaling and sidecars arrive, forcing a renumber of a live environment instead of a config change.
- Ignoring conntrack and ephemeral-port limits until they drop traffic under load (Topic 49) — the table fills or the port pool exhausts with no app-side error, surfacing as intermittent, hard-to-diagnose failures.
- Planning bandwidth for the per-minute average while microbursts saturate the link — a 40%-average link can be fully congested for the 50-millisecond burst that actually queues and drops packets.
- Treating a /24 as "enough forever" and packing CIDR blocks tight with no reserved space, so the next tier or region has nowhere to expand without overlapping an existing range.
- Watching current utilization instead of the growth trend, so capacity is ordered only after the ceiling is hit rather than when the curve predicts it — a reactive scramble instead of a planned increment.
Best Practices
- Allocate address space top-down from a large aggregate before deployment (Topic 13), reserving generous per-region and per-environment slack so expansion carves from reserved space, never from between live ranges.
- Size conntrack, ephemeral ports, and load-balancer connection limits for peak concurrency with headroom (Topics 15, 49), and graph utilization so you see the curve approach the ceiling.
- Target 50–70% average utilization on critical links and measure at sub-second granularity, so microbursts have room and you can actually see them queue.
- Scale out with horizontal LB tiers, ECMP, and anycast (Topic 23) by default, reserving scale-up as the stopgap, so capacity grows in increments and the failure domain shrinks.
- Track the growth trend and order the next capacity increment when the curve predicts the target, treating every state-table and quota limit as a number to plan against rather than discover at peak.
Knowledge Check
Why does address-space planning have to happen before deployment, more urgently than bandwidth or state-table sizing?
- An undersized CIDR forces renumbering a live environment, far costlier than adding bandwidth
- Because the address space turns out to be by far the cheapest resource to expand once you are already in production
- Because the subnet's CIDR size directly determines the link bandwidth available
- Because a larger subnet automatically raises the conntrack table limit
A link shows 40% average utilization over each minute, yet packets are dropping at peak. What is the most likely cause?
- Microbursts saturate the link for milliseconds, which the per-minute average hides
- The link is genuinely overloaded on a sustained basis and needs an immediate upgrade
- The conntrack table is full and is dropping packets on existing flows
- The ephemeral-port pool is exhausted, so the link rejects packets
A NAT gateway fronting thousands of clients to one backend starts refusing new connections under load. Which limit is the likely cap?
- The ephemeral-port pool for that tuple, exhausted by too many concurrent mappings
- The subnet CIDR behind the gateway simply ran out of free host addresses to assign to all of the clients
- The IP time-to-live expired before the connections could complete
- The uplink bandwidth was saturated by the number of clients