Chapter 8: Managing Real Infrastructure
Topic 47

Networking — VPC and Subnets


Almost every AWS deployment starts with a network: a VPC, public and private subnets spread across availability zones, an internet gateway, a NAT gateway, and the route tables that wire them together. This is where most people first feel Terraform's value — a whole regional network from one config, reproducible in a second account — and first hit its sharp edges.

The sharpest edge is immutability. A VPC's CIDR block can't change without recreating the VPC, and a subnet's CIDR or availability zone can't change without recreating the subnet. Get the address layout wrong and the fix is a teardown, not an edit. So the lesson of this topic is as much about planning the network on paper as about the HCL.

A regional VPC, subdivided
  • VPC 10.0.0.0/16: one IPv4 range for the region
  • Public subnets 10.0.0.0/24 (az-a), 10.0.1.0/24 (az-b), 10.0.2.0/24 (az-c); route table → internet gateway
  • Private subnets 10.0.10.0/24 (az-a), 10.0.11.0/24 (az-b), 10.0.12.0/24 (az-c); route table → NAT gateway → internet gateway

The VPC and Its CIDR

A VPC is one aws_vpc resource with a CIDR block: the private IPv4 range every subnet carves from. Pick it deliberately. 10.0.0.0/16 gives you 65,536 addresses, enough room to subdivide across three or four AZs with space left over, and /16 is also the largest block AWS accepts for a VPC. What matters most is that cidr_block forces replacement when changed. A range that overlaps a network you later need to peer with is not a tweak; it is a rebuild of everything inside the VPC. Outgrowing the range is only slightly kinder: AWS lets you attach secondary CIDR blocks to an existing VPC, but the primary block itself can't be edited in place.
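If you want the range to be an explicit, reviewed input rather than a literal buried in vpc.tf, one option is a variable with a validation rule. This is a minimal sketch: the variable name vpc_cidr is hypothetical, and the /16 to /28 bound is the size range AWS accepts for a VPC.

```hcl
# Hypothetical input: each environment chooses (and documents) its own
# VPC range here instead of editing the resource.
variable "vpc_cidr" {
  type        = string
  description = "Primary IPv4 range for the VPC. Changing it replaces the VPC."
  default     = "10.0.0.0/16"

  validation {
    # Must parse as an IPv4 CIDR block and have a prefix length of /16 to /28.
    condition = (
      can(cidrnetmask(var.vpc_cidr)) &&
      can(regex("/(1[6-9]|2[0-8])$", var.vpc_cidr))
    )
    error_message = "The vpc_cidr value must be an IPv4 CIDR block between /16 and /28."
  }
}
```

With this in place, aws_vpc.main would set cidr_block = var.vpc_cidr. Both checks are wrapped in can(), so a malformed value fails validation with the message above rather than with a function error.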

vpc.tf — the VPC and an internet gateway
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags                 = { Name = "app-vpc" }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

The internet gateway references aws_vpc.main.id rather than a hardcoded ID, so Terraform infers the dependency and creates the VPC first. Every resource in this topic is wired that way — the references are the dependency graph.

Subnets Across Availability Zones

Production subnets span at least two AZs so a single zone failure doesn't take the application down. Read the live zone list from the aws_availability_zones data source instead of hardcoding us-east-1a; the same config then works unchanged in eu-west-1. Iterate with for_each over a map keyed by AZ, and derive each subnet's range with cidrsubnet so the ranges never overlap by hand-arithmetic mistake.

subnets.tf — public subnets per AZ, ranges from cidrsubnet
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # first three AZs, indexed 0,1,2 -> 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24
  azs = { for i, az in slice(data.aws_availability_zones.available.names, 0, 3) : az => i }
}

resource "aws_subnet" "public" {
  for_each                = local.azs
  vpc_id                  = aws_vpc.main.id
  availability_zone       = each.key
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, each.value)
  map_public_ip_on_launch = true
  tags                    = { Name = "public-${each.key}" }
}

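cidrsubnet(cidr, 8, n) extends the /16 prefix by 8 bits, producing /24 blocks, and the netnum n selects which one: n = 0 gives 10.0.0.0/24, n = 1 gives 10.0.1.0/24, and so on. As long as every subnet gets a distinct n, the function computes the ranges and they cannot collide. The reason to use for_each over a map rather than count over a list is identity: each subnet is keyed by its AZ name, so adding or removing an AZ touches only that one subnet. With count, dropping the middle AZ shifts every later index, so Terraform destroys and recreates every subnet after it. One caveat for the locals above: the map values are positions in the sliced zone list, so if the data source ever returns a different first three zones (for example, because one is no longer in the "available" state), the netnums and therefore the CIDRs shift too. If that risk matters to you, pin the AZ-to-netnum map explicitly per environment.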
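The private tier from the diagram isn't in subnets.tf. A minimal sketch of it, assuming the same local.azs map and a hypothetical resource name aws_subnet.private, offsets the netnum by 10 so cidrsubnet lands on 10.0.10.0/24, 10.0.11.0/24 and 10.0.12.0/24:

```hcl
# Hypothetical private subnets, one per AZ, reusing local.azs (AZ name => 0, 1, 2).
# each.value + 10 gives netnums 10, 11, 12, i.e. 10.0.10.0/24 .. 10.0.12.0/24,
# leaving 10.0.3.0/24 through 10.0.9.0/24 free for more public AZs or other tiers.
resource "aws_subnet" "private" {
  for_each          = local.azs
  vpc_id            = aws_vpc.main.id
  availability_zone = each.key
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, each.value + 10)
  # map_public_ip_on_launch is left at its default (false): no public IPs here.
  tags              = { Name = "private-${each.key}" }
}
```

Because it iterates over the same AZ-keyed map, the private tier has the same identity behavior, and the same positional caveat, as the public one.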

Gateways, NAT, and Routing

Public subnets reach the internet through the internet gateway; private subnets reach out (for package updates, API calls) through a NAT gateway that lives in a public subnet and holds an Elastic IP. Routing is explicit: a route table per tier, a default route to the right gateway, and an association joining each subnet to its table. A common cost surprise is one NAT gateway per AZ for high availability — each one bills hourly plus per-GB, so a three-AZ layout means three NAT gateways running around the clock.

routing.tf — public route table and associations
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  for_each       = aws_subnet.public
  subnet_id      = each.value.id
  route_table_id = aws_route_table.public.id
}

The association iterates over aws_subnet.public directly — each.value.id is each subnet's ID — so the set of associations always matches the set of subnets. Add an AZ and the matching association appears with it.
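The private half of the routing described earlier follows the same pattern. This is a minimal sketch, assuming a single shared NAT gateway (the cheaper, non-HA option) and a private subnet resource named aws_subnet.private keyed by AZ like the public ones; the resource names and the single-NAT choice are illustrative, and `domain = "vpc"` is the AWS provider v5 form of the Elastic IP argument.

```hcl
# Elastic IP for the NAT gateway (older providers used vpc = true instead).
resource "aws_eip" "nat" {
  domain = "vpc"
}

# One NAT gateway, placed in the public subnet of the first AZ key.
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[keys(local.azs)[0]].id

  # The NAT gateway is only useful once the VPC has its internet gateway.
  depends_on = [aws_internet_gateway.main]
}

# Private tier: default route goes out through the NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

# Associate every private subnet (assumed resource) with the private table.
resource "aws_route_table_association" "private" {
  for_each       = aws_subnet.private
  subnet_id      = each.value.id
  route_table_id = aws_route_table.private.id
}
```

For NAT-per-AZ high availability you would instead create the Elastic IP, NAT gateway and private route table once per AZ with for_each, and associate each private subnet with the table in its own zone, paying one hourly NAT charge per zone.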

The VPC Module vs Hand-Rolling

Everything above is what the terraform-aws-modules/vpc module does for you, and for a standard topology it is the right call: it has handled the edge cases — flow logs, NAT-per-AZ toggles, IPv6, dozens of tag knobs — across thousands of deployments. Hand-roll the network only when you have a non-standard requirement the module fights you on, or when the learning is the point. Reaching for the module is not laziness; it is declining to re-debug a solved problem.
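For comparison, a roughly equivalent three-AZ public and private layout through the module might look like the sketch below. The inputs shown (name, cidr, azs, public_subnets, private_subnets, enable_nat_gateway, single_nat_gateway) are module variables, but the version pin is an assumption; confirm the inputs against the release you actually use. It also assumes the aws_availability_zones data source from subnets.tf is present.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumption: pin the major version you have tested

  name = "app-vpc"
  cidr = "10.0.0.0/16"

  # Same layout as the hand-rolled version: /24s at netnums 0-2 and 10-12.
  azs             = slice(data.aws_availability_zones.available.names, 0, 3)
  public_subnets  = [for i in range(3) : cidrsubnet("10.0.0.0/16", 8, i)]
  private_subnets = [for i in range(3) : cidrsubnet("10.0.0.0/16", 8, i + 10)]

  # One shared NAT gateway; set one_nat_gateway_per_az = true (and
  # single_nat_gateway = false) for the per-AZ, higher-cost layout.
  enable_nat_gateway = true
  single_nat_gateway = true
}
```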

Immutability Gotchas

The attributes that force replacement are the ones tied to the network's addressing: the VPC cidr_block, and a subnet's cidr_block and availability_zone. Changing any of them destroys and recreates the resource, and recreating a subnet that an ASG or load balancer sits in is an outage. This is why the CIDR layout is a design decision made once, up front, and written down — not something you iterate on in production.
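One guardrail worth considering is Terraform's lifecycle prevent_destroy setting, which makes any plan that would destroy the resource, including a replacement forced by a cidr_block change, fail with an error instead of proceeding. A sketch of aws_vpc.main with it added (this replaces the block in vpc.tf rather than sitting beside it):

```hcl
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags                 = { Name = "app-vpc" }

  lifecycle {
    # Any plan that would destroy this VPC, including a forced replacement,
    # errors out instead of being applied.
    prevent_destroy = true
  }
}
```

Removing the guard then becomes a deliberate, reviewable edit. It does not protect against deleting the resource block from the config entirely, since the lifecycle setting goes with it; the same block can be added to subnets that load balancers or ASGs depend on.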

Common Mistakes
  • Choosing a VPC CIDR that overlaps another VPC or an on-prem range you later need to peer or connect — peering refuses to establish, and the only fix is renumbering and recreating one side.
  • Creating subnets with count over an AZ list, then having that list change and renumber every index, so Terraform destroys and recreates subnets that were fine.
  • Hardcoding AZ names like us-east-1a instead of reading the aws_availability_zones data source, so the config breaks the moment you run it in another region.
  • Computing subnet CIDRs by hand and producing silent overlaps, instead of deriving them with cidrsubnet from the VPC range.
  • Running one NAT gateway per AZ for HA without noticing that each bills hourly plus per-GB processed, so even a quiet three-AZ network pays for three gateways around the clock before moving a single byte.
Best Practices
  • Plan the CIDR layout deliberately up front — it is effectively immutable — and document which range each VPC, environment, and on-prem network owns.
  • Use for_each over a stable AZ-keyed map and cidrsubnet to derive subnet ranges, so adding or removing an AZ touches only that subnet.
  • Read AZs from the aws_availability_zones data source so the same config is region-portable.
  • Use the maintained terraform-aws-modules/vpc module for standard topologies unless you have a concrete reason to hand-roll.
  • Reference IDs through attributes (aws_vpc.main.id), never hardcoded strings, so the dependency graph and create/destroy order build themselves.
Comparable tools
  • CloudFormation: VPC templates and the AWS VPC quickstart
  • Pulumi: awsx higher-level VPC component
  • terraform-aws-modules/vpc: the de facto standard module

Knowledge Check

Why is a VPC's cidr_block described as effectively immutable?

  • Changing it forces replacement of the VPC, which recreates everything inside it
  • AWS bills a one-time reconfiguration fee each time a VPC's CIDR range is edited in place
  • Terraform refuses to plan any change to a VPC after the first apply
  • The CIDR is stored in state and state values can never be updated

Why is for_each over an AZ-keyed map preferred to count for subnets?

  • Each subnet keeps a stable identity by AZ, so adding or removing one AZ doesn't renumber and recreate the others
  • count cannot create more than one subnet from a single resource block at a time
  • for_each creates the subnets fully in parallel while count is forced to create each one strictly one after another
  • count-based subnets cannot reference aws_vpc.main.id for the parent VPC

What problem does cidrsubnet(aws_vpc.main.cidr_block, 8, n) solve?

  • It derives non-overlapping subnet ranges from the VPC CIDR, avoiding hand-arithmetic overlaps
  • It validates that the VPC CIDR does not overlap with any on-prem range you plan to peer with
  • It assigns a public IP to each instance launched into the resulting subnet automatically
  • It converts the VPC CIDR into the list of availability zones the range can span

When is hand-rolling the network preferable to the terraform-aws-modules/vpc module?

  • When you have a non-standard requirement the module fights you on, or the learning is the point
  • Always — community modules like this one are unmaintained and unsafe to depend on in production
  • Whenever the VPC needs to span more than two availability zones in the region
  • Only when the configuration is not using a remote backend to store its state
