Topic 47

Networking — VPC and Subnets


Almost every AWS deployment starts with a network: a VPC, public and private subnets spread across availability zones, an internet gateway, a NAT gateway, and the route tables that wire them together. This is where most people first feel Terraform's value — a whole regional network from one config, reproducible in a second account — and first hit its sharp edges.

The sharpest edge is immutability. A VPC's CIDR block can't change without recreating the VPC, and a subnet's CIDR or availability zone can't change without recreating the subnet. Get the address layout wrong and the fix is a teardown, not an edit. So the lesson of this topic is as much about planning the network on paper as about the HCL.

A regional VPC, subdivided

VPC: 10.0.0.0/16 (one IPv4 range for the region)

Public subnets (route table → internet gateway): 10.0.0.0/24 (az-a), 10.0.1.0/24 (az-b), 10.0.2.0/24 (az-c)

Private subnets (route table → NAT gateway → internet gateway): 10.0.10.0/24 (az-a), 10.0.11.0/24 (az-b), 10.0.12.0/24 (az-c)

The VPC and Its CIDR

A VPC is one aws_vpc resource with a CIDR block, the private IPv4 range every subnet carves from. Pick it deliberately: 10.0.0.0/16 gives you 65,536 addresses, enough room to subdivide across three or four AZs with space left over. The detail that matters is that changing cidr_block forces replacement, so a range you outgrow, or one that overlaps a network you later need to peer with, is not a tweak; it's a rebuild of everything inside it.

vpc.tf — the VPC and an internet gateway

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags                 = { Name = "app-vpc" }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

The internet gateway references aws_vpc.main.id rather than a hardcoded ID, so Terraform infers the dependency and creates the VPC first. Every resource in this topic is wired that way — the references are the dependency graph.

Subnets Across Availability Zones

Production subnets span at least two AZs so a single zone failure doesn't take the application down. Read the live zone list from the aws_availability_zones data source instead of hardcoding us-east-1a; the same config then works unchanged in eu-west-1. Iterate with for_each over a map keyed by AZ, and derive each subnet's range with cidrsubnet so the ranges never overlap by hand-arithmetic mistake.

subnets.tf — public subnets per AZ, ranges from cidrsubnet

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # first three AZs, indexed 0,1,2 -> 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24
  # (slice raises an error if the region exposes fewer than three AZs)
  azs = { for i, az in slice(data.aws_availability_zones.available.names, 0, 3) : az => i }
}

resource "aws_subnet" "public" {
  for_each                = local.azs
  vpc_id                  = aws_vpc.main.id
  availability_zone       = each.key
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, each.value)
  map_public_ip_on_launch = true
  tags                    = { Name = "public-${each.key}" }
}

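cidrsubnet(cidr, 8, n) adds 8 bits to the /16 prefix, producing /24 blocks, and the index n selects which one, so distinct indexes can never produce overlapping ranges. The reason to use for_each over a map rather than count over a list is identity: each subnet is addressed by its AZ name, so adding or removing an AZ doesn't re-address the other subnets in state. With count, dropping the middle AZ shifts every subnet after it to a new index, and Terraform destroys and recreates each of those. One caveat: in this listing each.value is the AZ's position in the list, so removing an earlier AZ still shifts the later indexes, changes their cidr_block, and forces replacement. If the AZ set may change, pin each AZ to a fixed index.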
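
A minimal sketch of pinning each AZ to a fixed network number, so that a subnet's range depends on its key and not on its position in a list. The variable name az_netnums and its default values are hypothetical, and listing AZ names explicitly trades away some of the region portability of the data-source approach. It would stand in for local.azs in the listing above and still relies on aws_vpc.main from vpc.tf.

```hcl
# Hypothetical variable: AZ name => fixed network number.
variable "az_netnums" {
  type = map(number)
  default = {
    "us-east-1a" = 0
    "us-east-1b" = 1
    "us-east-1c" = 2
  }
}

resource "aws_subnet" "public" {
  for_each          = var.az_netnums
  vpc_id            = aws_vpc.main.id
  availability_zone = each.key

  # With a 10.0.0.0/16 VPC:
  #   cidrsubnet("10.0.0.0/16", 8, 0) = "10.0.0.0/24"
  #   cidrsubnet("10.0.0.0/16", 8, 1) = "10.0.1.0/24"
  #   cidrsubnet("10.0.0.0/16", 8, 2) = "10.0.2.0/24"
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, each.value)
  map_public_ip_on_launch = true
  tags                    = { Name = "public-${each.key}" }
}
```

Removing "us-east-1b" from the map then destroys only that subnet; the other two keep both their keys and their ranges.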

Gateways, NAT, and Routing

Public subnets reach the internet through the internet gateway; private subnets reach out (for package updates, API calls) through a NAT gateway that lives in a public subnet and holds an Elastic IP. Routing is explicit: a route table per tier, a default route to the right gateway, and an association joining each subnet to its table. A common cost surprise is one NAT gateway per AZ for high availability — each one bills hourly plus per-GB, so a three-AZ layout means three NAT gateways running around the clock.

routing.tf — public route table and associations

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  for_each       = aws_subnet.public
  subnet_id      = each.value.id
  route_table_id = aws_route_table.public.id
}

The association iterates over aws_subnet.public directly — each.value.id is each subnet's ID — so the set of associations always matches the set of subnets. Add an AZ and the matching association appears with it.
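
The private tier follows the same pattern. The sketch below is one possible shape, not the only one: it assumes a single shared NAT gateway placed in the first public subnet, reuses local.azs and aws_subnet.public from the listings above, and offsets the network number by 10 to match the 10.0.10.0/24 to 10.0.12.0/24 layout. The resource names private and nat are illustrative.

```hcl
# Private subnets: same AZ map, network numbers offset by 10
# (10.0.10.0/24, 10.0.11.0/24, 10.0.12.0/24).
resource "aws_subnet" "private" {
  for_each          = local.azs
  vpc_id            = aws_vpc.main.id
  availability_zone = each.key
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, each.value + 10)
  tags              = { Name = "private-${each.key}" }
}

# Elastic IP held by the NAT gateway (AWS provider v5+ syntax; older
# provider versions used `vpc = true` instead of `domain`).
resource "aws_eip" "nat" {
  domain = "vpc"
}

# Assumption: one NAT gateway for the whole VPC, in the first public subnet.
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = values(aws_subnet.public)[0].id

  # The NAT gateway needs the internet gateway to exist first.
  depends_on = [aws_internet_gateway.main]
}

# Default route for private subnets goes to the NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

resource "aws_route_table_association" "private" {
  for_each       = aws_subnet.private
  subnet_id      = each.value.id
  route_table_id = aws_route_table.private.id
}
```

A single NAT gateway keeps the bill down but makes all private egress depend on one AZ. The per-AZ variant would put for_each on the Elastic IP, the NAT gateway, and the private route table as well.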

The VPC Module vs Hand-Rolling

Everything above is what the terraform-aws-modules/vpc module does for you, and for a standard topology it is the right call: it has handled the edge cases — flow logs, NAT-per-AZ toggles, IPv6, dozens of tag knobs — across thousands of deployments. Hand-roll the network only when you have a non-standard requirement the module fights you on, or when the learning is the point. Reaching for the module is not laziness; it is declining to re-debug a solved problem.
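
For comparison, a hedged sketch of roughly the same topology expressed through the module. The input names used here (name, cidr, azs, public_subnets, private_subnets, enable_nat_gateway, single_nat_gateway, enable_dns_hostnames) are module inputs; the version constraint is an assumption, so pin whichever release you have actually tested.

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumption: adjust to the release you have tested

  name = "app-vpc"
  cidr = "10.0.0.0/16"

  # First three AZs of whatever region the provider is configured for.
  azs = slice(data.aws_availability_zones.available.names, 0, 3)

  # Same address plan as the hand-rolled version:
  # public 10.0.0.0/24..10.0.2.0/24, private 10.0.10.0/24..10.0.12.0/24.
  public_subnets  = [for i in range(3) : cidrsubnet("10.0.0.0/16", 8, i)]
  private_subnets = [for i in range(3) : cidrsubnet("10.0.0.0/16", 8, i + 10)]

  enable_nat_gateway   = true
  single_nat_gateway   = true # one shared NAT gateway instead of one per AZ
  enable_dns_hostnames = true
}
```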

Immutability Gotchas

The attributes that force replacement are the ones tied to the network's addressing: the VPC cidr_block, and a subnet's cidr_block and availability_zone. Changing any of them destroys and recreates the resource, and recreating a subnet that an ASG or load balancer sits in is an outage. This is why the CIDR layout is a design decision made once, up front, and written down — not something you iterate on in production.
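
One way to make an accidental replacement fail loudly is Terraform's prevent_destroy lifecycle setting. The block below is a sketch that repeats the VPC from vpc.tf with that guard added; whether the guard suits a given environment is a judgment call.

```hcl
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags                 = { Name = "app-vpc" }

  lifecycle {
    # Any plan that would destroy this VPC, including a replacement
    # triggered by a cidr_block change, fails with an error instead.
    prevent_destroy = true
  }
}
```

The guard does not make the CIDR editable; it only turns a destructive plan into an error you have to remove deliberately.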

Common Mistakes

Choosing a VPC CIDR that overlaps another VPC or an on-prem range you later need to peer or connect — peering refuses to establish, and the only fix is renumbering and recreating one side.
Creating subnets with count over an AZ list, then having that list change and renumber every index, so Terraform destroys and recreates subnets that were fine.
Hardcoding AZ names like us-east-1a instead of reading the aws_availability_zones data source, so the config breaks the moment you run it in another region.
Computing subnet CIDRs by hand and producing silent overlaps, instead of deriving them with cidrsubnet from the VPC range.
Running one NAT gateway per AZ for HA without noticing each bills hourly plus per-GB, so even a quiet network carries a fixed monthly charge per gateway that grows with every gigabyte processed.

Best Practices

Plan the CIDR layout deliberately up front — it is effectively immutable — and document which range each VPC, environment, and on-prem network owns.
Use for_each over a stable AZ-keyed map and cidrsubnet to derive subnet ranges, so adding or removing an AZ touches only that subnet.
Read AZs from the aws_availability_zones data source so the same config is region-portable.
Use the maintained terraform-aws-modules/vpc module for standard topologies unless you have a concrete reason to hand-roll.
Reference IDs through attributes (aws_vpc.main.id), never hardcoded strings, so the dependency graph and create/destroy order build themselves.

Comparable tools

CloudFormation: VPC templates and the AWS VPC quickstart
Pulumi awsx: higher-level VPC component
terraform-aws-modules/vpc: the de facto standard module

Knowledge Check

Why is a VPC's cidr_block described as effectively immutable?

Changing it forces replacement of the VPC, which recreates everything inside it
AWS bills a one-time reconfiguration fee each time a VPC's CIDR range is edited in place
Terraform refuses to plan any change to a VPC after the first apply
The CIDR is stored in state and state values can never be updated

Why is for_each over an AZ-keyed map preferred to count for subnets?

Each subnet keeps a stable identity by AZ, so adding or removing one AZ doesn't renumber and recreate the others
count cannot create more than one subnet from a single resource block at a time
for_each creates the subnets fully in parallel while count is forced to create each one strictly one after another
count-based subnets cannot reference aws_vpc.main.id for the parent VPC

What problem does cidrsubnet(aws_vpc.main.cidr_block, 8, n) solve?

It derives non-overlapping subnet ranges from the VPC CIDR, avoiding hand-arithmetic overlaps
It validates that the VPC CIDR does not overlap with any on-prem range you plan to peer with
It assigns a public IP to each instance launched into the resulting subnet automatically
It converts the VPC CIDR into the list of availability zones the range can span

When is hand-rolling the network preferable to the terraform-aws-modules/vpc module?

When you have a non-standard requirement the module fights you on, or the learning is the point
Always — community modules like this one are unmaintained and unsafe to depend on in production
Whenever the VPC needs to span more than two availability zones in the region
Only when the configuration is not using a remote backend to store its state
