Topic 70

Cloud Networking — VPCs and Security Groups

Cloud networking is the same protocols wearing an API. A VPC — Virtual Private Cloud — is your private, software-defined network inside a provider: you pick a CIDR block like 10.0.0.0/16, carve it into subnets, and attach gateways and rules, all through API calls instead of cables and rack screws. Everything from the first twelve chapters still applies. The CIDR math, the routing table, the NAT in chapter 3, the DNS in chapter 6 — none of it changed; it just got provisioned by Terraform and billed by the hour.

What does change is the model around the edges, and it differs per provider. An AWS VPC is regional and you build subnets per Availability Zone; an Azure VNet is regional with subnets that are not zone-bound; a GCP VPC is global, with subnets that are themselves regional. Security groups, route tables, peering, and private service access each have provider-specific shapes — and the single most expensive mistake in this topic, overlapping CIDR ranges, is one you make on day one and pay for two years later when peering refuses to establish.

A VPC nests down to the instance

VPC: 10.0.0.0/16, your private address space
Subnet: 10.0.1.0/24, per-zone slice
Route table: default route → IGW or NAT
Security group: stateful allow-list on the interface
Instance: 10.0.1.23, private IP from the subnet

The VPC Model

A VPC is an isolation boundary with an address range you choose. Inside it, instances get private IPs from your CIDR; nothing routes in from the internet unless you explicitly attach a gateway and write a route. The default posture is private, which is the opposite of plugging a server into a flat office LAN — and the correct one. The defining design choice is the CIDR block, because it is effectively permanent: you can add secondary ranges later on every cloud, but you cannot renumber a live VPC without recreating everything attached to it.
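The carving step can be sketched with Python's standard `ipaddress` module. This is a minimal illustration rather than a provisioning recipe: the /16 and /24 sizes come from the example above, while the choice of three zones and the `reserved=5` figure (AWS holds back five addresses in every subnet; other providers reserve a different number) are assumptions made for the sketch.

```python
import ipaddress

# VPC range from the running example.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into /24 slices; a /16 yields 256 of them.
subnets = list(vpc.subnets(new_prefix=24))

# Assumption: one subnet per zone, three zones, starting at 10.0.1.0/24.
per_zone = subnets[1:4]

def usable_hosts(net, reserved=5):
    """Addresses left for instances after the provider's reserved ones.

    reserved=5 matches AWS; treat it as an assumption elsewhere.
    """
    return net.num_addresses - reserved

for net in per_zone:
    print(net, usable_hosts(net))

# The instance address from the diagram falls inside the first slice.
print(ipaddress.ip_address("10.0.1.23") in per_zone[0])
```

Running the same arithmetic before creating anything is cheap, and it shows how quickly a small VPC range runs out of room once every zone and tier needs its own slice.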

Scope is where the three clouds diverge in a way that bites cross-region designs. On AWS a VPC lives in one region and its subnets each pin to one AZ; spanning regions means separate VPCs joined by peering or a transit gateway. On GCP a single VPC is global — one network object spans every region, with regional subnets inside it — so a VM in us-central1 and one in europe-west1 share the same private network without any peering at all. Assuming AWS-style regional isolation on GCP, or GCP-style global reach on AWS, produces designs that either over-peer or silently fail to connect.

Subnets, Route Tables, and Gateways

A subnet is a slice of the VPC's CIDR tied to a zone or region, and what makes it "public" or "private" is nothing about the subnet itself — it is the route table attached to it. A public subnet is one whose route table sends 0.0.0.0/0 to an internet gateway; a private subnet sends its default route to a NAT gateway (egress only) or nowhere at all. Instances in a private subnet can reach the internet for updates through NAT but cannot be reached from it — the same one-way asymmetry NAT gave you in chapter 3, now a managed box.

The internet gateway is a horizontally scaled construct with no bandwidth limit of its own; it performs 1:1 NAT between an instance's private IP and its attached public IP. The NAT gateway is the many-to-one variant, and it is the one with limits worth knowing: a single AWS NAT gateway caps at roughly 55,000 simultaneous connections per public IP to any one destination and bills per gigabyte processed, so a chatty private fleet hammering one external API can exhaust ports or run up a surprising egress bill. Size and place NAT per AZ, not as one shared chokepoint.
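A back-of-the-envelope check makes the port limit concrete. The sketch below uses only the approximate 55,000 figure quoted above; the function name, the fleet sizes, and the idea that capacity scales linearly with the number of public IPs are illustrative assumptions, not provider guarantees.

```python
# Approximate per-destination limit quoted in the text.
NAT_CONNECTIONS_PER_DESTINATION = 55_000

def nat_headroom(instances, conns_per_instance, public_ips=1):
    """Remaining connection budget toward ONE destination.

    A negative result means the fleet would exhaust the NAT's ports.
    Linear scaling with public_ips is an assumption of this sketch.
    """
    demand = instances * conns_per_instance
    capacity = NAT_CONNECTIONS_PER_DESTINATION * public_ips
    return capacity - demand

# Hypothetical fleet: 200 instances, each holding 300 connections
# open to the same external API.
print(nat_headroom(200, 300))                 # one public IP
print(nat_headroom(200, 300, public_ips=2))   # two public IPs
```

The limit applies per destination, so the same fleet spread across many external endpoints is far less likely to hit it than one fleet pointed at a single API.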

# a route table makes a subnet public: default route to the IGW
# a private subnet instead points 0.0.0.0/0 at a NAT gateway
Destination      Target
10.0.0.0/16      local            # intra-VPC, always present
0.0.0.0/0        igw-0a1b2c3d     # <- this line = public subnet
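The same table can be modelled in a few lines of Python to show both ideas at once: longest-prefix match decides where a packet goes, and the target of the default route decides whether the subnet counts as public. This is a toy model, not a provider API. The `igw-` and `nat-` prefixes follow AWS ID conventions, and the NAT gateway ID is made up for the example.

```python
import ipaddress

def lookup(route_table, dest_ip):
    """Longest-prefix match: the most specific matching route wins."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def subnet_kind(route_table):
    """Classify a subnet purely by where its default route points."""
    default = dict(route_table).get("0.0.0.0/0")
    if default is None:
        return "isolated"
    return "public" if default.startswith("igw-") else "private"

public_rt = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-0a1b2c3d")]
# Hypothetical NAT gateway ID, for illustration only.
private_rt = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-0e4f5a6b")]

print(lookup(public_rt, "10.0.1.23"))       # stays inside the VPC
print(lookup(public_rt, "93.184.216.34"))   # leaves via the IGW
print(subnet_kind(public_rt), subnet_kind(private_rt))
```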

Security Groups versus NACLs

Two filtering layers stack on every AWS VPC, and confusing them is a rite of passage. A security group attaches to an instance's network interface, is stateful, and is allow-only: you list permitted inbound and outbound traffic, return packets for an allowed flow are let back in automatically, and there is no way to write a deny rule. A NACL (network ACL) attaches to a subnet, is stateless, and is an ordered, numbered list of allow and deny rules evaluated in order until one matches. Because it is stateless, an inbound allow does not cover the reply: you must also write an outbound rule for the return traffic, which leaves on ephemeral ports, and the mirror-image pair for connections the subnet initiates.

The practical consequence: security groups are where you do almost all your real work, because stateful allow-lists are simple to reason about. NACLs earn their keep for coarse, explicit denial, such as blocking a hostile CIDR at the subnet edge, which a security group cannot express because it has no deny rules. GCP collapses both into a single stateful VPC firewall with priority-ordered allow and deny rules; Azure's network security groups are stateful and support deny, sitting somewhere between the two AWS objects. Knowing which model you are in tells you whether "I can't block that IP here" is a real limit or a wrong tool.
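The difference between the two AWS evaluation models is easy to see in a toy simulation. The rule tuples, function names, and addresses below are invented for illustration (the IPs come from documentation-reserved ranges); real rules also carry protocol and direction fields that this sketch leaves out.

```python
import ipaddress

def matches(cidr, lo, hi, peer_ip, port):
    """True when the peer is inside the CIDR and the port is in range."""
    in_cidr = ipaddress.ip_address(peer_ip) in ipaddress.ip_network(cidr)
    return in_cidr and lo <= port <= hi

def sg_allows(rules, peer_ip, port):
    """Security group: allow-only. Any match permits; no match drops."""
    return any(matches(c, lo, hi, peer_ip, port) for c, lo, hi in rules)

def nacl_decision(rules, peer_ip, port):
    """NACL: lowest rule number first, first match wins, implicit deny."""
    for _num, action, cidr, lo, hi in sorted(rules, key=lambda r: r[0]):
        if matches(cidr, lo, hi, peer_ip, port):
            return action
    return "deny"

sg_inbound = [("0.0.0.0/0", 443, 443)]
nacl_inbound = [
    (90, "deny", "203.0.113.0/24", 0, 65535),   # hostile range
    (100, "allow", "0.0.0.0/0", 443, 443),
]
# Stateless: replies need their own outbound rule on ephemeral ports.
nacl_outbound = [(100, "allow", "0.0.0.0/0", 1024, 65535)]

print(sg_allows(sg_inbound, "203.0.113.7", 443))         # SG cannot exclude it
print(nacl_decision(nacl_inbound, "203.0.113.7", 443))   # NACL denies it
print(nacl_decision(nacl_outbound, "198.51.100.9", 51515))
```

The security group has no way to carve the hostile range out of its 0.0.0.0/0 allow, while the NACL handles it with one lower-numbered deny. Removing `nacl_outbound` would silently drop every reply, which is the stateless behavior that surprises people.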

Connecting VPCs and On-Prem

VPCs are islands by design, and you join them deliberately. Peering is a direct, non-transitive link between two VPCs: A peers with B and A peers with C, but B and C still cannot talk — there is no transit through A. A transit hub (AWS Transit Gateway, Azure Virtual WAN, GCP Network Connectivity Center) is the answer to peering's non-transitivity, a hub-and-spoke router that every VPC attaches to once. To reach on-prem you run an IPsec VPN over the internet or a private circuit — AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect — for predictable bandwidth and lower latency.
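Non-transitivity is simple to state in code: with peering, only a direct link counts, whereas with a hub it is enough that both sides are attached. The sketch below is a deliberately simplified model with made-up VPC names; a real transit hub also needs route tables that permit the spoke-to-spoke path.

```python
def can_reach_via_peering(peerings, a, b):
    """Peering is non-transitive: only a direct A-B link counts."""
    return frozenset((a, b)) in peerings

def can_reach_via_hub(attachments, a, b):
    """Hub-and-spoke: both VPCs must be attached to the hub."""
    return a in attachments and b in attachments

# A peers with B, and A peers with C. There is no B-C link.
peerings = {frozenset(("A", "B")), frozenset(("A", "C"))}
hub_attachments = {"A", "B", "C"}

print(can_reach_via_peering(peerings, "A", "B"))    # direct link exists
print(can_reach_via_peering(peerings, "B", "C"))    # no transit through A
print(can_reach_via_hub(hub_attachments, "B", "C"))
```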

Every one of these connection methods shares one hard prerequisite: the address ranges on both sides must not overlap. This is the CIDR-overlap trap from chapter 3, and it is unforgiving. If your VPC is 10.0.0.0/16 and the VPC or on-prem network you later want to connect is also 10.0.0.0/16, peering refuses to establish and a VPN tunnel cannot route: the same prefix now describes both the local network and the remote one, so a router has no way to tell which side a packet for 10.0.1.23 is meant for. There is no flag to fix it; you renumber one entire side, which on a production network is a multi-week migration. Plan the address space across every VPC, region, and on-prem block before you create the first one.
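An overlap check is cheap to run against a planned address space before anything is created. The sketch below uses `ipaddress` from the standard library; the network names and the plan itself are hypothetical.

```python
import ipaddress
from itertools import combinations

def find_overlaps(plan):
    """Return every pair of named networks whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]

# Hypothetical address plan; on-prem collides with prod.
plan = {
    "prod-vpc": "10.0.0.0/16",
    "staging-vpc": "10.1.0.0/16",
    "on-prem": "10.0.0.0/16",
    "default-vpc": "172.31.0.0/16",
}

print(find_overlaps(plan))
```

`overlaps` also catches partial collisions, such as a 10.0.0.0/8 on-prem block swallowing every 10.x VPC, which are easy to miss by eye.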

Security Group vs NACL vs On-Prem Firewall

Security group — stateful, allow-only, attached to an instance interface. Return traffic for an allowed flow is automatic; you cannot write a deny. Use it as your primary control: it is simple to reason about and scoped to exactly the workloads it protects.

NACL — stateless, ordered allow/deny rules, attached to a subnet. Because it is stateless you must allow ephemeral return ports yourself. Reach for it only for coarse subnet-edge denial, such as blocking a hostile CIDR — the one thing a security group cannot do.

On-prem firewall — a stateful physical or virtual appliance inspecting traffic at a network boundary, with full allow and deny, zones, and often L7 inspection. The cloud splits its job across the SG (instance allow-list) and the NACL (subnet deny), with the provider operating the box.

Common Mistakes

Reusing the same CIDR (often 10.0.0.0/16 or the default 172.31.0.0/16) across VPCs you later want to peer or connect to on-prem. Peering refuses to establish and there is no fix but renumbering one entire side — the chapter-3 overlap trap, now permanent.
Launching instances into a public subnet by accident — a route table with a 0.0.0.0/0 route to an internet gateway plus a public IP leaves them internet-exposed, scanned within minutes. Default to private subnets and add public reachability only where you mean it.
Security-group sprawl: hundreds of one-off rules accreted over years until no one can say what is actually open. The blast radius of a single overly-broad 0.0.0.0/0 ingress rule hides in the noise.
Assuming peering is transitive. A peered to B and A peered to C does not let B reach C; traffic does not transit through A. Teams wire a hub VPC, peer everything to it, and are baffled that spokes can't talk — they need a transit gateway, not peering.
Believing cross-region or cross-cloud traffic rides one private fabric. Two AWS VPCs in different regions, or a VPC and an Azure VNet, are separate networks; without explicit peering, a transit hub, or a VPN they reach each other only over the public internet.
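Sprawl is easier to fight when the broad rules are surfaced automatically. The sketch below scans a list of rule records for ingress open to the whole internet. The record shape (`group`, `direction`, `source`, `port`) is a hypothetical export format, not any provider's API, and sources that reference another security group by ID are skipped.

```python
import ipaddress

def open_to_world(rules):
    """Return ingress rules whose source covers the entire address space."""
    flagged = []
    for rule in rules:
        source = rule["source"]
        # Skip egress rules and rules that reference a security group ID.
        if rule["direction"] != "ingress" or source.startswith("sg-"):
            continue
        # A prefix length of 0 means 0.0.0.0/0 or ::/0.
        if ipaddress.ip_network(source).prefixlen == 0:
            flagged.append(rule)
    return flagged

# Hypothetical rule export.
rules = [
    {"group": "web", "direction": "ingress", "source": "0.0.0.0/0", "port": 443},
    {"group": "db", "direction": "ingress", "source": "sg-0123abcd", "port": 5432},
    {"group": "legacy", "direction": "ingress", "source": "0.0.0.0/0", "port": 22},
    {"group": "web", "direction": "egress", "source": "0.0.0.0/0", "port": 443},
]

for rule in open_to_world(rules):
    print(rule["group"], rule["port"])
```

Not every flagged rule is a mistake, since a public web tier legitimately accepts 443 from anywhere, but a short list like this is something a person can actually review.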

Best Practices

Allocate a single non-overlapping address plan across every VPC, region, and on-prem block before creating the first VPC — give each environment its own slice of 10.0.0.0/8 with room to grow, because renumbering later is a migration, not an edit.
Default every subnet to private and place workloads there; expose only what must be reached through a load balancer or a deliberately public subnet, so a forgotten instance is never directly on the internet.
Do real filtering with security groups and reserve NACLs for coarse subnet-edge denial — reference security groups by ID as sources instead of IP ranges, so rules track membership instead of accreting hardcoded addresses.
Run a NAT gateway per AZ and watch its connection and port-allocation metrics, because a single shared NAT becomes a chokepoint (roughly 55,000 simultaneous connections to any one destination) and a per-gigabyte egress bill the moment a private fleet gets chatty.
Use a transit hub from the start when you expect more than three VPCs to interconnect — peering's non-transitivity turns into an N-squared mesh of links you will eventually tear down to adopt the hub anyway.
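The arithmetic behind the transit-hub advice is short enough to check directly. A full mesh of n VPCs needs n(n-1)/2 peering links, while a hub needs one attachment per VPC. The function names below are illustrative.

```python
def full_mesh_links(n):
    """Peering links needed so that every one of n VPCs reaches every other."""
    return n * (n - 1) // 2

def hub_attachments_needed(n):
    """Attachments needed when every VPC connects once to a transit hub."""
    return n

for n in (3, 4, 10, 20):
    print(n, full_mesh_links(n), hub_attachments_needed(n))
```

At three VPCs the two approaches cost the same number of links, which is why the crossover sits just above three; beyond that the mesh grows quadratically while the hub grows linearly.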

Comparable concepts: AWS VPC, Azure VNet, GCP VPC, On-prem physical network

Knowledge Check

You peer VPC-A with VPC-B and VPC-A with VPC-C. Why still can't VPC-B reach VPC-C?

Peering is non-transitive; traffic does not transit through A to a peer of a peer
A security group on VPC-A is statefully blocking the forwarded return traffic between the two peers
VPC-A lacks a NAT gateway to forward the packets between its two peers
The three VPCs share an overlapping CIDR block that breaks the route

What is the practical difference between a security group and a NACL on AWS?

The SG is stateful, allow-only, per instance; the NACL is stateless, supports deny, per subnet
The SG is stateless per subnet, while the NACL is the stateful per-instance allow-list
The SG only balances load across instances; the NACL is what actually filters traffic
Both support deny rules, but the security group is the one that evaluates them in numbered priority order

What makes a cloud subnet "public" rather than "private"?

Its route table sends the default route to an internet gateway rather than a NAT gateway
The subnet is allocated a larger CIDR block so it has room for public-facing hosts
Its instances are attached to a permissive security group that allows all inbound traffic from the internet
A provider flag on the subnet object marks it public at creation time
