AWS VPC Networking: The Complete Guide for DevOps Engineers (2026)

Understand AWS VPC from the ground up — subnets, route tables, security groups, NACLs, VPC peering, Transit Gateway, and real-world architectures for production workloads.

DevOpsBoys · Mar 13, 2026 · 10 min read

Networking is the foundation of everything in AWS. Your EC2 instances, EKS clusters, RDS databases, Lambda functions — they all live inside a VPC. If you don't understand how VPC networking works, you'll spend hours debugging mysterious connectivity issues that have simple explanations.

This guide walks through every core VPC concept — not just what it is, but why it exists and how to use it correctly.


What Is a VPC?

A Virtual Private Cloud (VPC) is your own isolated network inside AWS. Think of it as renting a private section of the AWS data center where you control:

  • The IP address range
  • How traffic flows between resources
  • What reaches the internet and what stays private
  • Firewall rules at multiple levels

Every AWS account gets a default VPC in each region (172.31.0.0/16). But production workloads should always use a custom VPC that you've designed deliberately.


CIDR Blocks: Planning Your IP Space

Before you create a VPC, you choose a CIDR block: the IP address range for your entire network. AWS accepts IPv4 VPC CIDRs between /16 (largest) and /28 (smallest). The choice is effectively permanent. Once a VPC is created you can add secondary CIDRs, but you cannot change the primary one.

Common choices:

  • 10.0.0.0/16 → 65,536 IP addresses (most common)
  • 172.16.0.0/16 → 65,536 IP addresses
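  • 10.0.0.0/8 → 16 million IP addresses (too large for a single VPC, since AWS caps a VPC CIDR at /16; large enterprises use it as the organization-wide range and carve one /16 per VPC out of it)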

Why this matters: If you plan to connect multiple VPCs or connect to your on-premises network via VPN, their CIDR blocks must not overlap. Plan your IP ranges before you deploy anything.

A /16 VPC gives you 65,536 IPs to divide into subnets. This is enough for almost any workload.
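The Terraform snippets later in this guide reference `aws_vpc.main` without defining it. A minimal sketch of what that resource might look like, assuming the 10.0.0.0/16 range from the list above. The secondary range `10.1.0.0/16` and the tag value are illustrative assumptions, not part of the original setup:

```hcl
# The primary CIDR is fixed at creation time; changing it forces a new VPC.
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true # both DNS flags are required for private DNS on interface endpoints

  tags = {
    Name = "main" # hypothetical name
  }
}

# Optional: extend the address space later with a secondary CIDR.
# 10.1.0.0/16 is an assumed range; it must not overlap any VPC or
# on-premises network you plan to connect.
resource "aws_vpc_ipv4_cidr_block_association" "secondary" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.1.0.0/16"
}
```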


Subnets: Dividing Your VPC

A subnet is a range of IP addresses within your VPC. Resources (EC2, RDS, EKS nodes) live in subnets — not directly in the VPC.

Two types of subnets:

Public subnets — have a route to an Internet Gateway. Resources here can be reached from the internet (if they have a public or Elastic IP and their security group allows it) and can reach the internet directly.

Private subnets — no direct route to the internet. Resources here can only be reached from within the VPC (or via VPN/Direct Connect). They need a NAT Gateway to initiate outbound internet connections.

How to Split a /16 Into Subnets

If your VPC is 10.0.0.0/16, a practical layout for a 3-AZ production setup:

VPC: 10.0.0.0/16

Public subnets (one per AZ):
  10.0.1.0/24   → us-east-1a  (256 IPs)
  10.0.2.0/24   → us-east-1b
  10.0.3.0/24   → us-east-1c

Private subnets (app tier):
  10.0.11.0/24  → us-east-1a
  10.0.12.0/24  → us-east-1b
  10.0.13.0/24  → us-east-1c

Private subnets (data tier):
  10.0.21.0/24  → us-east-1a
  10.0.22.0/24  → us-east-1b
  10.0.23.0/24  → us-east-1c

Note: AWS reserves 5 IPs in each subnet (network address, router, DNS, future use, broadcast). A /24 gives you 251 usable IPs.

Always spread across multiple Availability Zones. A subnet lives in exactly one AZ, so if all your resources sit in a single subnet and that AZ goes down, your entire application is offline.
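The layout above can be expressed with Terraform's built-in `cidrsubnet` function, which carves fixed-size blocks out of the VPC range. This is a sketch under a few assumptions: the AZ names come from the example layout, and the resource names `public`, `private` (app tier), and `data` are chosen to match how later snippets refer to `aws_subnet.public` and `aws_subnet.private`.

```hcl
locals {
  # Assumed AZ names, matching the layout above
  azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

# cidrsubnet("10.0.0.0/16", 8, n) adds 8 bits to the prefix and
# returns the n-th /24, i.e. 10.0.n.0/24.
resource "aws_subnet" "public" {
  count                   = length(local.azs)
  vpc_id                  = aws_vpc.main.id
  availability_zone       = local.azs[count.index]
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 1) # 10.0.1.0/24 .. 10.0.3.0/24
  map_public_ip_on_launch = true
}

# App tier
resource "aws_subnet" "private" {
  count             = length(local.azs)
  vpc_id            = aws_vpc.main.id
  availability_zone = local.azs[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 11) # 10.0.11.0/24 .. 10.0.13.0/24
}

# Data tier
resource "aws_subnet" "data" {
  count             = length(local.azs)
  vpc_id            = aws_vpc.main.id
  availability_zone = local.azs[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 21) # 10.0.21.0/24 .. 10.0.23.0/24
}
```

Leaving gaps between the tiers (1-3, 11-13, 21-23) keeps room to add a fourth AZ or another tier without renumbering.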


Internet Gateway: The Door to the Internet

An Internet Gateway (IGW) is a horizontally-scaled, redundant, highly available VPC component that enables communication between your VPC and the internet.

One IGW per VPC. It's either attached or it isn't.

For a subnet to be "public," two things must be true:

  1. The VPC has an Internet Gateway attached
  2. The subnet's route table has a route 0.0.0.0/0 → igw-xxxxxxxx

Just having an IGW isn't enough — the route table must point to it.

hcl
# Terraform: create and attach IGW
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}
 
# Route table for public subnets
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
 
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}
 
resource "aws_route_table_association" "public" {
  count          = length(var.public_subnet_ids)
  subnet_id      = var.public_subnet_ids[count.index]
  route_table_id = aws_route_table.public.id
}

NAT Gateway: Private Subnets Reaching the Internet

Resources in private subnets often need outbound internet access — to pull container images, reach AWS APIs, install packages, or send data to external services. But they shouldn't be directly reachable from the internet.

A NAT (Network Address Translation) Gateway sits in a public subnet and translates outbound traffic from private subnets. Private resources send traffic to the NAT Gateway → it forwards to the internet → returns the response.

Important: NAT Gateways are expensive. A single NAT Gateway costs ~$32/month plus $0.045 per GB of data processed. For a high-traffic application, NAT Gateway costs can exceed your EC2 costs.
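To make the cost concrete, here is a rough worked estimate written as Terraform locals so it can be evaluated with `terraform console` or `terraform apply`. The hourly rate and the 1,000 GB traffic figure are assumptions for illustration; check current pricing for your region.

```hcl
locals {
  nat_hourly_usd  = 0.045 # assumed hourly rate, consistent with ~$32/month
  nat_per_gb_usd  = 0.045 # per-GB processing charge quoted above
  hours_per_month = 730
  monthly_gb      = 1000 # hypothetical traffic volume

  nat_fixed_cost = local.nat_hourly_usd * local.hours_per_month # 32.85
  nat_data_cost  = local.nat_per_gb_usd * local.monthly_gb      # 45
}

output "nat_monthly_cost_usd" {
  value = local.nat_fixed_cost + local.nat_data_cost
}
```

Under these assumptions one gateway moving 1 TB a month comes to roughly $78, and the data charge already exceeds the fixed charge. Multiply the fixed part by the number of AZs if you run one gateway per AZ.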

Deploying NAT Gateway in Terraform

hcl
# Elastic IP for the NAT Gateway
resource "aws_eip" "nat" {
  domain = "vpc"
}
 
# NAT Gateway goes in a PUBLIC subnet
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
 
  depends_on = [aws_internet_gateway.main]
}
 
# Route table for private subnets — points to NAT GW
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
 
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

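Cost optimization tip: In non-production environments, use a single NAT Gateway for the whole VPC instead of one per AZ. The trade-off is that an outage in that gateway's AZ cuts outbound internet access for every private subnet, and traffic from the other AZs pays cross-AZ data transfer. Use VPC Endpoints for AWS services (S3, ECR, DynamoDB) to keep that traffic off the NAT Gateway entirely.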
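For production, the usual alternative is one NAT Gateway per AZ, each with its own private route table, so an AZ failure only affects that AZ. A sketch, assuming `aws_subnet.public` and `aws_subnet.private` are count-based resources listed in the same AZ order (the `*_per_az` resource names are hypothetical):

```hcl
resource "aws_eip" "nat_per_az" {
  count  = length(aws_subnet.public)
  domain = "vpc"
}

# One NAT Gateway in each public subnet (one per AZ)
resource "aws_nat_gateway" "per_az" {
  count         = length(aws_subnet.public)
  allocation_id = aws_eip.nat_per_az[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  depends_on = [aws_internet_gateway.main]
}

# One private route table per AZ, pointing at the NAT Gateway in the same AZ
resource "aws_route_table" "private_per_az" {
  count  = length(aws_subnet.private)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.per_az[count.index].id
  }
}

resource "aws_route_table_association" "private_per_az" {
  count          = length(aws_subnet.private)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private_per_az[count.index].id
}
```

This replaces the single `aws_route_table.private` shown earlier; a subnet can only be associated with one route table, so use one approach or the other.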


Route Tables: How Traffic Finds Its Way

Every subnet is associated with exactly one route table. The route table contains rules about where traffic should go.

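Routes are not evaluated top-to-bottom. For each packet, the route with the most specific matching destination (longest prefix) wins, regardless of the order in which the routes are listed: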

Destination         Target
10.0.0.0/16        local           ← All VPC-internal traffic stays in VPC
10.0.0.0/8         tgw-12345       ← Traffic to 10.x.x.x goes to Transit Gateway
0.0.0.0/0          nat-0a1b2c3d    ← Everything else goes to NAT (for private subnets)

The local route is automatically added and cannot be removed — it ensures all traffic within the VPC CIDR routes locally.
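The example table above could be written in Terraform roughly as follows. This is a sketch: the resource name `private_with_tgw` is hypothetical, and it assumes the Transit Gateway and VPC attachment defined later in this guide.

```hcl
resource "aws_route_table" "private_with_tgw" {
  vpc_id = aws_vpc.main.id

  # 10.0.0.0/16 -> local is implicit; Terraform does not declare it.

  # Other 10.x.x.x destinations (other VPCs, on-premises) go to the Transit Gateway
  route {
    cidr_block         = "10.0.0.0/8"
    transit_gateway_id = aws_ec2_transit_gateway.main.id
  }

  # Everything else goes out through the NAT Gateway
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }

  # The TGW route is only valid once the VPC is attached
  depends_on = [aws_ec2_transit_gateway_vpc_attachment.vpc1]
}
```

With these routes, a packet to 10.0.12.5 matches all three entries but follows the /16 local route, a packet to 10.20.1.5 matches /8 and /0 and goes to the Transit Gateway, and a packet to 8.8.8.8 only matches the default route and goes to the NAT Gateway.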


Security Groups: Stateful Firewalls

Security groups are virtual firewalls attached to individual resources (EC2 instances, RDS databases, EKS nodes, load balancers). They control inbound and outbound traffic at the resource level.

Key characteristics:

Stateful: If you allow inbound TCP on port 80, the response traffic is automatically allowed outbound — you don't need a separate outbound rule. This is the most important thing to understand about security groups.

Allow-only: Security groups only have allow rules. There are no deny rules. Traffic that doesn't match any rule is denied by default.

VPC-scoped: A security group belongs to a VPC. It can only be applied to resources in the same VPC.

Good Security Group Design

hcl
# Web tier — allow HTTP/HTTPS from internet
resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id
 
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
 
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
 
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
 
# App tier — only allow traffic FROM the web tier security group
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = aws_vpc.main.id
 
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.web.id]  # Reference SG, not CIDR
  }
 
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
 
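# Database tier: only allow from app tier
resource "aws_security_group" "db" {
  name   = "db-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  # No egress block: Terraform removes the default allow-all egress rule,
  # so the database cannot initiate outbound connections. Replies to the
  # app tier still work because security groups are stateful.
}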

Using security group IDs instead of CIDR ranges is safer — when your app tier scales, new instances automatically have access without updating rules.


NACLs: Stateless Subnet-Level Firewalls

Network ACLs (NACLs) are a second layer of security at the subnet level. Unlike security groups, they are stateless — return traffic is not automatically allowed. You need explicit allow rules for both inbound and outbound directions.

NACLs are evaluated in rule number order (lowest first). The first matching rule wins, including DENY rules.

For most applications, security groups are sufficient. Use NACLs only when you need:

  • An extra layer of defense at the subnet boundary
  • Explicit DENY rules (security groups can't deny, only allow)
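  • Blocking traffic from specific CIDRs that might be compromised

hcl
resource "aws_network_acl" "private" {
  vpc_id     = aws_vpc.main.id
  subnet_ids = aws_subnet.private[*].id

  # Allow all inbound traffic that originates inside the VPC CIDR
  ingress {
    protocol   = "-1"
    rule_no    = 100
    action     = "allow"
    cidr_block = "10.0.0.0/16"
    from_port  = 0
    to_port    = 0
  }

  # NACLs are stateless: replies to outbound connections made through the
  # NAT Gateway arrive from internet source IPs on ephemeral ports, so they
  # need their own inbound allow rule
  ingress {
    protocol   = "tcp"
    rule_no    = 110
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024
    to_port    = 65535
  }

  # Deny everything else inbound (explicit form of the implicit final deny)
  ingress {
    protocol   = "-1"
    rule_no    = 32766
    action     = "deny"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }

  # Allow all outbound
  egress {
    protocol   = "-1"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }
}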

VPC Endpoints: Private Access to AWS Services

By default, when your private subnet connects to S3, DynamoDB, ECR, or other AWS services, that traffic goes through the NAT Gateway — costing you money per GB.

VPC Endpoints create a private connection between your VPC and AWS services without going through the internet or NAT Gateway.

Two types:

Gateway Endpoints (free): S3 and DynamoDB only. Work by adding a route to your route table.

hcl
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

Interface Endpoints (roughly $7-8/month per endpoint per AZ, plus a per-GB data processing charge): Most other AWS services (ECR, SSM, Secrets Manager, etc.). Each one creates an ENI in every subnet you select.

hcl
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  private_dns_enabled = true
}

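For EKS clusters running in private subnets with no NAT Gateway, VPC Endpoints for ECR (both ecr.api and ecr.dkr), S3, EC2, and STS are essentially mandatory. Without them your nodes can't pull container images or register with the control plane. If the subnets do have a NAT route, the endpoints are optional but still reduce NAT data charges.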
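The ECR example above references `aws_security_group.vpc_endpoint`, which is not defined in this guide. A sketch of what it might look like, together with the remaining interface endpoints created in one `for_each` loop. The group name and the resource name `interface` are assumptions; `ecr.api` is left out of the loop because it already has its own resource above.

```hcl
# Interface endpoints are reached over HTTPS from inside the VPC
resource "aws_security_group" "vpc_endpoint" {
  name   = "vpc-endpoint-sg" # hypothetical name
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }
}

# One interface endpoint per service short name
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(["ecr.dkr", "ec2", "sts"])

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  private_dns_enabled = true
}
```

Image layers are stored in S3, so pulling from ECR privately also relies on the S3 gateway endpoint shown earlier.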


VPC Peering vs Transit Gateway

When you have multiple VPCs (common in multi-account setups), you need a way for them to communicate.

VPC Peering

Direct, private connection between two VPCs. Works across accounts and regions.

Limitations:

  • Does not support transitive routing. If VPC A peers with VPC B, and VPC B peers with VPC C — VPC A cannot reach VPC C through VPC B.
  • You need a separate peering connection for every pair of VPCs. A full mesh of n VPCs needs n(n-1)/2 connections, so 10 VPCs = 45 peering connections.

Best for: Small setups with 2-5 VPCs that need to connect directly.
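A peering connection carries no traffic until both VPCs have routes pointing at it. A minimal same-account, same-region sketch; all four variables are hypothetical inputs describing the peer VPC and the route tables to update:

```hcl
variable "peer_vpc_id" { type = string }          # hypothetical: ID of the other VPC
variable "peer_vpc_cidr" { type = string }        # hypothetical: its CIDR, e.g. "10.1.0.0/16"
variable "local_route_table_id" { type = string } # hypothetical: a route table in aws_vpc.main
variable "peer_route_table_id" { type = string }  # hypothetical: a route table in the peer VPC

resource "aws_vpc_peering_connection" "main_to_peer" {
  vpc_id      = aws_vpc.main.id
  peer_vpc_id = var.peer_vpc_id
  auto_accept = true # only valid when both VPCs are in the same account and region
}

# Route from this VPC to the peer
resource "aws_route" "main_to_peer" {
  route_table_id            = var.local_route_table_id
  destination_cidr_block    = var.peer_vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection.main_to_peer.id
}

# Return route from the peer back to this VPC
resource "aws_route" "peer_to_main" {
  route_table_id            = var.peer_route_table_id
  destination_cidr_block    = aws_vpc.main.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.main_to_peer.id
}
```

Standalone `aws_route` resources should not be combined with inline `route` blocks on the same `aws_route_table`, which is why the route table IDs are passed in here rather than reusing `aws_route_table.private`. Security groups on both sides still need rules that allow the peer's traffic.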

Transit Gateway

A hub-and-spoke model. All VPCs attach to the Transit Gateway and can route to each other through it.

  • Supports transitive routing
  • One attachment per VPC (instead of N-1 peering connections)
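  • Works across accounts (shared through AWS Resource Access Manager) and across regions (by peering Transit Gateways, since each one is a regional resource)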
  • Supports VPN and Direct Connect attachments

Best for: 5+ VPCs, multi-account setups, hybrid cloud.

hcl
resource "aws_ec2_transit_gateway" "main" {
  description                     = "Main Transit Gateway"
  default_route_table_association = "enable"
  default_route_table_propagation = "enable"
}
 
resource "aws_ec2_transit_gateway_vpc_attachment" "vpc1" {
  subnet_ids         = aws_subnet.private[*].id
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = aws_vpc.main.id
}

Production VPC Architecture (3-Tier)

Here's the standard 3-tier architecture you'll see in most production AWS environments:

Internet
    |
Internet Gateway
    |
┌───────────────────────────────────────────────────────┐
│  PUBLIC SUBNETS (AZ-a, AZ-b, AZ-c)                   │
│  - Application Load Balancer                          │
│  - NAT Gateway                                        │
│  - Bastion host (if needed)                           │
└───────────────────────────────────────────────────────┘
    |
┌───────────────────────────────────────────────────────┐
│  PRIVATE APP SUBNETS (AZ-a, AZ-b, AZ-c)              │
│  - EKS worker nodes / EC2 instances                   │
│  - ECS tasks                                          │
│  - Lambda (VPC-attached)                              │
└───────────────────────────────────────────────────────┘
    |
┌───────────────────────────────────────────────────────┐
│  PRIVATE DATA SUBNETS (AZ-a, AZ-b, AZ-c)             │
│  - RDS databases                                      │
│  - ElastiCache                                        │
│  - OpenSearch                                         │
└───────────────────────────────────────────────────────┘

Traffic flows:

  • Users → ALB (public subnet) → App tier (private subnet) → Database (private subnet)
  • App tier → NAT Gateway → Internet (for outbound calls to APIs, etc.)
  • App tier → VPC Endpoints → AWS services (no NAT Gateway charges)

Common VPC Mistakes

Using the default VPC in production. The default VPC has all subnets as public. Never run production workloads there.

Not enabling VPC Flow Logs. Flow logs record metadata for IP traffic in the VPC (source, destination, ports, protocol, bytes, accept or reject), not packet contents. They are essential for security investigation and debugging. Enable them on every production VPC.

hcl
resource "aws_flow_log" "main" {
  vpc_id          = aws_vpc.main.id
  traffic_type    = "ALL"
  iam_role_arn    = aws_iam_role.flow_logs.arn
  log_destination = aws_cloudwatch_log_group.flow_logs.arn
}
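The flow log above depends on a CloudWatch log group and an IAM role that the VPC Flow Logs service can assume. A sketch of those two pieces; the names and the 30-day retention are assumptions, and the wildcard resource can be tightened to the log group once it works:

```hcl
resource "aws_cloudwatch_log_group" "flow_logs" {
  name              = "/vpc/main/flow-logs" # hypothetical name
  retention_in_days = 30                    # assumed retention period
}

# Role assumed by the VPC Flow Logs service
resource "aws_iam_role" "flow_logs" {
  name = "vpc-flow-logs" # hypothetical name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "vpc-flow-logs.amazonaws.com" }
    }]
  })
}

# Permissions the service needs to write log events
resource "aws_iam_role_policy" "flow_logs" {
  name = "vpc-flow-logs-write" # hypothetical name
  role = aws_iam_role.flow_logs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
      ]
      Resource = "*"
    }]
  })
}
```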

Overlapping CIDRs. Plan your CIDR ranges before deploying. You cannot peer or connect VPCs with overlapping IP ranges.

Single AZ subnets. Always deploy to at least two AZs. One AZ failure should not take down your application.

Too-small subnets. Running an EKS cluster in a /28 (11 usable IPs) doesn't work. With the Amazon VPC CNI, every pod takes an IP address from the node's subnet, so a handful of nodes can exhaust a small range. Plan for growth and use /22 or larger for EKS node subnets.
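The usable-address arithmetic is easy to check: a subnet with prefix length p holds 2^(32-p) addresses, minus the 5 that AWS reserves. A small sketch using Terraform's `pow` function (the local names are illustrative):

```hcl
locals {
  prefix_lengths = [28, 24, 22]

  # Total addresses in the block minus the 5 AWS-reserved addresses
  usable_ips = {
    for p in local.prefix_lengths : "/${p}" => pow(2, 32 - p) - 5
  }
}

output "usable_ips" {
  value = local.usable_ips
}
```

This evaluates to 11 usable addresses for a /28, 251 for a /24, and 1019 for a /22.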


Keep Learning

AWS networking is deep. This guide covers the core concepts, but there's much more: PrivateLink, Route 53 Resolver, Network Firewall, Direct Connect, and multi-region architectures.

VPC networking is one of those topics where the concepts are simple once they click — but they only click when you build it yourself.
