
Common Anti-Patterns


A pattern is a recipe with known trade-offs; an anti-pattern is a recipe with known regret. These are the mistakes that show up most often in AWS workloads — some obvious in hindsight, some that look reasonable until they break.

Knowing about them ahead of time is cheaper than discovering them during an incident. Most have a corresponding AWS service or built-in feature that does the right thing.

Identity, Network, and Data

Identity: long-lived IAM access keys for humans, wildcard policies granted 'to be easier later', using IAM for application end-user authentication, and skipping MFA on the root account.

Network: opening port 22 (SSH) to 0.0.0.0/0, single-AZ production, hard-coded CIDRs in security groups, one NAT Gateway shared by all AZs, forgetting to enable cross-zone load balancing on NLBs, and routing AWS-service traffic (for example to S3) through NAT instead of VPC endpoints.
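Several of the identity and network mistakes above can be caught mechanically. The sketch below assumes you already have security groups in the shape returned by EC2 `DescribeSecurityGroups` and IAM policy documents as parsed JSON; nothing in it calls AWS. It flags sensitive ports open to `0.0.0.0/0` or `::/0` and `Allow` statements with a bare `*` action or resource. The port list is an assumption, and the checks deliberately ignore `NotAction`, conditions, and service-level wildcards such as `s3:*`.

```python
# Sketch: flag world-open sensitive ports and bare-wildcard IAM statements.
# Inputs are assumed to be already-parsed API/JSON data; no AWS calls are made.

# Assumption: SSH plus two common database ports; adjust for your workloads.
SENSITIVE_PORTS = {22, 3306, 5432}
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def world_open_ports(security_group: dict) -> list[int]:
    """Sensitive TCP ports this DescribeSecurityGroups-style group opens to the internet."""
    exposed = set()
    for perm in security_group.get("IpPermissions", []):
        cidrs = {r["CidrIp"] for r in perm.get("IpRanges", [])}
        cidrs |= {r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])}
        if not cidrs & WORLD_CIDRS:
            continue  # rule is scoped to specific ranges, not the whole internet
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            exposed |= SENSITIVE_PORTS  # "-1" means all protocols and all ports
        elif protocol in ("tcp", "6"):
            low, high = perm.get("FromPort", 0), perm.get("ToPort", 65535)
            exposed |= {p for p in SENSITIVE_PORTS if low <= p <= high}
    return sorted(exposed)


def _as_list(value) -> list:
    """IAM accepts either a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def wildcard_statements(policy: dict) -> list[dict]:
    """Allow statements whose Action or Resource includes a bare "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]  # a lone statement may be written as an object
    return [
        s for s in statements
        if s.get("Effect") == "Allow"
        and ("*" in _as_list(s.get("Action", [])) or "*" in _as_list(s.get("Resource", [])))
    ]


ssh_to_world = {"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
]}
print(world_open_ports(ssh_to_world))  # [22]
```

A group that references another security group instead of a CIDR never matches here, which is the pattern the best practices recommend.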

Data: single-AZ databases, untested backups, forever log retention, CloudWatch metric cardinality explosions, single-table DynamoDB design adopted on day one, hot partition keys, storing files in DynamoDB instead of S3, and forgetting that local secondary indexes (LSIs) can only be created along with the table.
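Metric cardinality explodes because CloudWatch treats every unique combination of dimension values as a separate custom metric. This sketch uses made-up request events (the field names are hypothetical) to compare the series count with low-cardinality dimensions against adding a per-request ID, and shows the aggregate-before-publishing fix.

```python
import math
from collections import Counter


def series_count(events: list[dict], dimension_keys: list[str]) -> int:
    """Distinct metric series produced if each event is published with these dimensions."""
    return len({tuple(e[k] for k in dimension_keys) for e in events})


def worst_case_series(distinct_values_per_dimension: list[int]) -> int:
    """Upper bound on series: the product of distinct values per dimension."""
    return math.prod(distinct_values_per_dimension)


def aggregate(events: list[dict], dimension_keys: list[str]) -> Counter:
    """Pre-aggregate raw events into one count per dimension combination."""
    return Counter(tuple(e[k] for k in dimension_keys) for e in events)


# Hypothetical traffic: 1,000 requests over 3 services and 2 status values.
events = [
    {"service": f"svc{i % 3}", "status": "ok" if i % 2 else "error", "request_id": f"req-{i}"}
    for i in range(1000)
]

print(series_count(events, ["service", "status"]))                # 6
print(series_count(events, ["service", "status", "request_id"]))  # 1000
print(worst_case_series([3, 2, 50, 20]))                          # 6000
```

Publishing `aggregate(events, ["service", "status"])` sends six counts instead of creating a thousand distinct series; per-request detail is usually better kept in logs than in metric dimensions.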

Compute and Operations

Compute: running EC2 for everything out of habit, manually patching instances instead of Systems Manager, long-running Lambdas that should be Step Functions, Spot for stateful workloads, and a single Region for global users with no CloudFront.
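A Lambda invocation is capped at 15 minutes, so a job creeping toward that limit is usually better split into steps that Step Functions orchestrates. The sketch below builds an Amazon States Language definition that chains Lambda tasks with retries. The step names and function ARNs are hypothetical, and the retry settings are illustrative rather than recommended values.

```python
import json
from typing import Optional


def lambda_task(function_arn: str, next_state: Optional[str]) -> dict:
    """One Task state that invokes a Lambda function and retries failed attempts."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
        "OutputPath": "$.Payload",  # pass only the function's return value onward
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def chain(steps: list[tuple[str, str]]) -> dict:
    """Build a linear state machine from (state name, function ARN) pairs."""
    names = [name for name, _ in steps]
    states = {}
    for i, (name, arn) in enumerate(steps):
        following = names[i + 1] if i + 1 < len(names) else None
        states[name] = lambda_task(arn, following)
    return {"StartAt": names[0], "States": states}


# Hypothetical ARNs for a job that used to run as one long function.
definition = chain([
    ("Extract", "arn:aws:lambda:us-east-1:123456789012:function:extract"),
    ("Transform", "arn:aws:lambda:us-east-1:123456789012:function:transform"),
    ("Load", "arn:aws:lambda:us-east-1:123456789012:function:load"),
])
print(json.dumps(definition, indent=2))
```

Each step gets its own 15-minute budget and retry policy, and the execution history shows which step failed.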

Operations: console-only changes drifting from IaC, skipping the change set on production CloudFormation updates, one CI/CD pipeline for everything, manual rollbacks, pager fatigue from over-alerting, and no runbook for common incidents.
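Reviewing a change set before executing it is how a replacement or deletion gets caught before it happens. This sketch reads a response shaped like CloudFormation `DescribeChangeSet` output and lists the changes that deserve a human look: removals and replacements. Wiring it into a pipeline, for example failing the deploy step when the list is non-empty, is an assumption left to you.

```python
def review_reasons(change_set: dict) -> list[str]:
    """Changes in a DescribeChangeSet-style response that deserve a human look."""
    reasons = []
    for change in change_set.get("Changes", []):
        rc = change.get("ResourceChange", {})
        label = f'{rc.get("LogicalResourceId", "?")} ({rc.get("ResourceType", "?")})'
        if rc.get("Action") == "Remove":
            reasons.append(f"{label} will be deleted")
        elif rc.get("Replacement") == "True":
            reasons.append(f"{label} will be replaced")
        elif rc.get("Replacement") == "Conditional":
            reasons.append(f"{label} may be replaced")
    return reasons


# Hypothetical change set: one replacement, one harmless add, one deletion.
change_set = {"Changes": [
    {"Type": "Resource", "ResourceChange": {
        "Action": "Modify", "LogicalResourceId": "Db",
        "ResourceType": "AWS::RDS::DBInstance", "Replacement": "True"}},
    {"Type": "Resource", "ResourceChange": {
        "Action": "Add", "LogicalResourceId": "Queue", "ResourceType": "AWS::SQS::Queue"}},
    {"Type": "Resource", "ResourceChange": {
        "Action": "Remove", "LogicalResourceId": "OldBucket", "ResourceType": "AWS::S3::Bucket"}},
]}
for reason in review_reasons(change_set):
    print(reason)
```

A replaced database is exactly the kind of change that looks like a routine "Modify" in a template diff but means new storage in practice.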

Architecture, Cost, and Security

Architecture: premature microservices, premature multi-Region, premature CQRS/event-sourcing, building in-house what AWS already offers as a managed service, and lift-and-shift without rethinking. The recurring theme: premature optimization is more common and more costly than under-optimization. Start simple; add complexity when measurements demand it.

Cost: untagged resources, dev environments running 24/7, On-Demand pricing for steady workloads, Multi-AZ in dev, and never deleting old snapshots.

Security: secrets in source code, accidentally public buckets, disabling CloudTrail to save money, skipping GuardDuty in non-production, and no service control policies (SCPs).
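Untagged resources make it hard to say who owns a cost. A minimal audit might look like the following, assuming input shaped like the `ResourceTagMappingList` returned by the Resource Groups Tagging API `GetResources` call and a hypothetical set of required tag keys.

```python
# Hypothetical tag policy. AWS tag keys are case-sensitive, so "Owner" != "owner".
REQUIRED_TAGS = ("owner", "cost-center", "environment")


def missing_tags(mappings: list[dict], required=REQUIRED_TAGS) -> dict[str, list[str]]:
    """Map each resource ARN to the required tag keys it lacks."""
    report = {}
    for mapping in mappings:
        present = {tag["Key"] for tag in mapping.get("Tags", [])}
        absent = [key for key in required if key not in present]
        if absent:
            report[mapping["ResourceARN"]] = absent
    return report


# Hypothetical inventory: one partially tagged bucket, one fully tagged queue.
mappings = [
    {"ResourceARN": "arn:aws:s3:::logs", "Tags": [{"Key": "owner", "Value": "data"}]},
    {"ResourceARN": "arn:aws:sqs:us-east-1:123456789012:jobs", "Tags": [
        {"Key": "owner", "Value": "platform"},
        {"Key": "cost-center", "Value": "42"},
        {"Key": "environment", "Value": "prod"},
    ]},
]
print(missing_tags(mappings))  # {'arn:aws:s3:::logs': ['cost-center', 'environment']}
```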

Premature optimization vs under-optimization

Premature optimization: microservices, multi-Region, or CQRS adopted before measurement justifies them. This is the more common and costlier error.

Start simple: a well-structured monolith, Multi-AZ in a single Region, and a normal database. Add complexity when it hurts.

Under-optimization: rarer, and usually caught by the bill or an incident before it becomes existential.

Common Mistakes

Long-lived IAM access keys for humans and wildcard policies granted 'to be easier later'.
Opening port 22 (or the database) to 0.0.0.0/0 and running single-AZ production.
Untested backups, forever log retention, and CloudWatch metric cardinality explosions.
Premature microservices, multi-Region, or CQRS before measurement justifies the complexity.
Storing secrets in source code and leaving S3 buckets accidentally public.
Disabling CloudTrail to save money or skipping GuardDuty and SCPs in non-production accounts.
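For the secrets-in-source-code mistake, a pre-commit check can catch the most recognizable case. This is a deliberately narrow heuristic that assumes the common access key ID format (`AKIA` for long-term keys, `ASIA` for temporary ones, followed by 16 uppercase letters or digits). It will not find secret access keys, passwords, or tokens, which is why the real fix is keeping secrets in Secrets Manager rather than scanning harder. The sample uses the placeholder key ID from AWS documentation.

```python
import re

# Heuristic only: matches strings shaped like AWS access key IDs.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, match) pairs for likely access key IDs in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        hits.extend((lineno, m.group(0)) for m in ACCESS_KEY_ID.finditer(line))
    return hits


source = 'import boto3\nKEY_ID = "AKIAIOSFODNN7EXAMPLE"\nREGION = "us-east-1"\n'
print(find_access_key_ids(source))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
```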

Best Practices

Use Identity Center and roles; least-privilege policies tightened from real usage; MFA on root.
Default to private subnets, security groups referencing other groups, NAT per AZ, and VPC endpoints.
Make Multi-AZ the production floor, set log retention deliberately, and aggregate metrics before publishing.
Start simple — monolith, single Region, normal database — and add complexity only when measured.
Keep secrets in Secrets Manager, and keep S3 Block Public Access, CloudTrail, and GuardDuty enabled in every account.
Use change sets, automated rollback, per-service pipelines, and practiced runbooks.
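The NAT-per-AZ practice can be checked from inventory you already have. This sketch assumes you have joined `DescribeSubnets`, `DescribeRouteTables`, and `DescribeNatGateways` output into flat, hypothetical records: each private subnet with its AZ and the NAT gateway its `0.0.0.0/0` route targets, plus a map from NAT gateway ID to AZ. It flags subnets whose egress depends on another AZ's NAT gateway; those subnets lose internet egress if that AZ fails and incur cross-AZ data transfer charges in the meantime.

```python
def cross_az_nat_routes(private_subnets: list[dict], nat_gateway_az: dict[str, str]) -> list[tuple]:
    """Private subnets whose default route points at a NAT gateway outside their own AZ."""
    problems = []
    for subnet in private_subnets:
        nat_id = subnet.get("NatGatewayId")
        if nat_id is None:
            continue  # no NAT route, e.g. an isolated subnet using only VPC endpoints
        nat_az = nat_gateway_az.get(nat_id)  # None if the NAT is unknown; flagged too
        if nat_az != subnet["AvailabilityZone"]:
            problems.append((subnet["SubnetId"], subnet["AvailabilityZone"], nat_id, nat_az))
    return problems


# Hypothetical VPC: two private subnets sharing one NAT gateway in us-east-1a.
subnets = [
    {"SubnetId": "subnet-a", "AvailabilityZone": "us-east-1a", "NatGatewayId": "nat-1"},
    {"SubnetId": "subnet-b", "AvailabilityZone": "us-east-1b", "NatGatewayId": "nat-1"},
]
nat_azs = {"nat-1": "us-east-1a"}
print(cross_az_nat_routes(subnets, nat_azs))  # [('subnet-b', 'us-east-1b', 'nat-1', 'us-east-1a')]
```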

Comparable services: GCP Architecture Framework anti-patterns guidance; Azure Well-Architected antipatterns guidance

Knowledge Check

What is the recurring theme across AWS architecture anti-patterns?

Premature optimization (microservices, multi-Region, CQRS) is costlier than under-optimization — start simple
Under-provisioning capacity, not over-engineering, is by far the single dominant and costliest mistake teams make
Leaning on managed services like RDS and SQS is usually the wrong call and adds avoidable cost
Multi-AZ deployment is needless over-engineering for production workloads

Which is a common identity anti-pattern?

Long-lived IAM access keys for human engineers instead of Identity Center with MFA
Using IAM roles for EC2 instances instead of embedding long-lived access keys in the AMI
Enabling MFA on the root account and locking away its credentials
Granting least privilege and scoping each policy narrowly

Why is 'disabling CloudTrail to save money' an anti-pattern?

The audit log is the evidence base for every incident response and compliance audit; management events are cheap
CloudTrail simply cannot be disabled once a trail has been created
Capturing every management event measurably improves request performance but quietly weakens your account's security posture
The audit trail only matters in the production account and Region

What is the right response to most of these anti-patterns?

Use the corresponding AWS service or built-in feature that does the right thing, and start simple
Build a custom in-house solution from scratch to avoid vendor lock-in
Adopt every advanced pattern up front so the architecture never has to be refactored or revisited later on
Ignore them until a production incident forces an emergency fix
