Common Anti-Patterns
A pattern is a recipe with known trade-offs; an anti-pattern is a recipe with known regret. These are the mistakes that show up most often in AWS workloads — some obvious in hindsight, some that look reasonable until they break.
Knowing about them ahead of time is cheaper than discovering them during an incident. Most have a corresponding AWS service or built-in feature that does the right thing.
Identity, Network, and Data
Identity: long-lived IAM access keys for humans, wildcard policies granted 'to be easier later', using IAM for application end-user authentication, and skipping MFA on the root account.
Network: opening port 22 to the world, running production in a single AZ, hard-coding CIDRs in security groups, sharing one NAT Gateway across all AZs, forgetting that cross-zone load balancing is off by default on an NLB, and sending AWS-service traffic through NAT instead of VPC endpoints.
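Two of the items above, wildcard policies and SSH open to the world, are easy to detect mechanically. The following is a minimal sketch, not a full auditing tool: it assumes the policy is an IAM JSON policy document already loaded as a Python dict, and that the ingress rules follow the shape of the `IpPermissions` list returned by the EC2 DescribeSecurityGroups API. The function names are hypothetical.

```python
def _as_list(value):
    """IAM policy elements may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy):
    """Return Allow statements whose Action and Resource both include a bare "*"."""
    flagged = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions and "*" in resources:
            flagged.append(stmt)
    return flagged


def world_open_ssh(ip_permissions):
    """Return ingress rules that expose TCP port 22 to 0.0.0.0/0 or ::/0."""
    flagged = []
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports, so port 22 is included.
            covers_ssh = True
        elif protocol == "tcp":
            covers_ssh = rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 65535)
        else:
            covers_ssh = False
        open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
        open_v6 = any(r.get("CidrIpv6") == "::/0" for r in rule.get("Ipv6Ranges", []))
        if covers_ssh and (open_v4 or open_v6):
            flagged.append(rule)
    return flagged


# Illustrative inputs only; the bucket name is made up.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}
example_rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]

print(len(wildcard_statements(example_policy)), "wildcard statement(s)")
print(len(world_open_ssh(example_rules)), "rule(s) exposing SSH")
```

The check is deliberately narrow: it catches only the bare `"*"` case, not partial wildcards such as `s3:*`, which would need a judgment call per service.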
Data: single-AZ databases, untested backups, log retention left at 'never expire', CloudWatch metric cardinality explosions, single-table DynamoDB design adopted on day one, hot partition keys, storing files in DynamoDB instead of S3, and forgetting that local secondary indexes (LSIs) can only be created together with the table.
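The cardinality problem is mostly arithmetic: CloudWatch treats every unique combination of dimension values as a separate metric. The sketch below estimates an upper bound on the number of metrics by multiplying the distinct values per dimension. The dimension names and counts are invented for illustration, and the estimate assumes every combination actually occurs.

```python
from math import prod


def metric_series_upper_bound(distinct_values_per_dimension):
    """Upper bound on unique metrics: the product of distinct values per dimension."""
    return prod(distinct_values_per_dimension.values())


# Hypothetical dimensions; a per-user dimension is what causes the explosion.
per_user = {"Service": 20, "Endpoint": 50, "UserId": 100_000}
aggregated = {"Service": 20, "Endpoint": 50}

print(metric_series_upper_bound(per_user))    # 20 * 50 * 100000
print(metric_series_upper_bound(aggregated))  # 20 * 50
```

Dropping the high-cardinality dimension before publishing, and keeping per-user detail in logs instead, is one way to stay on the small side of that multiplication.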
Compute and Operations
Compute: running everything on EC2 out of habit, patching instances by hand instead of using Systems Manager, long-running Lambda functions that should be Step Functions workflows, Spot Instances for stateful workloads, and a single Region serving global users with no CloudFront in front.
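To make the Lambda item concrete: a function that starts a job and then sleeps while polling for completion pays for idle time and runs into Lambda's 15-minute limit. A possible alternative is a state machine that waits between short checks. The sketch below builds an Amazon States Language definition as a Python dict. The Lambda ARNs, state names, and the `$.status` field are all assumptions made for the example, not part of the original text.

```python
import json

# Hypothetical function ARNs; substitute real ones before use.
START_JOB_ARN = "arn:aws:lambda:us-east-1:123456789012:function:start-job"
CHECK_JOB_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-job"

definition = {
    "Comment": "Poll a long-running job with short tasks instead of one sleeping Lambda",
    "StartAt": "StartJob",
    "States": {
        "StartJob": {"Type": "Task", "Resource": START_JOB_ARN, "Next": "WaitForJob"},
        # The wait happens inside the state machine, not inside a running function.
        "WaitForJob": {"Type": "Wait", "Seconds": 60, "Next": "CheckJob"},
        # Assumes the check function returns a document with a "status" field.
        "CheckJob": {"Type": "Task", "Resource": CHECK_JOB_ARN, "Next": "IsJobDone"},
        "IsJobDone": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "Done"},
                {"Variable": "$.status", "StringEquals": "FAILED", "Next": "JobFailed"},
            ],
            "Default": "WaitForJob",
        },
        "JobFailed": {"Type": "Fail", "Error": "JobFailed", "Cause": "Job reported FAILED"},
        "Done": {"Type": "Succeed"},
    },
}


def dangling_targets(defn):
    """Return transition targets that do not name a defined state."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets - set(states)


print(json.dumps(definition, indent=2))
print("dangling targets:", dangling_targets(definition))
```

The `dangling_targets` helper is only a local sanity check on the dict; it does not replace validation by the Step Functions service itself.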
Operations: console-only changes drifting from IaC, skipping the change set on production CloudFormation updates, one CI/CD pipeline for everything, manual rollbacks, pager fatigue from over-alerting, and no runbook for common incidents.
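The value of a change set is that it shows removals and replacements before they happen. As a rough sketch, the function below scans a list shaped like the `Changes` field of a CloudFormation DescribeChangeSet response and flags the entries most likely to cause data loss or downtime. The function name and the sample resources are hypothetical, and a real review would look at more than these two signals.

```python
def risky_changes(changes):
    """Flag resources a change set would remove or (possibly) replace."""
    flagged = []
    for change in changes:
        rc = change.get("ResourceChange", {})
        action = rc.get("Action")
        replacement = rc.get("Replacement")
        logical_id = rc.get("LogicalResourceId")
        if action == "Remove":
            flagged.append((logical_id, "removed"))
        elif action == "Modify" and replacement in ("True", "Conditional"):
            # "Conditional" means replacement depends on values known only at deploy time.
            flagged.append((logical_id, f"replacement={replacement}"))
    return flagged


# Illustrative change list; the resource names are made up.
example_changes = [
    {"Type": "Resource", "ResourceChange": {
        "Action": "Modify", "LogicalResourceId": "Database",
        "ResourceType": "AWS::RDS::DBInstance", "Replacement": "True"}},
    {"Type": "Resource", "ResourceChange": {
        "Action": "Modify", "LogicalResourceId": "Api",
        "ResourceType": "AWS::Lambda::Function", "Replacement": "False"}},
    {"Type": "Resource", "ResourceChange": {
        "Action": "Remove", "LogicalResourceId": "OldQueue",
        "ResourceType": "AWS::SQS::Queue"}},
]

for logical_id, reason in risky_changes(example_changes):
    print(f"{logical_id}: {reason}")
```

A pipeline could run a check like this between creating and executing the change set and require manual approval whenever the list is non-empty.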
Architecture, Cost, and Security
Architecture: premature microservices, premature multi-Region, premature CQRS/event-sourcing, building a custom version of something AWS already offers as a managed service, and lift-and-shift without rethinking. The recurring theme: premature optimization is more common and more costly than under-optimization. Start simple; add complexity when measurements demand it.
Cost: untagged resources, dev running 24/7, On-Demand for steady workloads, Multi-AZ in dev, and never deleting old snapshots.
Security: secrets in source code, accidentally public buckets, disabling CloudTrail to save money, skipping GuardDuty in non-production, and no SCPs.
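Secrets in source code are often findable with a simple pattern match. The sketch below looks for strings shaped like AWS access key IDs, assuming the commonly documented form of an `AKIA` (long-term) or `ASIA` (temporary) prefix followed by 16 uppercase letters or digits. It is a heuristic only: it will miss secret access keys, passwords, and tokens, and dedicated scanners cover far more cases.

```python
import re

# Assumed shape of an access key ID: known prefix plus 16 uppercase alphanumerics.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, match) pairs for strings that look like access key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample_source = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'

for lineno, key_id in find_access_key_ids(sample_source):
    print(f"line {lineno}: possible access key ID {key_id}")
```

Running a check like this in a pre-commit hook or CI step catches the obvious cases early; the durable fix is still to keep credentials out of the repository altogether.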
Premature optimization: microservices, multi-Region, or CQRS adopted before measurement justifies them. This is the more common and costlier error.
Start simple: a well-structured monolith, a single Region with Multi-AZ, and a normal database. Add complexity when the simple version starts to hurt.
Under-optimization: rarer, and usually caught by the bill or an incident before it becomes existential.
Anti-patterns to avoid:
- Long-lived IAM access keys for humans and wildcard policies granted 'to be easier later'.
- Opening port 22 (or the database) to 0.0.0.0/0 and running single-AZ production.
- Untested backups, forever log retention, and CloudWatch metric cardinality explosions.
- Premature microservices, multi-Region, or CQRS before measurement justifies the complexity.
- Storing secrets in source code and leaving S3 buckets accidentally public.
- Disabling CloudTrail to save money or skipping GuardDuty and SCPs in non-production accounts.
What to do instead:
- Use Identity Center and roles, least-privilege policies tightened from real usage, and MFA on root.
- Default to private subnets, security groups referencing other groups, NAT per AZ, and VPC endpoints.
- Make Multi-AZ the production floor, set log retention deliberately, and aggregate metrics before publishing.
- Start simple — monolith, single Region, normal database — and add complexity only when measured.
- Keep secrets in Secrets Manager, Block Public Access on, CloudTrail on, and GuardDuty everywhere.
- Use change sets, automated rollback, per-service pipelines, and practiced runbooks.
Knowledge Check
What is the recurring theme across AWS architecture anti-patterns?
- Premature optimization (microservices, multi-Region, CQRS) is costlier than under-optimization — start simple
- Under-provisioning capacity, not over-engineering, is by far the single dominant and costliest mistake teams make
- Leaning on managed services like RDS and SQS is usually the wrong call and adds avoidable cost
- Multi-AZ deployment is needless over-engineering for production workloads
Which is a common identity anti-pattern?
- Long-lived IAM access keys for human engineers instead of Identity Center with MFA
- Using IAM roles for EC2 instances instead of embedding long-lived access keys in the AMI
- Enabling MFA on the root account and locking away its credentials
- Granting least privilege and scoping each policy narrowly
Why is 'disabling CloudTrail to save money' an anti-pattern?
- The audit log is the evidence base for every incident response and compliance audit; management events are cheap
- CloudTrail simply cannot be disabled once a trail has been created
- Capturing every management event measurably improves request performance but quietly weakens your account's security posture
- The audit trail only matters in the production account and Region
What is the right response to most of these anti-patterns?
- Use the corresponding AWS service or built-in feature that does the right thing, and start simple
- Build a custom in-house solution from scratch to avoid vendor lock-in
- Adopt every advanced pattern up front so the architecture never has to be refactored or revisited later on
- Ignore them until a production incident forces an emergency fix