Common Anti-Patterns

Architecture, Pitfalls, Practices

A pattern is a recipe with known trade-offs; an anti-pattern is a recipe with known regret. These are the mistakes that show up most often in AWS workloads — some obvious in hindsight, some that look reasonable until they break.

Knowing about them ahead of time is cheaper than discovering them during an incident. Most have a corresponding AWS service or built-in feature that does the right thing.

Identity, Network, and Data

Identity: long-lived IAM access keys for humans, wildcard policies granted 'to be easier later', using IAM for application end-user authentication, and skipping MFA on the root account.

Network: opening port 22 to the world, single-AZ production, hard-coded CIDRs in security groups, one NAT Gateway shared by all AZs, forgetting that NLB cross-zone load balancing is off by default, and sending AWS-service traffic through NAT instead of VPC endpoints.
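As a rough illustration of two of these checks, the sketch below flags IAM policy statements that allow a bare wildcard action or resource, and security group rules that expose port 22 to any address. It works on plain dictionaries shaped like an IAM policy document and like the IpPermissions entries returned by the EC2 DescribeSecurityGroups API. The function names and sample data are hypothetical, and a real audit would more likely rely on managed tooling than on a hand-written script.

```python
def _as_list(value):
    """IAM accepts a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy):
    """Return Allow statements whose Action or Resource is a bare '*'."""
    flagged = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions or "*" in resources:
            flagged.append(stmt)
    return flagged


OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ssh_rules(ip_permissions, port=22):
    """Return ingress rules that expose `port` to any address."""
    flagged = []
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            covers_port = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            covers_port = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            covers_port = False
        cidrs = {r.get("CidrIp") for r in rule.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])}
        if covers_port and cidrs & OPEN_CIDRS:
            flagged.append(rule)
    return flagged


# Hypothetical sample data for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}
rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
]
```

Here `wildcard_statements(policy)` returns only the first statement, and `open_ssh_rules(rules)` returns only the first rule; the public HTTPS rule and the internal SSH rule pass.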

Data: single-AZ databases, untested backups, logs retained forever, CloudWatch metric cardinality explosions, single-table DynamoDB design on day one, hot partition keys, storing files in DynamoDB instead of S3, and forgetting that local secondary indexes (LSIs) can only be created together with the table.
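One common mitigation for a hot partition key is write sharding: spreading writes for a busy logical key across several physical partition key values. A minimal sketch, assuming a hypothetical key format of `<base>#<shard>` and a shard count of 10 chosen only for illustration:

```python
import hashlib

SHARD_COUNT = 10  # assumption: tune from measured write throughput


def sharded_partition_key(base_key, item_id, shards=SHARD_COUNT):
    """Derive a stable shard suffix from item_id so that writes for one
    base_key spread across `shards` partition key values."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, shards=SHARD_COUNT):
    """Readers have to query every shard and merge the results."""
    return [f"{base_key}#{n}" for n in range(shards)]
```

The trade-off is on the read side: a query for one logical key becomes a fan-out across all shard keys, so this is worth adding only once a hot key has actually been measured.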

Compute and Operations

Compute: running EC2 for everything out of habit, patching instances by hand instead of with Systems Manager, long-running Lambda functions that should be Step Functions workflows, Spot Instances for stateful workloads, and serving global users from a single Region with no CloudFront.

Operations: console-only changes drifting from IaC, skipping the change set on production CloudFormation updates, one CI/CD pipeline for everything, manual rollbacks, pager fatigue from over-alerting, and no runbook for common incidents.
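One way to reduce pager fatigue is to page only when a threshold is breached in several recent evaluation periods rather than on a single spike, which is the idea behind the "M out of N" datapoints-to-alarm setting on CloudWatch alarms. The sketch below illustrates that idea in plain Python; `should_page` and its parameters are hypothetical names, not CloudWatch's implementation.

```python
def should_page(datapoints, threshold, datapoints_to_alarm=3, evaluation_periods=5):
    """Return True when at least `datapoints_to_alarm` of the most recent
    `evaluation_periods` datapoints exceed `threshold`."""
    recent = datapoints[-evaluation_periods:]
    breaches = sum(1 for value in recent if value > threshold)
    return breaches >= datapoints_to_alarm


# Hypothetical CPU readings: one spike versus a sustained breach.
spike = [40, 42, 95, 41, 43]
sustained = [40, 91, 95, 93, 60]
```

With a threshold of 90, `should_page(spike, 90)` is False and `should_page(sustained, 90)` is True, so the single spike does not wake anyone up.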

Architecture, Cost, and Security

Architecture: premature microservices, premature multi-Region, premature CQRS/event-sourcing, building a custom version of something AWS already offers as a managed service, and lift-and-shift without rethinking the design. The recurring theme: premature optimization is more common and more costly than under-optimization. Start simple; add complexity when measurements demand it.

Cost: untagged resources, dev environments running 24/7, On-Demand pricing for steady workloads, Multi-AZ in dev, and never deleting old snapshots.

Security: secrets in source code, accidentally public buckets, disabling CloudTrail to save money, skipping GuardDuty in non-production, and no service control policies (SCPs).
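The cost of leaving dev running around the clock is simple arithmetic: a week has 168 hours, so an environment needed only during working hours sits idle for most of them. A small sketch, assuming a hypothetical schedule of 10 hours a day on 5 days a week and a cost that scales linearly with running hours (which ignores storage and other charges that continue while instances are stopped):

```python
HOURS_PER_WEEK = 24 * 7  # 168


def idle_fraction(hours_per_day, days_per_week):
    """Fraction of the week an environment runs without being needed."""
    needed = hours_per_day * days_per_week
    if not 0 <= needed <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit within one week")
    return 1 - needed / HOURS_PER_WEEK


# 50 of 168 hours are needed, so roughly 70% of the week is idle.
print(f"{idle_fraction(10, 5):.0%}")  # prints 70%
```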

Premature optimization vs under-optimization

Premature optimization: microservices, multi-Region, or CQRS adopted before measurement justifies them. This is the more common and costlier error.

Start simple: a well-structured monolith, a single Region with Multi-AZ, and a conventional database. Add complexity when the simple design starts to hurt.

Under-optimization: rarer, and usually caught by the bill or an incident before it becomes existential.

Common Mistakes
  • Long-lived IAM access keys for humans and wildcard policies granted 'to be easier later'.
  • Opening port 22 (or the database) to 0.0.0.0/0 and running single-AZ production.
  • Untested backups, forever log retention, and CloudWatch metric cardinality explosions.
  • Premature microservices, multi-Region, or CQRS before measurement justifies the complexity.
  • Storing secrets in source code and leaving S3 buckets accidentally public.
  • Disabling CloudTrail to save money or skipping GuardDuty and SCPs in non-production accounts.
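
A rough sketch of a pre-commit style check for the 'secrets in source code' mistake. It looks only for strings shaped like AWS access key IDs (20 characters, starting with AKIA for long-lived keys or ASIA for temporary ones); the function name is hypothetical, and dedicated secret scanners cover far more than this single pattern.

```python
import re

# Access key IDs are 20 uppercase alphanumeric characters; the AKIA and
# ASIA prefixes cover long-lived and temporary keys respectively.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, match) pairs for strings shaped like key IDs."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.findall(line):
            hits.append((number, match))
    return hits


# The key below is the placeholder used in AWS documentation, not a real one.
sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
```

Running `find_access_key_ids(sample)` reports the placeholder key on line 2; a match is a prompt to move the value into Secrets Manager and rotate it, not proof of a leak.
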
Best Practices
  • Use Identity Center and roles; least-privilege policies tightened from real usage; MFA on root.
  • Default to private subnets, security groups referencing other groups, NAT per AZ, and VPC endpoints.
  • Make Multi-AZ the production floor, set log retention deliberately, and aggregate metrics before publishing.
  • Start simple — monolith, single Region, normal database — and add complexity only when measured.
  • Keep secrets in Secrets Manager, Block Public Access on, CloudTrail on, and GuardDuty everywhere.
  • Use change sets, automated rollback, per-service pipelines, and practiced runbooks.
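
To illustrate 'aggregate metrics before publishing': emitting one metric per user or request ID creates a new metric series for every distinct dimension value, so samples can instead be rolled up per low-cardinality dimension first. The sketch below produces one summary per endpoint in a shape that mirrors the StatisticValues structure accepted by CloudWatch PutMetricData; the sample records and the function name `aggregate_by_endpoint` are hypothetical.

```python
from collections import defaultdict


def aggregate_by_endpoint(samples):
    """Collapse (endpoint, user_id, latency_ms) samples into one statistic
    set per endpoint, dropping the high-cardinality user_id."""
    grouped = defaultdict(list)
    for endpoint, _user_id, latency_ms in samples:
        grouped[endpoint].append(latency_ms)
    return {
        endpoint: {
            "SampleCount": len(values),
            "Sum": sum(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
        for endpoint, values in grouped.items()
    }


# Hypothetical request samples: three users, two endpoints.
samples = [
    ("/orders", "user-1", 120),
    ("/orders", "user-2", 80),
    ("/health", "user-3", 5),
]
stats = aggregate_by_endpoint(samples)
```

Three per-user samples become two per-endpoint summaries, and the number of published series no longer grows with the number of users.
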
Comparable services: GCP Architecture Framework anti-patterns guidance; Azure Well-Architected antipatterns guidance

Knowledge Check

What is the recurring theme across AWS architecture anti-patterns?

  • Premature optimization (microservices, multi-Region, CQRS) is costlier than under-optimization — start simple
  • Under-provisioning capacity, not over-engineering, is by far the single dominant and costliest mistake teams make
  • Leaning on managed services like RDS and SQS is usually the wrong call and adds avoidable cost
  • Multi-AZ deployment is needless over-engineering for production workloads

Which is a common identity anti-pattern?

  • Long-lived IAM access keys for human engineers instead of Identity Center with MFA
  • Using IAM roles for EC2 instances instead of embedding long-lived access keys in the AMI
  • Enabling MFA on the root account and locking away its credentials
  • Granting least privilege and scoping each policy narrowly

Why is 'disabling CloudTrail to save money' an anti-pattern?

  • The audit log is the evidence base for every incident response and compliance audit; management events are cheap
  • CloudTrail simply cannot be disabled once a trail has been created
  • Capturing every management event measurably improves request performance but quietly weakens your account's security posture
  • The audit trail only matters in the production account and Region

What is the right response to most of these anti-patterns?

  • Use the corresponding AWS service or built-in feature that does the right thing, and start simple
  • Build a custom in-house solution from scratch to avoid vendor lock-in
  • Adopt every advanced pattern up front so the architecture never has to be refactored or revisited later on
  • Ignore them until a production incident forces an emergency fix
