Service 18

Amazon DynamoDB

NoSQL · Key-Value · Serverless

DynamoDB is AWS's fully managed NoSQL key-value and document database. You create a table, define a primary key, and read and write items; AWS handles partitioning, replication, scaling, and patching, with no maintenance windows. Latency stays in the single-digit milliseconds regardless of table size, and petabyte-scale tables exist in production.

The framing that matters: DynamoDB is not a drop-in for an RDBMS. Design your access patterns up front and it delivers predictable speed at any scale; expect flexible ad-hoc queries and a forgiving schema and you will be unhappy.

Primary Keys and Partitions

A table is a collection of items (collections of attributes). Only the primary-key attributes must exist on every item; everything else is schemaless. A key is either partition-key-only or partition-key-plus-sort-key — items sharing a partition key are stored together, sorted, which is what makes "the last 50 orders for customer 42" cheap.
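As a rough illustration of why that query is cheap, the sketch below builds the parameters for a single Query call. The table name `Orders` and the attribute names `customer_id` (partition key) and `order_ts` (sort key) are assumptions made up for this example; the parameter names follow the low-level DynamoDB Query API, and the dictionary would be handed to something like boto3's `client.query(**params)`.

```python
def last_orders_query(customer_id: str, limit: int = 50) -> dict:
    """Build Query parameters for the newest `limit` orders of one customer."""
    return {
        "TableName": "Orders",  # hypothetical table name
        # Only the partition key is constrained, so one item collection is read.
        "KeyConditionExpression": "customer_id = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
        # Items are stored in sort-key order; False walks them newest-first
        # (assuming order_ts sorts chronologically, e.g. an ISO-8601 string).
        "ScanIndexForward": False,
        "Limit": limit,
    }


params = last_orders_query("42")
```

Because the items for one customer sit together in sort-key order, the request reads at most `limit` items rather than filtering the whole table.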

High-cardinality key: even distribution

partition key = user#…

P1 · 33% | P2 · 33% | P3 · 34%

Traffic spreads across partitions. Partition on user, device, or order IDs — millions of distinct values.

Low-cardinality key: hot partition

partition key = status

P1 · 98% 🔥 | P2 · 1% | P3 · 1%

One partition catches fire. A status enum or country code throttles at ~1000 WCU / 3000 RCU regardless of table capacity.

DynamoDB hashes the partition key to assign items to physical partitions. The hot partition problem is the biggest pitfall: a low-cardinality key, or one value far hotter than the rest, hits the per-partition limit (about 1000 WCU / 3000 RCU) no matter how much table capacity you provision. Partition on something with millions of distinct values.
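The effect of key cardinality can be approximated with a small simulation. This is only a sketch: DynamoDB's real hash function and partition count are internal, so MD5 and three partitions stand in for them here, and the key formats and traffic mix are invented for illustration.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int = 3) -> int:
    """Map a partition-key value to a partition (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def traffic_share(keys, partitions: int = 3) -> list:
    """Fraction of requests landing on each partition."""
    counts = Counter(partition_for(k, partitions) for k in keys)
    total = sum(counts.values())
    return [counts.get(p, 0) / total for p in range(partitions)]


# High cardinality: 30,000 requests, each for a different (hypothetical) user.
user_keys = [f"user#{i}" for i in range(30_000)]

# Low cardinality: 30,000 requests where 98% share one status value.
status_keys = ["ACTIVE"] * 29_400 + ["PENDING"] * 300 + ["CLOSED"] * 300

even = traffic_share(user_keys)
skewed = traffic_share(status_keys)
```

With the user keys, each partition ends up with roughly a third of the traffic. With the status keys, whichever partition holds `ACTIVE` receives at least 98% of requests, no matter how many partitions exist.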

Secondary Indexes

A Global Secondary Index (GSI) defines an alternate partition key (and optional sort key) over any attributes, has its own capacity independent of the table, and can be added at any time. A Local Secondary Index (LSI) shares the table's partition key and capacity and, critically, can only be created with the table. If there is any chance you need an LSI, define it up front.

Indexes cost storage and write capacity, since every base write propagates to indexes that project the changed attributes. Project only the attributes your queries need, and design indexes for queries you actually run.
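The difference in timing between the two index types shows up in which API call carries the definition. The sketch below builds request parameters only; the table, index, and attribute names are hypothetical, while the parameter names follow the DynamoDB `CreateTable` and `UpdateTable` APIs (for example via boto3's `client.create_table(**params)` and `client.update_table(**params)`).

```python
def create_table_params() -> dict:
    """CreateTable request: the only place an LSI can be declared."""
    return {
        "TableName": "Orders",  # hypothetical
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_ts", "AttributeType": "S"},
            {"AttributeName": "order_total", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_ts", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_total",  # same partition key, new sort key
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "order_total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
    }


def add_gsi_params() -> dict:
    """UpdateTable request: a GSI can be created on an existing table."""
    return {
        "TableName": "Orders",
        "AttributeDefinitions": [
            {"AttributeName": "product_id", "AttributeType": "S"},
            {"AttributeName": "order_ts", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "by_product",
                    "KeySchema": [
                        {"AttributeName": "product_id", "KeyType": "HASH"},
                        {"AttributeName": "order_ts", "KeyType": "RANGE"},
                    ],
                    # Project only what the query needs to limit write cost.
                    "Projection": {
                        "ProjectionType": "INCLUDE",
                        "NonKeyAttributes": ["order_total"],
                    },
                }
            }
        ],
    }


table_request = create_table_params()
gsi_request = add_gsi_params()
```

The narrow projections (`KEYS_ONLY` and a one-attribute `INCLUDE`) are a deliberate choice in this sketch: fewer projected attributes means fewer base-table writes that have to propagate to the index.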

Capacity, Consistency, and Cost

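Capacity is measured in read capacity units (RCUs) and write capacity units (WCUs): one RCU covers one strongly consistent read per second of an item up to 4 KB, and one WCU covers one write per second of an item up to 1 KB. Reads are eventually consistent by default; strongly consistent reads cost twice as much and are unavailable on GSIs. Writes are always strongly consistent. On-Demand mode bills per request with no capacity planning and is the default for new tables; Provisioned mode is cheaper for steady load you can describe in numbers, and it supports auto-scaling and reserved capacity.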
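To make the unit arithmetic concrete, here is a minimal calculator for the capacity consumed by a single-item read or write. It assumes the documented unit sizes (4 KB per read unit, 1 KB per write unit, sizes rounded up) and the doubling for transactional operations; the function names are made up for this sketch.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read unit covers up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write unit covers up to 1 KB


def read_units(item_bytes: int, strongly_consistent: bool = False,
               transactional: bool = False) -> float:
    """Capacity units consumed by reading one item."""
    units = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    if transactional:
        return units * 2          # transactional reads cost double
    if strongly_consistent:
        return units
    return units / 2              # eventually consistent reads cost half


def write_units(item_bytes: int, transactional: bool = False) -> int:
    """Capacity units consumed by writing one item."""
    units = max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))
    return units * 2 if transactional else units


six_kb = 6 * 1024
strong_read = read_units(six_kb, strongly_consistent=True)   # 2 units
eventual_read = read_units(six_kb)                           # 1.0 unit
plain_write = write_units(2560)                              # 3 units
```

The rounding is why item size matters: a 6 KB item costs two read units, the same as an 8 KB item, and a 2.5 KB write costs three write units.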

The classic cost trap is a Scan: it reads every item and consumes capacity for all of them regardless of how many match the filter. One unguarded Scan on a critical path can spike a bill by orders of magnitude — use Query on an index instead.
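A back-of-the-envelope comparison shows the size of the gap. The table size, item size, and match count below are invented numbers, and the estimate ignores per-page rounding, so treat it as an order-of-magnitude sketch rather than a billing formula.

```python
import math


def eventually_consistent_rcus(bytes_read: int) -> float:
    """Approximate read units for an eventually consistent Scan or Query,
    which are charged on the data read rather than the data returned."""
    return math.ceil(bytes_read / 4096) / 2


ITEM_BYTES = 1024            # hypothetical average item size
TABLE_ITEMS = 10_000_000     # hypothetical table size
MATCHING_ITEMS = 50          # items the caller actually wants

# A Scan with a filter still reads every item in the table.
scan_rcus = eventually_consistent_rcus(TABLE_ITEMS * ITEM_BYTES)

# A Query on a suitable key or index reads only the matching items.
query_rcus = eventually_consistent_rcus(MATCHING_ITEMS * ITEM_BYTES)

ratio = scan_rcus / query_rcus
```

Under these assumptions the Scan consumes 1,250,000 read units against 6.5 for the Query, a difference of more than five orders of magnitude for the same 50 results.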

Streams, Global Tables, and Operational Features

DynamoDB Streams emit a change-data-capture feed (24-hour retention), most often consumed by Lambda for fan-out, denormalization, or search indexing. Global Tables give active-active multi-Region replication with last-writer-wins conflict resolution — fine for independent regional data, wrong when you need a single global truth at write time.
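The risk with last-writer-wins is easiest to see in a toy model. The sketch below is not how Global Tables is implemented; it only assumes that, of two conflicting writes to the same item, the one with the later timestamp survives. The `Write` type, Region names, and stock values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str
    timestamp_ms: int
    value: dict


def last_writer_wins(a: Write, b: Write) -> Write:
    """Keep whichever conflicting write carries the later timestamp."""
    return a if a.timestamp_ms >= b.timestamp_ms else b


# Both Regions read stock = 10, then each sells some units locally.
stock = 10
us_write = Write("us-east-1", 1_000, {"stock": stock - 3})
eu_write = Write("eu-west-1", 1_005, {"stock": stock - 4})

winner = last_writer_wins(us_write, eu_write)
correct_stock = stock - 3 - 4
```

The surviving value is a stock of 6, while the correct figure after both sales is 3. The US write is discarded without an error, which is acceptable when each Region owns its own items and a problem when both update a shared counter.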

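ACID transactions cover up to 100 items within one Region and consume twice the capacity of the equivalent non-transactional operations. Point-in-Time Recovery covers the last 35 days. TTL auto-deletes expired items at no capacity cost, but deletion is asynchronous and can lag well behind the expiry time (AWS documents it as typically within a few days), so reads can still return expired items. DAX adds microsecond cached reads for hot items.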
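Since TTL deletion lags behind expiry, a common pattern is to stamp each item with an expiry time and also check it on read. The sketch below assumes a TTL attribute named `expires_at` (the name is chosen per table and is hypothetical here); what it relies on from DynamoDB is that the TTL attribute is a Number holding Unix epoch time in seconds.

```python
import time
from typing import Optional


def with_ttl(item: dict, ttl_seconds: int, now: Optional[float] = None) -> dict:
    """Return a copy of a low-level item with an epoch-seconds TTL attribute."""
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    return {**item, "expires_at": {"N": str(expires_at)}}


def is_live(item: dict, now: Optional[float] = None) -> bool:
    """Treat an item as expired once its TTL has passed, even if DynamoDB
    has not physically deleted it yet."""
    now = time.time() if now is None else now
    return int(item["expires_at"]["N"]) > now


session = with_ttl({"session_id": {"S": "abc"}}, ttl_seconds=3600,
                   now=1_700_000_000)
```

Passing `now` explicitly keeps the helpers deterministic in tests; in application code both calls would normally rely on the current time.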

DynamoDB vs RDS/Aurora

DynamoDB: known access patterns needing predictable single-digit-ms latency at any scale, with no idle instance cost. Often cheaper than RDS at low and moderate traffic, depending on request volume and item size.

RDS / Aurora: ad-hoc queries, joins, aggregations, and flexible or changing schemas. The right choice when query shapes are not fixed up front.

Common Mistakes

Choosing a low-cardinality partition key (a status enum, country code, or tier) — one partition catches fire while the rest sit idle. Partition on user, device, or order IDs.
Running a Scan on a critical path — it consumes capacity for every item in the table, not just matches. Query an index instead.
Assuming you can add an LSI later — LSIs exist only at table creation; plan them up front or use a GSI.
Picking Provisioned capacity blind — start On-Demand and switch only once you can describe traffic in numbers.
Relying on Global Tables where a single global write-time truth is required — conflict resolution is last-writer-wins and silently overwrites.
Treating DynamoDB like SQL and bolting on query shapes after launch instead of modeling access patterns first.

Best Practices

Design access patterns first, schema second — query shapes are expensive to add later.
Pick a partition key with high cardinality and even traffic; it is the #1 design decision.
Use Query on indexes, never Scan on a hot path.
Define any LSIs at table creation; add GSIs later as needed.
Start in On-Demand mode; move to Provisioned only with a measured workload.
Enable PITR on every production table and use TTL for naturally expiring data.

Comparable services: GCP Cloud Bigtable, Firestore · Azure Cosmos DB

Knowledge Check

What is the hot-partition problem in DynamoDB?

A low-cardinality or skewed partition key concentrates traffic on one partition, hitting per-partition limits regardless of capacity
Defining too many global secondary indexes on a single table slows down every single write to the base table as all the updates propagate out
Issuing too many strongly consistent reads overheats the shared in-memory cache and throttles the whole table
The table silently runs out of its provisioned storage once it grows past a few terabytes of stored data

Which statement about DynamoDB secondary indexes is correct?

GSIs can be added any time; LSIs can only be created with the table
LSIs can be added at any time; GSIs can only be created together with the table
Both index types can only ever be created at the moment the table is created
Secondary indexes are free and add no storage or write-capacity cost at all

Why is a Scan dangerous on a critical path?

It reads and bills for every item in the table regardless of how many match the filter
It takes an exclusive lock on the entire table and blocks all writes the whole time it runs
It can only be run once per day per table because of a hard service quota
It quietly bypasses the table's encryption at rest while reading items

What is the conflict-resolution behavior of DynamoDB Global Tables?

Last-writer-wins at the item level — fine for independent regional data, wrong when a single global truth is required
Strongly consistent across every replica Region on each write, so all Regions always agree instantly on the latest value
A write is rejected and rolled back unless every other replica Region acknowledges it first
The earliest write by timestamp always wins and later conflicting writes are discarded

What is the recommended capacity mode for a new, unpredictable workload?

On-Demand — it absorbs spikes with no planning and is the default for new tables
Provisioned mode set to a high fixed capacity ceiling to be safe
Provisioned mode with three years of reserved read and write capacity purchased up front
There is only one capacity mode, so the choice does not matter
