Amazon DynamoDB
DynamoDB is AWS's fully managed NoSQL key-value and document database. You create a table, define a primary key, and read and write items. AWS handles partitioning, replication, scaling, and patching, with no maintenance windows. Latency stays in the single-digit milliseconds regardless of table size, and petabyte-scale tables run in production.
The framing that matters: DynamoDB is not a drop-in for an RDBMS. Design your access patterns up front and it delivers predictable speed at any scale; expect flexible ad-hoc queries and a forgiving schema and you will be unhappy.
Primary Keys and Partitions
A table is a collection of items (collections of attributes). Only the primary-key attributes must exist on every item; everything else is schemaless. A key is either partition-key-only or partition-key-plus-sort-key — items sharing a partition key are stored together, sorted, which is what makes "the last 50 orders for customer 42" cheap.
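As a minimal sketch of the "last 50 orders for customer 42" query, the function below builds the keyword arguments for a low-level DynamoDB Query call. The table name `Orders`, the partition key `customer_id`, and the assumption that the sort key is an order timestamp are all hypothetical; nothing here contacts AWS.

```python
def last_orders_query(customer_id: int, limit: int = 50) -> dict:
    """Build keyword arguments for a low-level DynamoDB Query request."""
    return {
        "TableName": "Orders",  # hypothetical table name
        # Match a single partition key value; "customer_id" is an assumed name.
        "KeyConditionExpression": "customer_id = :cid",
        "ExpressionAttributeValues": {":cid": {"N": str(customer_id)}},
        # Read in descending sort-key order, so the newest orders come first
        # if the sort key is an order timestamp (an assumption here).
        "ScanIndexForward": False,
        "Limit": limit,
    }


params = last_orders_query(42)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**params)
```

Because the items for one customer sit together in sort-key order, the request touches only that customer's items and stops after `Limit` of them.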
DynamoDB hashes the partition key to assign items to physical partitions. The hot partition problem is the biggest pitfall: a low-cardinality key, or one value far hotter than the rest, hits the per-partition throughput limit (about 1,000 WCU or 3,000 RCU per second) no matter how much table capacity you provision. Partition on something with millions of distinct values.
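The effect of key cardinality can be illustrated with a small simulation. This is a sketch only: DynamoDB's internal hash function and partition count are not exposed, so MD5 and `NUM_PARTITIONS = 8` are stand-in assumptions, and the key values are made up.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # illustrative assumption, not a DynamoDB setting


def partition_for(key: str) -> int:
    """Map a partition key to a simulated partition (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def busiest_share(keys) -> float:
    """Fraction of all requests that land on the single busiest partition."""
    counts = Counter(partition_for(k) for k in keys)
    return max(counts.values()) / len(keys)


# 10,000 writes keyed by a skewed status enum: 90% PENDING, 10% SHIPPED.
status_keys = ["PENDING" if i % 10 else "SHIPPED" for i in range(10_000)]
# 10,000 writes keyed by distinct user IDs.
user_keys = [f"user-{i}" for i in range(10_000)]

print(f"status enum key: {busiest_share(status_keys):.0%} on one partition")
print(f"user ID key:     {busiest_share(user_keys):.0%} on one partition")
```

With the enum key, at least nine in ten writes hit one simulated partition however many partitions exist. With user IDs, the load spreads roughly evenly.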
Secondary Indexes
A Global Secondary Index (GSI) gives any attribute its own partition/sort key, has independent capacity, and can be added any time. A Local Secondary Index (LSI) shares the table's partition key and capacity and — critically — can only be created with the table. If there is any chance you need an LSI, define it up front.
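Since an LSI has to be declared with the table, a table definition that includes one might look like the sketch below. The dictionary follows the shape of the CreateTable request; the table, attribute, and index names are hypothetical, and the helper `lsi_shares_partition_key` is an illustrative check, not part of any AWS SDK.

```python
def orders_table_definition() -> dict:
    """Build CreateTable keyword arguments with an LSI declared up front."""
    return {
        "TableName": "Orders",  # hypothetical names throughout
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "N"},
            {"AttributeName": "order_ts", "AttributeType": "S"},
            {"AttributeName": "order_total", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
            {"AttributeName": "order_ts", "KeyType": "RANGE"},    # sort key
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_total",
                # Same partition key as the table, different sort key.
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "order_total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",  # On-Demand mode
    }


def lsi_shares_partition_key(definition: dict) -> bool:
    """Check that every LSI reuses the table's partition (HASH) key."""
    def hash_key(schema):
        return next(k["AttributeName"] for k in schema if k["KeyType"] == "HASH")

    table_key = hash_key(definition["KeySchema"])
    return all(
        hash_key(index["KeySchema"]) == table_key
        for index in definition.get("LocalSecondaryIndexes", [])
    )


definition = orders_table_definition()
# With boto3 available: boto3.client("dynamodb").create_table(**definition)
```

A GSI needed later would be added with a separate table update, which is why only the LSI has to be planned here.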
Indexes cost storage and write capacity, since every base write propagates to indexes that project the changed attributes. Project only the attributes your queries need, and design indexes for queries you actually run.
Capacity, Consistency, and Cost
Capacity is measured in RCUs and WCUs. Reads are eventually consistent by default; strongly consistent reads cost 2× and are unavailable on GSIs. Writes are always strongly consistent. On-Demand mode bills per request with no planning and is the default for new tables; Provisioned mode is cheaper for steady, describable load and supports auto-scaling and reserved capacity.
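The capacity arithmetic can be sketched as follows, assuming the standard unit sizes: one RCU covers a strongly consistent read of up to 4 KB, an eventually consistent read costs half that, and one WCU covers a write of up to 1 KB. Treating 1 KB as 1,024 bytes and the function names are assumptions of this sketch.

```python
import math

READ_BLOCK_BYTES = 4 * 1024   # a read is billed in 4 KB steps
WRITE_BLOCK_BYTES = 1024      # a write is billed in 1 KB steps


def read_units(item_bytes: int, strongly_consistent: bool = False) -> float:
    """RCUs consumed by reading one item of the given size."""
    blocks = max(1, math.ceil(item_bytes / READ_BLOCK_BYTES))
    # Eventually consistent reads cost half of a strongly consistent read.
    return float(blocks) if strongly_consistent else blocks / 2


def write_units(item_bytes: int) -> int:
    """WCUs consumed by writing one item of the given size."""
    return max(1, math.ceil(item_bytes / WRITE_BLOCK_BYTES))


# A 6,000-byte item spans two 4 KB read blocks and six 1 KB write blocks.
print(read_units(6000, strongly_consistent=True))
print(read_units(6000))
print(write_units(6000))
```

Multiplying these per-item figures by expected requests per second gives a first estimate of the capacity a Provisioned table would need.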
The classic cost trap is a Scan: it reads every item and consumes capacity for all of them regardless of how many match the filter. One unguarded Scan on a critical path can spike a bill by orders of magnitude — use Query on an index instead.
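A rough back-of-the-envelope comparison shows the gap. The sketch below assumes eventually consistent reads billed in 4 KB steps over the total bytes read. It ignores per-page rounding, so treat the result as an approximation, and the table size is a made-up example.

```python
import math

READ_BLOCK_BYTES = 4 * 1024


def scan_read_units(item_count: int, avg_item_bytes: int) -> float:
    """Approximate RCUs for a full Scan: every item is read, matched or not."""
    total_bytes = item_count * avg_item_bytes
    return math.ceil(total_bytes / READ_BLOCK_BYTES) / 2


def query_read_units(matching_items: int, avg_item_bytes: int) -> float:
    """Approximate RCUs for a Query: only items under the key are read."""
    total_bytes = matching_items * avg_item_bytes
    return math.ceil(total_bytes / READ_BLOCK_BYTES) / 2


# Hypothetical table: 10 million items of about 500 bytes; we want 50 of them.
scan_cost = scan_read_units(10_000_000, 500)
query_cost = query_read_units(50, 500)
print(f"Scan:  ~{scan_cost:,.0f} RCUs")
print(f"Query: ~{query_cost} RCUs")
```

A filter expression on the Scan would not change its figure, because filtering happens after the items have been read.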
Streams, Global Tables, and Operational Features
DynamoDB Streams emit a change-data-capture feed (24-hour retention), most often consumed by Lambda for fan-out, denormalization, or search indexing. Global Tables give active-active multi-Region replication with last-writer-wins conflict resolution — fine for independent regional data, wrong when you need a single global truth at write time.
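Last-writer-wins can be modeled in a few lines. This is a conceptual sketch, not how Global Tables are implemented: the `Write` class, the Region names, and the millisecond timestamps are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Write:
    region: str
    value: str
    timestamp_ms: int


def resolve_last_writer_wins(writes):
    """Return the write that survives: the one with the latest timestamp."""
    return max(writes, key=lambda w: w.timestamp_ms)


# Two Regions update the same item at almost the same moment.
conflicting = [
    Write(region="us-east-1", value="balance=90", timestamp_ms=1_700_000_000_000),
    Write(region="eu-west-1", value="balance=75", timestamp_ms=1_700_000_000_040),
]

winner = resolve_last_writer_wins(conflicting)
lost = [w for w in conflicting if w is not winner]
print("kept:", winner.region, winner.value)
print("silently overwritten:", [(w.region, w.value) for w in lost])
```

Neither writer receives an error, which is why this model suits independent regional data and not values such as a shared account balance.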
ACID transactions cover up to 100 items in one Region at 2× capacity. Point-in-Time Recovery covers the last 35 days. TTL deletes expired items in the background at no capacity cost, but deletion is not immediate: it can lag up to 48 hours or more, so expired items may still appear in reads until they are removed. DAX adds microsecond cached reads for hot items.
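To make the TTL behavior concrete, the sketch below stamps an item with an expiry time in epoch seconds (the format TTL expects for its Number attribute) and filters out items that have expired but may not have been deleted yet. The attribute name `expires_at` and both helper functions are hypothetical.

```python
import time

TTL_ATTRIBUTE = "expires_at"  # hypothetical name; configured per table


def with_ttl(item: dict, ttl_seconds: int, now=None) -> dict:
    """Return a copy of the item with an expiry timestamp in epoch seconds."""
    now = int(time.time()) if now is None else now
    return {**item, TTL_ATTRIBUTE: now + ttl_seconds}


def is_live(item: dict, now=None) -> bool:
    """True if the item has no expiry or its expiry is still in the future."""
    now = int(time.time()) if now is None else now
    return item.get(TTL_ATTRIBUTE, float("inf")) > now


# A session that expires one hour after it is written at t=1000.
session = with_ttl({"session_id": "s1"}, ttl_seconds=3600, now=1000)

# Because deletion can lag, readers filter expired items themselves.
items = [session, {"session_id": "permanent"}]
live_later = [i for i in items if is_live(i, now=5000)]
```

The read-side filter keeps application behavior correct during the window between expiry and the background delete.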
DynamoDB — known access patterns needing predictable single-digit-ms latency at any scale, with no idle instance cost. Cheaper than RDS at low and moderate traffic.
RDS / Aurora — ad-hoc queries, joins, aggregations, and flexible or changing schemas. The right choice when query shapes are not fixed up front.
- Choosing a low-cardinality partition key (a status enum, country code, or tier) — one partition catches fire while the rest sit idle. Partition on user, device, or order IDs.
- Running a Scan on a critical path — it consumes capacity for every item in the table, not just matches. Query an index instead.
- Assuming you can add an LSI later — LSIs exist only at table creation; plan them up front or use a GSI.
- Picking Provisioned capacity blind — start On-Demand and switch only once you can describe traffic in numbers.
- Relying on Global Tables where a single global write-time truth is required — conflict resolution is last-writer-wins and silently overwrites.
- Treating DynamoDB like SQL and bolting on query shapes after launch instead of modeling access patterns first.
- Design access patterns first, schema second — query shapes are expensive to add later.
- Pick a partition key with high cardinality and even traffic; it is the #1 design decision.
- Use Query on indexes, never Scan on a hot path.
- Define any LSIs at table creation; add GSIs later as needed.
- Start in On-Demand mode; move to Provisioned only with a measured workload.
- Enable PITR on every production table and use TTL for naturally expiring data.
Knowledge Check
What is the hot-partition problem in DynamoDB?
- A low-cardinality or skewed partition key concentrates traffic on one partition, hitting per-partition limits regardless of capacity
- Defining too many global secondary indexes on a single table slows down every single write to the base table as all the updates propagate out
- Issuing too many strongly consistent reads overheats the shared in-memory cache and throttles the whole table
- The table silently runs out of its provisioned storage once it grows past a few terabytes of stored data
Which statement about DynamoDB secondary indexes is correct?
- GSIs can be added any time; LSIs can only be created with the table
- LSIs can be added at any time; GSIs can only be created together with the table
- Both index types can only ever be created at the moment the table is created
- Secondary indexes are free and add no storage or write-capacity cost at all
Why is a Scan dangerous on a critical path?
- It reads and bills for every item in the table regardless of how many match the filter
- It takes an exclusive lock on the entire table and blocks all writes the whole time it runs
- It can only be run once per day per table because of a hard service quota
- It quietly bypasses the table's encryption at rest while reading items
What is the conflict-resolution behavior of DynamoDB Global Tables?
- Last-writer-wins at the item level — fine for independent regional data, wrong when a single global truth is required
- Strongly consistent across every replica Region on each write, so all Regions always agree instantly on the latest value
- A write is rejected and rolled back unless every other replica Region acknowledges it first
- The earliest write by timestamp always wins and later conflicting writes are discarded
What is the recommended capacity mode for a new, unpredictable workload?
- On-Demand — it absorbs spikes with no planning and is the default for new tables
- Provisioned mode set to a high fixed capacity ceiling to be safe
- Provisioned mode with three years of reserved read and write capacity purchased up front
- There is only one capacity mode, so the choice does not matter