Azure Cosmos DB

NoSQL · Multi-model

Azure Cosmos DB is a globally distributed, multi-model NoSQL database with single-digit-millisecond reads, elastic throughput, and turnkey replication to any number of regions. You pick an API, a partition key, and a throughput model; Cosmos handles distribution, indexing, and the consistency guarantee you choose.

It is powerful and easy to misprice. Two decisions — the partition key and the throughput model — determine whether Cosmos is fast and economical or slow and ruinous. Most Cosmos horror stories trace back to a hot partition key or provisioned throughput set without understanding request units.

APIs

Cosmos exposes several APIs over the same engine. The NoSQL API (its native document API) gets features first and is the default for new work. The MongoDB, Cassandra, Gremlin (graph), and Table APIs provide wire compatibility so existing applications can move with little change. Choose the native NoSQL API unless you are migrating an app already written for one of the others.

Partitioning

Every container is partitioned by a key you choose. Cosmos hashes the key to distribute data and load across physical partitions. A good key spreads reads and writes evenly; a poor one concentrates them on a single logical partition — a hot partition — that throttles no matter how much throughput you provision. The partition key is effectively permanent, so this is the decision to get right first.

Figure: Partition key, even distribution vs a hot partition
Good key (load spreads evenly): four partitions at roughly 25% each. Every partition takes a share of the requests, so provisioned RU/s is used fully and nothing throttles.
Hot key (load concentrates): one partition at 90%, the other three idle. One partition takes most of the traffic and throttles at its share of RU/s while the others sit idle; adding throughput does not help.
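The effect of key choice on load can be sketched with a small simulation. This is a simplified model, not Cosmos DB's actual placement logic: it hashes each key and takes the result modulo a fixed partition count, whereas the service maps hash ranges to physical partitions. The key names (`user-…`, `tenant-…`) and the four-partition layout are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int = 4) -> int:
    """Map a partition-key value to a partition index via a stable hash (simplified)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_share(keys, partitions: int = 4):
    """Return the fraction of requests that lands on each partition."""
    counts = Counter(partition_for(k, partitions) for k in keys)
    total = len(keys)
    return [counts.get(p, 0) / total for p in range(partitions)]


# High-cardinality key: every request carries a different user id.
good_keys = [f"user-{i}" for i in range(10_000)]

# Skewed key: 90% of requests share one tenant id, the rest are spread out.
hot_keys = ["tenant-big"] * 9_000 + [f"tenant-{i}" for i in range(1_000)]

print("good key:", [round(s, 3) for s in load_share(good_keys)])
print("hot key: ", [round(s, 3) for s in load_share(hot_keys)])
```

With the skewed key, whichever partition owns `tenant-big` carries at least 90% of the requests, and no amount of extra throughput on the other partitions relieves it.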

Throughput

Throughput is measured in request units per second (RU/s). Every read and write costs RUs, and what you pay for depends on the model. Provisioned throughput reserves a fixed RU/s and bills for it whether or not it is used; autoscale ranges between 10% and 100% of a ceiling to follow load; serverless bills only for the request units each operation actually consumes, which fits spiky or dev workloads. Picking provisioned for a bursty app overpays for idle capacity; picking serverless for steady high load gives up the lower unit price of reserved throughput and can run into serverless throughput limits.

Model        | Bills for                        | Use for
Provisioned  | Reserved RU/s                    | Steady, predictable load
Autoscale    | Up to a ceiling, scales 10–100%  | Variable or unpredictable load
Serverless   | Per RU consumed                  | Spiky, dev/test, low average traffic
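As a rough sketch of how the three models turn the same traffic into different billable quantities, the Python below counts billed units for one day. It assumes hourly billing, with autoscale charged each hour for the highest RU/s reached and never less than 10% of its ceiling. The traffic profile and function names are hypothetical, and unit prices are deliberately left out because they vary by region and over time.

```python
def provisioned_ru_hours(reserved_rus, hours):
    """Provisioned: the reserved RU/s is billed for every hour, used or not."""
    return reserved_rus * hours


def autoscale_ru_hours(ceiling_rus, hourly_peak_rus):
    """Autoscale: each hour bills the peak RU/s reached, clamped to 10%-100% of the ceiling."""
    floor = ceiling_rus / 10
    return sum(min(ceiling_rus, max(floor, peak)) for peak in hourly_peak_rus)


def serverless_total_rus(hourly_consumed_rus):
    """Serverless: only the request units actually consumed are billed."""
    return sum(hourly_consumed_rus)


# Hypothetical bursty day: 22 quiet hours and a 2-hour spike.
hourly_peak_rus = [300] * 22 + [4000] * 2
hourly_consumed_rus = [50_000] * 22 + [5_000_000] * 2

print("provisioned RU/s-hours:", provisioned_ru_hours(4000, 24))
print("autoscale RU/s-hours:  ", autoscale_ru_hours(4000, hourly_peak_rus))
print("serverless total RUs:  ", serverless_total_rus(hourly_consumed_rus))
```

The counts are not directly comparable until current per-unit prices are applied, since each model is priced differently. What the sketch shows is the shape of the bill: provisioned pays for the peak all day, autoscale pays for the peak only in the hours it occurs.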

Consistency Levels

Cosmos offers five consistency levels (Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual), trading latency and availability against how fresh reads are. Session is the default and the right choice for most apps: it guarantees a client reads its own writes without paying for global Strong consistency. Strong comes with constraints: it is not available on accounts with multi-region writes, writes wait for replication across regions, and reads cost roughly twice the RUs of the weaker levels.
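The read-your-writes guarantee of Session consistency can be illustrated with a toy model. This is a simplified sketch, not the service's actual protocol: each replica tracks the last log sequence number (LSN) it has applied, each client session carries a token recording the latest LSN it has seen, and a read is only served by a replica that has caught up to that token. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    lsn: int = 0                                  # last log sequence number applied
    data: dict = field(default_factory=dict)

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.lsn = lsn


@dataclass
class Session:
    token: int = 0                                # highest LSN this client has observed


def write(primary, session, key, value):
    """Apply a write on the primary and advance the caller's session token."""
    primary.apply(primary.lsn + 1, key, value)
    session.token = max(session.token, primary.lsn)


def read(replicas, session, key):
    """Serve the read from the first replica that is at least as fresh as the token."""
    for replica in replicas:
        if replica.lsn >= session.token:
            session.token = max(session.token, replica.lsn)
            return replica.data.get(key)
    raise RuntimeError("no replica has caught up to the session token")


primary, lagging = Replica(), Replica()           # 'lagging' has not replicated yet
writer, bystander = Session(), Session()
write(primary, writer, "cart", ["book"])

# The lagging replica is tried first in both reads.
print("writer sees:   ", read([lagging, primary], writer, "cart"))
print("bystander sees:", read([lagging, primary], bystander, "cart"))
```

The writing client skips the stale replica because its token is ahead of it, while a different session with no token may still be served the older state. That asymmetry is the trade Session makes: freshness is guaranteed per client, not globally.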

Global Distribution

Adding a region is a checkbox; Cosmos replicates the data and serves local reads at low latency. Multi-region writes (multi-master) let every region accept writes, with conflict resolution policies for the inevitable collisions. This is Cosmos's headline strength — and each added region multiplies the provisioned-throughput cost, so global distribution is a deliberate spend, not a default.
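When two regions accept a write to the same item, a conflict-resolution policy decides which version survives. A minimal sketch of a last-writer-wins policy follows, which is the default Cosmos DB documents, keyed on the `_ts` timestamp property. The item shapes and the function name are hypothetical, and the tie-breaking rule here (keep the first version seen) is an assumption made for the example, not the service's behavior.

```python
def resolve_last_writer_wins(versions, path="_ts"):
    """Return the version with the highest value at `path`.

    Ties keep the first version seen; that rule is an assumption of this sketch.
    """
    winner = versions[0]
    for candidate in versions[1:]:
        if candidate[path] > winner[path]:
            winner = candidate
    return winner


# Two regions updated the same item before replication caught up.
east = {"id": "cart-1", "items": ["book"], "_ts": 1700000000, "region": "East US"}
west = {"id": "cart-1", "items": ["book", "pen"], "_ts": 1700000005, "region": "West US"}

winner = resolve_last_writer_wins([east, west])
print("winning write came from:", winner["region"])
```

Last-writer-wins silently discards the losing write, so it suits data where the newest value is all that matters. Workloads that must merge concurrent changes need a custom policy instead.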

Provisioned vs Autoscale vs Serverless throughput

Provisioned — Fixed reserved RU/s at the lowest unit price. Choose it for steady, predictable, high-volume load.

Autoscale — Scales between 10% and 100% of a ceiling automatically. Choose it for variable or unpredictable traffic.

Serverless — Billed per request unit consumed, no reservation. Choose it for spiky, low-average, or dev/test workloads.

Common Mistakes
  • Choosing a partition key that concentrates load — a hot partition throttles requests no matter how much throughput is provisioned, and the key cannot be changed in place.
  • Provisioning fixed throughput for a bursty workload, paying for peak RU/s around the clock when autoscale or serverless would cost a fraction.
  • Defaulting to Strong consistency out of caution, paying latency and RUs for a guarantee most apps do not need over Session.
  • Adding regions without accounting for cost — each region multiplies the provisioned-throughput bill.
  • Picking the MongoDB or Cassandra API for a new app instead of the native NoSQL API, forgoing the newest features for compatibility you do not need.
  • Ignoring the RU cost of queries — cross-partition and unindexed queries burn RUs and silently drive the bill.

Best Practices
  • Model the partition key for even read and write distribution before anything else — it is effectively permanent.
  • Use autoscale for variable load and serverless for spiky or dev workloads; reserve provisioned throughput for steady high volume.
  • Default to Session consistency unless a specific requirement justifies stronger; it gives read-your-writes without Strong's cost.
  • Treat each added region as a deliberate cost and capability decision, and define a conflict-resolution policy for multi-region writes.
  • Use the native NoSQL API for new applications; reserve the compatibility APIs for migrations.
  • Watch RU consumption per query and design indexes and access patterns to keep queries single-partition.
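The difference between a single-partition and a cross-partition query shows up directly in how the query is issued. The sketch below uses the `query_items` method of the `azure-cosmos` Python SDK's container client, with its `partition_key` and `enable_cross_partition_query` parameters. The container layout is an assumption: items are taken to be partitioned on `/customerId` and to carry a `status` field, and both helper functions are hypothetical.

```python
def orders_for_customer(container, customer_id):
    """Single-partition query: the partition key scopes the read to one logical partition."""
    return list(container.query_items(
        query="SELECT * FROM c WHERE c.customerId = @cid",
        parameters=[{"name": "@cid", "value": customer_id}],
        partition_key=customer_id,
    ))


def orders_by_status(container, status):
    """Cross-partition query: the filter is not on the partition key, so fan-out is explicit."""
    return list(container.query_items(
        query="SELECT * FROM c WHERE c.status = @status",
        parameters=[{"name": "@status", "value": status}],
        enable_cross_partition_query=True,
    ))
```

Here `container` would be the object returned by `get_container_client` on a database client. The first function is the access pattern to design for; the second has to consult every partition, so its RU charge grows with the size of the container rather than with the size of the result.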

Comparable services: AWS DynamoDB, GCP Firestore / Spanner

Knowledge Check

What most directly causes a Cosmos DB container to throttle despite high provisioned throughput?

  • A hot partition — a partition key that concentrates load on one logical partition
  • Using Session consistency instead of Strong for every read issued against the container
  • Enabling autoscale throughput that raises the RU/s ceiling automatically
  • Deploying to only a single region instead of several

Which throughput model fits a spiky workload with low average traffic?

  • Serverless — billed for the request units each operation consumes, with no reserved RU/s
  • Provisioned — a fixed reserved RU/s ceiling billed continuously around the clock whether or not traffic arrives
  • Strong consistency mode for globally synchronized reads
  • Multi-region writes across several geographies

Why is Session the recommended default consistency level for most apps?

  • It guarantees a client reads its own writes without paying the latency and RU cost of global Strong consistency
  • It is the only consistency level that allows multi-region writes to be enabled
  • It disables the automatic indexer to save RUs on every write
  • It provides the strongest possible global consistency guarantee available, linearizing every read across all regions

What is the cost consequence of adding regions to a Cosmos DB account?

  • Each added region multiplies the provisioned-throughput cost
  • Regions are free to add; only the stored data is ever billed
  • Adding regions reduces RU cost by sharing one throughput pool across them
  • It converts the account to serverless billing automatically
