Topic 22

Warehouses, Caches, and More

Concept

The databases you have met so far — relational and NoSQL — handle everyday application traffic: customer records, orders, user profiles. But some data problems are shaped differently. Crunching years of sales data for a business report is a different job from answering a mobile app's request in under a millisecond. The cloud offers specialist data stores tuned for exactly those jobs.

This is a quick tour, not a deep dive. The goal is recognition: when you hear "data warehouse" or "Redis cache," you will know roughly what it is for, even if you have never configured one.

A useful image: a research library for deep, slow, thorough study (a warehouse) versus the stack of books sitting on your desk that you reach for constantly without leaving your chair (a cache). Same institution, very different purposes.

Data Warehouses: Built for Analysis

A data warehouse is a database optimized for analytical queries over very large amounts of historical data. Where a regular database is built for fast, frequent, small reads and writes (an order being placed, a user logging in), a warehouse is built for large, sweeping queries: "total sales by region for the last five years," "which product categories are declining month over month." These queries scan enormous amounts of data and return aggregate results for reports and dashboards.
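To make the two kinds of query concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an ordinary row-oriented database, not a warehouse, and the orders table, its columns, and its values are invented for illustration. The point is only the difference in shape between a single-record lookup and a sweeping aggregate.

```python
import sqlite3

# Hypothetical orders table; names and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
rows = [
    (1, "North", 2022, 120.0),
    (2, "South", 2022, 80.0),
    (3, "North", 2023, 200.0),
    (4, "South", 2023, 50.0),
    (5, "North", 2023, 30.0),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# Application-style query: fetch one record by its key.
one_order = conn.execute(
    "SELECT region, amount FROM orders WHERE id = ?", (3,)
).fetchone()

# Warehouse-style query: scan every row and aggregate the results.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(sales_by_region)
```

With five rows both queries are instant. The difference only starts to matter when the second query has to read years of history.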

The two workloads are so different that they use different internal designs. A regular database organizes data row by row, optimized for fetching single records quickly. A warehouse organizes data column by column, which is far more efficient when you are summing or averaging one field across billions of rows. Running heavy analytical queries on a regular application database slows down your live users — so organizations keep a separate warehouse fed from the main database on a regular schedule.
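The row-versus-column idea can be sketched with plain Python data structures. This is a toy model under simplifying assumptions, not how any real engine stores bytes, and the field names and helper functions are hypothetical. It shows why fetching one record suits a row layout, while summing one field suits a column layout.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "region": "North", "amount": 120.0},
    {"order_id": 2, "region": "South", "amount": 80.0},
    {"order_id": 3, "region": "North", "amount": 200.0},
]

# Column-oriented layout: each field is stored together.
columns = {
    "order_id": [1, 2, 3],
    "region": ["North", "South", "North"],
    "amount": [120.0, 80.0, 200.0],
}


def fetch_order(order_id):
    # Row layout strength: the whole record comes back in one piece.
    for row in rows:
        if row["order_id"] == order_id:
            return row
    return None


def total_amount_row_store():
    # Has to visit every full record just to read one field from each.
    return sum(row["amount"] for row in rows)


def total_amount_column_store():
    # Reads only the single column it needs and ignores the rest.
    return sum(columns["amount"])


print(fetch_order(2))
print(total_amount_row_store(), total_amount_column_store())
```

Both totals are the same. What changes is how much unrelated data has to be read to get there, and that gap grows with the number of rows and fields.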

Caches: Built for Speed

A cache is a data store that keeps frequently needed data in memory (the computer's fast, temporary working space) rather than on disk. Reading from memory is orders of magnitude faster than reading from disk. When an application needs the same piece of data many times per second (the front page of a popular website, a user's session information, a leaderboard), a cache can answer those requests in microseconds instead of milliseconds.

The trade-off is that memory is temporary. A cache is not for permanent storage; it is a speed layer that sits in front of your real database. When a cached entry expires or the cache server restarts, that copy is gone, but that is fine, because the real copy lives in the database. Caches are explicitly a shortcut, not a replacement. The most widely used cache software has been Redis. A 2024 license change led to an open-source fork called Valkey, which AWS ElastiCache and Google Memorystore now offer as an engine choice, while Azure Cache for Redis remains Redis-based. All three services are Redis-compatible, so check the engine and version options when creating a managed cache.

A Few Other Specialists

Beyond warehouses and caches, the cloud offers several other specialized data stores worth naming without going deep. A search engine (such as Elasticsearch or OpenSearch) indexes text so you can run full-text searches — "find all customer tickets mentioning 'billing error'" — far faster than a regular database can. A time-series database is optimized for sequences of measurements over time — server metrics, sensor readings, financial price history. A graph database models data as a network of nodes and relationships, ideal for things like social connections or fraud detection where the links between records matter as much as the records themselves.
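As one illustration of "internal design matches the problem," full-text search is typically built on an inverted index: a map from each word to the records that contain it. The sketch below is a deliberately simplified version with hypothetical ticket data and helper names. It is not how Elasticsearch or OpenSearch are used in practice, and it matches tickets containing every query word rather than the exact phrase.

```python
from collections import defaultdict

# Hypothetical support tickets.
tickets = {
    101: "Billing error on my last invoice",
    102: "Cannot log in to my account",
    103: "Charged twice, possible billing error",
}


def build_index(documents):
    # Map each lowercase word to the set of ticket ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().replace(",", " ").split():
            index[word].add(doc_id)
    return index


def search(index, query):
    # Return the ids of tickets that contain every word in the query.
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(tickets)
print(sorted(search(index, "billing error")))
```

The index is built once, so a search becomes a few dictionary lookups instead of a scan through the text of every ticket.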

The pattern across all of them is the same: pick the store whose internal design matches the shape of your problem. Most real systems run several stores at once — a relational database for core records, a warehouse for analytics, a cache for speed.

Why Big Systems Use Several Stores

No single data store is best for everything. A hammer is not a screwdriver. Using a regular database for analytics is slow; using a warehouse for live application traffic is a poor fit too, because a warehouse is tuned for large scans rather than many small, fast reads and writes. The cloud makes it practical to run the right tool for each job, because all of them are available as managed services you connect to, so you do not have to build or own the infrastructure for each one separately.

The data store landscape

Data stores: the right tool for each job

Relational: structured records, strict rules, SQL
NoSQL: free-form records, high scale
Warehouse: analytics and reporting at scale
Cache: in-memory speed layer
Specialized: search, time-series, graph, and more

Data warehouse: Redshift (AWS), BigQuery (Google Cloud), Azure Synapse Analytics (Azure)

Managed cache (Redis / Valkey): ElastiCache (AWS), Memorystore (Google Cloud), Azure Cache for Redis (Azure)

Common Confusions

"A data warehouse is just a very large database." It is optimized for a fundamentally different kind of query — sweeping analytics over historical data, not fast reads and writes for live traffic. The internal design is different, which is why running analytics queries on a regular database slows down your application.
"A cache stores data permanently." A cache is fast but temporary. It is a speed layer in front of your real database. When the cached entry expires or the server restarts, the data goes away — which is fine, because the real copy is always in the durable database behind it.
"Every system needs all of these." Only as the problem calls for them. A small application may need nothing beyond a single managed relational database. Warehouses, caches, and specialists appear when a specific need emerges — analytics at scale, a performance bottleneck, full-text search.

Why It Matters

"BigQuery," "Redshift," "Redis cache," and "data warehouse" come up constantly in cloud teams. Knowing roughly what each is for — analytics, speed — lets you follow those conversations without a search engine.
The warehouse-versus-database distinction explains why analytical queries are kept separate from live application traffic — a design decision every data-heavy organization makes.
The cache concept appears everywhere: web pages, APIs, mobile apps. Understanding that a cache is a temporary speed layer — not storage — prevents the common mistake of treating it as a backup.

Knowledge Check

What is a data warehouse designed to do?

Answer individual record lookups as quickly as possible
Keep frequently-used data in memory so it can be served to users in under a millisecond
Run large analytical queries over historical data for reports and dashboards
Back up a production database to a safe secondary location

What makes a cache different from a regular database?

A cache stores data permanently with stronger durability guarantees
It keeps data in memory for fast reads, but data is temporary
It enforces stricter data rules than a relational database
It is optimized for sweeping analytical queries over years of data

Why do large organizations keep a separate data warehouse instead of running analytics on their main application database?

Warehouses cost less per gigabyte than regular managed databases
The main database only holds temporary data; the warehouse keeps history
Large analytical queries on the main database would slow down live users
Warehouse queries require a different team with special security clearance
