Service 23

Amazon Timestream

Database · Time-Series · Managed

Timestream is AWS's database for time-series data — measurements that arrive over time and are queried with time at the center: sensor readings, server metrics, IoT telemetry, event logs, market ticks. Storing this in a relational table works until the table has billions of rows and queries crawl.

Since 2024 it comes in two variants: Timestream for LiveAnalytics, the original serverless engine with SQL and tiered storage, and Timestream for InfluxDB, a managed deployment of open-source InfluxDB. Pick LiveAnalytics to start fresh with SQL; pick InfluxDB to keep existing InfluxDB code and its Flux/Telegraf ecosystem.

Timestream for LiveAnalytics

LiveAnalytics is serverless and AWS-native. Each row is one measurement at one moment with three parts: dimensions (string labels like device ID or host), measures (numeric or string values like temperature or CPU percent), and a time column. The model is schemaless like DynamoDB — rows can carry different dimensions.
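
The three-part row can be sketched as a plain Python dictionary in the shape the WriteRecords API accepts. This is a minimal illustration, not a full client: the helper name make_record and the device, host, and measure names are hypothetical, and the sketch assumes single-measure records where every value travels as a string alongside its declared type.

```python
import time


def make_record(dimensions, measure_name, value, time_ms=None):
    """Build one single-measure record: dimensions, a measure, and a time."""
    if time_ms is None:
        time_ms = int(time.time() * 1000)
    # Strings are stored as VARCHAR; anything else is treated as DOUBLE here
    # (a simplification made for this sketch).
    value_type = "VARCHAR" if isinstance(value, str) else "DOUBLE"
    return {
        # Dimensions: string labels that identify the source of the measurement.
        "Dimensions": [
            {"Name": name, "Value": str(val)} for name, val in dimensions.items()
        ],
        # Measure: the value being recorded, sent as a string plus its type.
        "MeasureName": measure_name,
        "MeasureValue": str(value),
        "MeasureValueType": value_type,
        # Time: when the measurement was taken.
        "Time": str(time_ms),
        "TimeUnit": "MILLISECONDS",
    }


# Two rows with different dimension sets can share a table, because no
# fixed column list is declared up front.
a = make_record({"device_id": "sensor-1", "site": "plant-a"}, "temperature", 21.5,
                time_ms=1700000000000)
b = make_record({"host": "web-01"}, "cpu_percent", 73.2, time_ms=1700000000000)
```

The two example rows carry different dimension names, which is the schemaless behavior described above.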

Tiered storage keeps recent data in a fast memory store (retention from hours to days) and ages older data automatically to a cheaper magnetic store (days to years). Queries use SQL with time-series functions for interpolation, binning, fill-missing, and rate.
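
As a rough sketch of how the two tiers and the time functions show up in practice, the helpers below build the retention settings a table could be created with and a binned-average query string. The helper names, the telemetry database, the sensors table, and the 24-hour and 365-day values are all illustrative assumptions; the query also assumes single-measure records with DOUBLE values.

```python
def table_settings(database, table, hot_hours, cold_days):
    """Keyword arguments shaped for the timestream-write create_table call."""
    if hot_hours < 1 or cold_days < 1:
        raise ValueError("retention periods must be at least 1")
    return {
        "DatabaseName": database,
        "TableName": table,
        "RetentionProperties": {
            # How long rows stay in the fast memory store.
            "MemoryStoreRetentionPeriodInHours": hot_hours,
            # How long rows stay in the cheaper magnetic store afterwards.
            "MagneticStoreRetentionPeriodInDays": cold_days,
        },
    }


def binned_average_query(database, table, measure, window="15m", bin_size="1m"):
    """SQL that averages one measure per time bin over a recent window.

    Inputs are interpolated directly, so they must come from trusted code,
    never from user input.
    """
    return (
        f"SELECT bin(time, {bin_size}) AS binned_time, "
        f"avg(measure_value::double) AS avg_value "
        f'FROM "{database}"."{table}" '
        f"WHERE measure_name = '{measure}' AND time >= ago({window}) "
        f"GROUP BY bin(time, {bin_size}) ORDER BY binned_time"
    )


settings = table_settings("telemetry", "sensors", hot_hours=24, cold_days=365)
query = binned_average_query("telemetry", "sensors", "temperature")
```

A query over the last 15 minutes, like this one, would be served from the memory store under these settings; widening the window past the memory-store retention reaches into the magnetic store.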

Timestream for InfluxDB

This variant runs managed open-source InfluxDB — a real InfluxDB instance, not a clone — so the entire ecosystem (Flux, Telegraf, the InfluxDB CLI, dashboards) works unchanged. It is instance-based rather than serverless: you pick an instance class and storage size. Choose it when ecosystem fit matters more than SQL or the serverless model.

Ingestion and Visualization

For LiveAnalytics, writes go through the WriteRecords API (up to 100 records per call); a common streaming pattern is Kinesis Data Streams to Lambda to Timestream. Reads use SQL, either through the Query API or a JDBC driver, and QuickSight, Grafana, and the console can all query it. For InfluxDB, standard InfluxDB tooling applies.
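
Batching to the 100-record limit can be sketched as follows. The helper names and the RecordingClient stand-in are hypothetical and exist only so the example runs without AWS credentials; the client passed in is assumed to expose a write_records method taking DatabaseName, TableName, and Records, as the boto3 timestream-write client does.

```python
MAX_RECORDS_PER_CALL = 100  # per-call limit stated in the text above


def chunked(records, size=MAX_RECORDS_PER_CALL):
    """Yield consecutive slices of at most `size` records."""
    records = list(records)
    for start in range(0, len(records), size):
        yield records[start:start + size]


def write_in_batches(client, database, table, records):
    """Send records in batches of up to 100; return the number of calls made."""
    calls = 0
    for batch in chunked(records):
        client.write_records(DatabaseName=database, TableName=table, Records=batch)
        calls += 1
    return calls


class RecordingClient:
    """Stand-in for a real client: records batch sizes instead of calling AWS."""

    def __init__(self):
        self.batch_sizes = []

    def write_records(self, DatabaseName, TableName, Records):
        self.batch_sizes.append(len(Records))


# 250 placeholder records should be sent as batches of 100, 100, and 50.
records = [
    {"MeasureName": "temperature", "MeasureValue": str(i),
     "MeasureValueType": "DOUBLE", "Time": str(1700000000000 + i)}
    for i in range(250)
]
stub = RecordingClient()
call_count = write_in_batches(stub, "telemetry", "sensors", records)
```

In a real Lambda handler the stand-in would be replaced by boto3.client("timestream-write"), with error handling and retries for rejected records added around each call.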

Timestream vs relational vs Redshift

Timestream — high-volume time-stamped measurements queried with time at the center, where tiered hot/cold storage and time functions matter.

RDS / Aurora — modest time-series volumes that still fit comfortably in a relational table with normal indexes.

Redshift — broad analytical queries across many dimensions of warehouse data, not specifically time-centered telemetry.

Common Mistakes

Forcing high-volume time-series into a relational table until billions of rows make queries crawl — that is what Timestream is for.
Choosing the LiveAnalytics variant when the team already runs InfluxDB code and tooling, or vice versa.
Misconfiguring memory-store vs magnetic-store retention, so recent-data queries are slow or hot storage costs balloon.
Writing records one at a time instead of batching up to 100 per WriteRecords call.
Using Timestream for relational workloads with joins and updates — it is built for append-mostly time-stamped data.
Ignoring the schemaless dimension model and assuming fixed columns like a relational table.

Best Practices

Pick LiveAnalytics for new SQL-based work; pick InfluxDB to reuse existing InfluxDB code and ecosystem.
Tune memory-store and magnetic-store retention to your hot/cold query pattern.
Batch writes (up to 100 records per call) for efficient ingestion.
Use the Kinesis → Lambda → Timestream pattern for streaming ingestion at scale.
Visualize with QuickSight or Grafana rather than building custom dashboards.

Comparable services: GCP Bigtable (time-series patterns) and BigQuery; Azure Data Explorer

Knowledge Check

What are the three parts of a Timestream LiveAnalytics row?

Dimensions (string labels), measures (the values), and a time column
A partition key, a sort key, and an opaque binary payload blob
A leader node, several parallel compute nodes, and separate storage nodes
An RDF-style subject, a predicate, and an object triple

How does LiveAnalytics tiered storage work?

Recent data stays in a fast memory store and ages automatically to a cheaper magnetic store based on retention settings
All ingested data is kept resident in the fast memory store forever, regardless of its age
Data is automatically sharded across multiple AWS Regions according to each row's timestamp value
Old data is simply deleted outright the very moment the in-memory store fills up, with nothing ever moved to cheaper storage

When would you choose Timestream for InfluxDB over LiveAnalytics?

When the team already has InfluxDB code and wants the Flux/Telegraf ecosystem rather than SQL and serverless
When the team simply wants a fully serverless time-series engine that is queried with plain, familiar standard SQL
When the workload needs rich relational joins and multi-row ACID transactions
When the workload needs a dedicated graph traversal query language

What kind of workload is Timestream purpose-built for?

High-volume, append-mostly, time-stamped measurements queried with time at the center
Transactional workloads with frequent in-place row updates and many-table relational joins
Deep many-hop relationship traversals across a connected graph of nodes
Microsecond single-key key-value cache lookups by exact key
