Topic 02

Observability: Logs, Metrics, Traces

Concept

Once software is running, you can't just open it up and look inside — it's off on a server somewhere, serving thousands of users. So it has to tell you how it's doing. Observability is the ability to understand what a running system is doing from the outside, built from three kinds of signal: logs, metrics, and traces.

Each signal answers a different question, and together they let you understand and debug a system you can't pause or poke at directly. This is core vocabulary for anyone who runs software.

A car dashboard is a useful picture: warning lights flag specific events (logs), gauges show numbers changing over time (metrics), and a trip computer follows the whole journey (traces). You drive by what the dashboard tells you, not by opening the engine.

The three pillars of observability

Logs

Timestamped records of individual events — what happened, and when. Best for the detail of a specific moment.

Metrics

Numbers measured over time — request rate, error rate, latency. Best for trends and alerting.

Traces

The path of one request across services. Best for finding where in a chain of calls the time went or the failure occurred.

Logs

Logs are timestamped records of individual events — "user 123 logged in", "failed to send email at 9:04". They're the first thing you read when something breaks, because they tell you what happened and when, in detail. A log is the system writing down notable moments as they occur, so you can reconstruct the story afterward.
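As a rough illustration of what "writing down notable moments" can look like in code, here is a minimal sketch using Python's standard logging module. The logger name, the event names, and the "smtp timeout" reason are made up for the example; a real service would pick its own fields and would usually write to a file or a log collector rather than an in-memory buffer.

```python
import io
import logging


def build_logger(stream):
    """Return a logger that writes timestamped lines to `stream`."""
    logger = logging.getLogger("cadence.reminders")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if this runs twice
    logger.propagate = False  # keep output confined to our own handler
    handler = logging.StreamHandler(stream)
    # Each line carries a timestamp, a severity level, and the event message.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# An in-memory buffer stands in for a log file (an assumption for the demo).
stream = io.StringIO()
log = build_logger(stream)

# Two individual events, each recorded at the moment it happens.
log.info("user_logged_in user_id=%s", 123)
log.error("email_send_failed user_id=%s reason=%s", 123, "smtp timeout")

log_lines = stream.getvalue().splitlines()
print("\n".join(log_lines))
```

Each printed line is one event with its own timestamp, which is what lets you reconstruct the order of events after something breaks.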

Metrics

Metrics are numbers measured continuously over time — requests per second, error rate, response time, memory used. Where a log is one event, a metric is a trend you can graph and watch. Metrics answer "how much" and "how often", and they're what tell you, at a glance, whether the system is healthy right now or drifting toward trouble.
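To make the contrast with logs concrete, here is a small sketch of metrics kept in memory: a request counter, an error counter, and a list of latencies from which an error rate and a percentile are computed. The class name, the sample values, and the nearest-rank percentile method are assumptions chosen for the example; production systems normally hand this job to a metrics library and a time-series store.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RequestMetrics:
    """Hypothetical in-memory metrics for one service."""
    total: int = 0
    errors: int = 0
    latencies_ms: list = field(default_factory=list)

    def record(self, latency_ms, ok):
        """Count one request and remember how long it took."""
        self.total += 1
        if not ok:
            self.errors += 1
        self.latencies_ms.append(latency_ms)

    def error_rate(self):
        """Fraction of recorded requests that failed."""
        return self.errors / self.total if self.total else 0.0

    def percentile(self, p):
        """Nearest-rank percentile of latency, for p in (0, 100]."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        rank = math.ceil(p / 100 * len(ordered))
        return ordered[rank - 1]


metrics = RequestMetrics()
# Made-up sample data: (latency in ms, succeeded?)
samples = [(20, True), (25, True), (30, True), (900, False), (22, True)]
for latency, ok in samples:
    metrics.record(latency, ok)

print("error rate:", metrics.error_rate())
print("p50 latency (ms):", metrics.percentile(50))
print("p95 latency (ms):", metrics.percentile(95))
```

Notice that no individual request is described here, only aggregates. That is the trade: metrics are cheap to graph and alert on, but they cannot tell you what happened in the one request that took 900 ms.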

Traces

Traces follow a single request as it travels through the system — touching this service, then that database, then this other piece — and show how long each step took. In a system made of many parts, a trace answers "where did this one request slow down or fail?" It stitches the pieces together into the story of one user's journey through the system.
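The idea can be sketched with a toy data structure: every step of a request is recorded as a "span" that shares one trace ID, and the slowest span points to where the time went. The step names and timings below are hard-coded, hypothetical values; real tracing tools (OpenTelemetry is a common choice) measure timings automatically and pass the trace ID between services for you.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Span:
    """One step of a request, tagged with the trace it belongs to."""
    trace_id: str
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


class Trace:
    """All the spans recorded for a single request."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex  # shared by every span in this trace
        self.spans = []

    def add_span(self, name, start_ms, end_ms):
        span = Span(self.trace_id, name, start_ms, end_ms)
        self.spans.append(span)
        return span

    def slowest(self):
        """Return the span that took the longest."""
        return max(self.spans, key=lambda s: s.duration_ms)


# Hypothetical timings for one slow reminder request.
trace = Trace()
trace.add_span("api_gateway", 0, 5)
trace.add_span("reminder_service", 5, 20)
trace.add_span("database_query", 20, 870)
trace.add_span("email_send", 870, 900)

slowest = trace.slowest()
print(f"slowest step: {slowest.name} ({slowest.duration_ms} ms)")
```

The shared trace ID is what stitches the steps together: without it, each service would only know about its own piece of the request.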

Why All Three

You need all three because each answers a question the others can't. Metrics tell you something is wrong ("error rate spiked"); logs tell you what happened ("database connection failed"); traces tell you where ("the slowdown was in the payment service"). Together they make a system observable — understandable from the outside. Relying on just one leaves blind spots that hurt most exactly when something is breaking.

When Cadence's reminders start firing late, the team uses all three. The logs show an error at send time. The metrics reveal a latency spike every morning at 9 AM. And a trace follows one slow reminder and pinpoints the culprit: a slow database query under the morning load. No single signal would have solved it; together, they turn a vague "reminders are late" into an exact, fixable cause.

Common Confusions

"Logs are enough on their own." Logs tell you what happened, but metrics show trends over time and traces show where in the system a problem is. Each answers a question logs can't.
"Observability is just another word for monitoring." Monitoring watches known things you set up in advance; observability is being able to ask new questions of a running system, including ones you didn't anticipate.
"You add observability after launch if there's a problem." It should be built in from the start, because a system has to be designed to emit logs, metrics, and traces. Bolting it on mid-crisis is far harder.

Why It Matters

Observability is how teams understand live systems they can't pause — without it, running software is flying blind.
Logs, metrics, and traces are core vocabulary on any operations team, and the doorway into observability as a deep field of its own.
Knowing each signal answers a different question helps you reach for the right one when something breaks, instead of guessing.

Knowledge Check

What are logs?

Timestamped records of individual events, showing what happened and when
Numbers measured continuously over time, like requests per second
The full path that one single request takes as it moves through the system
The original source code of the application before it was built

What are metrics?

Numbers measured continuously over time that you can graph and watch
Timestamped records of single events, such as one failed email send
The path a single request takes as it travels through the system
The application's source code, which defines how it all behaves

What does a trace show?

A single request's path through the system and how long each step took
A continuous number tracked over time, such as the error rate
A single timestamped record of just one event that occurred in the system
The full source code of every part the request passes through

Why do teams need all three signals, not just one?

Each answers a different question: whether something is wrong, what happened, and where it went wrong
Because collecting all three makes the finished program run faster
Because a strict rule requires every system to collect exactly three signals
Because three signals is the most a system is ever allowed to collect

How does observability differ from monitoring?

Monitoring watches known things; observability lets you ask new questions
They are two identical words that mean exactly the same activity
Observability is just a way to make the running software run faster
Observability writes the code, while monitoring is what tests that same code
