Topic 28

Logs, Metrics, and Alerts

Concept

The last topic said Pageturn has to be watched once it's live. But Sam can't sit and stare at the site all day, and staring wouldn't tell him much anyway. Watching software in practice comes down to three plain things that do the looking for him — and once you have names for them, every monitoring tool you'll ever see starts to make sense.

Those three things are logs, metrics, and alerts. Logs are the recorded diary of what happened. Metrics are numbers measured over time. Alerts are the automatic taps on the shoulder when one of those numbers crosses a line you care about. Think of a car's dashboard: the trip history is the log, the live gauges are the metrics, and the warning light that flicks on is an alert. (We'll drop the dashboard shortly and use the real terms.)

Three plain things that watch the running app, and what each one answers

Logs

Timestamped records of events, written down as they happen — "user logged in," "error saving favorite." Answer: what happened, and exactly when?

Metrics

Numbers measured over time — requests per second, error rate, how long a page takes. Answer: how is the app doing, as a trend?

Alerts

Automatic warnings that fire when a metric crosses a set line — errors too high, the site too slow. Answer: do I need to look right now?

Logs: the diary of what happened

A log entry is a recorded event: a single line written down the moment something happens, stamped with the time it occurred. "3:42:07 - Maya's reader logged in." "3:42:09 - error saving a favorite." Strung together, these entries form the log: a timestamped diary of everything the app did, in order.

Logs are where you go to find out exactly what happened, after you already know something is wrong. When a reader reports that saving a favorite failed, Sam opens the logs around that time and reads the actual events leading up to it. Logs are detailed and specific — which is their strength for digging in, and, as we'll see, their weakness if you drown in them.
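To make the diary idea concrete, here is a minimal sketch using Python's standard logging module. The logger name, the reader_id and book_id values, and the line format are assumptions made up for illustration; Pageturn's real logging setup is not described in this course.

```python
import io
import logging


def build_logger(stream):
    """Return a logger that writes one timestamped line per event to `stream`."""
    logger = logging.getLogger("pageturn.sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate lines if this is called twice
    logger.propagate = False  # keep the output in our stream only
    handler = logging.StreamHandler(stream)
    # %(asctime)s stamps each line with the time the event was recorded.
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# An in-memory stream stands in for a log file so the example is self-contained.
log_output = io.StringIO()
logger = build_logger(log_output)

# Each call records one event, in the order it happened.
logger.info("user logged in: reader_id=%s", 42)
logger.error("error saving favorite: reader_id=%s book_id=%s", 42, 7)

log_lines = log_output.getvalue().splitlines()
for line in log_lines:
    print(line)
```

Each printed line begins with a timestamp, followed by a level (INFO or ERROR) and the message. In a real app the lines would usually go to a file or a logging service rather than an in-memory stream.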

Metrics: numbers over time

A metric is a number measured repeatedly over time. Not a one-off reading — a value sampled again and again so you can watch it move: how many requests Pageturn handles per second, what fraction of them fail (the error rate), how many milliseconds a page takes to load. Each is a number, tracked as a line on a chart.

Where a log tells you about one event, a metric tells you about the trend. A single failed request is one line in a log; the error rate climbing from one in a thousand to one in ten is a metric, and that's the shape of a problem forming. Metrics are how you see the health of the whole app at a glance, without reading every individual event.
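The step from individual events to a trend can be sketched in a few lines. This example assumes each request is recorded as a (minute, succeeded) pair, a made-up shape chosen for illustration, and computes the error rate for each minute. The sample data mirrors the "one in a thousand to one in ten" climb described above.

```python
from collections import defaultdict


def error_rate_per_minute(requests):
    """Take (minute, succeeded) pairs and return {minute: error_rate}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for minute, succeeded in requests:
        totals[minute] += 1
        if not succeeded:
            failures[minute] += 1
    # One number per minute: the fraction of that minute's requests that failed.
    return {minute: failures[minute] / totals[minute] for minute in sorted(totals)}


# Made-up sample data:
# minute 0 has 1 failure in 1000 requests, minute 1 has 100 failures in 1000.
sample = (
    [(0, True)] * 999 + [(0, False)]
    + [(1, True)] * 900 + [(1, False)] * 100
)

rates = error_rate_per_minute(sample)
for minute, rate in rates.items():
    print(f"minute {minute}: error rate {rate:.1%}")
```

No single request in the sample says much on its own. The jump between the two per-minute numbers is what shows a problem forming.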

Alerts: automatic warnings

Nobody can watch the charts every minute, so the third piece does it for you. An alert is an automatic warning that fires when a metric crosses a line you set in advance — "tell me if the error rate goes above two percent," "tell me if a page takes longer than three seconds." When the number crosses, the alert reaches out: a message, a page, a notification.

The key word is automatic. An alert means the machine is watching the metrics so a human doesn't have to, and it speaks up only when something actually needs attention. Crucially, a good alert is an early warning — it fires while a problem is forming, not after Pageturn is already down. The point is to look before users feel it.
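At its simplest, an alert is a comparison between a metric and a line set in advance. The sketch below uses the two thresholds from the examples above (an error rate of two percent, a page load of three seconds). The metric names and the message wording are assumptions, and a real alerting tool would send a notification rather than return a list of strings.

```python
def check_alerts(metrics, thresholds):
    """Return a warning message for every metric that is above its threshold."""
    messages = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Speak up only when a reading exists and has crossed the line.
        if value is not None and value > limit:
            messages.append(f"ALERT: {name} is {value}, above the limit of {limit}")
    return messages


# Lines set in advance (hypothetical metric names).
thresholds = {"error_rate": 0.02, "page_load_seconds": 3.0}

# A healthy reading produces no alerts.
healthy = check_alerts({"error_rate": 0.001, "page_load_seconds": 0.8}, thresholds)

# The error rate has climbed past two percent, so one alert fires.
forming = check_alerts({"error_rate": 0.05, "page_load_seconds": 0.9}, thresholds)

print(healthy)
print(forming)
```

In the second reading the site is still answering requests quickly, yet the alert fires on the error rate alone. That is the early-warning behavior described above.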

Putting It Together: Monitoring and Observability

Used together, these three answer different questions: a metric tells you something is wrong, an alert makes sure you notice, and the logs let you dig in and find out why. That kind of insight, understanding what's going on inside a running system from the outside, just from what it reports, is roughly what people mean by observability. One honest caveat: tools usually name the core observability signals as logs, metrics, and a third called traces, which follow a single request across the system. Traces are beyond this beginner course, and an alert is really a notification built on top of a metric rather than a separate signal, but logs, metrics, and alerts are the three you'll meet first.
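The last step of that chain, going from "something is wrong" to "here is what happened," often amounts to reading the log entries close to the moment the alert fired. A minimal sketch, assuming log entries are held as (timestamp, message) pairs; the dates, times, and the 60-second window are invented for illustration.

```python
from datetime import datetime, timedelta


def logs_near(log_entries, moment, window_seconds=60):
    """Return the (timestamp, message) entries within window_seconds of moment."""
    window = timedelta(seconds=window_seconds)
    return [(ts, msg) for ts, msg in log_entries if abs(ts - moment) <= window]


# Made-up log entries, in time order.
log_entries = [
    (datetime(2024, 1, 1, 15, 30, 0), "user logged in"),
    (datetime(2024, 1, 1, 15, 42, 7), "user logged in"),
    (datetime(2024, 1, 1, 15, 42, 9), "error saving favorite"),
    (datetime(2024, 1, 1, 16, 5, 0), "user logged out"),
]

# Hypothetical moment at which the error-rate alert fired.
spike_time = datetime(2024, 1, 1, 15, 42, 30)

nearby = logs_near(log_entries, spike_time)
for ts, msg in nearby:
    print(ts.isoformat(), msg)
```

Only the two entries recorded within a minute of the spike come back, which is the narrow slice of the diary Sam would read first.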

This is the simplified picture — observability runs deeper than three words, and there are whole tools and a whole craft built on top of it. That's a future course: the Observability Deep Dive picks up exactly here and teaches how real teams collect, store, and read these signals. For now, the three terms — logs, metrics, alerts — are the vocabulary that makes every monitoring screen you'll ever meet readable.

Common Confusions

"Logs and metrics are the same thing." A log is a record of one event at a moment in time; a metric is a single number sampled over and over so you can watch it trend. One is the diary, the other is the gauge.
"An alert means the site is already down." A good alert is an early warning that fires while a problem is still forming — the whole point is to look before users feel it, not after everything has failed.
"More logging is always better." Past a point, more logs just bury the important lines in noise. The signal you need gets lost in the flood — useful logging is selective, not maximal.
"Observability is a separate fourth tool you buy." It isn't a fourth thing — it's what you get by using logs, metrics, and alerts together to understand the system from the outside.

Why It Matters

Logs, metrics, and alerts are the everyday vocabulary of monitoring — name them once and the dashboards stop being a mystery.
Knowing which one answers which question — what happened, how's the trend, do I look now — is how you actually diagnose a live problem instead of guessing.
Alerts are what let a small team run software without anyone staring at a screen all day; the machine watches and only speaks up when it matters.
This is the foundation the Observability Deep Dive builds on — the future course where you'll learn to collect and read these signals for real.

Knowledge Check

What is the difference between a log and a metric?

A log records one event in time; a metric is a number tracked over time
A metric records one event; a log is a number tracked as a trend
They are two names for exactly the same recorded data
Both are automatic warnings that fire the moment something goes wrong in production

A well-designed alert fires when Pageturn's error rate starts climbing, before the site is fully down. Why is that the goal?

It warns early, so the team can act before users are hit hard
Alerts should only fire once the site is completely unreachable
The alert automatically fixes the rising error rate on its own
An alert is the detailed, timestamped record of each failed request

A metric shows Pageturn's error rate spiking. Which signal does Sam turn to next to find out why?

The logs, to read the actual events around the time of the spike
Another metric, since metrics explain the underlying cause behind every spike
The alert, because the alert contains the full explanation
Nothing — a spike in a metric can't be investigated further
