Topic 58

Vulnerability Scanning

Security · CI gate

An image is a frozen snapshot of an OS userland plus your dependencies, and the day after you build it, new CVEs get published against packages already inside it. A scanner — docker scout, trivy, or grype — reads the image's layers, enumerates the OS and language packages, and matches each version against CVE databases. The work is to run that scan in CI, fail the build on critical findings, and accept that base-image CVEs are inherited, so patching usually means rebuilding on an updated base rather than editing the image.

From scan to patched rebuild

scan image layers → match against the CVE database → fail the build on CRITICAL → rebuild on an updated base

Nothing about scanning runs the container — it inspects static layers. That makes it cheap to wire into a pipeline and cheap to run on a schedule, which matters because the threat is not a bug you wrote but a disclosure published after you shipped. This topic is where the slim-base choice from Chapter 4 pays off in fewer findings.

What a Scanner Reads

A scanner walks the image layers, lists the installed OS packages (apt or apk) and language dependencies (pip, npm, and the like), and compares each version against CVE feeds, producing findings keyed by severity: CRITICAL, HIGH, MEDIUM, or LOW. It is a static inspection: nothing executes the image; the scanner only reads the package metadata baked into the layers. That is why the same scan gives the same answer whether the container is running or has never started.
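The matching step can be sketched in a few lines of Python. This is a minimal illustration, not how trivy, grype, or docker scout are implemented: the Advisory type, the package versions, and the CVE-0000-0001 style identifiers are all made up, and real scanners apply per-ecosystem version rules rather than the naive dotted-integer comparison assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """One hypothetical feed entry: versions below fixed_version are affected."""
    cve_id: str
    package: str
    fixed_version: tuple[int, ...]
    severity: str


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.0.11' into (3, 0, 11). Assumes plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def match(installed: dict[str, str], feed: list[Advisory]) -> list[Advisory]:
    """Return every advisory whose package is installed at an affected version."""
    findings = []
    for advisory in feed:
        version = installed.get(advisory.package)
        if version is not None and parse_version(version) < advisory.fixed_version:
            findings.append(advisory)
    return findings


# Made-up package inventory, standing in for metadata read from image layers.
installed = {"openssl": "3.0.11", "zlib": "1.3.1"}

# Made-up advisories; the IDs are placeholders, not real CVEs.
feed = [
    Advisory("CVE-0000-0001", "openssl", (3, 0, 13), "CRITICAL"),
    Advisory("CVE-0000-0002", "zlib", (1, 3, 0), "HIGH"),
]

findings = match(installed, feed)
for finding in findings:
    print(finding.severity, finding.cve_id, finding.package)
```

Only the openssl advisory matches, because the installed zlib is already at or past its fixed version. Nothing in the sketch starts a container: the input is package metadata alone.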

Inherited Base-Image CVEs

Most findings come from the base, not your code. A python:3.12-slim base drags in its own system libraries, and a CVE in any one of them is yours by inheritance the moment you build on it. This is the direct payoff of the minimal-base choice from Chapter 4 topic 12: less userland means fewer inherited CVEs, and a distroless or -slim base can cut the finding count by an order of magnitude against a full ubuntu base before you write a line of Dockerfile.
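One way to see the split for your own image is to tally a scanner's JSON report by package class. The sketch below assumes a report shaped like Trivy's JSON output (a Results list whose entries carry a Class of os-pkgs or lang-pkgs and a Vulnerabilities list); treat those field names as an assumption to check against your scanner version, and the embedded report and CVE IDs as invented sample data.

```python
import json
from collections import Counter

# Invented sample report; field names are assumed to follow Trivy's JSON layout.
REPORT = json.loads("""
{"Results": [
  {"Target": "driftwood/web (debian)", "Class": "os-pkgs",
   "Vulnerabilities": [
     {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libssl3", "Severity": "CRITICAL"},
     {"VulnerabilityID": "CVE-0000-0002", "PkgName": "zlib1g", "Severity": "HIGH"},
     {"VulnerabilityID": "CVE-0000-0003", "PkgName": "libc6", "Severity": "MEDIUM"}
   ]},
  {"Target": "Python", "Class": "lang-pkgs",
   "Vulnerabilities": [
     {"VulnerabilityID": "CVE-0000-0004", "PkgName": "requests", "Severity": "MEDIUM"}
   ]}
]}
""")


def findings_by_origin(report: dict) -> Counter:
    """Count findings in OS packages versus language dependencies."""
    counts = Counter()
    for result in report.get("Results", []):
        origin = "os_packages" if result.get("Class") == "os-pkgs" else "language_packages"
        # A result with no findings may omit the list or set it to null.
        counts[origin] += len(result.get("Vulnerabilities") or [])
    return counts


counts = findings_by_origin(REPORT)
print(dict(counts))
```

The OS-package count is only an approximation of what the base contributes, since packages you install yourself with apt or apk land in the same class.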

Scan in CI and Fail on Critical

Wiring trivy image or docker scout cves into the driftwood-io/app pipeline with a threshold turns scanning from a report into a gate. Fail the build on CRITICAL, warn on HIGH: a vulnerable driftwood/web is then caught before it is tagged and pushed, not discovered in production weeks later.

A report step and a critical-CVE gate in the driftwood-io/app pipeline

# .github/workflows/release.yml — scan steps, before tag-and-push
- name: Report HIGH and CRITICAL CVEs       # visibility only — never fails the job
  uses: aquasecurity/trivy-action@master   # in a real pipeline, pin a release tag or commit SHA
  with:
    image-ref: registry.driftwood.example/driftwood/web@${{ steps.build.outputs.digest }}
    severity: HIGH,CRITICAL
    exit-code: "0"           # report, do not block
    ignore-unfixed: true     # skip CVEs with no available fix yet
- name: Fail the build on CRITICAL CVEs      # the gate
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: registry.driftwood.example/driftwood/web@${{ steps.build.outputs.digest }}
    severity: CRITICAL
    exit-code: "1"           # only a CRITICAL finding fails the job
    ignore-unfixed: true

The two steps split reporting from gating because exit-code applies to every severity in a step's filter; it cannot be set for a subset. The first step runs with exit-code: 0, so HIGH and CRITICAL findings are surfaced without blocking. The second sets exit-code: 1 on severity: CRITICAL alone, so only a critical finding fails the job, and nothing downstream tags or pushes the image. Collapse them into one step with severity: CRITICAL,HIGH and exit-code: 1 and HIGH blocks the build too; drop the gate entirely and you are back to a scan nobody reads.
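The interaction between the severity filter and the exit code can be modeled in plain Python. This is a simplified model of the behavior described above, not the scanner's own code; scan_step and the finding records are hypothetical names invented for the illustration.

```python
def scan_step(findings, severity_filter, exit_code):
    """Model one scan step.

    Findings inside the filter are reported. If any were reported, the step
    returns the configured exit_code; otherwise it returns 0.
    """
    matched = [f for f in findings if f["severity"] in severity_filter]
    status = exit_code if matched else 0
    return status, matched


# An image with one HIGH finding and no CRITICAL ones (placeholder ID).
findings = [{"id": "CVE-0000-0001", "severity": "HIGH"}]

# Step 1: report HIGH and CRITICAL, never block.
report_status, reported = scan_step(findings, {"HIGH", "CRITICAL"}, exit_code=0)

# Step 2: gate on CRITICAL only.
gate_status, _ = scan_step(findings, {"CRITICAL"}, exit_code=1)

# Collapsed variant: one step with both severities and a failing exit code.
collapsed_status, _ = scan_step(findings, {"HIGH", "CRITICAL"}, exit_code=1)

print(report_status, gate_status, collapsed_status)
```

With only a HIGH finding present, the two-step layout reports it and passes, while the collapsed step fails the build.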

Rebuild to Patch, Don't Edit

Because base-image CVEs are inherited, the durable fix is to bump and re-pin the base digest and rebuild on it, then re-tag and re-scan. An ad-hoc RUN apt upgrade tacked on the end does clear the CVEs it covers — scanners read the final package state, not the bytes in lower layers — but it is a poor substitute: it is non-reproducible (you can't say which versions a future build will pull), it bloats the image because the old package bytes still sit in the layers below, and it papers over a stale base instead of moving to a maintained one. Editing a running container fixes nothing durable, since the change dies with the container (Chapter 1's disposability point applied to security).
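A rebuild is only reproducible if the base is pinned by digest, and that is easy to check mechanically. The sketch below is a hypothetical lint, not a feature of any scanner: it lists FROM lines that name an image without an @sha256 digest. The digest in the sample is a placeholder of 64 zeros, and the parser handles only simple single-line FROM instructions.

```python
import re

# A digest pin ends with @sha256: followed by 64 lowercase hex characters.
DIGEST_PIN = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_bases(dockerfile_text: str) -> list[str]:
    """Return base images referenced by FROM without a digest pin."""
    stages, unpinned = set(), []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=... so args[0] is the image reference.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])  # remember stage names for later FROM lines
        if image == "scratch" or image in stages:
            continue  # not a registry image, nothing to pin
        if not DIGEST_PIN.search(image):
            unpinned.append(image)
    return unpinned


PLACEHOLDER_DIGEST = "sha256:" + "0" * 64  # not a real digest

dockerfile = f"""\
FROM python:3.12-slim AS builder
RUN pip install --no-cache-dir -r requirements.txt
FROM python:3.12-slim@{PLACEHOLDER_DIGEST}
COPY --from=builder /usr/local /usr/local
"""

unpinned = unpinned_bases(dockerfile)
print(unpinned)
```

Here the builder stage is flagged because it floats on a tag, while the final stage passes because its base is pinned.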

Triage and Suppression

Not every CVE is reachable or fixable today — a finding may be in a code path the app never calls, or simply have no patched version available yet. Scanners support allowlisting a reviewed CVE with a reason and an expiry, so a known-unexploitable finding does not block every build forever while still surfacing for review when the expiry lapses. The discipline is time-boxed, documented suppression — not silently ignoring findings, and not letting one stale CVE halt the pipeline indefinitely.
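Time-boxed suppression comes down to one rule: a suppression counts only while it has not expired. The sketch below shows that rule with a hypothetical Suppression record; real scanners have their own ignore-file formats, so the data shape, the sample reason, and the placeholder CVE ID are assumptions made for the illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Suppression:
    """A reviewed CVE, with the reason it was accepted and a review deadline."""
    cve_id: str
    reason: str
    expires: date


def blocking_findings(finding_ids, suppressions, today):
    """Return findings that should still fail the build on the given day."""
    active = {s.cve_id for s in suppressions if today <= s.expires}
    return [cve for cve in finding_ids if cve not in active]


suppressions = [
    Suppression(
        cve_id="CVE-0000-0001",  # placeholder ID
        reason="affected code path is never called by the app",
        expires=date(2025, 6, 30),
    ),
]
finding_ids = ["CVE-0000-0001"]

before_expiry = blocking_findings(finding_ids, suppressions, today=date(2025, 6, 1))
after_expiry = blocking_findings(finding_ids, suppressions, today=date(2025, 7, 1))
print(before_expiry, after_expiry)
```

Before the expiry date the finding is suppressed and the build passes; after it, the same finding blocks again until someone reviews it and either renews the suppression or rebuilds on a fixed base.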

Common Mistakes

Scanning only once at release and never again — CVEs are published against packages already in shipped images, so an image clean at build time accrues vulnerabilities while it sits; rescanning on a schedule, not just at build, is how inherited CVEs surface.
Patching a CVE by editing the running container, or leaning on an ad-hoc RUN apt upgrade instead of updating the base — editing a container dies with it, and an appended upgrade is non-reproducible and bloats the image; the durable fix is a rebuild on an updated, re-pinned base.
Running the scan as a non-blocking step that only prints findings — without a fail threshold the pipeline ships critical CVEs anyway, and the scan becomes a report nobody reads.
Choosing a fat ubuntu base then fighting the resulting CVE count — a smaller base (-slim, distroless) inherits far fewer system-package CVEs, so the base choice in Chapter 4 directly sets the scanning workload.

Best Practices

Run trivy, docker scout, or grype in the driftwood-io/app pipeline with a fail-on-CRITICAL threshold so a vulnerable driftwood/web never gets tagged and pushed.
Rescan published images on a schedule, not only at build, so newly disclosed CVEs against already-shipped packages are caught and trigger a rebuild.
Patch by bumping and re-pinning the base digest and rebuilding (Chapter 4 topic 12), then re-scanning, rather than editing layers or containers in place.
Triage findings with time-boxed, documented suppressions for reviewed unexploitable CVEs, so the gate blocks real risk without one stale finding halting every build.

Comparable tools
trivy · grype: the open scanners; docker scout: Docker's built-in one
Harbor · ECR/GCR/ACR: offer registry-side scanning on push
syft: generates the SBOM that scanners and provenance consume
Snyk · Clair: cover the same CVE-matching ground

Knowledge Check

What does a vulnerability scanner actually inspect, and where do most findings come from?

It statically reads the image layers' package metadata against CVE feeds, and most findings come from the base image, not your code
It runs the container live and watches its system calls and outbound network traffic for signs of active exploits
It statically analyzes your own application source code line by line, which is where the large majority of an image's reported CVEs actually originate
It actively fuzzes the running application in order to discover previously unknown, never-before-disclosed vulnerabilities live

Why does the choice of base image directly change an image's CVE count?

Base-image CVEs are inherited, so a smaller base ships less userland and inherits far fewer system-package CVEs
The scanner weights and tallies its findings differently depending on exactly which base image it detects underneath your layers
A smaller base automatically downgrades every inherited CVE's severity from critical down to medium across the report
A minimal base causes the scanner to skip over your application's own language dependencies entirely during the scan

Why does a scan need to be a build gate and also run on a schedule?

A fail threshold blocks a vulnerable image at build, and scheduled rescans catch CVEs disclosed after the image shipped
A single non-blocking scan run once at build time is entirely enough on its own; the recurring schedule is just a redundant backup
The scheduled scan automatically rebuilds and then redeploys the affected image on its own every single time it runs
Build gates are inherently unreliable, so the recurring schedule exists mainly to retry the scans that randomly fail in CI

Why does patching a base-image CVE mean rebuilding rather than editing the image?

Editing a container dies with it; a re-pinned base rebuild is the reproducible fix, while an ad-hoc upgrade is non-reproducible and bloats the image
A RUN apt upgrade appended to the very end of the Dockerfile cleanly removes the vulnerable lower-layer bytes underneath it once and for good
Editing the package inside the running container patches the underlying image permanently for every future pull of that tag
The registry can rewrite and patch the affected layer in place on the server side, propagating the fix to every existing tag, once you report the specific CVE to it directly
