Packaging Concepts and Repositories
Topic 35

A package is an archive of files plus a block of metadata: the package name and version, the list of other packages it depends on, the maintainer scripts that run before and after install, and the absolute paths each file unpacks to. A package manager reads that metadata, resolves the full dependency graph, downloads everything from a repository, and records exactly what it installed in a local database. On Debian and Ubuntu that is apt on top of dpkg; on RHEL, Fedora, and Rocky it is dnf on top of rpm. The mechanics differ; the model is identical.

This is the whole reason a Linux server stays patchable. Every file on a managed system traces back to a package, every package traces back to a signed repository, and one command — apt list --upgradable — tells you what is out of date across the entire machine. The moment you install software by piping a script from the internet into a shell, that file belongs to no package, shows up in no upgrade list, and quietly stops receiving security fixes.
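The claim that every managed file traces back to a package can be checked directly. The sketch below wraps the two real lookup commands, dpkg-query -S and rpm -qf, in one helper. The name owner_of and the "unowned:" message are illustrative choices, not part of either tool.

```shell
# Report which package owns a file, or say that none does.
owner_of() {
    path=$1
    if command -v dpkg-query >/dev/null 2>&1; then
        # Debian/Ubuntu: search the dpkg database for the path
        owner=$(dpkg-query -S "$path" 2>/dev/null) || owner=
    elif command -v rpm >/dev/null 2>&1; then
        # RHEL family: query the rpm database by file
        owner=$(rpm -qf "$path" 2>/dev/null) || owner=
    else
        owner=
    fi
    if [ -n "$owner" ]; then
        echo "$owner"
    else
        echo "unowned: $path"
        return 1
    fi
}
```

On a host where nginx came from a package, owner_of /usr/sbin/nginx would typically name that package. A binary dropped in place by a script piped into a shell comes back as unowned.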

Package Contents and Metadata

A .deb is an ar archive with three members: a small debian-binary file that records the format version, and two compressed tarballs. data.tar holds the actual files in their final tree layout, and control.tar holds the metadata. The control file names the package, its version, its architecture, and its Depends, Recommends, and Conflicts relationships. An .rpm packs the same information into a binary header followed by a cpio payload. You can look inside either without installing it.

The dangerous part of the metadata is the maintainer scripts. A package can ship preinst, postinst, prerm, and postrm scripts that run as root during install and removal: they create users, start services, and migrate config. Installing a package is therefore not just copying files; it is executing arbitrary code as root, which is exactly why the trust chain described in the following sections matters.

# Inspect a .deb without installing it
dpkg-deb --info nginx_1.24.0-1_amd64.deb     # metadata + maintainer scripts
dpkg-deb --contents nginx_1.24.0-1_amd64.deb # every file and where it lands

# The RHEL-family equivalents
rpm -qip nginx-1.24.0-1.x86_64.rpm           # info
rpm -qlp nginx-1.24.0-1.x86_64.rpm           # file list
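
Because maintainer scripts run as root, it can be worth reading them before installing. This is a minimal sketch, assuming the control area has already been unpacked with dpkg-deb --control. For an .rpm, rpm -qp --scripts prints the equivalent scriptlets. The helper name maintainer_scripts and the ./ctl directory are illustrative.

```shell
# Unpack only the control area (metadata + maintainer scripts), not the payload:
#   dpkg-deb --control nginx_1.24.0-1_amd64.deb ./ctl

# Print the name of each maintainer script present in an unpacked control directory.
maintainer_scripts() {
    ctl_dir=$1
    for name in preinst postinst prerm postrm; do
        if [ -f "$ctl_dir/$name" ]; then
            echo "$name"
        fi
    done
}

# Usage, after the dpkg-deb step above:
#   maintainer_scripts ./ctl
```

Each name it prints is a file you can open and read as an ordinary script.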

Repositories, Indexes, and the Cache

A repository is an HTTP or HTTPS server holding the package files plus index files that list every package, its version, its dependencies, and a checksum. The client never trusts the network round-trip directly — it downloads the index, verifies its signature, and only then knows what is available. On Debian/Ubuntu, apt update refreshes those indexes into /var/lib/apt/lists/; it downloads no packages, which is why update and upgrade are two distinct steps.

Downloaded .deb files land in /var/cache/apt/archives/ and stay there until you clear them. That cache lets you reinstall offline and lets a whole fleet pull through a local mirror or caching proxy instead of hammering the public mirrors. The cost is disk: on a long-lived server the archive cache grows to gigabytes, and apt clean is the command that reclaims it.

# Debian/Ubuntu: refresh indexes, then see what's stale
apt update
apt list --upgradable

# Where the metadata and cached packages live
ls /var/lib/apt/lists/          # downloaded indexes
du -sh /var/cache/apt/archives/ # cached .deb files
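
The checksum in the index is what ties a downloaded file back to the signed metadata. apt performs this check itself, so the sketch below only illustrates the idea. It assumes sha256sum from coreutils is available and that the expected digest was copied from the SHA256 field of the package's index entry. The function name verify_sha256 is hypothetical.

```shell
# Compare a file's SHA-256 digest with the value listed in the repository index.
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Usage (the digest argument comes from the index, not from the download itself):
#   verify_sha256 /var/cache/apt/archives/some-package.deb "<digest from index>"
```

The check only means something because the expected digest comes from an index whose signature has already been verified.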

Dependency Resolution

Shared libraries are why dependency resolution exists. A web server links against libssl, which links against libc, and each of those is its own package shipped and patched independently. When you ask for one package, the manager computes the transitive closure of everything it needs, picks versions that satisfy every constraint at once, and installs them in an order that never leaves a half-configured system. dpkg alone does none of this — hand it a single .deb with unmet dependencies and it stops with an error.

Resolution can fail honestly. Two packages may each demand a different, mutually exclusive version of a third: an unsatisfiable conflict that the manager refuses to paper over. The fix is to change what you are asking for, not to force past it. dpkg -i --force-depends will install the broken set and leave you with a system that throws missing-symbol errors at runtime, which is far harder to debug than the upfront refusal.
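
The ordering half of resolution can be sketched with tsort from coreutils, which sorts a graph topologically. This is a toy model, not apt's resolver: it assumes a hand-written list of "package dependency" pairs and ignores versions and conflicts entirely. The function name install_order is illustrative.

```shell
# Read "package dependency" pairs on stdin and print an install order
# in which every dependency comes before the package that needs it.
install_order() {
    # tsort expects "before after" pairs, so swap the two columns.
    awk '{ print $2, $1 }' | tsort
}

# Toy graph from the text: nginx needs libssl, libssl needs libc.
printf '%s\n' 'nginx libssl' 'libssl libc' | install_order

# A real dry run on Debian/Ubuntu, which also resolves versions:
#   apt-get install --simulate nginx
```

For this chain the only valid order is libc, then libssl, then nginx, which is what the pipeline prints.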

Signing and Trust

Every legitimate repository signs its index with a GPG key, and the client verifies that signature against a key you have explicitly trusted before it acts on a single byte. This is the only thing standing between you and a tampered mirror: without it, anyone who controls the network path or the mirror can serve a malicious package and your machine will run its root-level maintainer scripts without complaint. On modern Debian/Ubuntu, trust is scoped per-repository — the key lives in /etc/apt/keyrings/ and is bound to one source with the signed-by option, so a third-party repo can never authenticate a package from the main archive.

The deprecated pattern, apt-key add, dropped every key into one global keyring that vouched for all repositories at once. That is why it was removed: a single compromised vendor key could sign anything. Adding an unsigned repository, or disabling signature checks with [trusted=yes], throws this entire guarantee away — the package may install cleanly and still be exactly the supply-chain attack the signing was designed to stop.

# Modern Debian/Ubuntu: a key scoped to one repository (run as root; both writes go under /etc)
curl -fsSL https://example.com/repo.gpg \
  | gpg --dearmor -o /etc/apt/keyrings/example.gpg
echo "deb [signed-by=/etc/apt/keyrings/example.gpg] https://example.com/apt stable main" \
  | tee /etc/apt/sources.list.d/example.list

# RHEL family imports the key for rpm/dnf
rpm --import https://example.com/RPM-GPG-KEY-example
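
A quick audit for sources that switch off verification can be sketched as a search over the sources directory. It assumes the option appears as trusted=yes in one-line .list files or as Trusted: yes in deb822 .sources files. The function name find_trusted_sources is hypothetical.

```shell
# Print file:line:text for every source entry that disables signature checks.
find_trusted_sources() {
    dir=$1
    # -r recurse, -n show line numbers, -i ignore case, -E extended regex
    grep -rniE 'trusted[=:][[:space:]]*yes' "$dir" || true
}

# Typical use on Debian/Ubuntu:
#   find_trusted_sources /etc/apt/sources.list.d
```

No output means no entry in that directory opts out of signature checking. Commented-out lines still match, so read each hit before acting on it.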

Versioning and Pinning

Package versions are ordered, not just labeled, and the manager compares them to decide what counts as an upgrade. A Debian version like 1:2.4.52-1ubuntu4.7 has three parts: the epoch (1:), which overrides normal comparison when an upstream renumbering would otherwise look like a downgrade; the upstream version (2.4.52); and the Debian revision (-1ubuntu4.7) for distro-side packaging fixes. Understanding that order is what lets you reason about whether apt upgrade will move a package and in which direction.
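
To make the three parts concrete, here is a small sketch that splits a version string as just described: the epoch sits before the first colon and the revision after the last hyphen. The function split_version is illustrative only. For real ordering decisions, dpkg --compare-versions applies the full comparison rules.

```shell
# Split a Debian version string into epoch, upstream version, and revision.
split_version() {
    v=$1
    case $v in
        *:*) epoch=${v%%:*}; rest=${v#*:} ;;   # text before the first colon
        *)   epoch=0;        rest=$v ;;        # a missing epoch counts as 0
    esac
    case $rest in
        *-*) upstream=${rest%-*}; revision=${rest##*-} ;;  # split at the last hyphen
        *)   upstream=$rest;      revision= ;;             # no revision present
    esac
    echo "epoch=$epoch upstream=$upstream revision=$revision"
}

split_version 1:2.4.52-1ubuntu4.7
# prints: epoch=1 upstream=2.4.52 revision=1ubuntu4.7

# Real comparison on a Debian-family host (exit status 0 means the relation holds):
#   dpkg --compare-versions 1:1.0 gt 2.0 && echo "epoch wins"
```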

Pinning freezes that decision. apt-mark hold nginx tells apt never to upgrade or remove nginx automatically, and an APT preferences file in /etc/apt/preferences.d/ can pin a package to a specific version or source by priority. Pinning is the right tool when a newer version breaks your workload — but every held package is a package that stops getting security patches, so a hold is a debt you must track and eventually pay down, not set and forget.
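
A preferences stanza has three fields, and the sketch below prints one. The helper name pin_stanza, the 1.24.* pattern, and the file name in the usage comment are assumptions chosen for illustration. Per apt_preferences(5), a priority above 1000 keeps the pinned version even when that means a downgrade.

```shell
# Print an APT preferences stanza pinning a package to a version pattern.
pin_stanza() {
    pkg=$1
    version=$2
    priority=$3
    printf 'Package: %s\nPin: version %s\nPin-Priority: %s\n' \
        "$pkg" "$version" "$priority"
}

# Show the stanza on stdout.
pin_stanza nginx '1.24.*' 1001

# To apply it (as root), write it into the preferences directory:
#   pin_stanza nginx '1.24.*' 1001 > /etc/apt/preferences.d/nginx
```

Generating the stanza from a function makes it easy to keep a comment or ticket reference alongside each pin in whatever tooling writes the file.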

# Freeze a package, list holds, then release it
apt-mark hold nginx
apt-mark showhold
apt-mark unhold nginx

# RHEL family equivalent
dnf versionlock add nginx

Distro Packages vs Language Managers vs Containers

Distro packages (apt/dnf) — system-wide software installed into /usr, one version shared by every user and service, patched by the distribution. Use them for anything the OS or a system service depends on: the web server, the database, the language runtime itself.

Language package managers (pip, npm, cargo) — per-project dependencies for code you are building, versioned independently of the OS. Use them inside a virtualenv or project directory; never pip install system-wide on top of distro-managed Python, where the two trackers collide and break each other.

Containers — a frozen image bundling an application with its own userland and dependency set, isolated from the host. Use them when you need a different version than the host ships, or reproducibility across machines, and accept that you now patch each image yourself instead of letting the host's apt do it.
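
One way to enforce the boundary between distro packages and language packages is a wrapper that refuses to run pip outside a virtualenv. This is a sketch under assumptions: it relies on the VIRTUAL_ENV variable that a virtualenv's activate script sets, and the name safe_pip_install is hypothetical.

```shell
# Run pip only when a virtualenv is active, so the system Python is never touched.
safe_pip_install() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "refusing: no virtualenv active (create one with: python3 -m venv .venv)" >&2
        return 1
    fi
    # Use the virtualenv's own pip rather than whatever is first on PATH.
    "$VIRTUAL_ENV/bin/pip" install "$@"
}

# Usage:
#   python3 -m venv .venv && . .venv/bin/activate
#   safe_pip_install requests
```

Calling the virtualenv's pip by full path avoids accidentally picking up a system-wide pip earlier on PATH.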

Common Mistakes
  • Installing system software by curl https://... | sudo bash — the files belong to no package, appear in no upgrade list, and never receive the security patches the package manager would have pushed. Months later you are running a known-vulnerable binary nobody can even find.
  • Adding a third-party repository with [trusted=yes] or an unverified key — you have disabled the only check that proves the packages are the ones the vendor built, turning every install into a blind trust of whoever controls that mirror.
  • Running pip install or npm install -g as root on top of the distro-managed runtime — the language tool overwrites files dpkg believes it owns, and the next apt upgrade either reverts your packages or fails outright with file conflicts.
  • Forcing past a dependency error with dpkg -i --force-depends instead of fixing the request — the package installs but its libraries are missing, so it dies at runtime with missing-symbol errors that are far harder to diagnose than the refusal would have been.
  • Holding a package with apt-mark hold to dodge a breaking change and then forgetting it — the held package silently stops getting security updates, and the hold only surfaces when an unrelated upgrade is mysteriously kept back.
  • Running apt upgrade and assuming everything moved: upgrade installs new dependencies but never removes a package, so any upgrade that would require removing something is held back until you run apt full-upgrade, leaving a half-patched system if you stop at the first command.

Best Practices
  • Install system software only from signed repositories through apt or dnf — if upstream offers only a curl | sh installer, prefer their official .deb/.rpm or repo so the file stays under package management.
  • Scope every third-party repository key with signed-by= into /etc/apt/keyrings/; never use apt-key or a global keyring, so one vendor's key can authenticate only that vendor's packages.
  • Isolate language dependencies in a virtualenv, a project-local node_modules, or a container — keep pip and npm off the system Python and Node the distro manages.
  • Run apt update immediately before any apt install so the index is fresh; a stale index installs an old version or fails to find a package the mirror already has.
  • Pin a package deliberately with apt-mark hold only when a version genuinely breaks your workload, record why, and review apt-mark showhold on a schedule so no hold quietly outlives its reason.
  • Verify a downloaded standalone .deb with dpkg-deb --info and check its source before dpkg -i; a package runs maintainer scripts as root, so treat it with the same caution as any root-level executable.

Comparable tools
  • Windows: winget and MSI for system installs, Chocolatey as a community repository layer
  • macOS: Homebrew, a third-party manager filling the gap of no built-in system package repository
  • Language PMs: pip, npm, and cargo, per-project equivalents that sit beside the system manager, not instead of it

Knowledge Check

Why is installing system software with curl https://... | sudo bash a problem on a server even when the script works?

  • The installed files belong to no package, so they appear in no upgrade list and never receive the security patches the package manager pushes
  • The script cannot create the system users or start the services it needs under systemd, so the install always finishes in an incomplete, half-running state
  • It only works on Debian and Ubuntu and silently fails partway through on every RHEL-family system because the script branches on a package manager that is not present there
  • It installs into /opt instead of /usr, which breaks the dependency resolver's lookups and leaves later packages unable to find the shared libraries it dropped there

What does repository GPG signing actually protect against?

  • A tampered mirror or network path serving a malicious package whose root-level maintainer scripts would otherwise run unchallenged
  • Packages being downloaded from the mirror over plain HTTP instead of an encrypted HTTPS connection, exposing their bytes to anyone watching the network path between you and the server
  • Two packages declaring conflicting version constraints on the same shared library at install time, leaving the resolver unable to satisfy both at once and aborting the transaction
  • The local package cache in /var/cache/apt/archives/ growing without bound over a server's lifetime until it eventually fills the root partition and stalls further installs

Why is scoping a repository key with signed-by= safer than the old apt-key add approach?

  • The key authenticates only the one repository it is bound to, so a compromised third-party key cannot vouch for packages from the main archive
  • It encrypts the downloaded packages so their contents cannot be inspected anywhere in transit between the mirror and the host, shielding the payload from anyone on the path
  • It stores the key in a dearmored binary format that is measurably faster for apt to verify the release index against on every metadata refresh
  • It automatically rotates the key for you whenever the repository publishes a replacement one, fetching the new key during apt update so you never touch the keyring file yourself

You pin a package with apt-mark hold to avoid a breaking change. What is the trade-off?

  • The held package stops receiving automatic upgrades, including security patches, until you explicitly unhold it
  • Every other package that depends on it is also frozen at its current version until you unhold, so a single hold quietly stalls upgrades across a whole branch of the dependency tree
  • The package is dropped from the local database and has to be reinstalled before it can upgrade
  • Dependency resolution is disabled across the whole system for as long as any hold is in place

Where should a Python application's third-party libraries live on a server running the distro-managed Python?

  • In a virtualenv or container, isolated from the system Python so pip and apt never fight over the same files
  • Installed system-wide with sudo pip install so every service on the host shares one copy of each library under the distro's own site-packages directory
  • Added as a third-party apt repository configured with [trusted=yes] for convenience, letting the package manager pull and patch each library the way it does any other .deb
  • Unpacked directly into /usr/lib/python3 alongside the distro modules so the resolver can track and patch them as if they were packaged
