/proc and Introspection
Topic 34

/proc and Process Introspection


/proc is a virtual filesystem the kernel synthesizes on demand — nothing on disk backs it. Read a file under it and the kernel runs code to produce the bytes at that instant: /proc/loadavg computes the load averages, /proc/meminfo reads the memory accounting, and /proc/<pid>/status dumps the live state of a process. Every per-process tool you already use — ps, top, htop, pgrep — is a parser over /proc, not a separate source of truth.
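The read-time synthesis described above can be observed directly. The following is a minimal sketch, assuming a Linux host and a stat that accepts -c (GNU coreutils or busybox): the file reports no size, yet reading it returns data.

```shell
# stat sees an empty file because nothing is stored anywhere;
# the bytes exist only while the kernel is answering a read.
stat_size=$(stat -c %s /proc/loadavg)
read_size=$(wc -c < /proc/loadavg)
echo "stat reports ${stat_size} bytes, a read returned ${read_size} bytes"
```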

The operational consequence is that you never need a special agent to inspect a running process. If ps shows something you do not believe, you can read the raw kernel numbers yourself with cat and grep, and they will agree because they come from the same place. When a process is wedged, hung in uninterruptible sleep, or leaking file descriptors, /proc/<pid>/ is where the answer lives — and it is readable with the same tools you use on any text file.
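As a rough illustration of reading the raw numbers yourself, the sketch below pulls single fields out of /proc/<pid>/status. The helper name proc_status_field is made up for this example and is not a standard tool; it inspects the current shell ($$) so it runs without privileges.

```shell
# proc_status_field PID FIELD: print the value of one field from /proc/PID/status.
# (Illustrative helper, not part of any package.)
proc_status_field() (
    pid=$1 field=$2
    # Lines look like "VmRSS:<tab>  1234 kB"; strip the "Field:" prefix
    # and the whitespace after it, print only the matching line.
    sed -n "s/^${field}:[[:space:]]*//p" "/proc/${pid}/status"
)

proc_status_field $$ State     # process state, as a letter plus description
proc_status_field $$ VmRSS     # resident memory, in kB
proc_status_field $$ Threads   # thread count
```

Comparing the VmRSS line with ps -o rss= -p <pid> should show the same counter, allowing for the moment that passes between the two reads.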

The Per-Process Directory

Every running process gets a directory at /proc/<pid>/, created when it starts and removed once it has exited and been reaped by its parent (a zombie still has an entry). Inside are the files that describe everything the kernel knows about it. status gives a human-readable summary: state, UIDs, memory (VmRSS, VmSize), thread count, and the Cpus_allowed affinity mask. cmdline holds the argument vector with NUL separators, which is why cat /proc/<pid>/cmdline looks run-together and needs tr '\0' ' ' to read. environ exposes the environment the process was launched with, which is where secrets leak if you pass them as variables.

# What is PID 1432 actually running, and where from?
# (sudo does not cover a shell redirection, so let sudo run cat itself)
sudo cat /proc/1432/cmdline | tr '\0' ' '; echo
sudo ls -l /proc/1432/cwd /proc/1432/exe
# cwd and exe are symlinks to the working dir and the on-disk binary
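To make the environ exposure concrete, here is a small sketch that starts a child with a variable set only in its launch environment and reads it back from that child's own /proc entry. DEMO_TOKEN and its value are invented for the demonstration.

```shell
# The inner sh reads /proc/<its own pid>/environ; entries are NUL-separated,
# so convert the separators to newlines before filtering.
leaked=$(env DEMO_TOKEN=not-a-real-secret \
    sh -c 'tr "\0" "\n" < /proc/$$/environ' | grep '^DEMO_TOKEN=')
echo "$leaked"
```

Anyone who can read that file (the process owner, or root) sees the same line for as long as the process lives.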

Three symlinks are worth memorizing. cwd points at the process's current working directory, exe at the executable it was launched from, and root at its root directory (different inside a container or a chroot). If a deleted binary is still running, readlink /proc/<pid>/exe shows the path with a trailing (deleted) — the single fastest way to spot a service running stale code after an upgrade.
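The stale-binary check can be wrapped in a few lines. This is a sketch under the assumption that the (deleted) suffix is the only signal you need; exe_is_stale is a hypothetical helper name, and reading another user's exe link requires root.

```shell
# exe_is_stale PID: exit 0 if the binary backing PID was deleted or replaced,
# exit 1 if it still matches the file on disk, exit 2 if the link is unreadable.
exe_is_stale() (
    target=$(readlink "/proc/$1/exe") || exit 2
    case $target in
        *" (deleted)") exit 0 ;;
        *)             exit 1 ;;
    esac
)

if exe_is_stale $$; then
    echo "PID $$ is running a deleted binary"
else
    echo "PID $$ matches the binary on disk (or could not be read)"
fi
```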

File Descriptors and Open Files

/proc/<pid>/fd/ is a directory of symlinks, one per open file descriptor, each pointing at what that descriptor actually references: a regular file, a socket (socket:[12345]), a pipe, or /dev/null. This is the authoritative answer to "what does this process have open?" and it is how you diagnose descriptor leaks: count the entries over time and watch the number climb toward the process's RLIMIT_NOFILE, visible in /proc/<pid>/limits.

# How many fds is the process holding, and against what limit?
sudo ls /proc/1432/fd | wc -l
grep 'Max open files' /proc/1432/limits
# lsof walks this same directory (as does ss -p, to map sockets to processes)
sudo lsof -p 1432
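For watching a suspected leak over time, the two numbers can be printed side by side. The sketch below assumes the usual column layout of /proc/<pid>/limits (soft limit in the fourth whitespace-separated column of the "Max open files" row); fd_usage is an illustrative name, not an existing command.

```shell
# fd_usage PID: print "<open descriptors> <soft limit>" for a process.
# Needs root for processes owned by other users.
fd_usage() (
    pid=$1
    open=$(ls "/proc/${pid}/fd" | wc -l)
    # Row format: "Max open files  <soft>  <hard>  files"
    soft=$(awk '/^Max open files/ {print $4}' "/proc/${pid}/limits")
    echo "${open} ${soft}"
)

fd_usage $$
```

Calling it in a loop with a sleep between iterations gives a crude trend line; a count that only ever grows is the leak signature.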

A deleted file that a process still holds open keeps consuming disk space until that descriptor closes — the inode is unlinked but not freed. The classic symptom is df reporting a full disk while du finds nothing. The fix is in /proc: find the descriptor under fd/ whose target ends in (deleted), and either restart the holder or truncate through the descriptor with : > /proc/<pid>/fd/<n>.
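A sketch of the search step: scan one process's fd/ directory and print only the descriptors whose target has been unlinked. The helper name deleted_fds is hypothetical, and the demonstration uses the current shell so it needs no privileges.

```shell
# deleted_fds PID: list descriptors of PID that still reference an unlinked file.
deleted_fds() (
    for fd in "/proc/$1/fd"/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue   # fd may close mid-scan
        case $target in
            *" (deleted)") echo "${fd##*/} -> ${target}" ;;
        esac
    done
)

# Demonstration: open a scratch file on descriptor 9, unlink it,
# and confirm the descriptor is still listed.
scratch=$(mktemp)
exec 9>"$scratch"
rm -f "$scratch"
deleted_fds $$
exec 9>&-    # closing the descriptor is what finally releases the space
```

To sweep a whole machine you would run the same function as root over every numeric directory in /proc.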

System-Wide Kernel State

Above the per-process directories, /proc exposes the whole machine. /proc/cpuinfo and /proc/meminfo are what monitoring agents typically scrape; /proc/loadavg carries the 1-, 5-, and 15-minute load plus the running/total task counts; /proc/mounts is the live mount table; /proc/net/ backs the legacy netstat and serves as the fallback for ss, which prefers the kernel's netlink socket-diagnostics interface. These are the same numbers the kernel reports to free, uptime, and vmstat, so reading them directly removes any doubt about a tool's formatting or rounding.

Path                  What it reports                 Tool that parses it
/proc/loadavg         Load averages, task counts      uptime, top
/proc/meminfo         Memory and swap accounting      free
/proc/<pid>/status    Per-process state, RSS, UIDs    ps, top
/proc/<pid>/fd/       Open file descriptors           lsof
/proc/net/tcp         TCP sockets and states          ss, netstat
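Two of these files are simple enough to parse with shell built-ins and awk. The sketch below assumes the standard layouts: /proc/loadavg as five space-separated fields on one line, and /proc/meminfo as "Key: value kB" rows.

```shell
# /proc/loadavg fields: 1-, 5-, 15-minute load, running/total tasks,
# and the most recently assigned PID.
read -r load1 load5 load15 tasks last_pid < /proc/loadavg
echo "load: ${load1} ${load5} ${load15}  running/total: ${tasks}"

# /proc/meminfo values are in kB; convert two of them to MiB.
awk '/^(MemTotal|MemAvailable):/ {printf "%s %d MiB\n", $1, $2 / 1024}' /proc/meminfo
```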

Tunable State: /proc/sys and sysctl

One branch of /proc is writable. /proc/sys/ exposes hundreds of kernel tunables as files, and writing to one as root changes kernel behavior immediately: echo 1 > /proc/sys/net/ipv4/ip_forward enables IPv4 forwarding on the spot. The catch is that writes here are not persistent: a reboot reverts everything to the boot-time defaults. The supported way to read and set these is sysctl, a thin front-end over the same tree in which dots replace slashes, so net.ipv4.ip_forward names /proc/sys/net/ipv4/ip_forward.

# Ephemeral (lost on reboot) vs persistent
sudo sysctl -w net.ipv4.ip_forward=1
# Persist across reboots on Debian/Ubuntu:
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-forward.conf
sudo sysctl --system   # reload all *.conf, including /etc/sysctl.d/

On Debian and Ubuntu, drop tunables into /etc/sysctl.d/*.conf rather than editing /etc/sysctl.conf directly, so package upgrades do not clobber your changes; sysctl --system applies the whole directory in order. Red Hat systems use the same sysctl command and the same /etc/sysctl.d/ convention — this part of /proc does not diverge between distributions.
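Because sysctl keys and /proc/sys paths are the same tree, converting between them is a one-line substitution. The helper below is an illustrative sketch, not part of sysctl, and it assumes no key component contains a literal dot (an interface named eth0.100 would defeat this simple mapping).

```shell
# sysctl_to_path KEY: map a dotted sysctl key to its file under /proc/sys.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

path=$(sysctl_to_path net.ipv4.ip_forward)
echo "$path"    # /proc/sys/net/ipv4/ip_forward
```

Reading the resulting file with cat should show the same value that sysctl -n prints for the key.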

/proc vs /sys vs /dev

/proc — process and kernel state. Originally per-process introspection (the /proc/<pid>/ directories), it also carries global counters and the writable /proc/sys tunables. Reach for it for anything about a running process or for sysctl-style settings.

/sys — the sysfs view of the device and driver model: block devices, network interfaces, cgroups, kernel modules. Newer and more structured than /proc; hardware and driver tunables (a NIC's queue length, a disk's scheduler) live here, not under /proc.

/dev — device nodes you do I/O through (/dev/sda, /dev/null), managed by udev. It is for reading and writing devices, not for reading their metadata — that is what /sys is for.

Common Mistakes
  • Writing tunables to /proc/sys/... with echo and expecting them to survive a reboot — they do not. Persist them in /etc/sysctl.d/*.conf and apply with sysctl --system, or the change silently reverts on the next restart.
  • Passing secrets as environment variables — any process that can read /proc/<pid>/environ (the owner, and root) can recover them long after launch. Use a secrets file or systemd LoadCredential= instead.
  • cat /proc/<pid>/cmdline and concluding the process has "no arguments" because the output ran together — the separators are NUL bytes, not spaces. Pipe through tr '\0' ' ' to read it.
  • Chasing a "full disk" with du when df and du disagree — the space is held by a deleted-but-open file. Find it under /proc/<pid>/fd/ (target ends in (deleted)); the inode frees only when the descriptor closes.
  • Reading PID-specific files without sudo and trusting an empty or missing result: environ, fd/, cwd, and exe for another user's process are restricted, and hidepid= mount options can hide cmdline and the rest of the directory entirely.
  • Scripting against a PID across a delay without confirming identity — PIDs are reused, so /proc/<pid>/ may now describe a different process. Re-check cmdline or the start time before acting on a stale PID.
  • Treating /proc file sizes as real: the content is synthesized on read, so most files report size 0 to stat and the data appears only when you read them. Tools that stat before reading will think the files are empty.
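For the PID-reuse mistake above, one way to confirm identity is to record the process start time and compare it later. This is a sketch, assuming the documented layout of /proc/<pid>/stat in which field 22 is the start time in clock ticks since boot; proc_starttime is a hypothetical helper name.

```shell
# proc_starttime PID: print field 22 (starttime) of /proc/PID/stat.
proc_starttime() (
    stat=$(cat "/proc/$1/stat") || exit 1
    # Field 2 is "(comm)" and may itself contain spaces or ")",
    # so drop everything through the last ") " before splitting.
    rest=${stat##*) }
    set -- $rest          # $1 is now field 3 (state), so field 22 is ${20}
    echo "${20}"
)

pid=$$
before=$(proc_starttime "$pid")
# ... time passes; the original process may have exited and the PID been reused ...
after=$(proc_starttime "$pid") || after=gone
if [ "$before" = "$after" ]; then
    echo "PID ${pid} is still the same process"
else
    echo "PID ${pid} is gone or was reused; do not act on it"
fi
```

A PID paired with its start time identifies a process uniquely for the current boot, which a bare PID does not.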

Best Practices
  • Confirm what a service is really running with readlink /proc/<pid>/exe — a trailing (deleted) means it is executing stale code and needs a restart after the upgrade.
  • Diagnose descriptor leaks by watching ls /proc/<pid>/fd | wc -l against grep 'Max open files' /proc/<pid>/limits before the process hits its RLIMIT_NOFILE ceiling.
  • Set kernel tunables with sysctl -w for a live test, then persist the ones you keep in /etc/sysctl.d/*.conf and run sysctl --system so upgrades cannot clobber them.
  • Read /proc/<pid>/status for the fields ps truncates — State, VmRSS, Threads, and Cpus_allowed_list — when you need exact per-process numbers, not a formatted column.
  • Reach for /sys, not /proc, for hardware and driver tunables — disk schedulers, NIC settings, and cgroup limits live in sysfs.
  • Tighten exposure on multi-tenant hosts by mounting /proc with hidepid=2 (via a systemd drop-in or fstab) so users cannot enumerate other users' processes and command lines.
  • Prefer the maintained tools — lsof, ss, htop, sysctl — for routine work, and drop to raw /proc only when you need to verify them or capture exact bytes for a bug report.

Comparable tools
  • Windows: no /proc; per-process state comes from the Win32 API and ETW, surfaced by Task Manager, Process Explorer, and Get-Process in PowerShell.
  • macOS: no procfs; introspection goes through sysctl and libproc (powering ps, top, and lsof) rather than a readable /proc tree.
  • FreeBSD: procfs exists but is deprecated and unmounted by default; the kernel exposes process state through sysctl and the kvm interface instead.

Knowledge Check

You write echo 1 > /proc/sys/net/ipv4/ip_forward and routing starts working, but after a reboot it is off again. Why?

  • Writes to /proc/sys change live kernel state only; persistence requires an entry in /etc/sysctl.d/*.conf applied at boot
  • The write failed silently because /proc is read-only and needs to be remounted read-write first
  • A reboot is required for any /proc/sys write to take effect, so the first change never actually applied
  • ip_forward is controlled by ufw, which resets it to zero at startup and overwrites any value written directly into /proc

df reports the root filesystem 100% full, but du -sh / accounts for far less. What does /proc let you find?

  • A deleted-but-still-open file — its descriptor under /proc/<pid>/fd/ targets a path ending in (deleted), and the space frees only on close
  • A corrupted inode table whose lost blocks only an online fsck reading the mount list in /proc/mounts can reclaim and return to the filesystem's free count
  • Cached memory pages that /proc/meminfo counts toward disk usage so that df sees them as full blocks while du cannot
  • A reserved-blocks setting under /proc/sys/vm that withholds the missing space from du's total while df still counts it as occupied

Why is passing a credential as an environment variable risky even after the process has been running for hours?

  • The launch environment stays readable at /proc/<pid>/environ for the process owner and root for the life of the process
  • Environment variables are written to a real /proc file on disk at launch and stay there until the next reboot clears the directory
  • Any unprivileged user on the system can read another user's environ file directly, with no elevated privileges required
  • The kernel logs every environment variable to /proc/kmsg when the process starts

When should you read /sys rather than /proc?

  • For device and driver model state — block-device schedulers, NIC settings, cgroup limits, loaded modules
  • For per-process file descriptors and command lines, which moved from /proc to /sys in recent kernels
  • For load averages and memory accounting figures, which /proc no longer exposes and which now live only under sysfs
  • For writable kernel tunables, since /proc/sys is now read-only and superseded by /sys
