seccomp and AppArmor/SELinux
Capabilities limit what powers a process holds. The controls in this topic limit two other things: seccomp limits which syscalls a process can make, and the Linux Security Modules (AppArmor on Debian and Ubuntu, SELinux on RHEL and Fedora) limit which files and resources it can touch. They are the layers that contain a process that somehow holds root and capabilities anyway, by blocking the calls and file accesses it would need to do damage.
Docker ships a default seccomp profile that blocks roughly 44 dangerous syscalls out of the 300-plus available, and on most hosts an LSM profile is already confining your containers. Both are on unless you opt out. The skill in this topic is mostly knowing they exist and not turning them off — because the standard wrong move is to disable the whole layer the first time an app trips over it.
seccomp Filters Syscalls
A seccomp profile is an allowlist or a denylist of system calls, the narrow interface through which every process asks the kernel to do anything. Docker's default profile is written as an allowlist, and its net effect is to block ~44 calls that a normal container never makes but an escape would: mount, reboot, kexec_load, ptrace on kernels older than 4.8, and similar low-level operations. It permits the hundreds of calls an ordinary application actually uses (read, write, open, socket), so the filter is invisible to normal workloads and a wall to abnormal ones.
The Default Profile Is On
Unless you opt out, every container runs under Docker's default seccomp profile. You rarely need to write your own. When you do — for a service you want locked down further, like driftwood/web — you supply a stricter profile with --security-opt seccomp=profile.json. The important thing is that the protection is already there by default; the failure mode is removing it, not forgetting to add it.
# lock driftwood/web down with a stricter profile
docker run -d --name web \
  --user app \
  --cap-drop=ALL \
  --security-opt seccomp=driftwood-web.json \
  -p 8000:8000 \
  driftwood/web

# the wrong move: never do this to get past one blocked syscall
# docker run --security-opt seccomp=unconfined ...
The custom profile driftwood-web.json starts as a copy of Docker's default and removes the few extra syscalls the app demonstrably never makes. The commented-out unconfined line is what you must not reach for: it switches the entire filter off to permit a single call.
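One way to produce such a profile is to subtract syscall names from a saved copy of the default with jq. The sketch below assumes you have jq installed and have saved Docker's default profile locally as default.json (it is published in the moby repository as profiles/seccomp/default.json). The function name tighten_profile and the syscall names in the usage comment are placeholders for illustration, not a recommendation for any real service.

```shell
# Sketch: derive a stricter seccomp profile by removing syscall names
# from a copy of Docker's default profile. Requires jq.
#
# tighten_profile SRC DEST DROP_JSON
#   SRC        path to a copy of Docker's default profile
#   DEST       path to write the stricter profile to
#   DROP_JSON  JSON array of syscall names to remove from every rule
tighten_profile() {
  local src=$1 dest=$2 drop=$3
  # Remove the names from each rule, then discard rules left with no names.
  jq --argjson drop "$drop" \
     '.syscalls |= (map(.names |= (. - $drop)) | map(select(.names | length > 0)))' \
     "$src" > "$dest"
}

# Usage (hypothetical syscall list):
# tighten_profile default.json driftwood-web.json '["chroot","mknod","mknodat"]'
```

Because the default profile denies anything it does not list, removing a name from its rules is enough to block that call. Keeping the original default.json next to the derived file makes the difference easy to review later.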
AppArmor and SELinux Confine Resources
The LSMs enforce mandatory access control — rules the kernel applies regardless of file permissions or process identity. AppArmor is path-based and default on Debian and Ubuntu through the docker-default profile; SELinux is label-based and default on RHEL and Fedora. Both restrict which files, devices, and capabilities a container can reach even as root, so a confined process cannot read /etc/shadow on the host even if it breaks out of its mount namespace. Where seccomp gates the syscall, the LSM gates the resource the syscall would touch.
Applying Profiles
Both LSMs attach through --security-opt: apparmor=<profile> for AppArmor, label=<option> for SELinux. In practice the defaults are right, and the work is not writing profiles — it is knowing they are there and leaving them on. A custom AppArmor or SELinux profile is occasionally worth it for a tightly scoped service, but the common case is the default profile doing its job silently.
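Since the usual job is confirming the defaults are in force, a quick inspection helper can be more useful than a custom profile. The sketch below is an assumption-laden illustration: show_confinement is a hypothetical helper name, and web is the container name used earlier in this topic. It relies on the AppArmorProfile, ProcessLabel, and HostConfig.SecurityOpt fields that docker inspect reports for a container.

```shell
# Sketch: print the LSM confinement Docker recorded for a container.
#
# show_confinement CONTAINER
#   apparmor=      AppArmor profile name (empty on hosts without AppArmor)
#   selinux=       SELinux process label (empty on hosts without SELinux)
#   security-opt=  any --security-opt values passed explicitly at run time
show_confinement() {
  local container=$1
  docker inspect \
    --format 'apparmor={{.AppArmorProfile}} selinux={{.ProcessLabel}} security-opt={{.HostConfig.SecurityOpt}}' \
    "$container"
}

# Usage:
# show_confinement web
#
# Attaching a profile explicitly uses the same flag in both cases:
# docker run --security-opt apparmor=docker-default ...
# docker run --security-opt label=type:container_t ...
```

An empty security-opt list is the healthy result here: it means nobody overrode the defaults for this container.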
Don't Disable to "Fix" a Bug
The standard wrong move is reaching for --security-opt seccomp=unconfined — or apparmor=unconfined, or label=disable for SELinux — the moment an app hits a blocked syscall or a denied file access. That removes the entire layer to fix one call. It works, the error goes away, and the container now runs with no syscall filter or no mandatory access control at all, which nobody remembers six months later.
The right move is surgical. For a blocked syscall, copy Docker's default seccomp profile and add only the one call the app legitimately needs. For a denied file path, adjust the AppArmor or SELinux policy to allow that path. You keep the layer and open exactly the one hole the workload requires, rather than tearing the wall down because one brick was in the way.
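For the seccomp half of that fix, the edit can be as small as appending one allow rule to a copy of the default profile. This is a minimal sketch, assuming jq is installed and Docker's default profile is saved locally as default.json; allow_syscall is a hypothetical helper name, and keyctl in the usage comment is only an example of a call the default profile blocks.

```shell
# Sketch: copy a seccomp profile and allow exactly one additional syscall.
# Requires jq.
#
# allow_syscall SRC DEST NAME
#   SRC   path to a copy of Docker's default profile
#   DEST  path to write the adjusted profile to
#   NAME  the single syscall the workload legitimately needs
allow_syscall() {
  local src=$1 dest=$2 name=$3
  # Append one rule; every existing rule and the default action stay untouched.
  jq --arg name "$name" \
     '.syscalls += [{"names": [$name], "action": "SCMP_ACT_ALLOW"}]' \
     "$src" > "$dest"
}

# Usage (example syscall only):
# allow_syscall default.json app-profile.json keyctl
# docker run --security-opt seccomp=app-profile.json ...
```

The resulting file differs from the default by a single rule, which keeps the exception visible in code review instead of hidden behind an unconfined flag.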
- Running --security-opt seccomp=unconfined to get past a blocked syscall: it disables the entire ~44-syscall filter to permit one call; write a profile that allows only that syscall instead.
- Disabling AppArmor or setting --security-opt label=disable for SELinux to silence a permission error: that removes mandatory access control wholesale and re-opens the file-access paths it was confining.
- Assuming --privileged still leaves seccomp on: it disables the seccomp profile entirely, which is one more reason that flag is a footgun (topic 61).
- Writing a custom seccomp profile from scratch and accidentally blocking syscalls the runtime needs, so the container won't even start: begin from Docker's default and subtract, never build from empty.
- Leave the default seccomp and LSM profiles on; they cost nothing and block the syscalls and file accesses a normal workload never makes.
- When an app needs a blocked syscall, copy the default seccomp profile and add only that one syscall, rather than going unconfined.
- Keep AppArmor (Debian/Ubuntu) or SELinux (RHEL/Fedora) enabled on the host so containers inherit mandatory access control by default.
- Verify the host's LSM is actually active with aa-status or getenforce on production hosts, since a silently disabled module removes a layer you assumed was there.
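The host check in the last item can be scripted so it runs on every production host rather than from memory. The sketch below is one possible shape, not a definitive audit: lsm_status is a hypothetical function name, and it assumes aa-status supports the --enabled flag (which exits non-zero when AppArmor is off) and that getenforce prints Enforcing, Permissive, or Disabled.

```shell
# Sketch: report which LSM, if any, is active on this host.
lsm_status() {
  if command -v aa-status >/dev/null 2>&1 && aa-status --enabled 2>/dev/null; then
    echo "apparmor: enabled"
  elif command -v getenforce >/dev/null 2>&1; then
    # Anything other than "Enforcing" means SELinux is not blocking access.
    echo "selinux: $(getenforce)"
  else
    echo "no active LSM detected"
  fi
}

# Usage:
# lsm_status
```

A result of "selinux: Permissive", "selinux: Disabled", or "no active LSM detected" is the silently missing layer the checklist warns about.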
Knowledge Check
What is the division of labor between seccomp and an LSM like AppArmor or SELinux?
- seccomp filters which syscalls the process can make; the LSM confines which files and resources it can touch
- seccomp limits which Linux capabilities a process holds while the LSM caps its CPU shares and memory usage
- seccomp scans the image layers for known vulnerabilities while the LSM scans the running process for malware
- They are simply two names for the same underlying syscall filter, one shipped on Debian and one on RHEL
What is true of Docker's default seccomp profile?
- It is on by default and blocks ~44 dangerous syscalls while permitting the hundreds a normal app uses
- It is off by default and must be enabled explicitly per container with a flag before it filters anything
- It blocks nearly every syscall by default, which is why most ordinary containers need a hand-written custom profile to run
- It only applies to containers explicitly started with the --privileged flag, and to nothing else
An app hits a syscall blocked by the default seccomp profile. What is the right response?
- Copy the default profile and add back only the one extra syscall the app legitimately needs
- Run with seccomp=unconfined to drop the whole filter and let every syscall through unchecked
- Add --privileged to the run command so the syscall restriction no longer applies to the container
- Write a brand-new profile from an empty allowlist that permits only that single syscall
How does an LSM protect the host even if a container process has root?
- It enforces mandatory access control regardless of identity, so a confined process can't read host files like /etc/shadow
- It strips the process of root entirely at exec time, turning every container process into an unprivileged ordinary non-root user
- It transparently encrypts every file on the host at rest so a container process can only ever read back ciphertext
- It boots a separate private kernel for the container so host files sit on a different kernel and stay unreachable