Scheduling Priority
Topic 33

Every runnable process competes for the same CPUs, and the kernel scheduler — EEVDF (Earliest Eligible Virtual Deadline First), which replaced the Completely Fair Scheduler in kernel 6.6 and is what current Debian and Ubuntu kernels run — decides which one runs next. Scheduling priority is the set of knobs that bias that decision: the nice value (a number from −20 to 19) tilts a process's share of CPU time, and ionice does the equivalent for disk I/O. A high nice value means "be polite, let others go first"; a low one means "I am more important than the default."

The thing to fix in your head first is what these knobs do not do. nice is not a cap or a quota — it changes proportions only when there is contention. A niced-to-19 batch job on an otherwise idle box still consumes a whole core, because nobody else wants it. The payoff comes under load: niceness is how you keep a nightly rsync or a video transcode from starving the interactive service that pays the bills, without suspending the batch job entirely.

The Scheduler and Time-Slicing

EEVDF does not hand out fixed time slices the way the older O(1) scheduler did. It tracks per-process virtual runtime — roughly, how much CPU each task has consumed, weighted by its priority — and runs the eligible task with the earliest virtual deadline next, keeping the runnable tasks in a red-black tree (the same structure CFS used, ordered by deadline rather than raw virtual runtime). The result is proportional fairness: over any window of contention, runnable tasks of equal priority each get an equal slice, and a task that has been sleeping gets to catch up when it wakes.

The nice value feeds directly into the weight the scheduler assigns each task. Each step of nice changes a process's CPU share by roughly 1.25×: a task at nice 0 gets about 1.25 times the CPU of a task at nice 1 under contention, and the spread from −20 to 19 covers a factor of several thousand. That is why nice is a relative dial, not an absolute limit — it expresses only how this process should be weighed against the others currently fighting for the same core.
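
To make the 1.25× rule concrete, here is a small sketch that approximates the weight for a nice value and the resulting CPU split when two tasks compete for one core. The function names nice_weight and share_pct are made up for illustration, and the formula is an approximation: the kernel uses a precomputed weight table whose rounded values differ slightly (for example 820 rather than 819 at nice 1).

```shell
# Approximate scheduler weight for a nice value: 1024 at nice 0,
# divided by 1.25 for each step up (an approximation of the kernel's table).
nice_weight() {
    awk -v n="$1" 'BEGIN { printf "%.0f\n", 1024 / (1.25 ^ n) }'
}

# CPU share, in percent, of task A when tasks A and B contend for one core.
# Arguments: nice value of A, nice value of B.
share_pct() {
    wa=$(nice_weight "$1")
    wb=$(nice_weight "$2")
    awk -v a="$wa" -v b="$wb" 'BEGIN { printf "%.1f\n", 100 * a / (a + b) }'
}

share_pct 0 1     # nice 0 against nice 1: prints 55.6
share_pct 0 19    # nice 0 against nice 19: prints 98.6
```

The second call shows why nice 19 suits batch work: under contention the default-priority task keeps nearly the whole core, yet the batch job is never starved outright.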

nice and renice

Launch a command with a chosen niceness using nice; change a running process with renice. The default niceness of a normal process is 0. The asymmetry that trips people up: any user can make their own process nicer (raise the number, give up CPU), but lowering the number — claiming more CPU than the default — requires the CAP_SYS_NICE capability, which in practice means root.

# start a backup at the lowest CPU priority (nice 19)
$ nice -n 19 tar -czf /backup/data.tgz /srv/data

# renice a running process by PID — raising niceness needs no privilege
$ renice -n 10 -p 4821

# lowering niceness (more CPU) requires root
$ sudo renice -n -5 -p 4821

# confirm the value in the NI column
$ ps -o pid,ni,comm -p 4821

Running nice with a command but no -n applies an increment of 10, not 0, which is a common surprise (nice with no arguments at all just prints the current niceness). Use the explicit -n form everywhere so the value is unambiguous. Debian and Red Hat both ship nice from GNU coreutils and renice from util-linux, so this behaves identically across distributions; there is no apt-versus-dnf divergence here.
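
The default increment is easy to observe directly, because nice with no arguments prints the niceness it is running at. The sketch below assumes GNU coreutils nice and a shell that is not already near the top of the range (values are clamped at 19); the variable names are illustrative.

```shell
# With no arguments, nice prints the niceness of the current shell.
base=$(nice)

# nice given a command but no -n adds 10 to the current niceness.
implicit=$(nice nice)

# The explicit form states the increment outright.
explicit=$(nice -n 5 nice)

echo "current=$base implicit=$implicit explicit=$explicit"
```

From a shell at the default niceness of 0, the inner nice reports 10 in the implicit case and 5 in the explicit one.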

Real-Time versus Normal Priorities

Above the entire nice range sits a separate world: the real-time scheduling policies SCHED_FIFO and SCHED_RR, set with chrt. A real-time task at any RT priority (1–99) preempts every normal (SCHED_OTHER) task, regardless of niceness. An RT process that spins in a tight loop can lock out everything else, including the shell you would use to kill it. This is not a politeness dial; it is a hard preemption guarantee meant for audio pipelines, industrial control, and latency-critical packet processing, not for "make my web server faster."

# inspect a process's scheduling policy and priority
$ chrt -p 1
# pid 1's current scheduling policy: SCHED_OTHER
# pid 1's current scheduling priority: 0

# run a daemon under round-robin RT priority 50 (root + caution)
$ sudo chrt --rr 50 ./latency_sensitive_daemon

The kernel guards against a runaway RT task with sched_rt_runtime_us (default 950000 out of every 1000000 µs), reserving 5% of CPU time for non-RT work so the machine stays recoverable. Even so, the rule holds: reach for RT scheduling only when a measured latency requirement demands it, and never as a substitute for fixing the real bottleneck. For the vast majority of server workloads the nice range is the right tool, and cgroups (below) is the better one.
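
The 5% figure is simple arithmetic on the runtime and period values, sketched below. The function name rt_reserve_pct is hypothetical; the two /proc paths are the standard sysctl locations but may be absent or unreadable inside some containers, so the live read is guarded.

```shell
# Percentage of each period left over for non-RT tasks.
# Arguments: runtime in microseconds, period in microseconds.
# A runtime of -1 means RT throttling is disabled, so nothing is reserved.
rt_reserve_pct() {
    runtime=$1
    period=$2
    if [ "$runtime" -lt 0 ]; then
        echo 0
    else
        echo $(( (period - runtime) * 100 / period ))
    fi
}

rt_reserve_pct 950000 1000000    # kernel defaults: prints 5

# On a live system, read the current values if they are available.
if [ -r /proc/sys/kernel/sched_rt_runtime_us ] && [ -r /proc/sys/kernel/sched_rt_period_us ]; then
    rt_reserve_pct "$(cat /proc/sys/kernel/sched_rt_runtime_us)" \
                   "$(cat /proc/sys/kernel/sched_rt_period_us)"
fi
```

A result of 0 on a production box is worth investigating: it means someone has disabled the safety margin that keeps a runaway RT task recoverable.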

ionice and I/O Classes

CPU priority and disk priority are independent. A backup can be niced to 19 and still bring an interactive database to its knees, because the contention is on the disk queue, not the CPU. ionice sets the I/O scheduling class, which the kernel honors only under a block scheduler that implements priorities: BFQ, which Debian and Ubuntu kernels provide but do not necessarily select by default, or the older CFQ on pre-5.0 kernels. There are three classes: real-time (1), best-effort (2, the default, with priority levels 0–7), and idle (3), where idle means the process gets disk time only when no other process wants it.

# run a backup so it touches the disk only when nothing else needs it
$ ionice -c 3 tar -czf /backup/data.tgz /srv/data

# combine: low CPU and idle disk priority for heavy housekeeping
$ nice -n 19 ionice -c 3 ./nightly-reindex.sh

# change the I/O class of a running process
$ ionice -c 2 -n 7 -p 4821

The catch worth knowing: idle and best-effort priorities only take effect under a scheduler that implements them. On a modern Ubuntu server, check /sys/block/sda/queue/scheduler — if it reads none or mq-deadline rather than bfq, ionice classes are effectively ignored and your "idle" backup competes on equal terms. Switch that device to BFQ, or move the limit up to cgroups, before assuming ionice is protecting anything.
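
The scheduler file lists every available scheduler and marks the active one in square brackets, so the check can be scripted. This is a minimal sketch: the function names active_scheduler and check_ionice_support are invented here, and the verdict it prints simply follows the rule stated above (only bfq is treated as honoring I/O classes).

```shell
# Extract the active scheduler (the bracketed entry) from a line such as
# "none [mq-deadline] kyber bfq" read from /sys/block/<dev>/queue/scheduler.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Report whether ionice classes can be expected to matter on a device.
# Argument: device name without /dev/, for example sda.
check_ionice_support() {
    dev=$1
    file="/sys/block/$dev/queue/scheduler"
    [ -r "$file" ] || { echo "$dev: no readable scheduler file"; return 1; }
    sched=$(active_scheduler "$(cat "$file")")
    if [ "$sched" = "bfq" ]; then
        echo "$dev: bfq active, ionice classes honored"
    else
        echo "$dev: $sched active, ionice classes likely ignored"
    fi
}

active_scheduler "none [mq-deadline] kyber bfq"    # prints mq-deadline
```

On a real host, check_ionice_support sda gives the verdict for that disk. If bfq appears in the list but is not active, writing bfq to the same file as root (for example with echo bfq | sudo tee /sys/block/sda/queue/scheduler) selects it until the next reboot, assuming the BFQ module is available.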

cgroups as the Better Lever

For anything beyond a single ad-hoc command, control groups are the stronger tool. nice and ionice bias a process relative to its neighbors; cgroups v2 — the default hierarchy on Debian and Ubuntu under systemd — sets actual weights and hard ceilings on whole groups of processes. CPUWeight= on a systemd unit does what nice does but for an entire service tree, and CPUQuota=20% imposes a real cap that nice cannot express, throttling the service to one-fifth of a core even on an idle machine.

# cap a service to 20% of one CPU and lower its weight, live
$ sudo systemctl set-property nightly-batch.service CPUQuota=20% CPUWeight=20

# run a one-off command inside a transient scope with limits
$ sudo systemd-run --scope -p CPUQuota=50% -p IOWeight=10 ./import.sh
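
Under cgroups v2, a CPU quota ends up in the group's cpu.max file as a "quota period" pair in microseconds. The sketch below shows the arithmetic, assuming systemd's default accounting period of 100 ms; quota_to_cpu_max is a hypothetical helper, not a systemd tool.

```shell
# Translate a CPUQuota percentage into the "quota period" pair found in
# cpu.max. Arguments: percentage, optional period in microseconds
# (defaults to 100000, an assumed 100 ms period).
quota_to_cpu_max() {
    pct=$1
    period_us=${2:-100000}
    echo "$(( pct * period_us / 100 )) $period_us"
}

quota_to_cpu_max 20     # prints: 20000 100000
quota_to_cpu_max 250    # prints: 250000 100000 (two and a half cores)
```

After the set-property command above, the file /sys/fs/cgroup/system.slice/nightly-batch.service/cpu.max should show the first pair on a typical systemd host with the unified hierarchy; the exact path depends on which slice the unit lives in.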

The practical rule: reach for nice and ionice for a quick, single command at the shell, and for cgroups — via systemd unit properties or systemd-run — when you need durable, enforceable limits that survive across every process a service spawns. The resource-control topic covers the CPU, IO, and memory controllers in depth; this is the forward pointer to it.

Common Mistakes
  • Expecting nice to cap CPU usage. It only changes proportions under contention — a niced job on an idle box still takes a whole core. To impose an actual ceiling you need CPUQuota= on a cgroup, not niceness.
  • A non-root user running renice -n -5 and hitting "Operation not permitted." Lowering niceness needs CAP_SYS_NICE; unprivileged users can only raise the number, never claim more CPU than the default.
  • Nicing a disk-bound backup and watching the database stall anyway. The contention is on the I/O queue, not the CPU: nice does nothing for disk, so you also need ionice -c 3.
  • Setting ionice -c 3 on a device whose block scheduler is none or mq-deadline. Those schedulers ignore I/O classes, so the "idle" job competes on equal terms; only BFQ honors them.
  • Using chrt real-time priorities to "speed up" an ordinary service. An RT task preempts everything, and a tight loop at RT priority can wedge the whole machine to the point you cannot even get a shell to kill it.
  • Forgetting that bare nice ./cmd applies an increment of 10, not 0 — the job runs less aggressively than intended because no explicit -n was given.
  • Renicing only the parent of a multi-process service and assuming the workers picked it up. Children inherit niceness at fork time, but an already-running worker pool keeps its old value unless you renice the whole process group.
Best Practices
  • Launch batch jobs with nice -n 19 and keep interactive-facing work at the default 0 — slow down the jobs you are willing to wait on, never the ones users are waiting on.
  • Pair nice -n 19 ionice -c 3 on backups, reindexing, and disk scrubs so they yield on both CPU and disk, not just one.
  • Verify the active block scheduler with cat /sys/block/sda/queue/scheduler before relying on ionice, and switch the device to BFQ if you need I/O classes honored.
  • Use systemctl set-property with CPUQuota= and IOWeight= for any limit that must be durable and enforceable across a service's worker processes — cgroups, not per-process nice, is the lever at scale.
  • Reserve chrt real-time scheduling for workloads with a measured latency requirement, set the lowest RT priority that meets it, and leave sched_rt_runtime_us at its default so the box stays recoverable.
  • Confirm every change with ps -o pid,ni,cls,pri,comm afterward — read back the NI value and scheduling class rather than assuming the command took.
Comparable tools
  • Windows: process priority classes (Idle / Below Normal / Normal / High / Realtime), set in Task Manager or via SetPriorityClass
  • cgroups: the cpu and io controllers, the durable cap-and-weight mechanism that supersedes per-process nice at scale
  • macOS: nice and renice plus taskpolicy for the background QoS tier

Knowledge Check

You nice a CPU-heavy batch job to 19 on an otherwise idle 8-core server. What happens to its CPU usage?

  • It still uses a full core or more — nice only changes proportions under contention, and nothing else is competing
  • It is capped to roughly 5% of a single core because nice 19 sits near the bottom of the range and enforces that ceiling
  • It is suspended entirely until an interactive process needs the CPU, then resumed once that process yields
  • It is limited to one-eighth of total capacity — one core — by the fair scheduler

A non-root user runs renice -n -5 -p 4821 on their own process and gets "Operation not permitted." Why?

  • Lowering niceness claims more CPU and requires CAP_SYS_NICE; unprivileged users can only raise the value
  • A process can never be reniced once it is already running; its priority can only be fixed at launch time with nice
  • Niceness can only be changed on processes the user did not start themselves
  • The −5 value falls outside the legal range, which for unprivileged users starts at 0 and goes no lower

A nightly backup niced to 19 still makes the database unresponsive while it runs. What is the most likely fix?

  • Add ionice -c 3 — the contention is on the disk queue, which nice does not affect
  • Lower the niceness further to −20 so the backup finishes faster and gets out of the way
  • Switch the backup to a real-time policy with chrt so it no longer blocks the database
  • Renice the database to 19 as well so the two jobs are balanced against each other

Why is a cgroup CPUQuota=20% often a better lever than nice for a runaway service?

  • It imposes a hard ceiling that applies even on an idle machine, which nice cannot express
  • It runs the service at real-time priority so the kernel guarantees it exactly 20% of the CPU
  • It sets the service's nice value to 20, which is stricter than the −20..19 range allows
  • It only affects the service while the system is under contention, like nice but stronger

You set ionice -c 3 on a job but it still saturates the disk. What is the most likely cause?

  • The device's block scheduler is none or mq-deadline, which ignore I/O classes; only BFQ honors them
  • Idle class (3) is actually the highest available I/O priority, so the job was effectively promoted ahead of everything else, not demoted
  • ionice requires a matching nice -n 19 or the I/O class is silently reset to best-effort
  • ionice only works on network filesystems, never on local block devices
