Ramblings of an aging IT geek
linux

putting a runaway process back in its box with cgroups v2

A nightly job that occasionally pegged every core got fenced in with a systemd slice and a cgroups v2 CPU limit instead of being rewritten.

A reporting job on one of my boxes had a habit of pinning all eight cores for twenty minutes and making the rest of the machine miserable. The job itself was fine; it just had no manners. Rewriting it was on nobody's list, so I reached for the cheap fix: tell the kernel it isn't allowed to be greedy.

On a modern systemd box that's cgroups v2, and you barely have to touch it directly. Drop the job into its own slice and set a quota:

# /etc/systemd/system/reports.slice
[Slice]
CPUQuota=200%
MemoryMax=2G

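CPUQuota=200% means two cores' worth of CPU time in total, no matter how many threads the job spawns. MemoryMax=2G is a hard ceiling on memory: if the job grows past it, the OOM killer is invoked inside the slice instead of the rest of the machine paying for it. Run the job inside that slice and the runaway is now a bounded nuisance. You can watch it behave with systemd-cgtop, which shows live CPU and memory per control group and is the first thing I check when a box feels sluggish.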
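
For what it's worth, "inside that slice" comes down to one line in the job's own unit. A minimal sketch, assuming the job runs as a oneshot service; the unit name reports.service and the script path are made up for illustration, and only the Slice= setting really matters:

```ini
# /etc/systemd/system/reports.service
# (hypothetical unit name and script path, substitute your own)
[Unit]
Description=Nightly reporting job

[Service]
Type=oneshot
# Place this service's processes under reports.slice so the slice's
# CPUQuota= and MemoryMax= apply to everything the job spawns.
Slice=reports.slice
ExecStart=/usr/local/bin/nightly-report
```

After a systemctl daemon-reload, starting the unit should put every process the job forks under the slice's cgroup. For a one-off run without a unit file, systemd-run --slice=reports.slice followed by the command ought to do the same thing.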

The thing I like is that nothing in the job changed. No rewrite, no nice levels to fiddle with, no cron tricks. The kernel enforces the ceiling and the rest of the machine carries on as if the job were a polite citizen. It still takes twenty-odd minutes, but it does it in the corner, quietly, which is all I ever wanted from it.