capping a runaway with one cgroups v2 line


A batch job decided to load an entire dataset into memory at once, ate everything the box had, and dragged the OOM killer in to shoot something. It did, and from where I sat its choice of victim looked more or less random, which is exactly the unhelpful outcome you would expect. The job was the problem, but the real fault was mine for letting one process have the run of the machine.

The fix on a modern systemd box with cgroups v2 is almost embarrassingly simple. You run the job as a transient unit inside a slice, with a hard memory limit on that unit:

systemd-run --slice=batch.slice -p MemoryMax=2G ./importer

Now the importer gets two gigabytes of RAM and not a byte more (systemd reads the G suffix as base 1024, so that is 2 GiB; swap is capped separately, by MemorySwapMax). When it tries to exceed that, the kernel reclaims memory within the cgroup and, if it cannot free enough, OOM-kills inside the cgroup only. The rest of the machine never notices. One badly behaved process can no longer take down everything sharing the box with it.
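If you want to confirm the cap is in place and see whether it has bitten, the cgroup's own files will tell you. Here is a minimal sketch, assuming a POSIX shell and awk. The helper names size_to_bytes and oom_kills are mine, not part of systemd; memory.max and memory.events are the standard cgroup v2 interface files.

```shell
# Sketch only: both helper names are hypothetical, not systemd tools.

# Convert a systemd-style size (K/M/G/T suffix, base 1024) to bytes,
# i.e. the number you would expect to find in the cgroup's memory.max.
size_to_bytes() {
    num=${1%[KMGT]}   # strip one trailing suffix letter, if any
    case $1 in
        *K) echo $(($num * 1024)) ;;
        *M) echo $(($num * 1024 * 1024)) ;;
        *G) echo $(($num * 1024 * 1024 * 1024)) ;;
        *T) echo $(($num * 1024 * 1024 * 1024 * 1024)) ;;
        *)  echo "$1" ;;   # no suffix: already a byte count
    esac
}

# Print the oom_kill counter from a cgroup v2 memory.events file.
# Argument: path to the file.
oom_kills() {
    awk '$1 == "oom_kill" { print $2 }' "$1"
}
```

systemd-run prints the name of the transient unit it creates, and systemctl show -p ControlGroup on that unit gives its cgroup path under /sys/fs/cgroup. With that directory in hand, memory.max should match size_to_bytes 2G, and oom_kills on its memory.events shows how many times the kernel has had to kill something inside the cgroup.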

cgroups v2 is the default on current distros, and the unified hierarchy makes this far less fiddly than the v1 days of mounting controllers by hand. The honest lesson is that I should not have been relying on the job to behave. Resource limits are not a punishment for bad code; they are a seatbelt, and MemoryMax is one line.
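If you are not sure which hierarchy a box is running, the filesystem type of /sys/fs/cgroup gives it away. This is a small sketch, assuming GNU coreutils stat. is_cgroup2 is a name I made up for the check.

```shell
# Sketch only: is_cgroup2 is a hypothetical helper, assuming GNU stat.
# Succeeds if the given path (default /sys/fs/cgroup) is a cgroup v2 mount.
is_cgroup2() {
    [ "$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)" = "cgroup2fs" ]
}
```

Running is_cgroup2 && echo unified prints "unified" on a v2 box. On a v1 or hybrid setup /sys/fs/cgroup is typically a tmpfs, so the check fails and MemoryMax is not what you want to be reaching for.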