Ramblings of an aging IT geek
linux

cgroups v2 and a runaway process

How a single misbehaving import job stopped taking the whole box down once I gave it a cgroups v2 memory limit under systemd.

The symptom was boring and the cause was worse. Every few days a batch job would wake up, decide it needed all of the RAM, and the box would grind into swap until the OOM killer started taking hostages. Usually it took something I cared about with it. SSH would hang, monitoring would go quiet, and I would walk over and pull power like it was 2003.

The fix was not "make the job better". The job is a horror and I do not own it. The fix was to stop letting one process ruin everyone's afternoon.

v2 is the default now, mostly

Most current distributions have moved to the unified cgroups v2 hierarchy, and systemd drives it for you. That is the part worth internalising: you rarely poke /sys/fs/cgroup by hand anymore. You set properties on a unit and systemd does the writing.

You can confirm which hierarchy you are on:

stat -fc %T /sys/fs/cgroup/
# cgroup2fs means v2 unified
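
If you want that check inside a script rather than read off by eye, here is a minimal sketch. The helper name classify_cgroup_fs is made up for illustration, and the tmpfs branch assumes the usual legacy or hybrid layout, where /sys/fs/cgroup is a tmpfs holding per-controller mounts.

```shell
#!/bin/sh
# Map the filesystem type reported for /sys/fs/cgroup to a hierarchy name.
# classify_cgroup_fs is a hypothetical helper, not a standard tool.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "v2" ;;            # unified hierarchy
        tmpfs)     echo "v1-or-hybrid" ;;  # assumed legacy or hybrid layout
        *)         echo "unknown" ;;       # anything else: do not guess
    esac
}

# Fall back to "none" if stat fails (for example, no /sys/fs/cgroup at all).
fs_type=$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null || echo none)
echo "cgroup hierarchy: $(classify_cgroup_fs "$fs_type")"
```

Anything other than v2 means the MemoryHigh behaviour described here may not apply as written, so I would stop and look rather than carry on.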

a memory ceiling that actually holds

The job runs as a systemd service, so I just gave the unit a limit:

[Service]
MemoryHigh=2G
MemoryMax=2.5G

MemoryMax is the hard wall: if the cgroup hits it and the kernel cannot reclaim enough to get back under, the OOM killer runs inside that cgroup, not across the whole machine. MemoryHigh is the softer one, where the kernel starts reclaiming aggressively and throttling the process before it gets to the wall. Setting both means the job spends most of its bad days being gently slowed rather than violently killed, which is exactly the tradeoff I wanted.
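
When the unit file belongs to someone else, one way to apply those two settings without touching it is a drop-in. This is a sketch under assumptions, not a record of exactly what I typed: the helper write_memory_dropin and the file name memory.conf are invented for illustration, while the the-job.service.d directory under /etc/systemd/system is the standard drop-in location for a unit called the-job.service.

```shell
#!/bin/sh
# write_memory_dropin DIR HIGH MAX
# Writes a [Service] drop-in with the two memory limits into DIR.
# Hypothetical helper; the file name memory.conf is an arbitrary choice.
write_memory_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nMemoryHigh=%s\nMemoryMax=%s\n' "$2" "$3" \
        > "$dropin_dir/memory.conf"
}

# Real use, as root, would look something like:
#   write_memory_dropin /etc/systemd/system/the-job.service.d 2G 2.5G
#   systemctl daemon-reload
#   systemctl restart the-job.service
```

The point of the drop-in is that a package update can replace the original unit file and the limits still survive.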

After systemctl daemon-reload and a restart, I watched it with:

systemctl status the-job.service
cat /sys/fs/cgroup/system.slice/the-job.service/memory.current

It hit the high mark, paused, did its garbage collection, and carried on. When it occasionally still overcooks itself, only it dies. The box stays up. Monitoring stays up. I stay sat down.
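
To tell "slowed down" apart from "killed" after the fact, the cgroup keeps counters next to memory.current. A rough sketch, assuming the v2 files memory.high, memory.max and memory.events are present in the unit's cgroup directory; show_memory_state is a made-up helper name.

```shell
#!/bin/sh
# show_memory_state CGROUP_DIR
# Prints current usage, both limits, and the "high" and "oom_kill" counters.
# Hypothetical helper; it skips any file that is missing or unreadable.
show_memory_state() {
    cg=$1
    for f in memory.current memory.high memory.max; do
        if [ -r "$cg/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$cg/$f")"
        fi
    done
    # memory.events is a list of "name count" lines.
    if [ -r "$cg/memory.events" ]; then
        awk '$1 == "high" || $1 == "oom_kill" { print "events." $1 "=" $2 }' \
            "$cg/memory.events"
    fi
}

# Prints nothing if the unit's cgroup does not exist on this machine.
show_memory_state /sys/fs/cgroup/system.slice/the-job.service
```

A climbing events.high with events.oom_kill stuck at zero is the good outcome: the job is being throttled at MemoryHigh and never reaching the wall.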

the lesson

I spent ages trying to make the misbehaving thing behave. The better move was to draw a box around it and let it misbehave inside the box. cgroups v2 makes that almost trivial when systemd is already managing the unit, and "contain the blast radius" beats "fix the unfixable" nearly every time.