Ramblings of an aging IT geek
linux

a fence round the process that kept eating the box

A leaky importer kept driving a host into swap; a systemd drop-in with cgroup v2 memory limits contained it without a code fix.


One service on a shared box has a slow memory leak. I know which one, I know roughly why, and I haven't fixed it yet because the fix lives in code I don't fully own and the change window is months away. In the meantime it creeps upward over a few days, drives the host into swap, and makes everything else on the machine miserable until I restart it.

I don't need to fix the leak today. I need to stop it taking hostages. That's a containment problem, and cgroup v2 with systemd is exactly the tool.

let systemd do the cgroup plumbing

You can drive cgroup v2 through the raw /sys/fs/cgroup filesystem, and I have, but for a long-lived service it's far nicer to let systemd own the cgroup. The unit already runs inside one; I just want to put limits on it.
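To check which cgroup a process has actually landed in, the kernel records it in /proc/<pid>/cgroup; on a cgroup v2 host the unified hierarchy is the line starting with 0::. A minimal sketch, where cgroup_v2_path is a hypothetical helper name of mine and not a systemd tool:

```shell
# Print the cgroup v2 path recorded in a /proc/<pid>/cgroup-style file.
# cgroup_v2_path is a hypothetical helper, not part of systemd.
cgroup_v2_path() (
    # The unified (v2) hierarchy is the entry with hierarchy ID 0 and an
    # empty controller list, i.e. the line that starts with "0::".
    sed -n 's/^0:://p' "$1"
)

# Example: inspect the current shell, if the file is readable.
if [ -r /proc/self/cgroup ]; then
    cgroup_v2_path /proc/self/cgroup
fi
```

Pointed at the service's main PID instead (systemctl show importer.service -p MainPID --value will give you that), I'd expect it to print something like /system.slice/importer.service, which is the directory under /sys/fs/cgroup where the limits end up.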

# /etc/systemd/system/importer.service.d/limits.conf
[Service]
MemoryHigh=2G
MemoryMax=2560M
MemorySwapMax=0

The two limits do different jobs and the difference is the whole point. MemoryHigh is a soft throttle: cross it and the kernel piles reclaim pressure on this cgroup and only this cgroup, slowing it down without killing anything. MemoryMax is the hard wall: if usage reaches it and reclaim can't bring it back down, the OOM killer fires inside the service's cgroup, taking out the offending process rather than scoring the whole machine and shooting Postgres.

Setting MemorySwapMax=0 was the bit I nearly forgot. Without it the leak just oozes into swap and the host grinds for hours before anything dies. Denying it swap means the pressure shows up promptly and the throttle actually bites. Swap is meant to be a buffer for genuinely idle pages, not a place for a leak to hide while it slowly strangles the disk, and on a box with other tenants I'd much rather the misbehaving service hit its wall cleanly than spread the pain around.
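For reference, systemd reads the K, M, G and T suffixes as powers of 1024, and the resulting byte values are what land in the cgroup's memory.high, memory.max and memory.swap.max files. A small sketch of that conversion, assuming plain integer values with an optional suffix; to_bytes is a hypothetical helper and it does not handle infinity or percentages:

```shell
# Convert a systemd-style size (e.g. 2G, 2560M, 0) to bytes.
# Assumption: integer with an optional K/M/G/T suffix, base 1024.
to_bytes() (
    num=${1%[KMGT]}          # strip a trailing suffix, if any
    case $1 in
        *K) echo $((num * 1024)) ;;
        *M) echo $((num * 1024 * 1024)) ;;
        *G) echo $((num * 1024 * 1024 * 1024)) ;;
        *T) echo $((num * 1024 * 1024 * 1024 * 1024)) ;;
        *)  echo "$num" ;;
    esac
)

high=$(to_bytes 2G)       # MemoryHigh
max=$(to_bytes 2560M)     # MemoryMax

echo "memory.high: $high"                                  # 2147483648
echo "memory.max:  $max"                                   # 2684354560
echo "throttle band: $(( (max - high) / 1024 / 1024 )) MiB"  # 512 MiB
```

That 512 MiB gap between the two limits is the band where the service is being throttled but is still alive.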

watching it bite

After systemctl daemon-reload and a restart, the accounting is right there:

systemctl show importer.service -p MemoryCurrent
cat /sys/fs/cgroup/system.slice/importer.service/memory.events

The interesting line in memory.events is the high counter. As the leak pushes the service against MemoryHigh, that number climbs and the service slows, but the rest of the box stays responsive. The leak now plateaus against its own ceiling instead of the host's. A few days in, the service occasionally gets OOM-killed and restarted inside its own slice, which is ugly but contained, and nothing else on the machine even notices.
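memory.events is a flat list of "key value" lines, so pulling out a single counter is easy to script. A sketch, with events_counter as a hypothetical helper name and the path taken from the command above:

```shell
# Print the value of one counter from a memory.events-style file.
# events_counter is a hypothetical helper; usage: events_counter FILE KEY
events_counter() (
    awk -v key="$2" '$1 == key { print $2 }' "$1"
)

# Only runs where the service's cgroup actually exists.
events=/sys/fs/cgroup/system.slice/importer.service/memory.events
if [ -r "$events" ]; then
    echo "throttled (high): $(events_counter "$events" high)"
    echo "OOM kills:        $(events_counter "$events" oom_kill)"
fi
```

A rising high with a flat oom_kill suggests the throttle is doing its job; oom_kill ticking up usually means the service has run into MemoryMax.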

None of this fixes the leak. It's a fence, not a cure, and I'll still do the proper fix when the window opens. But the difference between "one service is degraded" and "the whole host is down" is the difference between a ticket I read on Monday and a page that wakes me up. cgroup v2 made that a four-line drop-in. Worth knowing before you need it.