Ramblings of an aging IT geek
linux

stop letting journald eat your root partition

How to cap systemd-journald's on-disk usage before it quietly fills a small root partition, and how to find the noisy unit that did it.

A small VM tipped over last week with the most boring failure mode there is: root partition full. Not a runaway database, not a log file someone forgot to rotate. Just /var/log/journal quietly growing to 3.8 GB on a box with a 20 GB disk, because nobody had ever told journald to stop.

The stock default, which most distros leave alone, is "use up to 10% of the filesystem, capped at 4 GB". That sounds restrained until you remember that 10% of a tiny disk is still enough to ruin your afternoon. And if the partition is shared with everything else, that 10% is competing with your actual workload.
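To see what that default works out to on a given box, here is a rough sketch of the arithmetic. The function name default_journal_cap is mine, and it only models the 10%-capped-at-4-GiB rule for SystemMaxUse; it ignores SystemKeepFree, so treat the result as an upper bound rather than a promise.

```shell
# Estimate journald's default persistent cap for a filesystem of $1 bytes:
# 10% of the filesystem size, but never more than 4 GiB.
# (Hypothetical helper; does not account for SystemKeepFree.)
default_journal_cap() {
    fs_bytes=$1
    limit=$((4 * 1024 * 1024 * 1024))
    cap=$((fs_bytes / 10))
    if [ "$cap" -gt "$limit" ]; then
        cap=$limit
    fi
    echo "$cap"
}
```

With GNU df you could feed it the real size of the filesystem holding the journal: default_journal_cap "$(df -B1 --output=size /var/log | tail -n 1)"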

First, find out where you stand:

journalctl --disk-usage

That prints the combined size of all archived and active journal files. To see who is filling it, group recent entries by unit and count them. A chatty service or a tight crash loop will jump straight out at you:

journalctl --since "1 hour ago" -o json |
  jq -r '._SYSTEMD_UNIT // "(no unit)"' | sort | uniq -c | sort -rn | head

In my case it was a container runtime spewing the same health-check warning several times a second. Worth fixing at the source, but the journal still needs a hard ceiling regardless.
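If jq is not installed on the box in question, the same count can be sketched from journalctl's export format, where each text field sits on its own KEY=value line. The helper name top_units is made up for illustration, and it assumes unit names never contain a newline, which I believe is safe.

```shell
# Count journal entries per unit from `journalctl -o export` output on stdin.
# $1 = how many units to show (defaults to 10).
# Entries with no _SYSTEMD_UNIT field (kernel messages, for one) are skipped.
top_units() {
    grep -a '^_SYSTEMD_UNIT=' |
        cut -d= -f2- |
        sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

Usage would look like: journalctl --since "1 hour ago" -o export | top_units 5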

The fix lives in /etc/systemd/journald.conf. The two knobs that matter:

[Journal]
SystemMaxUse=500M
SystemMaxFileSize=50M

SystemMaxUse is the total cap for persistent storage. SystemMaxFileSize caps each individual journal file, which governs how granular the rotation is, because vacuuming drops whole archived files at a time. Left unset it defaults to an eighth of SystemMaxUse. Anything between a tenth and an eighth of the total gives you sensible vacuuming rather than one enormous file that only gets dropped when it is finally full.
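If you would rather not edit the main file, systemd also reads drop-ins from /etc/systemd/journald.conf.d/. A minimal sketch of generating one follows. The function name write_journal_cap and the file name 90-size-cap.conf are my own choices, and the per-file size is simply set to an eighth of the total, rounded down.

```shell
# Write a journald drop-in that caps persistent storage.
# $1 = drop-in directory, $2 = total cap in MiB.
# The file name 90-size-cap.conf is a hypothetical choice, not a convention.
write_journal_cap() {
    dir=$1
    cap_mib=$2
    mkdir -p "$dir" || return 1
    cat > "$dir/90-size-cap.conf" <<EOF
[Journal]
SystemMaxUse=${cap_mib}M
SystemMaxFileSize=$((cap_mib / 8))M
EOF
}
```

Run as root, that would be: write_journal_cap /etc/systemd/journald.conf.d 500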

Apply it without a reboot:

systemctl restart systemd-journald

If you need space back right now, before the new limits take effect on the next rotation, you can vacuum by size or age:

journalctl --vacuum-size=500M
journalctl --vacuum-time=2weeks

Both are safe to run live. They only touch archived files, never the active one, so you can reclaim space on a box that is mid-incident without restarting anything.
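Because the active file is left alone, a vacuum can free less than you expect when most of the bulk is still in the file being written. A rough sketch of a guard you might put in a cron job or an incident script: journal_over_cap is a hypothetical helper that measures the directory with du rather than parsing journalctl's human-readable output.

```shell
# Succeed if directory $1 uses more than $2 KiB on disk, as reported by du.
# (Hypothetical helper; du's figure can differ slightly from journalctl's.)
journal_over_cap() {
    used_kib=$(du -sk "$1" | cut -f1) || return 2
    [ "$used_kib" -gt "$2" ]
}
```

Paired with a rotate, so the active file is archived first and becomes eligible for vacuuming: journal_over_cap /var/log/journal 512000 && journalctl --rotate --vacuum-size=500M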

There is a related trap worth knowing about. With the default Storage=auto, if /var/log/journal does not exist, journald keeps everything in /run/log/journal. That is a tmpfs, so the journal is eating your RAM instead of your disk, and on a memory-constrained VM that can be the worse failure. Check which mode you are in with journalctl --header | head (the file path it prints tells you), and create the persistent directory deliberately if you want logs to survive reboots:

mkdir -p /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal

Restart journald, or run journalctl --flush, so it switches over to the new directory. The same SystemMaxUse cap then applies, so you get persistence and a ceiling rather than a surprise.
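For a quick check across a handful of machines, the directory test can be sketched as a small function. The name journal_mode and its optional root-prefix argument are invented here, and it only mirrors what the default Storage=auto would do. An explicit Storage= setting in journald.conf overrides it, so this is a hint, not proof.

```shell
# Report where journald would store logs under the default Storage=auto.
# $1 = optional root prefix (hypothetical parameter, handy for chroots).
journal_mode() {
    root=${1:-}
    if [ -d "$root/var/log/journal" ]; then
        echo persistent
    elif [ -d "$root/run/log/journal" ]; then
        echo volatile
    else
        echo unknown
    fi
}
```

Called with no argument it inspects the running system. "volatile" means the journal is living in RAM.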

One thing I keep getting asked: should you just disable persistent journals and let it all live in RAM? Tempting on a constrained box, but then every reboot loses the logs from whatever made you reboot, which is precisely when you want them. A 500 MB cap costs almost nothing and keeps enough history to be useful. Cap it, fix the noisy unit, move on.