A small VM tipped over last week with the most boring failure mode there is: root partition full. Not a runaway database, not a log file someone forgot to rotate. Just /var/log/journal quietly growing to 3.8 GB on a box with a 20 GB disk, because nobody had ever told journald to stop.
The default on most distros is "use up to 10% of the filesystem" (capped at 4 GiB), which sounds restrained until you remember that 10% of a tiny disk is still enough to ruin your afternoon. And if the partition is shared with everything else, that 10% is competing with your actual workload.
First, find out where you stand:
journalctl --disk-usage
That prints the combined size of all journal files, archived and active. To see who is filling it, count recent entries per unit. A chatty service or a tight crash loop will jump straight out at you:
journalctl --since "1 hour ago" -o json | \
  jq -r '._SYSTEMD_UNIT // "(no unit)"' | sort | uniq -c | sort -rn | head
In my case it was a container runtime spewing the same health-check warning several times a second. Worth fixing at the source, but the journal still needs a hard ceiling regardless.
The fix lives in /etc/systemd/journald.conf. The two knobs that matter:
[Journal]
SystemMaxUse=500M
SystemMaxFileSize=50M
SystemMaxUse is the total cap for persistent storage. SystemMaxFileSize caps each individual journal file, which governs how granular the rotation is. Set it to roughly an eighth of the total and you get sensible vacuuming rather than one enormous file that only gets dropped when it is finally full.
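If you would rather not edit the main file, journald also reads drop-ins from /etc/systemd/journald.conf.d/. Here is a minimal sketch that writes one, using the one-eighth ratio described above. The function name write_journal_cap, the file name 00-cap.conf, and the MiB-only arguments are my own illustrative choices, not anything journald requires:

```shell
#!/bin/sh
# Sketch: write a journald drop-in with a total cap and a per-file cap
# of one eighth of it (integer division, so 500 gives 62).
# write_journal_cap and 00-cap.conf are hypothetical names for illustration.
write_journal_cap() {
    dir=$1        # target directory, e.g. /etc/systemd/journald.conf.d
    total_mib=$2  # total cap in MiB, e.g. 500
    file_mib=$((total_mib / 8))
    mkdir -p "$dir" || return 1
    printf '[Journal]\nSystemMaxUse=%sM\nSystemMaxFileSize=%sM\n' \
        "$total_mib" "$file_mib" > "$dir/00-cap.conf"
}
```

Run as root, write_journal_cap /etc/systemd/journald.conf.d 500 would produce SystemMaxUse=500M and SystemMaxFileSize=62M. Taking the directory as an argument means you can point it at a scratch directory first and read the result before touching /etc.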
Apply it without a reboot:
systemctl restart systemd-journald
If you need space back right now, before the new limits take effect on the next rotation, you can vacuum by size or age:
journalctl --vacuum-size=500M
journalctl --vacuum-time=2weeks
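Both are safe to run live. They only touch archived files, never the active one, so you can reclaim space on a box that is mid-incident without restarting anything. If most of the space sits in the active file, run journalctl --rotate first so it gets archived and becomes eligible for vacuuming.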
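If you want to vacuum only when the journal has actually grown past a threshold (from a cron job, say), a small guard like the following might do. It is a sketch under assumptions: journal_over_limit is a hypothetical helper name, and it measures the directory with du rather than asking journald, so the number can differ slightly from what journalctl --disk-usage reports.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if a directory uses more than a limit given in MiB.
# journal_over_limit is a hypothetical helper, not part of systemd.
journal_over_limit() {
    dir=$1        # e.g. /var/log/journal
    limit_mib=$2  # e.g. 500
    # du -sk prints "<KiB><TAB><path>"; keep only the size column.
    used_kib=$(du -sk "$dir" | cut -f1)
    [ "$used_kib" -gt $((limit_mib * 1024)) ]
}
```

A typical use would be journal_over_limit /var/log/journal 500 && journalctl --vacuum-size=500M, which leaves the journal alone while it is still under the limit.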
There is a related trap worth knowing about. With the default Storage=auto, if /var/log/journal does not exist, journald keeps everything in /run/log/journal. That is a tmpfs, so it is eating your RAM instead of your disk, and on a memory-constrained VM that can be the worse failure. Check which mode you are in with journalctl --header | head (the file path shows which directory is in use) and create the persistent directory deliberately if you want logs to survive reboots:
mkdir -p /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal
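Then restart systemd-journald, or run journalctl --flush, so it switches over to the new directory. The same SystemMaxUse cap applies from then on, so you get persistence and a ceiling rather than a surprise.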
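For a scriptable version of the "which mode am I in" check, something along these lines could work. Treat it as a heuristic sketch: journal_storage_mode is a hypothetical helper, it only looks at whether the two conventional directories exist, and it assumes Storage= is at its default of auto. An explicit Storage=volatile or Storage=none in journald.conf would make its answer wrong.

```shell
#!/bin/sh
# Sketch: guess where journald is writing from which directory exists.
# Assumes Storage=auto; journal_storage_mode is a hypothetical helper.
journal_storage_mode() {
    persistent_dir=${1:-/var/log/journal}
    runtime_dir=${2:-/run/log/journal}
    if [ -d "$persistent_dir" ]; then
        echo persistent
    elif [ -d "$runtime_dir" ]; then
        echo volatile
    else
        echo unknown
    fi
}
```

Called with no arguments it checks the standard paths. The optional arguments exist only so the logic can be tried against scratch directories.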
One thing I keep getting asked: should you just disable persistent journals and let it all live in RAM? Tempting on a constrained box, but then every reboot loses the logs from whatever made you reboot, which is precisely when you want them. A 500 MB cap costs almost nothing and keeps enough history to be useful. Cap it, fix the noisy unit, move on.