A box paged me for disk space on a partition that should have had plenty. No runaway application, no enormous upload, no core dump. The culprit was /var/log/journal, sitting at several gigabytes and growing, because systemd's journal had been persistent for months and nobody had ever told it when to stop. This is one of those problems that's invisible right up until it isn't, so here's how I bound it for good.
what's actually going on
By default, on most modern distributions, the journal is persistent: it lives on disk under /var/log/journal rather than only in memory. That's a good default. The catch is that journald's idea of "how much disk may I use" is derived from the size of the filesystem it lives on. If you didn't set explicit limits, it will use up to 10% of that filesystem (SystemMaxUse) while trying to leave 15% free (SystemKeepFree), each capped at 4G, so it happily grows to a meaningful fraction of the partition. On a generous root partition that's fine for ages. On a tight one it's a slow-motion outage.
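To see what those defaults actually come to, here's a rough Python model of the documented rule: 10% of the filesystem for SystemMaxUse and 15% for SystemKeepFree, each capped at 4G. It ignores any rounding or minimums journald may apply internally, so treat the numbers as ballpark figures rather than what a given host will report.

```python
GIB = 1024 ** 3
CAP = 4 * GIB  # both defaults are capped at 4G (journald uses base-1024 units)


def default_journal_limits(fs_size_bytes: int) -> tuple[int, int]:
    """Approximate journald's default SystemMaxUse and SystemKeepFree.

    Simplified model: 10% and 15% of the filesystem size, each capped at
    4 GiB. Any rounding or minimum values journald applies are ignored.
    """
    max_use = min(fs_size_bytes // 10, CAP)
    keep_free = min(fs_size_bytes * 15 // 100, CAP)
    return max_use, keep_free


if __name__ == "__main__":
    for size_gib in (8, 20, 40, 100):
        max_use, keep_free = default_journal_limits(size_gib * GIB)
        print(f"{size_gib:>3} GiB filesystem: SystemMaxUse ~{max_use / GIB:.1f} GiB, "
              f"SystemKeepFree ~{keep_free / GIB:.1f} GiB")
```

From about 40G upward both values sit at the 4G ceiling, which is how an unconfigured journal on a decent-sized disk ends up at around four gigabytes.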
First, find out the truth rather than guessing:
journalctl --disk-usage
That prints exactly how much the journal is using right now. Mine was the better part of four gigabytes on a partition that didn't have four gigabytes to spare.
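If you want a similar figure from a script (say, for a monitoring check), a rough Python sketch is to add up the journal files yourself. It assumes the standard locations, /var/log/journal for persistent storage and /run/log/journal for volatile, and counts files ending in .journal or .journal~. It sums apparent file sizes, so expect it to differ a little from what journalctl reports.

```python
import os

# Standard journald locations; adjust if your distribution differs.
DEFAULT_DIRS = ("/var/log/journal", "/run/log/journal")


def journal_bytes(dirs=DEFAULT_DIRS) -> int:
    """Sum the apparent size of journal files under the given directories.

    Counts *.journal (active and archived) and *.journal~ (files journald
    set aside as not cleanly closed). Missing directories are skipped.
    """
    total = 0
    for base in dirs:
        for root, _subdirs, files in os.walk(base):  # yields nothing if base is missing
            for name in files:
                if name.endswith((".journal", ".journal~")):
                    try:
                        total += os.stat(os.path.join(root, name)).st_size
                    except OSError:
                        continue  # vacuumed away mid-walk, or unreadable
    return total


if __name__ == "__main__":
    print(f"journal files: {journal_bytes() / 1024 ** 2:.1f} MiB")
```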
the immediate cleanup
You don't need to delete the directory by hand, and you shouldn't. journalctl has two vacuum modes that do it safely:
journalctl --vacuum-size=500M
journalctl --vacuum-time=2weeks
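The first trims the oldest archived journal files until usage is under the size you give. The second drops anything older than the window. Either reclaims space immediately and leaves the journal in a consistent state, which is the part you'd get wrong by reaching for rm. One catch: both only ever remove archived files, never the ones journald is currently writing to, so if the active files hold most of the space, run journalctl --rotate first to archive them.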
That fixes today. It does nothing about tomorrow, because the journal will just grow again. For that you need configuration.
bounding it permanently
The settings live in /etc/systemd/journald.conf. The three that matter most:
[Journal]
SystemMaxUse=500M
SystemMaxFileSize=50M
MaxRetentionSec=2week
SystemMaxUse is the hard cap on total persistent journal size. This is the one most people are missing, and it's the one that turns "grows to whatever the filesystem-derived default allows" into "grows to 500M, then drops its oldest files". SystemMaxFileSize caps individual journal files so rotation is granular rather than dropping a huge chunk at once. MaxRetentionSec adds a time bound on top, so even if you're well under the size cap, you're not keeping logs from the spring.
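If you template these values across hosts, it helps to know how journald reads them: sizes take K, M, G and T suffixes in base 1024, and times use systemd's time-span syntax, where 2week means fourteen days. Here's a minimal Python sketch that turns both into plain numbers. It deliberately handles only a subset of what systemd accepts (no P/E sizes, no months or years, no combined spans like "1d 12h") and rejects anything else.

```python
import re

SIZE_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

# Subset of systemd time-span units; a bare number means seconds.
TIME_UNITS = {
    "": 1, "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
    "w": 604800, "week": 604800, "weeks": 604800,
}


def parse_size(value: str) -> int:
    """'500M' -> bytes, using base-1024 suffixes."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)\s*", value)
    if not m:
        raise ValueError(f"unsupported size: {value!r}")
    return int(m.group(1)) * SIZE_UNITS[m.group(2)]


def parse_timespan(value: str) -> int:
    """'2week' -> seconds, for a single number-plus-unit value."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value)
    if not m or m.group(2) not in TIME_UNITS:
        raise ValueError(f"unsupported time span: {value!r}")
    return int(m.group(1)) * TIME_UNITS[m.group(2)]


if __name__ == "__main__":
    max_use, max_file = parse_size("500M"), parse_size("50M")
    print(max_use, "bytes total,", max_use // max_file, "files' worth at SystemMaxFileSize")
    print(parse_timespan("2week") // 86400, "days of retention")
```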
Pick numbers that match the partition, not mine. The point is that you've stated a limit at all. After editing, restart the service so it picks the new limits up:
systemctl restart systemd-journald
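Since the point is that a limit is stated at all, it's worth checking what a host actually ends up with. systemd-analyze cat-config systemd/journald.conf prints the main file plus any drop-ins. For a scripted check across machines, here's a rough Python sketch. It assumes the main file at /etc/systemd/journald.conf and drop-ins in /etc/systemd/journald.conf.d/*.conf, read in name order with later files winning, and it ignores the /usr/lib and /run drop-in directories and systemd's finer merging rules, so it's a sanity check rather than a faithful parser.

```python
import configparser
import glob
from typing import Optional

MAIN_CONF = "/etc/systemd/journald.conf"
DROPIN_GLOB = "/etc/systemd/journald.conf.d/*.conf"


def _journal_value(path: str, key: str) -> Optional[str]:
    """Return key from [Journal] in one file, or None if that file doesn't set it."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's CamelCase keys as written
    parser.read(path)  # files that can't be opened are silently skipped
    if parser.has_option("Journal", key):
        return parser.get("Journal", key).strip()
    return None


def effective_journal_setting(key: str, main: str = MAIN_CONF,
                              dropin_glob: str = DROPIN_GLOB) -> Optional[str]:
    """Last value of key across the main file and sorted drop-ins, or None."""
    value = None
    for path in [main, *sorted(glob.glob(dropin_glob))]:
        raw = _journal_value(path, key)
        if raw is not None:
            value = raw or None  # treat an empty "Key=" as not set
    return value


if __name__ == "__main__":
    for key in ("SystemMaxUse", "SystemMaxFileSize", "MaxRetentionSec"):
        print(f"{key} = {effective_journal_setting(key) or '(default)'}")
```

Anything printing "(default)" for SystemMaxUse is a host still relying on the filesystem-derived cap.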
the storage decision underneath all this
There's a more fundamental knob: Storage in the same file. The values worth knowing are persistent, volatile and auto.
persistent keeps the journal on disk across reboots, which is what you want on a server you'll be debugging after the fact. volatile keeps it in memory only, under /run, so it vanishes on reboot and never touches your disk: handy for appliances, ramdisk-y embedded things, or anywhere writes are precious. auto, the common default, means "persistent if /var/log/journal exists, otherwise volatile", which is why simply creating or removing that directory silently changes the behaviour. That last detail trips people up, so it's worth being explicit.
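The auto rule is simple enough to write down. This Python sketch restates the documented behaviour rather than reproducing journald's code; the function name is made up, and it also covers none, a fourth accepted value that stores nothing locally.

```python
from pathlib import Path
from typing import Optional

VOLATILE_DIR = Path("/run/log/journal")
PERSISTENT_DIR = Path("/var/log/journal")


def journal_location(storage: str = "auto",
                     persistent_dir: Path = PERSISTENT_DIR) -> Optional[Path]:
    """Where journald keeps its files for a given Storage= value.

    Ignores early boot, when even persistent storage goes to /run
    until /var is available.
    """
    if storage == "persistent":
        return persistent_dir  # journald creates the directory if needed
    if storage == "volatile":
        return VOLATILE_DIR  # tmpfs, gone after a reboot
    if storage == "auto":
        # The directory's existence is the whole decision.
        return persistent_dir if persistent_dir.is_dir() else VOLATILE_DIR
    if storage == "none":
        return None  # nothing stored locally; forwarding still works
    raise ValueError(f"unknown Storage= value: {storage!r}")


if __name__ == "__main__":
    print("Storage=auto ->", journal_location("auto"))
```

If you do create /var/log/journal by hand, the systemd-journald man page pairs the mkdir with systemd-tmpfiles --create --prefix /var/log/journal so ownership and permissions come out right, and journalctl --flush moves anything already sitting in /run across.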
For the box that paged me, persistent with a firm SystemMaxUse was exactly right. I want the logs to survive a reboot so I can read what happened before it, but I never want them to be the reason the reboot happened.
don't forget the forwarding
One thing people miss when they're tightening journald: if you also run a traditional syslog or ship logs off the box, the journal's local size cap is your safety net, not your archive. The off-box copy is the long-term record. Locally you only need enough history to debug the recent past, which for most of my machines is a fortnight. Set the cap to cover that and let the central system keep the rest.
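To turn "enough for a fortnight" into a SystemMaxUse number, get a rough idea of how much journal a host writes per day (watching journalctl --disk-usage over a few days is good enough) and multiply. A back-of-the-envelope Python helper follows; the 1.5x headroom for bursty days is my own arbitrary assumption, not anything journald prescribes, and the 40 MiB/day figure is purely hypothetical.

```python
import math

MiB = 1024 ** 2


def size_for_retention(bytes_per_day: int, days: int, headroom: float = 1.5) -> str:
    """Suggest a SystemMaxUse value covering `days` of logs plus headroom.

    headroom is an arbitrary safety factor (an assumption); the result is
    rounded up to whole MiB and written with the base-1024 'M' suffix.
    """
    needed = bytes_per_day * days * headroom
    return f"{math.ceil(needed / MiB)}M"


if __name__ == "__main__":
    # Hypothetical host writing ~40 MiB of journal per day, kept for two weeks.
    print("SystemMaxUse=" + size_for_retention(40 * MiB, 14))
```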
the takeaway
The journal is good software with a default that suits a desktop better than a small server. It will use what you give it, and if you never said how much that was, the answer is "more than you expected, eventually". Five minutes setting SystemMaxUse on every host means the question never comes back as a 3am page. I've now got those three lines in the base configuration that every machine I build inherits, so this particular surprise is one I only get to have once.