Ramblings of an aging IT geek
linux

when journald quietly ate twelve gigs

A practical walk through bounding systemd-journald disk usage with SystemMaxUse, vacuuming old logs, and stopping a chatty service from filling the journal in the first place.

A small VPS of mine started throwing disk-space alerts despite hosting almost nothing. Twenty gigabytes of disk, a couple of light services, and somehow only two gigs free. The culprit, as it so often is on a systemd box that nobody's tuned, was the journal. journalctl --disk-usage reported twelve gigabytes. On a 20GB disk. Logging, the thing that's meant to help you find problems, had become the problem.

The default journald configuration is sensible for a desktop and quietly hostile on a small server. It's allowed to use up to 10% of the filesystem, which sounds modest until your filesystem is small or your services are chatty. Worse, that limit is a high-water mark it grows towards and then sits at, so a brief flood of logs months ago can leave you permanently down a chunk of disk.

first, reclaim the space

The immediate fix is to vacuum. You can bound the journal by size, by age, or by number of files. I usually do it by size to get headroom back fast:

# how bad is it
journalctl --disk-usage

# keep only the most recent 500M, delete the rest now
sudo journalctl --vacuum-size=500M

# or by age, if you'd rather keep a fixed window
sudo journalctl --vacuum-time=2weeks

That's a one-off, though. It frees the space but does nothing to stop it filling up again, which is the bit people skip and then wonder why they're back a month later.
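Until the permanent limits are in place, a conditional vacuum is one way to avoid repeating the clean-up by hand. This is a minimal sketch rather than a finished tool: the function names dir_bytes and over_limit are made up for illustration, it assumes GNU du, and the wiring at the bottom is commented out because it needs root and a real journal.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: decide whether a vacuum is worth running.

# Print the apparent size in bytes of a directory tree (GNU du assumed).
dir_bytes() {
    du -sb -- "$1" | cut -f1
}

# Succeed if directory $1 currently holds more than $2 bytes.
over_limit() {
    local used
    used=$(dir_bytes "$1")
    [ -n "$used" ] && [ "$used" -gt "$2" ]
}

# Example wiring (needs root, so left commented out):
# if over_limit /var/log/journal $((500 * 1024 * 1024)); then
#     journalctl --rotate              # archive active files so vacuum can remove them
#     journalctl --vacuum-size=500M
# fi
```

The --rotate step is there because vacuuming only removes archived journal files, so a large active file would otherwise survive the clean-up.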

then, bound it permanently

The real fix lives in /etc/systemd/journald.conf. The defaults are mostly commented out, which is its own small trap: people open the file, see everything commented, assume nothing's configured, and don't realise the commented values are the active defaults. Set explicit limits:

[Journal]
SystemMaxUse=500M
SystemKeepFree=1G
SystemMaxFileSize=50M
MaxRetentionSec=2week

SystemMaxUse is the hard cap on total journal size. SystemKeepFree tells journald to always leave at least that much free on the filesystem, which is the belt-and-braces that stops logging ever being the thing that fills a disk. SystemMaxFileSize keeps individual journal files small enough that rotation and vacuuming are granular rather than all-or-nothing. MaxRetentionSec discards anything older than the window regardless of size.
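If you'd rather not edit the main file, systemd also reads drop-ins from /etc/systemd/journald.conf.d/. The sketch below writes the same four settings into one; the function name write_journald_limits and the filename 10-limits.conf are arbitrary choices of mine, not anything systemd requires beyond the .conf suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the journal limits into a drop-in directory.
# $1 is the target directory; 10-limits.conf is an arbitrary filename.
write_journald_limits() {
    local dir=$1
    mkdir -p -- "$dir" || return 1
    cat > "$dir/10-limits.conf" <<'EOF'
[Journal]
SystemMaxUse=500M
SystemKeepFree=1G
SystemMaxFileSize=50M
MaxRetentionSec=2week
EOF
}

# Real use (needs root, so left commented out):
# write_journald_limits /etc/systemd/journald.conf.d
```

A drop-in keeps your changes separate from the packaged journald.conf, which makes them easier to spot later and less likely to collide with a package update.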

Restart the service to apply:

sudo systemctl restart systemd-journald
journalctl --disk-usage

On that VPS it dropped from twelve gigabytes to under five hundred megabytes and has stayed there since.

persistent or volatile

One decision worth making consciously: do you want the journal to survive reboots at all? If /var/log/journal exists, it's persistent. If it doesn't, the journal lives in /run/log/journal and vanishes on reboot. For a server you almost always want persistence, so you can investigate a crash after the fact. The Storage=auto default does the right thing as long as that directory exists:

sudo mkdir -p /var/log/journal
sudo systemd-tmpfiles --create --prefix /var/log/journal

For an ephemeral box that ships its logs elsewhere, Storage=volatile is a legitimate choice and saves the disk entirely.
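A quick way to see which side of that decision a box is on is to check for the directory. This sketch only models the Storage=auto rule described above; the function name storage_mode is hypothetical, and an explicit Storage= setting in the config would override whatever it reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what Storage=auto would do for a given
# journal directory. It does not read journald.conf, so an explicit
# Storage=persistent or Storage=volatile is not taken into account.
storage_mode() {
    if [ -d "$1" ]; then
        echo persistent
    else
        echo volatile
    fi
}

# Typical use:
# storage_mode /var/log/journal
```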

find the service that's actually shouting

Capping the journal treats the symptom. It's worth ten minutes to find the cause, because a service generating gigabytes of logs is usually a service with a problem you'd want to know about. You can sort by who's noisiest:

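# count entries per unit; anchoring at the start of the line skips
# look-alike fields such as OBJECT_SYSTEMD_UNIT=
journalctl --output=verbose | \
  grep -oP '^\s*_SYSTEMD_UNIT=\K\S+' | sort | uniq -c | sort -rn | head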

In my case it was a container health check logging a full line every two seconds, twenty-four hours a day, for months. Forty-odd thousand lines a day of "I am fine" is its own kind of joke. I turned the check's verbosity down and the journal growth rate dropped to something sensible without any caps needed at all.
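The arithmetic behind that figure is worth having to hand when you're sizing up a noisy unit. A rough sketch: the function names are made up, and the bytes-per-entry figure is purely an assumption you'd need to measure on your own system, since journal entries carry a fair amount of metadata on top of the message.

```shell
#!/usr/bin/env bash
# Lines per day for a service that logs once every $1 seconds.
lines_per_day() {
    echo $(( 86400 / $1 ))
}

# Rough bytes per day: interval in seconds ($1) and an ASSUMED
# average on-disk size per entry in bytes ($2).
bytes_per_day() {
    echo $(( $(lines_per_day "$1") * $2 ))
}

lines_per_day 2    # prints 43200
```

Plugging in a measured entry size, for example bytes_per_day 2 300, gives a ballpark for how fast a single chatty unit eats into the cap.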

the takeaway

Bounding journald is two minutes of work and it's one of the first things I now do on any new server, before it has a chance to surprise me. Set SystemMaxUse and SystemKeepFree, make sure persistence is how you want it, and occasionally check who's filling the thing. The journal is a brilliant tool. It just shouldn't be allowed to grow until it's the emergency you're using it to diagnose.