when journald quietly ate forty gigabytes

A disk-space alert fired on a box that had no business being short of space. Nothing big was installed, no runaway log files in the application's own directory, and yet / was wheezing. The culprit was /var/log/journal, sitting on forty-odd gigabytes of binary journal that nobody had ever told to stop growing.

The default behaviour of systemd-journald is reasonable, but it's not psychic. Left alone, the persistent journal is allowed up to 10% of the filesystem. Current releases cap that default at 4G, though an explicit SystemMaxUse left behind in an image or a config template overrides the cap, and on a large root partition 10% is a lot of room to lose to logs you'll never read. Worse, a chatty service can fill that allowance fast, and you only find out when something else runs out of space.
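To put a number on that default for a given machine, here is a minimal sketch of the arithmetic. It assumes the rule documented in journald.conf(5) for current releases (10% of the filesystem, capped at 4G); default_journal_cap is a made-up helper name, not part of systemd.

```shell
# default_journal_cap FS_BYTES
# Print the default SystemMaxUse ceiling, in bytes, for a filesystem of the
# given size: 10% of the filesystem, but never more than 4 GiB.
# (Assumption: the documented default of current systemd releases.)
default_journal_cap() {
  fs_bytes="$1"
  cap=$((4 * 1024 * 1024 * 1024))
  tenth=$((fs_bytes / 10))
  if [ "$tenth" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$tenth"
  fi
}

# Typical use with GNU df (filesystem size in bytes, header and padding stripped):
#   default_journal_cap "$(df -B1 --output=size /var/log/journal | tail -n 1 | tr -d ' ')"
```

If the journal is far larger than the figure this prints, something other than the stock default is setting the limit.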

see what you've actually got

First, the honest accounting. How big is the journal right now?

journalctl --disk-usage

That gives you the total. To see who's doing the shouting, this is the one I reach for: count log lines by unit over the last day.

journalctl --since "1 day ago" -o json --output-fields=_SYSTEMD_UNIT \
  | grep -oP '"_SYSTEMD_UNIT":"\K[^"]+' \
  | sort | uniq -c | sort -rn | head

Nine times out of ten one or two units dominate, and usually it's something logging the same handled error several times a second.
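Once a unit stands out, the next question is which line it keeps repeating. A rough sketch, assuming identical messages are byte-for-byte the same; top_messages is a hypothetical helper and example.service a placeholder unit name, neither is a journalctl feature.

```shell
# top_messages [N]
# Read one log message per line on stdin and print the N most frequently
# repeated lines (default 10), most frequent first, each prefixed by its count.
top_messages() {
  sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Typical use (-o cat prints only the message text, without timestamps):
#   journalctl -u example.service --since "1 day ago" -o cat | top_messages 5
```

Messages that embed a request ID or a timestamp won't collapse into one line this way, so treat the counts as a lower bound.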

reclaim the space now

You don't have to wait for a config change to take effect. journalctl can vacuum on demand, either by size or by age:

# keep only the most recent 1G
journalctl --vacuum-size=1G

# or drop anything older than two weeks
journalctl --vacuum-time=2weeks

That reclaimed most of my forty gigabytes immediately, which calmed the alert and bought me time to do the proper job.
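The clean-up can be wrapped so the before and after numbers are visible. This is a sketch, not a hardened script: reclaim_journal and the JOURNALCTL override are names invented here (the override exists only so the function can be exercised against a stub), and the rotate step is included because vacuuming removes archived journal files, not the one currently being written.

```shell
# reclaim_journal [SIZE]
# Show journal disk usage, rotate the active files so they become archived,
# vacuum down to SIZE (default 1G), then show usage again.
# JOURNALCTL may name a replacement command; that hook is an assumption of
# this sketch, not something journalctl itself knows about.
reclaim_journal() {
  size="${1:-1G}"
  jc="${JOURNALCTL:-journalctl}"
  "$jc" --disk-usage &&
    "$jc" --rotate &&
    "$jc" --vacuum-size="$size" &&
    "$jc" --disk-usage
}

# Run as root, for example:
#   reclaim_journal 1G
```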

set the limits properly

The lasting fix lives in /etc/systemd/journald.conf (or, better, a drop-in under /etc/systemd/journald.conf.d/ so you're not editing the vendor file). The settings that matter:

[Journal]
# hard cap on persistent journal size
SystemMaxUse=1G
# leave at least this much free on the filesystem
SystemKeepFree=2G
# size of each individual journal file before rotation
SystemMaxFileSize=128M
# how long to keep entries at most
MaxRetentionSec=2week

SystemMaxUse is the headline number: it's the absolute ceiling for the persistent journal, and setting it explicitly means journald stops guessing based on a percentage of a disk that might be much larger than your logging needs. SystemKeepFree is the safety net that stops the journal being the thing that fills the partition. MaxRetentionSec caps age regardless of size, which is handy if you have a compliance reason to keep a fortnight but no reason to keep more. Bear in mind it is a ceiling, not a guarantee: the size limits can still evict entries younger than that.
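For the drop-in route, something like the following writes the settings above into their own file. The file name 10-limits.conf and the helper write_journal_limits are assumptions made for this sketch; the optional directory argument is there only so it can be tried against a scratch directory before touching /etc.

```shell
# write_journal_limits [DIR]
# Write the journal size and retention limits into a drop-in file under DIR
# (default: /etc/systemd/journald.conf.d). The file name is illustrative.
write_journal_limits() {
  dir="${1:-/etc/systemd/journald.conf.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/10-limits.conf" <<'EOF'
[Journal]
SystemMaxUse=1G
SystemKeepFree=2G
SystemMaxFileSize=128M
MaxRetentionSec=2week
EOF
}

# Try it somewhere harmless first:
#   write_journal_limits /tmp/journald-test && cat /tmp/journald-test/10-limits.conf
```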

Apply it with a restart of the service:

systemctl restart systemd-journald

the part people forget: rate limiting

Capping total size deals with the symptom. If a service is logging the same line two hundred times a second, you're still burning IO and rotating useful older entries out of existence to make room for noise. journald has a built-in rate limiter for exactly this:

[Journal]
RateLimitIntervalSec=30s
RateLimitBurst=1000

That allows a burst of a thousand messages per service per thirty-second window (a tenth of the stock burst of 10000), then drops the rest and notes how many it suppressed. The numbers want tuning to your workload, but the principle is sound: a single misbehaving unit shouldn't be allowed to drown out everything else and fill the disk doing it. If one service has a legitimate need to log heavily, you can override the limit for just that unit rather than lifting it for the whole system.
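For that per-unit exception, recent systemd releases accept LogRateLimitIntervalSec= and LogRateLimitBurst= in a unit's [Service] section. A sketch of a drop-in that sets them, where the unit name, the file name log-rate.conf, the burst of 20000 and the helper allow_chatty_unit are all illustrative assumptions:

```shell
# allow_chatty_unit UNIT [DIR]
# Write a unit drop-in under DIR (default: /etc/systemd/system) that gives
# UNIT its own, more generous log rate limit. Values are examples only.
allow_chatty_unit() {
  unit="$1"
  root="${2:-/etc/systemd/system}"
  if [ -z "$unit" ]; then
    echo "usage: allow_chatty_unit UNIT [DIR]" >&2
    return 1
  fi
  mkdir -p "$root/$unit.d" || return 1
  cat > "$root/$unit.d/log-rate.conf" <<'EOF'
[Service]
LogRateLimitIntervalSec=30s
LogRateLimitBurst=20000
EOF
}

# For example, as root:
#   allow_chatty_unit example.service && systemctl daemon-reload
# then restart the unit so it picks up the new limit.
```

Keeping the exception in the unit's own drop-in leaves the system-wide limit in place for everything else.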

the boring conclusion that isn't a conclusion

None of this is clever. It's the kind of housekeeping that ought to be in the base image and usually isn't, because journald's defaults are fine right up until they aren't, and "fine until it fills the disk at 3am" is a bad place to discover a default. Set SystemMaxUse, set SystemKeepFree, tighten the rate limit, and the journal becomes a thing you can ignore again, which is exactly what a log subsystem should be.