Ramblings of an aging IT geek
linux

getting the systemd journal to stop eating the disk

How systemd-journald grows to a generous default limit, how to cap it with SystemMaxUse and reclaim space with the vacuum commands, and how to stop it bringing /var to its knees.


The page said /var was at 96 percent on a box that logs almost nothing interesting. The culprit, as it so often is on a modern systemd host, was /var/log/journal, sat at several gigabytes and growing. Nobody had configured it to do anything in particular, which is precisely the problem. Left to its own devices the journal will happily use a chunk of your disk and only stop when it decides it's used enough, which by default is 10 percent of the filesystem holding /var/log/journal, up to a ceiling of 4GB. On a 200GB root that's 4GB of logs you never asked for, and on a small /var it's a tenth of the disk.

This is one of those things that's fine until it isn't. The journal is genuinely good: structured, indexed, queryable in ways syslog never managed. But the defaults assume you'll never look at the disk, and if you run enough machines, one of them will eventually be the machine where that assumption breaks.

what's actually using the space

First, find out how big it is and where:

journalctl --disk-usage

That gives you the headline number. Persistent journals live under /var/log/journal/<machine-id>/, and if that directory exists at all, journald is writing to disk rather than just to a volatile ring buffer in /run. You can see the time range it's holding with:

journalctl --list-boots

If you've got hundreds of boots in there, the journal has been accumulating across every reboot since the machine was installed, which is usually not what anyone intended.
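To see how much of that total is history rather than the file currently being written, the directory can be split by file name. This is a minimal sketch, assuming the usual naming where archived journal files carry an @ in their name (system@....journal) and active ones do not (system.journal). The function name journal_breakdown is made up for illustration.

```shell
# Sum journal file sizes under a directory, split into active and archived.
# Assumption: archived files have an '@' in their file name, active ones do not.
journal_breakdown() {
  dir="${1:-/var/log/journal}"
  # wc -c prints "<bytes> <path>" per file, plus a "total" line we skip.
  find "$dir" -type f -name '*.journal' -exec wc -c {} + | awk '
    $NF == "total" { next }
    $NF ~ /@/      { archived += $1; next }
                   { active += $1 }
    END { printf "active %d\narchived %d\n", active, archived }'
}
```

Run journal_breakdown with no argument on a host with a persistent journal. The archived figure is roughly what the vacuum commands are able to reclaim.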

capping it properly

The settings live in /etc/systemd/journald.conf. The three that matter most:

[Journal]
SystemMaxUse=500M
SystemKeepFree=1G
MaxRetentionSec=2week

SystemMaxUse is the hard cap on how much the persistent journal may use. SystemKeepFree is the amount of free space it must leave on the filesystem regardless, which matters when the journal isn't the only thing on that disk. MaxRetentionSec discards entries older than the window no matter how much room is left, which is handy when you care more about "two weeks of logs" than about a byte count.
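One way to apply the same settings without touching the stock file is a drop-in under /etc/systemd/journald.conf.d/, which journald reads alongside journald.conf. This is a minimal sketch, assuming that directory is honoured on your systemd version. The function name and the 10-size-cap.conf file name are made up for illustration.

```shell
# Write the caps as a drop-in rather than editing journald.conf in place.
# The target directory is an argument so the function can be tried somewhere harmless.
write_journald_dropin() {
  conf_dir="${1:-/etc/systemd/journald.conf.d}"
  mkdir -p "$conf_dir" || return 1
  # File name is hypothetical; any name ending in .conf should be picked up.
  cat > "$conf_dir/10-size-cap.conf" <<'EOF'
[Journal]
SystemMaxUse=500M
SystemKeepFree=1G
MaxRetentionSec=2week
EOF
}
```

Calling write_journald_dropin as root with no argument writes the real drop-in. Passing a scratch directory shows what it would produce without changing the host.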


After editing, restart the service so it re-reads the config:

systemctl restart systemd-journald

That sets the policy going forward. It does not immediately reclaim the space you've already lost, because the cap applies as new entries arrive. For the box that's at 96 percent right now, you want the vacuum commands.

reclaiming space now

journalctl will trim the existing journal on demand, by size, by age, or (with --vacuum-files) by number of files. The two I reach for:

journalctl --vacuum-size=500M
journalctl --vacuum-time=2weeks

The size variant deletes archived journal files, oldest first, until the total is under your figure. It only touches archived files, never the active one it's currently writing, so it's safe to run on a live system; add --rotate if you want the active file archived first so it can be trimmed too. I ran --vacuum-size=500M on the alerting box and watched /var drop from 96 percent to 71 in about four seconds, which bought enough breathing room to stop the page nagging.
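If you'd rather the vacuum only fires when the disk is actually tight, it can be wrapped in a threshold check. This is a rough sketch, not something from the incident above. The function names, the 90 percent threshold in the usage line and the idea of running it from cron are all assumptions.

```shell
# Print the used percentage of the filesystem holding the given path.
usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Vacuum the journal down to a target size, but only past a usage threshold.
vacuum_if_full() {
  mount="$1"; threshold="$2"; target="$3"
  pct=$(usage_pct "$mount")
  [ -n "$pct" ] || return 1
  if [ "$pct" -ge "$threshold" ]; then
    echo "$mount at ${pct}%, vacuuming journal to $target"
    journalctl --vacuum-size="$target"
  else
    echo "$mount at ${pct}%, below ${threshold}%, nothing to do"
  fi
}
```

Something like vacuum_if_full /var 90 500M, run as root, would have done the job on the alerting box. It is a stopgap for hosts that haven't had their cap set yet, not a replacement for the cap.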


the thing that's actually filling it

Capping the journal treats the symptom. Sometimes it's worth a minute to ask why it filled in the first place, because a journal that grows to gigabytes in days is usually one service shouting into the void. journalctl can tell you who:

journalctl --since "1 hour ago" -o json | \
  jq -r '._SYSTEMD_UNIT' | sort | uniq -c | sort -rn | head

That gives you a rough league table of which unit is the loudest over the last hour. On the box that paged me, it was a misconfigured backup agent logging a full debug line for every file it considered, which at a few hundred thousand files a run is a lot of journal for no benefit to anyone. The right fix there isn't a bigger cap, it's turning the agent's log level down from debug to info, after which the journal stopped being a problem at all and the cap became a safety net rather than a daily reality.
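On hosts without jq, much the same league table can be pulled from the journal's export format, where each text field is written on its own FIELD=value line. This is a sketch under that assumption. The function name loudest_units is made up, and it counts entries, not bytes.

```shell
# Read `journalctl -o export` output on stdin and print "<count> <unit>",
# loudest first. The optional argument is how many rows to show (default 10).
loudest_units() {
  grep -a '^_SYSTEMD_UNIT=' \
    | cut -d= -f2- \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }' \
    | head -n "${1:-10}"
}
```

Usage would be along the lines of journalctl --since "1 hour ago" -o export | loudest_units 5. Entries with no unit attached are simply not counted here, whereas the jq version shows them as null.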

The general principle: a cap stops a runaway from taking the disk, but a service that genuinely needs gigabytes of logs a day is telling you something about its log level, and the cheapest fix is usually upstream of journald entirely. Cap first so you stop bleeding, then go find the noise.

rate limiting

journald has its own rate limiter, and it's on by default, which surprises people who go looking for every log line and find gaps. The relevant knobs, also in journald.conf:

RateLimitIntervalSec=30s
RateLimitBurst=1000

That means a single service may log up to 1000 messages in any 30-second window before journald starts dropping the rest and noting that it did so. For most services that's invisible. For a service mid-meltdown, logging a stack trace per request, it can mean the journal quietly discards exactly the lines you wanted, and you'll find a Suppressed N messages marker where the evidence should have been. If you're chasing an incident and the logs look suspiciously thin, check whether the limiter ate them, and consider raising the burst on hosts where you'd rather keep everything and pay for it with the cap instead.
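To check whether the limiter ate anything, the suppression notices themselves can be totalled per source. This is a minimal sketch, assuming the notice reads "Suppressed N messages from SOURCE", which is the wording I'd expect from journald but which may differ between versions. The function name suppressed_totals is made up.

```shell
# Read journal text on stdin and print "<dropped> <source>", worst offender first.
# Assumption: notices look like "Suppressed 1234 messages from foo.service".
suppressed_totals() {
  sed -n 's/.*Suppressed \([0-9][0-9]*\) messages from \(.*\)$/\1 \2/p' \
    | awk '{ total[$2] += $1 } END { for (s in total) print total[s], s }' \
    | sort -rn
}
```

Feeding it journalctl --since "1 hour ago" -o cat | suppressed_totals gives a quick answer to "did I lose lines, and whose". No output means no suppression notices were found in that window.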

a note on volatile journals

There's a more aggressive option, which is to not persist the journal at all. If you set Storage=volatile in journald.conf, the journal lives only in /run and evaporates on reboot, capped by RuntimeMaxUse. I use this on stateless, ephemeral nodes that ship their logs off-box anyway, because keeping a local copy on a machine you'll destroy in an hour is pointless. For anything I might need to log into and investigate after a crash, persistent-but-capped is the right call. You want the logs to survive the reboot that the crash caused, and a volatile journal throws away exactly the evidence you came looking for.

what I actually do now

Two things, both boring, both worth it. First, I bake a sensible journald.conf into the base image so no machine ever ships with the untuned default; SystemMaxUse=500M and a two-week retention cover the vast majority of hosts I run. Second, I added journalctl --disk-usage output to the same monitoring that watches disk space, so a journal that's misbehaving shows up as a number on a graph rather than as a 3am page about /var. The journal is a good tool that simply assumed I'd never run out of disk. Tell it otherwise once, in the image, and it behaves itself forever after.
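For the monitoring half, the human-readable figure from journalctl --disk-usage has to become a plain number before it can go on a graph. This is a sketch of one way to do that, assuming the output contains a size token such as 3.9G or 512.0M with 1024-based suffixes. The function name disk_usage_bytes is made up, and how the number reaches your monitoring is left out.

```shell
# Read `journalctl --disk-usage` text on stdin and print the size in bytes.
# Assumption: the first token like "3.9G" is the size, with 1024-based suffixes.
disk_usage_bytes() {
  grep -Eo '[0-9]+(\.[0-9]+)?[BKMGT]' | head -n 1 | awk '
    {
      mult["B"] = 1; mult["K"] = 1024; mult["M"] = 1024 ^ 2
      mult["G"] = 1024 ^ 3; mult["T"] = 1024 ^ 4
      unit  = substr($0, length($0))          # trailing suffix letter
      value = substr($0, 1, length($0) - 1)   # numeric part
      printf "%.0f\n", value * mult[unit]
    }'
}
```

Usage would be journalctl --disk-usage | disk_usage_bytes. The source figure is already rounded to one decimal place, so treat the result as approximate, which is good enough for a trend line.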