Ramblings of an aging IT geek

taming journald disk usage

A homelab box ran out of disk because systemd's journal quietly ate it, and the handful of settings that stop that happening again.

A box ran out of disk. Not a database, not a runaway download, not a forgotten core dump. The systemd journal, sitting in /var/log/journal, had quietly grown to several gigabytes over the better part of a year and finally pushed a modest root filesystem over the edge. The service that actually mattered fell over because the logs telling me about it had no room to be written. There is a special irony in logging filling the disk and then being unable to log the disk filling.

This is not a bug. It is the default behaviour, and the default behaviour is generous because the people who set it assume you have plenty of disk and want history. On a small homelab box with a 32GB root partition, generous becomes a problem.

see what you are dealing with

First, find out how much the journal is actually using. journalctl will tell you directly:

$ journalctl --disk-usage
Archived and active journals take up 3.8G in the file system.

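3.8 gigabytes. On this machine that was over ten per cent of the entire root filesystem given over to logs, most of which I would never read. You can also see how far back the journal goes, which is often a surprise. The --list-boots option prints one line per boot, oldest first, with the timestamps of that boot's first and last entries: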

$ journalctl --list-boots | head

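If that list goes back to the day you installed the OS, nothing has been trimmed yet, and on a small disk journald's built-in limits are too generous to step in before the space runs out. It will stay that way unless you tell it otherwise.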
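
To put the number in proportion, here is a minimal sketch that reports the journal directory's size as a share of the filesystem it sits on. It assumes a persistent journal under /var/log/journal. JOURNAL_DIR and percent_of are names made up for this example, not anything journald provides.

```shell
#!/usr/bin/env bash
# Sketch: journal size as a share of the filesystem that holds it.
# JOURNAL_DIR is an assumption; override it if your journal lives elsewhere.
JOURNAL_DIR="${JOURNAL_DIR:-/var/log/journal}"

# percent_of USED TOTAL: integer percentage, rounded down (hypothetical helper).
percent_of() {
    local used=$1 total=$2
    if [ "$total" -le 0 ]; then
        echo 0
    else
        echo $(( used * 100 / total ))
    fi
}

if [ -d "$JOURNAL_DIR" ]; then
    # du -sk gives the directory size in KiB; df -Pk gives the filesystem size in KiB.
    used_kib=$(du -sk "$JOURNAL_DIR" 2>/dev/null | awk '{print $1}')
    total_kib=$(df -Pk "$JOURNAL_DIR" | awk 'NR==2 {print $2}')
    used_kib=${used_kib:-0}
    total_kib=${total_kib:-0}
    echo "journal: ${used_kib} KiB, about $(percent_of "$used_kib" "$total_kib")% of the filesystem"
fi
```

Run it with sudo if du cannot read the journal directory, otherwise the figure will come out low.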

the immediate fix

You can reclaim space right now without touching any config. The vacuum commands trim the journal down to a size or an age:

# trim archived journal files down to roughly 500 MB
$ sudo journalctl --vacuum-size=500M

# or remove archived journal files older than two weeks
$ sudo journalctl --vacuum-time=2weeks

This is the satisfying part. It runs in seconds, the disk usage drops immediately, and the box stops being on fire. But it is a one-off. Without a policy the journal will simply grow back to wherever it was heading, and in a year I will be writing this same post again.
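
One wrinkle worth knowing: vacuuming only removes archived journal files, so the active file is left alone. A rough sketch of a one-off clean-up that rotates first, so the current file is archived and becomes eligible, looks like this. The trim_journal function and the DRY_RUN switch are inventions for this example. The journalctl options themselves (--disk-usage, --rotate, --vacuum-size) are real.

```shell
#!/usr/bin/env bash
# trim_journal SIZE: show usage, rotate, vacuum down to SIZE, show usage again.
# With DRY_RUN=1 the commands are printed instead of executed.
# (trim_journal and DRY_RUN are hypothetical names for this sketch.)
trim_journal() {
    local size=${1:-500M}
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    $run journalctl --disk-usage
    # Rotation marks the active journal files as archived so vacuuming can remove them.
    $run sudo journalctl --rotate
    $run sudo journalctl --vacuum-size="$size"
    $run journalctl --disk-usage
}
```

Calling it as DRY_RUN=1 trim_journal 500M prints the four commands without touching anything, which is a sensible first run on a box you care about.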

the permanent fix

The policy lives in /etc/systemd/journald.conf. The two settings that matter most are SystemMaxUse, an absolute cap on how much disk the journal may consume, and MaxRetentionSec, how long entries are kept regardless of size:

[Journal]
Storage=persistent
SystemMaxUse=500M
SystemKeepFree=1G
MaxRetentionSec=1month
MaxFileSec=1week

A walk through those, because the interaction between them catches people out:

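  • SystemMaxUse=500M is the ceiling on the journal's total size, and the single most important line. journald enforces it by deleting the oldest archived files, so usage can sit slightly above the figure until the active file is next rotated.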
  • SystemKeepFree=1G tells journald to always leave at least a gigabyte free on the filesystem, even if that means dropping below the size it would otherwise keep. This is the safety net for the exact failure I hit, where something else is eating the disk and the journal needs to get out of the way.
  • MaxRetentionSec=1month discards anything older than a month even if there is room. On a homelab box I am almost never debugging something from six weeks ago, so there is no reason to hoard it.
  • MaxFileSec=1week rotates the active journal file weekly, which keeps individual files a sensible size and makes the vacuuming above behave predictably.
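
If you would rather not edit the main file, the same settings can go in a drop-in under /etc/systemd/journald.conf.d/, which journald reads in addition to journald.conf. A minimal sketch, assuming a drop-in file name of 10-homelab.conf (made up for this example, as is the write_journald_dropin function):

```shell
#!/usr/bin/env bash
# write_journald_dropin FILE: write the limits discussed above to FILE.
# The function name and the 10-homelab.conf file name are hypothetical.
write_journald_dropin() {
    local dest=$1
    mkdir -p "$(dirname "$dest")"
    cat > "$dest" <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=500M
SystemKeepFree=1G
MaxRetentionSec=1month
MaxFileSec=1week
EOF
}

# As root on a real host:
#   write_journald_dropin /etc/systemd/journald.conf.d/10-homelab.conf
```

Keeping the policy in its own file means a package upgrade that replaces journald.conf will not quietly undo it.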

Apply it by restarting the service:

$ sudo systemctl restart systemd-journald

The lower of SystemMaxUse and the space implied by SystemKeepFree wins, which is the bit that trips people up. If your disk is nearly full, SystemKeepFree can pull the effective cap well below 500M, and that is correct: keeping the system alive matters more than keeping logs.
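
The arithmetic behind that can be sketched roughly as follows. This is a simplified model of my own, not journald's actual code: it assumes the journal may grow into whatever free space remains after reserving SystemKeepFree, but never beyond SystemMaxUse. The effective_cap_mib helper and its arguments are hypothetical, and journald's real accounting has more inputs than this, so treat the numbers as illustrative.

```shell
#!/usr/bin/env bash
# effective_cap_mib MAX_USE USED FREE KEEP_FREE   (all values in MiB)
# Simplified model, not journald's implementation:
#   headroom = free space beyond the KEEP_FREE reserve (never negative)
#   cap      = smaller of MAX_USE and (current journal size + headroom)
effective_cap_mib() {
    local max_use=$1 used=$2 free=$3 keep_free=$4
    local headroom=$(( free - keep_free ))
    if [ "$headroom" -lt 0 ]; then
        headroom=0
    fi
    local cap=$(( used + headroom ))
    if [ "$cap" -gt "$max_use" ]; then
        cap=$max_use
    fi
    echo "$cap"
}

effective_cap_mib 500 300 5000 1024   # plenty of free disk: SystemMaxUse decides
effective_cap_mib 500 300 1100 1024   # nearly full disk: SystemKeepFree decides
```

With a 300 MiB journal, the first call prints 500 and the second prints 376: once only 1100 MiB is free, reserving a gigabyte leaves the journal just 76 MiB of room to grow.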

a note on volatile storage

There is a more aggressive option for the machines where you genuinely do not care about log history across reboots: set Storage=volatile and the journal lives entirely under /run/log/journal, in RAM, and evaporates on reboot. In that mode the size limits come from the Runtime* settings such as RuntimeMaxUse rather than SystemMaxUse. I use this on throwaway VMs and build agents where the only logs I care about are the ones from this boot, and persisting them to disk is pure waste. It is not what you want on anything you might need to investigate after the fact, because the first thing an unexpected reboot does is destroy your evidence.

For a real homelab host I want persistence, just bounded persistence. The config above gives me a month of history capped at half a gigabyte, with a hard guarantee that the journal will never be the thing that fills the disk again. That last guarantee is the whole point. Logs are there to help you understand a failure, and a logging system that causes the failure has rather missed the brief.

The wider lesson is the boring one that homelabs teach you over and over: every default that assumes infinite resources is a future incident with a date on it you have not picked yet. The journal is just the one that picked its date first this time.