The disk filled up on a box that had logrotate configured perfectly, and that sentence should already be annoying you, because it annoyed me for the best part of an afternoon.
Here is the shape of it. A vendored daemon, the sort of thing you inherit rather than choose, writes to /var/log/thing/app.log. logrotate runs nightly, sees the file is over 100MB, renames it to app.log.1, gzips the old ones, and the new app.log sits there at zero bytes. Tidy. Except df kept creeping up, and du -sh /var/log/thing insisted the directory was small. Those two numbers disagreeing is one of my favourite Linux tells, and it points almost every time at the same thing.
$ df -h /var/log
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 20G 19G 290M 99% /var/log
$ du -sh /var/log/thing
412M /var/log/thing
a file that nobody can see but the kernel still loves
When logrotate renames app.log to app.log.1, the daemon doesn't know or care. It's still holding an open file descriptor pointing at the same inode. On Linux a file isn't really gone until the last name is unlinked and the last descriptor is closed. logrotate gave the inode a new name, then later its compress-and-delete step unlinked the new name too, but the daemon's fd kept the inode alive. So the daemon merrily carried on writing into a file with no directory entry at all. Invisible to du, very visible to df, and growing forever.
lsof is how you prove it, and proving it matters because the symptom looks like a logrotate bug when it absolutely is not.
$ lsof -nP /var/log/thing/* 2>/dev/null | grep deleted
thing 2841 thing 4w REG 8,17 6837412992 131074 /var/log/thing/app.log.1 (deleted)
There it is. 6.3GB of deleted file, held open, write mode, fd 4. The space comes back the instant that process closes the descriptor or exits. You can even watch it: cat /proc/2841/fd/4 will happily stream you the "deleted" log, because the kernel still has the inode.
why the postrotate hook did nothing
The logrotate config did the conventional thing:
/var/log/thing/app.log {
daily
rotate 7
compress
missingok
postrotate
/bin/kill -HUP $(cat /var/run/thing.pid) 2>/dev/null || true
endscript
}
That kill -HUP is the whole social contract. logrotate moves the file aside, then signals the daemon, and a well-behaved daemon catches SIGHUP and reopens its log file by name, picking up the fresh zero-byte app.log. syslog does it, nginx does it, almost everything sensible does it.
This daemon did not. SIGHUP wasn't handled, so the default disposition applied, which for SIGHUP is to terminate. Except the vendor had helpfully installed a handler that caught HUP and did nothing at all, presumably to stop accidental kills, which is the worst of both worlds: it survives the signal and ignores the intent. So logrotate signalled, the daemon shrugged, and kept writing to the now-nameless inode.
the fixes, in order of how much I trust them
The cleanest fix is to make the app reopen its log, but I couldn't change the binary. So:
- copytruncate. Tell logrotate to copy the file's contents to the rotated name and then truncate the original in place, rather than renaming. The inode never changes, so the daemon's fd stays valid and keeps writing to the same (now empty) file. There's a small race where lines written between the copy and the truncate are lost, but for this app that was acceptable.
/var/log/thing/app.log {
daily
rotate 7
compress
copytruncate
missingok
}
-
point the app at logrotate's preferred world differently. If the daemon can be told to log to stdout, you let the init system own the file. On this box that meant a small systemd unit with
StandardOutput=append:and letting journald or a redirect handle rotation, which sidesteps the whole reopen dance. -
a logrotate
postrotatethat actually restarts the service when nothing gentler works. Crude, causes a blip, but if an app won't reopen and can't be truncated safely, a cleansystemctl restart thingin the hook is honest about what's happening.
I went with copytruncate, reclaimed 6.3GB by restarting the service once to drop the orphaned descriptor, and added a Nagios check comparing df against du on that mount so the next time a process clings to a deleted file I hear about it in minutes rather than when the partition hits 99%.
a word on why copytruncate is the compromise and not the answer
I want to be honest that copytruncate is the least-bad option here, not a good one. The mechanism is a copy followed by a truncate, and those are two separate operations. Any line the daemon writes in the window between "logrotate has finished copying" and "logrotate truncates the file" lands in the original file and is then thrown away by the truncate. On a busy log you will lose lines, occasionally, at rotation time. For an audit log that would be unacceptable. For this app, which logs operational chatter I mostly grep through after the fact, losing a handful of lines once a night is a fair price for not filling the disk.
There's a second, subtler cost. truncate sets the file length back to zero but doesn't necessarily free the blocks immediately if anything still holds a reference, and on some filesystems a truncate of a very large file is not instant. None of that bit me here, but it's the sort of thing that turns "I'll just add copytruncate" into a longer afternoon than you budgeted for, so test it on the actual filesystem rather than assuming.
the check that should have existed already
The monitoring gap was the real failure. I had alerts on disk usage, but a single threshold at 90% gives you almost no warning when a deleted-but-open file is growing at a steady clip: you go from fine to paging in the time it takes the file to cross the line. The df versus du discrepancy, by contrast, starts showing the moment the orphaned descriptor appears, long before the disk is in trouble. A check that flags "this mount reports far more used than its files account for" is worth having on any box running software you didn't write, because it catches exactly this class of problem whilst it's still cheap to fix.
used_df=$(df --output=used -k /var/log | tail -1)
used_du=$(du -sk /var/log | cut -f1)
# alert if df exceeds du by more than, say, 1GB
It's crude, and on a busy box the two numbers naturally drift a little, so you want a generous margin rather than an exact match. But a 6GB gap is not drift, it's a held-open deleted file, and now I find out about it on the morning it starts rather than the night it finishes.
The real lesson is the old one. A signal is a request, not a command, and a daemon is free to ignore you. Always check that the thing you're HUPing has agreed to listen, and when it hasn't, watch the gap between what the filesystem thinks it's storing and what its files admit to.