The page was the boring kind: disk at 98% on a box that should never get anywhere near it. df confirmed it, but du on /var/log came back small and innocent. The numbers didn't add up, which is always the interesting bit.
The gap between df and du is one of the oldest tells in Unix. A process is holding a deleted file open. The bytes are still on disk, still counting against the filesystem, but the directory entry is gone so du walks straight past them. lsof finds it instantly:
$ lsof | grep deleted
myapp 4821 svc 3w REG 8,1 6884237312 /var/log/myapp/app.log (deleted)
Six and a half gigabytes, in a file with no name. There's your disk.
what actually happened
logrotate had done its job perfectly. At the daily run it renamed app.log to app.log.1, gzipped it, and started a fresh app.log for the new day. That's the normal dance, and for most software it works because logrotate either signals the process or the process reopens its log on its own.
This app did neither. It opened the log file once at startup, held the descriptor, and wrote to it forever. When logrotate renamed the file, the descriptor followed the inode, not the name. So the app cheerfully kept writing into a file that no longer had a path, while the new app.log sat there at zero bytes looking healthy. Every rotation made it worse, because the old "live" file was now also marked deleted and could never be reclaimed until the process let go of it.
the two ways out
The clean fix depends on the app. If it honours a signal to reopen its logs, you tell logrotate to send it after rotation:
/var/log/myapp/*.log {
daily
rotate 14
compress
delaycompress
missingok
notifempty
postrotate
kill -HUP $(cat /run/myapp.pid) 2>/dev/null || true
endscript
}
The trouble was this app ignored SIGHUP entirely. It caught nothing, reopened nothing, and a HUP just killed it outright, which is its own kind of exciting at three in the morning. So the postrotate signal was off the table.
For software like that, the right tool is copytruncate. Instead of renaming the file out from under the app, logrotate copies the current contents to the archive and then truncates the original file in place. The app's descriptor stays valid, still pointing at the same inode, and just carries on writing to a now-empty file. You lose the handful of lines written during the copy-then-truncate window, which for an application log is a fine trade:
/var/log/myapp/*.log {
daily
rotate 14
compress
copytruncate
missingok
notifempty
}
The immediate disk problem cleared the moment I restarted the service, which closed the orphaned descriptor and let the kernel reclaim the six gigabytes. The copytruncate change meant it stopped happening at all.
The lesson, again, is that the filesystem was telling the truth and I just had to ask df and du why they disagreed. A renamed file is not a closed file. The app never knew it was writing into the void, because as far as it was concerned, nothing had changed.