Ramblings of an aging IT geek
← Ramblings of an aging IT geek
linux

logrotate did its job, the app kept writing to a deleted file

A disk that filled despite logrotate working perfectly, because the application held an open handle to a log it had no idea was gone.

A Linux terminal showing log output

The disk filled up on a host where logrotate was configured, working, and rotating the logs on schedule. ls -l in the log directory showed neat, dated, gzipped files, exactly as intended. And yet df said the partition was at 100% and climbing. This is one of those Linux puzzles that feels like the filesystem is lying to you, until you remember how files actually work.

The clue is lsof. Run it and grep for deleted handles:

lsof +L1 | grep deleted

There it was. The application had a file descriptor open on /var/log/app/app.log (deleted), and it was 6GB and growing. logrotate had done precisely what I told it to. It moved app.log aside, gzipped it, and created a fresh empty app.log. But the running process did not know any of that happened. It still held the descriptor to the original inode, which Unix politely keeps alive as long as something has it open, even though there is no name left pointing at it. So the app kept writing into a ghost file that ls could not see and only lsof could find, and the space would not come back until that process closed the handle.

A server with status lights

The reason this happens is the copytruncate versus signal split, and it is worth getting right because the wrong choice causes either this bug or a different one.

The clean way is for logrotate to rename the file, create a new one, and then tell the app to reopen its logs. The telling is usually a SIGHUP, configured with a postrotate block:

/var/log/app/*.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
    postrotate
        /bin/kill -HUP $(cat /run/app.pid) 2>/dev/null || true
    endscript
}

That works beautifully, on the strict condition that the app actually does something useful when it receives a HUP. Mine did not. It had no signal handler for it at all, so the default disposition applied and the signal was effectively ignored for the purpose I wanted. The rename happened, the HUP went out, and the app sailed on writing to the deleted inode, blissfully unaware.

When you cannot make the app reopen its log, the fallback is copytruncate. Instead of renaming, logrotate copies the current file's contents to the rotated name and then truncates the original in place to zero length. The inode never changes, so the app's open descriptor stays valid and just carries on writing at the start of the now-empty file. The cost is a small race window: anything written between the copy and the truncate is lost, and you briefly use double the space during the copy. For most logs that is an acceptable trade.

I switched that app's config to copytruncate, killed the stale process so the 6GB ghost actually freed, and watched df finally drop. The longer-term fix was a note in the runbook and a quiet word with the developers about handling SIGHUP, because copytruncate is a workaround for an app that should be reopening its own files.

The thing to remember is that on Linux a file is not gone when you delete it. It is gone when the last thing holding it open lets go. logrotate was never the problem. The app's refusal to listen was.