Ramblings of an aging IT geek
← Ramblings of an aging IT geek
linux

when logrotate rotates the file and the app keeps writing to nowhere

A service that filled the disk because logrotate renamed its log file but the app held the old file descriptor open and never reopened it.

A terminal showing df and a full filesystem

The page was the boring kind: disk at 98% on a box that should never get anywhere near it. df confirmed it, but du on /var/log came back small and innocent. The numbers didn't add up, which is always the interesting bit.

The gap between df and du is one of the oldest tells in Unix. A process is holding a deleted file open. The bytes are still on disk, still counting against the filesystem, but the directory entry is gone so du walks straight past them. lsof finds it instantly:

$ lsof | grep deleted
myapp  4821  svc  3w  REG  8,1  6884237312  /var/log/myapp/app.log (deleted)

Six and a half gigabytes, in a file with no name. There's your disk.

what actually happened

logrotate had done its job perfectly. At the daily run it renamed app.log to app.log.1, gzipped it, and started a fresh app.log for the new day. That's the normal dance, and for most software it works because logrotate either signals the process or the process reopens its log on its own.

This app did neither. It opened the log file once at startup, held the descriptor, and wrote to it forever. When logrotate renamed the file, the descriptor followed the inode, not the name. So the app cheerfully kept writing into a file that no longer had a path, while the new app.log sat there at zero bytes looking healthy. Every rotation made it worse, because the old "live" file was now also marked deleted and could never be reclaimed until the process let go of it.

A server rack and a status terminal

the two ways out

The clean fix depends on the app. If it honours a signal to reopen its logs, you tell logrotate to send it after rotation:

/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        kill -HUP $(cat /run/myapp.pid) 2>/dev/null || true
    endscript
}

The trouble was this app ignored SIGHUP entirely. It caught nothing, reopened nothing, and a HUP just killed it outright, which is its own kind of exciting at three in the morning. So the postrotate signal was off the table.

For software like that, the right tool is copytruncate. Instead of renaming the file out from under the app, logrotate copies the current contents to the archive and then truncates the original file in place. The app's descriptor stays valid, still pointing at the same inode, and just carries on writing to a now-empty file. You lose the handful of lines written during the copy-then-truncate window, which for an application log is a fine trade:

/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    copytruncate
    missingok
    notifempty
}

The immediate disk problem cleared the moment I restarted the service, which closed the orphaned descriptor and let the kernel reclaim the six gigabytes. The copytruncate change meant it stopped happening at all.

The lesson, again, is that the filesystem was telling the truth and I just had to ask df and du why they disagreed. A renamed file is not a closed file. The app never knew it was writing into the void, because as far as it was concerned, nothing had changed.