Ramblings of an aging IT geek
← Ramblings of an aging IT geek
linux

When logrotate Wins the Rotation but Loses the Logs

A disk that kept filling despite working log rotation, because the app held the old file open and never noticed SIGHUP.

A server room with a terminal

The disk kept filling up. logrotate was running, the rotated files were there with their nice .1, .2.gz suffixes, everything in /var/log looked exactly as it should. And yet df crept towards full every few days, and lsof told the real story: the application was still writing to a file that, as far as the directory was concerned, no longer existed.

This is the oldest log rotation trap there is, and it gets everyone eventually. When logrotate renames app.log to app.log.1, the running process doesn't care. It's holding an open file descriptor, and a descriptor points at an inode, not a name. The app cheerfully keeps writing to the now-unnamed inode. Disk space stays allocated because the file is still open. You see a rotated log that's frozen in time, and a phantom file you can't ls that's quietly eating the partition.

The usual fix, and why it didn't work

The standard answer is copytruncate or a postrotate hook that nudges the app to reopen its files. Most daemons reopen on SIGHUP. So the config looks like this:

/var/log/app/app.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        kill -HUP $(cat /run/app.pid) 2>/dev/null || true
    endscript
}

Sensible. Except this particular app, a third-party thing I did not write and could not change, ignores SIGHUP entirely. It uses HUP for nothing, swallows it, carries on writing to the old inode. The postrotate ran, returned success, and accomplished precisely nothing. logrotate was winning the rotation and losing the logs.

copytruncate, and its one nasty edge

With the app's behaviour fixed, copytruncate is the pragmatic escape hatch:

/var/log/app/app.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}

It copies the current log to the rotated name, then truncates the original in place to zero length. The file descriptor the app holds stays valid because the inode never changes, the app keeps writing, and the disk gets its space back. Job mostly done.

The "mostly" matters. There's a window between the copy and the truncate where any line the app writes can be lost, or duplicated into both files, because the two operations aren't atomic. For a chatty service logging hundreds of lines a second, that window will eat the occasional line. For most apps it's a handful of lines a day and nobody cares. For anything where the logs are load-bearing, audit trails, billing, you do not want copytruncate and you need to find an app that reopens its files properly, or front it with something that does.

The honest lesson: log rotation isn't a logrotate problem, it's a contract between logrotate and the process. logrotate can rename and signal all day long. If the process on the other end doesn't reopen its files, you've got a tidy directory and a full disk. Check lsof | grep deleted the moment a disk fills despite "working" rotation. The deleted-but-open files will be sitting right there, telling you who didn't get the memo.