Ramblings of an aging IT geek
← Ramblings of an aging IT geek
linux

the app that kept writing to a deleted log file

logrotate did its job, the disk filled up anyway, and the culprit was an application happily writing to a log file that no longer had a name.

A Linux terminal showing log output

The disk filled up. Logrotate was configured, ran nightly, compressed old logs, kept a week of them, and the disk filled up anyway. du on the log directory showed a sensible few hundred megabytes. df showed the partition at ninety-nine percent. When du and df disagree by gigabytes, you are not looking at a logging problem. You are looking at a deleted file that something still has open.

The application ignored SIGHUP, kept its log file open by file descriptor, and logrotate had renamed and deleted the file out from under it. The application neither knew nor cared. It just carried on writing to a file that no longer had a name, only an inode, and that inode could not be freed because a process still held it open. Invisible to ls, invisible to du, very much present in df.

How the trap is set

Logrotate, by default, rotates by renaming. It moves app.log to app.log.1, then expects the application to be told to reopen its log file, usually by a postrotate hook that sends a signal.

/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    postrotate
        /bin/kill -HUP $(cat /var/run/myapp.pid) 2>/dev/null || true
    endscript
}

That kill -HUP is the entire contract. Logrotate moves the file aside, then nudges the app, and a well-behaved app responds to SIGHUP by closing its current log handle and opening a fresh one at the original path. The new path now points at a fresh empty file, everyone is happy.

This app did not respond to SIGHUP. Possibly it used the signal for something else, possibly it ignored it entirely, but the effect was the same: logrotate renamed the file, the app held the old descriptor, logrotate later deleted the rotated-and-compressed copy a week on, and the original inode kept growing in the dark with no name attached to it.

A terminal showing server disk usage output

Confirming it

The tell is always lsof on deleted files. This lists every open file descriptor pointing at something that has already been unlinked, and the size column is the punchline.

lsof +L1 | grep deleted

There it was. The application's PID, a file descriptor, a path ending in (deleted), and a size of several gigabytes. The space cannot come back until either the process closes that descriptor or the process exits, because on Unix a file is only truly gone when the last name and the last open handle are both released. Logrotate had removed the name. The app was still clutching the handle.

If you ever need the space back this instant and cannot restart the process, you can truncate the file in place through /proc, which is an ugly trick but a useful one:

: > /proc/<pid>/fd/<fd>

That empties the open file without the process noticing anything other than its log starting again from zero. It is a fire extinguisher, not a fix.

The actual fix: copytruncate

When an application will not reopen its log on signal, you stop fighting it and change how logrotate works. The copytruncate directive tells logrotate to copy the current log to the rotated name and then truncate the original in place, rather than renaming it. The file descriptor the app holds stays valid because the file is never renamed or deleted, only emptied.

/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}

A close-up of a server rack with status lights

It is not free. There is a small window between the copy and the truncate where any line written lands in neither file and is lost, so for logs where every line is sacred this is the wrong tool and you should make the app behave instead. For application logs where losing the odd line during the nightly rotate is survivable, it is exactly right, and it sidesteps the whole signal-handling argument.

The proper fix, of course, is an application that reopens its log on SIGHUP, or better still one that logs to stdout and lets the init system or the container runtime handle rotation entirely, so the app never owns a log file in the first place. That is where I have ended up on anything I control. But when you are handed a vendor binary that does what it does and will not be argued with, copytruncate is the pragmatic peace treaty.

The lesson I keep relearning is the one about du and df. When they disagree, stop staring at the file listing, because the thing eating your disk is precisely the thing that is not in the listing. lsof +L1 and the word deleted, every time.