Ramblings of an aging IT geek
linux

when logrotate quietly stops working because the app won't reopen

An app that held its log file open after rotation filled a disk with deleted-but-open data. Here is how a proper reopen, or failing that copytruncate, fixes it.


The disk filled up on a box where du swore blind there was plenty of space. That contradiction is one of my favourite Unix tells, and it points almost every time at the same thing: a file that has been deleted but is still held open by a running process. The space cannot be reclaimed until the process lets go.

The culprit here was logrotate doing exactly what it was told, and an application refusing to play along. logrotate had renamed app.log to app.log.1, gzipped the older ones, and removed the oldest. But the app had the original inode open and kept writing to it. As far as the kernel was concerned, that data was still live, just nameless. df saw a full disk; du saw nothing to account for it.
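If you want to watch the mechanism in isolation, here is a minimal sketch that plays both parts in a temp directory. It assumes Linux with /proc mounted and GNU coreutils; the function name demo_ghost_log and the choice of file descriptor 9 are mine, purely for illustration.

```shell
# Sketch: a writer that opens its log once, and a "rotation" that renames
# and later deletes the file underneath it. Runs in a subshell so fd 9
# and the variables do not leak into the calling shell.
demo_ghost_log() (
    dir=$(mktemp -d) || exit 1
    exec 9>>"$dir/app.log"               # the "app" opens its log once
    echo "before rotation" >&9

    mv "$dir/app.log" "$dir/app.log.1"   # rotation renames the file...
    echo "after rotation" >&9            # ...but the fd follows the inode
    echo "lines in app.log.1: $(wc -l < "$dir/app.log.1")"

    rm "$dir/app.log.1"                  # later, the oldest copy is removed
    echo "after delete" >&9              # the write still succeeds
    echo "names left in dir: $(ls -A "$dir" | wc -l)"
    echo "lines still held open: $(wc -l < /proc/self/fd/9)"

    exec 9>&-                            # only closing the fd frees the space
    rmdir "$dir"
)

demo_ghost_log
```

The second write lands in app.log.1, not in a fresh app.log, and the third lands in a file with no name at all. That last state is exactly what du cannot see.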

Confirming it

lsof settles the argument instantly:

lsof +L1   # files with a link count below 1: open but unlinked
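On a box where lsof is not installed, roughly the same answer can be pulled out of /proc. This is a sketch rather than a replacement: it assumes Linux, GNU stat and readlink, and enough privilege to read other processes' fd directories (in practice, root). The function name list_deleted_open is hypothetical.

```shell
# Sketch: print "PID FD SIZE_BYTES TARGET" for every open file whose
# /proc link target ends in " (deleted)". Runs in a subshell so the loop
# variables do not leak. Forks per fd, so it is slow on busy machines.
list_deleted_open() (
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            /*" (deleted)") ;;           # unlinked but still open
            *) continue ;;
        esac
        # -L follows the magic link to the inode itself, deleted or not
        size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
        pid=${fd#/proc/}
        pid=${pid%%/*}
        echo "$pid ${fd##*/} $size $target"
    done
)
```

Piping the output through sort -k3,3nr puts the biggest offenders first. Expect some harmless noise as well, such as deleted shared-memory segments and memfd entries.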

There it was: the app's process holding a deleted log file that had grown to several gigabytes. The fix at 2am was the blunt one. I restarted the service, the file handle closed, the space came back, and the disk breathed out.
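When a restart is not an option, there is a less blunt trick that I did not use that night: truncate the deleted file through the process's /proc entry. The sketch below assumes Linux, that you run it as root or as the owning user, and that you already have the PID and fd number from lsof. The function name truncate_open_fd is hypothetical.

```shell
# Sketch: empty a deleted-but-open file in place, without touching the
# process that holds it. Refuses to act on anything that is not marked
# "(deleted)" so a typo cannot wipe a live file.
truncate_open_fd() {
    t_pid=$1
    t_fd=$2
    t_link=$(readlink "/proc/$t_pid/fd/$t_fd") || return 1
    case $t_link in
        *" (deleted)")
            : > "/proc/$t_pid/fd/$t_fd"  # opening with truncation empties the inode
            ;;
        *)
            echo "refusing: $t_link is not a deleted file" >&2
            return 1
            ;;
    esac
}
```

Usage would look like truncate_open_fd 12345 7, with both numbers taken from the lsof output. The blocks are released straight away, but whatever was in the file is gone, and a writer that did not open the log in append mode will keep writing at its old offset and leave a sparse file behind.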


The real fix

The proper fix is to make rotation and the application agree on what is happening. There are two honest ways to do it.

The clean way is to have logrotate signal the app to reopen its log after rotation, usually with a postrotate script that sends SIGHUP or SIGUSR1:

/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    # run postrotate once per rotation, not once per matched log file
    sharedscripts
    postrotate
        systemctl kill -s HUP myapp.service
    endscript
}

That only works if the application actually listens for the signal and reopens its files. This one did not. It logged straight to a file descriptor it opened once at startup and never reconsidered, and it treated SIGHUP as "terminate". Sending it the signal would not reopen the log; it would just kill the process.
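Before pointing SIGHUP at anything in production, it is worth asking the kernel whether the process has a handler installed at all. A rough sketch, assuming Linux and a shell with 64-bit arithmetic such as bash: the SigCgt line in /proc/PID/status is a hex bitmask of caught signals, where bit n-1 stands for signal n. The function name catches_signal is hypothetical.

```shell
# Sketch: succeed if process $1 has a handler installed for signal
# number $2 (1 is SIGHUP on Linux). Returns non-zero if it has none,
# or if the status file cannot be read.
catches_signal() {
    while read -r key value; do
        if [ "$key" = "SigCgt:" ]; then
            # value is a hex mask; test bit (signal number - 1)
            [ $(( (0x$value >> ($2 - 1)) & 1 )) -eq 1 ]
            return $?
        fi
    done < "/proc/$1/status"
    return 2
}
```

For a systemd service, something like catches_signal "$(systemctl show -p MainPID --value myapp.service)" 1 would do. A "yes" only proves that a handler exists, not that it reopens logs, so check the app's documentation too. A "no" means the signal is either ignored or left at its default action, and the default for SIGHUP is to terminate the process.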

For an app that genuinely cannot reopen on demand, the pragmatic answer is copytruncate:

/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    copytruncate
}

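Instead of renaming the file out from under the app, logrotate copies the current contents aside and then truncates the original in place. The app keeps writing to the same inode, which is now empty, and nobody holds a deleted ghost. The cost is a small race window where lines written between the copy and the truncate are lost, and a brief doubling of disk use, but for a chatty app that ignores signals it is the lesser evil. One caveat: this relies on the app having opened its log in append mode. Without O_APPEND, the app's next write lands at its old offset, and the truncated file springs back to its former apparent size as a sparse file padded with NULs.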
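The copy-then-truncate sequence is easy to try by hand. The sketch below is my approximation of what copytruncate does, not logrotate's actual code. It runs two pretend writers, one that opened its log in append mode and one that did not, to show why the open mode matters. It assumes Linux with GNU coreutils; the function and file names are made up.

```shell
# Sketch: copy aside, truncate in place, and see where the next write lands.
demo_copytruncate() (
    dir=$(mktemp -d) || exit 1
    exec 9>>"$dir/append.log"            # writer A: append mode (O_APPEND)
    exec 8>"$dir/noappend.log"           # writer B: plain write, own offset
    for fd in 8 9; do
        echo "line 1" >&"$fd"
        echo "line 2" >&"$fd"
    done

    for f in append noappend; do
        cp "$dir/$f.log" "$dir/$f.log.1" # 1. copy the contents aside
        # anything written between these two steps is lost: the race window
        : > "$dir/$f.log"                # 2. truncate in place, same inode
    done

    echo "line 3" >&9                    # A restarts at offset 0
    echo "line 3" >&8                    # B resumes at its old offset, 14
    echo "rotated copy: $(wc -l < "$dir/append.log.1") lines"
    echo "append writer: $(wc -l < "$dir/append.log") lines, $(wc -c < "$dir/append.log") bytes"
    echo "non-append writer: $(wc -c < "$dir/noappend.log") bytes"

    exec 9>&- 8>&-
    rm -r "$dir"
)

demo_copytruncate
```

The append-mode log comes back as a single 7-byte line. The other one reports 21 bytes, because the new line sits behind a 14-byte hole where the old content used to be.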

The takeaway

Rotation is not just moving files around; it is a contract between the rotator and the writer about who owns the file descriptor. If the app reopens on signal, signal it. If it stubbornly will not, copytruncate and accept the small race. The failure mode of getting this wrong is invisible until a disk fills with data that nothing will admit exists.