Ramblings of an aging IT geek
← Ramblings of an aging IT geek
linux

when logrotate and a stubborn daemon disagree

A service that kept writing to a deleted log file after rotation, why SIGHUP did nothing, and the three ways out of it depending on what the app supports.

A terminal showing log files and disk usage

The symptom was a disk filling up that, according to ls -lh, had no business filling up. The log file was a sensible size. df said the partition was nearly full. The two disagreed, and when ls and df disagree the answer is almost always a deleted file that something is still holding open.

This is the oldest log-rotation footgun there is, and it caught me again on a service I'd inherited, so here's the whole thing written down so the next person (probably me, in a year) doesn't have to rediscover it.

what was actually happening

logrotate's default mode renames the live log out of the way and creates a fresh one. So app.log becomes app.log.1, and a new empty app.log appears. The problem: the running process opened app.log by file descriptor at startup and it's still writing to that descriptor. The rename doesn't change where the fd points. The process happily keeps writing to the file now called app.log.1, the new app.log stays empty, and your shiny rotation has achieved nothing except confusing you.

The fix in the rotation world is the postrotate hook, which is meant to poke the app and say "reopen your files please". Conventionally that poke is SIGHUP. Here's the config that's supposed to do it:

/var/log/myapp/app.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        kill -HUP $(cat /var/run/myapp.pid) 2>/dev/null || true
    endscript
}

That config is fine. It just didn't work, because this particular daemon doesn't handle SIGHUP. The default disposition of SIGHUP is to terminate the process, but this app installs a handler that does nothing useful with it, or rather a handler that logs "received signal" and carries on writing to the same fd. So the signal arrived, the app shrugged, and the deleted file kept growing.

You can confirm the held-open deleted file directly, which is worth doing before you start changing things:

# lsof +L1
COMMAND  PID  USER   FD   TYPE  ...  NLINK  NAME
myapp   4123  myapp  3w   REG   ...      0  /var/log/myapp/app.log.1 (deleted)

NLINK of 0 and the (deleted) tag is the smoking gun. The space won't come back until that fd is closed, which is why a process that's been running for weeks can be sitting on gigabytes that du can't see.

A diagram-style terminal showing lsof output of a deleted held-open file

the three ways out

Which one you pick depends entirely on what the app supports, so figure that out first.

One: it has a real reopen mechanism, just not on SIGHUP. Some apps reopen logs on SIGUSR1 instead, or have a myapp reload subcommand, or expect a touch of a control socket. Read the manual. If there's a documented signal or command, use it in postrotate and you're done. This is the right answer when you can get it, because nothing restarts and no logs are lost.

Two: it can't reopen at all, so use copytruncate. logrotate has a mode that copies the current log aside and then truncates the original in place. Because it truncates the same inode the process is writing to, there's no rename and no held-open deleted file, the fd just carries on at offset zero.

/var/log/myapp/app.log {
    daily
    rotate 14
    compress
    copytruncate
    missingok
    notifempty
}

The catch, and it's a real one: there's a window between the copy finishing and the truncate happening where new lines can be written, and those lines get thrown away by the truncate. Under heavy logging you will lose a few lines per rotation. For most apps that's an acceptable price for not having to restart anything. For an audit log it is not, and you should solve the reopen problem properly instead.

Three: bypass the app entirely and log to a pipe. If the app writes to stdout and you're running it under a supervisor, the cleaner long-term answer is to stop having the app own a file at all. Pipe it into something that handles rotation on the read side, multilog from daemontools, or these days letting the journal or your init system's logging take it. Then "rotation" is the logger's problem and the app never holds a file open in the first place. This is more work to set up but it's the version that stops biting you.

what I actually did

This app understood SIGUSR1, it just didn't say so anywhere obvious. I found it by reading the source, because the README was no help and the man page didn't exist. Swapped the postrotate to send USR1 instead of HUP, ran a forced rotation with logrotate -f, checked lsof +L1 came back empty, and watched df and ls agree again for the first time in a week.

The lesson I keep relearning: SIGHUP is a convention, not a contract. Plenty of software predates the convention or ignores it on purpose, and logrotate sending a signal into the void looks exactly like success right up until your disk fills. So when you wire up rotation for anything you didn't write, don't trust that the postrotate worked because logrotate exited zero. Force a rotation, then go and look at lsof and df with your own eyes. The exit code only tells you logrotate ran. It tells you nothing about whether the app listened.