Ramblings of an aging IT geek
linux

the unit that would not die

A stopped systemd service that kept coming back to life, and the socket activation and Restart settings that explain why.

I stopped a service. It came back. I stopped it again, more firmly this time, with the quiet conviction of a man who knows how systemctl stop works. It came back again. At this point I did what any reasonable engineer does, which is to take it personally.

The service was a small internal API I needed offline for ten minutes whilst I migrated its database. systemctl stop myapi returned cleanly. systemctl status myapi showed it inactive. I started the migration. Thirty seconds later the logs lit up with connection errors, because the thing was running again, serving traffic against the half-migrated database I had specifically taken it down to avoid.

My first assumption was the obvious one: something restarted it. A cron job, a config-management run, a colleague with a nervous trigger finger. So I checked. No Puppet run in the window, no cron entries, nobody else logged in. The journal told the real story, and it was more interesting than human error.

The clue was in journalctl -u myapi: the service did not get restarted by anything external. It got started by its own socket. I had, some months earlier and very pleased with myself, set up socket activation. There was a myapi.socket unit listening on the port, and the moment my migration tooling tried to connect to check whether the old service was really gone, systemd dutifully noticed the incoming connection and spun the service straight back up to handle it. I was resurrecting it myself, with my own health check.

This is socket activation working exactly as designed, which is the most annoying kind of bug. The service unit was stopped. The socket unit was not, and the socket is what was actually keeping the lights on. Stopping myapi.service whilst leaving myapi.socket active is like turning off a motion-sensor light and then walking around in front of it.
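
A quick way to catch this before starting work is to ask systemd whether the socket half is still up. The following is a minimal sketch, not what I ran at the time: the helper name socket_still_listening is made up for illustration, and it assumes the socket unit shares the service's base name, as myapi.socket and myapi.service do.

```shell
# Hypothetical helper: succeeds when NAME.socket is still active, meaning the
# next incoming connection would start NAME.service again.
socket_still_listening() {
    # `is-active --quiet` prints nothing and exits 0 only for an active unit.
    systemctl is-active --quiet "$1.socket"
}

# Example:
#   socket_still_listening myapi && echo "myapi.socket can still wake the service"
```

If that check succeeds after a systemctl stop myapi.service, the service is down but not out.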

The fix was to stop the socket as well:

systemctl stop myapi.socket myapi.service

With the socket stopped for the duration, nothing answered the port, nothing triggered activation, and the migration ran in blessed silence. Afterwards I started both again and it behaved like a model citizen.
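
The stop, do the work, start again sequence is easy to wrap so the restart cannot be forgotten. This is a sketch under assumptions rather than the script I used: with_unit_down is a hypothetical name, the unit pair is assumed to be NAME.socket and NAME.service, and the command to run in the middle is whatever your maintenance step is.

```shell
# Hypothetical wrapper: run a command while NAME.socket and NAME.service are
# both stopped, then start them again whether or not the command succeeded.
with_unit_down() {
    name="$1"
    shift
    # Stop the socket together with the service so nothing can re-activate it.
    systemctl stop "$name.socket" "$name.service" || return 1
    rc=0
    # Run the maintenance command, remembering its exit status.
    "$@" || rc=$?
    # Bring both units back, as in the post.
    systemctl start "$name.socket" "$name.service" || return 1
    return "$rc"
}

# Example (the migration script name is a placeholder):
#   with_unit_down myapi ./migrate-database.sh
```

The wrapper returns the maintenance command's exit status, so a failed migration still shows up as a failure after the units are back.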

There was a second trap waiting that I want to flag, because I nearly fell into it. The unit also had Restart=always set. That setting does not fight systemctl stop: systemd never auto-restarts a service that it was itself asked to stop. But if I had reached for kill or pkill instead, which on a bad day I might have, systemd would have seen the process die on someone else's orders and brought it straight back after RestartSec. For genuine maintenance windows the honest tool is:

systemctl stop myapi.socket myapi.service
# and if it is truly stubborn, prevent any path back up:
systemctl mask myapi.service myapi.socket
# when the maintenance window is over:
systemctl unmask myapi.service myapi.socket

mask symlinks the unit file to /dev/null so nothing, not a socket, not a restart policy, not a dependency, can start it until you unmask. It does not stop a unit that is already running, which is why the stop comes first. It is the difference between asking a service to stop and removing its ability to exist.
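
A masked unit is easy to forget about, so it is worth being able to check for one and to undo it in a single step. The sketch below is illustrative only: is_masked and end_maintenance are hypothetical helper names, and it relies on systemctl is-enabled printing "masked" (or "masked-runtime") for a masked unit.

```shell
# Hypothetical helper: succeeds when the given unit is currently masked.
is_masked() {
    # is-enabled exits non-zero for masked units but still prints the state.
    case "$(systemctl is-enabled "$1" 2>/dev/null)" in
        masked|masked-runtime) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: undo the mask on NAME.service and NAME.socket, then
# start both again.
end_maintenance() {
    systemctl unmask "$1.service" "$1.socket" &&
        systemctl start "$1.socket" "$1.service"
}

# Example:
#   is_masked myapi.service && end_maintenance myapi
```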

The broader lesson is that under systemd a "service" is rarely one unit. It is a small constellation: the service, maybe a socket, maybe a timer, maybe a path watcher, all capable of pulling it back into being. When something refuses to stay dead, do not assume a person did it. Read the TriggeredBy: line that recent versions of systemctl status print for the service, run systemctl list-dependencies --reverse for anything that pulls it in, and ask which other unit is quietly holding the door open. In my case the culprit was a feature I had been so proud of installing that I had completely forgotten it existed.
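
For scripting that question, systemd exposes the same information as a unit property. A minimal sketch, assuming a systemd recent enough to have the TriggeredBy property and the --value flag of systemctl show; triggers_of is a hypothetical helper name.

```shell
# Hypothetical helper: print every unit that can activate the given unit
# (sockets, timers, path units), one per line. Prints nothing if there are none.
triggers_of() {
    # TriggeredBy is a space-separated list; split it and drop empty lines.
    systemctl show --property=TriggeredBy --value "$1" | tr ' ' '\n' | sed '/^$/d'
}

# Example:
#   triggers_of myapi.service
```

Anything it prints needs stopping (or masking) alongside the service before the service can be trusted to stay down.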