Ramblings of an aging IT geek
linux

the service that would not die

A unit I stopped kept coming back from the dead, and the reason was a socket I had forgotten existed.

I stopped a service. It came back. I stopped it again, properly this time, and it came back again. By the third round I had stopped feeling clever and started feeling watched.

The unit was a small internal API. I wanted it down for ten minutes to swap a config out from under it. So I ran the obvious thing:

systemctl stop my-api.service

systemctl status confirmed it was inactive. Thirty seconds later it was active again, with a fresh PID and a start time that made no sense. Nothing in my history had started it. No cron, no Ansible run, no colleague being helpful in the background.

What I checked, in order

First suspicion is always restart policy. But Restart=on-failure does not resurrect a unit you stopped with systemctl stop, and neither does Restart=always; systemd skips its restart logic entirely when the stop was its own doing. I checked anyway, and the policy was fine. A stopped unit staying stopped is the whole contract.
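Checking the policy is a one-liner. A minimal sketch, assuming a systemd recent enough to support --value on systemctl show; the function name restart_policy is mine, purely for illustration:

```shell
# Hypothetical helper: print the Restart= setting systemd has loaded for a unit.
restart_policy() {
    systemctl show --property=Restart --value "$1"
}

# Usage (not run here): restart_policy my-api.service
```

It reports what systemd actually loaded, drop-ins included, which is not always what the unit file on disk says.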

Then I went looking for who actually asked for it:

systemctl list-dependencies --reverse my-api.service
journalctl -u my-api.service --since "5 minutes ago"

The journal was the giveaway. Every restart was preceded by a line about an incoming connection on a socket. That was the moment it clicked.

The socket I forgot I wrote

Months earlier I had set this thing up with socket activation. There was a my-api.socket unit sitting alongside the service, listening on the port, and its entire job was to start the service on demand whenever a connection arrived. I had completely forgotten it existed.
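For anyone who has not written one, a socket unit is only a few lines. What follows is a hedged reconstruction, not my actual file: port 8080, the description text and the scratch directory are all placeholders, and the snippet only writes the sketch to a temporary path so it can be read without touching /etc/systemd/system.

```shell
# Write an illustrative my-api.socket to a scratch directory.
# Port 8080 and the Description text are assumptions, not the real values.
unit_dir=$(mktemp -d)

cat > "$unit_dir/my-api.socket" <<'EOF'
[Unit]
Description=Listening socket for my-api (illustrative)

[Socket]
# systemd holds this port open and starts my-api.service when a connection arrives.
ListenStream=8080

[Install]
WantedBy=sockets.target
EOF

cat "$unit_dir/my-api.socket"
```

Because the socket and the service share a base name, systemd pairs them without being told; a Service= line is only needed when the names differ. That silent pairing is part of why it is so easy to forget.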

So the sequence was: I stop the service, the socket is still listening, a health check or a stray client connects, and systemd dutifully does exactly what I told it to months ago. It starts the service to handle the connection. From my side it looked like the unit refused to die. From systemd's side it was being a model citizen.

The fix was to stop the socket as well, not just the service:

systemctl stop my-api.socket my-api.service

With the socket down, nothing was listening, nothing triggered activation, and the service finally stayed in its grave long enough for me to do the work. I brought both back together when I was done.
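If I had to do this regularly, I would wrap the maintenance window so the pair always goes down and comes back together. A sketch only: with_unit_down is a made-up helper name, and it assumes the socket and the service share a base name, as mine did.

```shell
# Hypothetical helper: stop a socket-activated service together with its socket,
# run a command, then start both again even if the command failed.
with_unit_down() {
    local name=$1
    shift
    # Take the listener down along with the service so nothing can re-trigger it.
    systemctl stop "$name.socket" "$name.service" || return
    "$@"
    local rc=$?
    # Bring both back as a pair, then report the command's own exit status.
    systemctl start "$name.socket" "$name.service"
    return "$rc"
}

# Usage (not run here; the cp paths are placeholders):
#   with_unit_down my-api cp new.conf /etc/my-api/config
```

Returning the wrapped command's exit status means a failed config swap is still visible to the caller after the service is back up.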

The lesson

Socket activation is genuinely good. It saves resources, it handles startup ordering for you, and it means a service only runs when something needs it. But it also quietly decouples "is this running" from "will this start", and if you forget the socket exists you will swear the machine is haunted.

Now when a unit will not stay dead, the socket is the first place I look, not the last. Check for a matching .socket, check for a .path unit watching a file, check for a .timer. Something is allowed to wake your service up, and it is almost always something you set up yourself and forgot.
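That checklist mostly collapses into one query, because systemd records which units may activate another in its TriggeredBy property. A sketch under the assumption that your systemd exposes that property through systemctl show; who_wakes is a name I made up:

```shell
# Hypothetical helper: list the units allowed to activate a given unit,
# one per line, with each trigger's current state next to it.
who_wakes() {
    local trigger
    # TriggeredBy is a space-separated list, so word splitting is intentional here.
    for trigger in $(systemctl show --property=TriggeredBy --value "$1"); do
        printf '%s\t%s\n' "$trigger" "$(systemctl is-active "$trigger")"
    done
}

# Usage (not run here): who_wakes my-api.service
```

Empty output means no socket, path or timer unit is registered as a trigger for that unit, and it is time to look elsewhere.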