getting rootless containers working without losing my mind


Rootless containers are one of those features that sound trivial in the announcement and then quietly consume an afternoon when you actually try to run something non-trivial. I have been moving my containers to run as an unprivileged user, partly for the security posture and partly because I am tired of one compromised image having a clear shot at root on the host. It is genuinely good once it works. Getting there involved a few snags that are worth writing down, mostly so future-me does not have to rediscover them at eleven at night.

This is Podman, not Docker, because rootless is where Podman has always been most at home: no daemon, containers running directly under your user, and a fork-exec model that maps cleanly onto an unprivileged process. Most of what follows applies to rootless Docker too, but the daemon adds wrinkles I was happy to avoid.

the first wall: subuid and subgid

A rootless container needs a range of subordinate user and group IDs to map the container's root (UID 0 inside) to some unprivileged range outside. If those ranges are not configured for your user, you get errors that are technically accurate and completely unhelpful, along the lines of "there might not be enough IDs available in the namespace".

The fix is two files, /etc/subuid and /etc/subgid, each of which needs an entry for your user in the form name:first-ID:count, with a decent range:

john:100000:65536

That gives my user 65,536 subordinate IDs starting at 100000, which is enough to map a container's full UID range. On a fresh distribution install this is often present already; on a host that has been upgraded across several releases, or had its user created by a config-management tool, it frequently is not. After editing, podman system migrate makes Podman pick up the change cleanly.
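This is roughly the check I would script before blaming Podman. It is a minimal sketch: has_subid_range is a name I made up, not a Podman or shadow-utils command, and the 65536 default is simply the range size used above.

```shell
# has_subid_range USER FILE [MIN_COUNT]
# Hypothetical helper. Succeeds if FILE (in /etc/subuid format,
# name:first-ID:count) has an entry for USER with at least MIN_COUNT IDs.
has_subid_range() (
    user=$1 file=$2 min=${3:-65536}
    [ -r "$file" ] || exit 1
    awk -F: -v u="$user" -v min="$min" '
        $1 == u && $3 + 0 >= min + 0 { found = 1 }
        END { exit !found }
    ' "$file"
)

# Example:
#   for f in /etc/subuid /etc/subgid; do
#       has_subid_range "$(id -un)" "$f" || echo "no usable range in $f"
#   done
```

If an entry is missing, usermod --add-subuids 100000-165535 --add-subgids 100000-165535 john (as root) should add the same range without hand-editing the files, assuming your shadow-utils is recent enough to have those flags.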

the second wall: cgroups v2 and systemd

The bigger snag, and the one that actually ate the afternoon, was resource limits. Setting a memory or CPU limit on a rootless container quietly did nothing, or failed outright, depending on the host.

The reason is cgroups. Rootless resource control needs cgroups v2 with systemd as the cgroup manager, and crucially it needs the relevant controllers delegated to your user's systemd instance. Out of the box, an unprivileged user often cannot manage the cpu or memory controllers, because controllers have to be explicitly delegated to non-root users and the default set is often incomplete. So podman run --memory=512m either errors or silently ignores you.

The delegation lives in a systemd drop-in for the per-user manager service:

# /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids

After a systemctl daemon-reload as root and logging the user out and back in, the controllers are available and limits start being honoured. You can confirm the host is capable at all by running podman info and checking that the cgroup version reads v2 and the cgroup manager reads systemd. If you are on a host still running cgroups v1, rootless resource limits are essentially a non-starter, and the honest answer is to get the host onto v2 first rather than fighting it.
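To see whether the delegation actually took, you can read the controllers file for your own user manager instead of guessing. The following is a sketch, not anything Podman ships: missing_controllers and user_controllers_file are names I invented, and the path assumes the usual systemd layout under /sys/fs/cgroup.

```shell
# missing_controllers FILE CONTROLLER...
# Hypothetical helper. Prints every requested controller that is absent from
# a cgroup.controllers file (one space-separated line) and succeeds only if
# none are missing.
missing_controllers() (
    file=$1; shift
    available=" $(cat "$file" 2>/dev/null) "
    status=0
    for c in "$@"; do
        case $available in
            *" $c "*) ;;                   # exact match, so cpu != cpuset
            *) echo "$c"; status=1 ;;
        esac
    done
    exit "$status"
)

# Path of the controllers file for the calling user's systemd manager
# (assumes the standard user.slice layout on a cgroups v2 host).
user_controllers_file() (
    uid=$(id -u)
    echo "/sys/fs/cgroup/user.slice/user-$uid.slice/user@$uid.service/cgroup.controllers"
)

# Example:
#   stat -fc %T /sys/fs/cgroup     # "cgroup2fs" on a cgroups v2 host
#   missing_controllers "$(user_controllers_file)" cpu cpuset io memory pids
```

As an end-to-end check, podman run --rm --memory=512m docker.io/library/alpine cat /sys/fs/cgroup/memory.max should report 536870912 (512 MiB in bytes) rather than max if the limit took effect. The alpine image is just my assumption; any small image with cat would do.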


the third wall: ports and lingering

Two smaller things that bite everyone exactly once.

First, privileged ports. A rootless container cannot bind below 1024 by default, because binding low ports is a privileged operation and the whole point here is that you are not privileged. So you do not bind your web service to 443 directly. You either lower the unprivileged-port threshold via net.ipv4.ip_unprivileged_port_start, or, far more sensibly, you bind to a high port and let a reverse proxy on the host own 80 and 443. I went with the proxy, because punching a sysctl hole to give rootless processes low ports rather undermines the reason you went rootless.
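The rule itself is small enough to write down. A sketch, assuming the threshold lives where the sysctl exposes it under /proc; needs_privileged_bind is a made-up helper name, and the image in the example is hypothetical:

```shell
# needs_privileged_bind PORT [THRESHOLD]
# Hypothetical helper. Succeeds if binding PORT would need privilege, i.e.
# PORT is below the unprivileged-port threshold. THRESHOLD defaults to the
# live sysctl value, falling back to 1024 if it cannot be read.
needs_privileged_bind() (
    port=$1
    start=${2:-$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)}
    [ "$port" -lt "$start" ]
)

# Example: publish the container's 443 on host port 8443 and let the host
# proxy own the real 443 (registry.example/myapp is a placeholder image):
#   needs_privileged_bind 8443 || \
#       podman run -d --name myapp -p 8443:443 registry.example/myapp:latest
```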

Second, lingering. By default a user's systemd session, and anything running under it, is torn down when that user logs out. If you want your rootless containers to survive logout and start at boot, you must enable lingering for the user:

loginctl enable-linger john

Without this, you do the work to set up a tidy rootless service, reboot to test it, and find nothing came up, because the user manager never started. It is a one-liner, but it is a one-liner that is very easy not to know exists.
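Checking for it before the reboot is cheap. As far as I can tell, systemd-logind records lingering as an empty file named after the user under /var/lib/systemd/linger, so a sketch of the check looks like this (linger_enabled is my own helper name, and the directory argument exists only so it can be pointed elsewhere):

```shell
# linger_enabled USER [DIR]
# Hypothetical helper. Succeeds if a linger marker file exists for USER.
# DIR defaults to the location systemd-logind is assumed to use.
linger_enabled() (
    user=$1 dir=${2:-/var/lib/systemd/linger}
    [ -e "$dir/$user" ]
)

# Example:
#   linger_enabled john || echo "run: loginctl enable-linger john"
```

loginctl show-user john --property=Linger gives the same answer as Linger=yes or Linger=no, though in my experience it can fail for a user who is neither logged in nor lingering.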

making it stick with systemd units


The piece that turned this from "a clever demo" into "my actual setup" was generating proper systemd units for the containers, so they are managed like any other service rather than run by hand:

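# The container "myapp" must already exist so its settings can be read.
mkdir -p ~/.config/systemd/user   # the redirect below fails if this is missing
podman generate systemd --new --name myapp \
  > ~/.config/systemd/user/myapp.service
systemctl --user enable --now myapp.service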

--new makes the unit recreate the container on start rather than depending on a pre-existing one, which is what you want for something reproducible. Combined with lingering, the container now starts at boot under my user, with its limits honoured, behind a host proxy, and with no daemon running as root anywhere in the picture.

was it worth it

Yes, comfortably. The security argument is real: a container breakout lands you on an unprivileged user with a remapped UID space, not on host root. But the part I did not expect to value as much is that the whole thing is mine. No system daemon, no socket owned by root, no group membership that quietly grants root-equivalent access. My containers, in my session, with my permissions.

The cost is the afternoon above. None of these snags are hard once you know them; they are just invisible until you hit them, and the error messages rarely point at the real cause. If you take three things away: sort out subuid/subgid first, delegate the cgroup controllers second, and remember enable-linger before you reboot and wonder why everything is dead. Get those right and rootless stops fighting you and simply works, which is exactly how it should have felt from the start.