the runc breakout and the joy of a shared kernel

This week's disclosure is the runc breakout, CVE-2019-5736, and unlike most container security scares that turn out to be theatre, this one made me stop and patch the same day. A malicious container, or an otherwise benign one started from an image you never vetted, can overwrite the host's runc binary and from there execute code on the host as root. That is the bad one. That is the breakout people have been hand-waving about for years, finally written down with a CVE number attached.

The mechanism is grimly clever. When you exec into a running container, the host's runc enters the container's namespaces. The exploit gets runc to open itself via /proc/self/exe, then overwrites that binary from inside the container. Next time anyone runs runc on the host, they run the attacker's payload as root. Your container escaped not by breaking a wall but by editing the door.
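
If you've never poked at /proc/self/exe, here is a harmless sketch of the one property the trick leans on: on Linux, that path is a magic symlink to the binary backing whichever process reads it. This is not the exploit, just the primitive. The function name is my own invention, and I'm assuming a Linux host with /proc mounted.

```python
import os
from typing import Optional


def running_binary_path() -> Optional[str]:
    """Return the on-disk path of the binary backing this process.

    Returns None when /proc/self/exe is unavailable (for example on a
    non-Linux system, or where /proc is not mounted).
    """
    try:
        # The kernel resolves this symlink per process: whoever reads it
        # gets a reference to their own executable.
        return os.readlink("/proc/self/exe")
    except OSError:
        return None


if __name__ == "__main__":
    # Run under Python, this prints the interpreter's path. When runc is
    # the process doing the reading, the path refers to runc itself.
    print(running_binary_path())
```

That self-reference is what the attack abuses: once runc is inside the container's namespaces, the thing on the other end of that link is the host's runc.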

What makes this land, for me, is that it punctures the comfortable lie a lot of people tell themselves about containers. A container is not a VM. It is a set of namespaces and cgroups over a shared kernel, with a thin runtime gluing it together. Most of the time that's a fine trade, lighter and faster than a VM, and I run plenty. But "isolation" was always doing more rhetorical work than the technology could back up. When the runtime that enforces the boundary has a hole, the boundary is gone, because there was only ever the one kernel underneath.

The good news is the response has been quick and grown-up. The runc maintainers had a fix out fast, and the distros and the big registries moved within days. If you run containers anywhere, the action is dull and urgent in equal measure: update runc (or containerd/Docker, whatever pulls it in) to the patched version, today, on every host. There's no clever mitigation that beats just patching it.
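
For checking a fleet, something like this rough sketch is what I'd start from. It parses a version string and compares it to a minimum. The function names are mine, and the default threshold of 18.09.2 is my understanding of which Docker release carried the fix; treat it as an assumption and take the real number from your distro's or vendor's advisory, since backported fixes on older release lines will not pass a naive comparison like this one.

```python
import re
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Pull the first x.y.z triple out of a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_at_least(version_text: str, minimum: Version = (18, 9, 2)) -> bool:
    """True if the reported version is >= minimum.

    The default minimum is an assumption (see the note above); pass the
    value from your own vendor's advisory.
    """
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    for reported in ("18.09.1", "18.09.2", "18.06.2-ce"):
        print(reported, is_at_least(reported))
```

Note the last case: an older release line with a backported fix still reports as below the threshold, which is exactly why the minimum is a parameter and not something to hard-code and forget.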

For what it's worth, here's what I changed beyond patching. I'd been lazy about running containers as root, and this was the nudge to stop. User namespaces, so the container's root maps to an unprivileged host user, would have blunted this. Dropping capabilities I never needed. And treating docker run on an untrusted image with the same suspicion I'd give running its entrypoint as root on the host directly, because as of this week that's demonstrably what it can amount to.
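
To stop myself backsliding, I'd wrap the habits above in a small helper that builds the docker run command line. A minimal sketch: the helper name, the default uid/gid of 1000 and the image name in the example are all hypothetical. The flags themselves (--user, --cap-drop, --cap-add, --security-opt no-new-privileges) are standard docker run options.

```python
from typing import List, Sequence


def hardened_run_args(image: str, uid: int = 1000, gid: int = 1000,
                      keep_caps: Sequence[str] = ()) -> List[str]:
    """Build a `docker run` argv that avoids root and drops capabilities."""
    if uid == 0:
        raise ValueError("refusing to build a command that runs as root")
    args = [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",               # non-root inside the container
        "--cap-drop", "ALL",                    # start from zero capabilities
        "--security-opt", "no-new-privileges",  # block setuid escalation
    ]
    for cap in keep_caps:
        # Add back only the capabilities the workload really needs.
        args += ["--cap-add", cap]
    args.append(image)
    return args


if __name__ == "__main__":
    # "example/untrusted:latest" is a placeholder image name.
    print(" ".join(hardened_run_args("example/untrusted:latest")))
```

User namespace remapping isn't in here because in Docker it is a daemon-level setting (userns-remap in the daemon configuration), not a per-container flag. None of this replaces patching; it just narrows what the next runtime bug can reach.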

None of this is panic. Containers are still good and I'm still using them tomorrow. But it's a useful, specific reminder that the kernel is shared, the runtime is part of your trust boundary, and "it's in a container" is the start of a security argument, not the end of one. Patch your hosts.