Ramblings of an aging IT geek

the xz backdoor, and the volunteer who happened to be paying attention

Reacting to the xz-utils backdoor disclosed in late March, and what a near-miss supply-chain attack says about how we maintain the software everything depends on.

For the better part of two weeks my feeds have been about one thing, and for once it deserves the attention. On 29 March a Microsoft engineer, Andres Freund, posted to the oss-security list that he had found a backdoor, now tracked as CVE-2024-3094, in xz-utils, the compression library. He found it because SSH logins on a test machine were taking about half a second longer than they should, and he had the curiosity and the time to chase down why. That is the entire reason we are not living through a much worse week.

Let me say plainly how bad this could have been. The backdoor shipped in the upstream release tarballs of xz 5.6.0 and 5.6.1 and lived in liblzma, which gets linked into all sorts of things, and the payload specifically targeted sshd. A compromised xz in the wrong distributions would have handed an attacker remote access to an enormous number of internet-facing Linux servers. It was caught while it was still working its way into the testing and unstable branches, before it reached the stable releases most of us actually run. We got lucky, and the luck had a name and a job at Microsoft.

the part that keeps me up

The mechanism of the attack is clever and worth understanding, but it is not the part that unsettles me. What unsettles me is the social engineering that made it possible. The backdoor did not get in because someone broke into the project's infrastructure. It got in because a person, operating under a name, spent something like two years contributing useful patches, building trust, and becoming a co-maintainer of a project that one exhausted volunteer had been carrying largely alone. There were even what look like sock-puppet accounts applying pressure on the original maintainer to hand over more control, complaining about how slow releases were, manufacturing the very burnout that creates a vacancy.

That is the attack I cannot defend against with a scanner. You can lint code, you can audit dependencies, you can pin versions and check signatures. You cannot easily detect a patient, well-resourced adversary who is willing to spend two years being a model open-source citizen in order to earn the commit bit. The trust model of open source is also its attack surface, and this is the cleanest demonstration of that I have ever seen.

what this is really about

I have spent a fair amount of my career leaning on libraries maintained by people I have never met, for free, in their spare time. xz is exactly that kind of project. Critical, ubiquitous, and held up by a maintainer who, by his own account, was dealing with mental health struggles and very little support. That is not an exception in our ecosystem. That is the median case for the load-bearing infrastructure underneath nearly everything.

We have built an industry on the unpaid labour of a small number of exhausted volunteers, and then expressed surprise when one of them, worn down and short of help, accepts assistance from someone who turns out to have an agenda. The lesson is not "audit your dependencies harder", though you should. The lesson is that the people maintaining the libraries you depend on are a single point of failure, and we treat them accordingly only after something nearly goes catastrophically wrong.

A few things I have actually done this week, none of them heroic:

  • Checked which of my boxes were running affected versions of xz. None of those on the stable branches that matter were, which is the only reason this is an interesting blog post rather than an incident report.
  • Looked at my own dependency tree with fresh and slightly paranoid eyes, particularly the small, single-maintainer packages I had never thought about because they "just work".
  • Started actually sponsoring a couple of the maintainers whose work I rely on, which I should have been doing years ago and had simply never got around to.
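
The first item on that list can be approximated with a few lines of Python. This is a minimal sketch, not the exact check described above: it assumes the affected upstream releases are 5.6.0 and 5.6.1, that `xz --version` prints its version strings in x.y.z form, and the helper names (`parse_versions`, `is_affected`, `check_local_xz`) are made up for illustration.

```python
import re
import shutil
import subprocess

# Upstream xz releases known to carry the CVE-2024-3094 backdoor.
AFFECTED_VERSIONS = {"5.6.0", "5.6.1"}


def parse_versions(version_output: str) -> set:
    """Collect every x.y.z version string found in `xz --version` output."""
    return set(re.findall(r"\b(\d+\.\d+\.\d+)\b", version_output))


def is_affected(version_output: str) -> bool:
    """Return True if any reported version is one of the affected releases."""
    return bool(parse_versions(version_output) & AFFECTED_VERSIONS)


def check_local_xz() -> str:
    """Run the local xz binary, if there is one, and describe what it reports."""
    xz_path = shutil.which("xz")
    if xz_path is None:
        return "xz not found on PATH"
    result = subprocess.run(
        [xz_path, "--version"], capture_output=True, text=True, check=False
    )
    found = ", ".join(sorted(parse_versions(result.stdout))) or "unknown"
    if is_affected(result.stdout):
        return f"AFFECTED version reported: {found}"
    return f"no affected version reported: {found}"


if __name__ == "__main__":
    print(check_local_xz())
```

It only looks at whichever xz binary is first on the PATH, so treat it as a quick first pass. Your distribution's security advisory and package manager remain the better authority, particularly where a distribution shipped a reverted or patched build.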

That last one, the sponsorship, feels like the only response that addresses the real problem. You cannot scan your way out of a social attack. But you can make it less likely that a maintainer is so isolated and so burned out that a friendly stranger offering to help looks like a lifeline rather than a risk. That means funding, co-maintainers who are actually present, and a culture that does not demand free labour at the pace of a paid team. None of that is glamorous, and none of it makes a good headline.

The headline this time is that we got away with it. Andres Freund noticed half a second of latency and pulled on the thread, and the whole thing unravelled before it shipped. That is a wonderful outcome and a terrifying one in equal measure, because the next attempt will be quieter, and the next maintainer might be a little more tired, and we might not have someone with the right instincts watching the right test machine on the right day.

The fix is not technical. It rarely is. We need to look after the people holding up the foundations, and right now we mostly do not. This week was the bill arriving, and we very nearly couldn't pay it.