Ramblings of an aging IT geek
rust

parsing a config format with nom, and learning to think in combinators

Writing a small parser for a key-value config format using Rust's nom, and adjusting to building parsers by composing tiny combinators.

Rust source code for a parser on screen

I had a small config format to parse, the sort of thing you'd usually hack together with a few split calls and live to regret. Instead I reached for nom, the parser combinator crate, partly because the job warranted something more robust and partly because I'd been meaning to learn it properly. The format was nothing exotic: lines of key = value, comments starting with #, blank lines ignored.

The mental shift with nom is that you don't write a parser. You write lots of tiny ones and glue them together. A parser is just a function from input to "the bit I matched, and the rest of the input". Each combinator consumes a slice of the front and hands the remainder to the next. Once that clicks, you stop thinking about the string as a whole and start thinking about what to peel off the front next.
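To make the "function from input to matched bit plus remainder" idea concrete, here is a minimal sketch in plain Rust with no nom dependency. The names PResult, take_while1 and key are my own for illustration, not nom's API. The (rest, matched) ordering mirrors the convention nom uses for its success value.

```rust
// A parser takes input and returns Some((rest, matched)), or None on failure.
type PResult<'a> = Option<(&'a str, &'a str)>;

/// Match one or more leading characters that satisfy `pred`.
fn take_while1<'a>(input: &'a str, pred: impl Fn(char) -> bool) -> PResult<'a> {
    // Byte offset of the first char that fails the predicate, or end of input.
    let end = input
        .char_indices()
        .find(|&(_, c)| !pred(c))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    if end == 0 {
        None // nothing matched, so "one or more" fails
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Hypothetical key parser: characters that are neither whitespace nor '='.
fn key(input: &str) -> PResult<'_> {
    take_while1(input, |c| c != '=' && !c.is_whitespace())
}
```

Calling key on "port = 8080" peels "port" off the front and hands back " = 8080" for whatever parser comes next.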

A diagram of parsing steps

So a key is "some characters that aren't whitespace or an equals sign":

named!(key<&str, &str>, take_while1!(|c: char| c != '=' && !c.is_whitespace()));

A value is "the rest of the line". A line is a key, then optional whitespace, then =, then more optional whitespace, then a value. You write each of those as its own combinator and then compose them. The named! macro was the idiom at the time, and while it looks a bit alien at first, it reads cleanly once you've seen a few. Each piece is independently testable, which is the real win: I could unit-test the key parser on its own without standing up the whole thing.
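The composition described above can be sketched without nom as a chain of small functions, each handing its remainder to the next. This is an illustration under my own assumptions rather than the macro code I actually wrote: the Entry struct and the helper names (take_while, spaces, equals, value, line) are hypothetical, and failure is modelled with a bare Option.

```rust
/// A parsed `key = value` pair borrowed from the input.
#[derive(Debug, PartialEq)]
struct Entry<'a> {
    key: &'a str,
    value: &'a str,
}

/// Split off the longest prefix whose chars satisfy `pred` (may be empty).
/// Returns (rest, matched).
fn take_while<'a>(input: &'a str, pred: impl Fn(char) -> bool) -> (&'a str, &'a str) {
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    (&input[end..], &input[..end])
}

/// Key: one or more chars that are neither whitespace nor '='.
fn key(input: &str) -> Option<(&str, &str)> {
    let (rest, k) = take_while(input, |c| c != '=' && !c.is_whitespace());
    if k.is_empty() { None } else { Some((rest, k)) }
}

/// Optional run of spaces or tabs.
fn spaces(input: &str) -> (&str, &str) {
    take_while(input, |c| c == ' ' || c == '\t')
}

/// A mandatory '='; returns the rest of the input after it.
fn equals(input: &str) -> Option<&str> {
    input.strip_prefix('=')
}

/// Value: everything up to (not including) the newline.
fn value(input: &str) -> (&str, &str) {
    take_while(input, |c| c != '\n')
}

/// Line: key, optional whitespace, '=', optional whitespace, value.
fn line(input: &str) -> Option<(&str, Entry<'_>)> {
    let (rest, k) = key(input)?;
    let (rest, _) = spaces(rest);
    let rest = equals(rest)?;
    let (rest, _) = spaces(rest);
    let (rest, v) = value(rest);
    Some((rest, Entry { key: k, value: v.trim_end() }))
}
```

Because key is an ordinary function, it can be exercised in a unit test on its own, and line is little more than the sentence "key, whitespace, equals, whitespace, value" written down in order.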

The part that took adjusting was error handling. nom doesn't throw; a parser returns a value that is either a success with the remainder, an error, or "incomplete", meaning it needs more input before it can decide. You compose the failure paths as deliberately as the success paths, which felt like extra work right up until the moment a malformed config file gave me a precise position instead of a vague panic. The thing I'd have got from a quick split was a silent wrong answer. The thing I got from nom was an error that pointed at the exact byte.
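Here is a rough sketch of what "an error that points at the exact byte" looks like when the failure path is part of the return type. It is hand-rolled with the standard library and is not nom's error type: ParseError, its fields and parse_line are names I made up for the example.

```rust
/// Hypothetical error: what was expected, and the byte offset where parsing stopped.
#[derive(Debug, PartialEq)]
struct ParseError {
    offset: usize,
    expected: &'static str,
}

/// Parse one `key = value` line into (key, value), reporting where it went wrong.
fn parse_line(line: &str) -> Result<(&str, &str), ParseError> {
    let is_key_char = |c: char| c != '=' && !c.is_whitespace();

    // Key: one or more key characters at the front of the line.
    let key_end = line.find(|c: char| !is_key_char(c)).unwrap_or(line.len());
    if key_end == 0 {
        return Err(ParseError { offset: 0, expected: "key" });
    }

    // Optional whitespace, then a mandatory '='.
    let trimmed = line[key_end..].trim_start();
    // `trimmed` is a suffix of `line`, so this is its starting byte offset.
    let eq_offset = line.len() - trimmed.len();
    let after_eq = trimmed
        .strip_prefix('=')
        .ok_or(ParseError { offset: eq_offset, expected: "'='" })?;

    Ok((&line[..key_end], after_eq.trim()))
}
```

Given "port 8080", this returns an error at offset 5, the byte where the = should have been, rather than quietly producing a key with no value.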

Was it overkill for a key-value format? Honestly, a bit. I could have shipped the split version in ten minutes. But config formats grow. The day someone wants quoted values with embedded equals signs, or line continuations, the combinator version absorbs it by adding another small parser, whereas the split version gets rewritten from scratch in anger. And I came out of it actually understanding nom, which was the other half of the point. Combinators reward you for thinking in small composable pieces, which is a habit worth having well beyond parsing.
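As a sketch of how that growth might go, here is a quoted-value parser bolted on beside a bare-value one, with a third function that tries the first and falls back to the second, much as nom's alt does. Again this is plain Rust under my own assumptions: quoted, bare and value are hypothetical names, and the quoting is deliberately naive, with no escape sequences.

```rust
/// Hypothetical quoted value: `"..."` with no escape handling.
/// Returns (rest, contents without the quotes).
fn quoted(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('"')?;
    let end = body.find('"')?; // byte offset of the closing quote
    Some((&body[end + 1..], &body[..end]))
}

/// Bare value: the rest of the line, trailing whitespace trimmed.
fn bare(input: &str) -> Option<(&str, &str)> {
    let end = input.find('\n').unwrap_or(input.len());
    Some((&input[end..], input[..end].trim_end()))
}

/// Try `quoted` first, then fall back to `bare`.
fn value(input: &str) -> Option<(&str, &str)> {
    quoted(input).or_else(|| bare(input))
}
```

The existing key and line parsers don't change. Only value learns a new trick, so an embedded equals sign inside quotes never reaches the code that looks for the separator. In this sketch an unterminated quote simply falls through to the bare parser; a stricter version would report it as an error.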