parsing a daft little config format with nom

I had a small config format to parse this weekend, the kind of thing that starts as key = value and ends up with sections, comments and quoted strings because real configs always do. My first instinct was the usual: split on =, trim, special-case the awkward bits. That instinct is wrong roughly the moment a value contains an = sign, and I knew it, so I reached for nom instead.
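
To make that failure concrete, here is a minimal sketch of the splitter I nearly wrote, in plain std Rust. The name `naive_split` and the sample lines are made up for illustration, not taken from the real config.

```rust
/// Naive approach: split on every '=' and keep the first two pieces.
/// `naive_split` is a hypothetical helper, not part of any library.
///
/// "name = demo" gives ("name", "demo"), which is fine.
/// "query = a=b" gives ("query", "a"): the rest of the value is silently lost.
fn naive_split(line: &str) -> Option<(&str, &str)> {
    let mut parts = line.split('=');
    let key = parts.next()?.trim();
    // Only the text up to the *next* '=' survives here.
    let value = parts.next()?.trim();
    Some((key, value))
}
```

Swapping in `split_once('=')` patches this one case, but a quoted value or a trailing comment needs its own special case on top, and that is how the knot starts.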

nom is a parser combinator library, which is a fancy way of saying you build a big parser out of small ones. You write a thing that matches a quoted string, a thing that matches whitespace, a thing that matches an identifier, and then you glue them together. Each piece is tiny and testable on its own, which is the whole appeal. I'd been put off by older versions where everything went through macros, but the function-style API (the default since nom 5) reads far more like ordinary Rust.

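use nom::{character::complete::{char, space0}, sequence::tuple, IResult};

// `identifier` and `value` are small parsers of the same shape, not shown here.
fn key_value(input: &str) -> IResult<&str, (&str, &str)> {
    let (input, key) = identifier(input)?;
    // Skip optional spaces on either side of the '='.
    let (input, _) = tuple((space0, char('='), space0))(input)?;
    let (input, value) = value(input)?;
    Ok((input, (key, value)))
}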
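
It may help to see the shape these pieces share with nom itself stripped away. The sketch below is plain std Rust and is not nom's API: every name in it (`Parsed`, `identifier`, `ws`, `equals`, `value`, `key_value`) is hypothetical, `Option` stands in for nom's richer `IResult` error type, and the quoted-string rule is my own simplification. Each small parser takes the input and hands back what is left plus what it matched, and that is all the gluing relies on.

```rust
/// Remaining input plus the matched output; `None` means "no match".
/// A stand-in for nom's `IResult`, which also carries error details.
type Parsed<'a, T> = Option<(&'a str, T)>;

/// One or more ASCII letters, digits or underscores.
fn identifier(input: &str) -> Parsed<'_, &str> {
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Zero or more spaces or tabs; always succeeds.
fn ws(input: &str) -> Parsed<'_, ()> {
    Some((input.trim_start_matches(|c: char| c == ' ' || c == '\t'), ()))
}

/// Exactly one '=' character.
fn equals(input: &str) -> Parsed<'_, ()> {
    input.strip_prefix('=').map(|rest| (rest, ()))
}

/// Either a double-quoted string (no escape handling in this sketch)
/// or everything up to the end of the line, trailing whitespace trimmed.
fn value(input: &str) -> Parsed<'_, &str> {
    if let Some(rest) = input.strip_prefix('"') {
        let close = rest.find('"')?;
        Some((&rest[close + 1..], &rest[..close]))
    } else {
        let end = input.find('\n').unwrap_or(input.len());
        Some((&input[end..], input[..end].trim_end()))
    }
}

/// The glue: run each small parser on what the previous one left behind.
fn key_value(input: &str) -> Parsed<'_, (&str, &str)> {
    let (input, key) = identifier(input)?;
    let (input, _) = ws(input)?;
    let (input, _) = equals(input)?;
    let (input, _) = ws(input)?;
    let (input, val) = value(input)?;
    Some((input, (key, val)))
}
```

With these definitions, `key_value("query = a=b")` keeps the whole `a=b` as the value, because only the first `=` is ever treated as the separator. nom gives you the same pattern with proper error reporting and a large library of ready-made pieces.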

The thing that won me over wasn't the speed, though it's quick. It was that when the format changed to allow inline comments after a value, I changed one small combinator and nothing else moved. With my hand-rolled splitter that same change would have meant unpicking a knot of indices and hoping. The parser composes; the string-munging doesn't. I'll be reaching for it again.