Ramblings of an aging IT geek
rust

parsing a config format with nom, and learning to think in combinators

Building a small config-line parser with nom in Rust, and the mental shift from writing a parser to composing one out of tiny combinators.


I needed to parse a small custom config format, the sort of key = "value" line you knock out by hand a hundred times, and I decided to do it properly with nom rather than reaching for a regex I'd hate myself for in six months. nom is a parser combinator library, and the thing nobody tells you up front is that the hard part isn't the API, it's unlearning the instinct to write a parser as one big function and learning to assemble it from tiny pieces instead.

The shift is this. With a hand-rolled parser you think procedurally: read a char, is it whitespace, skip it, now read until =, and so on. With combinators you think in terms of small parsers that each consume a bit of input and either succeed with a value and the remaining input, or fail. Then you glue them together. Each combinator is trivial; the parser is what falls out when you compose them.
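To make "succeed with a value and the remaining input, or fail" concrete, here is a minimal library-free sketch of one such small parser, written by hand. It is not nom's code: ident is a made-up name for illustration, and Option stands in for a proper error type.

```rust
// A tiny parser is just a function: input in, (remaining input, parsed value)
// out, or None on failure. `ident` is a hypothetical name, not part of nom.
fn ident(input: &str) -> Option<(&str, &str)> {
    // Find the byte offset where the run of identifier characters ends.
    let end = input
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 {
        None // consuming nothing counts as failure
    } else {
        Some((&input[end..], &input[..end]))
    }
}
```

Given timeout = "30s" it returns the key timeout together with the untouched remainder  = "30s"; given a line that starts with = it returns None. That remainder is exactly what the next small parser gets to work on.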

So for a line like timeout = "30s", I want a parser for the key (an identifier), a parser for the = with optional surrounding spaces, and a parser for a quoted value. Then a combinator that runs all three in sequence and hands me back a tuple.

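// Written in the nom 7 style, where combinators are called directly as functions.
use nom::{
    bytes::complete::{take_until, take_while1},
    character::complete::{char, multispace0},
    sequence::{delimited, separated_pair},
    IResult,
};

// A key: one or more alphanumeric characters or underscores.
fn identifier(input: &str) -> IResult<&str, &str> {
    take_while1(|c: char| c.is_alphanumeric() || c == '_')(input)
}

// A double-quoted value; returns the text between the quotes (no escape handling).
fn quoted(input: &str) -> IResult<&str, &str> {
    delimited(char('"'), take_until("\""), char('"'))(input)
}

// A whole line: key, then '=' with optional surrounding whitespace, then the quoted value.
fn config_line(input: &str) -> IResult<&str, (&str, &str)> {
    separated_pair(
        identifier,
        delimited(multispace0, char('='), multispace0),
        quoted,
    )(input)
}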


separated_pair runs the first parser, then a separator parser whose result it throws away, then the second, and gives you the two results you care about. delimited works the same way with three parsers, keeping only the middle result: around the = it eats the optional whitespace on either side, and in quoted it strips the quotes. take_until("\"") grabs everything up to, but not including, the closing quote, which is why a char('"') follows it. Once it clicked that these are just functions returning functions, it stopped feeling like magic and started feeling like Lego.

The IResult<&str, &str> return type is the bit worth staring at until it makes sense. The first type parameter is the input type, which on success carries what's left of the input; the second is the type of what you parsed out. A successful parse comes back as Ok((remaining, value)). Every combinator threads the remaining input through to the next one, which is how the whole chain advances through the string without any of the individual pieces knowing about each other. You never write the plumbing; the combinators are the plumbing.
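For a feel of what that plumbing looks like when you do write it, here is a rough hand-rolled sketch of a sequencing combinator in plain Rust. It assumes nothing from nom: spaces, equals, pair and separator are illustrative names, and Option stands in for IResult's richer error type. The tuple order (remaining input first, value second) is kept the same on purpose.

```rust
// Skip leading whitespace; always succeeds, possibly consuming nothing.
fn spaces(input: &str) -> Option<(&str, ())> {
    Some((input.trim_start(), ()))
}

// Match a single '=' at the start of the input.
fn equals(input: &str) -> Option<(&str, char)> {
    input.strip_prefix('=').map(|rest| (rest, '='))
}

// A function returning a function: run `first`, hand whatever it left over
// to `second`, and return both values along with the final remainder.
fn pair<'a, A, B>(
    first: impl Fn(&'a str) -> Option<(&'a str, A)>,
    second: impl Fn(&'a str) -> Option<(&'a str, B)>,
) -> impl Fn(&'a str) -> Option<(&'a str, (A, B))> {
    move |input| {
        let (rest, a) = first(input)?; // the remaining input comes back first...
        let (rest, b) = second(rest)?; // ...and is threaded into the next parser
        Some((rest, (a, b)))
    }
}

// Roughly the job of delimited(multispace0, char('='), multispace0):
// whitespace, '=', whitespace, keeping only the '='.
fn separator(input: &str) -> Option<(&str, char)> {
    let (rest, ((_, eq), _)) = pair(pair(spaces, equals), spaces)(input)?;
    Some((rest, eq))
}
```

Neither spaces nor equals knows the other exists; pair is the only place the leftover input changes hands. Fed  = "30s", separator consumes the padding and the = and leaves "30s" for whatever comes next.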

It's not all smooth. The error messages when a type doesn't line up are dense, because you're composing generic functions and the compiler reports the whole nested type when something's off. And complete versus streaming variants will catch you out once: the streaming parsers return Incomplete when they run out of input because more might still arrive, while the complete ones assume you have all the input. For a config file you do, so import from the complete submodules (nom::bytes::complete, nom::character::complete) and don't think about it again.

Would I use this for key = "value"? Honestly, for something this simple a hand-rolled split would have done. But the format grew, as formats do, picking up quoted strings with escapes, comments, and nested sections, and the combinator version absorbed each new rule by adding one small parser and slotting it in. The hand-rolled version would have turned into a swamp by the third addition. That's the actual win: not the first parser, but the fifth, when the thing you built composes instead of collapses.