Ramblings of an aging IT geek
rust

a small parser, written with nom

Building a parser for a simple key-value config format in Rust using nom's combinators, and what the macro-heavy style actually feels like to write.

I needed to parse a small config format, the sort of thing where reaching for a full grammar feels like overkill but hand-rolling string splitting always ends in tears the moment someone puts a quote in the wrong place. So I tried nom, the parser combinator library everyone in the Rust world keeps recommending, and wrote a parser for lines of key = value with comments and quoted strings.

The mental model took a moment to click. A combinator is just a function that takes input and returns the rest of the input plus whatever it parsed, or an error. You build big parsers by gluing small ones together. In nom 4, which had just landed, that gluing is done with macros, and the result reads oddly at first but grows on you.
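To make that model concrete without any nom at all, here is a minimal sketch using only the standard library. The names ParseResult, identifier, equals and pair are invented for illustration and are not nom's API; it also works on &str rather than the byte slices nom 4 defaults to.

```rust
// Hand-rolled illustration of the combinator idea; this is NOT nom's API.
// A parser takes input and returns (rest of input, parsed value) or an error.
type ParseResult<'a, T> = Result<(&'a str, T), String>;

// Small parser: one or more ASCII alphanumerics or underscores.
fn identifier(input: &str) -> ParseResult<'_, &str> {
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 {
        Err(format!("expected identifier at {:?}", input))
    } else {
        Ok((&input[end..], &input[..end]))
    }
}

// Small parser: a single '=' character.
fn equals(input: &str) -> ParseResult<'_, char> {
    match input.strip_prefix('=') {
        Some(rest) => Ok((rest, '=')),
        None => Err(format!("expected '=' at {:?}", input)),
    }
}

// Combinator: glue two parsers together, feeding the first one's
// leftover input into the second and pairing up their results.
fn pair<'a, A, B>(
    first: impl Fn(&'a str) -> ParseResult<'a, A>,
    second: impl Fn(&'a str) -> ParseResult<'a, B>,
) -> impl Fn(&'a str) -> ParseResult<'a, (A, B)> {
    move |input| {
        let (rest, a) = first(input)?;
        let (rest, b) = second(rest)?;
        Ok((rest, (a, b)))
    }
}

fn main() {
    let key_then_equals = pair(identifier, equals);
    // Succeeds, leaving "8080" as the unparsed remainder.
    println!("{:?}", key_then_equals("port=8080"));
    // Fails in the first parser, so the second never runs.
    println!("{:?}", key_then_equals("=8080"));
}
```

The point of pair is that it knows nothing about identifiers or equals signs. It only knows how to run one parser after another, which is the job nom's macros do in the code further down.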

Here is the core of it. A key is an identifier, a value is either a quoted string or a bare token, and a line is a key, an equals sign, and a value, with whitespace allowed around each of them.

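#[macro_use]
extern crate nom;

// A key is one or more ASCII alphanumerics or underscores.
named!(key, take_while1!(|c| nom::is_alphanumeric(c) || c == b'_'));

// A quoted value is everything between two double quotes (no escapes yet).
named!(quoted_value,
    delimited!(char!('"'), take_until!("\""), char!('"'))
);

// A bare value runs until the next space or newline.
// Note: on plain &[u8] input nom 4 assumes more data may follow, so this can
// return Incomplete at the very end of input; CompleteByteSlice avoids that.
named!(bare_value, take_while1!(|c| c != b'\n' && c != b' '));

// Try the quoted form first, then fall back to a bare token.
named!(value, alt!(quoted_value | bare_value));

// key = value, with ws! eating whitespace (newlines included) around each part.
named!(pub kv<(&[u8], &[u8])>,
    do_parse!(
        k: ws!(key)   >>
        ws!(char!('=')) >>
        v: ws!(value) >>
        (k, v)
    )
);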

The thing that surprised me is how much the compiler helps once you stop fighting it. The types thread through automatically, so a parser that returns the wrong shape is a build error, not a mystery at runtime. The thing that did not delight me is the error messages when a macro is subtly wrong: you get a wall of expansion that takes real effort to read. That is the cost of the macro-based API, and nom knows it, which is partly why a function-based future is being discussed.

For a format this small I could have written it by hand. But the moment requirements grow ("support comments", "allow escaped quotes", "handle multi-line values"), each one is a small new combinator slotted in rather than a rewrite of a fragile splitter. That composability is the whole point, and it is genuinely pleasant once the syntax stops looking alien.
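As a rough illustration of "slotting in", here is a standard-library-only sketch of the same key = value shape, where supporting trailing comments costs one small function and one extra line in the line parser. Everything here is an assumption made for the example: the names spaces, comment and ParseResult are invented, the # comment marker is my choice since the format's comment syntax is not spelled out above, and the code works on &str rather than going through nom.

```rust
// Hand-rolled sketch (NOT nom): each new requirement is one more small function.
type ParseResult<'a, T> = Result<(&'a str, T), String>;

// Skip leading spaces and tabs; never fails.
fn spaces(input: &str) -> &str {
    input.trim_start_matches(|c: char| c == ' ' || c == '\t')
}

// Key: one or more ASCII alphanumerics or underscores.
fn key(input: &str) -> ParseResult<'_, &str> {
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected key at {:?}", input));
    }
    Ok((&input[end..], &input[..end]))
}

// Value: a quoted string (no escapes) or a bare token up to a space or newline.
fn value(input: &str) -> ParseResult<'_, &str> {
    if let Some(body) = input.strip_prefix('"') {
        let close = body
            .find('"')
            .ok_or_else(|| "unterminated quoted value".to_string())?;
        Ok((&body[close + 1..], &body[..close]))
    } else {
        let end = input
            .find(|c: char| c == ' ' || c == '\n')
            .unwrap_or(input.len());
        if end == 0 {
            return Err(format!("expected value at {:?}", input));
        }
        Ok((&input[end..], &input[..end]))
    }
}

// The new requirement: skip an optional `# ...` comment up to the end of the line.
// Assumption: '#' is the comment marker.
fn comment(input: &str) -> &str {
    match input.strip_prefix('#') {
        Some(rest) => {
            let end = rest.find('\n').unwrap_or(rest.len());
            &rest[end..]
        }
        None => input,
    }
}

fn kv(input: &str) -> ParseResult<'_, (&str, &str)> {
    let (rest, k) = key(spaces(input))?;
    let rest = spaces(rest)
        .strip_prefix('=')
        .ok_or_else(|| format!("expected '=' after key {:?}", k))?;
    let (rest, v) = value(spaces(rest))?;
    // The only line added to support comments:
    let rest = comment(spaces(rest));
    Ok((rest, (k, v)))
}

fn main() {
    println!("{:?}", kv("name = \"hello world\"  # greeting"));
    println!("{:?}", kv("port=8080"));
}
```

Escaped quotes would be a change confined to value, and multi-line values much the same. None of key, spaces or comment would need to know about it, which is the property the nom version gets from its combinators.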

Would I use it again? For anything more structured than a flat file, yes, without hesitation. For a two-field format I would probably still reach for split. The trick, as ever, is matching the tool to the size of the problem, and nom is a good tool to have once the problem is real.