Ramblings of an aging IT geek
rust

nom made me stop hand-rolling parsers, mostly

A first proper go at nom in Rust, parsing a small config-ish format, and what the combinator style actually buys you over a hand-written loop.

I had a small text format to parse, a half-baked key-value-with-sections thing of my own making, and my instinct as ever was to reach for a hand-written loop with a cursor and a pile of if statements. I've written that loop a dozen times and got the off-by-one wrong a dozen times. So this time I sat down with nom properly.

The idea behind nom is that a parser is just a function: it takes input, and returns either the rest of the input plus a parsed value, or an error. Once you accept that, you build big parsers by gluing small ones together. A parser for a line is a parser for a key, then =, then a value. You don't manage an index at all.
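
To make that concrete before any nom appears, here's a minimal sketch of the same idea in plain Rust. None of this is nom's API: `ParseResult`, `equals` and `pair` are names I've made up for illustration, and the value is restricted to an identifier to keep it short.

```rust
// A parser is a function: input in, (rest of input, value) out, or an error.
// ParseResult is a hypothetical alias for this sketch, not a nom type.
type ParseResult<'a, T> = Result<(&'a str, T), String>;

/// Parse one or more identifier characters (letters, digits, underscore).
fn ident(input: &str) -> ParseResult<'_, &str> {
    let end = input
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected identifier at {:?}", input));
    }
    // Hand back what is left, plus the slice that matched.
    Ok((&input[end..], &input[..end]))
}

/// Parse a literal `=` with optional spaces on either side.
fn equals(input: &str) -> ParseResult<'_, ()> {
    let rest = input.trim_start_matches(' ');
    let rest = rest
        .strip_prefix('=')
        .ok_or_else(|| format!("expected '=' at {:?}", rest))?;
    Ok((rest.trim_start_matches(' '), ()))
}

/// Glue the small parsers together: a key, then `=`, then a value.
/// Each step receives the `rest` returned by the previous one, so no index is kept.
fn pair(input: &str) -> ParseResult<'_, (&str, &str)> {
    let (rest, key) = ident(input)?;
    let (rest, ()) = equals(rest)?;
    let (rest, value) = ident(rest)?;
    Ok((rest, (key, value)))
}
```

Calling `pair("name = nom\nnext")` gives `Ok(("\nnext", ("name", "nom")))`, and `pair("= oops")` gives an `Err` because `ident` finds nothing to match. The combinators in nom are, roughly, well-tested generic versions of that gluing.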

Here's the shape of it, parsing key = value pairs:

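use nom::{
    bytes::complete::take_while1,
    character::complete::{char, space0},
    sequence::{separated_pair, tuple},
    IResult,
};

// An identifier: one or more alphanumeric characters or underscores.
fn ident(input: &str) -> IResult<&str, &str> {
    take_while1(|c: char| c.is_alphanumeric() || c == '_')(input)
}

// A `key = value` pair; the value runs up to (not including) the newline.
fn kv(input: &str) -> IResult<&str, (&str, &str)> {
    separated_pair(
        ident,
        // In nom 7 a sequence of parsers is wrapped in tuple(...);
        // separated_pair discards whatever the separator matched.
        tuple((space0, char('='), space0)),
        take_while1(|c: char| c != '\n'),
    )(input)
}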

What struck me is how much the combinators read like the grammar in your head. separated_pair is "a thing, then a separator I don't care about, then another thing". I'm not tracking where I am in the string; nom threads the remaining input through for me, and if a sub-parser fails I get an error back and my original input is untouched, instead of a cursor stranded halfway through a token.

The bit that took adjusting to was the error story. nom's default errors are terse, and IResult returning a borrowed slice of the input means lifetimes follow you around. For a real format you'll want nom::error::context to annotate failures, and you may end up reaching for VerboseError so a malformed line tells you something more useful than "Tag". I bolted that on once the happy path worked, which is the right order.

Would I use it for everything? No. For a one-line split, str::split_once is right there and nom is overkill. But the moment there's nesting, optional whitespace, or anything recursive, the hand-rolled loop becomes a liability and the combinator version stays flat and testable. Each small parser tests in isolation, which is the quiet superpower here. I tested ident on its own before it ever met kv, and when the whole thing misbehaved I knew exactly which layer to blame. That alone has earned it a place in the toolbox.
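
For the one-line case mentioned above, the standard library really is enough. A minimal sketch, with `split_kv` being a hypothetical helper name:

```rust
/// Hypothetical helper: split a single `key = value` line without nom.
/// Returns None when there is no '=' at all.
fn split_kv(line: &str) -> Option<(&str, &str)> {
    // split_once cuts at the first '=' only, so the value may contain '='.
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}
```

`split_kv("name = nom")` gives `Some(("name", "nom"))` and `split_kv("a=b=c")` gives `Some(("a", "b=c"))`. It does no validation of the key and knows nothing about sections or nesting, which is exactly the point at which I'd switch to nom.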