I had a log format that wasn't quite a log format. Lines like 2019-07-30T11:04:02Z [warn] queue=ingest depth=204 host=node-3, except that the fields drifted between versions, and the regex I'd been carrying around for it had grown to the point where I no longer trusted my own changes to it. So I rewrote the parser in Rust using nom, and I'm glad I did.
nom is a parser combinator library. Instead of one big pattern, you write small parsers that each consume a known shape, and you glue them together. The whole thing composes, and, crucially, the compiler keeps you honest about what each piece returns.
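Here is a hand-rolled sketch in plain Rust, with no dependencies, that makes "small parsers glued together" concrete. It is not how nom is implemented. The names `PResult`, `ch`, `take_till`, `delimited` and `level` are hypothetical stand-ins that only borrow nom's vocabulary. A parser here is a function from input to either (remaining input, value) or an error. A combinator is a function that takes parsers and returns a new one. The example builds a parser for the `[warn]` level tag out of three smaller pieces.

```rust
// Hand-rolled illustration of the combinator idea using only std; NOT nom's code.
// A parser returns Ok((remaining input, value)) or Err(message).
type PResult<'a, T> = Result<(&'a str, T), String>;

/// Hypothetical parser: match exactly one expected character.
fn ch<'a>(expected: char) -> impl Fn(&'a str) -> PResult<'a, char> {
    move |input: &'a str| match input.chars().next() {
        Some(c) if c == expected => Ok((&input[c.len_utf8()..], c)),
        _ => Err(format!("expected {:?} at {:?}", expected, input)),
    }
}

/// Hypothetical parser: take characters until `stop` is true (possibly none).
fn take_till<'a, P>(stop: P) -> impl Fn(&'a str) -> PResult<'a, &'a str>
where
    P: Fn(char) -> bool,
{
    move |input: &'a str| {
        let end = input.find(|c: char| stop(c)).unwrap_or(input.len());
        Ok((&input[end..], &input[..end]))
    }
}

/// Hypothetical combinator: run open, inner, close in order; keep inner's value.
fn delimited<'a, A, B, C, PA, PB, PC>(
    open: PA,
    inner: PB,
    close: PC,
) -> impl Fn(&'a str) -> PResult<'a, B>
where
    PA: Fn(&'a str) -> PResult<'a, A>,
    PB: Fn(&'a str) -> PResult<'a, B>,
    PC: Fn(&'a str) -> PResult<'a, C>,
{
    move |input: &'a str| {
        let (rest, _) = open(input)?; // `?` stops at the first failing piece
        let (rest, value) = inner(rest)?;
        let (rest, _) = close(rest)?;
        Ok((rest, value))
    }
}

/// The "[warn]" level tag, assembled from the three small parsers above.
fn level(input: &str) -> PResult<'_, &str> {
    delimited(ch('['), take_till(|c| c == ']'), ch(']'))(input)
}

fn main() {
    println!("{:?}", level("[warn] queue=ingest"));
    println!("{:?}", level("warn] queue=ingest"));
}
```

Each piece states what it returns. If you hand `delimited` an inner parser that yields the wrong type, `level` fails to compile rather than misbehaving at runtime.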
A field at a time
The smallest useful parser here matches a key=value pair. With nom you describe it as a sequence: some non-equals characters, an equals sign, then some non-space characters.
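```rust
use nom::bytes::complete::take_till;
use nom::character::complete::char;
use nom::sequence::separated_pair;
use nom::IResult;

// Parse one `key=value` pair, e.g. "queue=ingest".
fn kv(input: &str) -> IResult<&str, (&str, &str)> {
    separated_pair(
        take_till(|c| c == '='), // key: everything before the '='
        char('='),               // the separator itself, discarded
        take_till(|c| c == ' '), // value: everything up to the next space
    )(input)
}
```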
That returns the remaining input plus the captured pair, and it threads errors through for you. The shape that took me three goes to get right in a regex is, here, just separated_pair. You read it and you know what it does.
Why it felt safer
Two reasons, really.
The first is that nom forces the failure cases into the open. Every parser returns an IResult, so an unexpected byte isn't a silent partial match that quietly corrupts the next field. It's an error value you have to handle, at the exact point it happened.
The second is composition. When the log format added a trace_id field in a later version, I didn't reach back into a monolith. I wrote a parser for the new field and slotted it in alongside the existing field parsers. The old fields didn't notice. That's the difference between extending something and disturbing it.
The cost is real, to be fair. There's a learning curve, the error types take a while to sit comfortably in your head, and for a genuinely throwaway one-liner a regex is still faster to write. nom 5.0 landed recently and made plain functions, rather than the old macros, the main way to build parsers, so examples online are a bit of a mixed bag depending on when they were written. Worth knowing before you go searching.
But for anything I expect to maintain, where the format will shift and I'll be the one cursing at it in six months, the combinator approach wins. The parser is now something I can read top to bottom and believe. My regex never gave me that.