Ramblings of an aging IT geek
← Ramblings of an aging IT geek
ai

my first rag pipeline was a confident liar

A first attempt at retrieval-augmented generation over my own notes failed not because of the model but because chunking and retrieval were sloppy.

A small robot on a desk

I built a little RAG system over my own notes, asked it a question I knew the answer to, and got back a fluent, well-structured, completely wrong answer. The model wasn't the problem. The retrieval was, and I'd spent all my attention on the wrong half.

The setup was the obvious one. Embed all my Markdown notes, stuff them in a vector store, embed the question, pull the top few chunks, hand them to the model with a "use only this context" prompt. It demos beautifully and falls over the moment you ask it something specific.

A close-up of a circuit board

Two things were broken. First, my chunking was naive: a fixed window of characters with no overlap, splitting straight through the middle of the sentences that actually mattered. The answer to my test question lived across a chunk boundary, so neither half on its own said anything useful, and neither got retrieved with a high score. Adding overlap and splitting on headings and paragraphs instead of raw character counts fixed most of it on its own.

Second, I was trusting cosine similarity to mean relevance, and it doesn't, not reliably. Top-k by vector distance happily returns chunks that are about the same topic but don't contain the fact you asked for. The model then dutifully synthesises an answer from material that looks right and isn't. Adding a re-ranking pass over a larger candidate set, and actually printing the retrieved chunks during development so I could see what the model was being fed, turned it from guesswork into something I could debug.

The lesson that stuck: when a RAG answer is wrong, look at the context before you blame the model. Nine times out of ten the right passage was never retrieved, or was retrieved and buried under three near-misses. A model can only reason over what you put in front of it, and mine was being handed nonsense with great confidence. Fix retrieval first. The generation was never the hard part.