My first RAG pipeline retrieved confidently and answered wrong


My first RAG pipeline was useless, and it was useless in the most flattering way: it answered every question fluently and confidently, and was wrong often enough to be dangerous. The whole point of retrieval-augmented generation is to ground the model in your actual documents so it stops making things up. Mine retrieved the wrong documents and then made things up about those.

The problem was chunking, almost entirely. I'd split my documents into fixed 1000-character blocks, slicing mid-sentence and mid-table, so a chunk would contain the back half of one idea and the front half of an unrelated one. The embedding of that mush represented neither idea well, so the nearest-neighbour search returned plausible-looking rubbish, and the model dutifully synthesised an answer from rubbish. Garbage in, confident garbage out.
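For contrast, here is roughly what that first chunker amounted to. This is a minimal Python sketch: `chunk_fixed` is a hypothetical name, and the 1000-character default simply mirrors the block size above. It cuts purely by position, so nothing stops a boundary landing mid-word, mid-sentence or mid-table row.

```python
def chunk_fixed(text: str, size: int = 1000) -> list[str]:
    """Naive fixed-size chunking: cut every `size` characters.

    Sentence, paragraph and table boundaries are ignored entirely,
    which is what produces chunks holding halves of unrelated ideas.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    doc = (
        "Refunds are issued within 14 days of a return. "
        "Shipping costs are non-refundable.\n\n"
        "Warranty claims require the original receipt."
    )
    # A small size makes the problem easy to see: chunks end mid-sentence.
    for chunk in chunk_fixed(doc, size=40):
        print(repr(chunk))
```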

Two changes did most of the work. First, I chunked on structure rather than character count: by heading and paragraph, with a bit of overlap so context didn't fall off a cliff at the boundaries. Second, I stopped using the cheapest embedding model I'd grabbed and moved to one actually built for retrieval, which clustered related passages far more sensibly. Suddenly the retrieved context was relevant, and a model fed relevant context is a wholly different beast.
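A minimal Python sketch of structure-aware chunking, under some loud assumptions: documents are Markdown-like, with headings on lines starting with `#` and paragraphs separated by blank lines; overlap means carrying the last whole paragraph (or paragraphs) of one chunk into the next chunk of the same section; and each chunk is prefixed with its section heading so it keeps its context. `chunk_by_structure`, `max_chars` and `overlap_paras` are illustrative names, not any library's API.

```python
import re


def _join(heading: str, paras: list[str]) -> str:
    # Prefix each chunk with its section heading so the chunk carries its context.
    parts = [heading, *paras] if heading else paras
    return "\n\n".join(parts)


def chunk_by_structure(text: str, max_chars: int = 1000, overlap_paras: int = 1) -> list[str]:
    """Split on headings, then pack whole paragraphs into chunks of roughly
    `max_chars`, repeating the last `overlap_paras` paragraphs of a chunk at
    the start of the next chunk within the same section."""
    chunks: list[str] = []
    # Zero-width split before every line that starts with '#' (assumed heading).
    for section in re.split(r"(?m)^(?=#)", text):
        lines = section.strip().splitlines()
        if not lines:
            continue
        heading = lines[0] if lines[0].startswith("#") else ""
        body = "\n".join(lines[1:] if heading else lines)
        # Paragraphs are assumed to be separated by blank lines.
        paras = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]

        current: list[str] = []
        for para in paras:
            # Flush when adding this paragraph would push the chunk past the limit.
            if current and len(_join(heading, current + [para])) > max_chars:
                chunks.append(_join(heading, current))
                # Carry the tail of the previous chunk forward as overlap.
                current = current[-overlap_paras:] if overlap_paras > 0 else []
            current.append(para)
        if current:
            chunks.append(_join(heading, current))
    return chunks
```

The unit of splitting is the paragraph, never a character offset, so a boundary cannot land mid-sentence, and a table with no blank lines inside it stays in one piece. The trade-off is that a single paragraph longer than `max_chars` still becomes one oversized chunk; a real pipeline would need a fallback split for that case, for example by sentence.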

The lesson I keep relearning: in RAG, the model is rarely the weak link. Retrieval is. Get the chunks right before you blame the LLM.
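A cheap habit that follows from this: before judging an answer, look at what retrieval actually handed the model. Below is a minimal Python sketch of brute-force nearest-neighbour search by cosine similarity. The `toy_embed` bag-of-words function is a hypothetical stand-in so the example runs without dependencies; it is not a retrieval embedding model, and in practice you would pass in whichever embedding model your pipeline uses.

```python
import math
import re
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: word counts only.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3,
          embed: Callable[[str], Counter] = toy_embed) -> list[tuple[float, str]]:
    # Brute-force nearest-neighbour search: score every chunk, keep the best k.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    chunks = [
        "Refunds are issued within 14 days.",
        "Warranty claims require the original receipt.",
        "Shipping costs are non-refundable.",
    ]
    # Eyeball what the model would be given before blaming the model.
    for score, chunk in top_k("How many days until refunds are issued?", chunks, k=2):
        print(f"{score:.2f}  {chunk}")
```

Scoring every chunk is fine for a handful of documents; at scale you would use an approximate nearest-neighbour index instead, but the habit of reading the retrieved chunks first is the same.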