My first RAG pipeline retrieved confidently wrong things

My first RAG build did exactly what the tutorials promised and was completely useless. It embedded a pile of internal docs, stuffed the top matches into the prompt, and answered questions with total confidence and frequent wrongness. The model wasn't the problem. The retrieval was.

The first mistake was chunking by a fixed character count. I split everything into 1000-character blocks with no overlap, which meant half my chunks started or ended mid-sentence, and the one paragraph that actually answered a question was routinely sawn in two across a boundary. Neither half retrieved well, because neither half was a coherent thought.
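A minimal sketch of that failure mode, not the original pipeline: the function name `chunk_fixed`, the sample text, and the 60-character size (scaled down from 1000 so the effect shows on a short string) are all illustrative assumptions.

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Split text into consecutive blocks of `size` characters, with no overlap."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical sample document; the middle sentence is the one that answers
# a question about certificate renewal.
answer = "To renew a certificate, run the renewal job before expiry."
doc = (
    "Certificates expire after 90 days. "
    + answer
    + " Subscriptions renew automatically each month."
)

chunks = chunk_fixed(doc, 60)

# The boundary at character 60 falls inside the answer sentence,
# so no single chunk contains it whole.
answer_survives = any(answer in chunk for chunk in chunks)

for chunk in chunks:
    print(repr(chunk))
print("answer sentence intact in one chunk:", answer_survives)
```

Nothing is lost overall, since joining the chunks reproduces the document, but the retriever only ever sees one chunk at a time, and each half of the answer is a fragment.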

The second was trusting cosine similarity to mean relevance. It doesn't, quite. A chunk that shares vocabulary with the question scores well even when it says nothing useful, so a query about "renewing a certificate" happily pulled back a chunk about renewing a subscription because the words rhymed in embedding space. With no re-ranking and no threshold, I fed the model four near-misses and asked it to be clever. It obliged by inventing the bit that wasn't there.
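To see why shared vocabulary inflates the score, here is cosine similarity computed by hand. The big assumption: word-count vectors stand in for real embeddings, which is far cruder than any embedding model, so this only illustrates the formula and the shape of the problem. The example strings are made up.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), or 0.0 for an empty vector."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = bag_of_words("renewing a certificate")
near_miss = bag_of_words("renewing a subscription")           # wrong topic, shared words
useful = bag_of_words("rotate the tls cert before it expires")  # right topic, no shared words

near_miss_score = cosine(query, near_miss)
useful_score = cosine(query, useful)

print("near miss:", round(near_miss_score, 3))
print("useful:   ", round(useful_score, 3))
```

The score measures how closely two vectors point the same way, nothing more. Whether the chunk answers the question is a separate judgment, which is the job a re-ranker or a threshold takes on.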

The fix wasn't anything fancy. Chunk on semantic boundaries with overlap, add a cheap re-ranker after the vector search, and set a score floor below which I retrieve nothing and let the model say it doesn't know. The lesson I keep relearning: RAG is a search problem wearing an AI hat, and if your search is bad, a better model just lies more fluently.