Step 1 · concept

Most RAG failures are retrieval failures

When a RAG system gives a wrong answer, the instinct is to blame the LLM. In practice, a large share of RAG failures are retrieval failures — the right information never reached the model. If the correct chunk is not in the context window, even the best model cannot produce the right answer.

Before tuning prompts or switching models, debug retrieval first. Production retrieval is typically two-stage, with a fast, recall-biased candidate search followed by a more precise rerank:

Query
  │
  ▼
[Vector store] ──→ Top-20 candidates (fast, approximate, recall-biased)
                       │
                       ▼
                 [Blend / rerank] ──→ Top-5 (precise, sent to the LLM)
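
The two-stage flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the brute-force cosine scan stands in for an approximate vector index, the `token_overlap` function is a toy stand-in for a real reranker, and the chunk dictionaries, vectors, and function names are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def token_overlap(query_text, chunk_text):
    """Fraction of query tokens found in the chunk (toy reranker, an assumption)."""
    q = set(query_text.lower().split())
    c = set(chunk_text.lower().split())
    return len(q & c) / len(q) if q else 0.0

def retrieve_two_stage(query_vec, query_text, chunks, n_candidates=20, n_final=5):
    # Stage 1: wide, recall-biased pass. A real system would query an
    # approximate nearest-neighbour index instead of scanning every chunk.
    candidates = sorted(
        chunks, key=lambda ch: cosine(query_vec, ch["vec"]), reverse=True
    )[:n_candidates]
    # Stage 2: narrower, more precise pass over the candidates only.
    reranked = sorted(
        candidates, key=lambda ch: token_overlap(query_text, ch["text"]), reverse=True
    )
    return reranked[:n_final]

# Hypothetical corpus with hand-made 2-d "embeddings".
chunks = [
    {"id": "a", "text": "refund policy for annual plans", "vec": [1.0, 0.0]},
    {"id": "b", "text": "refund timeline for monthly plans", "vec": [0.9, 0.1]},
    {"id": "c", "text": "office holiday schedule", "vec": [0.0, 1.0]},
]

top = retrieve_two_stage(
    [1.0, 0.0], "refund monthly plans", chunks, n_candidates=2, n_final=1
)
print([ch["id"] for ch in top])
```

In this toy corpus the vector stage ranks chunk `a` slightly above `b`, and the rerank stage promotes `b` because it matches more of the query's words. The second stage can only reorder what the first stage returned, which is why the first stage is tuned for recall.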

Pure vector search has a blind spot: it matches meaning, not words. A query containing "SKU-2847" or "CVE-2024-12345" needs the exact token, and embedding similarity alone may not reward it. Hybrid retrieval keeps vector search's semantic reach and lets a keyword score such as BM25 rescue exact matches. That's the first half of retrieval maturity.
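
One way to picture the blend is a weighted sum of a normalized vector score and a normalized keyword score. The sketch below uses a small pure-Python Okapi BM25 with commonly used defaults (`k1=1.5`, `b=0.75`) and min-max normalization. The tokenizer, the `alpha` weight, the documents, and the vector scores are all assumptions chosen for illustration; real systems often use other fusion methods.

```python
import math
import re
from collections import Counter

# Keep hyphenated identifiers such as "sku-2847" as a single token.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def tokenize(text):
    return TOKEN.findall(text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each doc for the query (minimal sketch)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def min_max(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not xs:
        return []
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """alpha weights the vector signal; (1 - alpha) weights the keyword signal."""
    v, k = min_max(vector_scores), min_max(keyword_scores)
    return [alpha * vs + (1 - alpha) * ks for vs, ks in zip(v, k)]

docs = [
    "Warehouse stock levels are updated nightly for all products.",
    "SKU-2847 is out of stock until the next shipment.",
    "Inventory availability by region and product family.",
]
query = "is SKU-2847 in stock"
vector_scores = [0.82, 0.78, 0.70]  # hypothetical dense-retriever similarities

keyword_scores = bm25_scores(query, docs)
blended = hybrid_scores(vector_scores, keyword_scores)
best = max(range(len(docs)), key=lambda i: blended[i])
print(docs[best])
```

With these made-up numbers, vector similarity alone ranks the generic stock-levels chunk first, while the blended score ranks the chunk containing the exact SKU first. Min-max normalization is sensitive to outliers in small candidate sets, so the weight and the normalization method usually need tuning against real queries.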

The second half is knowing the failure modes:

Lexical miss. The right chunk contains a rare identifier, acronym, or code that dense retrieval underweights.
Semantic drift. The retriever returns chunks about the same broad topic but the wrong sub-question.
Boundary loss. The answer spans two chunks because chunking cut the key sentence or clause in half.
Context mismatch. The right chunk exists, but the search space should first have been filtered by metadata such as freshness, tenant, language, version, or product line.
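
When a labeled example is available (a query plus the chunk known to hold the answer), a rough triage helper can suggest which failure mode to investigate first. The sketch below is a heuristic, not an established method: the "token contains a digit" rule for spotting identifiers, the order of the checks, the chunk layout, and all names are assumptions. It cannot tell semantic drift from boundary loss, so those share one label.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

def identifiers(text):
    """Tokens containing a digit, e.g. 'CVE-2024-12345'. A crude proxy for rare identifiers."""
    return {t.lower() for t in TOKEN.findall(text) if any(ch.isdigit() for ch in t)}

def matches_scope(chunk, required_meta):
    """True if the chunk's metadata equals every required key/value."""
    return all(chunk["meta"].get(k) == v for k, v in required_meta.items())

def triage(query, retrieved, gold, required_meta):
    """Return a rough label for why the gold chunk did or did not reach the context."""
    if any(c["id"] == gold["id"] for c in retrieved):
        return "retrieved"
    # Chunks from the wrong tenant, version, etc. suggest a missing pre-filter.
    if any(not matches_scope(c, required_meta) for c in retrieved):
        return "context mismatch"
    # An identifier shared by the query and the gold chunk, absent from every result.
    wanted = identifiers(query) & identifiers(gold["text"])
    found = set().union(*(identifiers(c["text"]) for c in retrieved))
    if wanted and not (wanted & found):
        return "lexical miss"
    return "semantic drift or boundary loss"

gold = {
    "id": "g1",
    "text": "CVE-2024-12345 was patched in release 4.2.",
    "meta": {"tenant": "acme"},
}
retrieved = [
    {
        "id": "r1",
        "text": "General guidance on applying security patches.",
        "meta": {"tenant": "acme"},
    },
]
print(triage("Is CVE-2024-12345 patched?", retrieved, gold, {"tenant": "acme"}))
```

The label is a starting point for debugging, not a verdict. Running a helper like this over a small set of labeled queries shows which failure mode dominates before any prompt or model changes are made.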

BM25

Your RAG system confidently tells a user that their product ID 'SKU-2847' is in stock, but the real answer lives in a chunk that uses the exact string 'SKU-2847'. The retriever returned five chunks; none contained that ID. Where is the failure?