Retrieve with hybrid dense + BM25 and metadata filters
Build a retriever that combines vector similarity with lexical BM25 scoring, a common production pattern that often outperforms either approach alone, and narrow the candidate set up front with metadata filters.
Most RAG failures are retrieval failures
When a RAG system gives a wrong answer, the instinct is to blame the LLM. In practice, a large share of RAG failures are retrieval failures — the right information never reached the model. If the correct chunk is not in the context window, even the best model cannot produce the right answer.
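One way to check whether the right chunk ever reaches the model is to measure a hit rate over a small labeled set of queries. The sketch below is illustrative only: the function name `hit_rate_at_k`, the query ids, and the chunk ids are all hypothetical, and it assumes you already have ranked chunk ids per query plus a hand-labeled set of relevant ids.

```python
def hit_rate_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose top-k retrieved ids include at least one relevant id."""
    hits = 0
    for query_id, ranked_ids in retrieved.items():
        # A query counts as a hit if any labeled-relevant chunk is in its top k.
        if set(ranked_ids[:k]) & relevant[query_id]:
            hits += 1
    return hits / len(retrieved)


# Hypothetical retriever output: ranked chunk ids per query.
retrieved = {
    "q1": ["c3", "c7", "c1"],
    "q2": ["c9", "c4", "c2"],
}
# Hypothetical labels: which chunk ids actually answer each query.
relevant = {
    "q1": {"c1"},
    "q2": {"c5"},
}

print(hit_rate_at_k(retrieved, relevant, k=3))
```

A low hit rate on a set like this points at retrieval rather than the prompt or the model, because no prompt change can recover a chunk that was never retrieved.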
Before tuning prompts or switching models, always debug retrieval first. Production retrieval is almost always two-stage:
Query
│
▼
[Vector store] ──→ Top-20 candidates (fast, approximate, recall-biased)
│
▼
[Blend / rerank] ──→ Top-5 (precise, sent to the LLM)
Pure vector search has a blind spot: it matches meaning, not exact words. A query containing "SKU-2847" or "CVE-2024-12345" needs the exact token to match. Hybrid retrieval keeps the semantic reach of vector search and lets a keyword score rescue exact matches. That's the first half of retrieval maturity.
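The blend can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the BM25 function is a small hand-written Okapi variant, the 2-dimensional "embeddings" are made-up numbers standing in for a real embedding model, and the min-max normalization with an `alpha` weight is just one of several reasonable ways to combine the two scores (reciprocal rank fusion is another). All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep hyphenated identifiers such as "sku-2847" as one token.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def min_max(values):
    # Rescale to [0, 1] so dense and lexical scores are comparable.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5, top_k=3):
    """Rank docs by alpha * dense + (1 - alpha) * BM25, both min-max normalized."""
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    lexical = min_max(bm25_scores(query, docs))
    blended = [alpha * d + (1 - alpha) * l for d, l in zip(dense, lexical)]
    order = sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)
    return [(i, blended[i]) for i in order[:top_k]]


docs = [
    "Return policy for damaged items and refunds",
    "SKU-2847 replacement filter cartridge, ships in two days",
    "How to request a refund for a late delivery",
]
# Made-up vectors standing in for real embeddings (assumption for illustration).
doc_vecs = [[0.9, 0.1], [0.7, 0.7], [0.1, 0.9]]
query = "Is SKU-2847 in stock?"
query_vec = [1.0, 0.0]

dense_only = hybrid_rank(query, query_vec, docs, doc_vecs, alpha=1.0)
blended = hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5)
print([i for i, _ in dense_only])  # document 0 ranks first on dense score alone
print([i for i, _ in blended])     # document 1 (the SKU match) ranks first after blending
```

With these toy numbers the dense score alone prefers the return-policy chunk, while the blended score promotes the chunk that contains the exact identifier. In a real system `alpha` would be tuned on labeled queries rather than fixed at 0.5.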
The second half is knowing the failure modes:
- Lexical miss. The right chunk contains a rare identifier, acronym, or code that dense retrieval underweights.
- Semantic drift. The retriever returns chunks about the same broad topic but the wrong sub-question.
- Boundary loss. The answer spans two chunks because chunking cut the key sentence or clause in half.
- Context mismatch. The right chunk exists, but freshness, tenant, language, version, or product-line metadata should have filtered the search space first.
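The context-mismatch case is the one a metadata filter addresses directly: restrict the candidate set before any scoring happens. The sketch below assumes each chunk carries a plain metadata dictionary; the `Chunk` class, the field names (`tenant`, `language`, `version`), and the sample data are hypothetical. Most vector stores expose their own filter syntax, which would replace this in practice.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def matches(chunk, filters):
    """True when every filter key matches the chunk's metadata.

    A filter value may be a single value (exact match) or a
    set/list/tuple of allowed values.
    """
    for key, allowed in filters.items():
        value = chunk.metadata.get(key)
        if isinstance(allowed, (set, frozenset, list, tuple)):
            if value not in allowed:
                return False
        elif value != allowed:
            return False
    return True


def prefilter(chunks, filters):
    # Shrink the search space first; dense and BM25 scoring then run on the survivors.
    return [c for c in chunks if matches(c, filters)]


chunks = [
    Chunk("a", "Reset steps for v2 devices", {"tenant": "acme", "language": "en", "version": "2"}),
    Chunk("b", "Reset steps for v1 devices", {"tenant": "acme", "language": "en", "version": "1"}),
    Chunk("c", "Pasos de reinicio para v2", {"tenant": "acme", "language": "es", "version": "2"}),
    Chunk("d", "Reset steps for v2 devices", {"tenant": "globex", "language": "en", "version": "2"}),
]
filters = {"tenant": "acme", "language": {"en"}, "version": {"2", "3"}}

candidates = prefilter(chunks, filters)
print([c.chunk_id for c in candidates])
```

Filtering before scoring keeps chunks from the wrong tenant, language, or version out of the top-k entirely, instead of relying on the ranker to push them down.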