Rerank candidates with cross-encoders and reciprocal rank fusion
Turn high-recall retrieval into high-precision context: fuse dense and lexical rankings with RRF, rerank the top candidates, and decide when reranking is worth the latency.
Step 1 · concept
Retrieval is usually a two-stage system
Dense retrieval is fast and recall-oriented: it finds semantically related candidates. But its top-5 is often noisy. The right chunk is in the candidate list, just not ranked first. Fixing that ordering is exactly the job of reranking.
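To make the recall-versus-ranking distinction concrete, here is a minimal Python sketch that scores one hypothetical ranked list in two ways. hit@k asks whether the gold chunk appears anywhere in the top-k. Reciprocal rank asks how close to the top it appears. The chunk IDs and helper names are illustrative and do not come from any library.

```python
def first_relevant_rank(ranked_ids, relevant_ids):
    """Return the 1-based rank of the first relevant chunk, or None if absent."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id in relevant_ids:
            return rank
    return None


def hit_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant chunk appears in the top-k, else 0.0 (a recall-style check)."""
    return 1.0 if any(c in relevant_ids for c in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant chunk; rewards putting it near the top."""
    rank = first_relevant_rank(ranked_ids, relevant_ids)
    return 0.0 if rank is None else 1.0 / rank


# Hypothetical dense-retrieval output: the gold chunk "c7" is retrieved, but at rank 4.
dense_top = ["c3", "c9", "c1", "c7", "c2", "c5"]
gold = {"c7"}

print(hit_at_k(dense_top, gold, k=5))    # 1.0 -> recall is fine
print(hit_at_k(dense_top, gold, k=1))    # 0.0 -> not ranked first
print(reciprocal_rank(dense_top, gold))  # 0.25
```

A retriever can look perfect on hit@5 and still score poorly on reciprocal rank. Reranking targets the second number.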
Production retrieval usually looks like this:
query
│
▼
[dense + lexical retrieval] -> top-20 or top-50 candidates
│
▼
[reranker / cross-encoder] -> top-5 sent to the LLM
The first stage is recall-biased: do not miss the good chunk. The second stage is precision-biased: put the best chunk first and push distractors down.
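The pipeline above can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions. `rrf_fuse` implements reciprocal rank fusion with k = 60, a commonly used default. The candidate and final sizes mirror the diagram's top-20 and top-5. `toy_overlap_score` is a hypothetical stand-in for a cross-encoder. A real cross-encoder reads the query and the chunk together and outputs a relevance score. With sentence-transformers, for example, you would pass (query, chunk) pairs to `CrossEncoder.predict` instead.

```python
from collections import defaultdict
from typing import Callable, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def two_stage_retrieve(
    query: str,
    dense_ranking: Sequence[str],
    lexical_ranking: Sequence[str],
    docs: dict[str, str],
    score_pair: Callable[[str, str], float],
    n_candidates: int = 20,
    n_final: int = 5,
) -> list[str]:
    # Stage 1 (recall): fuse both rankings and keep a generous candidate pool.
    candidates = rrf_fuse([dense_ranking, lexical_ranking])[:n_candidates]
    # Stage 2 (precision): score each (query, chunk) pair jointly and reorder.
    rescored = sorted(candidates, key=lambda d: score_pair(query, docs[d]), reverse=True)
    return rescored[:n_final]


def toy_overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query terms found in the chunk."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(text.lower().split())) / max(len(q_terms), 1)


# Toy corpus and rankings (illustrative data only).
docs = {
    "a": "pricing tiers for the enterprise plan",
    "b": "how to rotate api keys safely",
    "c": "api key rotation schedule and expiry policy",
    "d": "team onboarding checklist",
}
dense = ["a", "c", "d", "b"]
lexical = ["b", "c", "a"]

print(rrf_fuse([dense, lexical]))  # ['a', 'c', 'b', 'd']
print(two_stage_retrieve("rotate api keys", dense, lexical, docs, toy_overlap_score, n_final=2))
# ['b', 'c']
```

In the toy data, RRF ranks chunk `b` third because the dense retriever placed it last. The pairwise scorer sees the query and the chunk together and promotes `b` to first. The reranker can only reorder what stage 1 returns, which is why the first stage is tuned for recall.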
cross-encoder