Lesson 7 of 8

Rerank candidates with cross-encoders and reciprocal rank fusion

Turn high-recall retrieval into high-precision context: fuse dense and lexical rankings with RRF, rerank the top candidates, and decide when reranking is worth the latency.


Step 1 · concept

Retrieval is usually a two-stage system

Dense retrieval is fast and good at recall: it finds semantically related candidates. But the top 5 is often noisy. The right chunk is present, just not ranked first. Fixing that order is exactly the job of reranking.

Production retrieval usually looks like this:

query
  │
  ▼
[dense + lexical retrieval]  -> top-20 or top-50 candidates
  │
  ▼
[reranker / cross-encoder]   -> top-5 sent to the LLM

The first stage is recall-biased: do not miss the good chunk. The second stage is precision-biased: put the best chunk first and push distractors down.
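The two stages can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation: `rrf_fuse` and `rerank` are hypothetical helper names, `k = 60` is a commonly used RRF constant rather than a tuned value, and `toy_scores` stands in for a real cross-encoder that would score each (query, chunk) pair.

```python
from collections import defaultdict
from typing import Callable


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> list[str]:
    """Stage 2: rescore every candidate against the query and keep the best top_n."""
    scored = [(score_fn(query, doc_id), doc_id) for doc_id in candidates]
    # Stable sort: candidates with equal scores keep their fused order.
    scored.sort(key=lambda pair: -pair[0])
    return [doc_id for _, doc_id in scored[:top_n]]


# Stage 1 (recall-biased): two hypothetical rankings for the same query.
dense = ["d3", "d1", "d7", "d2"]
lexical = ["d1", "d9", "d3", "d4"]
fused = rrf_fuse([dense, lexical])

# Stage 2 (precision-biased): made-up relevance scores standing in for a
# cross-encoder. Unlisted chunks score 0.0.
toy_scores = {"d7": 0.9, "d1": 0.7, "d3": 0.2}
top = rerank("example query", fused, lambda q, d: toy_scores.get(d, 0.0), top_n=3)

print(fused)  # ['d1', 'd3', 'd9', 'd7', 'd2', 'd4']
print(top)    # ['d7', 'd1', 'd3']
```

Note how `d7` sits fourth after fusion but first after reranking: fusion only needs to keep it in the pool, and the second stage is what moves it to the top.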

Your retriever has hit@20 = 95% but precision@5 = 35%. The right chunk is usually in the candidate pool, but users still get noisy context windows. What is the right next move?
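For reference, the two metrics in the question can be computed per query as in the sketch below and then averaged over an evaluation set. The function names and the sample ids are illustrative assumptions, and precision here divides by k even when fewer than k results are returned.

```python
def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if at least one relevant chunk appears in the top k, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranked[:k]) else 0.0


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return sum(doc_id in relevant for doc_id in ranked[:k]) / k


# Hypothetical single-query example: one relevant chunk, ranked third.
ranked = ["d4", "d9", "d1", "d2", "d7"]
relevant = {"d1", "d8"}

print(hit_at_k(ranked, relevant, 5))        # 1.0
print(precision_at_k(ranked, relevant, 5))  # 0.2
print(hit_at_k(ranked, relevant, 2))        # 0.0
```

A high hit@k with a low precision@k means the relevant chunk is being found but is surrounded by distractors.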