AI Learning

Step 1 · concept

Retrieval is usually a two-stage system

Dense retrieval is fast and good at recall, surfacing semantically related candidates. But its top-5 is often noisy: the right chunk is in the candidate pool, just not ranked first. Reordering that pool is exactly the job of reranking.

Production retrieval usually looks like this:

query
  │
  ▼
[dense + lexical retrieval]  -> top-20 or top-50 candidates
  │
  ▼
[reranker / cross-encoder]   -> top-5 sent to the LLM

The first stage is recall-biased: do not miss the good chunk. The second stage is precision-biased: put the best chunk first and push distractors down.
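The two stages can be sketched in a few lines of Python under toy assumptions. `first_stage` ranks every document by cosine similarity of hand-made two-dimensional vectors, which stand in for real embeddings. The lexical half of the first stage is left out. `toy_pair_score` is a hypothetical word-overlap scorer standing in for a cross-encoder. A real cross-encoder is a model that reads the query and the chunk together and outputs one relevance score. All function names and data here are illustrative and are not any library's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def first_stage(query_vec, doc_vecs, k):
    """Recall-biased stage: rank all docs by embedding similarity, keep top-k ids."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

def toy_pair_score(query, text):
    """Hypothetical cross-encoder stand-in: share of query words found in the text."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def rerank(query, candidate_ids, docs, score_pair, n):
    """Precision-biased stage: score each (query, candidate) pair, keep top-n ids."""
    ranked = sorted(candidate_ids, key=lambda d: score_pair(query, docs[d]), reverse=True)
    return ranked[:n]

# Toy data: the 2-d vectors are hand-made, not produced by a real embedding model.
docs = {
    "a": "reranking reorders retrieved candidates",
    "b": "dense retrieval finds related chunks",
    "c": "cross encoder scores query and chunk together",
    "d": "cooking pasta at home",
}
doc_vecs = {"a": [0.9, 0.1], "b": [1.0, 0.0], "c": [0.8, 0.3], "d": [0.0, 1.0]}
query = "how does a cross encoder score a query and chunk"
query_vec = [1.0, 0.0]

candidates = first_stage(query_vec, doc_vecs, k=3)            # ['b', 'a', 'c']
final = rerank(query, candidates, docs, toy_pair_score, n=2)  # ['c', 'b']
print(candidates, final)
```

The best chunk `c` comes back third from the first stage and first after reranking. The shape is the point: the pairwise scorer, which in practice means one model call per (query, chunk) pair, only runs on the k candidates and never on the whole corpus.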


Your retriever has hit@20 = 95% but precision@5 = 35%. The right chunk is usually in the candidate pool, but users still get noisy context windows. What is the right next move?
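The question uses hit@k and precision@k without defining them. This sketch assumes the common definitions. Hit@k asks whether any relevant chunk landed in the top k. Precision@k asks what fraction of the top k slots hold a relevant chunk. Both are averaged over queries. The minimal Python sketch below shows how a retriever can score well on the first and poorly on the second. The two queries and their ids are made up for illustration.

```python
def hit_at_k(ranked, relevant, k):
    """1.0 if at least one relevant id is in the top-k, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranked[:k]) else 0.0

def precision_at_k(ranked, relevant, k):
    """Share of the k top slots holding a relevant id (divides by k even if fewer results)."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k

def mean_metric(metric, runs, k):
    """Average a per-query metric over (ranked_ids, relevant_ids) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# Two hypothetical queries: the relevant chunk is retrieved both times,
# but it is ranked 2nd for the first query and 6th for the second.
runs = [
    (["x1", "r1", "x2", "x3", "x4", "x5"], {"r1"}),
    (["y1", "y2", "y3", "y4", "y5", "r2"], {"r2"}),
]
print(mean_metric(hit_at_k, runs, 20))       # 1.0
print(mean_metric(hit_at_k, runs, 5))        # 0.5
print(mean_metric(precision_at_k, runs, 5))  # 0.1
```

Both relevant chunks are in the top-20 pool, yet the top-5 windows are mostly distractors. This is the pattern the question describes.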