RAG — From Embeddings to Answers

Free

Module 1 of the AI Engineer path: from vector representations through grounded answers — Retrieval-Augmented Generation end-to-end.

What you'll build

  • Embed and rank documents with OpenAI embeddings
  • Advanced chunking & metadata filtering
  • pgvector at scale — ANN indexes & hybrid search
  • Grounded answers with citations & RAGAS eval

Lessons

  1. Embed and compare your first documents

    Build a TypeScript script that turns text into vectors using OpenAI's text-embedding-3-small model, then ranks 10 documents by cosine similarity.
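
    A minimal sketch of the ranking step, assuming the embeddings have already been fetched as plain number arrays. The 3-dimensional vectors are toy stand-ins for real model output, and the names `cosineSimilarity` and `rankBySimilarity` are hypothetical, not part of any SDK.

```typescript
// Hypothetical helper names; the vectors below are toy stand-ins for real embeddings.
type Embedded = { id: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and sort best-first.
function rankBySimilarity(query: number[], docs: Embedded[]): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
}

const docs: Embedded[] = [
  { id: "a", vector: [1, 0, 0] },
  { id: "b", vector: [0, 1, 0] },
  { id: "c", vector: [1, 1, 0] },
];
const ranked = rankBySimilarity([1, 0, 0], docs);
console.log(ranked.map((r) => r.id)); // [ 'a', 'c', 'b' ]
```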

  2. Chunk long documents without losing meaning

    Implement three chunking strategies — fixed-window with overlap, recursive paragraph-aware splitting, and sentence-aware chunking — and see how chunk boundaries affect retrieval quality downstream.
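
    The first of the three strategies, fixed-window with overlap, might look roughly like this. The sketch counts characters for simplicity; a real pipeline would more likely count tokens. The function name `chunkFixedWindow` is hypothetical.

```typescript
// Hypothetical helper: character-based fixed windows that overlap by `overlap` characters.
function chunkFixedWindow(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) throw new Error("overlap must be in [0, size)");
  const chunks: string[] = [];
  const step = size - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}

const chunks = chunkFixedWindow("abcdefghij", 4, 2);
console.log(chunks); // [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

    Each chunk repeats the last two characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.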

  3. From in-memory nearest-neighbour to pgvector at scale

    Index and query embeddings in Postgres with pgvector, an extension widely used for production RAG, and compare it to a naive in-memory baseline.

  4. Retrieve with hybrid dense + BM25 and metadata filters

    Build a retriever that combines vector similarity with lexical BM25 scoring, a common production pattern that often beats either approach alone, and shrink the candidate set up front with metadata filters.
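
    For the lexical half of the retriever, a compact BM25 scorer could be sketched as follows. It assumes a naive lowercase alphanumeric tokenizer, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)). The names `tokenize` and `bm25Scores` are hypothetical.

```typescript
// Hypothetical helpers; the tokenizer is deliberately naive (ASCII letters and digits only).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Returns one BM25 score per document, in the same order as `docs`.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map((d) => tokenize(d));
  const n = tokenized.length;
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(n, 1);

  // Document frequency: how many documents contain each term at least once.
  const docFreq = new Map<string, number>();
  for (const tokens of tokenized) {
    const uniqueTerms = tokens.filter((t, i) => tokens.indexOf(t) === i);
    for (const term of uniqueTerms) docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
  }

  const queryTerms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);
  return tokenized.map((tokens) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = docFreq.get(term) ?? 0;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - b + (b * tokens.length) / avgLen;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

const corpus = [
  "pgvector adds vector search to Postgres",
  "BM25 ranks documents by term frequency",
  "hybrid search mixes BM25 with vector similarity",
];
const scores = bm25Scores("bm25 term", corpus);
console.log(scores);
```

    The second document matches both query terms and scores highest; the first matches neither and scores zero.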

  5. Ground answers in sources with citations and 'I don't know'

    Complete the RAG pipeline: generate answers that cite their sources, answer 'I don't know' when retrieval is thin instead of hallucinating, and surface provenance so readers can verify the output.
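
    One way to approach the grounding step is to build the prompt only from passages that clear a relevance threshold and to skip generation when none do. The threshold value, the prompt wording, and the name `buildGroundedPrompt` are all assumptions made for illustration, not the lesson's actual implementation.

```typescript
type Retrieved = { id: string; text: string; score: number };

// Hypothetical helper. Returns null when retrieval is too thin to answer from,
// so the caller can reply "I don't know" without calling the model.
function buildGroundedPrompt(
  question: string,
  hits: Retrieved[],
  minScore = 0.3 // assumed cutoff; tune against your own eval set
): string | null {
  const usable = hits.filter((h) => h.score >= minScore);
  if (usable.length === 0) return null;

  // Number each source so the model can cite it as [1], [2], ...
  const sources = usable.map((h, i) => `[${i + 1}] (${h.id}) ${h.text}`).join("\n");
  return [
    "Answer using only the sources below. Cite each claim as [n].",
    'If the sources do not contain the answer, reply "I don\'t know".',
    "",
    "Sources:",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildGroundedPrompt("What is pgvector?", [
  { id: "docs/pgvector.md", text: "pgvector is a Postgres extension for vector search.", score: 0.82 },
  { id: "docs/misc.md", text: "Unrelated note.", score: 0.05 },
]);
console.log(prompt ?? "I don't know");
```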

  6. Evaluate your RAG system with precision@k and faithfulness

    Stop guessing whether your RAG system is good. Build a golden-Q&A eval harness that measures retrieval precision, answer faithfulness, and answer quality — the difference between 'I built it' and 'I ship it'.
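
    As an illustration of the retrieval metric, precision@k is the fraction of the top k retrieved documents that appear in the golden set for a question. The sketch below assumes document ids are unique within a result list; the name `precisionAtK` is hypothetical.

```typescript
// Hypothetical helper. Divides by k even when fewer than k results came back,
// which is the usual convention for precision@k.
function precisionAtK(retrieved: string[], relevant: string[], k: number): number {
  if (k <= 0) throw new Error("k must be positive");
  const topK = retrieved.slice(0, k);
  const hits = topK.filter((id) => relevant.indexOf(id) !== -1).length;
  return hits / k;
}

const retrieved = ["d1", "d7", "d3", "d9"]; // ranked retriever output
const relevant = ["d1", "d3", "d4"]; // golden labels for this question
const p3 = precisionAtK(retrieved, relevant, 3);
console.log(p3.toFixed(3)); // 0.667
```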

  7. Rerank candidates with cross-encoders and reciprocal rank fusion

    Turn high-recall retrieval into high-precision context: fuse dense and lexical rankings with RRF, rerank the top candidates, and decide when reranking is worth the latency.
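
    The fusion step can be sketched in a few lines. Reciprocal rank fusion gives each document 1 / (k + rank) for every list it appears in and sums the contributions; k = 60 is a commonly used constant. The name `reciprocalRankFusion` is hypothetical.

```typescript
// Hypothetical helper. Each ranking is a list of document ids, best first.
function reciprocalRankFusion(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  const fused: { id: string; score: number }[] = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  return fused.sort((a, b) => b.score - a.score);
}

const dense = ["a", "b", "c"]; // ids ranked by vector similarity
const lexical = ["c", "a", "d"]; // ids ranked by BM25
const fused = reciprocalRankFusion([dense, lexical]);
console.log(fused.map((f) => f.id)); // [ 'a', 'c', 'b', 'd' ]
```

    Because RRF uses only ranks, it needs no score normalization between the dense and lexical retrievers. A cross-encoder reranker would then rescore just the top few fused candidates.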

  8. Metadata filtering & hybrid search

    Vectors aren't everything. Learn how to combine semantic search with hard filters (date, user_id, category) and weigh the trade-offs of pre-filtering versus post-filtering in vector indexes.
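
    A brute-force sketch of the pre- versus post-filtering difference, assuming similarity scores are already computed. It shows only the effect on result counts; in a real ANN index the choice also affects traversal cost. The type and function names are hypothetical.

```typescript
// Hypothetical types; `score` is a precomputed similarity to the query.
type Doc = { id: string; category: string; score: number };

function topK(docs: Doc[], k: number): Doc[] {
  return docs.slice().sort((a, b) => b.score - a.score).slice(0, k);
}

// Pre-filtering: apply the hard filter first, then take the k best matches.
function preFilter(docs: Doc[], category: string, k: number): Doc[] {
  return topK(docs.filter((d) => d.category === category), k);
}

// Post-filtering: take the k best matches overall, then drop the ones that fail the filter.
function postFilter(docs: Doc[], category: string, k: number): Doc[] {
  return topK(docs, k).filter((d) => d.category === category);
}

const all: Doc[] = [
  { id: "a", category: "news", score: 0.9 },
  { id: "b", category: "news", score: 0.8 },
  { id: "c", category: "blog", score: 0.7 },
  { id: "d", category: "blog", score: 0.6 },
  { id: "e", category: "news", score: 0.5 },
];
const pre = preFilter(all, "blog", 2);
const post = postFilter(all, "blog", 2);
console.log(pre.map((d) => d.id), post.map((d) => d.id)); // [ 'c', 'd' ] []
```

    Post-filtering returns nothing here because both of the top two results are in the wrong category, which is why it is often paired with over-fetching.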