Lesson 3 of 8

From in-memory nearest-neighbour to pgvector at scale

Index and query embeddings in Postgres with pgvector — the same extension that powers production RAG at scale — and compare it to a naive in-memory baseline.

Step 1 · concept

What a vector store buys you

A vector store holds embeddings and answers one specific question fast: "give me the top-k vectors closest to this query vector." Conventional database queries are built around exact matches (WHERE id = 42). Vector stores answer similarity queries, ranking rows by how close they are to the query.

You could skip the database entirely and keep vectors in a JavaScript array. Brute-force cosine similarity works fine for under 10,000 vectors. Beyond that, four walls hit you at once:

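  • Memory. 1M vectors × 1536 dims × 4 bytes ≈ 6 GB of raw float32 data, before any overhead. That is more than Node's default heap limit allows, and a lot of RAM to pin in one process even with typed arrays.
  • Speed. Brute-force is O(n) per query: a linear scan of every vector, so latency grows in step with the collection.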
  • Persistence. When the process restarts, your index is gone.
  • Metadata. You want to filter by source, date, or section alongside similarity.
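
To make the baseline concrete, here is a minimal sketch of the in-memory approach and the arithmetic behind the memory bullet. The Doc shape and the function names are illustrative assumptions, not part of any library.

```typescript
// Hypothetical record shape: an id plus its embedding.
interface Doc {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force top-k: score every document, sort by score, keep the best k.
// Every query touches every vector, which is the O(n) scan described above.
function topK(docs: Doc[], query: number[], k: number): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(doc.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Raw float32 storage cost in bytes, ignoring object and array overhead.
function float32Bytes(vectorCount: number, dims: number): number {
  return vectorCount * dims * 4;
}
```

Calling float32Bytes(1000000, 1536) gives 6,144,000,000 bytes, the ≈ 6 GB quoted in the list. JavaScript numbers are 64-bit doubles, so a plain number[] is likely to cost about twice that unless you pack the data into a Float32Array.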

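A vector store solves all four: vectors live on disk, survive restarts, sit next to their metadata columns, and can be searched through an approximate index instead of a full scan. pgvector adds a fifth win: it runs inside the Postgres you already operate, with no extra service, no extra auth, and no extra backup pipeline.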
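
As a rough sketch of what similarity plus metadata filtering looks like with pgvector, the helper below builds a parameterized top-k query. It assumes the pgvector extension is installed and a hypothetical documents table with id, source, and embedding columns. The function names are illustrative. The <=> operator is pgvector's cosine distance, so smaller values mean closer vectors.

```typescript
// Query shape accepted by node-postgres: client.query({ text, values }).
interface QueryConfig {
  text: string;
  values: unknown[];
}

// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Builds a top-k cosine-distance query with an optional metadata filter.
// The table "documents" and its columns are assumptions for illustration.
function buildTopKQuery(embedding: number[], k: number, source?: string): QueryConfig {
  const values: unknown[] = [toVectorLiteral(embedding), k];
  let where = "";
  if (source !== undefined) {
    values.push(source);
    where = `WHERE source = $${values.length} `;
  }
  const text =
    "SELECT id, source, embedding <=> $1::vector AS distance " +
    "FROM documents " +
    where +
    "ORDER BY embedding <=> $1::vector " +
    "LIMIT $2";
  return { text, values };
}
```

The returned object can be passed to a node-postgres client, for example client.query(buildTopKQuery(queryEmbedding, 5, "faq")). Without an index pgvector scans every row and returns exact results. Adding an HNSW or IVFFlat index on the embedding column trades a little recall for much faster queries.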

You're prototyping RAG with 5,000 FAQ entries. A teammate insists you need Pinecone before writing any code. What's the fastest path to a working prototype?