AI Learning

Step 1 · concept

Text becomes geometry

An embedding is a numerical representation of text as a vector — a list of floating-point numbers that live in a high-dimensional space. When you embed a sentence, you project its meaning into a coordinate system where similar meanings cluster together.

Think of it like a map. On a real map, cities that are close geographically sit near each other. In embedding space, sentences that are close semantically sit near each other — even if they use completely different words.

"The cat sat on the mat"     → [0.12, -0.34, 0.56, ...]  (1536 dims)
"A kitten rested on a rug"   → [0.11, -0.33, 0.55, ...]  (nearly identical)
"Stock prices rose sharply"  → [0.89,  0.12, -0.67, ...]  (very different)

The embedding model learned this mapping through contrastive training on billions of text pairs. By the time you call the API, it already knows that cat and kitten mean nearly the same thing — and that sat and rose do not.

cosine similarity

Which pair best demonstrates embeddings capturing similar meaning even when the wording changes?