Cosine Similarity

Also known as: cosine score, cosine distance

TL;DR

Cosine similarity is the cosine of the angle between two vectors — equivalently, their dot product divided by the product of their magnitudes. It's the standard way to compare embedding vectors for relevance.

Cosine similarity measures how aligned two vectors are, ignoring their magnitudes. Two vectors pointing in the same direction get a score of 1; perpendicular vectors get 0; opposite directions get -1.

The formula:

cos(a, b) = (a · b) / (||a|| × ||b||)

Where is the dot product (sum of pairwise products) and is the 2-norm (Euclidean length) of the vector.

Why it’s the default for embeddings

Embedding models are typically trained so that direction in vector space carries the semantic meaning. Magnitude can drift across inputs (longer texts, different domains) and isn’t a reliable relevance signal. By dividing out the magnitudes, cosine similarity isolates the directional component — which is what the model was actually trained to align.

Cosine similarity only works because trained embedding models actively fight the high-dimensional geometry. Without contrastive training, every pair of random vectors sits at cosine zero.

The unit-vector shortcut

If you normalize all embeddings to unit length offline (), then cosine similarity collapses to a plain dot product:

cos(a, b) = a · b   (when ||a|| = ||b|| = 1)

This is why most embedding indexes store unit-normalized vectors: the inner loop is just a SIMD dot product, no division needed. Approximate-nearest-neighbor libraries (FAISS, HNSW) lean on this aggressively for performance.

Cosine vs dot vs Euclidean

In practice, all three are used:

Similarity functions in production

Cosine — direction-only, magnitude-invariant. Default for text embeddings.
Dot product — direction and magnitude. Used when magnitude is meaningful (e.g., some retrieval fine-tunings deliberately encode confidence in magnitude).
Euclidean (L2) distance — straight-line distance. Equivalent to cosine when vectors are unit-normalized.

For unit-normalized embeddings, all three are mathematically equivalent up to a monotonic transformation, so they produce the same ranking. The choice usually comes down to what your vector index supports natively.

Cosine requires a division by the product of two L2 norms. A division is much more expensive than a fused multiply-add on every modern CPU and GPU — and an ANN index does the inner-loop comparison billions of times during a single search.

If you normalize all stored vectors at index time (), the magnitude of every stored vector is exactly 1. At query time you normalize the query vector once and the inner-product with any indexed vector becomes the cosine — no per-pair division.

The cost of this shortcut is that the index has to commit to one similarity convention. If your downstream code wants Euclidean distance, you have to convert: for unit vectors, , so cosine and Euclidean produce the same ranking. Most ANN libraries (FAISS, HNSWlib, ScaNN) expose both conventions transparently and pick the right kernel internally.

Scoring conventions

A “cosine similarity” of 1 means perfectly aligned. Some libraries report “cosine distance” as , where 0 means perfectly aligned. Read the docs of whichever index you’re using to know which convention is in play.

Go further

What does this look like in high dimensions?

Counter-intuitively, two random unit vectors in 1024+ dimensions are almost always nearly orthogonal — cosine similarity sharply concentrates around 0. So the 'high cosine = relevant' signal only works because trained embedding models actively fight the geometry.

Curse of dimensionality Orthogonality concentration Johnson-Lindenstrauss lemma

When does cosine similarity stop being a useful signal?

At very high dimensions and large corpora, raw cosine flattens out — almost everything is roughly equidistant. Production retrieval addresses this by following first-pass cosine search with a learned similarity function (a cross-encoder reranker) that re-scores the top candidates more carefully.

Cross-encoder Reranker Hybrid search end-to-end (playbook)

What's the role of unit normalization?

Most embedding indexes store unit-normalized vectors so cosine similarity collapses into a plain dot product — one fused-multiply-add per dimension, no division needed. This is why ANN libraries care about whether your vectors are pre-normalized.

2-norm (Euclidean length) Embedding

← All concepts

Posts on the ZeroEntropy blog that reference cosine similarity.

Mar 18, 2026

Bi-Encoders vs Cross-Encoders

Why production search uses two models, not one — and what happens when you pair them.

The best AI teams build with ZeroEntropy models

Book Demo View docs