Cosine Similarity

Also known as: cosine score, cosine distance

TL;DR

Cosine similarity is the cosine of the angle between two vectors — equivalently, their dot product divided by the product of their magnitudes. It's the standard way to compare embedding vectors for relevance.

Cosine similarity measures how aligned two vectors are, ignoring their magnitudes. Two vectors pointing in the same direction get a score of 1; perpendicular vectors get 0; opposite directions get -1.

COSINE SIMILARITYDirection, not magnitude.dim 1dim 20abθCOS(θ)0.47DOT PRODUCT OVER MAGNITUDEScos(θ)=a·b‖a‖×‖b‖=0.47LARGE ANGLE → LOW SIMILARITY

The formula:

cos(a, b) = (a · b) / (||a|| × ||b||)

Where is the dot product (sum of pairwise products) and is the (Euclidean length) of the vector.

Why it’s the default for embeddings

Embedding models are typically trained so that direction in vector space carries the semantic meaning. Magnitude can drift across inputs (longer texts, different domains) and isn’t a reliable relevance signal. By dividing out the magnitudes, cosine similarity isolates the directional component — which is what the model was actually trained to align.

Cosine similarity only works because trained embedding models actively fight the high-dimensional geometry. Without contrastive training, every pair of random vectors sits at cosine zero.

The unit-vector shortcut

If you normalize all embeddings to unit length offline (), then cosine similarity collapses to a plain dot product:

cos(a, b) = a · b   (when ||a|| = ||b|| = 1)

This is why most embedding indexes store unit-normalized vectors: the inner loop is just a SIMD dot product, no division needed. Approximate-nearest-neighbor libraries (FAISS, HNSW) lean on this aggressively for performance.

Cosine vs dot vs Euclidean

THREE WAYS TO SCORE A PAIRSame direction, very different numbers.dim 1dim 20THE PAIR · q AND dq‖q‖ = 3d‖d‖ = 9θnear-aligned direction, very different lengths.COSINE · cos θmagnitude-invariant: only the angle matters.SCORE0.95∈ [−1, 1]L2 · ‖q − d‖straight-line distance — changes with magnitude.SCORE4.7∈ [0, ∞)DOT · q · dmagnitudes × cos θ — mixes both signals.SCORE8.3∈ (−∞, ∞)SAME ANGLE, VERY DIFFERENT SCORES.cosine alone normalizes for length.

In practice, all three are used:

Similarity functions in production
  • Cosine — direction-only, magnitude-invariant. Default for text embeddings.
  • Dot product — direction and magnitude. Used when magnitude is meaningful (e.g., some retrieval fine-tunings deliberately encode confidence in magnitude).
  • Euclidean (L2) distance — straight-line distance. Equivalent to cosine when vectors are unit-normalized.

For unit-normalized embeddings, all three are mathematically equivalent up to a monotonic transformation, so they produce the same ranking. The choice usually comes down to what your vector index supports natively.

Cosine requires a division by the product of two L2 norms. A division is much more expensive than a fused multiply-add on every modern CPU and GPU — and an ANN index does the inner-loop comparison billions of times during a single search.

If you normalize all stored vectors at index time (), the magnitude of every stored vector is exactly 1. At query time you normalize the query vector once and the inner-product with any indexed vector becomes the cosine — no per-pair division.

The cost of this shortcut is that the index has to commit to one similarity convention. If your downstream code wants Euclidean distance, you have to convert: for unit vectors, , so cosine and Euclidean produce the same ranking. Most ANN libraries (FAISS, HNSWlib, ScaNN) expose both conventions transparently and pick the right kernel internally.

Scoring conventions

A “cosine similarity” of 1 means perfectly aligned. Some libraries report “cosine distance” as , where 0 means perfectly aligned. Read the docs of whichever index you’re using to know which convention is in play.

Go further

When does cosine similarity stop being a useful signal?

At very high dimensions and large corpora, raw cosine flattens out — almost everything is roughly equidistant. Production retrieval addresses this by following first-pass cosine search with a learned similarity function (a cross-encoder reranker) that re-scores the top candidates more carefully.

What's the role of unit normalization?

Most embedding indexes store unit-normalized vectors so cosine similarity collapses into a plain dot product — one fused-multiply-add per dimension, no division needed. This is why ANN libraries care about whether your vectors are pre-normalized.

Related articles

Posts on the ZeroEntropy blog that reference cosine similarity.

ZeroEntropy
The best AI teams build with ZeroEntropy models
Follow us on
GitHubTwitterSlackLinkedInDiscord