Vector

Also known as: dense vector, feature vector, tensor (1D)

TL;DR

A vector is an ordered list of numbers — the universal data shape in modern AI. Every embedding, every layer activation, every gradient, every prediction is a vector under the hood.

A vector is an ordered list of numbers. That’s it. A list such as [0.42, -1.8, 3.1, ..., 0.07] of length $d$ is a $d$-dimensional vector. In modern AI, vectors are the universal data shape: every embedding is a vector, every layer activation inside a neural network is a vector, every gradient in backpropagation is a vector, every output before softmax is a vector. The whole stack runs on float arithmetic over arrays of numbers.

[Figure: "Vector: an ordered list of numbers." Panels: a scalar (0.42, d = 1), a 2D vector ([3, 4], d = 2), and a 768-dimensional row (d = 768). Three operations: addition, [2, -1, 3, 0] + [1, 2, -1, 4] = [3, 1, 2, 4] (element-wise); scalar multiplication, 2 × [2, -1, 3, 0] = [4, -2, 6, 0] (every element scales); dot product, [2, -1, 3, 0] · [1, 2, -1, 4] = 2 + (-2) + (-3) + 0 = -3 (Σ aᵢ · bᵢ).]

What “dimension” actually means

The dimension is the length of the list. A 768-dimensional vector has 768 numbers in it. The numbers themselves can mean anything — coordinates in space, scores across a vocabulary, components of an embedded sentence — but the shape of the data structure is fixed and matters for compute.

Common dimensions you’ll meet in AI:

  • 768 — original BERT-base hidden size; many embedding models still emit this.
  • 1024 — BERT-large, many production embedding models (E5, BGE).
  • 1536 — OpenAI’s text-embedding-3-small.
  • 2048–4096 — typical hidden sizes inside frontier LLMs.
  • 30000–256000 — vocabulary-sized vectors (logits over the tokenizer’s vocabulary).

These numbers aren’t magic. They’re hardware-friendly multiples of 64 or 128, chosen by the original architects because they divide evenly across GPU warps (32 threads on NVIDIA hardware) and line up with memory-coalescing widths. Embedding models inherit them by convention.

The operations you’ll see everywhere

Three operations dominate AI compute:

  • Addition — adding two vectors of the same dimension element-wise. Used in residual connections, gradient updates, and accumulation.
  • Scalar multiplication — multiplying every element by a single number. Used in scaling, normalization, and softmax temperature.
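  • Dot product. Multiplying two vectors of the same dimension element-wise and summing the results: $a \cdot b = \sum_i a_i b_i$. The core similarity operation. Used in attention’s $QK^\top$, in cosine similarity, in the final layer’s logit computation.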

Almost every ML primitive reduces to a sequence of these three. The reason GPUs are good at AI is that they parallelize all three across thousands of cores at once.
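As a minimal sketch of the three operations, here they are in plain Python on small 4-dimensional vectors. The helper names (add, scale, dot) are illustrative, not from any library, and real workloads would run the same arithmetic through NumPy or PyTorch rather than Python lists.

```python
# Plain-Python sketches of the three operations.
# The helper names are illustrative, not a library API.

def add(a, b):
    """Element-wise sum of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x + y for x, y in zip(a, b)]


def scale(c, a):
    """Multiply every element of a by the scalar c."""
    return [c * x for x in a]


def dot(a, b):
    """Sum of element-wise products; returns a single number."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


a = [2, -1, 3, 0]
b = [1, 2, -1, 4]

total = add(a, b)      # [3, 1, 2, 4]
doubled = scale(2, a)  # [4, -2, 6, 0]
product = dot(a, b)    # 2 - 2 - 3 + 0 = -3
```

Addition and scalar multiplication return a vector of the same dimension; the dot product is the only one of the three that collapses two vectors to a single number.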

[Figure: "Inner product, term by term." For a = [3, -1, 2, 4] and b = [1, 2, 3, -1], the products aᵢ·bᵢ are 3, -2, 6, -4, with running sums 3, 1, 7, 3, so a · b = 3. ‖a‖ = √(9 + 1 + 4 + 16) = √30 ≈ 5.477, ‖b‖ = √(1 + 4 + 9 + 1) = √15 ≈ 3.873, and cos(a, b) ≈ 0.141.]

The dot product is the canonical bilinear, symmetric, scalar-valued operation on two vectors: up to a scale factor, it is the only one whose value is unchanged when both vectors are rotated together. Most things you want to compute that take two vectors and return a single number (similarity, projection, alignment, attention weight) eventually reduce to a dot product, often after some normalization.
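To make "a dot product after some normalization" concrete, the sketch below computes a dot product term by term and then divides by the two vector lengths to get cosine similarity. The vectors and helper names are chosen for illustration only.

```python
import math


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length: square root of the vector dotted with itself."""
    return math.sqrt(dot(a, a))


def cosine(a, b):
    """Dot product divided by both lengths; always lies in [-1, 1]."""
    return dot(a, b) / (norm(a) * norm(b))


a = [3, -1, 2, 4]
b = [1, 2, 3, -1]

terms = [x * y for x, y in zip(a, b)]  # [3, -2, 6, -4]
a_dot_b = dot(a, b)                    # 3
cos_ab = cosine(a, b)                  # 3 / (sqrt(30) * sqrt(15)), about 0.141
```

The raw dot product (3) depends on how long the vectors are; the cosine (about 0.141) depends only on the angle between them, which is why similarity search usually normalizes first.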

Concretely:

  • In attention: $q \cdot k$ measures how much the query token wants to read from the key token.
  • In retrieval: $\text{query} \cdot \text{document}$ (after embedding both) measures relevance.
  • In the LLM output head: $h \cdot w_i$ for each row $w_i$ of the output projection produces one logit per vocab token.
  • In a linear classifier: the score for class $c$ is $w_c \cdot x$.

The recurring pattern: project both inputs into a shared space, then dot-product. The whole field of representation learning is “design embedding spaces where dot products mean something useful.”
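The classifier and output-head cases in the list above share one shape: a matrix with one weight vector per class (or per vocabulary token), dotted row by row against a single input vector. A minimal sketch follows; the weights W and input x are made-up numbers for illustration, not values from any trained model.

```python
def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 3-class linear classifier over 4-dimensional inputs:
# one weight vector (row) per class.
W = [
    [0.5, -1.0, 0.0, 2.0],   # class 0
    [1.0, 0.0, -0.5, 0.5],   # class 1
    [-1.0, 1.0, 1.0, 0.0],   # class 2
]
x = [2.0, 1.0, 0.0, 1.0]

# One dot product per row gives one score (logit) per class.
scores = [dot(w_c, x) for w_c in W]

# The predicted class is the index of the largest score.
predicted = max(range(len(scores)), key=scores.__getitem__)
```

An LLM output head has the same structure, only with one row per vocabulary token and a hidden state in place of x.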

Norms — the “size” of a vector

The most common way to talk about a vector’s magnitude is its L2 norm (Euclidean length): $\|v\|_2 = \sqrt{\sum_i v_i^2}$. A unit vector has norm 1. Many production embedding indexes store unit-normalized vectors so that cosine similarity collapses to a plain dot product. That saves a normalization step per comparison, and those comparisons, repeated across the indexed vectors on every query, are the inner loop of retrieval.
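The following sketch shows why unit-normalizing at index time pays off: once every stored vector and the query have norm 1, ranking by cosine similarity is just ranking by dot product. The toy index, document names, and helper names are assumptions made for illustration; a real system would use a vector database or an approximate nearest-neighbor library instead of a dictionary scan.

```python
import math


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Rescale v to unit length (norm 1)."""
    n = l2_norm(v)
    if n == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / n for x in v]


# Toy 2-dimensional "embeddings", normalized once when the index is built.
docs = {
    "doc_a": [3.0, 4.0],
    "doc_b": [-4.0, 3.0],
    "doc_c": [1.0, 0.0],
}
index = {name: normalize(v) for name, v in docs.items()}

# At query time: normalize the query once, then use plain dot products.
query = normalize([6.0, 8.0])
scores = {name: dot(query, v) for name, v in index.items()}
best = max(scores, key=scores.get)
```

Here doc_a points in the same direction as the query, so it scores about 1.0 despite having a different length before normalization; doc_b is perpendicular and scores about 0.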

Where vectors stop being enough

Vectors are 1D. As soon as you need a 2D structure (an attention matrix Q @ K.T) or a 3D structure (a batch of sequences of token embeddings: [batch, seq_len, d]), you’re in tensor territory. A tensor is a higher-rank array of numbers; in PyTorch / NumPy code, “vector” and “tensor” are interchangeable up to the rank in .shape. Every operation in a deep network reduces to vector arithmetic, batched and parallelized over higher-rank tensor shapes.
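A short NumPy sketch of the shapes described above, assuming NumPy is installed. The sizes (d = 8, seq_len = 5, batch = 2) are arbitrary small values chosen so the shapes are easy to read.

```python
import numpy as np

d = 8        # vector dimension
seq_len = 5  # tokens per sequence
batch = 2    # sequences per batch

v = np.zeros(d)                    # a vector: rank 1, shape (8,)
Q = np.ones((seq_len, d))          # one query vector per token
K = np.ones((seq_len, d))          # one key vector per token
attn_scores = Q @ K.T              # matrix: rank 2, shape (5, 5)
x = np.zeros((batch, seq_len, d))  # batch of sequences: rank 3, shape (2, 5, 8)

# Each entry of Q @ K.T is one dot product between a query row and a key row.
first_entry = attn_scores[0, 0]    # dot of two all-ones vectors of length 8

# Indexing away the leading axes recovers an ordinary vector.
token_vector = x[0, 0]             # shape (8,)
```

The rank is the number of entries in .shape (also available as .ndim); the vector is the case where that number is 1.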

Go further

What's the difference between a vector and a tensor?

A vector is a rank-1 tensor. A tensor is the N-dimensional generalization: a scalar is rank-0, a vector is rank-1, a matrix is rank-2, an attention K/Q/V tensor is rank-3 or 4. In ML code (torch.Tensor, numpy.ndarray), 'tensor' is the umbrella; 'vector' is the rank-1 case.

Why do dimensions matter so much?

Dimensionality is the universal lever in ML. More dimensions = more expressive capacity but more storage, more compute per operation, more risk of curse-of-dimensionality effects. The standard knobs (768, 1024, 1536) come from BERT-era hidden sizes and are mostly hardware-friendly multiples of 64.
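The storage side of that trade-off is simple arithmetic. The sketch below assumes uncompressed float32 storage (4 bytes per element) and ignores any index overhead; the function name index_size_bytes is illustrative.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage


def index_size_bytes(num_vectors, dim, bytes_per_element=BYTES_PER_FLOAT32):
    """Raw bytes needed to store num_vectors vectors of the given dimension."""
    return num_vectors * dim * bytes_per_element


for dim in (768, 1024, 1536):
    per_vector = index_size_bytes(1, dim)
    per_million = index_size_bytes(1_000_000, dim)
    print(f"d={dim}: {per_vector} bytes per vector, "
          f"{per_million / 1e9:.2f} GB per million vectors")
```

Doubling the dimension doubles both the storage and the number of multiply-adds in every dot product, which is why the choice of embedding size shows up directly in retrieval cost.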

Are vectors always floats in AI?

In modern AI, almost always — float32, float16, or bfloat16. Integer vectors show up only as quantized representations (int8 for compressed embeddings, int4 for compressed model weights) or as token-ID inputs before they get embedded. Once you're in the model's hidden state, you're in float space.
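As an illustration of what an int8 representation can look like, here is a sketch of symmetric per-vector quantization, one simple scheme among several. It is an assumption for illustration, not a description of how any particular embedding model or library compresses its vectors.

```python
def quantize_int8(v):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max(abs(x) for x in v)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [round(x / scale) for x in v]
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in q]


v = [0.42, -1.8, 3.1, 0.07]
q, scale = quantize_int8(v)
restored = dequantize(q, scale)

# Rounding error per element is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(v, restored))
```

Each element now fits in one byte instead of four, at the cost of a small reconstruction error bounded by half the scale.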
