FAISS

Also known as: Facebook AI Similarity Search, faiss-cpu, faiss-gpu

TL;DR

FAISS (Facebook AI Similarity Search) is the C++ library for efficient similarity search and clustering of dense vectors. It implements the canonical ANN algorithms — flat, IVF, HNSW, PQ, and combinations — with CPU and GPU backends.

FAISS (Facebook AI Similarity Search) is the open-source C++ library for similarity search and clustering of dense vectors, originally built at Meta and the de facto reference implementation in the field. It exposes the full canonical ANN catalog: flat (exact), IVF, HNSW, product-quantization compression, and arbitrary compositions of these, all in a tight C++ core with Python bindings and CPU and GPU backends. Most vector databases either embed FAISS directly, port a subset of its algorithms, or compete with its raw-throughput baseline.

What it actually does

FAISS is a library, not a database. You give it numpy arrays of vectors; you get back numpy arrays of nearest neighbors. There’s no HTTP server, no SQL, no auth, no replication. The interface is roughly four functions per index type:

index = faiss.index_factory(d, "IVF1024,PQ96") — construct the index for d-dimensional vectors (with PQ96, d must be a multiple of 96).
index.train(training_vectors) — for indexes that need K-means or codebook training.
index.add(corpus_vectors) — insert the corpus.
D, I = index.search(query_vectors, k) — return distances and integer IDs of top-k matches per query.

Most vector retrieval systems either call these primitives under the hood or reimplement the same algorithms.

The factory string DSL

FAISS’s most distinctive feature is the index factory: a string DSL that composes algorithms by name.

Index factory examples

Flat — exact brute-force search. The baseline.
IVF1024,Flat — IVF with 1024 cells, exact distances within the probed cells. Stores full vectors, so the gain is search time rather than memory; suited to medium corpora.
IVF16384,PQ96 — IVF with 16384 cells, vectors stored as 96-byte PQ codes. The billion-scale workhorse.
HNSW32 — HNSW with M=32. The default for 10M-scale in-memory.
OPQ96_768,IVF16384,PQ96 — optimize PQ rotation, IVF cluster, then PQ-compress. State-of-the-art at billion-scale.
PCA512,IVF1024,Flat — PCA-reduce to 512 dims first, then IVF. Useful when embeddings are over-dimensioned for the data.
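To make the "96-byte PQ codes" figure concrete, here is a rough back-of-the-envelope calculation. It assumes float32 input vectors, 8 bits per PQ sub-quantizer, and a 768-dimensional embedding (borrowed from the OPQ96_768 example); it ignores vector IDs and the IVF centroid tables, so real indexes are somewhat larger. The helper names are made up for this sketch.

```python
def flat_bytes_per_vector(d, bytes_per_component=4):
    """Uncompressed float32 storage: 4 bytes per dimension."""
    return d * bytes_per_component

def pq_bytes_per_vector(m, nbits=8):
    """PQ code size: m sub-quantizers at nbits each (8 bits assumed here)."""
    return m * nbits // 8

d = 768            # illustrative dimensionality, matching the OPQ96_768 example
n = 1_000_000_000  # one billion vectors

flat = flat_bytes_per_vector(d)  # bytes per vector with Flat storage
pq = pq_bytes_per_vector(96)     # bytes per vector with PQ96

print(f"Flat: {flat} B/vector, {n * flat / 1e12:.2f} TB total")
print(f"PQ96: {pq} B/vector, {n * pq / 1e9:.0f} GB total")
print(f"Compression ratio: {flat // pq}x")
```

Under these assumptions a billion vectors shrink from about 3 TB to about 96 GB, which is the difference between a cluster and a single large-memory machine.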

The DSL is compositional: each comma-separated stage is a transformation or index type, applied in order. Want to swap PQ for SQ8 quantization? Change PQ96 to SQ8 in the string. Want a PCA or OPQ transform ahead of IVF? Prepend it. Want an HNSW coarse quantizer for the IVF stage? Write IVF16384_HNSW32. The flexibility is most of why FAISS dominates research benchmarks: running ablations is trivial.
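The stage-by-stage structure can be illustrated without FAISS at all. The sketch below uses a hypothetical helper, describe, that only labels tokens by prefix; it is not how FAISS parses factory strings and it covers only the stage names used on this page.

```python
import re

# Hypothetical helper for illustration; FAISS parses factory strings internally.
STAGE_KINDS = [
    (r"(PCA|OPQ)\d+", "vector transform"),
    (r"IVF\d+", "inverted-file partition"),
    (r"HNSW\d+", "graph index"),
    (r"PQ\d+", "product-quantization codes"),
    (r"SQ\d+", "scalar-quantization codes"),
    (r"Flat$", "uncompressed vectors"),
]

def describe(factory_string):
    """Label each comma-separated stage, in the order it is applied."""
    stages = []
    for token in factory_string.split(","):
        kind = next(
            (label for pattern, label in STAGE_KINDS if re.match(pattern, token)),
            "unrecognized",
        )
        stages.append((token, kind))
    return stages

print(describe("OPQ96_768,IVF16384,PQ96"))

# An ablation over the encoding stage is a one-token change per variant.
variants = [f"IVF16384,{encoding}" for encoding in ("Flat", "SQ8", "PQ96")]
for v in variants:
    print(v, "->", describe(v)[-1][1])
```

In a real experiment each variant string would be passed to faiss.index_factory and benchmarked; the surrounding train, add, and search code stays the same.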

CPU vs GPU

FAISS ships two backends. The CPU backend is the production path most teams run: tight SIMD-optimized C++, multi-threaded for batch queries, runs anywhere. The GPU backend (NVIDIA only, via custom CUDA kernels) is dramatically faster for bulk indexing — training IVF on a billion vectors in minutes instead of hours, building large indexes for offline batch retrieval.

Three workloads where the GPU backend wins decisively:

Bulk indexing: training and adding hundreds of millions of vectors. CPU IVF training is the slow part of building large indexes; GPU cuts that by roughly 5-20x.
Massive batch queries: scoring millions of queries against a corpus offline (synthetic data, retraining datasets). GPU brute-force search is far faster than CPU per query at large batch sizes.
GPU-resident pipelines: when the embeddings already live on a GPU (you embedded them moments ago), running ANN search there avoids the host transfer.

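Where it doesn’t pay off: low-QPS online serving. With only a few queries in flight, per-query latency is dominated by fixed overhead such as index lookup and host-to-device transfer, not the distance math, so the GPU’s latency win is small. Most production teams build with FAISS-GPU and serve with FAISS-CPU.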

When to use FAISS, when not to

Use FAISS directly for:

Research and prototyping (fastest path from numpy to results).
Offline batch retrieval at scale (synthesize training data, dedup corpora, retrieval-augmented dataset construction).
Embedded retrieval in a single-process application where you want zero infrastructure.

Use a vector database (Qdrant, Weaviate, Vespa, Milvus, pgvector, Pinecone) when you need:

HTTP/gRPC API and operational tooling.
Filtering, hybrid search, multi-tenancy.
Live inserts/deletes with consistency guarantees.
Replication and high availability.

Most production systems use a vector database; many of those vector databases call FAISS internally for the actual ANN math. The library and the database are different roles in the stack.

Why it remains the reference

If you’re evaluating an ANN algorithm and want to know whether it’s serious, check whether FAISS has implemented it.

Go further

FAISS vs a vector database — when do I pick which?

FAISS is a library: an in-process index, no server, no metadata layer, no replication, no auth. A vector database wraps that with HTTP APIs, persistence, filters, multitenancy, and operational tooling. Pick FAISS for offline batch jobs, embedded retrieval in a single process, or research. Pick a vector DB for production multi-tenant retrieval at scale.

Is the GPU version actually worth it?

For batch indexing of hundreds of millions of vectors, yes — FAISS-GPU's IVF training and add operations are 5-20x faster than CPU. For online query serving at moderate QPS, usually no — the latency win is small and you give up the simpler CPU operational story. Most production deployments train on GPU and serve on CPU.

What index type should I default to in FAISS?

For corpora under 1M vectors, IndexFlatIP (exact, inner-product similarity): simple, no tuning, fast enough. From 1M to 10M, IndexHNSWFlat. From 10M to 100M, IndexIVFFlat with nlist around sqrt(N) as a starting point. Beyond 100M, IndexIVFPQ. The factory string DSL (for example IVF1024,PQ96) lets you switch between these without code changes.
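These thresholds are easy to encode as a starting-point heuristic. The function below is a hypothetical helper, not part of FAISS; it returns the index class name and, for IVF indexes, an nlist computed as the integer square root of the corpus size. Reusing the sqrt(N) rule for IndexIVFPQ is an assumption of this sketch, and the exact boundary handling (for example, where exactly 100M falls) is a choice made here.

```python
import math

def suggest_index(n_vectors):
    """Map corpus size to (index class name, nlist or None) using the thresholds above."""
    if n_vectors < 1_000_000:
        return ("IndexFlatIP", None)
    if n_vectors < 10_000_000:
        return ("IndexHNSWFlat", None)
    nlist = math.isqrt(n_vectors)  # nlist = sqrt(N), rounded down
    if n_vectors <= 100_000_000:
        return ("IndexIVFFlat", nlist)
    # Assumption: reuse the sqrt(N) rule for IVFPQ as well.
    return ("IndexIVFPQ", nlist)

for n in (200_000, 5_000_000, 50_000_000, 1_000_000_000):
    print(f"{n:>13,} -> {suggest_index(n)}")
```

Treat the result as a first guess to benchmark against your own recall and latency targets rather than a final answer.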
