Also known as: query expansion, query reformulation, query transformation
TL;DR
Query rewriting transforms a user's raw query into one or more reformulated versions tuned for retrieval: expanding abbreviations, decomposing multi-part questions, or translating the query into the syntax an underlying search API expects.
Query rewriting sits between the user’s raw input and the retrieval engine. The user typed “how do I cancel”; your retrieval system gets fed something like “how do I cancel my subscription, refund policy, downgrade to free plan” — three reformulations chosen to find the right docs.
Why it helps:
Users phrase things ambiguously, vaguely, or with abbreviations that the index doesn’t contain.
Multi-intent queries (“compare plans and tell me which to pick”) retrieve poorly when treated as one blob; decomposing them into sub-queries surfaces the right docs for each part.
Some search APIs expect specific syntax (Boolean, fielded, structured) that the user obviously isn’t going to type.
Common rewriting patterns
HyDE (“Hypothetical Document Embeddings”) — generate a fake answer to the query first, embed that, retrieve against it. Often improves recall because the answer-shaped text aligns better with answer-shaped documents than with query-shaped text.
Multi-query — produce N rewrites and union their results.
Decomposition — break compound queries into sub-questions.
Conversational rewriting — “what about the second one?” needs to be expanded with the prior turn’s context before retrieval.
Acronym expansion / domain-specific rewrites — “GDPR” → “General Data Protection Regulation”, “k8s” → “Kubernetes”. Often domain-specific.
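Two of the patterns above, acronym expansion and multi-query union, are small enough to sketch directly. This is a minimal illustration, not a reference implementation: the `ACRONYMS` glossary, the `toy_search` keyword retriever, and the `DOCS` corpus are all hypothetical stand-ins for whatever glossary and search backend you actually use.

```python
import re

# Hypothetical domain glossary; a real system would load curated mappings.
ACRONYMS = {"gdpr": "General Data Protection Regulation", "k8s": "Kubernetes"}

def expand_acronyms(query):
    """Append the long form after each known acronym, keeping the original token."""
    def repl(match):
        token = match.group(0)
        long_form = ACRONYMS.get(token.lower())
        return f"{token} ({long_form})" if long_form else token
    return re.sub(r"[A-Za-z0-9]+", repl, query)

def multi_query_retrieve(rewrites, search, k=5):
    """Run every rewrite through `search` and union the hits in first-seen order."""
    seen, merged = set(), []
    for rewrite in rewrites:
        for doc_id in search(rewrite, k):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

# Toy corpus and keyword retriever, assumed purely for illustration.
DOCS = {
    "d1": "cancel subscription steps",
    "d2": "refund policy details",
    "d3": "downgrade to free plan",
}

def toy_search(query, k):
    words = set(query.lower().split())
    hits = [doc_id for doc_id, text in DOCS.items() if words & set(text.split())]
    return hits[:k]

rewrites = ["how do I cancel my subscription", "refund policy", "downgrade to free plan"]
print(expand_acronyms("GDPR rules for k8s logs"))
print(multi_query_retrieve(rewrites, toy_search))
```

Keeping the original token next to its expansion lets both the short and long form match the index. A plain union keeps every hit; if the merged list needs a single ranking, a fusion step would have to be added on top.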
HyDE works on a specific shape of corpus and query. The corpus needs to be answer-shaped: knowledge bases, technical manuals, FAQs, where documents read like answers to natural questions. Queries need to be question-shaped: full natural-language questions, not keyword fragments. In that regime, the embedding gap between query and document narrows when you replace the question with a hypothetical answer.
On heterogeneous corpora (mixed document types, code repos, log lines, scraped web pages of unknown shape) HyDE often hurts: the LLM hallucinates an answer in a confident voice that anchors retrieval in the wrong neighborhood, and the false-positive rate spikes. Rule of thumb: if your corpus has clear “this is what an answer looks like” structure and your queries are user questions, try HyDE; otherwise default to plain query embeddings.
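The HyDE flow (generate a hypothetical answer, embed it, retrieve against it) can be sketched as follows. Everything here is an assumption made for illustration: `embed` is a toy bag-of-words vector rather than a real embedding model, `fake_llm` returns canned text where a real system would call an LLM, and `DOCS` is a two-document toy corpus.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; stands in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hyde_retrieve(query, docs, generate_answer, k=1):
    """Embed a hypothetical answer instead of the query, then rank docs against it."""
    hypothetical = generate_answer(query)
    probe = embed(hypothetical)
    ranked = sorted(docs, key=lambda doc_id: cosine(probe, embed(docs[doc_id])), reverse=True)
    return ranked[:k]

def fake_llm(query):
    # Canned output standing in for an LLM call; a real model may hallucinate details.
    return "to cancel open billing settings and click cancel subscription"

DOCS = {
    "billing": "open billing settings and click cancel subscription to stop renewal",
    "login": "reset your password from the login page",
}

print(hyde_retrieve("how do I cancel", DOCS, fake_llm))
```

The only change from plain embedding retrieval is the probe text. That also makes the failure mode easy to see: if `generate_answer` returns a confident but wrong answer, the probe lands near the wrong documents.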
LLM-as-rewriter vs specialized model
A frontier LLM can do query rewriting in a single prompt, and many production systems do exactly this. The trade-offs: it adds latency to every query, costs a full LLM call per request, and does not always produce the rewrite shape your retrieval needs.
For high-volume systems, a small specialized rewriting model, trained on (intent, rewrite) examples labeled with the recall actually observed from the search API, can run an order of magnitude faster at the same or higher rewrite quality.
Where it can hurt
Rewriting isn’t free. Bad rewrites can introduce false positives, dilute the original intent, or cause hallucinated entities to show up in retrieval. Always evaluate rewriting end-to-end against retrieval quality, not just on rewrite plausibility.
The right rewriter is not the one that produces the most fluent reformulations — it’s the one that maximizes downstream Recall@K on your actual corpus.
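A minimal sketch of that end-to-end comparison: compute mean Recall@K over a labeled evaluation set once with the raw queries and once with the rewriter in front, holding the retriever fixed. The `INDEX` lookup table, the `retrieve` function, and the `add_subscription` rewriter are hypothetical placeholders for your own pipeline.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

def mean_recall(eval_set, retrieve, k, rewrite=None):
    """Average Recall@K over (query, relevant_ids) pairs, with an optional rewriter."""
    scores = []
    for query, relevant in eval_set:
        final_query = rewrite(query) if rewrite else query
        scores.append(recall_at_k(retrieve(final_query), relevant, k))
    return sum(scores) / len(scores)

# Toy retriever: a fixed lookup table, assumed purely for illustration.
INDEX = {
    "cancel": ["d4", "d9"],
    "cancel subscription": ["d1", "d4"],
}

def retrieve(query):
    return INDEX.get(query, [])

def add_subscription(query):
    # Hypothetical rewriter under test.
    return query + " subscription"

eval_set = [("cancel", ["d1"])]
baseline = mean_recall(eval_set, retrieve, k=2)
rewritten = mean_recall(eval_set, retrieve, k=2, rewrite=add_subscription)
print(baseline, rewritten)
```

Because the retriever and the evaluation set are identical in both runs, any difference between the two numbers comes from the rewriter alone.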
Go further
Does HyDE actually work, or is it folklore?
It depends on your corpus. HyDE helps when documents are answer-shaped (knowledge bases, manuals) and the query is question-shaped: the embedding of a synthetic answer sits closer to those documents than the embedding of the question does. It hurts on heterogeneous corpora, where the LLM's hallucinated “answer” anchors retrieval in the wrong neighborhood.
Can I judge a rewriter by how good its rewrites read?
No, evaluate end-to-end. Rewrite plausibility (does the rewrite read well?) is a poor proxy for retrieval lift. Hold the rest of the pipeline fixed and measure Recall@K and NDCG@10 with vs. without the rewriter.
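NDCG@10 is the less obvious of the two metrics, so here is a small sketch of one common formulation. It assumes graded relevance labels with linear gain and a log2 position discount; other variants (for example exponential gain) exist, so check which one your evaluation tooling uses before comparing numbers.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at rank r (1-based) is divided by log2(r + 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))

def ndcg_at_k(retrieved, relevance, k=10):
    """NDCG@k for one query; `relevance` maps doc id to a graded gain (absent means 0)."""
    gains = [relevance.get(doc_id, 0) for doc_id in retrieved]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg else 0.0

# Hypothetical labels for a single query.
relevance = {"a": 3, "b": 1}
print(ndcg_at_k(["a", "b", "c"], relevance))  # ideal ordering
print(ndcg_at_k(["b", "a", "c"], relevance))  # best doc demoted to rank 2
```

Unlike Recall@K, this score drops when a relevant document is retrieved but ranked lower, which is why the two metrics are worth reporting together.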
Why train a small specialized rewriter instead of calling a frontier LLM?
Latency, cost, and tighter targeting. A 1B-class model fine-tuned on (intent → rewrites that actually retrieved the right doc) pairs runs far faster and cheaper than a frontier LLM call, and it can beat a generic LLM rewrite on the metric that matters: observed downstream recall.
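One way to assemble that kind of training data is to filter logged rewrite attempts down to the ones that surfaced a relevant document. The sketch below assumes a hypothetical `RewriteLog` record layout and a top-k cutoff; neither comes from a specific system, and the fine-tuning step itself is out of scope.

```python
from dataclasses import dataclass

@dataclass
class RewriteLog:
    intent: str      # original user query
    rewrite: str     # candidate reformulation that was tried
    retrieved: list  # doc ids returned for the rewrite, best first
    relevant: set    # doc ids known to answer the intent

def build_training_pairs(logs, k=5):
    """Keep (intent, rewrite) pairs whose rewrite put a relevant doc in the top k."""
    pairs = []
    for log in logs:
        if set(log.retrieved[:k]) & log.relevant:
            pairs.append((log.intent, log.rewrite))
    return pairs

# Hypothetical log records for illustration.
logs = [
    RewriteLog("how do I cancel", "cancel subscription", ["d1", "d4"], {"d1"}),
    RewriteLog("how do I cancel", "cancel order", ["d7", "d8"], {"d1"}),
]
print(build_training_pairs(logs))
```

Only the first record survives the filter, so the model is trained on rewrites that demonstrably retrieved the right document rather than on rewrites that merely read well.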