Query Rewriting

Also known as: query expansion, query reformulation, query transformation

TL;DR

Query rewriting transforms a user's raw query into one or more reformulated versions tuned for retrieval: expanding abbreviations, decomposing multi-part questions, or translating the query into the syntax an underlying search API expects.

Query rewriting sits between the user’s raw input and the retrieval engine. The user typed “how do I cancel”; your retrieval system gets fed three reformulations chosen to find the right docs: “how do I cancel my subscription”, “refund policy”, and “downgrade to free plan”.

Figure: Query rewriting. One ambiguous query, many sharper ones. A short, ambiguous user query (“how do I cancel”) enters the pipeline; an LLM rewriter (intent → k paraphrases) emits four variants: “how to cancel my subscription”, “refund policy after cancellation”, “downgrade plan to free tier”, and “turn off auto-renewal billing”. Each variant retrieves its own top hits, which are unioned and deduplicated (8 unique documents, 4× recall surface).
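The fan-out for the “how do I cancel” example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `multi_query_retrieve`, `fake_search`, and `FAKE_INDEX` are hypothetical names, and the canned hits stand in for a real retriever.

```python
from typing import Callable, Iterable


def multi_query_retrieve(
    variants: Iterable[str],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Run every rewritten query and union the hits, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for variant in variants:
        for doc_id in search(variant):
            if doc_id not in seen:  # dedupe across variants
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical stand-in for a real retriever: canned top-3 hits per variant
# (illustrative document IDs only).
FAKE_INDEX = {
    "how to cancel my subscription": ["d12", "d45", "d07"],
    "refund policy after cancellation": ["d31", "d12", "d58"],
    "downgrade plan to free tier": ["d23", "d62", "d07"],
    "turn off auto-renewal billing": ["d45", "d81", "d31"],
}


def fake_search(query: str) -> list[str]:
    return FAKE_INDEX.get(query, [])


merged = multi_query_retrieve(FAKE_INDEX.keys(), fake_search)
print(merged)
```

Twelve raw hits collapse to eight unique documents here. First-seen order is the simplest merge policy; a production system would more likely re-rank the merged set, which this sketch leaves out.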

Why it helps:

  • Users phrase things ambiguously, vaguely, or with abbreviations that the index doesn’t contain.
  • Multi-intent queries (“compare plans and tell me which to pick”) retrieve poorly when treated as one blob; decomposing them into sub-queries surfaces the right docs for each part.
  • Some search APIs expect specific syntax (Boolean, fielded, structured) that the user obviously isn’t going to type.

Common rewriting patterns

  • HyDE (“Hypothetical Document Embeddings”) — generate a hypothetical answer to the query first, embed that, and retrieve against it. Often improves recall because answer-shaped text sits closer to answer-shaped documents in embedding space than the query-shaped text does.
  • Multi-query — produce N rewrites and union their results.
  • Decomposition — break compound queries into sub-questions.
  • Conversational rewriting — “what about the second one?” needs to be expanded with the prior turn’s context before retrieval.
  • Acronym expansion / domain-specific rewrites — “GDPR” → “General Data Protection Regulation”, “k8s” → “Kubernetes”. The mapping usually has to come from a glossary for your own domain.
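Of the patterns above, acronym expansion is the one that needs no model at all. A minimal sketch, assuming a hand-maintained glossary; `GLOSSARY` and `expand_acronyms` are hypothetical names, and the two entries are the examples from the list:

```python
import re

# Hypothetical domain glossary; a real one would be much larger.
GLOSSARY = {
    "gdpr": "General Data Protection Regulation",
    "k8s": "Kubernetes",
}


def expand_acronyms(query: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Append the long form after each known acronym, keeping the original token."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        long_form = glossary.get(token.lower())  # case-insensitive lookup
        return f"{token} ({long_form})" if long_form else token

    # Visit every alphanumeric token; unknown tokens pass through unchanged.
    return re.sub(r"[A-Za-z0-9]+", replace, query)


expanded = expand_acronyms("GDPR rules for k8s logs")
print(expanded)
# GDPR (General Data Protection Regulation) rules for k8s (Kubernetes) logs
```

Keeping the original token next to its expansion is a design choice in this sketch: documents that only ever say “k8s” still match, while documents that spell out “Kubernetes” become reachable too.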

HyDE works on a specific shape of corpus and query. The corpus needs to be answer-shaped — knowledge bases, technical manuals, FAQs, where documents read like answers to natural questions. Queries need to be question-shaped — full natural-language questions, not keyword fragments. In that regime, the embedding gap between query and document closes when you replace the question with a hypothetical answer. On heterogeneous corpora — mixed document types, code repos, log lines, scraped web pages of unknown shape — HyDE often hurts: the LLM hallucinates an answer in a confident voice that anchors retrieval in the wrong neighborhood, and the false-positive rate spikes. Empirical rule: if your corpus has clear “this is what an answer looks like” structure and your queries are user questions, try HyDE; otherwise default to plain query embeddings.
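The HyDE flow itself is short: generate a hypothetical answer, embed it, and rank documents against that embedding instead of the query's. The sketch below is a toy under stated assumptions. `toy_embed` is a bag-of-words placeholder for a real dense encoder, `fake_generate` is a canned stand-in for an LLM call, and every name is hypothetical.

```python
import math
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Placeholder embedding (bag of words); a real system would use a dense encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hyde_retrieve(
    query: str,
    docs: dict[str, str],
    generate: Callable[[str], str],
    k: int = 1,
) -> list[str]:
    """Embed a generated hypothetical answer and rank documents against it."""
    hypothetical = generate(query)  # the LLM call in a real pipeline
    query_vec = toy_embed(hypothetical)
    ranked = sorted(
        docs, key=lambda d: cosine(query_vec, toy_embed(docs[d])), reverse=True
    )
    return ranked[:k]


def fake_generate(query: str) -> str:
    # Canned "LLM" output, assumed for illustration only.
    return "to cancel your subscription open billing settings and click cancel plan"


DOCS = {
    "d1": "open billing settings and click cancel plan to end your subscription",
    "d2": "how do I reset my password if I forgot it",
}

with_hyde = hyde_retrieve("how do I cancel", DOCS, fake_generate)
plain = hyde_retrieve("how do I cancel", DOCS, lambda q: q)  # identity = no HyDE
print(with_hyde, plain)
```

In this toy setup the plain query lands on the question-shaped password document, while the hypothetical answer lands on the answer-shaped cancellation document. The same mechanism explains the failure mode: if `generate` returns a confidently wrong answer, retrieval follows it.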

LLM-as-rewriter vs specialized model

A frontier LLM can do query rewriting in a single prompt, and many production systems do exactly this. But: it’s slow (latency added to every query), expensive (full LLM call per request), and not always accurate at the rewrite-shape your retrieval needs.

For high-volume systems, a small specialized rewriting model trained on (intent, observed-API-recall) pairs runs an order of magnitude faster at the same or higher rewrite quality.
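The text does not specify how such training pairs are built. One plausible reading, sketched here as an assumption rather than a description of any particular system, is to replay candidate rewrites against the retriever and keep only those whose observed recall clears a threshold. `build_training_pairs`, `FAKE_HITS`, and `LOGS` are hypothetical names with made-up data.

```python
from typing import Callable

# (intent, candidate rewrites, relevant doc IDs) -- hypothetical log format
LoggedQuery = tuple[str, list[str], set[str]]


def build_training_pairs(
    logged: list[LoggedQuery],
    search: Callable[[str], list[str]],
    k: int = 3,
    min_recall: float = 1.0,
) -> list[tuple[str, str, float]]:
    """Keep (intent, rewrite, recall) triples whose rewrite retrieved the relevant docs."""
    pairs = []
    for intent, candidates, relevant in logged:
        if not relevant:
            continue  # nothing to measure recall against
        for rewrite in candidates:
            top_k = set(search(rewrite)[:k])
            recall = len(top_k & relevant) / len(relevant)
            if recall >= min_recall:
                pairs.append((intent, rewrite, recall))
    return pairs


# Made-up retriever output and logs, for illustration only.
FAKE_HITS = {
    "how to cancel my subscription": ["d12", "d45", "d07"],
    "cancel culture": ["d90", "d91", "d92"],
}
LOGS = [
    ("how do I cancel", ["how to cancel my subscription", "cancel culture"], {"d12"}),
]

pairs = build_training_pairs(LOGS, lambda q: FAKE_HITS.get(q, []))
print(pairs)
```

The fluent but off-target rewrite is dropped because it never surfaced the relevant document, so the training signal is retrieval outcome rather than how plausible the rewrite reads.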

Where it can hurt

Rewriting isn’t free. Bad rewrites can introduce false positives, dilute the original intent, or cause hallucinated entities to show up in retrieval. Always evaluate rewriting end-to-end against retrieval quality, not just on rewrite plausibility.

The right rewriter is not the one that produces the most fluent reformulations — it’s the one that maximizes downstream Recall@K on your actual corpus.
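Recall@K for one query is the fraction of its relevant documents that appear in the top K results: |top-K ∩ relevant| / |relevant|. A minimal sketch of comparing a baseline against a rewriter on that metric follows; the function names and the evaluation data are made up for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average Recall@K over (retrieved, relevant) pairs, one pair per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up results for two evaluation queries, with and without rewriting.
baseline = [
    (["d07", "d99", "d12"], {"d12", "d45"}),
    (["d31", "d02"], {"d31"}),
]
rewritten = [
    (["d12", "d45", "d07"], {"d12", "d45"}),
    (["d58", "d31"], {"d31"}),
]

baseline_score = mean_recall_at_k(baseline, k=2)
rewritten_score = mean_recall_at_k(rewritten, k=2)
print(baseline_score, rewritten_score)  # 0.5 1.0
```

Running the same labeled queries through both pipelines and comparing the two numbers is the end-to-end check; a rewriter that reads well but does not move this score is not helping retrieval.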

Go further

Does HyDE actually work, or is it folklore?

It depends on your corpus. HyDE helps when documents are answer-shaped (knowledge bases, manuals) and the query is question-shaped — embedding a synthetic answer aligns better. It hurts on heterogeneous corpora where the LLM's hallucinated 'answer' anchors retrieval in the wrong neighborhood.

Why train a small specialized rewriter instead of calling a frontier LLM?

Latency, cost, and tighter targeting. A 1B-class model fine-tuned on (intent → rewrites that actually retrieved the right doc) runs in single-digit milliseconds and beats a generic LLM rewrite on the metric that matters — observed downstream recall.
