Instruction-Following Reranker

Also known as: instructable reranker, promptable reranker

TL;DR

An instruction-following reranker accepts an explicit instruction or context alongside the (query, document) pair, and reranks accordingly. Lets you inject business rules, user preferences, or domain context per call without retraining.

A standard takes (query, document) and produces a relevance score. An instruction-following reranker takes (query, instruction, document) — the instruction shapes how the reranker interprets relevance.

INSTRUCTION-FOLLOWING RERANKERSame query, same candidates — instruction changes the order.QUERY"candidates with IMO experience"INSTRUCTION · LANE ACHANNELmarine logistics intent"We are a marine logistics company hiring for shi…INSTRUCTION · LANE BCHANNELmath olympiad intent"We are recruiting a coach for our national mathe…R · INSTRUCTION-FOLLOWINGmodel(q, instr, d) → scoreconditioned on lane AR · INSTRUCTION-FOLLOWINGmodel(q, instr, d) → scoreconditioned on lane BRERANKED · INSTRUCTION ARERANKED · INSTRUCTION B1.d₁Worked at the International Mari…0.872.d₃Cargo-routing software for shipp…0.743.d₅Port-state inspections under IMO…0.624.d₂Coached the national IMO team in…0.125.d₄Olympiad-style proofs in combina…0.061.d₂Coached the national IMO team in…0.912.d₄Olympiad-style proofs in combina…0.783.d₁Worked at the International Mari…0.184.d₅Port-state inspections under IMO…0.145.d₃Cargo-routing software for shipp…0.09A single ambiguous query: "candidates with IMO experience".
What lives in the instruction channel
  • Business rules — “Prefer documents that cite the original signatory.”
  • User context — “This user is in the medical-records team; weight clinical-grade sources higher.”
  • Disambiguation hints — “Treat ‘IMO’ as referring to the International Maritime Organization, not the International Mathematical Olympiad.”
  • Term glossaries — “EBITDA equals earnings before interest, taxes, depreciation, and amortization.”
  • Recency bias — “The user is debugging current production; prefer documents written in the last six months.”

Why this matters

Without instruction-following, the only way to inject business context into reranking is to pre-process the query (concatenating context) or post-process the results (re-ranking via business rules outside the model). Both are awkward and lose the model’s ability to weigh context against semantic relevance natively.

A reranker trained on instructions can:

  • Resolve polysemic queries using context cues (the “IMO” example below).
  • Score documents that provide useful background (not direct answers) appropriately — without dropping them like a strict-relevance reranker would.
  • Adapt scoring to per-customer rules without per-customer fine-tuning.

The training data shape is (query, instruction, document, score) quadruples — not just (query, document, score). Supervision is sourced through the same pairwise-LLM-judge pipeline used for ordinary rerankers, but the judge is shown the instruction alongside the query when emitting a preference. The model learns to attend to the instruction tokens and condition its relevance scoring on them rather than ignoring them. Without instructions in the training data, fine-tuning a reranker to follow instructions at inference time barely works — instructions act as out-of-distribution noise.

Concrete behavior

zerank-2 is instruction-following. A practical example from its launch eval:

Query: "Candidates with IMO experience"
Instruction: "We're looking for engineering talent for a marine logistics company."

Document: "Candidate experience: Worked at the International Marine Organization"

zerank-2 (no instruction): 0.33
zerank-2 (with instruction): 0.64

The same document jumps from “marginally relevant” to “strongly relevant” once the model knows the user’s domain.

Risks

  • Instruction drift — overusing instructions to compensate for a poor index. Instructions should clarify, not compensate.
  • Prompt-injection-shaped attacks — if the instruction is shown to the model in the same tokens as the document, a malicious document could include text that overrides the instruction. Production systems segregate the instruction channel.
  • Instructions become product — once you depend on a particular phrasing, you can’t easily swap reranker models without re-validating instruction behavior.
Go further

How does this differ from query rewriting?

[Query rewriting](/concepts/query-rewriting/) reshapes the query before retrieval — every downstream stage sees the new query. Instructions live in a separate channel only the reranker reads, so they can encode business rules and user context without polluting the first-pass retrieval signal.

Are instructions calibrated the same way scores are?

Sort of — a well-trained instruction-following reranker preserves [calibration](/concepts/score-calibration/) under instruction shifts, so 0.7 still means roughly 'highly relevant' even when the instruction tightens or loosens the relevance criterion. Drift is real though, so re-validate calibration when you change instruction templates.

How do I evaluate instruction quality?

A/B the same query+candidate set with and without your instruction; measure NDCG@10 against a labeled set that reflects the intent the instruction encodes. If instructions help, the lift should be measurable on instruction-aware queries and roughly neutral on generic ones.

ZeroEntropy
The best AI teams build with ZeroEntropy models
Follow us on
GitHubTwitterSlackLinkedInDiscord