Deep Dive: The Architecture of ZeroEntropy v1

May 14, 2025
TL;DR

ZeroEntropy is a full-stack, hybrid retrieval platform that combines sparse (BM25), dense (vector embeddings), and LLMs in the loop to deliver enterprise-grade search over unstructured documents.

At ZeroEntropy we’ve re-imagined every layer of the retrieval stack, from PDF parsing to query execution, to deliver end-to-end document intelligence on par with an entire team of expert search engineers, all behind one simple API.

In the sections below, we’ll dive into each layer of our architecture, from ingestion and indexing to query execution and security, demonstrating how we achieve sub-second latency, 90%+ recall, and enterprise-grade compliance (SOC 2, HIPAA).

Query Architecture Diagram
Ingestion Architecture Diagram

1. High-Level Overview

At its core ZeroEntropy is a hybrid search system combining:

  • Sparse retrieval (BM25) for lightning-fast keyword matching
  • Dense embedding retrieval for semantic relevance
  • LLM-in-the-loop for query understanding, keyword generation, and final result re-ranking

By combining all three, we avoid the “either/or” trade-offs of vanilla search systems.

2. Document Ingestion & Chunking

Render → OCR → VLM diagram tags

  • Why? Many PDFs, DOCXs, and PPTs hide text inside images, so we convert each page to a JPEG and OCR it.
  • Why keep the JPEG? At query time, you can request the original page image alongside your top hits, which is useful if you want to feed the image into a VLM.
  • Why VLM? Diagrams, flowcharts, tables and formulas carry meaning a simple OCR method would miss.

Hierarchical chunking

We detect language, pick the right tokenizer & stemmer, then split into words → sentences → paragraphs. By keeping contextual spans, we try to create meaningful chunks. We currently support two chunk sizes: coarse (around 2,000 chars) and fine (around 200 chars).
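The two-level split above can be sketched in a few lines. This is a minimal illustration only: the sentence splitter here is a naive regex and the chunker greedily packs whole sentences up to a character budget, whereas the real pipeline uses language-specific tokenizers and stemmers. The 2,000- and 200-character targets come from the text; the helper names are ours.

```python
import re

def split_sentences(text):
    # Naive sentence splitter; the production pipeline picks a
    # language-specific tokenizer instead.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk(sentences, max_chars):
    # Greedily pack whole sentences into chunks up to max_chars,
    # so contextual spans are never cut mid-sentence.
    chunks, cur = [], ""
    for s in sentences:
        if cur and len(cur) + 1 + len(s) > max_chars:
            chunks.append(cur)
            cur = s
        else:
            cur = f"{cur} {s}".strip()
    if cur:
        chunks.append(cur)
    return chunks

sentences = split_sentences("First sentence. Second one! A third? And a fourth.")
coarse = chunk(sentences, 2000)  # everything fits in one coarse chunk here
fine = chunk(sentences, 20)      # a smaller budget forces finer chunks
```

Because both granularities are built from the same sentence list, a fine chunk always sits inside exactly one coarse chunk, which is what makes the hierarchy useful at query time.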

3. Indexing: Sparse & Dense

| Index Type | What we index | Why it matters |
| --- | --- | --- |
| ParadeDB BM25 | Paragraph & document tokens + LLM-generated keywords | Fuzzy/typo-tolerant keyword recall; lightning-fast wildcard/fuzzy matching via a BK-tree |
| Turbopuffer | Embeddings of every node (sentences → document) | Sub-second semantic search at scale |

4. Query Execution Walkthrough

LLM Rewriters

  • Query Rewriter: Refines your input into a clearer embedding prompt (e.g. “procedure for submitting Form 10-K to the SEC”).
  • Keyword Generator: Scores key terms (e.g. “10-K” = 0.8, “file” = –0.2) to improve matching.
  • Performance Modes:
      • Fast Mode skips LLM steps for sub-500 ms responses
      • Deep Mode runs the full LLM pipeline in 2–3 seconds
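To make the keyword generator's signed scores concrete, here is one way they could bias candidate ranking. The weighting scheme below, multiplying each term's BM25 contribution by its signed weight, is an illustrative assumption on our part, not the documented formula, and `apply_keyword_weights` is a hypothetical helper.

```python
def apply_keyword_weights(doc_term_scores, weights, default=1.0):
    # doc_term_scores: {doc_id: {term: bm25_contribution}}
    # weights: signed weights from the keyword generator,
    #          e.g. {"10-K": 0.8, "file": -0.2}
    ranked = {
        doc_id: sum(score * weights.get(term, default) for term, score in terms.items())
        for doc_id, terms in doc_term_scores.items()
    }
    return sorted(ranked, key=ranked.get, reverse=True)

candidates = {
    "doc-a": {"10-K": 2.0, "file": 1.5},  # strong on the decisive term
    "doc-b": {"file": 3.0},               # matches only the down-weighted term
}
weights = {"10-K": 0.8, "file": -0.2}
print(apply_keyword_weights(candidates, weights))  # → ['doc-a', 'doc-b']
```

The negative weight on "file" is what keeps a document stuffed with generic terms from outranking one that matches the decisive term.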

Tokenization & Typo Correction

We split the raw query into tokens and run them through a BK-tree typo corrector so even “10-Kk” maps back to “10-K.”
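A BK-tree makes this kind of fuzzy lookup cheap: it organizes words by edit distance so that the triangle inequality prunes most of the tree on each query. A self-contained sketch (the vocabulary and helper names are ours, not the production index):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class BKTree:
    def __init__(self, words):
        it = iter(words)
        self.root = (next(it), {})  # node = (word, {distance: child})
        for w in it:
            self.add(w)

    def add(self, word):
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return
            if d in node[1]:
                node = node[1][d]
            else:
                node[1][d] = (word, {})
                return

    def search(self, word, max_dist):
        # Triangle inequality: only children whose edge distance lies in
        # [d - max_dist, d + max_dist] can contain matches.
        results, stack = [], [self.root]
        while stack:
            w, children = stack.pop()
            d = levenshtein(word, w)
            if d <= max_dist:
                results.append((d, w))
            stack.extend(c for dist, c in children.items()
                         if d - max_dist <= dist <= d + max_dist)
        return sorted(results)

tree = BKTree(["10-K", "10-Q", "form", "file", "filing"])
print(tree.search("10-Kk", 1))  # → [(1, '10-K')]
```

With a max distance of 1, the typo "10-Kk" maps back to "10-K" while unrelated vocabulary is never even compared.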

Sparse + Dense Fan-Out

Dense Recall

We query Turbopuffer embeddings to fetch the top-N semantically related chunks.

Sparse Recall

We use ParadeDB’s BM25 index to retrieve the top-N’ keyword matches.
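The two recall legs are independent, so they can run concurrently and total latency is the slower of the two rather than their sum. A sketch with `asyncio`, where `dense_search` and `sparse_search` are placeholder stubs standing in for the Turbopuffer and ParadeDB calls (they are not real client APIs):

```python
import asyncio

async def dense_search(query, n):
    # Placeholder for the Turbopuffer embedding query.
    return [f"dense-{i}" for i in range(n)]

async def sparse_search(query, n):
    # Placeholder for the ParadeDB BM25 query.
    return [f"sparse-{i}" for i in range(n)]

async def fan_out(query, n=50, n_prime=50):
    # Issue both queries concurrently; latency is max(dense, sparse), not the sum.
    return await asyncio.gather(dense_search(query, n), sparse_search(query, n_prime))

dense_hits, sparse_hits = asyncio.run(fan_out("Form 10-K filing procedure", 3, 3))
```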

Reciprocal Rank Fusion

We merge the sparse and dense rankings using reciprocal rank fusion to select the final top K results, combining complementary signals for up to a 10–15% boost in overall accuracy.
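Reciprocal rank fusion needs only the two rank orderings, no score calibration between BM25 and cosine similarity. Each document scores the sum of 1 / (k + rank) across the lists it appears in; the constant k (60 in the common formulation, an assumption here) damps the influence of any single top rank. A minimal sketch:

```python
def rrf(rankings, k=60, top_k=10):
    # rankings: a list of ranked doc-id lists, e.g. [dense_ids, sparse_ids].
    # A document's fused score is the sum of 1 / (k + rank) over every
    # ranking it appears in, with rank starting at 1.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

dense = ["d3", "d1", "d7"]
sparse = ["d1", "d9", "d3"]
print(rrf([dense, sparse], top_k=3))  # → ['d1', 'd3', 'd9']
```

Documents that both legs agree on ("d1", "d3") rise to the top, which is exactly the complementary-signal effect the fusion step is after.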

5. Security & Deployment

  • SOC 2 Type II compliant.
  • HIPAA compliant, following industry best practices.
  • End-to-end encryption for data in transit and at rest.
  • On-prem deployment available for enterprise users as easy-to-use Docker images.

Wrapping Up

ZeroEntropy isn’t “another vector search.” It’s a full-stack retrieval platform that:

  • Knows how to parse your most fiendish PDFs.
  • Indexes meaningful snippets at every granularity.
  • Blends classical IR, embeddings, and LLM intelligence under the hood.
  • Scales from a single document to billions of nodes without compromising accuracy or speed.

Ready to see it in action?

Explore the docs →

Book a demo →

A note on page rendering: JPEG lets us render at 3–4× the resolution of PNG at the same file size, so we can serve higher-quality page images within a given latency, storage, or bandwidth budget.
