PageIndex: The Reasoning-Based RAG Engine That Thinks Before It Retrieves

By Ankit Gubrani

You've built a RAG system. You've tuned your chunk sizes, picked the right embedding model, and dialed in your similarity thresholds. And yet, it still gets things wrong.

A user asks, "How many days do I have to raise a dispute after receiving an invoice?" Simple question. The answer is definitely somewhere in the contract. Your vector search returns chunks about payment terms, invoicing schedules, and billing procedures. They look relevant. The similarity scores are solid. But none of them contain the actual dispute window, because that specific rule lives in the "Dispute Resolution" section three chapters away. Your chunking strategy split it off, embedded it separately, and it scored just low enough to get skipped.

The answer was in the document. Your system just couldn't find it.

This is not a tuning problem. It is a fundamental limitation of how vector RAG thinks about documents. And it is exactly the problem that PageIndex was designed to solve.

PageIndex takes a radically different approach. Instead of embedding chunks and hoping similarity search lands on the right ones, it builds a hierarchical understanding of the entire document and uses an LLM to reason its way to the right answer. No vectors. No similarity scores. Just structured thinking.

Let's unpack why this matters and how it works.

The Problem with Vector RAG

Vector RAG works by converting your documents into numerical representations called embeddings, then storing them in a vector database. When a query comes in, it gets embedded too, and the system finds the chunks whose embeddings are closest in that high-dimensional space. The assumption is that the closest embeddings correspond to the most semantically similar content.
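Under the hood, "closest in that high-dimensional space" usually means cosine similarity. Here is a minimal sketch of that retrieval step, with toy hand-written vectors standing in for a real embedding model:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" for illustration only; a real system
# would use a learned embedding model with hundreds of dimensions.
chunks = {
    "payment terms": [0.9, 0.1, 0.2],
    "invoicing schedule": [0.7, 0.4, 0.1],
    "dispute window": [0.2, 0.9, 0.4],
}
query_embedding = [0.85, 0.2, 0.15]

# Rank chunks by similarity to the query, most similar first.
ranked = sorted(chunks,
                key=lambda c: cosine_similarity(query_embedding, chunks[c]),
                reverse=True)
print(ranked)  # the "dispute window" chunk ranks last
```

Note how the chunk that actually holds the answer can end up at the bottom of the ranking, which is exactly the failure mode described above.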

For a lot of use cases, this works beautifully: short Q&A, product FAQs, knowledge bases with well-scoped topics. But the moment you throw a long, structured document at it, such as a 90-page legal contract, a technical specification, or a financial report, the cracks start to show.

The Chunking Problem: When Answers Get Split Apart

[Diagram: an original document is split into chunks A-D. Chunk C holds the answer ("6.2 Dispute Window: 30 days") but scores lowest in vector search (0.66 similarity versus 0.72-0.82 for the others) because it comes from a different section. The search returns chunks A, B, and D, and the wrong answer is generated.]

Chunking destroys context

To fit documents into a vector store, you have to break them apart. And no matter how smart your chunking strategy is, splitting a document always risks tearing apart information that belongs together. A sentence on page 12 might only make sense in the context of a table on page 7. A number on page 34 might only be meaningful when read alongside a definition on page 2.

Once you split it, that relationship is gone.

Similarity is not the same as relevance

Vector search finds chunks that are semantically similar to the query. But the chunk that actually answers the question might use completely different vocabulary. "How many days to raise a dispute?" could map poorly to a chunk that says "the receiving party must notify within thirty calendar days," simply because the embedding model captures different surface features.

A high similarity score does not guarantee that the right answer is inside. And a chunk buried further down the similarity ranking might be the exact one you needed.

Long documents are structurally rich, embeddings are structurally blind

A well-written legal contract or compliance document has a deliberate structure. Sections, sub-sections, definitions, cross-references. That structure is meaningful. It tells a reader where to look. But vector embeddings flatten all of that. A chunk has no awareness of whether it came from Section 1 or Section 14, whether it is a definition or an obligation, or whether it references something three pages back.

For long structured documents, vector RAG is essentially reading the document in random order and hoping it finds the right paragraph.

Enter Reasoning-Based Retrieval

Think about how an experienced lawyer reads a contract. They don't randomly flip to a page and hope the answer is there. They know the structure of the document. They know that dispute resolution terms live in a specific section. They know that definitions in Section 1 govern terms used in Section 8. They navigate the document deliberately, using their understanding of structure to guide where they look.

That is reasoning-based retrieval. Instead of computing similarity across flattened chunks, it builds an understanding of the document's structure first, then uses that understanding to navigate toward the answer.

The key insight is this: retrieval is a reading comprehension task, not a search task. A search engine finds documents that match your query. A reader understands the document and knows where to look.

PageIndex is a retrieval engine built around that insight.

What is PageIndex?

PageIndex is a vectorless, reasoning-based RAG engine. It does not use embeddings. It does not run similarity search. Instead, it indexes documents by building a hierarchical tree of summaries and then uses an LLM to navigate that tree at query time, reasoning its way down to the relevant content.

The name gives it away. PageIndex thinks about documents the way an index in the back of a book does. A book index doesn't embed the content. It captures the structure and tells you exactly where to look.

The hierarchical tree

When you index a document with PageIndex, it processes the document in pages or logical sections and builds a tree where:

  • The leaves are the raw pages or sections of the document
  • The parent nodes are LLM-generated summaries of groups of pages
  • The root node is a high-level summary of the entire document

Each level of the tree captures a different level of abstraction. The root tells you what the document is about overall. The mid-level nodes tell you what each chapter or major section covers. The leaves give you the actual text.
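As a rough sketch, the tree can be represented as nested nodes. The class and field names here are illustrative, not PageIndex's actual schema:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    summary: str                        # LLM-generated summary at indexing time
    text: str | None = None             # raw page text; set only on leaf nodes
    children: list[TreeNode] = field(default_factory=list)

# Leaves hold raw pages; parents hold summaries of groups of children.
leaf = TreeNode(summary="Sec 5: Liability Cap",
                text="Liability is capped at $2M ...")
chapter = TreeNode(summary="Sections 4-6: Obligations & Liability",
                   children=[leaf])
root = TreeNode(summary="Full document overview", children=[chapter])
```

A structure like this serializes naturally to JSON, which is part of why no vector database is needed.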

When a query comes in, the LLM starts at the root and asks: which branch of this tree is most likely to contain the answer? It picks a branch and descends. At each level it re-evaluates and keeps narrowing until it reaches the relevant pages.

[Diagram: PageIndex's hierarchical tree. A root summary (full document overview, generated by the LLM) sits above chapter summaries (A: Sections 1-3, Definitions & Scope; B: Sections 4-6, Obligations & Liability; C: Sections 7-10, Termination & Disputes), which sit above page/section leaves such as Sec 4: Indemnification, Sec 5: Liability Cap $2M, and Sec 6: Exclusions. For the query "What is the liability cap?", the LLM navigates Root, then Chapter B, then Section 5. No embeddings, no similarity search, just structured reasoning.]

How PageIndex Works: Step by Step

Phase 1: Indexing

When you feed a document into PageIndex, it does not embed anything. Instead it runs through the following steps:

  1. Page-level processing: The document is split into logical units, typically pages or sections. These become the leaf nodes of the tree.
  2. Bottom-up summarization: The LLM reads groups of pages and generates a summary for each group. These become the mid-level nodes.
  3. Recursive summarization: Groups of mid-level summaries are themselves summarized, building up toward the root.
  4. Tree construction: The final result is a tree where each node has a summary and pointers to its children.

This indexing step is compute-intensive because it requires multiple LLM calls to build the summaries. But it only happens once per document. The payoff comes at query time.
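The bottom-up pass can be sketched as follows. The `summarize` function is a stand-in for an LLM call; here it just truncates and joins text so the example runs:

```python
def summarize(texts, max_chars=60):
    # Placeholder for an LLM summarization call over a group of child summaries.
    return " / ".join(t[:20] for t in texts)[:max_chars]

def build_tree(pages, group_size=3):
    """Group nodes, summarize each group, and recurse until one root remains."""
    # Leaf nodes keep the raw page text.
    nodes = [{"summary": p, "text": p, "children": []} for p in pages]
    while len(nodes) > 1:
        # Each pass collapses the current level into groups of `group_size`.
        nodes = [
            {"summary": summarize([n["summary"] for n in group]),
             "children": group}
            for group in (nodes[i:i + group_size]
                          for i in range(0, len(nodes), group_size))
        ]
    return nodes[0]

# 9 pages -> 3 mid-level summary nodes -> 1 root node.
tree = build_tree([f"Page {i} text..." for i in range(1, 10)])
```

In a real system each `summarize` call would be an LLM request, which is where the one-time indexing cost comes from.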

Phase 2: Query and Navigation

When a query arrives, PageIndex does something fundamentally different from vector RAG. Instead of comparing the query against all chunks in parallel, it starts at the root and navigates the tree step by step.

  1. Start at the root: The LLM reads the root summary and the query, then decides which branch is most likely to contain the answer.
  2. Descend the tree: The LLM moves to the selected branch and reads its children's summaries. It picks the most promising one.
  3. Continue until leaves: The process repeats until the LLM reaches the leaf level and identifies the specific page or section that contains the answer.
  4. Read the full leaf: The actual raw text of the selected page is retrieved.
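The navigation loop above can be sketched like this. `choose_branch` stands in for the per-level LLM decision; a keyword-overlap heuristic replaces it here so the example runs:

```python
def choose_branch(query, children):
    # Placeholder for an LLM call that reads each child's summary
    # and picks the one most likely to contain the answer.
    def overlap(summary):
        return len(set(query.lower().split()) & set(summary.lower().split()))
    return max(children, key=lambda c: overlap(c["summary"]))

def navigate(tree, query):
    node = tree
    while node["children"]:          # descend until we reach a leaf
        node = choose_branch(query, node["children"])
    return node["text"]              # raw text of the selected page

# Minimal two-level tree for illustration.
tree = {
    "summary": "contract overview",
    "children": [
        {"summary": "payment terms and invoicing", "children": [],
         "text": "Invoices are issued monthly..."},
        {"summary": "dispute resolution window", "children": [],
         "text": "Disputes must be raised within 30 days..."},
    ],
}
print(navigate(tree, "dispute window"))
```

Each level of descent is one sequential LLM call, which is where the latency tradeoff discussed later comes from.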

Phase 3: Answer Generation

Once the relevant page or section is retrieved, it gets sent to the LLM along with the original query for final answer generation. This is the same generation step as in traditional RAG, except the context is far more targeted.

Because the LLM navigated to the right content rather than guessing from similarity scores, the context it receives is almost always the exact content needed to answer the question.

[Diagram: PageIndex query flow. (1) Query in: "What is the liability cap?" (2) Read root: the LLM reads the document summary and picks a branch. (3) Descend tree: it reads chapter summaries and narrows to the best section. (4) Read leaf: the full raw text of the target page is retrieved. (5) Answer: the LLM generates a precise answer. The indexing phase, done once offline, runs document, page splits, bottom-up LLM summarization, stored hierarchical tree. No embeddings are generated and no vector database is needed; the tree is stored in a structured format such as JSON or a graph.]

When You Should Use PageIndex

PageIndex is not a replacement for vector RAG everywhere. It is purpose-built for a specific class of problems where structure matters and accuracy is non-negotiable.

Long, structured documents

Contracts, technical specifications, compliance manuals, regulatory filings, research reports. Any document where the answer could live in a specific section and missing that section would mean a wrong answer. These are exactly the cases where chunking destroys context and PageIndex's tree navigation thrives.

Compliance, finance, and legal workloads

When a compliance officer asks your system whether a policy document meets a regulatory requirement, "close enough" is not acceptable. A wrong answer drawn from a mis-retrieved chunk could have real consequences. PageIndex's structured navigation significantly reduces the chance of missing a critical clause.

Single-document deep-dive applications

Contract review tools, due diligence assistants, policy analysis tools. Anywhere the user is drilling deep into one specific document and expects precise, cited answers. PageIndex gives you that precision.

When document structure carries meaning

If the fact that something is in Section 3.2 versus an appendix changes how you interpret it, you need a retrieval system that respects that structure. PageIndex does. Vector RAG does not.

When NOT to Use PageIndex

PageIndex comes with real tradeoffs. It is worth being honest about where it struggles.

Websites and unstructured content

Web pages, blog posts, and marketing content don't have the kind of hierarchical structure that PageIndex relies on. Building a meaningful tree over a flat web page doesn't add value and wastes LLM compute. Stick with vector RAG for these.

Large-scale multi-document systems

If you're running a knowledge base with thousands of documents and users are searching across all of them, PageIndex doesn't scale the same way. Vector RAG, with its sub-linear approximate nearest neighbor search, is the right tool for that problem. PageIndex's navigation tree is a per-document construct. It's not designed for cross-corpus search.

Low-latency applications

Tree navigation means multiple sequential LLM calls. Even with a fast model, each step in the navigation chain adds latency. If you need sub-second responses, PageIndex will feel slow. Vector search returns results in milliseconds. The tree walk does not.
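As a back-of-envelope illustration (every number below is an assumption, not a benchmark):

```python
# Rough latency comparison between the two retrieval styles.
tree_depth = 4            # assumed levels the LLM must descend
llm_call_ms = 800         # assumed per-call latency for a fast model
vector_search_ms = 10     # assumed ANN lookup time

# The navigation calls are sequential, so they add up.
pageindex_latency_ms = tree_depth * llm_call_ms
print(pageindex_latency_ms, vector_search_ms)  # 3200 vs 10
```

Even with generous assumptions, the tree walk sits orders of magnitude above a single vector lookup.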

High-volume query throughput

Each query triggers several LLM calls. At scale, the cost and rate-limit implications are significant. If you're handling tens of thousands of queries per minute, the economics of PageIndex are difficult to justify compared to a single embedding lookup.

Vector RAG vs PageIndex: The Clear Comparison

Here's how the two approaches line up across the dimensions that matter most in production systems.

| Dimension | Vector RAG | PageIndex |
| --- | --- | --- |
| Core mechanism | Embedding similarity search | LLM tree navigation |
| Indexing cost | Low (embed and store) | High (multi-step LLM summarization) |
| Query latency | Very low (ms) | Higher (multiple LLM hops) |
| Accuracy on long docs | Inconsistent (chunking errors) | High (structured navigation) |
| Handles document structure | No (flattens structure) | Yes (tree mirrors structure) |
| Cost per query | Very low | Higher (multiple LLM calls) |
| Best for | FAQs, web content, large corpora | Legal, compliance, finance, deep docs |
| Infrastructure needed | Vector database required | No vector DB (tree stored as JSON/graph) |
| Determinism | Deterministic (same query, same result) | Varies (LLM navigation can differ) |

Final Thoughts: Where Does PageIndex Fit?

PageIndex is not trying to replace vector RAG. It is solving a different problem. And understanding that distinction is what makes you a better architect.

Vector RAG is fast, cheap, and scales incredibly well across large corpora. For most web-scale search and knowledge base use cases, it is the right tool. But the moment you are dealing with long, structured documents where precision matters, chunking becomes your enemy and similarity search becomes an unreliable guide.

PageIndex flips the retrieval model on its head. It says: instead of finding the needle by brute-force comparing every piece of hay, let's understand the structure of the haystack first. The LLM navigates like an expert reader, not a search engine.

In practice, the most robust production systems will likely use both. Vector RAG handles the wide, fast, multi-document retrieval layer. PageIndex handles the deep, precise, single-document analysis layer. Together, they cover a far wider range of real-world retrieval scenarios than either approach alone.

As LLMs get faster and cheaper, the latency and cost penalties of tree navigation will continue to shrink. The structural advantage of PageIndex will remain. That is a compelling trajectory for workloads where accuracy is not negotiable.

The future of retrieval is not one approach winning. It is knowing which approach to reach for based on the shape of the problem in front of you.

Stay tuned, and as always, if you've worked with PageIndex or reasoning-based retrieval in production, I'd love to hear what you've learned. Connect with me on LinkedIn.