Introducing the Document Parser for RAG

May 13, 2025

By Akash Mahajan, Mathew Hogan, Ishan Sinha

Today, we’re excited to introduce our document parser that enables enterprise AI agents to navigate and understand large and complex documents with superior accuracy and context awareness.

Parsing complex, unstructured documents is the critical foundation for agentic RAG systems. Failures in parsing cause these systems to miss critical context, degrading response accuracy. Existing solutions—whether basic parsers, standalone OCR, or large vision language models—are unreliable for the long, complex documents prevalent in most enterprises.

Our document parser represents a significant improvement in how AI systems interact with complex enterprise documents. Our pipeline combines the best of custom vision, OCR, and vision language models, along with specialized tools like table extractors—achieving superior accuracy and reliability by excelling in the following areas:

Document-level understanding vs. page-by-page parsing: Our parser understands the section hierarchies of long documents, equipping AI agents to understand relationships across hundreds of pages to generate contextually supported, accurate answers.
Minimized hallucinations: Our multi-stage pipeline minimizes severe hallucinations and provides accurate bounding boxes, plus confidence levels for table extraction, so that its output can be audited.
Superior handling of complex modalities: Our advanced system orchestrates the best models and specialized tools to handle the most challenging document elements, such as tables, charts, and figures.

To get started today for free, create a Contextual AI account. Visit the Components tab to use the Parse UI playground, or get an API key and call the API directly. The first 500+ pages in our Standard mode (for complex documents that require VLMs and OCR) are on us!

The Problem with Traditional Document Parsing

Document parsing in enterprise settings presents unique challenges that most parsers simply do not meet:

Intricate hierarchical structures spanning hundreds of pages: Enterprise documents cannot be parsed as collections of isolated pages; they are layered documents with hierarchical relationships between sections, sub-sections, and document components. Existing solutions miss how individual pages fit into the broader context of the document (e.g., information on page 5 contextualizes data on page 237). When parsing fails to capture these relationships, AI agents struggle to deliver contextually accurate responses.
Low tolerance for hallucinations: Enterprise use cases handle sensitive information in which hallucinations are not tolerable. Unfortunately, most existing parsers are prone to severe hallucinations (incorrect values, missing information, malformed table structures, etc.) without any indication of uncertainty.
Complex modalities: Enterprise documents have complex tables, charts, and figures that are notoriously difficult to parse. Most existing parsers are not designed to handle the nuances of these modalities in real-world enterprise documents, leading to hallucinations and errors.

Existing parsers’ shortcomings hamper development of advanced AI use cases. Since parsing is the first step of a RAG pipeline, any errors at this stage are propagated downstream in agentic workflows.

A New Approach to Document Understanding

Document Hierarchy: Context is King

Unlike traditional parsers, Contextual AI’s solution understands how each page fits within the document’s holistic structure and hierarchy, enabling AI agents to navigate long, complex documents with the same understanding a human would have. Specifically, we automatically infer a document’s hierarchy and structure, which enables developers to add metadata to each chunk that describes its position in the document. This improves retrieval and allows agents to understand how different sections relate to each other to provide answers that connect information across hundreds of pages.

The parser intelligently infers the complete document hierarchy of the Attention is All You Need paper.

Leveraging document hierarchy, we add each chunk’s relevant parent sections to its metadata to improve retrieval and downstream accuracy.

Why this matters: Context drives accuracy. By understanding document hierarchy and relationships between sections, your agents can make connections and inferences that would be impossible with traditional parsers. In an end-to-end RAG evaluation using a dataset of SEC 10Ks and 10Qs, we found that including document hierarchy metadata in chunks increased the equivalence score from 69.2% to 84.0%, a gain of 14.8 percentage points.

Naive chunking creates fixed-length chunks. Hierarchy-aware chunking creates chunks based on detected section headings. Hierarchy-aware chunking with contextualization adds metadata about document structure to chunks like section, subsection, subsubsection, etc. The SEC RAG evaluation dataset contains 70+ documents spanning 6,500+ pages. LLM responses are scored against the ground truth answers to measure end-to-end accuracy.
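The hierarchy-aware chunking with contextualization described above can be illustrated with a minimal sketch. It assumes a hypothetical list of parsed blocks, each a dict with `type` and `text` keys and, for headings, a `level` key. These field names and both function names are illustrative and are not taken from the parser's actual output schema.

```python
from typing import Dict, List

# Hypothetical block shape (an assumption, not the parser's real schema):
# {"type": "heading" | "text", "text": str, "level": int (headings only)}
Block = Dict[str, object]


def hierarchy_aware_chunks(blocks: List[Block]) -> List[Dict[str, object]]:
    """Start a new chunk at each heading and record the chunk's parent sections."""
    chunks: List[Dict[str, object]] = []
    path: List[str] = []     # current heading trail, outermost section first
    current: List[str] = []  # body text collected for the open chunk

    def flush() -> None:
        # Close the open chunk, tagging it with a copy of the heading trail.
        if current:
            chunks.append({"section_path": list(path), "text": "\n".join(current)})
            current.clear()

    for block in blocks:
        if block["type"] == "heading":
            flush()
            level = int(block["level"])
            # Drop headings at the same or a deeper level, then add this one.
            del path[level - 1:]
            path.append(str(block["text"]))
        else:
            current.append(str(block["text"]))
    flush()
    return chunks


def contextualize(chunk: Dict[str, object]) -> str:
    """Prefix the chunk text with its section trail before embedding or indexing."""
    trail = " > ".join(chunk["section_path"])
    return f"[{trail}]\n{chunk['text']}" if trail else str(chunk["text"])
```

Prefixing the section trail is only one way to attach the metadata; storing `section_path` as a separate filterable field in the index is an equally reasonable choice.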

Auditability: Minimizing Hallucinations

Our system is designed to maintain accuracy even in the most complex documents, avoiding the catastrophic hallucinations that plague many AI document systems. Unlike competitors, we don’t just guess when documents get complex—our system provides confidence levels in table extraction, ensuring you know when information is reliable and when it requires human verification. We also provide bounding box references to the original document, enabling streamlined verification of parsing accuracy.

Why this matters: In enterprise environments, wrong information can be worse than no information. Our system knows what it knows, and just as importantly, knows what it doesn’t know.
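As a rough sketch of how table confidence levels and bounding boxes could drive a review workflow, the snippet below routes low-confidence tables to a human. The `confidence`, `page`, and `bbox` keys and the `"low"` label are assumptions made for illustration; the parser's real field names and values may differ.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical table record (an assumption, not the parser's real schema):
# {"confidence": "high" | "low", "page": int, "bbox": (x0, y0, x1, y1)}
Table = Dict[str, object]


def split_for_review(
    tables: Iterable[Table],
    review_levels: FrozenSet[str] = frozenset({"low"}),
) -> Tuple[List[Table], List[Table]]:
    """Separate tables that can be used directly from those needing a human check."""
    trusted: List[Table] = []
    needs_review: List[Table] = []
    for table in tables:
        # A missing label is treated as "low" so unknown cases are never trusted silently.
        level = str(table.get("confidence", "low")).lower()
        if level in review_levels:
            needs_review.append(table)
        else:
            trusted.append(table)
    return trusted, needs_review


def review_ticket(table: Table) -> str:
    """Describe where a reviewer should look, using the page and bounding box."""
    return f"page {table.get('page')}: check table at bbox {table.get('bbox')}"
```

Treating an unlabeled table as needing review is a deliberately conservative default for this sketch.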

A table from Intel’s FY2024 10K that presents several formatting challenges for document parsers.

The parsed output of Intel’s FY2024 10K. Contextual AI’s solution parses it correctly. A major multimodal LLM’s output has misaligned headers, a missing column, and misplaced values between columns and rows.

Complex Modalities

Our solution excels at handling complex document elements that are common pain points with other tools:

Technical diagrams with accompanying text
Charts and figures with multiple datapoints and dynamics
Large tables with nested hierarchies

Why this matters: The most valuable information in enterprise documents is often contained in varied and complex modalities. Our parser extracts both the semantic meaning and relationships within them.

A complex figure from a Goldman Sachs’ Research Report on AI/data center energy usage.

The parsed output from the Goldman Sachs’ Research Report. Contextual AI’s solution comprehensively and accurately captions the figures. A major multimodal LLM misplaces the values in the first chart.

Real-World Use Cases

Our parser is well-suited for a variety of real-world use cases:

Investment Research: Financial institutions work with long, complex, and multimodal documents. Our parser enables teams to analyze financial reports, regulatory documents, and deal memos with exceptional precision, maintaining the relationships between sections that contextualize critical financial data. The system also excels at extracting the modalities common in these sources: tables with nested hierarchies and financial charts with multiple datapoints.

Technical Customer Support: Understanding technical documentation requires understanding its section hierarchy and specialized diagrams within each section. Our parser helps customer engineering teams navigate complex hierarchies and accurately analyze technical figures and charts, preserving semantic relationships that traditional parsers miss.

Policy Compliance: Compliance and legal teams analyze massive policy document repositories to ensure their organizations adhere to internal guidelines and government regulation. Our parser ensures these documents are accurately processed with minimal hallucinations to reduce potential legal risk. Furthermore, the inferred document hierarchy helps agents navigate these large documents.

…and many more!

In all cases, our parser’s strengths in document hierarchy understanding, hallucination prevention, and complex modality handling directly address the unique challenges these industries face.

Getting Started

Get started today for free by creating a Contextual AI account. Visit the Components tab to use the Parse UI playground, or get an API key and call the API directly. We provide credits for the first 500+ pages in Standard mode (for complex documents that require VLMs and OCR), and you can buy additional credits as your needs grow. To request custom rate limits and pricing, please contact us. If you have any feedback or need support, please email parse-feedback@contextual.ai.

Documentation: /parse API, Python SDK, and code example notebook

Sample code block (from code example notebook):

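# Set up the Contextual Python SDK (notebook cell: %pip is an IPython magic)
try:
    from contextual import ContextualAI
except ImportError:
    %pip install --upgrade --quiet contextual-client
    from contextual import ContextualAI

# api_key and file_path must be defined earlier in the notebook
client = ContextualAI(api_key=api_key)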

# Submit parse job
with open(file_path, "rb") as fp:
    response = client.parse.create(
        raw_file=fp,
        parse_mode="standard",
        figure_caption_mode="concise",
        enable_document_hierarchy=True,
        page_range="0-5",
    )

job_id = response.job_id
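Because the call above returns a `job_id` and not the parsed document, the job presumably runs asynchronously and has to be polled. The helper below is a generic polling sketch, not part of the SDK: `wait_for_job`, the `get_status` callable, and the `"completed"` and `"failed"` status strings are all assumptions, so check the /parse API documentation for the actual status call and values.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_status: Callable[[], str],
    done_states: Iterable[str] = ("completed",),   # assumed status value
    failed_states: Iterable[str] = ("failed",),    # assumed status value
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() repeatedly until the job finishes, fails, or times out."""
    done = set(done_states)
    failed = set(failed_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"parse job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("parse job did not finish in time")
        sleep(poll_seconds)
```

In a notebook this might be used as `wait_for_job(lambda: fetch_status(job_id))`, where `fetch_status` is a hypothetical wrapper around whichever status call the SDK provides.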

Interpreting the output:

There are 3 types of output:

Markdown-document: A single Markdown output for the entire document.
Markdown-per-page: A list of Markdown outputs, one for each page of the document.
Blocks-per-page: A list of structured JSON representations of the content blocks of each page of the document, sorted by reading order. Content blocks are either `heading`, `text`, `table`, or `figure`. Please see details of what each JSON contains here.

In addition, setting enable_document_hierarchy=True adds the inferred document hierarchy to the output, which you can use to add contextual metadata to chunks.
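As one way to consume the blocks-per-page output described above, the sketch below tallies blocks by type and collects every block of a given type together with its page index. The payload shape (a list of pages, each a dict with a `blocks` list whose items carry a `type` key) is an assumption for illustration; only the four block type names come from the description above.

```python
from typing import Dict, List, Tuple

# The four block types named in the output description.
BLOCK_TYPES = ("heading", "text", "table", "figure")

# Hypothetical page shape (an assumption): {"blocks": [{"type": str, ...}, ...]}
Page = Dict[str, list]


def count_block_types(pages: List[Page]) -> Dict[str, int]:
    """Count how many blocks of each type appear across all pages."""
    counts: Dict[str, int] = {}
    for page in pages:
        for block in page.get("blocks", []):
            block_type = block["type"]
            counts[block_type] = counts.get(block_type, 0) + 1
    return counts


def blocks_of_type(pages: List[Page], block_type: str) -> List[Tuple[int, dict]]:
    """Return (zero-based page index, block) pairs, preserving reading order."""
    if block_type not in BLOCK_TYPES:
        raise ValueError(f"unknown block type: {block_type!r}")
    return [
        (index, block)
        for index, page in enumerate(pages)
        for block in page.get("blocks", [])
        if block["type"] == block_type
    ]
```

Pulling out all `table` blocks this way is a convenient first step before applying table-specific checks.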

Optimizing Performance with the Contextual AI Platform

At Contextual AI, we believe in a “systems over models” approach to enterprise RAG. Rather than cobbling together disjointed models, our end-to-end platform orchestrates and optimizes all RAG components as a single, unified system, delivering unmatched simplicity and performance.

In addition to offering our powerful end-to-end platform that solves your hardest RAG problems, we’ve also exposed several of our platform’s state-of-the-art primitives as component APIs. These APIs can be easily plugged into existing RAG systems. Beyond today’s launch of our document parser, we’ve also recently released:

/rerank: The first instruction-following reranker, providing greater control over how retrievals are prioritized
/generate: The most grounded language model in the world, engineered specifically to minimize hallucinations
/LMUnit: Our evaluation-optimized model for preference, direct scoring, and granular unit test evaluation

Sign up today to get free access to our end-to-end platform until June 10 and $25 of free credits toward our component APIs! Whichever way you choose to build, we can’t wait to see what you create!
