Service 58

Document Intelligence

AI API

Azure AI Document Intelligence (formerly Form Recognizer) extracts structured data from documents — text, key-value pairs, tables, and fields — going beyond plain OCR to understand a document's layout and meaning. It turns invoices, receipts, contracts, and forms into structured data an application can use, rather than a flat string of recognized characters.

The distinction from OCR is the whole point. OCR tells you the characters on the page; Document Intelligence tells you that this number is the invoice total and that block is the vendor address. For document-heavy workflows — accounts payable, claims, onboarding — that structure is the difference between automation and a human re-keying the result.

Prebuilt Models

Prebuilt models extract fields from common document types out of the box — invoices, receipts, identity documents, business cards, tax forms, health insurance cards. For these standard formats, no training is needed: send the document, get named fields back, with confidence scores you can threshold for human review.
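The thresholding step can be sketched in a few lines. This is a minimal illustration, not the service's SDK: the `ExtractedField` class, the `split_by_confidence` function, the 0.8 cutoff, and the sample values are all assumptions made for the example, standing in for whatever field objects your client library returns.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Simplified stand-in for one named field in an analysis result (hypothetical shape)."""
    name: str
    value: str
    confidence: float  # expected to be between 0.0 and 1.0


def split_by_confidence(fields, threshold=0.8):
    """Separate fields that can be auto-accepted from those needing human review."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


# Hypothetical invoice fields; the values and scores are made up for illustration.
fields = [
    ExtractedField("VendorName", "Contoso Ltd", 0.97),
    ExtractedField("InvoiceTotal", "1,240.00", 0.62),
    ExtractedField("InvoiceDate", "2024-03-01", 0.91),
]

accepted, needs_review = split_by_confidence(fields, threshold=0.8)
print("auto-accept:", [f.name for f in accepted])
print("human review:", [f.name for f in needs_review])
```

The right cutoff depends on the cost of an error in your workflow; a payment amount probably deserves a stricter threshold than a free-text note.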

Layout

The layout model extracts text, tables, selection marks, and structure with their positions, preserving the document's organization. It is the foundation other extraction builds on and is often used directly to feed clean, structured document content into a retrieval-augmented generation pipeline — turning PDFs into something a language model can ground on.
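As one example of preparing layout output for a RAG pipeline, a table can be rendered as Markdown so its rows and columns survive chunking. The sketch below assumes each cell has already been reduced to a plain dict with `row_index`, `column_index`, and `content` keys; that shape, the function name `table_to_markdown`, and the sample data are assumptions for illustration, and merged cells (row or column spans) are ignored.

```python
def table_to_markdown(row_count, column_count, cells):
    """Render a grid of positioned cells as a Markdown table.

    The first row is treated as the header. Spanning cells are not handled.
    """
    # Start with an empty grid so missing cells render as blanks.
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        # Escape pipes so cell text cannot break the table syntax.
        text = cell["content"].strip().replace("|", "\\|")
        grid[cell["row_index"]][cell["column_index"]] = text

    lines = []
    for index, row in enumerate(grid):
        lines.append("| " + " | ".join(row) + " |")
        if index == 0:
            # Markdown needs a separator line after the header row.
            lines.append("| " + " | ".join("---" for _ in row) + " |")
    return "\n".join(lines)


# Hypothetical cells, as they might look after flattening a layout result.
cells = [
    {"row_index": 0, "column_index": 0, "content": "Item"},
    {"row_index": 0, "column_index": 1, "content": "Qty"},
    {"row_index": 1, "column_index": 0, "content": "Widget"},
    {"row_index": 1, "column_index": 1, "content": "3"},
]

markdown = table_to_markdown(row_count=2, column_count=2, cells=cells)
print(markdown)
```

Keeping the table as a table, rather than a run of loose words, gives the language model the row and column relationships it needs to answer questions about line items.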

Custom Models

When documents are specific to your business, such as a particular form or contract layout, custom models train on a small set of your labeled examples to extract exactly the fields you need. They bridge the gap to your own document types without the effort of a full machine-learning project, following the same custom-on-top-of-pretrained pattern as Custom Vision.

Document Intelligence vs OCR

Plain OCR (the Read capability in AI Vision) recognizes characters; Document Intelligence understands documents. If you only need the text, OCR is simpler and cheaper. If you need to know what the text means — which value is the total, which rows are line items — Document Intelligence is the tool, and using OCR there forces you to rebuild that understanding by hand.

Document Intelligence vs plain OCR

Document Intelligence — Extracts structured fields, key-value pairs, and tables with meaning. Choose it for document workflows where structure drives automation.

OCR (AI Vision Read) — Extracts raw characters and their positions. Choose it when you only need the text, not its structure or meaning.

Common Mistakes

Using plain OCR for a document workflow and then rebuilding field extraction by hand, when Document Intelligence already understands the layout.
Training a custom model for a document type a prebuilt model already handles (invoices, receipts, IDs).
Ignoring confidence scores and auto-accepting low-confidence extractions instead of routing them to human review.
Using Document Intelligence where only the raw text was needed, paying for structure that adds no value.
Feeding messy, unstructured PDF text into a RAG pipeline when the layout model would produce clean, structured input.
Assuming custom models need a large dataset, when a small set of labeled examples suffices.

Best Practices

Use prebuilt models for common document types — invoices, receipts, IDs — with no training.
Train a custom model on a small labeled set for business-specific document layouts.
Threshold on confidence scores and route low-confidence extractions to human review.
Use the layout model to feed clean, structured document content into RAG pipelines.
Use plain OCR when you only need the text, and Document Intelligence when you need its meaning.
Pair extraction with downstream automation so structured output drives the workflow end to end.
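
The selection rules in this list can be written down as a small decision helper. This is a sketch under stated assumptions: the function name `choose_extractor`, the document-type keys, the `"ocr-read"` label, and the custom model ID are hypothetical, and falling back to the layout model for unrecognized types is one reasonable default rather than a rule from the service. The `prebuilt-...` strings are intended to match the service's published model IDs, but verify them against the API version you use.

```python
# Document types assumed to be covered by prebuilt models (keys are hypothetical).
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
}


def choose_extractor(doc_type, needs_structure, custom_models=None):
    """Pick an extractor following the best practices listed above."""
    custom_models = custom_models or {}
    if not needs_structure:
        # Only the raw text is needed, so plain OCR is simpler and cheaper.
        return "ocr-read"
    if doc_type in PREBUILT_MODELS:
        # Common document type: no training required.
        return PREBUILT_MODELS[doc_type]
    if doc_type in custom_models:
        # Business-specific layout with a trained custom model.
        return custom_models[doc_type]
    # Assumed default: structure is needed but no field model applies.
    return "prebuilt-layout"


# Hypothetical custom model registry for illustration.
my_custom_models = {"claim_form": "contoso-claims-v1"}

print(choose_extractor("invoice", needs_structure=True))
print(choose_extractor("invoice", needs_structure=False))
print(choose_extractor("claim_form", True, my_custom_models))
print(choose_extractor("brochure", True))
```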

Comparable services: AWS Textract, GCP Document AI

Knowledge Check

How does Document Intelligence differ from plain OCR?

OCR recognizes characters; Document Intelligence understands layout and meaning — which value is the total, which block is the address
Document Intelligence is simply a cheaper version of the same OCR engine
Plain OCR is the one that extracts tables and structured fields, while Document Intelligence can only read back undifferentiated plain text
They are exactly the same capability marketed under two different names

When is a custom Document Intelligence model warranted?

For business-specific document layouts that prebuilt models do not cover — trained on a small labeled set
For standard invoices and receipts, even though the prebuilt models already handle those layouts out of the box
Whenever any document anywhere happens to contain a table
Only when you are processing more than a million documents

Why threshold on confidence scores?

To route low-confidence extractions to human review instead of auto-accepting likely errors
To reduce the per-document price charged for each extraction call
Because the prebuilt models will not run at all unless a confidence threshold is explicitly set first
To enable the built-in layout model
