Amazon Textract
Amazon Textract is AWS's document-understanding service. It performs OCR and goes beyond plain text by extracting forms (key-value pairs), tables, signatures, and structured fields from identity documents and receipts — reading a document the way a human does rather than returning a flat list of words.
Launched in 2019, it is the right starting point when the goal is "extract structured data from these documents" rather than "train a model on these documents."
What It Returns
DetectDocumentText is plain OCR — words and lines with bounding boxes. AnalyzeDocument is the richer call, taking feature flags: FORMS returns key-value pairs, TABLES returns rows and columns, SIGNATURES returns signature locations, and LAYOUT returns titles, headings, and paragraphs.
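The choice between the two calls can be sketched as below. This is a minimal illustration, not a reference implementation: `extract_blocks` and `lines_of_text` are hypothetical helper names, and `client` is assumed to be a boto3 Textract client (for example `boto3.client("textract")`) with read access to the S3 object.

```python
from typing import Any, Dict, List


def extract_blocks(client: Any, bucket: str, key: str,
                   features: List[str]) -> List[Dict[str, Any]]:
    """Use plain OCR when no features are requested, analysis otherwise."""
    document = {"S3Object": {"Bucket": bucket, "Name": key}}
    if not features:
        # Plain OCR: words and lines with bounding boxes.
        response = client.detect_document_text(Document=document)
    else:
        # Richer call: features is a list such as ["FORMS", "TABLES"].
        response = client.analyze_document(Document=document,
                                           FeatureTypes=features)
    return response["Blocks"]


def lines_of_text(blocks: List[Dict[str, Any]]) -> List[str]:
    """Return the text of every LINE block, in response order."""
    return [b["Text"] for b in blocks if b["BlockType"] == "LINE"]
```

Passing an empty feature list keeps the request on the detection-only path, so a caller has to opt in to each feature it wants.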
AnalyzeID normalizes identity-document fields (name, date of birth, document number) regardless of layout, and AnalyzeExpense extracts merchant, total, and line items from receipts. Each call is metered separately — pick the smallest that gives you what you need.
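As a rough sketch of what consuming a receipt result might look like, the function below flattens an AnalyzeExpense response into summary fields and line items. `summarize_expense` is a hypothetical helper, and the sketch assumes the response uses the `ExpenseDocuments`, `SummaryFields`, and `LineItemGroups` keys with `Type.Text` and `ValueDetection.Text` on each field.

```python
from typing import Any, Dict, List


def summarize_expense(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten each expense document into summary fields and line items."""
    summaries = []
    for doc in response.get("ExpenseDocuments", []):
        fields: Dict[str, str] = {}
        for field in doc.get("SummaryFields", []):
            name = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {}).get("Text")
            if name and value is not None:
                # Keep the first value seen for each normalized field type.
                fields.setdefault(name, value)

        line_items = []
        for group in doc.get("LineItemGroups", []):
            for item in group.get("LineItems", []):
                row: Dict[str, str] = {}
                for cell in item.get("LineItemExpenseFields", []):
                    name = cell.get("Type", {}).get("Text")
                    value = cell.get("ValueDetection", {}).get("Text")
                    if name and value is not None:
                        row[name] = value
                line_items.append(row)

        summaries.append({"fields": fields, "line_items": line_items})
    return summaries
```

With a boto3 client, the input would come from something like `client.analyze_expense(Document={"S3Object": {...}})`; the values arrive as strings, so amounts still need parsing before they are used as numbers.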
API Shapes and Queries
The synchronous API handles single-page documents (images and one-page PDFs); the asynchronous API handles larger multi-page PDFs via a job that reads from S3 and sends an SNS notification when it completes. The standard high-volume pattern is documents landing in S3, an EventBridge rule triggering the start of a Textract job, and a Lambda processing the result when SNS fires.
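The asynchronous flow might be sketched as follows. The helper names (`start_analysis`, `job_from_sns_event`, `collect_blocks`) are hypothetical, `client` is assumed to be a boto3 Textract client, and the SNS topic and IAM role ARNs are placeholders you would supply. The sketch also assumes the Lambda receives the standard SNS event shape, with the Textract job details as JSON in `Records[0].Sns.Message`.

```python
import json
from typing import Any, Dict, List, Tuple


def start_analysis(client: Any, bucket: str, key: str, features: List[str],
                   sns_topic_arn: str, role_arn: str) -> str:
    """Start an asynchronous job and return its JobId."""
    response = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=features,
        NotificationChannel={"SNSTopicArn": sns_topic_arn,
                             "RoleArn": role_arn},
    )
    return response["JobId"]


def job_from_sns_event(event: Dict[str, Any]) -> Tuple[str, str]:
    """Pull (JobId, Status) out of the SNS event delivered to Lambda."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["JobId"], message["Status"]


def collect_blocks(client: Any, job_id: str) -> List[Dict[str, Any]]:
    """Page through a finished job's results and return every block."""
    blocks: List[Dict[str, Any]] = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.get_document_analysis(**kwargs)
        if page["JobStatus"] != "SUCCEEDED":
            raise RuntimeError(f"job {job_id} is {page['JobStatus']}")
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            return blocks
```

Results come back in pages, so the loop keeps following `NextToken` until none is returned; reading only the first page would silently drop blocks from longer documents.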
Queries (the QUERIES feature of AnalyzeDocument) let you ask natural-language questions of a document ("What is the policy holder's name?") without training a model. This is easier than form-based extraction for free-form documents where the answer moves around.
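A question-and-answer request could look roughly like this. `ask` and `answers_by_alias` are hypothetical helpers, `client` is assumed to be a boto3 Textract client, and the sketch assumes answers arrive as `QUERY_RESULT` blocks linked from each `QUERY` block through an `ANSWER` relationship. A production pipeline would more likely lean on an AWS-published response parser than on this hand-rolled lookup.

```python
from typing import Any, Dict, List, Optional, Tuple

Answer = Optional[Tuple[str, float]]


def answers_by_alias(blocks: List[Dict[str, Any]]) -> Dict[str, Answer]:
    """Map each query alias to (text, confidence), or None if unanswered."""
    by_id = {b["Id"]: b for b in blocks}
    answers: Dict[str, Answer] = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        query = block["Query"]
        alias = query.get("Alias") or query["Text"]
        answers[alias] = None
        for rel in block.get("Relationships", []):
            if rel["Type"] != "ANSWER":
                continue
            for result_id in rel["Ids"]:
                result = by_id.get(result_id)
                # Keep only the first answer found for this query.
                if (answers[alias] is None and result
                        and result["BlockType"] == "QUERY_RESULT"):
                    answers[alias] = (result["Text"], result["Confidence"])
    return answers


def ask(client: Any, bucket: str, key: str,
        questions: Dict[str, str]) -> Dict[str, Answer]:
    """Send {alias: question} pairs and return answers keyed by alias."""
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["QUERIES"],
        QueriesConfig={"Queries": [{"Text": text, "Alias": alias}
                                   for alias, text in questions.items()]},
    )
    return answers_by_alias(response["Blocks"])
```

Giving each question an alias keeps downstream code independent of the exact wording, which tends to change as questions are tuned.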
Textract: structured extraction from documents (forms, tables, IDs, receipts).
Rekognition: in-the-wild text in photos (signs, license plates), not document structure.
Bedrock multimodal: reasoning about document content, beyond extracting fields.
Common Mistakes
- Using the synchronous API for large multi-page PDFs that exceed its limits; use the asynchronous API instead.
- Turning on every feature flag when you only need plain text, multiplying the per-page bill needlessly.
- Hand-writing a parser for the block tree instead of using an AWS-published helper or library.
- Sending PDFs that already have a text layer through OCR instead of just parsing them.
- Pushing low-confidence extracted fields straight into a system of record instead of a human review queue.
- Using Textract for in-the-wild photo text, where Rekognition is the right tool.
Best Practices
- Pick the smallest API call that returns what you need; detection-only is cheaper than full analysis.
- Use the asynchronous API for multi-page or large PDFs.
- Parse the block tree with an AWS-published helper rather than reinventing it.
- Use FORMS for consistent layouts and Queries for variable ones.
- Use AnalyzeID and AnalyzeExpense for identity documents and receipts.
- Route below-threshold fields to human review, not directly into your data store.
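The review-queue rule in the last item can be sketched as a simple split on confidence. `route_fields` is a hypothetical helper, and the default threshold of 90 is an arbitrary assumption (Textract reports confidence on a 0 to 100 scale); the right cutoff depends on the document type and the cost of an error.

```python
from typing import Dict, Tuple


def route_fields(
    fields: Dict[str, Tuple[str, float]],
    threshold: float = 90.0,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split {name: (value, confidence)} into (accepted, needs_review)."""
    accepted: Dict[str, str] = {}
    needs_review: Dict[str, str] = {}
    for name, (value, confidence) in fields.items():
        # Fields below the threshold go to a human, not the data store.
        target = accepted if confidence >= threshold else needs_review
        target[name] = value
    return accepted, needs_review
```

Only the first dictionary would be written to the system of record; the second would be handed to whatever review queue the pipeline uses.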
Knowledge Check
How does Textract go beyond plain OCR?
- It extracts forms, tables, signatures, and fields from IDs and receipts, not just words
- It translates the document's text into other languages
- It reads in-the-wild text from photos of street signs
- It generates a natural-language summary of the document and answers questions about its contents
For a large multi-page PDF, which Textract API shape is required?
- The asynchronous API — an S3 job with SNS notification
- The synchronous API, which has no page-count or file-size limits
- The streaming API that pages results in real time
- Queries only, with no async job needed
What do Textract Queries enable?
- Asking natural-language questions of a document without training a model
- Translating the whole document into a range of other languages on the fly
- Training a custom OCR model on your documents
- Detecting in-the-wild text in scene photos
What should happen to fields Textract extracts with low confidence?
- Route them to a human review queue, not straight into a system of record
- Accept every one of them automatically so the downstream pipeline keeps running at full speed
- Discard the entire document and send it back to be re-scanned from scratch
- Re-run just the uncertain fields through Rekognition for a second opinion