Service 56

Amazon Comprehend

AI/ML · NLP · API

Amazon Comprehend is AWS's natural-language-processing API. You send text; it returns structured analysis — language, named entities, sentiment, key phrases, topics, and PII — with no model to train or infrastructure to manage. Like Rekognition for vision, it covers most "what is in this text" questions that do not require a custom model.

A sibling, Comprehend Medical, applies the same idea to clinical text.

What It Detects

Built-in features cover sentiment (Positive/Negative/Neutral/Mixed), entity detection (people, places, organizations, dates), key-phrase extraction, language detection (100+ languages), PII detection with optional redaction, targeted sentiment (per-entity rather than per-document), and topic modeling over a corpus.
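As a rough illustration of the built-in features, the sketch below runs three detectors on one document and flattens the responses. It assumes a boto3-style Comprehend client is passed in (for example, one created with `boto3.client("comprehend")`); the helper name `analyze_text` and the returned dictionary layout are made up for this example.

```python
from typing import Any, Dict


def analyze_text(client: Any, text: str, language_code: str = "en") -> Dict[str, Any]:
    """Run sentiment, entity, and key-phrase detection on a single document.

    `client` is assumed to expose the boto3 Comprehend operations used below.
    """
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    return {
        # Document-level label: POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        "sentiment": sentiment["Sentiment"],
        # (entity type, matched text) pairs, e.g. ("ORGANIZATION", "AWS")
        "entities": [(e["Type"], e["Text"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }
```

Taking the client as a parameter keeps the helper easy to exercise with a stub and leaves region and credential setup to the caller.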

Custom Models and API Shapes

Two custom paths run on Comprehend's infrastructure: custom classification (route tickets or emails into your categories) and custom entity recognition (detect entities the built-ins miss, like product names). You train each model on your own labeled examples, then run inference through a real-time endpoint that Comprehend hosts or through asynchronous analysis jobs.
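The ticket-routing case could look roughly like the sketch below. It assumes a trained multi-class classifier already deployed behind a real-time endpoint and a boto3-style client; the endpoint ARN, the `route_ticket` helper, and the 0.7 confidence threshold are all hypothetical values chosen for illustration.

```python
from typing import Any, Optional


def route_ticket(client: Any, text: str, endpoint_arn: str,
                 min_score: float = 0.7) -> Optional[str]:
    """Return the most likely category name, or None when confidence is low.

    `endpoint_arn` identifies a custom-classifier endpoint you have created;
    the value is supplied by the caller and is not defined here.
    """
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    classes = response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda c: c["Score"])
    # Below the threshold, leave the ticket for manual triage.
    return best["Name"] if best["Score"] >= min_score else None
```

Returning `None` for low-confidence predictions gives the caller an explicit fallback path rather than a forced guess.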

The real-time API handles per-request analysis, and its batch-detect operations accept up to 25 documents per call; asynchronous batch jobs process corpora stored in S3; provisioned endpoints reserve capacity, measured in inference units, for high real-time volume.
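One way to respect the 25-document cap on batch-detect calls is to slice the input before each request, as in the sketch below. It assumes a boto3-style client; the helper names `batches` and `batch_sentiment` are illustrative, and documents the service reports as failed are simply left as `None`.

```python
from typing import Any, Iterator, List, Optional

MAX_BATCH = 25  # per-call document cap for batch-detect operations


def batches(items: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiment(client: Any, texts: List[str],
                    language_code: str = "en") -> List[Optional[str]]:
    """Return one sentiment label per input text, in the original order."""
    labels: List[Optional[str]] = [None] * len(texts)
    offset = 0
    for group in batches(texts):
        response = client.batch_detect_sentiment(
            TextList=group, LanguageCode=language_code
        )
        # Index is the document's position within this request, not overall.
        for item in response["ResultList"]:
            labels[offset + item["Index"]] = item["Sentiment"]
        # Documents listed in response["ErrorList"] keep their None placeholder.
        offset += len(group)
    return labels
```

For large historical corpora an asynchronous job over S3 is usually the better fit; this pattern suits moderate volumes that still need an immediate answer.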

Comprehend vs Bedrock vs Translate

Comprehend: narrow, inexpensive built-in NLP tasks (sentiment, entities, PII) at small per-call cost.

Bedrock: modern LLM tasks such as summarization, Q&A, generation, and complex reasoning over text.

Translate: converting text between languages, which Comprehend does not do.

Common Mistakes

Running other features without detecting language first, since many Comprehend features require a language code and behave differently per language.
Using the real-time API for bulk historical analysis where batch jobs are cheaper and simpler.
Fighting Comprehend's narrow API for summarization or Q&A instead of using Bedrock.
Forcing built-in entity types onto domain-specific entities instead of training custom entity recognition.
Sending oversized documents past Comprehend's size limits instead of chunking first.
Dropping a PII-containing document entirely instead of combining detection with targeted masking.
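For the oversized-document mistake, a simple chunker is often enough. The sketch below splits on whitespace and measures UTF-8 bytes. The `max_bytes` value is deliberately left to the caller, because limits differ by operation and should be checked against the current Comprehend quotas. The helper name `chunk_text` is illustrative.

```python
from typing import List


def chunk_text(text: str, max_bytes: int) -> List[str]:
    """Split text into pieces whose UTF-8 encoding fits within max_bytes.

    Breaks on whitespace where possible. Runs of whitespace collapse to a
    single space, so offsets in a chunk do not map back to the original text.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must hold at least one UTF-8 character (4 bytes)")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Start a new chunk; a word that is itself too large is hard-split.
        piece = ""
        for ch in word:
            if len((piece + ch).encode("utf-8")) > max_bytes:
                chunks.append(piece)
                piece = ch
            else:
                piece += ch
        current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries would likely give better sentiment results than splitting on arbitrary whitespace, at the cost of a little more logic.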

Best Practices

Detect language first, then run other features with the correct language code.
Use the real-time API for per-request work and batch jobs for bulk historical analysis.
Combine PII detection with your own masking logic for redaction workflows.
Train custom entity recognition for domain-specific entities.
Chunk long texts before sending to stay within document-size limits.
For LLM-style tasks (summarization, Q&A, generation), use Bedrock instead.
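Two of the practices above, detecting language first and pairing PII detection with your own masking, might be combined as in the sketch below. It assumes a boto3-style client and that the returned offsets index characters of the Python string. The helper names and the `[TYPE]` placeholder format are illustrative. PII detection supports fewer languages than language detection recognizes, so a production version would check the detected code before calling it.

```python
from typing import Any, Dict, List


def dominant_language(client: Any, text: str) -> str:
    """Return the language code with the highest detection score."""
    response = client.detect_dominant_language(Text=text)
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"]


def mask_pii(text: str, entities: List[Dict[str, Any]],
             min_score: float = 0.5) -> str:
    """Replace each detected span with its type label, e.g. [EMAIL]."""
    # Work right to left so earlier offsets stay valid as the text changes length.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] < min_score:
            continue
        label = "[" + entity["Type"] + "]"
        text = text[:entity["BeginOffset"]] + label + text[entity["EndOffset"]:]
    return text


def redact(client: Any, text: str) -> str:
    """Detect the language, find PII spans, and mask them in place."""
    language = dominant_language(client, text)
    response = client.detect_pii_entities(Text=text, LanguageCode=language)
    return mask_pii(text, response["Entities"])
```

Masking only the detected spans keeps the rest of the document usable downstream, which is the point of not discarding it outright.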

Comparable services: GCP Cloud Natural Language API; Azure AI Language

Knowledge Check

What does Amazon Comprehend do?

Managed NLP — sentiment, entities, key phrases, language, and PII over text, no model to train
Optical character recognition that reads raw text off scanned documents and photographed pages
Text-to-speech synthesis across many voices and languages
Image and video analysis for objects, faces, and scenes

Why detect language before running other Comprehend features?

Many features behave differently per language and need the correct code
Language detection is the only feature offered free of per-call charge
Other features fail entirely unless detection is disabled first
It reduces the per-document call cost to exactly zero

For summarization, Q&A, or open-ended generation over text, which service fits better?

Amazon Bedrock — modern LLM tasks beyond Comprehend's built-in API
Amazon Comprehend custom classification routing text into your own categories
Amazon Translate for converting the text between languages
Amazon Textract to extract fields from the source document

What are Comprehend's two custom-model paths?

Custom classification and custom entity recognition
Custom optical character recognition and custom translation
Sentiment-model training and topic-model training
Fine-tuning and continued pre-training on your corpus
