Amazon Comprehend
Amazon Comprehend is AWS's natural-language-processing API. You send text; it returns structured analysis — language, named entities, sentiment, key phrases, topics, and PII — with no model to train or infrastructure to manage. Like Rekognition for vision, it covers most "what is in this text" questions that do not require a custom model.
A sibling, Comprehend Medical, applies the same idea to clinical text.
What It Detects
Built-in features cover sentiment (Positive/Negative/Neutral/Mixed), entity detection (people, places, organizations, dates), key-phrase extraction, language detection (about 100 languages), PII detection with optional redaction, targeted sentiment (per-entity rather than per-document), and topic modeling over a corpus.
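As a rough illustration of two of these built-in features, the sketch below calls sentiment and entity detection on one piece of text. It assumes `client` is a boto3 Comprehend client; the helper names `make_client` and `summarize_text`, the region, and the default language code are illustrative choices, not part of the service.

```python
def make_client(region_name="us-east-1"):
    """Create a Comprehend client. The region is an assumption; use your own."""
    import boto3  # imported lazily so summarize_text can be tried with a stub client
    return boto3.client("comprehend", region_name=region_name)


def summarize_text(client, text, language_code="en"):
    """Return the document-level sentiment and the (type, text) of each entity."""
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    return {
        # One of POSITIVE, NEGATIVE, NEUTRAL, MIXED
        "sentiment": sentiment["Sentiment"],
        # Each entity also carries Score, BeginOffset, and EndOffset
        "entities": [(e["Type"], e["Text"]) for e in entities["Entities"]],
    }
```

With AWS credentials configured, `summarize_text(make_client(), "Some review text")` would return a small dictionary you can log or store.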
Custom Models and API Shapes
Two custom paths run on Comprehend's infrastructure: custom classification (route tickets or emails into your categories) and custom entity recognition (detect entities the built-ins miss, like product names). You train the model on your labeled examples, then run inference through an endpoint that Comprehend manages.
The real-time API handles per-request analysis (up to 25 documents in batch-detect calls); batch jobs process S3 corpora asynchronously; provisioned throughput reserves capacity for high real-time volume.
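The 25-document cap on batch-detect calls means larger lists have to be split on the client side. A minimal sketch, assuming `client` is a boto3 Comprehend client (`boto3.client("comprehend")`) and that all texts share one language; `batch_sentiment` is a hypothetical helper name:

```python
def batch_sentiment(client, texts, language_code="en", batch_size=25):
    """Run batch_detect_sentiment over texts in groups of at most batch_size.

    Returns one sentiment label per input text, or None where the service
    reported an error for that document.
    """
    results = [None] * len(texts)
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.batch_detect_sentiment(
            TextList=batch, LanguageCode=language_code
        )
        # Index is the position within this batch, not within the full list
        for item in response["ResultList"]:
            results[start + item["Index"]] = item["Sentiment"]
        # Documents listed in response["ErrorList"] keep their None placeholder
    return results
```

For a corpus that already sits in S3, an asynchronous batch job is usually the better fit than looping like this.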
When to Use Which
Comprehend: narrow, built-in NLP tasks (sentiment, entities, PII) at a small per-call cost.
Bedrock: LLM tasks such as summarization, Q&A, generation, and complex reasoning over text.
Translate: converting text between languages, which Comprehend does not do.
Common Mistakes
- Running other features without detecting language first, since many Comprehend features behave differently per language.
- Using the real-time API for bulk historical analysis where batch jobs are cheaper and simpler.
- Fighting Comprehend's narrow API for summarization or Q&A instead of using Bedrock.
- Forcing built-in entity types to cover domain-specific entities instead of training custom entity recognition.
- Sending oversized documents past Comprehend's size limits instead of chunking first.
- Dropping a PII-containing document entirely instead of combining detection with targeted masking.
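The last point, masking detected PII rather than discarding the whole document, can be sketched as follows. This assumes `client` is a boto3 Comprehend client and that the returned offsets line up with Python string indices; `mask_pii` and the `min_score` threshold are illustrative, not a recommended setting.

```python
def mask_pii(client, text, language_code="en", min_score=0.5):
    """Replace each detected PII span with a [TYPE] placeholder."""
    entities = client.detect_pii_entities(
        Text=text, LanguageCode=language_code
    )["Entities"]
    masked = text
    # Work from the end of the text backwards so earlier offsets stay valid
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] < min_score:
            continue  # skip low-confidence detections (threshold is an assumption)
        masked = (
            masked[:entity["BeginOffset"]]
            + f"[{entity['Type']}]"
            + masked[entity["EndOffset"]:]
        )
    return masked
```

The rest of the document stays usable for search, analytics, or support workflows, while the sensitive spans are replaced by their entity type.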
Best Practices
- Detect language first, then run other features with the correct language code.
- Use the real-time API for per-request work and batch jobs for bulk historical analysis.
- Combine PII detection with your own masking logic for redaction workflows.
- Train custom entity recognition for domain-specific entities.
- Chunk long texts before sending to stay within document-size limits.
- For LLM-style tasks (summarization, Q&A, generation), use Bedrock instead.
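Two of these practices, detecting language first and chunking long text, combine naturally. The sketch below assumes `client` is a boto3 Comprehend client; `chunk_text` and `sentiment_by_chunk` are hypothetical helpers, and the 5,000-byte default is an assumption to check against the current quota for the operation you call.

```python
def chunk_text(text, max_bytes=5000):
    """Split text on whitespace into chunks of at most max_bytes UTF-8 bytes.

    A single word longer than max_bytes is emitted as its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for word in text.split():
        word_bytes = len(word.encode("utf-8"))
        extra = word_bytes + (1 if current else 0)  # +1 for the joining space
        if current and size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, size, extra = [], 0, word_bytes
        current.append(word)
        size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def sentiment_by_chunk(client, text, max_bytes=5000):
    """Detect the dominant language once, then score sentiment per chunk."""
    chunks = chunk_text(text, max_bytes)
    if not chunks:
        return None, []
    languages = client.detect_dominant_language(Text=chunks[0])["Languages"]
    code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]
    # Not every detected language is supported by every feature; callers
    # may need to check the code before this step.
    labels = [
        client.detect_sentiment(Text=chunk, LanguageCode=code)["Sentiment"]
        for chunk in chunks
    ]
    return code, labels
```

Splitting on whitespace is the simplest choice; splitting on sentence or paragraph boundaries would likely give more meaningful per-chunk results.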
Knowledge Check
What does Amazon Comprehend do?
- Managed NLP — sentiment, entities, key phrases, language, and PII over text, no model to train
- Optical character recognition that reads raw text off scanned documents and photographed pages
- Text-to-speech synthesis across many voices and languages
- Image and video analysis for objects, faces, and scenes
Why detect language before running other Comprehend features?
- Many features behave differently per language and need the correct code
- Language detection is the only feature offered free of per-call charge
- Other features fail entirely unless detection is disabled first
- It reduces the per-document call cost to exactly zero
For summarization, Q&A, or open-ended generation over text, which service fits better?
- Amazon Bedrock — modern LLM tasks beyond Comprehend's built-in API
- Amazon Comprehend custom classification routing text into your own categories
- Amazon Translate for converting the text between languages
- Amazon Textract to extract fields from the source document
What are Comprehend's two custom-model paths?
- Custom classification and custom entity recognition
- Custom optical character recognition and custom translation
- Sentiment-model training and topic-model training
- Fine-tuning and continued pre-training on your corpus