Amazon Comprehend
Service 56

AI/ML · NLP · API

Amazon Comprehend is AWS's natural-language-processing API. You send text; it returns structured analysis — language, named entities, sentiment, key phrases, topics, and PII — with no model to train or infrastructure to manage. Like Rekognition for vision, it covers most "what is in this text" questions that do not require a custom model.
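Here is a minimal sketch of that request/response flow. The Python function below detects the dominant language and then calls three built-in analyses with the detected code. It assumes `client` is a boto3 Comprehend client (for example `boto3.client("comprehend")`). The client is passed in so the sketch can run without AWS credentials. The `analyze` name and the shape of its returned dictionary are illustrative choices, not part of any AWS API.

```python
from typing import Any


def analyze(client: Any, text: str) -> dict:
    """Detect the dominant language, then run built-in analyses with that code.

    `client` is assumed to be a boto3 Comprehend client, e.g.
    boto3.client("comprehend"); any object with the same methods works.
    """
    # Step 1: language detection returns candidate languages with confidence scores.
    languages = client.detect_dominant_language(Text=text)["Languages"]
    lang = max(languages, key=lambda l: l["Score"])["LanguageCode"]

    # Step 2: pass the detected code to features that require a LanguageCode.
    sentiment = client.detect_sentiment(Text=text, LanguageCode=lang)
    entities = client.detect_entities(Text=text, LanguageCode=lang)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=lang)

    return {
        "language": lang,
        "sentiment": sentiment["Sentiment"],  # e.g. "POSITIVE"
        "entities": [(e["Type"], e["Text"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }
```

Language detection covers far more languages than sentiment or entity detection. A production version should therefore also confirm that the detected code is supported by each downstream feature before calling it.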

A sibling, Comprehend Medical, applies the same idea to clinical text.

What It Detects

Built-in features cover:
  • Sentiment (Positive, Negative, Neutral, or Mixed) for a whole document
  • Targeted sentiment, scored per entity rather than per document
  • Entity detection (people, places, organizations, dates, and more)
  • Key-phrase extraction
  • Language detection across 100+ languages
  • PII detection, with optional redaction in asynchronous jobs
  • Topic modeling over a corpus (asynchronous jobs only)
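Targeted sentiment differs from document sentiment because its results are grouped by entity. The sketch below flattens a DetectTargetedSentiment response into one row per entity. The field names it reads (`Entities`, `Mentions`, `DescriptiveMentionIndex`, `MentionSentiment`) reflect our understanding of the boto3 response and should be checked against the current API reference. `sentiment_per_entity` is a hypothetical helper.

```python
from typing import Any


def sentiment_per_entity(client: Any, text: str, lang: str) -> list[tuple[str, str, list[str]]]:
    """Return (name, entity type, per-mention sentiments) for each entity group.

    `client` is assumed to be a boto3 Comprehend client; the response fields
    read here are assumptions to verify against the API reference.
    """
    response = client.detect_targeted_sentiment(Text=text, LanguageCode=lang)
    rows = []
    for group in response["Entities"]:
        mentions = group["Mentions"]
        # DescriptiveMentionIndex lists the mention(s) that best name the entity;
        # use the first one as the display name.
        named = mentions[group["DescriptiveMentionIndex"][0]]
        # Each mention (including pronouns like "it") carries its own sentiment.
        sentiments = [m["MentionSentiment"]["Sentiment"] for m in mentions]
        rows.append((named["Text"], named["Type"], sentiments))
    return rows
```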

Custom Models and API Shapes

Two custom paths run on Comprehend's infrastructure: custom classification (route tickets or emails into your own categories) and custom entity recognition (detect entities the built-ins miss, such as product names). You supply labeled training data, Comprehend trains the model, and you call it through an inference endpoint that Comprehend manages.
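A call to a custom classifier might look like the sketch below, which sends one document to a classifier endpoint and routes it to the top-scoring category. It assumes a boto3 Comprehend client and an existing endpoint ARN. The `route_document` helper, the 0.5 threshold, and the `needs_review` fallback label are hypothetical choices for illustration.

```python
from typing import Any


def route_document(client: Any, text: str, endpoint_arn: str, min_score: float = 0.5) -> str:
    """Route text to the highest-scoring class from a custom classifier endpoint.

    Falls back to "needs_review" (a hypothetical label) when no class clears
    `min_score`. Assumes a multi-class classifier, whose response carries a
    "Classes" list; multi-label classifiers return "Labels" instead and would
    need different handling.
    """
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    classes = response.get("Classes", [])
    if not classes:
        return "needs_review"
    top = max(classes, key=lambda c: c["Score"])
    return top["Name"] if top["Score"] >= min_score else "needs_review"
```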

The real-time API handles per-request analysis (up to 25 documents in batch-detect calls); batch jobs process S3 corpora asynchronously; provisioned throughput reserves capacity for high real-time volume.
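Because batch-detect calls accept at most 25 documents, larger real-time workloads have to be split into batches. Here is a minimal sketch, assuming a boto3 Comprehend client. It calls `batch_detect_sentiment` once per group of 25 and rewrites each result's `Index` so it refers to the document's position in the full input list. `batch_sentiment` is a hypothetical helper name.

```python
from typing import Any

MAX_BATCH = 25  # documents per batch-detect call


def batch_sentiment(client: Any, docs: list[str], lang: str) -> tuple[list[dict], list[dict]]:
    """Run BatchDetectSentiment over any number of docs, 25 at a time.

    Returns (results, errors). Each item's "Index" is rewritten to point at the
    position in `docs` rather than the position inside its batch.
    """
    results, errors = [], []
    for start in range(0, len(docs), MAX_BATCH):
        chunk = docs[start:start + MAX_BATCH]
        response = client.batch_detect_sentiment(TextList=chunk, LanguageCode=lang)
        # Successes and per-document failures come back in separate lists.
        for item in response["ResultList"]:
            results.append({**item, "Index": start + item["Index"]})
        for item in response["ErrorList"]:
            errors.append({**item, "Index": start + item["Index"]})
    results.sort(key=lambda r: r["Index"])
    return results, errors
```

Per-document failures arrive in `ErrorList` instead of raising an exception. The caller can therefore retry just those documents.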

Comprehend vs Bedrock vs Translate

Comprehend: narrow, inexpensive built-in NLP tasks such as sentiment, entities, and PII, at a small per-call cost.

Bedrock: modern LLM tasks such as summarization, Q&A, generation, and complex reasoning over text.

Translate: converting text between languages, which Comprehend does not do.

Common Mistakes
  • Running other features without detecting language first; most features require a language code and support only a subset of languages.
  • Using the real-time API for bulk historical analysis where batch jobs are cheaper and simpler.
  • Fighting Comprehend's narrow API for summarization or Q&A instead of using Bedrock.
  • Trying to coerce built-in entity types for domain-specific entities instead of training custom entity recognition.
  • Sending oversized documents past Comprehend's size limits instead of chunking first.
  • Dropping a PII-containing document entirely instead of combining detection with targeted masking.
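You can mask PII instead of dropping the document with a small pure function over the offsets that PII detection returns. The sketch below assumes entities shaped like the `Entities` list from `detect_pii_entities` (`Type`, `Score`, `BeginOffset`, `EndOffset`) and assumes the spans do not overlap. The `[TYPE]` placeholder format and the `mask_pii` name are illustrative.

```python
def mask_pii(text: str, entities: list[dict], min_score: float = 0.0) -> str:
    """Replace each detected PII span with a [TYPE] placeholder.

    `entities` is assumed to have the shape of DetectPiiEntities' "Entities"
    list, with non-overlapping character offsets.
    """
    # Apply replacements right-to-left so earlier offsets stay valid.
    spans = sorted(
        (e for e in entities if e["Score"] >= min_score),
        key=lambda e: e["BeginOffset"],
        reverse=True,
    )
    masked = text
    for e in spans:
        masked = masked[: e["BeginOffset"]] + f"[{e['Type']}]" + masked[e["EndOffset"]:]
    return masked
```

Replacing spans from right to left leaves the offsets of earlier spans unchanged. The function therefore never has to recompute positions after a substitution.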
Best Practices
  • Detect language first, then run other features with the correct language code.
  • Use the real-time API for per-request work and batch jobs for bulk historical analysis.
  • Combine PII detection with your own masking logic for redaction workflows.
  • Train custom entity recognition for domain-specific entities.
  • Chunk long texts before sending to stay within document-size limits.
  • For LLM-style tasks (summarization, Q&A, generation), use Bedrock instead.
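Chunking can be as simple as packing whitespace-separated words into pieces that fit a byte budget. This sketch rests on two assumptions. First, the 5,000-byte default is a placeholder for whichever per-document limit applies to the API you call; limits are typically stated in UTF-8 bytes and vary by operation, so check current quotas. Second, `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, max_bytes: int = 5000) -> list[str]:
    """Split text into pieces whose UTF-8 encoding fits within max_bytes.

    Packs whole words where possible; a single word longer than max_bytes is
    split by characters as a last resort. The default limit is an assumption.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Start a new chunk with this word, hard-splitting it if it is too long.
        current = ""
        for ch in word:
            if current and len((current + ch).encode("utf-8")) > max_bytes:
                chunks.append(current)
                current = ""
            current += ch
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries instead of words would keep each chunk's sentiment more coherent. Word-level packing is used here only because it needs no extra dependencies. Note that `text.split()` also collapses runs of whitespace and newlines.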
Comparable services: GCP Cloud Natural Language API, Azure AI Language

Knowledge Check

What does Amazon Comprehend do?

  • Managed NLP — sentiment, entities, key phrases, language, and PII over text, no model to train
  • Optical character recognition that reads raw text off scanned documents and photographed pages
  • Text-to-speech synthesis across many voices and languages
  • Image and video analysis for objects, faces, and scenes

Why detect language before running other Comprehend features?

  • Many features behave differently per language and need the correct code
  • Language detection is the only feature offered free of per-call charge
  • Other features fail entirely unless detection is disabled first
  • It reduces the per-document call cost to exactly zero

For summarization, Q&A, or open-ended generation over text, which service fits better?

  • Amazon Bedrock — modern LLM tasks beyond Comprehend's built-in API
  • Amazon Comprehend custom classification routing text into your own categories
  • Amazon Translate for converting the text between languages
  • Amazon Textract to extract fields from the source document

What are Comprehend's two custom-model paths?

  • Custom classification and custom entity recognition
  • Custom optical character recognition and custom translation
  • Sentiment-model training and topic-model training
  • Fine-tuning and continued pre-training on your corpus
