Amazon Rekognition
Amazon Rekognition is AWS's computer-vision API. You send an image or video; it returns detected objects, scenes, faces, text, celebrities, or content-moderation labels as structured output — no model to train, no GPU to provision, no ML expertise required.
Launched in 2016, it is the right starting point when the question is "what is in this image or video" rather than "let's train a custom model for our specific problem."
What It Detects
Core features include label detection (objects and scenes with confidence scores), face detection and analysis (bounding boxes plus attributes), face comparison and search against an indexed collection, content moderation for unsafe content, celebrity recognition, PPE detection for safety monitoring, and text-in-the-wild detection.
Custom Labels trains a domain-specific classifier on your own labeled images — for a specific product line or defect pattern the built-in labels do not cover.
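As a rough illustration of label detection, the sketch below calls the `DetectLabels` operation through a boto3-style client and returns the labels sorted by confidence. The function name `detect_labels_in_s3`, the default threshold of 80, and the bucket and key values are assumptions made for this example, not recommendations from the service.

```python
from __future__ import annotations

from typing import Any


def detect_labels_in_s3(client: Any, bucket: str, key: str,
                        min_confidence: float = 80.0,
                        max_labels: int = 10) -> list[tuple[str, float]]:
    """Return (label name, confidence) pairs for an image stored in S3.

    `client` is expected to behave like boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,  # labels below this score are not returned
    )
    labels = [(label["Name"], label["Confidence"]) for label in response["Labels"]]
    # Highest-confidence labels first.
    return sorted(labels, key=lambda pair: pair[1], reverse=True)
```

With boto3 installed and credentials configured, `client` would typically be created with `boto3.client("rekognition")`. Passing the client in as a parameter keeps the helper easy to exercise with a stub.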
Image vs Video, and Faces
The image API is synchronous — send one image, get a response in milliseconds to seconds. The video API is asynchronous — start a job on an S3-hosted video, get notified via SNS, and read time-coded results. A streaming-video variant integrates with Kinesis Video Streams.
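The asynchronous video flow might look like the following sketch, which starts a label-detection job and later pages through its time-coded results. It assumes a boto3-style client; the helper names, and the idea that the caller invokes `collect_video_labels` only after the SNS notification arrives, are choices made for this example.

```python
from __future__ import annotations

from typing import Any


def start_video_label_job(client: Any, bucket: str, key: str,
                          sns_topic_arn: str, role_arn: str) -> str:
    """Start an asynchronous label-detection job and return its JobId."""
    response = client.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        # Rekognition publishes the completion status to this SNS topic.
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]


def collect_video_labels(client: Any, job_id: str) -> list[tuple[int, str, float]]:
    """Return (timestamp_ms, label, confidence) for every result page of a job."""
    results: list[tuple[int, str, float]] = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.get_label_detection(**kwargs)
        if page["JobStatus"] != "SUCCEEDED":
            # IN_PROGRESS or FAILED: the caller decides whether to retry later.
            raise RuntimeError(f"job {job_id} is {page['JobStatus']}")
        for item in page["Labels"]:
            label = item["Label"]
            results.append((item["Timestamp"], label["Name"], label["Confidence"]))
        next_token = page.get("NextToken")
        if not next_token:
            return results
```

Raising on any status other than `SUCCEEDED` keeps the sketch short; a production consumer would more likely react to the SNS message than poll.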
A face collection stores face vectors (not original images) so SearchFacesByImage can find matches. Those vectors are still PII in many jurisdictions, and face features carry Region-specific restrictions — check the current documentation before building on them.
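A minimal sketch of a collection search, assuming a boto3-style client and an existing collection: it calls `SearchFacesByImage` with an explicit match threshold and returns the strongest match, if any. The function name and the threshold of 99 are illustrative assumptions; the right bar depends on the use case.

```python
from __future__ import annotations

from typing import Any


def find_best_face_match(client: Any, collection_id: str, bucket: str, key: str,
                         threshold: float = 99.0) -> tuple[str, float] | None:
    """Return (FaceId, similarity) of the best match, or None if nothing matches."""
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        FaceMatchThreshold=threshold,  # matches below this similarity are dropped
        MaxFaces=5,
    )
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    best = max(matches, key=lambda match: match["Similarity"])
    # Only the stored face's identifier comes back, not the source image.
    return best["Face"]["FaceId"], best["Similarity"]
```

Setting the threshold explicitly, rather than relying on the service default, makes the matching bar visible in code review.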
Rekognition vs Textract vs Bedrock
Rekognition: "what is in this image or video" (objects, faces, moderation, in-the-wild text).
Textract: document understanding (structured text, forms, and tables from documents).
Bedrock multimodal: reasoning about image content in natural language, beyond structured detection.
Common Mistakes
- Stitching repeated image-API calls together to analyze a video instead of using the asynchronous video API.
- Leaving confidence thresholds at the default for tasks like face matching where a higher bar is needed.
- Sending large images as request bytes instead of an S3 reference, wasting bandwidth.
- Using Rekognition text-detection for document OCR, where Textract is the right tool.
- Storing more face data, for longer, than needed — even allowed face features carry privacy and regulatory weight.
- Trying to coerce built-in labels for a domain-specific task instead of training Custom Labels.
Best Practices
- Use the image API for synchronous workflows and the video API for stored or streaming video.
- Tune confidence thresholds to the task rather than relying on the default.
- Use S3 references for large images.
- Train Custom Labels for domain-specific detection.
- Store the minimum face data for the minimum time and document its purpose.
- Use Textract for document OCR and Bedrock for multimodal reasoning.
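The S3-reference practice can be sketched as a small helper that builds the `Image` parameter the image operations accept, preferring an S3 reference and refusing oversized inline payloads. The helper name and the 5 MB inline limit used as the default are assumptions for this example; check the current service quotas before relying on the number.

```python
from __future__ import annotations


def build_image_param(data: bytes | None = None,
                      bucket: str | None = None,
                      key: str | None = None,
                      max_inline_bytes: int = 5 * 1024 * 1024) -> dict:
    """Build the `Image` argument for Rekognition image operations.

    Prefers an S3 reference; falls back to inline bytes only for small payloads.
    """
    if bucket and key:
        # The service reads the object itself, so no image bytes leave the caller.
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    if data is None:
        raise ValueError("provide either bucket and key, or image bytes")
    if len(data) > max_inline_bytes:
        raise ValueError("image too large to send inline; upload it to S3 instead")
    return {"Bytes": data}
```

The returned dictionary can be passed as the `Image` argument of calls such as `detect_labels`.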
Knowledge Check
What is Rekognition best suited for?
- Answering "what is in this image or video" — objects, faces, moderation — no model to train
- Extracting key-value form fields, signatures, and tables from scanned documents and receipts
- Translating extracted text between dozens of languages
- Training arbitrary custom vision models from scratch
How do the Rekognition image and video APIs differ?
- The image API is synchronous; the video API is an asynchronous S3 job
- The image API is the asynchronous one; the video API answers synchronously in seconds
- Both APIs are synchronous and return in milliseconds
- The video API only runs on edge devices, not in the cloud
What does a Rekognition face collection store?
- Face vectors (not original images), still PII under many jurisdictions' laws
- The full original source photographs, each one indexed by the person's name and ID
- Only a running count of the faces it has seen
- Nothing — it processes each face without storing anything
For extracting structured data from a scanned form, which service is the right choice?
- Amazon Textract — document understanding with forms and tables
- Rekognition text detection run across the full scanned form image
- Amazon Comprehend to analyze the form's contents and pull out its labelled fields
- Amazon Polly to read each of the form fields aloud as narrated audio