
Amazon SQS

Integration · Queue · Async

Amazon SQS (Simple Queue Service) is a managed message queue. Producers send messages; consumers poll and process them; SQS stores messages durably in between, with no servers to manage. It was AWS's very first service (2006) and is among the cheapest — a moderate-traffic queue costs single-digit dollars per month.

It decouples producers from consumers and absorbs traffic spikes, making it one of the simplest and most durable ways to buffer work.

Standard vs FIFO

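Standard queues prioritize throughput: at-least-once delivery (occasional duplicates, typically under failure), best-effort ordering, and nearly unlimited throughput. FIFO queues guarantee strict ordering within each message group (MessageGroupId) and exactly-once processing within a 5-minute deduplication window, at capped throughput: by default 300 API calls per second per action, or 3,000 messages per second with batching (high-throughput mode raises these limits).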

Start with Standard unless you can name a specific reason FIFO is required — most workloads are duplicate-tolerant if the consumer is idempotent.
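A minimal sketch of what choosing FIFO means at the API level, assuming boto3's SQS client (not created here, so the snippet runs offline): the helpers only build the keyword arguments you would pass to `create_queue` and `send_message`. FIFO queue names must end in `.fifo`, ordering applies per `MessageGroupId`, and a repeated `MessageDeduplicationId` inside the 5-minute window is dropped. The helper names, queue name, group ID, and deduplication ID used with them are hypothetical.

```python
from typing import Optional


def fifo_queue_request(name: str, content_based_dedup: bool = False) -> dict:
    """Build create_queue keyword arguments for a FIFO queue."""
    # SQS requires FIFO queue names to carry the ".fifo" suffix.
    if not name.endswith(".fifo"):
        raise ValueError("FIFO queue names must end with '.fifo'")
    return {
        "QueueName": name,
        "Attributes": {
            # Queue attribute values are strings in the SQS API.
            "FifoQueue": "true",
            # When "true", SQS derives the deduplication ID from a SHA-256 hash of the body.
            "ContentBasedDeduplication": "true" if content_based_dedup else "false",
        },
    }


def fifo_send_request(queue_url: str, body: str, group_id: str,
                      dedup_id: Optional[str] = None) -> dict:
    """Build send_message keyword arguments for a FIFO queue."""
    params = {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages sharing a group ID are delivered in order;
        # different groups can be processed in parallel.
        "MessageGroupId": group_id,
    }
    # Required unless the queue has ContentBasedDeduplication enabled.
    if dedup_id is not None:
        params["MessageDeduplicationId"] = dedup_id
    return params
```

The `FifoQueue` attribute can only be set when a queue is created, so moving an existing Standard queue to FIFO means creating a new queue.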

Visibility Timeout and Dead-Letter Queues

Visibility timeout hides a received message from other consumers for a window (default 30 s, maximum 12 hours). If the consumer deletes it before the timeout expires, it is gone; if the consumer crashes or is still working when the timeout expires, the message becomes visible again and another consumer can receive it. Set the timeout a bit longer than maximum processing time: too short causes duplicate work, too long delays retries after a crash.
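To make the timing concrete, here is a toy in-memory model of the visibility timeout. It illustrates the semantics only and is not the SQS service or its API; `ToyQueue`, `ToyMessage`, and the explicit `now` clock are hypothetical, and real SQS deletes by receipt handle rather than by message ID.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyMessage:
    id: str
    body: str
    visible_at: float = 0.0  # time at which the message can be received again
    receive_count: int = 0   # SQS tracks a similar count; redrive policies compare against it


class ToyQueue:
    """In-memory model of visibility-timeout semantics (not the real SQS API)."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[str, ToyMessage] = {}

    def send(self, msg_id: str, body: str) -> None:
        self._messages[msg_id] = ToyMessage(msg_id, body)

    def receive(self, now: float) -> Optional[ToyMessage]:
        """Return one visible message and hide it until now + visibility_timeout."""
        for msg in self._messages.values():
            if msg.visible_at <= now:
                msg.visible_at = now + self.visibility_timeout
                msg.receive_count += 1
                return msg
        return None  # everything left is either in flight (hidden) or deleted

    def delete(self, msg_id: str) -> None:
        # Deleting before the timeout expires is what acknowledges the message.
        self._messages.pop(msg_id, None)
```

With a 30-second timeout, a message received at t=0 is invisible at t=10 and, if never deleted, is received again at t=31 with a receive count of 2: exactly the duplicate-work risk when processing outlasts the timeout. A long-running consumer can extend its window with ChangeMessageVisibility instead.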

A dead-letter queue (DLQ) catches messages whose receive count exceeds the source queue's maxReceiveCount (typically 3–5), so poison messages stop being retried forever (on FIFO queues they would otherwise hold up their whole message group) and failures are preserved for investigation. Every production queue should have one.
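A sketch of attaching a DLQ, assuming boto3: the helper builds the `RedrivePolicy` queue attribute, a JSON string holding `deadLetterTargetArn` and `maxReceiveCount`, which you would pass as `Attributes` to `set_queue_attributes` or `create_queue`. The helper name is hypothetical and the ARN used with it below is a placeholder.

```python
import json


def redrive_policy_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build the RedrivePolicy attribute that routes poison messages to a DLQ.

    SQS moves a message to the DLQ once its receive count exceeds maxReceiveCount.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        # Queue attribute values are strings, so the count is serialized as one.
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}
```

For example, `sqs.set_queue_attributes(QueueUrl=main_queue_url, Attributes=redrive_policy_attributes(dlq_arn, 5))`, where `sqs`, `main_queue_url`, and `dlq_arn` are hypothetical. The DLQ must be the same type as the source queue (a FIFO queue needs a FIFO DLQ) and live in the same account and Region.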

Long Polling and Batching

Long polling lets a ReceiveMessage call wait up to 20 seconds for a message to arrive instead of returning empty immediately. Set WaitTimeSeconds to a non-zero value on every receive (or set the queue's ReceiveMessageWaitTimeSeconds default); otherwise idle consumers generate millions of wasteful API calls. Batching sends, receives, or deletes up to 10 messages per call, cutting request charges by up to 10x.
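A sketch of both techniques, assuming boto3's `receive_message` and `send_message_batch` parameter names. The helpers `receive_params` and `batch_entries` are hypothetical and only build request arguments, so no AWS call is made.

```python
from typing import Iterable, Iterator, List

MAX_BATCH = 10  # SQS batch APIs accept at most 10 entries per call


def receive_params(queue_url: str) -> dict:
    """Long-polling receive: wait up to 20 s and take up to 10 messages per call."""
    return {"QueueUrl": queue_url, "WaitTimeSeconds": 20, "MaxNumberOfMessages": 10}


def batch_entries(bodies: Iterable[str]) -> Iterator[List[dict]]:
    """Group message bodies into SendMessageBatch entry lists of at most 10."""
    batch: List[dict] = []
    for i, body in enumerate(bodies):
        # Each entry needs an Id that is unique within its batch; the running index works.
        batch.append({"Id": str(i), "MessageBody": body})
        if len(batch) == MAX_BATCH:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Each list would go to `sqs.send_message_batch(QueueUrl=url, Entries=entries)`, and the receive loop would call `sqs.receive_message(**receive_params(url))` (with `sqs` and `url` hypothetical). Sending 25 bodies this way takes 3 calls instead of 25. A batch can partially succeed, so a real producer should check the `Failed` list in each response and resend those entries.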

SQS vs SNS vs EventBridge

SQS — durable point-to-point buffering — one queue, consumers pull and process at their own pace.

SNS — pub/sub fan-out — one message to many subscribers in parallel.

EventBridge — content-based routing of events to many targets with rules.

Common Mistakes

Using short polling on idle queues, turning empty receives into millions of pointless API calls — always set WaitTimeSeconds.
Running production queues with no dead-letter queue, so a poison message blocks processing indefinitely.
Setting visibility timeout shorter than processing time, causing the same message to be processed twice.
Writing non-idempotent consumers, so any retry (even on FIFO) produces wrong or duplicated results.
Reaching for FIFO by default when Standard is cheaper and faster and the consumer can be idempotent.
Ignoring a growing ApproximateAgeOfOldestMessage, the leading indicator of a stuck downstream consumer.
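Catching a stuck consumer early usually means alarming on ApproximateAgeOfOldestMessage. A minimal sketch, assuming boto3's CloudWatch `put_metric_alarm` parameters; the helper name, alarm naming scheme, 5-minute threshold, and evaluation settings are illustrative choices, not recommendations from this page.

```python
def oldest_message_age_alarm(queue_name: str, threshold_seconds: int = 300) -> dict:
    """Build put_metric_alarm keyword arguments for ApproximateAgeOfOldestMessage."""
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",   # reported in seconds
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,            # evaluate one-minute data points
        "EvaluationPeriods": 5,  # alarm only after 5 consecutive breaching minutes
        "Threshold": float(threshold_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

The result would be passed as `cloudwatch.put_metric_alarm(**oldest_message_age_alarm("orders"))`, with `cloudwatch` a hypothetical boto3 CloudWatch client; in practice you would also add `AlarmActions` (for example an SNS topic ARN) so the alarm notifies someone.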

Best Practices

Use Standard unless you can name a reason FIFO is required.
Always attach a dead-letter queue with a maxReceiveCount of 3–5.
Set WaitTimeSeconds to 20 on every receive (long polling).
Batch sends and receives where the access pattern allows.
Make consumers idempotent regardless of queue type.
Set visibility timeout a bit longer than maximum processing time and monitor queue depth and age.
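Idempotency is what makes at-least-once delivery safe. A minimal sketch, assuming each message carries a business idempotency key (such as an order ID) chosen by the producer; the receipt handle is unsuitable as a key because it changes on every receive. `IdempotentConsumer` is hypothetical, and its in-memory set stands in for a durable store.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Run a handler at most once per idempotency key.

    The in-memory set stands in for a durable store (e.g. a table with a unique key).
    """

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()

    def process(self, key: str, body: str) -> bool:
        """Handle a delivery; return False if this key was already processed."""
        if key in self._seen:
            return False  # duplicate delivery: safe to delete without reprocessing
        self._handler(body)
        # Record the key only after success, so a failed handler is retried.
        self._seen.add(key)
        return True
```

Recording the key after the handler succeeds means a crash mid-handler leads to a retry rather than a lost message. In production the side effect and the key write should happen atomically (one transaction or a conditional write), because a crash between them would still let a retry repeat the side effect.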

Comparable services
GCP: Cloud Tasks, Pub/Sub
Azure: Azure Queue Storage, Service Bus

Knowledge Check

What is the difference between Standard and FIFO SQS queues?

Standard gives at-least-once delivery and unlimited throughput; FIFO guarantees ordering and exactly-once
FIFO is cheaper per request and faster than Standard queues
Standard guarantees strict ordering and exactly-once delivery, while FIFO is only best-effort and unordered
Only FIFO queues support a dead-letter queue via redrive

What does the visibility timeout control?

How long a received message stays hidden from other consumers before reappearing
How long a message lives in the queue before it expires and is silently dropped by SQS
How long the producer waits before sending the next message
How many consumers can poll the queue at the same time

Why should every production queue have a dead-letter queue?

To stop poison messages from blocking the queue and to preserve failures for investigation
To increase the main queue's maximum sustained throughput by offloading messages to a second queue
To enable strict FIFO ordering on the main queue
To reduce the per-request cost of the main queue

What does long polling prevent?

Wasteful empty receives — idle consumers making millions of pointless API calls
Duplicate message delivery to a consumer caused by at-least-once redelivery
Oversized messages exceeding the 256 KB payload limit from being silently dropped
Out-of-order processing of messages across many concurrent consumers
