Amazon S3
Amazon S3 (Simple Storage Service) was the first AWS service, launched in 2006, and it is the storage backbone of the platform: you upload a file (an object) into a bucket and get a URL to read it from anywhere with the right permissions. It holds backups, logs, media, datasets, static sites, and ML training data — and many other AWS services use it underneath.
S3 is designed for eleven nines of durability (99.999999999%): put 10 million objects in S3 and you would lose one about every 10,000 years. There are no servers, no capacity to provision, and effectively no upper limit on how much you store.
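The 10,000-year figure follows from simple arithmetic. This check assumes that eleven nines means a 99.999999999% chance that any given object survives a given year, with N = 10 million objects:

```latex
\begin{aligned}
p_{\text{loss}} &= 1 - 0.99999999999 = 10^{-11} \quad \text{(per object, per year)} \\
\mathbb{E}[\text{objects lost per year}] &= N \, p_{\text{loss}} = 10^{7} \times 10^{-11} = 10^{-4} \\
\text{expected years per lost object} &= 1 / 10^{-4} = 10^{4}
\end{aligned}
```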
Buckets and Objects
A bucket is a container with a globally unique name that lives in one Region; an object is the file, identified by a key, plus its data and metadata. You pay for the objects stored in a bucket and the requests made against them, not for the buckets themselves.
S3 has no real folders. A key like photos/2024/sunset.jpg is a single string — the slashes are part of the name, and the console's folder view is a convenience built from those prefixes.
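The folder view can be reproduced locally from a flat list of keys. This sketch mimics what a listing with a Prefix and a "/" Delimiter returns: keys with no further delimiter show up as files, and everything else collapses into one "folder" per distinct next prefix (S3's ListObjectsV2 calls these CommonPrefixes). The helper `list_folder` is hypothetical and ignores pagination; it is not the S3 API itself.

```python
from typing import Iterable


def list_folder(keys: Iterable[str], prefix: str = "", delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Split flat keys into (files, folders) the way a Prefix + Delimiter listing groups them."""
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    files: set[str] = set()
    folders: set[str] = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the "folder" being viewed
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            files.add(key)  # no further delimiter: shown as a file at this level
        else:
            # everything up to and including the next delimiter becomes one "folder"
            folders.add(prefix + rest[: cut + len(delimiter)])
    return sorted(files), sorted(folders)
```

For example, with the keys `photos/2024/sunset.jpg`, `photos/2023/snow.jpg`, and `photos/readme.txt`, calling `list_folder(keys, prefix="photos/")` yields one file (`photos/readme.txt`) and two folders (`photos/2023/`, `photos/2024/`), even though no folder objects exist.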
Storage Classes
Storage classes trade access speed for cost so you pay only for what a dataset needs. Standard is for hot, frequently read data; Standard-IA and One Zone-IA for data read about once a month or less; the Glacier classes for archives; and S3 Express One Zone for single-digit-millisecond, single-AZ performance. A single general-purpose bucket can hold any mix of these classes, except Express One Zone, which lives in its own directory buckets.
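When access patterns are unknown, use Intelligent-Tiering: for a small per-object monitoring fee, S3 watches each object's access and moves it to a cheaper tier when it goes unread (after 30 days, and again after 90), then back to the frequent tier when it is read again. It is the safest default for most modern workloads.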
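The guidance above can be condensed into a small decision helper. This is a sketch under stated assumptions: `choose_storage_class` and its parameters are hypothetical, the "more than once a month" cut-off for Standard is an illustrative threshold, and the Glacier family is reduced to Glacier Instant Retrieval and Deep Archive for brevity. The returned strings are the names the S3 API uses for its StorageClass parameter.

```python
from typing import Optional

# Class names as the S3 API spells them in the StorageClass parameter.
STANDARD = "STANDARD"
STANDARD_IA = "STANDARD_IA"
ONEZONE_IA = "ONEZONE_IA"
GLACIER_IR = "GLACIER_IR"
DEEP_ARCHIVE = "DEEP_ARCHIVE"
EXPRESS_ONEZONE = "EXPRESS_ONEZONE"
INTELLIGENT_TIERING = "INTELLIGENT_TIERING"


def choose_storage_class(
    reads_per_month: Optional[float],
    needs_single_digit_ms: bool = False,
    is_archive: bool = False,
    can_recreate: bool = False,
) -> str:
    """Hypothetical helper: map a dataset's access profile to an S3 storage class."""
    if reads_per_month is not None and reads_per_month < 0:
        raise ValueError("reads_per_month cannot be negative")
    if needs_single_digit_ms:
        return EXPRESS_ONEZONE  # single-AZ; requires a directory bucket
    if reads_per_month is None:
        return INTELLIGENT_TIERING  # unknown pattern: let S3 tier each object
    if reads_per_month > 1:
        return STANDARD  # hot data: read more than about once a month
    if is_archive:
        # never read: cheapest archive; still read occasionally: millisecond-retrieval Glacier
        return DEEP_ARCHIVE if reads_per_month == 0 else GLACIER_IR
    # monthly-or-less access; One Zone-IA keeps one AZ copy, so only for re-creatable data
    return ONEZONE_IA if can_recreate else STANDARD_IA
```

With a boto3 client, the result would be passed per object, for example `put_object(Bucket=..., Key=..., Body=..., StorageClass=choose_storage_class(None))`; `EXPRESS_ONEZONE` is only accepted by directory buckets.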
Versioning and Lifecycle
Versioning keeps every version of an object, so an overwrite or delete is recoverable — the simplest protection against accidental deletion and ransomware. Lifecycle Rules automate transitions by age, prefix, or tag: move logs to Standard-IA after 30 days, to Glacier Deep Archive after 90, and delete after a year, with no manual cleanup.
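Both features are plain bucket configuration. This sketch assumes a boto3 S3 client (any object with the same `put_bucket_versioning` and `put_bucket_lifecycle_configuration` methods works) and encodes the log example from the text: Standard-IA at 30 days, Deep Archive at 90, delete at 365. The rule ID, the `logs/` prefix, and the 30-day `NoncurrentVersionExpiration` are assumptions. The last setting matters on a versioned bucket, because expiring the current version only adds a delete marker and the old versions keep costing money until they are removed too.

```python
from typing import Any


def log_lifecycle_rule(prefix: str = "logs/") -> dict[str, Any]:
    """Lifecycle rule: Standard-IA at 30 days, Deep Archive at 90, delete at 365."""
    return {
        "ID": "tier-and-expire-logs",  # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
        ],
        "Expiration": {"Days": 365},
        # assumed recovery window: purge overwritten/deleted versions 30 days after they become noncurrent
        "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
    }


def protect_and_tier(s3_client: Any, bucket: str, prefix: str = "logs/") -> None:
    """Enable Versioning on the bucket, then attach the lifecycle rule."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [log_lifecycle_rule(prefix)]},
    )
```

`put_bucket_lifecycle_configuration` replaces the bucket's whole lifecycle configuration, so any existing rules must be included in the same call.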
Encryption and Security
S3 always encrypts at rest; the only question is who holds the keys. SSE-S3 (AWS-managed) is enough for most workloads; SSE-KMS adds per-key access control and a CloudTrail record of every decryption. Block Public Access is on by default — leave it on unless you are deliberately hosting a public site.
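Choosing who holds the keys and keeping Block Public Access on are both single API calls. This sketch assumes a boto3 S3 client; `lock_down` and `encryption_config` are hypothetical helper names, and the KMS key ARN a caller passes in is their own. Without a key it sets SSE-S3 (`AES256`); with one it sets SSE-KMS (`aws:kms`).

```python
from typing import Any, Optional


def encryption_config(kms_key_id: Optional[str] = None) -> dict[str, Any]:
    """Default-encryption rule: SSE-KMS when a key is given, otherwise SSE-S3."""
    if kms_key_id is None:
        default = {"SSEAlgorithm": "AES256"}  # SSE-S3: AWS-managed keys
    else:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}  # SSE-KMS
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}


# All four Block Public Access switches on: public ACLs and public policies are rejected or ignored.
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def lock_down(s3_client: Any, bucket: str, kms_key_id: Optional[str] = None) -> None:
    """Set default encryption and keep every Block Public Access setting enabled."""
    s3_client.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=encryption_config(kms_key_id),
    )
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=BLOCK_ALL_PUBLIC_ACCESS,
    )
```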
Misconfigured buckets are a leading cause of public data leaks. Use bucket policies rather than legacy ACLs, enable versioning on anything you cannot rebuild, and treat making a bucket public as a deliberate, audited decision.
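Moving from ACLs to policies can be sketched in two calls, assuming a boto3 S3 client. Setting Object Ownership to `BucketOwnerEnforced` disables ACLs on the bucket, so access is governed by policies alone. The example policy, which denies any request not made over HTTPS, is an illustrative choice of statement, and the helper names are hypothetical.

```python
import json
from typing import Any


def deny_insecure_transport_policy(bucket: str) -> str:
    """Bucket policy JSON that rejects every request not sent over HTTPS."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",  # every object in it
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy)


def policies_not_acls(s3_client: Any, bucket: str) -> None:
    """Disable ACLs on the bucket, then attach the policy."""
    s3_client.put_bucket_ownership_controls(
        Bucket=bucket,
        OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]},
    )
    s3_client.put_bucket_policy(Bucket=bucket, Policy=deny_insecure_transport_policy(bucket))
```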
Built-in Features
Beyond plain storage, S3 hosts static websites, generates time-limited pre-signed URLs for direct upload or download, fires event notifications to Lambda, SQS, or SNS on object changes, and replicates objects across Regions or accounts automatically. S3 Select once let you run a small SQL query against a single object, but AWS closed it to new customers in 2024 — reach for Athena or S3 Object Lambda instead.
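A pre-signed URL lets someone without AWS credentials read (or, with `put_object`, upload) one object until it expires. This sketch assumes a boto3 S3 client, whose `generate_presigned_url` signs locally without calling S3. The helper name and the one-hour default are assumptions; the 7-day ceiling is the maximum lifetime of a Signature Version 4 pre-signed URL.

```python
from typing import Any

MAX_PRESIGN_SECONDS = 7 * 24 * 3600  # SigV4 pre-signed URLs last at most 7 days


def presign_download(s3_client: Any, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited GET URL for one object; anyone holding it can read the object until expiry."""
    if not 1 <= expires_in <= MAX_PRESIGN_SECONDS:
        raise ValueError(f"expires_in must be between 1 and {MAX_PRESIGN_SECONDS} seconds")
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

If the URL is signed with temporary credentials (for example, an assumed role), it stops working when those credentials expire, even if `expires_in` is longer.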
S3 vs. EBS vs. EFS
- S3: object storage over HTTP; unlimited, durable, and reachable from the web. The default for files, backups, media, and data lakes.
- EBS: a block disk attached to one EC2 instance. For a database or filesystem that needs low-latency reads and writes.
- EFS / FSx: a shared file system that many instances mount at once. For web farms and content shared across hosts.
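Common Mistakes
- Turning off Block Public Access on a whole bucket to share a single object: the setting covers every object in the bucket and stays off until someone turns it back on. Share single objects with pre-signed URLs instead, and make publicness deliberate, never a default.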
- Using legacy bucket ACLs instead of bucket policies — ACLs are an old, confusing model that modern IAM-based policies replace.
- Storing everything in Standard forever — without Lifecycle Rules, cold data costs many times what it should.
- Leaving Versioning off on buckets that hold irreplaceable data, so an accidental overwrite is unrecoverable.
- Sending very high request rates at one hot object or prefix and hitting S3's per-prefix limits (about 5,500 GET/HEAD requests per second per prefix): front hot content with CloudFront so edge caches absorb the reads.
- Ignoring request costs on busy buckets — millions of GETs and LISTs add up; check Storage Lens and Cost Explorer.
Best Practices
- Enable Block Public Access at the account level; override only on buckets that truly serve public content.
- Rely on the default SSE-S3 encryption, and switch to SSE-KMS for sensitive data that needs key-level access control and auditing.
- Enable Versioning on any bucket holding data you cannot easily rebuild.
- Use Lifecycle Rules to tier and expire data automatically; prefer Intelligent-Tiering when access is unpredictable.
- Use a Gateway VPC endpoint for private, free S3 access from inside a VPC, avoiding NAT data-transfer cost.
- Tag buckets by team, project, and environment for cost allocation, and watch spend with S3 Storage Lens.
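Two of these practices translate directly into API calls. This sketch assumes boto3 EC2 and S3 clients; the helper names and the `team`/`project`/`environment` tag keys are hypothetical. A Gateway endpoint adds S3 routes to the given route tables, so traffic from those subnets reaches S3 without passing through a NAT gateway.

```python
from typing import Any


def add_s3_gateway_endpoint(ec2_client: Any, region: str, vpc_id: str, route_table_ids: list[str]) -> Any:
    """Create a Gateway VPC endpoint for S3 and attach it to the given route tables."""
    return ec2_client.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.s3",
        RouteTableIds=route_table_ids,
    )


def tag_bucket(s3_client: Any, bucket: str, team: str, project: str, environment: str) -> None:
    """Attach cost-allocation tags. This call replaces the bucket's entire existing tag set."""
    s3_client.put_bucket_tagging(
        Bucket=bucket,
        Tagging={
            "TagSet": [
                {"Key": "team", "Value": team},
                {"Key": "project", "Value": project},
                {"Key": "environment", "Value": environment},
            ]
        },
    )
```

Tags only appear in Cost Explorer after they are activated as cost allocation tags in the Billing console.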
Knowledge Check
How are 'folders' represented in S3?
- There are no real folders: a key like a/b/c.jpg is one string, and the console builds the folder view from prefixes
- Each folder is a separate nested sub-bucket living inside its parent bucket
- Folders are stored as objects of a dedicated special directory type
- S3 stores a real hierarchical POSIX filesystem complete with inodes, nested directories, hard links, and per-file permission bits
Access patterns for a new dataset are unknown. Which storage class is the safest default?
- Intelligent-Tiering — S3 moves each object to the cheapest class that still meets its access needs
- Glacier Deep Archive — it is always the cheapest class for any new dataset regardless of how it is accessed
- Standard — leave every object there and never move it
- One Zone-IA — a single-AZ copy is perfectly fine for any data
What protection does S3 Versioning primarily provide?
- Recovery from accidental overwrite or deletion, since prior versions are retained
- Lower storage cost by deduplicating identical versions of each object
- Faster reads by automatically caching the most recent version at edge locations worldwide
- Automatic cross-Region replication of every object you upload
When should you choose EBS or EFS over S3?
- When an app needs low-latency block storage on one instance (EBS) or a shared file system (EFS), not object access
- Whenever the total volume of stored data grows beyond roughly 5 GB spread across the entire bucket's key namespace
- Whenever you need a full eleven nines of object durability
- Whenever the data must be fully encrypted at rest on disk