Amazon Macie
Macie discovers sensitive data sitting in your S3 buckets. It scans object contents, identifies patterns that look like personal information (names, credit-card numbers, Social Security numbers) or operational secrets (AWS keys, private keys, connection strings), and raises findings. Its scope is S3 only, where many accidental data leaks happen because buckets are easy to misconfigure and forget.
Macie deliberately covers only object storage; it does not look at RDS, DynamoDB, or EBS.
What Macie Detects
Bucket-level findings flag misconfiguration continuously, regardless of whether you have configured object scanning: public-readable buckets, unencrypted buckets, and buckets shared or replicated with other accounts. Object-level findings come from discovery jobs that sample or scan objects for sensitive-data patterns.
Built-in detectors cover personal, financial, government-ID, credential, and health-information patterns; custom data identifiers (regex plus keyword proximity) catch business-specific patterns like internal customer IDs.
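To make "regex plus keyword proximity" concrete, here is a minimal local sketch of the idea in Python. It is not Macie's implementation: the `CUST-` ID format, the keyword list, and the helper name `find_custom_ids` are hypothetical, and the distance rule is simplified to "a keyword appears within `max_distance` characters before the match".

```python
import re


def find_custom_ids(text, pattern, keywords, max_distance=50):
    """Return regex matches that have a keyword in the max_distance
    characters immediately before the match (case-insensitive)."""
    lowered = text.lower()
    hits = []
    for match in re.finditer(pattern, text):
        # Look only at the text just before the candidate match.
        window = lowered[max(0, match.start() - max_distance):match.start()]
        if any(kw.lower() in window for kw in keywords):
            hits.append(match.group(0))
    return hits


# Hypothetical internal customer ID format: "CUST-" followed by 8 digits.
PATTERN = r"\bCUST-\d{8}\b"
KEYWORDS = ["customer id", "account holder"]

sample = (
    "Customer ID: CUST-12345678 was migrated. "
    + "x" * 80
    + " Unrelated token CUST-87654321 appears far from any keyword."
)

print(find_custom_ids(sample, PATTERN, KEYWORDS))
```

Only the first ID is reported, because the second one has no keyword nearby. Requiring a nearby keyword is what keeps a loose regex from flagging every string that happens to share the format. In Macie itself, the corresponding settings are supplied when the identifier is created, for example through the `regex`, `keywords`, and `maximumMatchDistance` parameters of the `CreateCustomDataIdentifier` API.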
How Discovery Works
A discovery job has three knobs: bucket selection (specific buckets, all, or a filter), schedule (one-time or recurring), and sampling depth (full scan of every new object, or a percentage sample). For large buckets, sampling is the cost-effective default — find the hot spots, then full-scan only those.
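The three knobs map onto parameters of Macie's `CreateClassificationJob` API. The sketch below only assembles the request in Python so it can be inspected without AWS access; the helper name `build_job_request`, the job name, the account ID, and the bucket name are all placeholders, and a daily schedule is assumed for recurring jobs.

```python
import uuid


def build_job_request(name, account_id, buckets,
                      sampling_percentage=100, recurring=False):
    """Assemble keyword arguments for Macie's CreateClassificationJob."""
    if not 1 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 1 and 100")
    request = {
        "name": name,
        "clientToken": str(uuid.uuid4()),  # idempotency token
        # Knob 1: bucket selection (explicit bucket names in this sketch).
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
        # Knob 2: schedule (one-time or recurring).
        "jobType": "SCHEDULED" if recurring else "ONE_TIME",
        # Knob 3: sampling depth (100 means scan every eligible object).
        "samplingPercentage": sampling_percentage,
    }
    if recurring:
        # Assumption: run daily; weekly and monthly schedules also exist.
        request["scheduleFrequency"] = {"dailySchedule": {}}
    return request


# A cheap first pass: sample 10% of objects in a placeholder bucket.
survey = build_job_request(
    "survey-sample", "111122223333", ["example-data-lake"],
    sampling_percentage=10,
)
print(survey["jobType"], survey["samplingPercentage"])

# To submit it (requires boto3, credentials, and Macie enabled):
# import boto3
# boto3.client("macie2").create_classification_job(**survey)
```

Once the sample shows which buckets hold sensitive data, the same helper can build a follow-up job for just those buckets with `sampling_percentage=100`.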
Findings include the object key, the type and count of sensitive data, and a redacted preview. The actual content stays in S3; Macie never stores it.
Macie vs. GuardDuty vs. AWS Config
- Macie: discovers what sensitive data lives in S3 and which buckets are misconfigured.
- GuardDuty: detects active threats from account and network activity.
- AWS Config: tracks and evaluates resource configuration broadly, not S3 data contents.
Common Mistakes
- Full-scanning every bucket from the start instead of sampling to find where sensitive data lives, then full-scanning only those.
- Relying only on built-in detectors and missing business-specific patterns that need custom data identifiers.
- Enabling Macie on accounts that hold only infrastructure or build artifacts, paying to scan data that is never sensitive.
- Leaving bucket-level misconfiguration checks off — they are cheap and catch the most common S3 mistakes.
- Not routing findings to EventBridge/Security Hub, so a discovered AWS key in a bucket goes unnoticed.
- Skipping re-discovery after major data migrations, when sensitive data can land in unexpected buckets.
Best Practices
- Enable Macie for accounts that hold customer data; skip infrastructure-only accounts.
- Keep bucket-level evaluation always on for cheap misconfiguration detection.
- Sample first to locate sensitive data, then full-scan only the buckets that have it.
- Add custom data identifiers for your business-specific patterns.
- Route findings to EventBridge and Security Hub; alert on discovered credentials immediately.
- Re-run discovery after major data migrations.
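For the advice about routing findings and alerting on discovered credentials, one option is an EventBridge rule whose event pattern selects Macie's credential findings. The Python sketch below shows such a pattern and a small local matcher for trying it out. The matcher covers only exact-value matching and is not EventBridge's full pattern language, and the sample events are trimmed, made-up illustrations rather than complete findings.

```python
# Event pattern for an EventBridge rule: Macie findings about credentials.
CREDENTIAL_FINDING_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "detail": {"type": ["SensitiveData:S3Object/Credentials"]},
}


def matches(pattern, event):
    """Tiny subset of EventBridge matching: lists of exact values
    and nested objects. Illustrative only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed, made-up events for illustration.
credential_event = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {"type": "SensitiveData:S3Object/Credentials"},
}
policy_event = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {"type": "Policy:IAMUser/S3BucketPublic"},
}

print(matches(CREDENTIAL_FINDING_PATTERN, credential_event))
print(matches(CREDENTIAL_FINDING_PATTERN, policy_event))
```

The rule's target (an SNS topic, a Lambda function, a ticketing integration) is a separate choice. If objects can hold several categories of sensitive data at once, it may be worth also matching `SensitiveData:S3Object/Multiple`, since such findings can include credentials.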
Knowledge Check
What data does Macie scan?
- The object contents and the configuration of S3 buckets only — never RDS, DynamoDB, or EBS
- Every storage and database service in the account, including RDS, DynamoDB, and EBS volumes
- The runtime memory of running applications, looking for sensitive values held in process
- Network traffic flowing between instances, watching for sensitive data crossing the wire
What is the cost-effective default for scanning large buckets with Macie?
- Sampling a percentage of objects to find hot spots, then full-scanning only buckets with sensitive data
- Full-scanning every single object in every bucket immediately on the very first classification job that runs
- Scanning only the bucket and object names, never reaching into the contents themselves
- Disabling object-level findings entirely to keep the classification bill as low as possible
How do you make Macie detect a business-specific pattern like an internal customer ID?
- Add a custom data identifier that pairs a regex with keyword proximity rules
- Rely on the built-in personal-information detectors to recognize the internal ID format
- Rename the bucket so its name matches the customer ID pattern you want found
- Macie simply cannot detect any custom or business-specific patterns at all
What do Macie's bucket-level findings cover?
- Misconfiguration like public-readable or unencrypted buckets, continuously and independent of object scanning
- The full byte-for-byte contents of every single object stored inside the bucket
- Live network attacks against the bucket, such as floods aimed at its public endpoint
- The full IAM policy evaluation logic that decides exactly who is allowed to reach each of the account's resources