Service 37

Amazon Macie

Security, Data, Managed

Macie discovers sensitive data sitting in your S3 buckets. It scans object contents, identifies patterns that look like personal information (names, credit-card numbers, Social Security numbers) or operational secrets (AWS keys, private keys, connection strings), and raises findings. It focuses on S3, where many accidental data leaks happen because buckets are easy to misconfigure and forget.

It deliberately does not inspect RDS, DynamoDB, or EBS; it covers S3 object storage only.

What Macie Detects

Bucket-level findings flag misconfiguration continuously, regardless of whether you have configured object scanning: publicly accessible buckets, buckets with encryption disabled, and buckets shared or replicated outside the account. Object-level findings come from discovery jobs that sample or scan objects for sensitive-data patterns.

Built-in detectors cover personal, financial, government-ID, credential, and health-information patterns; custom data identifiers (regex plus keyword proximity) catch business-specific patterns like internal customer IDs.
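To make the regex-plus-keyword-proximity idea concrete, here is a minimal local sketch in Python. It does not call Macie; it only imitates the matching rule. The ID format (CUST- plus eight digits), the keyword list, and the 50-character distance are hypothetical values chosen for illustration, and Macie's own proximity rule may differ in detail.

```python
import re

# Hypothetical internal customer ID format: "CUST-" followed by 8 digits.
CUSTOMER_ID = re.compile(r"\bCUST-\d{8}\b")
KEYWORDS = ("customer id", "account holder")  # hypothetical keywords
MAX_DISTANCE = 50  # size of the window that leads up to the end of a match


def find_customer_ids(text: str) -> list[str]:
    """Return regex matches that have a keyword shortly before them."""
    lowered = text.lower()
    hits = []
    for match in CUSTOMER_ID.finditer(text):
        window_start = max(0, match.end() - MAX_DISTANCE)
        # A keyword must fall entirely inside the MAX_DISTANCE characters
        # that lead up to the end of the regex match.
        window = lowered[window_start:match.end()]
        if any(keyword in window for keyword in KEYWORDS):
            hits.append(match.group())
    return hits
```

Requiring a nearby keyword is what keeps a pattern like this from firing on every build tag or order number that happens to share the format. In boto3, the corresponding request would likely go through the macie2 client's create_custom_data_identifier call with regex, keywords, and maximumMatchDistance arguments; that call is not exercised here.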

How Discovery Works

A discovery job has three knobs: bucket selection (specific buckets, all, or a filter), schedule (one-time or recurring), and sampling depth (full scan of every new object, or a percentage sample). For large buckets, sampling is the cost-effective default — find the hot spots, then full-scan only those.
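The three knobs can be sketched as a small Python helper that builds the arguments for a classification job. This is an illustration under assumptions, not a definitive setup: build_job_request is a hypothetical helper, the account ID and bucket name are placeholders, and only the "specific buckets" form of bucket selection is covered.

```python
import uuid


def build_job_request(account_id, buckets, sampling_percentage=None, weekly_on=None):
    """Build keyword arguments for a Macie classification job.

    sampling_percentage=None requests a full scan; weekly_on (e.g. "MONDAY")
    makes the job recurring instead of one-time.
    """
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": f"discovery-{'sample' if sampling_percentage else 'full'}",
        "jobType": "SCHEDULED" if weekly_on else "ONE_TIME",
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": list(buckets)}]
        },
    }
    if sampling_percentage is not None:
        if not 1 <= sampling_percentage <= 100:
            raise ValueError("sampling_percentage must be between 1 and 100")
        request["samplingPercentage"] = sampling_percentage
    if weekly_on:
        request["scheduleFrequency"] = {"weeklySchedule": {"dayOfWeek": weekly_on}}
    return request


# Sample 10% of a large bucket once (account ID and bucket name are placeholders).
sample_job = build_job_request("111122223333", ["example-data-lake"], sampling_percentage=10)
```

With credentials configured and Macie enabled, a dict like this could be passed as boto3.client("macie2").create_classification_job(**sample_job). The sketch stops short of that call so it runs without an AWS account.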

Findings include the object key, the type and count of sensitive data, and a redacted preview. The actual content stays in S3; Macie does not store it.
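As a rough illustration of what a finding carries, the Python sketch below reduces one to its bucket, object key, and per-type counts. The field names are assumed from the Macie finding JSON format and should be checked against the current schema; the sample finding is hand-written, not real Macie output.

```python
def summarize_finding(finding: dict) -> dict:
    """Reduce a sensitive-data finding to bucket, key, and per-type counts."""
    resources = finding["resourcesAffected"]
    result = finding.get("classificationDetails", {}).get("result", {})
    counts = {}
    for category in result.get("sensitiveData", []):
        for detection in category.get("detections", []):
            counts[detection["type"]] = counts.get(detection["type"], 0) + detection["count"]
    return {
        "bucket": resources["s3Bucket"]["name"],
        "key": resources["s3Object"]["key"],
        "counts": counts,
    }


# Hand-written sample shaped like a finding; bucket and key are placeholders.
sample = {
    "type": "SensitiveData:S3Object/Credentials",
    "resourcesAffected": {
        "s3Bucket": {"name": "example-uploads"},
        "s3Object": {"key": "backups/app.env"},
    },
    "classificationDetails": {
        "result": {
            "sensitiveData": [
                {
                    "category": "CREDENTIALS",
                    "detections": [{"type": "AWS_CREDENTIALS", "count": 2}],
                }
            ]
        }
    },
}
summary = summarize_finding(sample)
```

The summary holds locations and counts only, which mirrors the point above: the sensitive values themselves remain in the S3 object.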

Macie vs GuardDuty vs Config

Macie — discovering what sensitive data lives in S3 and which buckets are misconfigured.

GuardDuty — detecting active threats from account and network activity.

AWS Config — tracking and evaluating resource configuration broadly, not S3 data contents.

Common Mistakes

Full-scanning every bucket from the start instead of sampling to find where sensitive data lives, then full-scanning only those.
Relying only on built-in detectors and missing business-specific patterns that need custom data identifiers.
Enabling Macie on accounts that hold only infrastructure or build artifacts, paying to scan data that is never sensitive.
Leaving bucket-level misconfiguration checks off — they are cheap and catch the most common S3 mistakes.
Not routing findings to EventBridge/Security Hub, so a discovered AWS key in a bucket goes unnoticed.
Skipping re-discovery after major data migrations, when sensitive data can land in unexpected buckets.

Best Practices

Enable Macie for accounts that hold customer data; skip infrastructure-only accounts.
Keep bucket-level evaluation always on for cheap misconfiguration detection.
Sample first to locate sensitive data, then full-scan only the buckets that have it.
Add custom data identifiers for your business-specific patterns.
Route findings to EventBridge and Security Hub; alert on discovered credentials immediately.
Re-run discovery after major data migrations.
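
Routing credential findings can be sketched as an EventBridge event pattern. The Python below defines one such pattern plus a deliberately tiny matcher to show which events it would select. The matches function is a hypothetical stand-in that handles only exact values and nested keys; it is not EventBridge's real matching engine.

```python
# Event pattern for an EventBridge rule: Macie findings about credentials.
CREDENTIAL_FINDINGS = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "detail": {"type": ["SensitiveData:S3Object/Credentials"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Tiny subset of EventBridge matching: exact values and nested keys only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: the event must have a matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: the event value must be one of the listed values.
            return False
    return True
```

In practice the pattern would likely be serialized with json.dumps and supplied as the EventPattern of a rule whose target is an alerting channel such as an SNS topic. Matching on the finding type keeps the immediate alert limited to discovered credentials rather than every finding.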

Comparable services: GCP Sensitive Data Protection (DLP); Azure Microsoft Purview

Knowledge Check

What data does Macie scan?

The object contents and the configuration of S3 buckets only — never RDS, DynamoDB, or EBS
Every storage and database service in the account, including RDS, DynamoDB, and EBS volumes
The runtime memory of running applications, looking for sensitive values held in process
Network traffic flowing between instances, watching for sensitive data crossing the wire

What is the cost-effective default for scanning large buckets with Macie?

Sampling a percentage of objects to find hot spots, then full-scanning only buckets with sensitive data
Full-scanning every single object in every bucket immediately on the very first classification job that runs
Scanning only the bucket and object names, never reaching into the contents themselves
Disabling object-level findings entirely to keep the classification bill as low as possible

How do you make Macie detect a business-specific pattern like an internal customer ID?

Add a custom data identifier that pairs a regex with keyword proximity rules
Rely on the built-in personal-information detectors to recognize the internal ID format
Rename the bucket so its name matches the customer ID pattern you want found
Macie simply cannot detect any custom or business-specific patterns at all

What do Macie's bucket-level findings cover?

Misconfiguration like public-readable or unencrypted buckets, continuously and independent of object scanning
The full byte-for-byte contents of every single object stored inside the bucket
Live network attacks against the bucket, such as floods aimed at its public endpoint
The full IAM policy evaluation logic that decides exactly who is allowed to reach each of the account's resources
