Amazon Macie

Security · Data · Managed

Macie discovers sensitive data sitting in your S3 buckets. It scans object contents, identifies patterns that look like personal or financial information (names, credit-card numbers, Social Security numbers) or operational secrets (AWS keys, private keys, connection strings), and raises findings. It is laser-focused on S3, where many accidental data leaks happen because buckets are easy to misconfigure and forget.

It does not look at RDS, DynamoDB, or EBS — only object storage, deliberately.

What Macie Detects

Bucket-level (policy) findings flag misconfiguration continuously: buckets that are publicly accessible or have Block Public Access disabled, buckets with default encryption disabled, and buckets shared with or replicated to other AWS accounts. Macie generates these regardless of whether you have configured object scanning. Object-level (sensitive-data) findings come from discovery jobs that sample or scan objects for sensitive-data patterns.
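
A minimal sketch of pulling only the bucket-level findings, assuming boto3's `macie2` client and its `list_findings` call. The helper name `policy_findings_request` and the bucket name are hypothetical; the function only builds the request, so it runs without AWS credentials. It assumes Macie tags bucket-level findings with the category `POLICY` and sensitive-data findings with `CLASSIFICATION`.

```python
def policy_findings_request(bucket_name=None, max_results=50):
    """Build kwargs for boto3.client("macie2").list_findings(...) (sketch)."""
    # Assumption: bucket-level findings have category POLICY,
    # object-level findings have category CLASSIFICATION.
    criterion = {"category": {"eq": ["POLICY"]}}
    if bucket_name is not None:
        # Optionally narrow the query to a single bucket.
        criterion["resourcesAffected.s3Bucket.name"] = {"eq": [bucket_name]}
    return {
        "findingCriteria": {"criterion": criterion},
        # Newest findings first.
        "sortCriteria": {"attributeName": "updatedAt", "orderBy": "DESC"},
        "maxResults": max_results,
    }


# Hypothetical bucket name, for illustration only.
request = policy_findings_request("example-customer-exports")
print(request)
```

With credentials configured, passing these kwargs to `list_findings` returns finding IDs, which `get_findings(findingIds=[...])` expands into full findings.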

Built-in detectors (Macie calls them managed data identifiers) cover personal, financial, government-ID, credential, and health-information patterns; custom data identifiers (a regex, optionally paired with keywords that must appear within a set distance of the match) catch business-specific patterns like internal customer IDs.
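
To make the regex-plus-keyword idea concrete, here is a simplified local approximation, not Macie's actual matching engine. It assumes a match counts only when a keyword (case-insensitive) ends before the match and within `max_match_distance` characters of the match's end. The `CUST-` ID format, the keyword, and all function names are hypothetical. The last helper only builds the arguments you might pass to boto3's `macie2` `create_custom_data_identifier` call.

```python
import re


def _keyword_near(text, match, keywords, max_distance):
    """True if some keyword ends before `match` and within max_distance of its end."""
    for kw in keywords:
        for k in re.finditer(re.escape(kw), text, re.IGNORECASE):
            if k.end() <= match.start() and match.end() - k.end() <= max_distance:
                return True
    return False


def find_custom_ids(text, regex, keywords, max_match_distance=50):
    """Return regex matches that have a nearby keyword (simplified approximation)."""
    return [
        m.group()
        for m in re.finditer(regex, text)
        if _keyword_near(text, m, keywords, max_match_distance)
    ]


def custom_identifier_request(name, regex, keywords, max_match_distance=50):
    """Build kwargs for boto3.client("macie2").create_custom_data_identifier(...)."""
    return {
        "name": name,
        "regex": regex,
        "keywords": keywords,
        "maximumMatchDistance": max_match_distance,
    }


# Hypothetical internal customer ID: "CUST-" followed by eight digits.
ID_REGEX = r"CUST-\d{8}"
SAMPLE = (
    "Customer ID: CUST-00421337. "
    "Unrelated log line with no label at all: CUST-99999999"
)
print(find_custom_ids(SAMPLE, ID_REGEX, ["customer id"]))  # ['CUST-00421337']
```

The distance check is what keeps the second, unlabeled `CUST-` string from being reported; Macie exposes a comparable knob as `maximumMatchDistance`.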

How Discovery Works

A discovery job has three knobs: bucket selection (specific buckets, all, or a filter), schedule (one-time, or recurring daily, weekly, or monthly), and sampling depth (the percentage of eligible objects to analyze, where 100% is a full scan; recurring runs analyze only objects that are new or changed since the previous run). For large buckets, sampling is the cost-effective default: find the hot spots, then full-scan only those.
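
The sample-then-full-scan workflow can be sketched as two job requests. This assumes the request shape of boto3's `macie2` `create_classification_job` call; the helper names, account ID, bucket names, and per-bucket counts are all hypothetical, and the code only builds the requests rather than calling AWS.

```python
import uuid


def discovery_job_request(name, account_id, buckets, sampling_percentage, schedule=None):
    """Build kwargs for boto3.client("macie2").create_classification_job(...) (sketch).

    schedule=None gives a one-time job; otherwise pass a scheduleFrequency dict,
    e.g. {"weeklySchedule": {"dayOfWeek": "MONDAY"}}.
    """
    if not 1 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 1 and 100")
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token for safe retries
        "name": name,
        "jobType": "ONE_TIME" if schedule is None else "SCHEDULED",
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": list(buckets)}]
        },
        "samplingPercentage": sampling_percentage,
    }
    if schedule is not None:
        request["scheduleFrequency"] = schedule
    return request


def hot_buckets(sensitive_counts, threshold=1):
    """Buckets whose sampled run reported at least `threshold` sensitive findings."""
    return sorted(b for b, n in sensitive_counts.items() if n >= threshold)


ACCOUNT = "111122223333"  # placeholder account ID
# Phase 1: cheap 10% sample across every candidate bucket.
phase1 = discovery_job_request(
    "sample-all", ACCOUNT, ["example-logs", "example-exports", "example-uploads"], 10
)
# Phase 2: after tallying phase-1 findings per bucket (made-up numbers here),
# full-scan only the hot spots on a weekly schedule.
phase2 = discovery_job_request(
    "full-scan-hot-spots",
    ACCOUNT,
    hot_buckets({"example-logs": 0, "example-exports": 7, "example-uploads": 2}),
    100,
    schedule={"weeklySchedule": {"dayOfWeek": "MONDAY"}},
)
print(phase2["s3JobDefinition"])
```

Keeping the full scan on a schedule means later runs pick up newly written objects in exactly the buckets already known to hold sensitive data.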

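Findings include the bucket and object key, the type and count of each kind of sensitive data, and where each occurrence sits (for example, a line number, spreadsheet cell, or byte offset). The content itself stays in S3 and Macie does not store it; if you enable sample retrieval, Macie fetches a few occurrences from S3 on demand instead of keeping copies.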
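
A sketch of reading those fields out of a finding, assuming the JSON shape returned by boto3's `macie2` `get_findings` (bucket and key under `resourcesAffected`, per-type counts under `classificationDetails.result`). The function name and the sample finding are hypothetical; the sample is trimmed, and real findings also carry occurrence locations rather than matched values.

```python
def summarize_finding(finding):
    """Return (bucket, key, {type_or_identifier: count}) for one sensitive-data finding."""
    resources = finding["resourcesAffected"]
    bucket = resources["s3Bucket"]["name"]
    key = resources["s3Object"]["key"]
    counts = {}
    result = finding.get("classificationDetails", {}).get("result", {})
    # Managed data identifiers, grouped by category (CREDENTIALS, FINANCIAL_INFORMATION, ...).
    for category in result.get("sensitiveData", []):
        for detection in category.get("detections", []):
            counts[detection["type"]] = counts.get(detection["type"], 0) + detection["count"]
    # Custom data identifiers are reported separately, by identifier name.
    for detection in result.get("customDataIdentifiers", {}).get("detections", []):
        counts[detection["name"]] = counts.get(detection["name"], 0) + detection["count"]
    return bucket, key, counts


# Trimmed, hypothetical finding for illustration.
SAMPLE_FINDING = {
    "type": "SensitiveData:S3Object/Multiple",
    "resourcesAffected": {
        "s3Bucket": {"name": "example-exports"},
        "s3Object": {"key": "2024/customers.csv"},
    },
    "classificationDetails": {
        "result": {
            "sensitiveData": [
                {"category": "CREDENTIALS",
                 "detections": [{"type": "AWS_CREDENTIALS", "count": 1}]},
                {"category": "FINANCIAL_INFORMATION",
                 "detections": [{"type": "CREDIT_CARD_NUMBER", "count": 42}]},
            ],
            "customDataIdentifiers": {
                "detections": [{"name": "internal-customer-id", "count": 310}]
            },
        }
    },
}
print(summarize_finding(SAMPLE_FINDING))
```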

Macie vs GuardDuty vs Config

Macie — discovering what sensitive data lives in S3 and which buckets are misconfigured.

GuardDuty — detecting active threats from account and network activity.

AWS Config — tracking and evaluating resource configuration broadly, not S3 data contents.

Common Mistakes
  • Full-scanning every bucket from the start instead of sampling to find where sensitive data lives, then full-scanning only those.
  • Relying only on built-in detectors and missing business-specific patterns that need custom data identifiers.
  • Enabling Macie on accounts that hold only infrastructure or build artifacts, paying to scan data that is never sensitive.
  • Ignoring bucket-level policy findings; Macie generates them automatically and cheaply, and they catch the most common S3 mistakes.
  • Not routing findings to EventBridge/Security Hub, so a discovered AWS key in a bucket goes unnoticed.
  • Skipping re-discovery after major data migrations, when sensitive data can land in unexpected buckets.
Best Practices
  • Enable Macie for accounts that hold customer data; skip infrastructure-only accounts.
  • Act on bucket-level policy findings as they arrive; they are cheap, continuous misconfiguration detection.
  • Sample first to locate sensitive data, then full-scan only the buckets that have it.
  • Add custom data identifiers for your business-specific patterns.
  • Route findings to EventBridge and Security Hub; alert on discovered credentials immediately.
  • Re-run discovery after major data migrations.
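
A minimal sketch of the routing rule, assuming Macie publishes findings to EventBridge with source `aws.macie` and detail-type `Macie Finding`. Because credentials found alongside other data types surface as a `SensitiveData:S3Object/Multiple` finding, the pattern matches both types, and a hypothetical target-side check (`contains_credentials`) narrows to credentials before paging anyone. The constant and function names are illustrative.

```python
import json

# Assumption: Macie findings arrive in EventBridge with this source and detail-type.
CREDENTIAL_FINDING_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    # Credentials mixed with other data types show up as .../Multiple,
    # so match both and refine in the target.
    "detail": {
        "type": [
            "SensitiveData:S3Object/Credentials",
            "SensitiveData:S3Object/Multiple",
        ]
    },
}


def event_pattern_json():
    """String form expected by boto3.client("events").put_rule(EventPattern=...)."""
    return json.dumps(CREDENTIAL_FINDING_PATTERN)


def contains_credentials(detail):
    """Target-side check: True if the finding reports any CREDENTIALS detections."""
    result = detail.get("classificationDetails", {}).get("result", {})
    return any(c.get("category") == "CREDENTIALS" for c in result.get("sensitiveData", []))


print(event_pattern_json())
```

Register the pattern with `put_rule(Name=..., EventPattern=event_pattern_json())` and attach whatever alerting target you use; forwarding to Security Hub is configured separately.
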
Comparable services: GCP Sensitive Data Protection (DLP), Azure Microsoft Purview

Knowledge Check

What data does Macie scan?

  • The object contents and the configuration of S3 buckets only — never RDS, DynamoDB, or EBS
  • Every storage and database service in the account, including RDS, DynamoDB, and EBS volumes
  • The runtime memory of running applications, looking for sensitive values held in process
  • Network traffic flowing between instances, watching for sensitive data crossing the wire

What is the cost-effective default for scanning large buckets with Macie?

  • Sampling a percentage of objects to find hot spots, then full-scanning only buckets with sensitive data
  • Full-scanning every single object in every bucket immediately on the very first classification job that runs
  • Scanning only the bucket and object names, never reaching into the contents themselves
  • Disabling object-level findings entirely to keep the classification bill as low as possible

How do you make Macie detect a business-specific pattern like an internal customer ID?

  • Add a custom data identifier that pairs a regex with keyword proximity rules
  • Rely on the built-in personal-information detectors to recognize the internal ID format
  • Rename the bucket so its name matches the customer ID pattern you want found
  • Macie simply cannot detect any custom or business-specific patterns at all

What do Macie's bucket-level findings cover?

  • Misconfiguration like public-readable or unencrypted buckets, continuously and independent of object scanning
  • The full byte-for-byte contents of every single object stored inside the bucket
  • Live network attacks against the bucket, such as floods aimed at its public endpoint
  • The full IAM policy evaluation logic that decides exactly who is allowed to reach each of the account's resources
