Debugging
When Terraform behaves inexplicably — a diff that never goes away, a provider error with no obvious cause, an apply that hangs for ten minutes — guessing is the slow path. The fast path is to see what the engine is actually doing, and TF_LOG is the single most useful tool for that. It exposes Terraform's internal logs, including the exact AWS API calls the provider makes, which is usually where the real answer is hiding.
This topic covers reading those logs and the systematic approach to the failures Terraform users actually hit: the perpetual diff, the hang, and the opaque provider error. The technique underneath all of them is the same — stop theorizing about what Terraform might be doing and read what it is doing.
TF_LOG Levels
TF_LOG is an environment variable that turns on logging at a chosen verbosity: TRACE, DEBUG, INFO, WARN, or ERROR, from most to least verbose. TRACE is everything including the internals of the graph walk; DEBUG is the practical default for most investigations. TF_LOG_PROVIDER scopes logging to just the provider, and TF_LOG_PATH writes the logs to a file instead of flooding your terminal.
# DEBUG is usually the right level; TRACE is a firehose
export TF_LOG=DEBUG

# Or log only the provider's activity: leave TF_LOG unset,
# set TF_LOG_PROVIDER, and write the log to a file
unset TF_LOG
export TF_LOG_PROVIDER=DEBUG
export TF_LOG_PATH=./tf-debug.log
terraform plan
Scope the level to the question. Running TF_LOG=TRACE for a provider problem buries the relevant API call under the engine's graph-walk noise; TF_LOG_PROVIDER=DEBUG isolates exactly the provider activity you care about. Capture to TF_LOG_PATH so you can search the log rather than scroll it, then unset the variables when you are done — leaving TRACE on makes every subsequent run unreadable.
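One way to keep the scoping from leaking into later runs is to set the variables inside a subshell. The sketch below assumes a POSIX shell; with_provider_log is a hypothetical helper name, not something Terraform ships.

```shell
# Hypothetical helper: run one command with provider-only DEBUG logging.
# Everything is set inside a subshell, so the caller's environment is untouched.
with_provider_log() {
  log_file=$1
  shift
  (
    unset TF_LOG                    # keep core logging at its default
    export TF_LOG_PROVIDER=DEBUG    # provider activity only
    export TF_LOG_PATH="$log_file"  # write to a file instead of the terminal
    "$@"
  )
}

# Usage: with_provider_log ./tf-debug.log terraform plan
```

Because the variables exist only inside the subshell, there is nothing to unset afterwards.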
Reading Provider API Calls
The highest-value thing in the logs is the provider's API traffic. At DEBUG, the AWS provider logs each request it sends and each response it gets back, so an opaque "error creating resource" becomes a specific API error with a specific reason — an AccessDenied on a particular action, a ValidationException naming the bad parameter, a throttling response. The error Terraform prints is a summary; the log is the actual exchange with AWS.
This is the universal move. Almost every confusing Terraform failure resolves to one API call going wrong, and the log shows you which call, with what arguments, and what AWS said back. Once you can read that exchange, most opaque errors stop being opaque.
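Once the log is in a file, searching it for the error names mentioned above is usually faster than reading it top to bottom. This is a minimal sketch: find_api_errors is a hypothetical helper, and the patterns are assumptions that match only the error names, since the exact log layout varies by provider version.

```shell
# Hypothetical helper: list log lines (with line numbers) that mention
# common AWS error names. Extend the pattern for the error you are chasing.
find_api_errors() {
  grep -n -E 'AccessDenied|ValidationException|Throttling' "$1"
}

# Usage: find_api_errors ./tf-debug.log
```

The line numbers give you a place to jump to in the log, where the surrounding entries show the request that triggered the error.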
The Perpetual Diff
A perpetual diff is a resource that shows a change on every plan even though you keep applying it — apply succeeds, the next plan proposes the same change again, forever. The usual causes are a small set: the provider normalizes a value differently from how you wrote it (a JSON policy reordered, a string recased), an attribute is being changed outside Terraform by another system, or a genuine provider bug. The fix depends on which.
Diagnose it by comparing what the provider reads back against what your config declares — the DEBUG log shows both. If the difference is cosmetic normalization, match your config to the canonical form the provider expects; if another system legitimately owns the attribute, ignore_changes on that specific attribute stops the fight. Letting a perpetual diff sit for months means every plan is noise and real changes hide inside it.
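To confirm that a diff really is perpetual rather than a one-off, you can check the plan's exit status right after an apply. With -detailed-exitcode, terraform plan exits 0 when there are no changes, 1 on error, and 2 when changes are present. The sketch below uses a hypothetical helper name, check_lingering_diff, and assumes it runs in an initialized working directory.

```shell
# Hypothetical helper: run after an apply to see whether the next plan is clean.
check_lingering_diff() {
  rc=0
  terraform plan -detailed-exitcode -input=false >/dev/null || rc=$?
  case $rc in
    0) echo "clean: the last apply converged" ;;
    2) echo "diff persists: read the DEBUG log for the attribute that keeps changing" ;;
    *) echo "plan failed"; return 1 ;;
  esac
}

# Usage: terraform apply && check_lingering_diff
```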
Hangs and Timeouts
An apply that hangs is usually waiting on something, and the logs reveal what. API throttling shows up as repeated retry-with-backoff entries — the provider is being rate-limited and waiting between attempts. A dependency wait shows up as Terraform sitting on a resource whose prerequisite has not converged. A resource-level timeout shows up as the provider polling for a state that never arrives, like an RDS instance stuck in modifying.
Without logs, a hang is indistinguishable from a crash, and people kill the run — sometimes mid-write, corrupting state. With DEBUG on, you can see whether Terraform is throttled, blocked, or genuinely stuck, and decide whether to wait it out, lower parallelism, or fix the underlying resource. Reading the log is how you tell a slow apply from a dead one.
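A rough way to tell a throttled apply from a stuck one is to count the retry and throttling entries in the captured log and see whether the number keeps growing. The sketch below is illustrative only: count_throttle_lines is a hypothetical helper, and the search patterns are assumptions, so adjust them to what your provider version actually logs.

```shell
# Hypothetical helper: count log lines that look like throttling or retries.
# The patterns are guesses at the wording, not a documented log format.
count_throttle_lines() {
  grep -c -i -E 'throttl|retry' "$1" || true
}

# Usage, from a second terminal while the apply is running:
#   count_throttle_lines ./tf-debug.log
# If the count keeps climbing, lower concurrency on the next run may help:
#   terraform apply -parallelism=5   # the default is 10
```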
Crash Logs and Reporting
When Terraform itself panics, it writes a crash.log with the stack trace and the recent log output. That file plus a minimal reproduction — the smallest config that triggers the crash — is what makes a provider bug report actionable. A report that says "it crashed" with no logs and no repro is unactionable; maintainers cannot fix what they cannot reproduce.
Before filing, isolate the problem to the smallest config that still fails, and attach the scoped DEBUG log and the crash.log. The effort of building a minimal reproduction often surfaces the cause yourself — and when it does not, it is exactly what a maintainer needs to fix it fast.
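Gathering the pieces of a report can be scripted so that nothing gets forgotten. This is a sketch under stated assumptions: collect_bug_report is a hypothetical helper, it expects to run from the directory holding the minimal reproduction, and the file names tf-debug.log and crash.log follow the examples in this topic.

```shell
# Hypothetical helper: copy the version, logs, and repro config into one directory.
collect_bug_report() {
  out_dir=$1
  mkdir -p "$out_dir"
  terraform version > "$out_dir/version.txt"
  # Copy whichever of these exist; an unmatched *.tf glob is simply skipped.
  for f in tf-debug.log crash.log *.tf; do
    if [ -f "$f" ]; then
      cp "$f" "$out_dir/"
    fi
  done
}

# Usage: collect_bug_report ./bug-report
```

Debug logs can include sensitive values, so review the collected files before attaching them to a public issue.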
Common Mistakes
- Guessing at the cause of a perpetual diff or an opaque error instead of turning on TF_LOG to see the actual API calls behind it.
- Running TF_LOG=TRACE and drowning in graph-walk noise when DEBUG or TF_LOG_PROVIDER would have isolated the issue.
- Killing a hanging apply mid-write instead of reading the log to see it is throttled and waiting, sometimes corrupting state in the process.
- Filing a provider bug with no logs and no minimal reproduction, making it unactionable for the maintainers.
- Ignoring a perpetual diff for months instead of diagnosing the normalization or ignore_changes fix, so every plan is noise that hides real changes.
Best Practices
- Reach for TF_LOG and TF_LOG_PATH early when behavior is inexplicable, scoping the level to DEBUG to keep the log readable.
- Use TF_LOG_PROVIDER to isolate the provider's exact API request and response behind a failure.
- Diagnose a perpetual diff by comparing what the provider reads back against what the config declares, then normalize the config or scope ignore_changes to that attribute.
- Read the log before killing a hanging apply, to tell a throttled wait from a genuine stall.
- When reporting a bug, attach the scoped log and crash.log with the smallest config that reproduces it.
Knowledge Check
What does TF_LOG expose that the normal error output does not?
- Terraform's internal logs, including the exact AWS API requests and responses the provider makes
- A rendered graphical visualization of the dependency graph, drawn in DOT and exported to an SVG file
- The decrypted plaintext contents of the state file's sensitive values, pulled straight from the S3 backend
- A predicted monthly cost estimate for every resource in the planned changes
You are debugging a provider error. How should you scope the log verbosity?
- Use TF_LOG_PROVIDER=DEBUG to isolate the provider's API activity rather than TF_LOG=TRACE, which buries it in noise
- Always use TF_LOG=TRACE with logs sent to a file, since the most detail is always the safest choice for any problem
- Use TF_LOG=ERROR, which shows only the API calls that failed and keeps the output most concise
- Disable logging entirely and rely on the plan summary's change counts instead
A resource shows the same change on every plan no matter how many times you apply. What is a common cause?
- The provider normalizes the value differently from how you wrote it, or another system changes the attribute out of band
- The state file is missing its entry for the resource, so every plan recreates it from scratch
- Locking is disabled on the backend, so the apply reports success in the output but never actually writes the change back to the remote state
- The resource has create_before_destroy set in its lifecycle block, which forces a fresh re-plan
What makes a provider bug report actionable for maintainers?
- A minimal reproduction config plus the scoped DEBUG log and crash.log
- A screenshot of the terminal showing the red error summary line and the exit code
- The full unredacted state file so maintainers can inspect every managed resource
- A description of the symptom with the Terraform version number only