Service 42

Application Insights

APM

Application Insights is the application performance monitoring component of Azure Monitor. Instrument an application and it captures requests, dependency calls, exceptions, and traces, correlates them into end-to-end transactions, and surfaces latency, failure rates, and bottlenecks. It is observability for your code, as opposed to the infrastructure metrics Azure Monitor collects automatically.

In the current workspace-based model (the retired classic resources stored data separately), its data lands in a Log Analytics workspace and is queried with the same KQL, so it is part of one platform rather than a separate tool. The discipline it demands is sampling: a high-traffic app generates enormous telemetry, and ingesting all of it is expensive and unnecessary when a representative sample answers the same questions.

Telemetry Types

Application Insights collects requests (incoming calls and their latency and result), dependencies (outgoing calls to databases, queues, and other services), exceptions (with stack traces), traces (log messages), and custom events and metrics you emit. Together these show not just that an app is slow but where: which dependency, which operation, or which exception is driving the failure rate.
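To make the "where, not just that" point concrete, here is a minimal Python sketch of attributing a slow request to its slowest dependency. The `Telemetry` record and the `slowest_dependency` helper are hypothetical names invented for this illustration and are not part of any Application Insights SDK. The sample values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Telemetry:
    """Hypothetical, simplified telemetry item (not an SDK type)."""
    kind: str          # "request", "dependency", or "exception"
    name: str          # operation name or dependency target
    operation_id: str  # shared by every item from one transaction
    duration_ms: float = 0.0


def slowest_dependency(items, operation_id):
    """Return (name, total_ms) for the dependency that consumed the most
    time within one operation, or None if none was recorded."""
    totals = defaultdict(float)
    for item in items:
        if item.kind == "dependency" and item.operation_id == operation_id:
            totals[item.name] += item.duration_ms
    if not totals:
        return None
    return max(totals.items(), key=lambda pair: pair[1])


# Illustrative data: one slow request and the calls it made.
items = [
    Telemetry("request", "GET /orders", "op-1", 950.0),
    Telemetry("dependency", "sql:orders-db", "op-1", 820.0),
    Telemetry("dependency", "http:pricing-api", "op-1", 60.0),
    Telemetry("exception", "TimeoutError", "op-1"),
]
print(slowest_dependency(items, "op-1"))  # ('sql:orders-db', 820.0)
```

Without the dependency items, the same data would show only a 950 ms request with no explanation.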

Distributed Tracing

By propagating a correlation context across service boundaries, Application Insights stitches a single user request into an end-to-end trace spanning every service and dependency it touched. The application map visualizes these flows and their health, so a failure three services deep is traced to its source rather than guessed at. This is the core value for a microservice architecture.
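The propagation step can be sketched in a few lines of Python. This assumes the W3C Trace Context `traceparent` header format (version, trace ID, parent span ID, flags). In practice the SDK or auto-instrumentation normally handles this for you, so the helper names below are hypothetical and purely illustrative.

```python
import re
import secrets

# W3C traceparent: version "00", 32-hex trace id, 16-hex span id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent():
    """Start a new trace: fresh trace id and span id, sampled flag set."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming):
    """Build the header for an outgoing call.

    The trace id from the incoming request is kept, so every hop belongs
    to the same trace; only the span id changes. A missing or malformed
    header starts a new trace instead.
    """
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print(outgoing.split("-")[1])  # 4bf92f3577b34da6a3ce929d0e0e4736
```

If any service in the chain drops the header, the next hop starts a new trace ID and the end-to-end trace breaks at that point.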

Live Metrics and Availability Tests

Live Metrics streams telemetry in near real time for watching a deployment as it rolls out. Availability tests probe an endpoint from multiple regions on a schedule and alert when it fails or slows, giving outside-in monitoring that catches outages a user would see even when internal metrics look fine.
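As a rough illustration of how multi-region probe results might be turned into an alert decision, consider the Python sketch below. The `ProbeResult` type, the `should_alert` function, and both thresholds are assumptions made for this example; they are not the service's actual alert rule format.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Hypothetical result of one availability probe from one region."""
    region: str
    success: bool
    duration_ms: float


def should_alert(results, max_duration_ms=2000.0, min_failed_regions=2):
    """Return (alert, failed_regions).

    A region counts as failed if the probe errored or was too slow.
    Alerting only when several regions fail filters out a network blip
    that affects a single probe location.
    """
    failed = sorted(
        r.region
        for r in results
        if not r.success or r.duration_ms > max_duration_ms
    )
    return len(failed) >= min_failed_regions, failed


results = [
    ProbeResult("westeurope", True, 180.0),
    ProbeResult("eastus", False, 0.0),
    ProbeResult("southeastasia", True, 2600.0),
]
print(should_alert(results))  # (True, ['eastus', 'southeastasia'])
```

Note that the slow but successful probe counts as a failure here, matching the "fails or slows" condition described above.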

Sampling and Cost

Because billing follows ingested telemetry, sampling (keeping a representative fraction of telemetry while preserving accurate aggregate statistics) is how high-traffic apps stay affordable without losing insight. Adaptive sampling adjusts the rate to the volume automatically. Turning sampling off on a busy app is a fast route to a large and avoidable bill.
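The following Python sketch shows one way a sample can still yield accurate aggregates. It uses fixed-rate sampling, not the adaptive algorithm, and it is an illustration under stated assumptions rather than the SDK's implementation. Each kept item carries a weight, modeled on the `itemCount` field that sampled Application Insights telemetry carries, recording how many original items it represents. The function names and the hashing scheme are hypothetical.

```python
import hashlib


def keep(operation_id, rate_percent):
    """Decide deterministically from the operation id, so every item
    belonging to one transaction is kept or dropped together."""
    digest = hashlib.sha256(operation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # 0..9999
    return bucket < rate_percent * 100


def sample(items, rate_percent):
    """Keep roughly rate_percent of operations; weight each kept item."""
    weight = 100.0 / rate_percent
    return [
        dict(item, itemCount=weight)
        for item in items
        if keep(item["operation_id"], rate_percent)
    ]


def estimated_count(sampled):
    """Estimate the original volume by summing weights, not rows."""
    return sum(item["itemCount"] for item in sampled)


items = [{"operation_id": f"op-{i}", "name": "GET /orders"} for i in range(10_000)]
sampled = sample(items, rate_percent=10)
print(len(sampled), estimated_count(sampled))
```

At a 10 percent rate only about a tenth of the items are stored, yet summing the weights gives an estimate close to the true 10,000. This is why queries over sampled data should sum the weight rather than count rows.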

Common Mistakes

Disabling sampling on a high-traffic app, ingesting every trace and inflating the bill for data a sample would have covered.
Not propagating correlation context across services, so distributed traces break and a deep failure cannot be followed to its source.
Relying only on internal metrics with no availability tests, missing outages an external user would immediately see.
Instrumenting requests but not dependencies, so a slow database call shows up as an unexplained slow request.
Treating Application Insights as separate from Monitor and Log Analytics rather than one platform sharing KQL.
Ignoring exceptions and custom events, leaving the richest debugging signal uncaptured.

Best Practices

Enable adaptive sampling on high-traffic apps to control cost while preserving accurate aggregates.
Propagate correlation context across all services for unbroken end-to-end distributed traces.
Add availability tests from multiple regions for outside-in monitoring of user-facing endpoints.
Instrument requests, dependencies, and exceptions together so latency is attributable to its cause.
Query Application Insights data with KQL alongside other logs in the shared workspace.
Use Live Metrics to watch deployments roll out in near real time.

Comparable services: AWS X-Ray / CloudWatch; GCP Cloud Trace / Cloud Profiler

Knowledge Check

What does distributed tracing in Application Insights provide?

End-to-end correlation of a request across every service and dependency it touched
Automatic CPU, memory, and disk metrics gathered from the host with zero instrumentation
A cheaper storage tier that bypasses Log Analytics workspace billing entirely
DNS-based routing that steers each request to the nearest healthy regional endpoint

Why is sampling important on a high-traffic application?

It keeps a representative fraction of telemetry, controlling cost while preserving accurate aggregate statistics
It encrypts the telemetry in transit between the SDK and the ingestion endpoint, protecting it from interception on the wire
It is required for distributed tracing to correlate requests across service boundaries
It increases the workspace retention period at no additional charge

What do availability tests add that internal metrics do not?

Outside-in probing from multiple regions that catches outages a real user would see
Lower ingestion cost for traces by sampling them away more aggressively at the source
Automatic correlation across services with no added instrumentation
Detailed stack traces captured for every unhandled exception in the app
