Service 51

Event Hubs

Streaming

Azure Event Hubs is a big-data streaming ingestion service: it accepts millions of events per second, retains them in an ordered, replayable log, and lets multiple independent consumers read the stream at their own pace. It is the front door for telemetry, application logs, clickstreams, and IoT data flowing into analytics.

It is a stream, not a message queue. Events are not deleted when read: they stay in the log for a retention window, and consumers track their own position, so the same data can be replayed and read by many pipelines. Treating it like Service Bus (delete-on-consume, per-message handling) misses the point, just as using Service Bus as a replayable stream does.
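The difference can be sketched with a toy in-memory model. This is not the Azure SDK; `ToyStream` and its methods are hypothetical names used only to illustrate non-destructive reads, reader-held positions, and replay, contrasted with a queue that removes a message once it is received.

```python
from collections import deque


class ToyStream:
    """Append-only log: reading never removes events."""

    def __init__(self):
        self.log = []

    def append(self, event):
        self.log.append(event)

    def read(self, offset, max_events=10):
        # Return the events starting at the caller's offset, plus the next offset.
        batch = self.log[offset:offset + max_events]
        return batch, offset + len(batch)


stream = ToyStream()
for event in ["a", "b", "c"]:
    stream.append(event)

# Two independent readers, each holding its own position.
dashboard_batch, dashboard_pos = stream.read(0)
archive_batch, archive_pos = stream.read(0)

# Replay: any reader can rewind to an earlier offset.
replayed, _ = stream.read(1)

# Contrast: a queue hands each message to one receiver and then drops it.
queue = deque(["a", "b", "c"])
first_received = queue.popleft()
remaining_in_queue = list(queue)
```

Both readers see every event and the log is unchanged afterwards, while the queue has one fewer message after a single receive.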

Partitions and Throughput

An event hub is divided into partitions: parallel ordered logs that set the maximum read parallelism. Capacity is provisioned as throughput units on Basic and Standard, processing units on Premium, and capacity units on Dedicated, each capping ingress and egress. Choose the partition count up front for the consumer parallelism you will need, because too few partitions cap throughput no matter how many consumers you add. The Premium and Dedicated tiers can increase the partition count after creation, but Basic and Standard cannot.
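A rough sketch of why partition count caps parallelism follows. The hash here is illustrative only (the service's real key-to-partition mapping is internal), and `partition_for` and `active_readers` are hypothetical helpers. The second one assumes the commonly recommended setup of one active reader per partition within a consumer group.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition with a stable, illustrative hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def active_readers(consumers: int, partition_count: int) -> int:
    """Consumers beyond the partition count have nothing left to read."""
    return min(consumers, partition_count)


# The same key always lands on the same partition, which preserves per-key order.
p1 = partition_for("device-42", 4)
p2 = partition_for("device-42", 4)

# Ten consumers on four partitions: only four can do useful work.
useful = active_readers(10, 4)
```

Adding consumers past the partition count does not raise read throughput in this model, which is why the count is sized for the parallelism expected later.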

Consumer Groups and Checkpointing

A consumer group is an independent view of the stream, with its own position, so several pipelines (real-time dashboard, archival, anomaly detection) read the same events without interfering. Each consumer checkpoints its progress, so on restart it resumes where it left off rather than reprocessing everything or skipping data. Checkpointing is what makes consumption reliable across restarts.
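The restart behaviour can be illustrated with a toy checkpoint store. This is an in-memory sketch, not the SDK's checkpoint store; `ToyCheckpointStore` and `consume` are hypothetical names, and a real deployment would persist checkpoints in durable storage.

```python
class ToyCheckpointStore:
    """Remembers the next offset to read, per (consumer group, partition)."""

    def __init__(self):
        self._positions = {}

    def load(self, group, partition):
        # A group with no checkpoint yet starts at the beginning of the log.
        return self._positions.get((group, partition), 0)

    def save(self, group, partition, next_offset):
        self._positions[(group, partition)] = next_offset


def consume(log, store, group, partition, max_events):
    start = store.load(group, partition)
    batch = log[start:start + max_events]
    # Processing would happen here; checkpoint only after it succeeds.
    store.save(group, partition, start + len(batch))
    return batch


log = [f"event-{i}" for i in range(5)]
store = ToyCheckpointStore()

first_run = consume(log, store, "dashboard", "0", 3)
# Simulated restart: a new call resumes from the saved position.
after_restart = consume(log, store, "dashboard", "0", 3)
# A different consumer group has its own position and sees everything.
archive_run = consume(log, store, "archive", "0", 10)
```

Checkpointing after processing, as sketched here, means a crash mid-batch leads to reprocessing that batch rather than skipping it.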

Capture

Event Hubs Capture automatically writes the incoming stream to Blob or Data Lake Storage in Avro format on a time or size interval, with no code. It gives you a durable, queryable archive of the raw stream for batch analytics alongside the real-time consumers — the streaming-plus-batch pattern without building the archival path yourself.
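The "time or size interval" rule can be sketched as whichever-comes-first windowing. This toy closes a window only when an event arrives and skips Avro encoding entirely. `capture_windows` is a hypothetical helper and the thresholds are illustrative, not the service's defaults.

```python
def capture_windows(events, max_seconds, max_bytes):
    """Group (timestamp_seconds, payload_bytes) events into capture windows.

    A window closes when either the time or the size threshold is reached.
    """
    windows, current, size, start = [], [], 0, None
    for timestamp, payload in events:
        if start is None:
            start = timestamp
        current.append(payload)
        size += len(payload)
        if timestamp - start >= max_seconds or size >= max_bytes:
            windows.append(current)
            current, size, start = [], 0, None
    if current:
        windows.append(current)  # trailing partial window
    return windows


events = [(0, b"aa"), (1, b"bb"), (2, b"cccccc"), (10, b"d"), (70, b"e")]
windows = capture_windows(events, max_seconds=60, max_bytes=8)
```

In this run the first window closes on size (10 bytes against a limit of 8) and the second closes on time (60 seconds after its first event).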

Kafka Compatibility

Event Hubs exposes a Kafka-compatible endpoint, so applications and frameworks written for Apache Kafka can produce and consume from it with a configuration change and no code rewrite. This lets teams bring existing Kafka workloads to a managed Azure service rather than operating their own Kafka cluster.
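As a sketch of what the configuration change looks like, the helper below builds the client settings typically used to point a Kafka client at an Event Hubs namespace. The property names follow the librdkafka style. `event_hubs_kafka_config` is a hypothetical helper, and the namespace and connection string are placeholders, so check the exact keys against your client library.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings for an Event Hubs namespace (illustrative)."""
    return {
        # The Kafka endpoint listens on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The username is the literal string "$ConnectionString".
        "sasl.username": "$ConnectionString",
        # The password is the namespace connection string itself.
        "sasl.password": connection_string,
    }


# Placeholder values; keep a real connection string in a secret store, not in code.
config = event_hubs_kafka_config("my-namespace", "<your-connection-string>")
```

Under these assumptions, the application's produce and consume code stays as it was. Only the connection settings change, and the Kafka topic name maps to the event hub name.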

Event Hubs vs Service Bus vs Event Grid

Event Hubs — High-volume, replayable event streams read by many consumers. Choose it for telemetry, logs, and analytics ingestion.

Service Bus — Reliable discrete messages (commands) with ordering and dead-letter. Choose it for work that must be processed once.

Event Grid — Discrete event notifications, push-based, reactive. Choose it for event-driven integration, not high-volume streams.

Common Mistakes

Treating Event Hubs like a queue — expecting delete-on-consume and per-message handling instead of a replayable stream consumers track position in.
Under-provisioning partitions, capping consumer parallelism and throughput regardless of how many consumers you add.
Not checkpointing, so a consumer reprocesses the whole window or loses its place on restart.
Using Event Hubs for low-volume discrete commands where Service Bus or Event Grid fits better.
Building a custom archival path when Event Hubs Capture would write the stream to storage automatically.
Operating a self-managed Kafka cluster when the Kafka-compatible endpoint would offer the same API as a managed service.

Best Practices

Use Event Hubs for high-volume, replayable streams consumed by multiple independent pipelines.
Choose partition count up front for the consumer parallelism you will need.
Use a consumer group per independent pipeline and checkpoint progress for reliable restart.
Enable Capture to archive the raw stream to storage for batch analytics with no custom code.
Use the Kafka-compatible endpoint to bring existing Kafka workloads onto a managed service.
Reserve Service Bus and Event Grid for discrete messages and events rather than streams.

Comparable services: AWS Kinesis Data Streams / MSK, GCP Pub/Sub

Knowledge Check

How does Event Hubs differ from a message queue like Service Bus?

Events are retained in a replayable log and consumers track their own position, rather than being deleted on consume
Event Hubs guarantees exactly-once command processing with strict per-message ordering and removes each command once its worker acknowledges completion
Event Hubs deletes each event as soon as the first consumer reads it off the stream
Event Hubs cannot have more than one consumer attached to a stream at a time

Why does partition count matter, and when is it decided?

Partitions set the maximum read parallelism, chosen up front — too few caps throughput regardless of consumer count
Partitions can be increased freely at any time on any tier with no effect on throughput
Partition count sets how long events are retained in the log before deletion
Partition count determines which version of the Kafka-compatible endpoint is exposed to producer and consumer clients

What does Event Hubs Capture provide?

Automatic archival of the incoming stream to Blob or Data Lake storage with no code
Exactly-once delivery of each event to a single designated consumer
A fully managed Apache Kafka cluster provisioned and running alongside the namespace
Real-time alerting that fires when stream contents match a rule
