Service 01

Azure Virtual Machines

Compute · IaaS

Azure Virtual Machines is infrastructure-as-a-service compute: you choose an OS image and a size, and Azure provides the hypervisor, the physical host, and the maintenance underneath. You own everything above the virtual hardware — the operating system, patching, the runtime, and how the application scales and stays available.

VMs are the most flexible and the least managed option in this chapter. They are the right choice for lift-and-shift migrations, software that expects a full operating system, GPU and HPC workloads, and anything that needs kernel-level control. The cost of that flexibility is operational: a VM that nobody patches, sizes, or spreads across zones is a liability, not an asset.

VM Series and Sizes

A size such as Standard_D4s_v5 encodes a family, a vCPU count, capability flags, and a hardware generation. The letter is the family: B for burstable, D for general-purpose, E for memory-optimized, F for compute-optimized, L for storage-optimized, and N for GPU. The number is the vCPU count, the s means the size supports Premium SSD, and v5 is the hardware generation.
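The naming scheme above can be sketched as a small parser. This is a simplified illustration, not an official grammar: the regular expression, the `VmSize` type, and `parse_vm_size` are hypothetical names, and real size names have variations (multi-letter families, extra feature letters, sizes without a version suffix) that this pattern does not handle.

```python
import re
from typing import NamedTuple

# Family letters exactly as listed in the text above.
FAMILIES = {
    "B": "burstable",
    "D": "general-purpose",
    "E": "memory-optimized",
    "F": "compute-optimized",
    "L": "storage",
    "N": "GPU",
}

# Simplified (assumed) shape: Standard_<family><vCPUs><feature letters>_v<generation>
SIZE_PATTERN = re.compile(r"^Standard_([A-Z])(\d+)([a-z]*)_v(\d+)$")


class VmSize(NamedTuple):
    family: str
    vcpus: int
    premium_ssd_capable: bool
    generation: int


def parse_vm_size(name: str) -> VmSize:
    """Split a size name into the parts the text describes."""
    match = SIZE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized size name: {name!r}")
    letter, vcpus, features, generation = match.groups()
    return VmSize(
        family=FAMILIES.get(letter, "unknown"),
        vcpus=int(vcpus),
        premium_ssd_capable="s" in features,  # the "s" flag from the text
        generation=int(generation),
    )


print(parse_vm_size("Standard_D4s_v5"))
```

For `Standard_D4s_v5` this reports a general-purpose family, 4 vCPUs, Premium SSD support, and generation 5, matching the breakdown in the paragraph.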

Match the family to the bottleneck. A cache server is memory-bound and belongs on an E-size; a batch transcoder is CPU-bound and belongs on an F-size; a dev box that idles most of the day fits a B-size, which banks CPU credits while idle and spends them under load. Picking the wrong family means paying for a resource the workload never touches.

Family	Optimized for	Typical workload
B	Burstable, credit-banking	Dev/test, low-traffic web
D	Balanced vCPU:memory	Web tier, app servers
E	Memory	Caches, in-memory databases
F	Compute	Batch, gaming, transcoding
N	GPU	Training, inference, rendering
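The credit banking in the B row can be illustrated with a toy model. The sketch below is not Azure's real accounting: the baseline, the credit cap, and the one-credit-per-percentage-point rule are hypothetical values chosen only to show the bank-then-spend behavior.

```python
def simulate_burstable(usage_pct, baseline_pct=20.0, max_credits=100.0):
    """Toy credit model: return (delivered CPU %, credit balance) per interval.

    All numbers are hypothetical; they are not Azure's real credit rates.
    """
    credits = 0.0
    timeline = []
    for wanted in usage_pct:
        if wanted <= baseline_pct:
            # Idle below baseline: bank the unused share as credits.
            delivered = wanted
            credits = min(max_credits, credits + (baseline_pct - wanted))
        else:
            # Burst above baseline: spend credits, throttle when they run out.
            burst = min(wanted - baseline_pct, credits)
            delivered = baseline_pct + burst
            credits -= burst
        timeline.append((delivered, credits))
    return timeline


# Four quiet intervals followed by two intervals of heavy demand.
demand = [5, 5, 5, 5, 90, 90]
for step, (delivered, credits) in enumerate(simulate_burstable(demand)):
    print(step, delivered, credits)
```

In this run the first burst is mostly served from banked credits, and the second falls back to the baseline once the balance is empty. That is why a B-size suits a dev box and not a sustained CPU-bound job.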

Disks and Images

Every VM boots from an OS disk and can attach data disks, all backed by Managed Disks (covered in Chapter 2). Disk tier — Standard HDD, Standard SSD, Premium SSD, Premium SSD v2, or Ultra Disk — sets IOPS and throughput, and it is the most common quiet performance bug: a database on Standard HDD will be slow no matter how large the VM.
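The "slow no matter how large the VM" point comes down to a minimum: delivered IOPS is capped by whichever of the VM and the disk is slower. A minimal sketch follows, in which every IOPS figure is a hypothetical placeholder and not a published limit for any real size or tier.

```python
def effective_iops(vm_iops_cap: int, disk_iops_cap: int) -> int:
    """The lower of the VM's cap and the disk's cap is what the workload gets."""
    return min(vm_iops_cap, disk_iops_cap)


# Hypothetical caps, for illustration only.
SMALL_VM, LARGE_VM = 6_000, 25_000
SLOW_DISK, FAST_DISK = 500, 5_000

print(effective_iops(SMALL_VM, SLOW_DISK))  # disk-bound
print(effective_iops(LARGE_VM, SLOW_DISK))  # still disk-bound after resizing the VM
print(effective_iops(SMALL_VM, FAST_DISK))  # changing the disk tier is what helps
```

Resizing the VM leaves the second result identical to the first, while changing the disk tier raises the third.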

An ephemeral OS disk lives on the host's local storage instead of remote Managed Disk storage. It adds no storage cost, reimages faster, and does not persist: a reimage or redeploy returns it to the base image, and a VM that uses one cannot be stop-deallocated. That makes it ideal for stateless, immutable fleets and wrong for anything that keeps state on the OS disk. Custom images and the Azure Compute Gallery let you bake a golden image once and roll it out across regions.

Availability

A single VM does have a single-instance SLA, but the number depends entirely on its disks: 99.9% with all-Premium SSD or Ultra disks, 99.5% with Standard SSD, and just 95% with Standard HDD. Real availability comes from spreading two or more VMs across an Availability Set (99.95%) or, better, across Availability Zones (99.99%). Availability is something you architect, not something one VM gives you.
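To make the SLA percentages concrete, the sketch below converts each figure quoted above into the downtime it permits per year. The function name and the 365-day year are illustrative choices.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

# SLA percentages as quoted in the text above.
SLAS = {
    "Single VM, Standard HDD": 95.0,
    "Single VM, Standard SSD": 99.5,
    "Single VM, Premium SSD or Ultra": 99.9,
    "Availability Set (2+ VMs)": 99.95,
    "Availability Zones (2+ VMs)": 99.99,
}


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Minutes per year a service may be down while still meeting the SLA."""
    return (100.0 - sla_percent) / 100.0 * MINUTES_PER_YEAR


for name, sla in SLAS.items():
    hours = allowed_downtime_minutes(sla) / 60
    print(f"{name}: {sla}% allows about {hours:.1f} hours/year")
```

The gap is large: 99.99% allows roughly 53 minutes a year, while 95% allows about 438 hours, more than 18 days.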

Availability Zones are physically separate locations within a region, each made up of one or more datacenters with independent power, cooling, and networking, so spreading across zones survives a datacenter failure. Availability Sets spread VMs across fault and update domains within a single datacenter, surviving rack-level hardware failure and host maintenance, but not a datacenter outage. Zones are the stronger guarantee where the region offers them.
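Spreading across zones can be sketched as a round-robin placement followed by a check of what survives one zone failing. The zone labels "1", "2", "3" and the VM names are assumptions for illustration; the helpers are hypothetical and do not call any Azure API.

```python
def assign_zones(vm_names, zones=("1", "2", "3")):
    """Place VMs across zones round-robin so no single zone holds them all."""
    return {name: zones[i % len(zones)] for i, name in enumerate(vm_names)}


def survivors(placement, failed_zone):
    """VMs still running if every VM in `failed_zone` goes down."""
    return [vm for vm, zone in placement.items() if zone != failed_zone]


placement = assign_zones(["web-0", "web-1", "web-2", "web-3"])
print(placement)
for zone in ("1", "2", "3"):
    print(f"zone {zone} fails, still up: {survivors(placement, zone)}")
```

With four VMs over three zones, any single zone failure leaves at least two VMs serving. The same four VMs in one datacenter would all fail together.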

Pricing Models

Pay-as-you-go bills per second with no commitment and is the most expensive per hour. A Reserved Instance commits to a specific VM family in a region for one or three years and cuts the rate by up to ~72%. A Savings Plan instead commits to an hourly dollar amount that applies across families and regions: more flexible, with a smaller maximum discount (up to ~65%).
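Whether a commitment pays off depends on how many hours the VM actually runs, because the committed rate is charged for every hour whether the VM is on or not. The sketch below works out the break-even point. The hourly rate is a hypothetical placeholder, the 730-hour month is a common approximation, and 0.72 is the best-case discount quoted above, not a typical one.

```python
HOURS_PER_MONTH = 730  # approximation: 8,760 hours / 12 months


def payg_monthly(rate_per_hour: float, hours_used: float) -> float:
    """Pay-as-you-go: billed only for the hours the VM runs."""
    return rate_per_hour * hours_used


def reserved_monthly(rate_per_hour: float, discount: float) -> float:
    """Reservation: the discounted rate is owed for every hour of the month."""
    return rate_per_hour * (1.0 - discount) * HOURS_PER_MONTH


def break_even_hours(discount: float) -> float:
    """Hours of use per month above which the reservation is cheaper."""
    return (1.0 - discount) * HOURS_PER_MONTH


rate = 0.20      # hypothetical pay-as-you-go $/hour
discount = 0.72  # best case from the text

print(f"reserved:  ${reserved_monthly(rate, discount):.2f}/month")
print(f"payg 24x7: ${payg_monthly(rate, HOURS_PER_MONTH):.2f}/month")
print(f"break-even at {break_even_hours(discount):.0f} hours/month")
```

At the maximum discount the reservation wins once the VM runs more than about 28% of the month. A smaller real-world discount pushes that threshold up, which is why the commitment suits steady workloads.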

Spot VMs use spare capacity at up to ~90% off, but Azure can evict them with 30 seconds' notice when it needs the capacity back. Spot is excellent for fault-tolerant batch and CI, and wrong for anything that cannot checkpoint and resume.
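"Checkpoint and resume" can be sketched as a loop that records progress after each item and stops cleanly when an eviction is signalled. Here `run_batch` and the `eviction_pending` hook are hypothetical. How the signal is obtained is left open; polling the instance's Scheduled Events metadata would be one option, which is an assumption and not something the text covers.

```python
import json
import os
import tempfile


def run_batch(items, process, checkpoint_path, eviction_pending):
    """Process items in order, saving progress so a later run can resume.

    Returns True when all items are done, False when stopped by an eviction.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for index in range(start, len(items)):
        if eviction_pending():
            return False  # progress up to `index` is already on disk
        process(items[index])
        # Write to a temp file and rename, so a crash never leaves a torn file.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)
    return True


# Simulated run: the third eviction check reports that the VM is being reclaimed.
done = []
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
signals = iter([False, False, True])
print(run_batch(list(range(5)), done.append, path, lambda: next(signals)), done)

# A replacement VM picks up from the checkpoint instead of starting over.
print(run_batch(list(range(5)), done.append, path, lambda: False), done)
```

For this to survive a real eviction, the checkpoint has to live somewhere that outlasts the VM, such as a data disk or remote storage. The temporary directory here is only for the demonstration.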

Identity and Access

Assign a managed identity to a VM so its applications authenticate to Key Vault, Storage, and other services through Entra ID with no credentials stored on disk. This is the single most effective way to keep secrets out of config files and environment variables. Network access should run through a bastion or a VPN, never an SSH or RDP port open to the public internet.
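What "no credentials stored on disk" looks like in practice: code on the VM asks the local instance metadata endpoint for a token instead of reading a secret. The sketch below uses only the standard library. The endpoint address, API version, and `Metadata: true` header are stated from memory of Azure's instance metadata documentation and should be treated as assumptions to verify; the function names are hypothetical. The request only succeeds from inside an Azure VM that has a managed identity assigned.

```python
import json
import urllib.parse
import urllib.request

# Assumed link-local metadata endpoint; not reachable from outside a VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource: str) -> urllib.request.Request:
    """Build the token request for the VM's managed identity (sends nothing)."""
    query = urllib.parse.urlencode(
        {"api-version": "2018-02-01", "resource": resource}
    )
    return urllib.request.Request(
        f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"}
    )


def fetch_access_token(resource: str, timeout: float = 5.0) -> str:
    """Return a bearer token for `resource`; only works on an Azure VM."""
    request = build_token_request(resource)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]


# Inspect the request without sending it; no secret appears anywhere in it.
request = build_token_request("https://vault.azure.net")
print(request.full_url)
```

Application code would more commonly use an SDK credential class that wraps this exchange. The raw request is shown only to make clear that nothing secret is configured on the VM.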

Availability Zones vs Availability Sets

Availability Zone — Physically separate datacenters in a region. Two or more VMs across zones earns a 99.99% SLA and survives a full datacenter failure. Choose zones whenever the region supports them.

Availability Set — Fault and update domains inside one datacenter. Two or more VMs earns a 99.95% SLA and survives rack failure and host maintenance, not a datacenter outage. The fallback in regions without zones.

Common Mistakes

Leaning on a lone VM's single-instance SLA for production — all-Premium disks earn only 99.9% (and Standard SSD just 99.5%, Standard HDD 95%); surviving a datacenter failure requires spreading across Availability Zones, not one VM.
Putting a latency-sensitive database on a Standard HDD OS or data disk — the VM size is irrelevant when the disk caps IOPS at a few hundred.
Leaving RDP (3389) or SSH (22) open to 0.0.0.0/0 — internet-facing management ports are scanned and brute-forced within minutes. Use Azure Bastion or a VPN.
Running Spot VMs for stateful or time-critical work — a 30-second eviction notice will drop the workload, and there is no negotiating it.
Oversizing the VM "to be safe" instead of right-sizing first and then buying a reservation: you pay the on-demand rate for capacity you never use.
Storing application secrets in environment variables or config when a managed identity would eliminate them entirely.
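The open management port mistake above is easy to check for mechanically. The sketch below scans a list of inbound rules for RDP or SSH allowed from anywhere. The rule dictionaries use a simplified, hypothetical shape and not the real network security group schema, and the check ignores port ranges and rule priority.

```python
MANAGEMENT_PORTS = {22, 3389}                    # SSH and RDP, as in the text
PUBLIC_SOURCES = {"0.0.0.0/0", "*", "Internet"}  # assumed "anywhere" spellings


def exposed_management_rules(rules):
    """Names of inbound allow rules that open SSH or RDP to the internet."""
    return [
        rule["name"]
        for rule in rules
        if rule["direction"] == "Inbound"
        and rule["access"] == "Allow"
        and rule["source"] in PUBLIC_SOURCES
        and rule["port"] in MANAGEMENT_PORTS
    ]


# Hypothetical rule set for illustration.
rules = [
    {"name": "allow-https", "direction": "Inbound", "access": "Allow",
     "source": "0.0.0.0/0", "port": 443},
    {"name": "allow-ssh-any", "direction": "Inbound", "access": "Allow",
     "source": "0.0.0.0/0", "port": 22},
    {"name": "allow-rdp-vpn", "direction": "Inbound", "access": "Allow",
     "source": "10.0.0.0/8", "port": 3389},
]

print(exposed_management_rules(rules))
```

Only the SSH rule is flagged: HTTPS is not a management port, and the RDP rule is limited to a private range.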

Best Practices

Spread production VMs across Availability Zones, and use Availability Sets only in regions without zones.
Match the VM family to the bottleneck — E-size for memory-bound, F-size for compute-bound, B-size for bursty idle workloads.
Use Premium SSD for any production workload with a latency requirement, and Ultra Disk only when you have measured the IOPS need.
Assign a managed identity to every VM and pull secrets from Key Vault — never store credentials on the disk.
Right-size first, then commit with a Reserved Instance or Savings Plan for steady workloads to cut the rate by up to ~72%.
Reach management ports through Azure Bastion; keep 22 and 3389 closed to the internet on the network security group.

Comparable services: AWS EC2, GCP Compute Engine

Knowledge Check

A single Azure VM is deployed with Premium SSD disks but is not placed in any Availability Zone or Set. What availability does it have?

A lower single-instance SLA, but it cannot survive a datacenter outage
The full 99.99% SLA, because Premium SSD disks guarantee zonal redundancy
No SLA whatsoever under any configuration
The same SLA as a two-VM zonal deployment

What does spreading VMs across Availability Zones protect against that an Availability Set does not?

A full datacenter failure — zones are physically separate datacenters
A single application crash on one VM
A region-wide outage across all datacenters
A noisy-neighbor performance problem caused by other tenants on shared host hardware

What is the defining trade-off of a Spot VM?

Up to ~90% cost savings in exchange for eviction with 30 seconds' notice
A guaranteed lower latency in exchange for a higher hourly rate
Free egress in exchange for being locked to one region
More vCPUs allocated for the same hourly price in exchange for losing GPU support

A workload is memory-bound. Which VM family fits best?

E-series (memory-optimized)
F-series (compute-optimized)
B-series (burstable)
N-series (GPU)
