Service 02

AWS Lambda

Serverless · FaaS · Compute

AWS Lambda is the original serverless service: you upload a function, AWS runs it when a trigger fires, and you pay per millisecond of execution — nothing when idle. There are no servers to start, patch, or right-size. Launched in 2014, it has become the glue of AWS-native systems, appearing in APIs, data pipelines, file processing, and scheduled jobs.

How Lambda Works

A function has three parts: the code (a handler AWS calls), the trigger (the event that starts it), and the configuration (memory, timeout, IAM role, environment variables). When the trigger fires, AWS spins up an isolated environment, loads the code, and calls the handler with the event.
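As a rough illustration of the "code" part, here is a minimal sketch of a Python handler. The function name `handler` and the `name` field in the event are assumptions chosen for the example; the real event shape depends on which trigger invokes the function.

```python
import json


def handler(event, context):
    """Entry point that Lambda calls with the trigger's event and a context object."""
    # "name" is a hypothetical field; real events follow the trigger's own schema.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }


# Local smoke test. In Lambda, the service supplies both arguments.
if __name__ == "__main__":
    print(handler({"name": "Lambda"}, None))
```

The trigger and the configuration (memory, timeout, IAM role, environment variables) are set on the function itself, not in this file.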

Managed runtimes cover Python, Node.js, Java, .NET, and Ruby. Go now runs on the OS-only provided.al2023 custom runtime — you compile a static binary and deploy it — and the same path supports Rust, C++, or a container image up to 10 GB.

The hard limit that surprises people most is the 15-minute maximum execution time. Memory ranges from 128 MB to 10 GB, and the default account concurrency is 1,000 simultaneous executions per Region (a soft limit you can raise).

Triggers and Event Sources

Lambda is driven by events from across AWS: API Gateway for HTTP APIs, S3 for object events, DynamoDB Streams for item-level changes, SQS for queued batches, SNS for notifications, EventBridge for schedules and service events, Step Functions for workflow steps, and Application Load Balancer for direct HTTP. Each trigger delivers a JSON event your handler reads and acts on.
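To make "reads and acts on" concrete, the sketch below pulls the bucket and object key out of an S3 object event. The `sample_event` is a trimmed, made-up payload that keeps only the fields the code touches; a real S3 notification carries many more.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return (bucket, key) pairs for each record in an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical, heavily trimmed S3 event used only for local testing.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/2024+q1.csv"},
            }
        }
    ]
}
```

Other triggers follow the same pattern with different shapes, so each handler is written against the schema of the one event source that invokes it.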

Event-Driven Execution — Trigger to Downstream

Trigger (API Gateway · S3 · DynamoDB Streams · SQS · EventBridge) → Lambda function (AWS spins up an environment and runs your handler) → Downstream (DynamoDB, S3, other services; pay per ms, nothing when idle)

The first invocation pays a cold start while the environment is created; the environment then stays warm for later calls. Connections and SDK clients created outside the handler are reused on warm starts.

Cold Starts and Concurrency

The first event creates a new execution environment — the delay is a cold start, typically 100 ms to a few seconds depending on runtime and package size. The environment is then kept warm for minutes, so subsequent warm starts are far faster.

Three consequences follow: initialize database connections and SDK clients outside the handler so warm invocations reuse them; never rely on globals to carry state between invocations, since concurrent invocations run in separate environments and any environment can be recycled without notice; and treat /tmp as local to one environment, not shared storage.

For latency-sensitive APIs, Provisioned Concurrency keeps a set number of environments warm so cold starts disappear for that portion of traffic — at extra cost.

Pricing

Billing has two parts: about USD 0.20 per million requests, plus duration measured in GB-seconds (more memory means more cost per second, and memory and CPU scale together). The Free Tier covers 1 million requests and 400,000 GB-seconds per month, indefinitely.

A function invoked 5 million times a month at 256 MB and 200 ms costs roughly USD 5 total — cheaper than an always-on instance. But the economics invert at high, steady traffic: a function running all day at high concurrency can cost more than EC2 or Fargate.
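The arithmetic behind that estimate can be sketched as below. The request price comes from the text above; the GB-second price of USD 0.0000166667 is an assumption (an x86 on-demand rate that varies by Region and architecture), so check current pricing before relying on it. The sketch ignores the Free Tier.

```python
REQUEST_PRICE_PER_MILLION = 0.20   # USD, from the text above
GB_SECOND_PRICE = 0.0000166667     # USD, assumed x86 rate; verify against current pricing


def monthly_cost(invocations, memory_mb, duration_ms):
    """Return (request_cost, gb_seconds, total_cost) in USD, ignoring the Free Tier."""
    request_cost = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    # GB-seconds = invocations x seconds per invocation x memory in GB
    gb_seconds = invocations * (duration_ms / 1000) * (memory_mb / 1024)
    total_cost = request_cost + gb_seconds * GB_SECOND_PRICE
    return request_cost, gb_seconds, total_cost


request_cost, gb_seconds, total = monthly_cost(5_000_000, 256, 200)
print(f"{gb_seconds:,.0f} GB-seconds, USD {total:.2f} before Free Tier")
```

Under these assumptions the workload uses 250,000 GB-seconds, which costs about USD 4.17 in duration plus USD 1.00 in requests. The Free Tier allowance would reduce the bill further.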

Lambda vs Fargate vs EC2

Lambda — short, event-driven, sub-15-minute work with no idle cost. The default for glue, APIs, and reactive processing.

Fargate — containerized services that run continuously or exceed Lambda's limits, without managing instances.

EC2 — long-running, stateful, or high-steady-throughput workloads where a flat-rate instance is cheaper than per-invocation billing.

Common Mistakes

Using Lambda for work that can exceed 15 minutes — it will be killed mid-run. Use ECS, Batch, or break it up with Step Functions.
Opening a database connection inside the handler — every invocation reconnects, exhausting the database's connection pool under load. Initialize clients outside the handler.
Storing secrets as plaintext environment variables — read them from Secrets Manager or Parameter Store instead.
Setting the timeout to the 15-minute maximum 'to be safe' — a long timeout hides hung calls and inflates cost. Set it just above expected duration.
Running very high, steady traffic on Lambda and being surprised by the bill — at constant high concurrency, Fargate or EC2 is often cheaper.
Expecting state to persist in a global variable between invocations — concurrent invocations each get a separate environment with no shared memory.

Best Practices

Keep each function to one job — small functions are easier to test and faster to deploy.
Initialize connections and SDK clients outside the handler so warm starts reuse them.
Store configuration in environment variables and secrets in Secrets Manager or Parameter Store.
Right-size memory by testing a few settings — more memory adds CPU and can lower total cost by cutting duration.
Set Provisioned Concurrency on any synchronous API with a latency SLO.
Emit structured logs to CloudWatch and enable X-Ray tracing for every function.
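As one possible shape for structured logs, the sketch below prints a single JSON object per line; Lambda forwards standard output to CloudWatch Logs, where JSON fields can be queried. The helper name `log_event` and the `order_id` field are assumptions made for illustration, and X-Ray tracing is enabled separately in the function configuration.

```python
import json
import time


def log_event(level, message, **fields):
    """Print one JSON object per line and return it (handy for tests)."""
    record = {"level": level, "message": message, "timestamp": time.time(), **fields}
    line = json.dumps(record)
    print(line)
    return line


def handler(event, context):
    start = time.perf_counter()
    # ... the function's real work would go here ...
    duration_ms = round((time.perf_counter() - start) * 1000, 2)
    # "order_id" is a hypothetical event field used to show a searchable key.
    log_event("INFO", "request handled",
              order_id=event.get("order_id"), duration_ms=duration_ms)
    return {"ok": True}
```

Keeping every log line as one JSON object means fields such as `level` or `order_id` can be filtered on directly instead of being parsed out of free text.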

Comparable services: GCP (Cloud Functions, Cloud Run) · Azure (Azure Functions)

Knowledge Check

A nightly job sometimes takes 40 minutes to finish. Why is Lambda the wrong tool?

Lambda enforces a hard 15-minute maximum execution time; the job will be killed mid-run
Lambda cannot be triggered on a schedule, so a nightly job has no way to start itself
Lambda cannot access S3 or DynamoDB, so the job has no place to read its input or write results
Lambda functions cannot exceed 128 MB of memory, so the job runs out of room and crashes partway

Why should database connections and SDK clients be created outside the handler function?

Code outside the handler runs once per environment, so warm invocations reuse the connection instead of reopening one each time
Code placed outside the handler runs with elevated IAM permissions beyond what the function's execution role grants the handler body
AWS bills only for code inside the handler, so anything in the init section runs for free
Connections created inside the handler are encrypted in transit, while those opened outside it are not

What does Provisioned Concurrency solve?

It keeps a set number of environments warm so cold starts disappear for that portion of traffic, at extra cost
It raises the hard 15-minute timeout so long-running jobs can keep executing past the cap
It removes the per-request charge so high-volume traffic stops adding to the bill
It lets multiple concurrent invocations safely share global in-memory state across the same execution environment

At what point does Lambda tend to become more expensive than Fargate or EC2?

At high, steady traffic running all day at high concurrency, where per-invocation billing exceeds a flat-rate instance
At any traffic level above the monthly Free Tier, which is the point where the per-invocation charge first starts to add up
Whenever the function is configured with more than 128 MB of memory, raising its GB-second rate
Only when using Provisioned Concurrency to keep environments pre-warmed around the clock
