AWS Lambda
AWS Lambda is the original serverless service: you upload a function, AWS runs it when a trigger fires, and you pay per millisecond of execution — nothing when idle. There are no servers to start, patch, or right-size. Launched in 2014, it has become the glue of AWS-native systems, appearing in APIs, data pipelines, file processing, and scheduled jobs.
How Lambda Works
A function has three parts: the code (a handler AWS calls), the trigger (the event that starts it), and the configuration (memory, timeout, IAM role, environment variables). When the trigger fires, AWS spins up an isolated environment, loads the code, and calls the handler with the event.
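A minimal Python handler shows the code part. The trigger and configuration live outside the file. The name lambda_handler matches the console's default Python handler setting (lambda_function.lambda_handler). The {"name": ...} event is a hypothetical shape chosen for illustration, because real event shapes depend on the trigger.

```python
import json


def lambda_handler(event, context):
    """Called by Lambda once per event; `context` carries runtime metadata."""
    # Hypothetical event shape: {"name": "..."}; real events depend on the trigger.
    name = event.get("name", "world")
    # For synchronous invocations, the returned dict becomes the response payload.
    return {"message": f"Hello, {name}!"}


if __name__ == "__main__":
    # Local smoke test: call the handler directly; context is unused here, so pass None.
    print(json.dumps(lambda_handler({"name": "Lambda"}, None)))
```

Calling the handler directly with a plain dict is a cheap way to test the logic before deploying.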
Managed runtimes cover Python, Node.js, Java, .NET, and Ruby. Go now runs on the OS-only provided.al2023 runtime: you compile a static binary named bootstrap and deploy it. The same OS-only runtime hosts Rust or C++ binaries, and any function can instead be packaged as a container image of up to 10 GB.
The hard limit that surprises people most is the 15-minute maximum execution time. Memory ranges from 128 MB to 10 GB, and the default account concurrency is 1,000 simultaneous executions per Region (a soft limit you can raise).
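The concurrency figure is easier to reason about with the usual steady-state estimate (Little's law): executions in flight equal requests per second times average duration. The sketch below checks a planned configuration against the limits quoted above. The function names are hypothetical, and the check uses the default quota rather than your account's actual value.

```python
# Limits quoted above: concurrency is a per-Region soft quota, the others are hard caps.
MAX_TIMEOUT_S = 15 * 60
MIN_MEMORY_MB, MAX_MEMORY_MB = 128, 10_240
DEFAULT_CONCURRENCY_QUOTA = 1_000


def estimate_concurrency(requests_per_second: float, avg_duration_s: float) -> float:
    """Little's law: executions in flight = arrival rate x average time each one runs."""
    return requests_per_second * avg_duration_s


def check_plan(memory_mb: int, timeout_s: int,
               requests_per_second: float, avg_duration_s: float) -> list[str]:
    """Return human-readable problems with a planned function configuration."""
    problems = []
    if not MIN_MEMORY_MB <= memory_mb <= MAX_MEMORY_MB:
        problems.append(f"memory {memory_mb} MB is outside {MIN_MEMORY_MB}-{MAX_MEMORY_MB} MB")
    if timeout_s > MAX_TIMEOUT_S:
        problems.append(f"timeout {timeout_s} s exceeds the {MAX_TIMEOUT_S} s cap")
    needed = estimate_concurrency(requests_per_second, avg_duration_s)
    if needed > DEFAULT_CONCURRENCY_QUOTA:
        problems.append(f"needs about {needed:.0f} concurrent executions; "
                        f"default quota is {DEFAULT_CONCURRENCY_QUOTA}")
    return problems


if __name__ == "__main__":
    # An API at 2,000 requests/s averaging 0.8 s keeps about 1,600 executions in flight.
    for problem in check_plan(512, 30, 2_000, 0.8):
        print(problem)
```

The same arithmetic explains why a modest request rate can still hit the quota when each invocation is slow.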
Triggers and Event Sources
Lambda is driven by events from across AWS: API Gateway for HTTP APIs, S3 for object events, DynamoDB Streams for row changes, SQS for queued batches, SNS for notifications, EventBridge for schedules and service events, Step Functions for workflow steps, and Application Load Balancer for direct HTTP. Each trigger delivers a JSON event your handler reads and acts on.
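Because each trigger delivers a differently shaped JSON event, the top-level fields usually reveal the source. The sketch below is a heuristic based on the field names in AWS's sample events: S3, SQS, and DynamoDB Streams records carry eventSource, SNS records carry EventSource, and EventBridge events carry source and detail-type. It is not an official API. Production functions normally know their trigger from configuration, and Step Functions passes arbitrary state input that has no fixed shape.

```python
def classify_event(event: dict) -> str:
    """Best-effort guess at which trigger produced `event`, from its top-level shape."""
    records = event.get("Records")
    if records:
        first = records[0]
        # S3, SQS and DynamoDB Streams records use "eventSource"; SNS uses "EventSource".
        source = first.get("eventSource") or first.get("EventSource")
        return {
            "aws:s3": "s3",
            "aws:sqs": "sqs",
            "aws:dynamodb": "dynamodb",
            "aws:sns": "sns",
        }.get(source, "unknown-records")
    if "detail-type" in event and "source" in event:
        return "eventbridge"
    request_context = event.get("requestContext") or {}
    if "elb" in request_context:  # ALB events also carry httpMethod, so test this first
        return "alb"
    if "httpMethod" in event or "http" in request_context:  # REST (v1) or HTTP API (v2) payloads
        return "api-gateway"
    return "unknown"  # e.g. Step Functions, which passes whatever input the state supplies
```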
Cold Starts and Concurrency
When no idle execution environment is available, Lambda creates a new one, and that setup delay is a cold start: typically 100 ms to a few seconds depending on runtime and package size. Lambda then keeps the environment around for reuse, usually for several minutes (AWS does not guarantee how long), so subsequent warm starts are far faster.
Three consequences follow: initialize database connections and SDK clients outside the handler so warm invocations reuse them; never rely on global variables to share state across invocations, since concurrent invocations run in separate environments and any environment can be recycled; and treat /tmp as local to one environment, not shared storage.
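The init-outside-the-handler pattern looks like this in Python. ExpensiveClient is a hypothetical stand-in for an SDK client or database connection. The module-level flag shows that globals persist across warm invocations in the same environment, but only there: a concurrent environment has its own copy.

```python
import time


class ExpensiveClient:
    """Hypothetical stand-in for an SDK client or database connection."""
    created = 0

    def __init__(self):
        ExpensiveClient.created += 1
        time.sleep(0.05)  # simulate connection setup cost

    def get(self, key):
        return f"value-for-{key}"


# Module scope runs once per execution environment, during its cold start.
CLIENT = ExpensiveClient()
_first_invocation = True


def handler(event, context):
    global _first_invocation
    # Per-environment flag: concurrent environments each have their own copy.
    was_first, _first_invocation = _first_invocation, False
    # Reuse the module-level client instead of reconnecting on every call.
    return {"first_in_environment": was_first, "value": CLIENT.get(event["key"])}


if __name__ == "__main__":
    print(handler({"key": "a"}, None))  # first call in this process
    print(handler({"key": "b"}, None))  # warm call: client reused
```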
For latency-sensitive APIs, Provisioned Concurrency keeps a set number of environments warm so cold starts disappear for that portion of traffic — at extra cost.
Pricing
Billing has two parts: about USD 0.20 per million requests, plus duration measured in GB-seconds (more memory means more cost per second, and memory and CPU scale together). The Free Tier covers 1 million requests and 400,000 GB-seconds per month, indefinitely.
A function invoked 5 million times a month at 256 MB and 200 ms uses 250,000 GB-seconds and costs roughly USD 5 before the Free Tier (about USD 0.80 after it), far cheaper than an always-on instance. But the economics invert at high, steady traffic: a function running all day at high concurrency can cost more than EC2 or Fargate.
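The arithmetic can be reproduced in a few lines. The prices are assumptions: the x86, us-east-1, first-tier rates (USD 0.20 per million requests and USD 0.0000166667 per GB-second). Check the current pricing page for your Region and architecture. The always-busy helper models the steady-traffic case, where duration charges accrue around the clock.

```python
# Assumed prices: x86, us-east-1, first duration tier. Check the current pricing page.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667
FREE_REQUESTS = 1_000_000    # per month
FREE_GB_SECONDS = 400_000    # per month


def monthly_cost(invocations: int, memory_mb: int, avg_duration_ms: float,
                 apply_free_tier: bool = False) -> float:
    """Request charge plus duration charge (GB-seconds) for one month, in USD."""
    gb_seconds = invocations * (memory_mb / 1024) * (avg_duration_ms / 1000)
    requests = invocations
    if apply_free_tier:
        requests = max(0, requests - FREE_REQUESTS)
        gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    return requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS + gb_seconds * PRICE_PER_GB_SECOND


def always_busy_duration_cost(concurrency: int, memory_mb: int, hours: float = 730) -> float:
    """Duration charge when `concurrency` environments stay busy for the whole month."""
    gb_seconds = concurrency * (memory_mb / 1024) * hours * 3600
    return gb_seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    print(f"{monthly_cost(5_000_000, 256, 200):.2f}")                        # 5.17
    print(f"{monthly_cost(5_000_000, 256, 200, apply_free_tier=True):.2f}")  # 0.80
    print(f"{always_busy_duration_cost(50, 1024):,.0f}")                     # 2,190
```

Fifty 1 GB environments kept busy for a 730-hour month come to roughly USD 2,190 in duration alone. That figure is the one to compare against a flat monthly instance or Fargate price for your Region.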
Lambda vs Fargate vs EC2
- Lambda: short, event-driven work that finishes within 15 minutes, with no idle cost. The default for glue, APIs, and reactive processing.
- Fargate: containerized services that run continuously or exceed Lambda's limits, without managing instances.
- EC2: long-running, stateful, or high-steady-throughput workloads where a flat-rate instance is cheaper than per-invocation billing.
Common Mistakes
- Using Lambda for work that can exceed 15 minutes: it will be killed mid-run. Use ECS, AWS Batch, or break the work into smaller steps with Step Functions.
- Opening a database connection inside the handler: every invocation reconnects, adding latency and, under load, exhausting the database's connection limit. Initialize clients outside the handler.
- Storing secrets as plaintext environment variables — read them from Secrets Manager or Parameter Store instead.
- Setting the timeout to the 15-minute maximum 'to be safe' — a long timeout hides hung calls and inflates cost. Set it just above expected duration.
- Running very high, steady traffic on Lambda and being surprised by the bill — at constant high concurrency, Fargate or EC2 is often cheaper.
- Expecting state to persist in a global variable between invocations — concurrent invocations each get a separate environment with no shared memory.
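A related defense against both the 15-minute cap and an overly tight timeout is to check the remaining time while working and hand back unfinished items before Lambda kills the run. context.get_remaining_time_in_millis() is a real method on the Python context object. SAFETY_MARGIN_MS, process, and FakeContext are hypothetical names used only to illustrate the pattern and run it locally.

```python
SAFETY_MARGIN_MS = 10_000  # assumed: stop this long before the configured timeout


def process(item):
    """Hypothetical per-item work."""
    return item * 2


def handler(event, context):
    items = event["items"]
    results = []
    for index, item in enumerate(items):
        # The Lambda context object exposes the time left before the timeout.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Return the unfinished tail so a caller (e.g. a Step Functions loop) can resume it.
            return {"results": results, "remaining": items[index:]}
        results.append(process(item))
    return {"results": results, "remaining": []}


class FakeContext:
    """Local stand-in whose clock drops by a fixed amount on every check."""

    def __init__(self, budget_ms, cost_per_check_ms):
        self.budget_ms = budget_ms
        self.cost_per_check_ms = cost_per_check_ms

    def get_remaining_time_in_millis(self):
        self.budget_ms -= self.cost_per_check_ms
        return self.budget_ms


if __name__ == "__main__":
    out = handler({"items": [1, 2, 3, 4, 5, 6]}, FakeContext(30_000, 5_000))
    print(out)  # {'results': [2, 4, 6, 8], 'remaining': [5, 6]}
```

Returning the remainder instead of raising lets an orchestrator such as Step Functions re-invoke the function with the leftover items.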
Best Practices
- Keep each function to one job: small functions are easier to test and faster to deploy.
- Initialize connections and SDK clients outside the handler so warm starts reuse them.
- Store configuration in environment variables and secrets in Secrets Manager or Parameter Store.
- Right-size memory by testing a few settings — more memory adds CPU and can lower total cost by cutting duration.
- Set Provisioned Concurrency on any synchronous API with a latency SLO.
- Emit structured logs to CloudWatch and enable X-Ray tracing for every function.
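For the structured-logging practice, printing one JSON object per line is enough, because Lambda forwards a function's stdout to CloudWatch Logs. This sketch uses only the standard library. The field names (level, request_id, order_id) are illustrative choices, not a required schema. aws_request_id is a real attribute of the Python context object. X-Ray tracing is a function configuration setting and is not shown.

```python
import json
import time


def log(level: str, message: str, **fields) -> str:
    """Print one JSON object per line; Lambda sends function stdout to CloudWatch Logs."""
    record = {"level": level, "message": message, "timestamp": time.time(), **fields}
    line = json.dumps(record, default=str)  # default=str keeps odd values serializable
    print(line)
    return line


def handler(event, context):
    # aws_request_id is set on the real context; fall back to "local" when testing.
    request_id = getattr(context, "aws_request_id", "local")
    log("INFO", "order received", request_id=request_id, order_id=event.get("order_id"))
    return {"ok": True}
```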
Knowledge Check
A nightly job sometimes takes 40 minutes to finish. Why is Lambda the wrong tool?
- Lambda enforces a hard 15-minute maximum execution time; the job will be killed mid-run
- Lambda cannot be triggered on a schedule, so a nightly job has no way to start itself
- Lambda cannot access S3 or DynamoDB, so the job has no place to read its input or write results
- Lambda functions cannot exceed 128 MB of memory, so the job runs out of room and crashes partway
Why should database connections and SDK clients be created outside the handler function?
- Code outside the handler runs once per environment, so warm invocations reuse the connection instead of reopening one each time
- Code placed outside the handler runs with elevated IAM permissions beyond what the function's execution role grants the handler body
- AWS bills only for code inside the handler, so anything in the init section runs for free
- Connections created inside the handler are encrypted in transit, while those opened outside it are not
What does Provisioned Concurrency solve?
- It keeps a set number of environments warm so cold starts disappear for that portion of traffic, at extra cost
- It raises the hard 15-minute timeout so long-running jobs can keep executing past the cap
- It removes the per-request charge so high-volume traffic stops adding to the bill
- It lets multiple concurrent invocations safely share global in-memory state across the same execution environment
At what point does Lambda tend to become more expensive than Fargate or EC2?
- At high, steady traffic running all day at high concurrency, where per-invocation billing exceeds a flat-rate instance
- At any traffic level above the monthly Free Tier, which is the point where the per-invocation charge first starts to add up
- Whenever the function is configured with more than 128 MB of memory, raising its GB-second rate
- Only when using Provisioned Concurrency to keep environments pre-warmed around the clock