Web SaaS Platform

The first case study is the most common one: a customer-facing SaaS — a multi-service web application serving many tenants over the internet, with the usual demands of availability, predictable releases, and elastic capacity. It assembles the stateless-web pattern, ingress, autoscaling, managed data, and GitOps into one architecture.

Nothing here is new. The value lies in seeing how the earlier pieces fit together, and in the decisions and trade-offs that shape a real system rather than a diagram.

Requirements and Traffic Shape

The workload is HTTP, bursty (daytime peaks, marketing spikes), latency-sensitive, and multi-tenant. It must deploy frequently without downtime, scale with traffic, isolate tenants reasonably, and survive a zone failure. These requirements, not technology preference, drive every choice that follows — the architecture is a response to them.
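The "survive a zone failure" requirement has a concrete sizing consequence: losing a zone removes that zone's share of replicas, so the fleet has to be sized for peak load on the zones that remain. A minimal sketch, assuming replicas are spread as evenly as possible (max skew of 1) and each pod handles a fixed request rate; the function name and the numbers are illustrative, not part of the original design.

```python
import math


def replicas_for_zone_failure(peak_rps: float, rps_per_pod: float, zones: int) -> int:
    """Smallest replica count that still serves peak_rps after one zone is lost.

    Assumes replicas are spread as evenly as possible (max skew 1), so the
    largest zone holds ceil(n / zones) pods and a zone outage removes that many.
    """
    if zones < 2:
        raise ValueError("surviving a zone failure needs at least two zones")
    # Pods that must still be running after the outage to carry the peak.
    needed = math.ceil(peak_rps / rps_per_pod)
    n = needed
    # Grow the fleet until the pods left after losing the largest zone suffice.
    while n - math.ceil(n / zones) < needed:
        n += 1
    return n


print(replicas_for_zone_failure(900, 100, 3))  # 14 (zones of 5, 5, 4)
```

With a hypothetical 900 req/s peak and 100 req/s per pod, nine pods cover the peak, but 14 are needed across three zones so that losing a zone of five still leaves nine.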

The Architecture

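The shape is the stateless-web pattern at scale. An Ingress/Gateway terminates TLS and routes to one Deployment per service (frontend, API, a few backend services), each fronted by a Service and scaled by an HPA on request rate (exposed as a custom or external metric through a metrics adapter, since the built-in resource metrics cover only CPU and memory). The primary database is a managed service outside the cluster, with a managed cache alongside, so the cluster itself stays stateless. Topology spread constraints place replicas across zones so the system survives a zone failure, and PodDisruptionBudgets (PDBs) keep enough replicas running during voluntary disruptions such as node drains and upgrades.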
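Scaling on request rate follows the same proportional rule the HPA controller applies to any metric: desired replicas = ceil(current replicas × current value / target value), skipped when the ratio is within a tolerance (0.1 by default) and clamped to the min/max bounds. A simplified sketch, assuming an average requests-per-pod metric is already available; the function name is hypothetical, and the sketch omits the controller's stabilization windows and its handling of pods that are not ready or lack metrics.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_rps_per_pod: float,
    target_rps_per_pod: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA scaling rule for an average requests-per-pod target.

    Omits the real controller's stabilization windows and its special
    handling of pods that are not yet ready or are missing metrics.
    """
    ratio = current_rps_per_pod / target_rps_per_pod
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the HPA's minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, 150, 100, min_replicas=3, max_replicas=20))  # 6
```

Four pods each running 50% over a 100 req/s target produce a request for 50% more pods, six in total; a marketing spike that pushes the ratio far higher is still capped at maxReplicas.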

Request path

Gateway / Ingress (TLS) → Service → frontend / API Deployments (HPA) → managed database + cache (outside the cluster)
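The Gateway step in this path chooses a backend Service by request path. A simplified sketch of longest-prefix matching, comparing whole path segments the way Ingress `Prefix` and Gateway API `PathPrefix` matches do; the route table and Service names are hypothetical, and real controllers also match on hostname, headers, and exact-path rules.

```python
from typing import Dict, List, Optional


def _segments(path: str) -> List[str]:
    # "/api/users/" -> ["api", "users"]; matching is per path element.
    return [s for s in path.split("/") if s]


def route(path: str, routes: Dict[str, str]) -> Optional[str]:
    """Pick the backend Service whose prefix matches the most path segments."""
    request = _segments(path)
    best_service, best_len = None, -1
    for prefix, service in routes.items():
        parts = _segments(prefix)
        # "/api" matches "/api/users" but not "/apix": compare whole segments.
        if request[: len(parts)] == parts and len(parts) > best_len:
            best_service, best_len = service, len(parts)
    return best_service


# Hypothetical route table for the frontend / API split described above.
ROUTES = {"/": "frontend", "/api": "api", "/api/billing": "billing"}
print(route("/api/users/7", ROUTES))  # api
print(route("/apix", ROUTES))         # frontend
```

One Gateway holding a table like this replaces a separate LoadBalancer per service: every public path enters through a single address and fans out to in-cluster Services.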

Tenancy, Config, and Release

Tenancy here is soft and application-level: one shared deployment serves all tenants, with isolation enforced in the data layer, and namespaces separate environments rather than tenants. Configuration and per-environment values come from ConfigMaps and Secrets, with secret values sourced from an external store. Releases run through GitOps with canary rollouts: a new version takes a slice of traffic, its metrics are watched, and it is ramped up or rolled back automatically. Readiness probes keep traffic away from new pods until they can serve requests, which is what makes each rollout zero-downtime.
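The canary release described above reduces to a traffic ramp with an automated abort. A minimal sketch, assuming a fixed schedule of canary weights (5/25/50/100 percent), a single error-rate metric, and a 1% threshold, all illustrative; `error_rate_at` is a hypothetical stand-in for a metrics query, and in practice a rollout controller driven by GitOps runs this loop rather than application code.

```python
from typing import Callable, List, Tuple


def run_canary(
    steps: List[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> Tuple[str, List[int]]:
    """Ramp canary traffic through `steps` (percent), rolling back on a bad reading.

    `error_rate_at(weight)` stands in for querying metrics after the canary
    has served `weight` percent of traffic for a while.
    Returns the outcome and the sequence of canary weights applied.
    """
    applied: List[int] = []
    for weight in steps:
        applied.append(weight)
        if error_rate_at(weight) > max_error_rate:
            applied.append(0)  # send all traffic back to the stable version
            return "rolled_back", applied
    return "promoted", applied


print(run_canary([5, 25, 50, 100], lambda w: 0.002))
# ('promoted', [5, 25, 50, 100])
print(run_canary([5, 25, 50, 100], lambda w: 0.05 if w >= 25 else 0.001))
# ('rolled_back', [5, 25, 0])
```

Starting with a small slice bounds the blast radius: a bad build that fails at 25% has touched only a quarter of traffic before the weight drops back to zero.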

Alternatives Considered

Two roads were not taken. Running the database in-cluster (a StatefulSet or operator) was rejected — the operational cost of self-running the primary datastore outweighed the marginal control, and a managed service is the safer default. Skipping Kubernetes entirely for a serverless-container platform (Cloud Run / ECS) was viable and simpler for a smaller system, but the team valued portability, the ecosystem, and consistent tooling across many services. The honest note: at low scale, the serverless-container option would have been less to operate.

Kubernetes SaaS vs serverless-container platform

Kubernetes — portable, rich ecosystem, consistent across many services — at the cost of operating a cluster.

Serverless containers (Cloud Run/ECS) — less to operate and often simpler at small scale — at the cost of portability and ecosystem depth.

Common Mistakes

Running the primary database in-cluster by default instead of using a managed service.
Deploying without readiness probes or PDBs, so releases and zone events cause downtime.
Spreading replicas across nodes but not zones, leaving a zone failure fatal.
Per-service LoadBalancers instead of one Gateway/Ingress, multiplying cost and complexity.
Over-engineering hard tenancy when application-level isolation met the requirement.
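The PDB half of the second mistake is plain arithmetic: a voluntary eviction is allowed only while the number of healthy pods stays at or above the budget, and a pod counts as healthy only once it is Ready, which is why PDBs and readiness probes work together. A minimal sketch, assuming an integer `minAvailable` (percentage values are not modeled); the function names are hypothetical.

```python
def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """Voluntary evictions the budget permits right now (never negative)."""
    return max(0, current_healthy - min_available)


def evictable_now(pods_on_node: int, current_healthy: int, min_available: int) -> int:
    """How many of the app's pods on a draining node may be evicted immediately.

    The rest must wait until replacement pods elsewhere report Ready.
    """
    return min(pods_on_node, disruptions_allowed(current_healthy, min_available))


# Six healthy API pods, minAvailable 4, three of them on the node being drained.
print(evictable_now(3, current_healthy=6, min_available=4))  # 2
```

Without a budget, the drain would evict all three pods at once; with it, the third eviction waits until a replacement becomes Ready.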

Best Practices

Keep the cluster stateless; put the primary datastore and cache on managed services.
Front services with one Gateway/Ingress; scale on request rate with HPAs.
Spread across zones with topology spread + PDBs for zone-failure survival and safe rollouts.
Release through GitOps with canary and automated rollback; rely on readiness probes.
Right-size tenancy to the requirement — application-level isolation is often enough.
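Application-level isolation usually means every data access is scoped to the caller's tenant. A minimal sketch using Python's built-in `sqlite3` as a stand-in for the managed database; the `projects` table, its `tenant_id` column, and the `TenantRepo` class are hypothetical. The design choice is that tenant scoping lives in one place, so no request handler can forget the filter.

```python
import sqlite3
from typing import List


class TenantRepo:
    """Data access bound to one tenant; every query filters on tenant_id."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def add_project(self, name: str) -> None:
        # The tenant comes from the repo, never from the caller's input.
        self._conn.execute(
            "INSERT INTO projects (tenant_id, name) VALUES (?, ?)",
            (self._tenant_id, name),
        )

    def list_projects(self) -> List[str]:
        rows = self._conn.execute(
            "SELECT name FROM projects WHERE tenant_id = ? ORDER BY name",
            (self._tenant_id,),
        ).fetchall()
        return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
acme, globex = TenantRepo(conn, "acme"), TenantRepo(conn, "globex")
acme.add_project("roadmap")
globex.add_project("payroll")
print(acme.list_projects())  # ['roadmap']
```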

Alternatives

Serverless containers: Cloud Run / ECS as the without-Kubernetes option
PaaS: even higher-level, less control
Self-run DB on Kubernetes: the rejected stateful-in-cluster path

Knowledge Check

Why keep the primary database outside the cluster in this SaaS design?

The operational cost of self-running the datastore outweighs the marginal control; managed is safer
Kubernetes simply cannot run a stateful relational database inside the cluster at all, even with StatefulSets and operators
StatefulSets cannot attach any durable persistent storage to a replica through a PersistentVolumeClaim
A managed database service is always strictly cheaper per gigabyte than self-running one

What makes the SaaS releases zero-downtime?

Readiness probes plus canary rollouts via GitOps with automated rollback
Using the Recreate deployment strategy for every release
Running exactly one replica per service so there is nothing to drain
Disabling the HorizontalPodAutoscaler during every deploy

When might skipping Kubernetes for a serverless-container platform have been better?

At small scale, where it is simpler to operate, trading away portability and ecosystem
Never — running a full Kubernetes cluster is always the simpler option at any scale whatsoever
Only for heavy stateful workloads that need durable persistent volumes attached to every replica
Only when the whole system happens to consist of exactly one single service
