Topic 39

Operators and Controllers

Automation, Controller

An operator packages a CRD with a controller that encodes the operational knowledge to run a complex system — a database, a message broker, a certificate manager. You declare a custom resource saying what you want; the operator's reconcile loop does the work a human operator would otherwise do.

It is the same reconcile loop that powers the rest of Kubernetes, applied to your own objects. Understanding the pattern explains both how to extend Kubernetes and how large parts of its ecosystem (cert-manager, the Prometheus Operator, database operators) are built.

The Controller Pattern

An operator's reconcile loop

Watch (the custom resource) → Observe (actual vs desired) → Act (create / back up / fail over) → Repeat (continuously)

A controller watches objects of some kind and continuously drives actual state toward the desired state in their spec. It is a loop: observe the resource and the world, compute the difference, take actions to close it, repeat. The built-in controllers do this for Deployments and Services; a custom controller does it for your CRD. Crucially, reconciliation must be idempotent — running it repeatedly with the same desired state must be safe, because it runs constantly, not once.
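The observe, diff, act cycle can be sketched in a few lines of plain Python. This is a toy model, not a real Kubernetes client: the `actual` set stands in for the cluster, and the `reconcile` function and the `db-0` style names are assumptions made up for illustration.

```python
def reconcile(name: str, desired_replicas: int, actual: set) -> list:
    """One reconcile pass: observe, compute the difference, act on it.

    `actual` is a hypothetical stand-in for the cluster: the set of
    replica names that currently exist. Returns the actions taken.
    """
    wanted = {f"{name}-{i}" for i in range(desired_replicas)}
    actions = []
    # Create whatever is desired but missing.
    for missing in sorted(wanted - actual):
        actual.add(missing)
        actions.append(f"create {missing}")
    # Delete whatever exists but is no longer desired.
    for extra in sorted(actual - wanted):
        actual.remove(extra)
        actions.append(f"delete {extra}")
    return actions


cluster = set()
first_pass = reconcile("db", 3, cluster)    # closes the gap
second_pass = reconcile("db", 3, cluster)   # same desired state: no actions
scale_down = reconcile("db", 1, cluster)    # desired state changed
```

Because each pass acts only on the difference between desired and actual state, the second pass with an unchanged spec takes no action, which is the idempotency property described above.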

What Makes It an Operator

An operator is a controller plus a CRD plus domain knowledge. Where a Deployment controller knows how to keep N Pods running, a database operator knows how to bootstrap a cluster, add a replica, take a backup, fail over a primary, and run a version upgrade — the things a skilled human operator would do. You express intent (Database with 3 replicas, version 17); the operator performs the multi-step, stateful procedure to make it so and keep it so.
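One way to picture the "domain knowledge" part is as a decision function that picks the next operational step from the declared spec and the observed status. The sketch below is illustrative only: the `DatabaseSpec` and `DatabaseStatus` fields, the `next_step` function, and the ordering of the checks are all assumptions, not the behavior of any particular database operator.

```python
from dataclasses import dataclass


@dataclass
class DatabaseSpec:
    """Desired state from the custom resource (hypothetical fields)."""
    replicas: int
    version: int


@dataclass
class DatabaseStatus:
    """Observed state of the running system (hypothetical fields)."""
    initialized: bool
    primary_healthy: bool
    replicas: int
    version: int


def next_step(spec: DatabaseSpec, status: DatabaseStatus) -> str:
    """Pick one action per pass; later passes handle whatever remains."""
    if not status.initialized:
        return "bootstrap cluster"
    if not status.primary_healthy:
        return "fail over primary"
    if status.replicas < spec.replicas:
        return "add replica"
    if status.replicas > spec.replicas:
        return "remove replica"
    if status.version < spec.version:
        return "back up, then upgrade"
    return "nothing to do"


spec = DatabaseSpec(replicas=3, version=17)
fresh = next_step(spec, DatabaseStatus(False, False, 0, 0))
degraded = next_step(spec, DatabaseStatus(True, False, 3, 17))
outdated = next_step(spec, DatabaseStatus(True, True, 3, 16))
settled = next_step(spec, DatabaseStatus(True, True, 3, 17))
```

Returning a single step per pass keeps each reconcile small; the loop simply runs again until `next_step` reports nothing to do.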

Building Operators

You rarely write the reconcile machinery from scratch. controller-runtime (the library) provides the watches, work queues, and reconcile loop, and Kubebuilder and the Operator SDK scaffold a project around it, so you write only the domain logic. The Operator Framework also rates operators on five capability levels: basic install, seamless upgrades, full lifecycle, deep insights, and auto pilot. That scale is a useful lens when judging whether a third-party operator is production-ready.

When to Use One

Reach for an operator when a system needs ongoing operational logic that a Deployment cannot express: stateful databases, clustered brokers, anything with backups and failover. But weigh the alternatives honestly. For many teams a managed service is the better answer to "we need a database," trading control for not having to operate it at all, and for simple apps a Deployment or Helm chart is enough. A low-maturity community operator can be a bigger source of risk to critical data than the database it manages, so don't adopt an operator unless you trust its maturity and your ability to debug it.

Operator vs Helm chart vs StatefulSet

StatefulSet — stable identity and storage, but no operational logic — you handle backups, failover, upgrades.

Helm chart — templated install and upgrade of manifests; still no day-2 operational behavior.

Operator — install plus ongoing operations (backup, failover, scaling) encoded as a controller.

Common Mistakes

Writing an operator when a Deployment, StatefulSet, or Helm chart would do — adding a controller you must maintain.
Trusting a low-maturity community operator with production-critical data.
Writing non-idempotent reconcile logic, so repeated runs cause drift or duplication.
Ignoring the operator's own RBAC, which often grants broad permissions that should be scoped down.
Assuming an operator removes operational burden entirely, rather than shifting it to operating the operator.
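The non-idempotent mistake above is easy to see in miniature. In this hedged sketch, both functions and the `nightly-backup` name are hypothetical; the point is only the contrast between "do the thing" and "ensure the thing exists".

```python
def reconcile_naive(jobs: list, backup_name: str) -> None:
    # Non-idempotent: every pass appends another job, so a loop that
    # runs repeatedly produces duplicates.
    jobs.append(backup_name)


def reconcile_idempotent(jobs: dict, backup_name: str) -> None:
    # Idempotent: the job is keyed by a deterministic name, so a repeat
    # pass finds it already present and changes nothing.
    jobs.setdefault(backup_name, {"state": "pending"})


naive_jobs = []
safe_jobs = {}
for _ in range(3):  # the loop runs many times with the same desired state
    reconcile_naive(naive_jobs, "nightly-backup")
    reconcile_idempotent(safe_jobs, "nightly-backup")
```

After three passes the naive version holds three copies of the same job, while the idempotent version still holds exactly one.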

Best Practices

Use operators for systems with genuine day-2 operational logic (stateful, clustered, backup/failover).
Prefer a managed service or a simple Deployment/Helm chart when those suffice.
Build on controller-runtime / Kubebuilder / Operator SDK rather than hand-rolling the loop.
Write reconcile logic to be idempotent — it runs continuously, not once.
Evaluate a third-party operator's maturity level and scope its RBAC before trusting it with data.
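As a rough aid for the RBAC point, a short script can flag the broadest rules in an operator's role before you install it. The sketch below assumes the rules have already been loaded into Python dictionaries using the RBAC rule field names `apiGroups`, `resources`, and `verbs`; the `broad_rules` helper and the `example.com` API group are hypothetical.

```python
def broad_rules(rules: list) -> list:
    """Return the rules that use a '*' wildcard in any of their fields."""
    fields = ("apiGroups", "resources", "verbs")
    return [
        rule for rule in rules
        if any("*" in rule.get(field, []) for field in fields)
    ]


# Hypothetical rules from an operator's ClusterRole.
rules = [
    {"apiGroups": ["example.com"], "resources": ["databases"],
     "verbs": ["get", "list", "watch", "update"]},
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["*"]},
    {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]},
]

flagged = broad_rules(rules)
```

A wildcard is not automatically wrong, but each flagged rule is worth a question to the operator's maintainers about whether a narrower grant would do.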

Related

Custom Resource Definitions: the API half of an operator (Topic 38)
StatefulSets: what operators build on for stateful systems (Topic 14)
Managed databases: often the better alternative to self-operating

Knowledge Check

What distinguishes an operator from a plain controller?

An operator is a controller plus a CRD plus domain operational knowledge (backup, failover, upgrades)
An operator must run on the control plane alongside the API server while a plain controller is confined to worker nodes
An operator drives its work purely through one-shot events and needs no reconcile loop watching desired state
They are exactly the same thing under two fully interchangeable names with no difference in scope

Why must reconcile logic be idempotent?

The loop runs continuously with the same desired state; repeated runs must be safe and not cause drift or duplication
Because reconcile fires exactly once per object over its lifetime and must complete every step of the work within that single pass
To satisfy the OpenAPI schema's structural validation rules on every field of the custom resource it manages
So the controller can safely rewrite the user's spec fields on each pass without losing prior edits

When is a managed service often a better choice than a self-run operator?

For a production database, where not operating it at all can outweigh the control an operator gives
Whenever the workload happens to be defined by a CRD rather than one of the built-in Kubernetes object kinds
Only for stateless web apps that hold no persistent data of their own and can be restarted freely
When you need the most configurability and fine-grained tuning over every part of the system
