cert-manager is the CNCF graduated project that turns Kubernetes into a certificate-aware control plane. It introduces native CRDs (Issuer, ClusterIssuer, Certificate) so workloads can request, renew, and rotate X.509 credentials declaratively. This guide covers its architecture, supported backends, YAML examples, and production guardrails.
cert-manager runs as an in-cluster controller that watches a set of custom resources (Issuer, ClusterIssuer, Certificate, CertificateRequest, Order, Challenge) and reconciles them by requesting, storing, and renewing X.509 certificates. Workloads consume the resulting Kubernetes Secret the same way they would consume any other configuration object, and renewal happens behind the scenes.
The project supports many issuer backends. The same Certificate resource can be backed by an ACME server (Let's Encrypt, ZeroSSL, or a private ACME endpoint), HashiCorp Vault PKI, Venafi TPP / TLS Protect Cloud, an in-cluster self-signed or CA Issuer, or, through the External Issuer plugin model, Google CAS, AWS Private CA, and other third-party CAs. cert-manager treats the protocol differences as backend details and exposes a single, uniform API to application teams.
In practice, it has become the default mechanism for TLS, mTLS, and short-lived workload identity inside Kubernetes. Ingress controllers, service meshes (Istio, Linkerd, Cilium), API gateways, and bespoke operators all integrate with it. The canonical reference is cert-manager.io; the project source lives at github.com/cert-manager/cert-manager and is governed by the CNCF after its graduation in 2024.
Before cert-manager, certificate handling inside Kubernetes looked like a patchwork. A platform team would script `certbot` on a jump host and copy the resulting PEM files into a `Secret` by hand. Another team would use Vault's agent injector with a sidecar that rewrote files on disk. A third would mount static, year-long certificates baked into a Helm chart and quietly forget about renewal until an outage reminded them. The cluster knew nothing about any of this: certificates were opaque blobs that arrived from outside and expired without anyone watching.
cert-manager closed that gap by treating certificates as first-class Kubernetes objects. You declare what you want — a hostname, a duration, an issuer, a key algorithm — in YAML. A controller compares the declared state to the cluster state, contacts the right backend, writes a `Secret`, and keeps reconciling until the live certificate matches the declaration. When the certificate is two-thirds of the way through its lifetime, the controller renews automatically. When the spec changes, it re-issues. The model is the same one Kubernetes uses for Pods, Deployments, and Services, applied to X.509.
The project grew out of Jetstack's `kube-lego`, was relaunched as `cert-manager` in 2017, joined the CNCF Sandbox in 2020, was promoted to Incubation in 2022, and reached Graduated status in November 2024. By that point it was deployed in tens of thousands of clusters and effectively unavoidable for anyone running TLS on Kubernetes at scale.
The lifecycle inside the cluster is short to describe and worth pinning down before discussing backends and edge cases. Each step is a reconciliation loop owned by a specific controller, and each produces one or more child resources that you can inspect with `kubectl`.
The reconciliation never stops. If the `Secret` is deleted, the controller re-issues. If the spec changes (a SAN added, an algorithm switched), it re-issues. If the certificate is approaching `renewBefore`, it re-issues. This is what makes cert-manager fundamentally different from running `certbot` on a cron: there is no schedule, only a loop that closes the gap between declared and actual.
An `Issuer` is namespaced and serves Certificates within its own namespace; a `ClusterIssuer` is cluster-scoped and can be referenced from anywhere. Both describe how to obtain a certificate — which ACME directory URL, which Vault path, which CA bundle, which credentials — but do not themselves issue anything. They are the configuration object the controller reads when a request comes in.
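As a rough sketch of the scoping difference, the manifests below define a namespaced `Issuer` against the Let's Encrypt staging directory and a `Certificate` that selects it through `issuerRef.kind`. The names (`letsencrypt-staging`, `demo`, `demo-tls`, `demo.example.com`) are placeholders chosen for illustration, and the nginx ingress class is an assumption about the cluster.

```yaml
# Namespaced Issuer: only Certificates in the "demo" namespace can reference it.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging          # hypothetical name
  namespace: demo                    # hypothetical namespace
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: platform-pki@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx  # assumes an nginx ingress class exists
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: demo-tls
  namespace: demo                    # must match the Issuer's namespace
spec:
  secretName: demo-tls
  dnsNames:
    - demo.example.com
  issuerRef:
    name: letsencrypt-staging
    kind: Issuer                     # set to ClusterIssuer for a cluster-scoped issuer
```

Moving the same configuration to a `ClusterIssuer` would mean dropping `metadata.namespace` from the issuer and changing `issuerRef.kind`; the `Certificate` is otherwise unchanged.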
The application team (or a platform abstraction on top of it) creates a `Certificate` resource declaring the subject, the DNS names, the duration, the renewal window, the secret name, and a reference to an Issuer or ClusterIssuer. This is the only object most workload teams ever touch directly.
The cert-manager controller observes the `Certificate`, generates a fresh private key in-cluster, builds a CSR, and writes a `CertificateRequest` resource that captures the signing request and the issuer reference. This object is what would be reviewed in a GitOps audit, and it is what an approval controller would gate on if one were installed.
A second controller, specific to the chosen backend (ACME, Vault, CA, Venafi, External Issuer), picks up the `CertificateRequest` and runs the protocol exchange. For ACME that means creating `Order` and `Challenge` children and solving HTTP-01 or DNS-01 challenges (cert-manager does not implement TLS-ALPN-01). Once the CA returns the signed certificate, the controller writes it back onto the `CertificateRequest`.
The controller serialises the certificate, the chain, and the private key into a Kubernetes `Secret` of type `kubernetes.io/tls`, with keys `tls.crt`, `tls.key`, and, when the issuer supplies its CA certificate, `ca.crt`. Pods mount it as a file volume or read it via environment variables (which are fixed at container start and therefore never see a renewal), and the renewal loop overwrites the same Secret in place when the certificate is two-thirds of the way through its lifetime.
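For orientation, the resulting object looks roughly like the sketch below. The data values are placeholders rather than real key material, the names reuse the `payments-api-mtls` example from this guide, and the annotations shown are a subset of what cert-manager typically records on the Secret.

```yaml
apiVersion: v1
kind: Secret
type: kubernetes.io/tls
metadata:
  name: payments-api-mtls
  namespace: payments
  annotations:
    # Bookkeeping written by cert-manager (subset shown)
    cert-manager.io/certificate-name: payments-api-mtls
    cert-manager.io/issuer-kind: ClusterIssuer
    cert-manager.io/issuer-name: internal-acme
data:
  tls.crt: "<base64 PEM, leaf certificate followed by intermediates>"
  tls.key: "<base64 PEM, private key>"
  ca.crt: "<base64 PEM, issuing CA, only when the issuer supplies it>"
```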
The Issuer abstraction is the part that makes cert-manager interesting at the platform level. Application teams declare a Certificate; the platform team chooses which authority sits behind the scenes. Swapping a Let's Encrypt-backed ClusterIssuer for a private ACME endpoint pointed at an internal CA is, from the application's point of view, a no-op.
A common platform pattern is one `ClusterIssuer` per environment and per trust domain: a public ACME issuer for ingress hostnames that need browser-trusted certificates, and a private ACME or Vault issuer for everything internal. The namespace-scoped `Issuer` exists for the cases where an application owns its own credentials and the platform team explicitly does not want them exposed cluster-wide.
ACME: the most common backend. Works against any RFC 8555 directory: Let's Encrypt (staging and production), ZeroSSL, Buypass, Google's public CA, and crucially any private ACME endpoint exposed by a corporate CA. Supports HTTP-01 and DNS-01 challenges, with built-in DNS-01 solvers for Route 53, Cloud DNS, Azure DNS, Cloudflare, and RFC 2136, and webhook-based solvers for the rest. TLS-ALPN-01 is not implemented.
Vault: talks to a Vault PKI secrets engine using AppRole, Kubernetes ServiceAccount, or JWT/OIDC auth. Suited to environments that already run Vault for application secrets and want certificates to follow the same governance and audit model.
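A minimal sketch of a Vault-backed `ClusterIssuer` using Kubernetes ServiceAccount auth might look like the following. The Vault address, the PKI mount and role in `path`, the Vault auth role, and the ServiceAccount name are all assumptions for illustration; TLS trust settings for the Vault endpoint are omitted.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-internal                         # hypothetical name
spec:
  vault:
    server: https://vault.example.internal:8200  # hypothetical Vault address
    # <pki mount>/sign/<pki role>; both segments are placeholders
    path: pki_int/sign/cluster-workloads
    auth:
      kubernetes:
        role: cert-manager                     # Vault-side role, assumed to exist
        mountPath: /v1/auth/kubernetes
        serviceAccountRef:
          # For a ClusterIssuer this ServiceAccount is expected in
          # cert-manager's own namespace.
          name: vault-issuer
```

Application teams would then reference `vault-internal` in `issuerRef` exactly as they would an ACME issuer.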
Venafi: native integration with Venafi TPP and Venafi TLS Protect Cloud, so existing policy folders and approval workflows remain authoritative. Useful when Venafi is the corporate CLM of record and cert-manager is just the cluster-side delivery layer.
CA: an in-cluster CA whose key material lives in a Secret. Convenient for short-lived intermediate CAs, ephemeral test environments, and bootstrap scenarios. Not appropriate as a long-lived root, because there is no offline protection, no HSM, and no separation of duties.
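The usual bootstrap for a throwaway test CA chains three objects: a self-signed issuer, a CA certificate signed by it, and a CA issuer that reads the resulting Secret. The sketch below assumes cert-manager is installed in the `cert-manager` namespace (where a `ClusterIssuer` looks for its Secrets by default); all names are hypothetical.

```yaml
# 1. Self-signed issuer, used only to mint the test CA certificate
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}
---
# 2. The CA certificate itself; its key pair lands in a Secret
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ephemeral-test-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: ephemeral-test-ca
  secretName: ephemeral-test-ca
  duration: 720h                 # 30 days: deliberately short for a test CA
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer
---
# 3. CA issuer that signs leaf certificates with that key pair
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: test-ca
spec:
  ca:
    secretName: ephemeral-test-ca
```

Anyone who can read the `ephemeral-test-ca` Secret can sign as this CA, which is the reason the pattern stays in test and bootstrap environments.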
External issuers: a plugin contract that lets any CA ship a controller implementing the CertificateRequest API. Public examples include AWS Private CA, Google CAS, Smallstep step-ca, GlobalSign Atlas, and a long tail of vendor-specific issuers. From the user's perspective, an External Issuer looks identical to a built-in one.
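From the `Certificate` side, the only visible difference is that `issuerRef` carries the external issuer's API group and kind. The sketch below uses the group and kind published by the AWS Private CA issuer project as an example; treat them as an assumption to verify against the version you install, and the issuer name `corp-private-ca` as hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: orders-api-tls
  namespace: orders
spec:
  secretName: orders-api-tls
  dnsNames:
    - orders-api.orders.svc.cluster.local
  issuerRef:
    # group and kind come from the external issuer's own CRDs,
    # not from cert-manager.io
    group: awspca.cert-manager.io
    kind: AWSPCAClusterIssuer
    name: corp-private-ca        # hypothetical issuer object
```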
The three resources you will use day to day are `Certificate`, `CertificateRequest`, and the Issuer pair. They look similar in YAML but play very different roles in the lifecycle.
The pattern that surprises newcomers is the lifetime mismatch between `Certificate` and `CertificateRequest`. A single `Certificate` named `api-tls` can produce dozens of `CertificateRequest` children over time, one per issuance. The history is preserved (within `revisionHistoryLimit`) so that auditors can answer "who renewed this, when, against which issuer, and what was in the CSR?" without needing access to the CA's own logs.
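How much of that history is retained is set per `Certificate`. A minimal sketch, assuming a hypothetical `api-tls` certificate and a retention of five revisions:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: payments
spec:
  secretName: api-tls
  # Keep the five most recent CertificateRequest children; older ones are pruned.
  revisionHistoryLimit: 5
  dnsNames:
    - api.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```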
Two solvers are declared and selected by DNS zone. Public hostnames on `example.com` are validated by HTTP-01 through the nginx Ingress controller. Internal hostnames on `internal.example.com` — which a public CA could not reach over HTTP — are validated by DNS-01 against Route 53. The same `ClusterIssuer` serves both because cert-manager picks the matching solver per Certificate. For a private ACME endpoint backed by a corporate CA, only the `server` URL and the credentials change; the rest of the resource looks identical.
This is the shape of a modern workload identity certificate: 24-hour duration, renewal triggered at the two-thirds mark of the lifetime (`renewBefore: 8h`, so 8 hours before expiry), ECDSA P-256 keys, both server-auth and client-auth EKUs for mTLS in both directions, and a SPIFFE URI SAN so a service mesh can use the certificate as a portable identity. `rotationPolicy: Always` forces a fresh private key on every renewal, which is the conservative default for anything that crosses a network boundary.
The `cmctl` CLI (also installable via `krew` as the `kubectl cert-manager` plugin) is worth keeping in the platform toolkit. `cmctl status certificate` and `cmctl renew` save a lot of guessing in incident response and print readable error chains when a renewal is stuck.
| | Certificate | CertificateRequest | Issuer / ClusterIssuer |
|---|---|---|---|
| Scope | Namespaced | Namespaced | Issuer = namespaced; ClusterIssuer = cluster-wide |
| Author | Application team or platform abstraction | cert-manager controller (auto-generated) | Platform / PKI team |
| Lifetime | Long — same name reused across renewals | Short — one per issuance event, kept for audit | Long — rarely changes once configured |
| Triggers renewal | Yes, via `renewBefore` or spec change | No, it is the *artefact* of a renewal, not the trigger | No, it is configuration |
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform-pki@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      # Public hostnames: HTTP-01 through the nginx ingress class
      - http01:
          ingress:
            ingressClassName: nginx
        selector:
          dnsZones:
            - example.com
      # Internal hostnames: DNS-01 against Route 53
      - dns01:
          route53:
            region: eu-west-1
            hostedZoneID: Z2KZENXMP3JV5Y
        selector:
          dnsZones:
            - internal.example.com
```

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: payments-api-mtls
  namespace: payments
spec:
  secretName: payments-api-mtls
  secretTemplate:
    annotations:
      reloader.stakater.com/match: "true"
  issuerRef:
    name: internal-acme
    kind: ClusterIssuer
  commonName: payments-api.payments.svc.cluster.local
  dnsNames:
    - payments-api.payments.svc.cluster.local
    - payments-api.payments
  uris:
    - spiffe://example.com/ns/payments/sa/payments-api
  duration: 24h
  renewBefore: 8h
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always
  usages:
    - server auth
    - client auth
```

```shell
# Watch the full chain of children for a Certificate
kubectl -n payments get certificate,certificaterequest,order,challenge

# Inspect what was actually issued (Secret contents, not the CRD)
# Note: the -ext option needs OpenSSL 1.1.1 or later
kubectl -n payments get secret payments-api-mtls -o jsonpath='{.data.tls\.crt}' \
  | base64 -d \
  | openssl x509 -noout -subject -issuer -dates -ext subjectAltName,extendedKeyUsage

# Force a renewal without touching the spec (useful for incident response)
kubectl cert-manager renew -n payments payments-api-mtls

# Confirm the controller saw it and produced a new CertificateRequest
# (the column spec is quoted so the shell does not interpret the JSONPath filter)
kubectl -n payments get certificaterequest \
  --sort-by=.metadata.creationTimestamp \
  -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,CREATED:.metadata.creationTimestamp'
```

cert-manager makes the easy case trivially easy. The hard cases (multi-cluster fleets, private ACME endpoints, mixed public-and-private trust, audit obligations) are where operational details start to matter. The list below is not exhaustive but captures the items most teams learn the hard way.
The temptation, especially in self-service platforms, is to let each application team create its own Issuer pointing at its own credentials. This scales badly. The Issuer is configuration that the PKI team needs to govern: which CA, which policy template, which approval gate, which audit destination. The right granularity is one `ClusterIssuer` per trust domain per environment (production-public, production-internal, staging-internal, dev-internal), referenced by name, never copied. Application teams pick an issuer name; they do not configure one.
The cert-manager default renews at two-thirds of lifetime, which leaves the final third as the safety window. For a 24-hour certificate that gives 8 hours of slack, enough for an ACME outage, a DNS propagation delay, or a controller restart. Renewing later than that saves nothing meaningful and removes your margin; renewing much earlier means you reissue more often than necessary and put extra load on the CA. The one-third rule scales linearly: 90-day certificates renew at day 60, 47-day certificates at about day 31, and 24-hour certificates at the 16-hour mark.
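Expressed on a `Certificate`, the same ratio for a 90-day certificate could be written as below. This is a sketch with hypothetical names; the durations are the point: 2160h is 90 days, and a `renewBefore` of 720h (30 days) puts renewal at day 60.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: shop-tls
  namespace: shop
spec:
  secretName: shop-tls
  dnsNames:
    - shop.example.com
  duration: 2160h      # 90 days
  renewBefore: 720h    # renew 30 days before expiry, i.e. at day 60
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```

Note that a CA may ignore the requested `duration` and apply its own lifetime, in which case `renewBefore` is measured against the expiry of the certificate actually issued.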
When the controller writes a new `Secret`, the kubelet on each node detects the change and updates the mounted volume, but only at its sync interval, which defaults to roughly 60 seconds. Long-running pods that read the certificate once at startup will keep using the old one until they restart. Either run a controller such as Stakater's `reloader` to trigger rolling restarts when a Secret changes, or use a mechanism that picks up new material without a restart (the `GetCertificate` callback on Go's `tls.Config`, Envoy's SDS subscription). The trap is silent: the certificate is renewed, the dashboards are green, and the workload is still presenting the old one.
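A minimal sketch of the restart-based option, assuming Stakater Reloader is installed in the cluster: the `search` annotation on the Deployment pairs with the `reloader.stakater.com/match: "true"` annotation that the `payments-api-mtls` Certificate puts on its Secret through `secretTemplate`. The image name and mount path are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: payments
  annotations:
    # Restart this Deployment when a referenced Secret annotated with
    # reloader.stakater.com/match: "true" changes.
    reloader.stakater.com/search: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: app
          image: registry.example.com/payments-api:1.0.0  # placeholder image
          volumeMounts:
            - name: tls
              mountPath: /etc/tls     # tls.crt and tls.key appear here
              readOnly: true
      volumes:
        - name: tls
          secret:
            secretName: payments-api-mtls
```

Mounting the whole Secret as a volume, rather than individual files via `subPath`, matters here: `subPath` mounts are not refreshed when the Secret changes.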
Let's Encrypt enforces, at the time of writing, 50 certificates per registered domain per week and 5 duplicate certificates per week (same exact SAN set), alongside a separate limit of 300 new orders per account per 3 hours; check the current Let's Encrypt rate-limit documentation before relying on these figures. A cluster running 200 microservices on `*.example.com` exceeds the per-domain limit during the initial rollout, and daily re-issuance of the same SAN set exhausts the duplicate limit within the first week. The realistic options are a wildcard certificate at the apex (one cert, many services), a private ACME endpoint with no such limits, or both: public ACME for ingress, private ACME for everything inside the cluster.
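The wildcard option could look like the sketch below. Let's Encrypt only validates wildcard names over DNS-01, so this assumes a `ClusterIssuer` (here given the hypothetical name `letsencrypt-prod-dns01`) whose DNS-01 solver covers the `example.com` zone; the namespace and secret name are also placeholders.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
  namespace: ingress                 # hypothetical namespace of the ingress controller
spec:
  secretName: wildcard-example-com-tls
  dnsNames:
    - "*.example.com"                # quoted: a leading * is YAML alias syntax otherwise
    - example.com                    # the wildcard does not cover the apex itself
  issuerRef:
    name: letsencrypt-prod-dns01     # assumed issuer with a DNS-01 solver for example.com
    kind: ClusterIssuer
```

One certificate then counts once against the per-domain limit regardless of how many services sit behind it, at the cost of sharing a single private key across all of them.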
cert-manager keeps the last few `CertificateRequest` revisions per Certificate, and that history is enough for an in-cluster postmortem. It is not enough for an enterprise inventory: it lives inside one cluster, it is bounded by `revisionHistoryLimit`, and it has no notion of certificates issued outside the cluster (load balancers, code-signing, IoT, ADCS). A serious deployment ships every `CertificateRequest` and every `Certificate` event to an external CLM system, where they sit alongside the rest of the organisation's certificates and can be queried, alerted on, and reported against in a single place.
Private ACME for cert-manager out of the box — Evertrust PKI exposes an RFC 8555 ACME endpoint backed by your private CA — point any cert-manager ClusterIssuer at it. No public rate limits, full policy control on key types, durations, SANs, and EKUs, and credentials issued via your existing IAM. The same `ClusterIssuer` your application teams already understand now points at an authority you actually govern.
Unified multi-cluster visibility — Evertrust CLM ingests every issuance event from every cert-manager instance across every cluster and correlates it with the rest of the organisation's certificates (load balancers, code signing, IoT, ADCS, public CAs). One inventory, one expiry view, one place to answer "where is this CN deployed?" — which is exactly what cert-manager's in-cluster history cannot give you on its own.
Policy guardrails across the platform — define once which CAs, key algorithms, certificate durations, and SAN patterns are allowed for which clusters, namespaces, and trust domains. cert-manager requests that violate the policy are rejected at the PKI boundary, not after the fact in an audit. Application teams keep their declarative workflow; the PKI team keeps the controls.