ADR-0001 — Per-vhost cert split for the CCAT step-ca endpoint

Status. Accepted, 2026-05-05.

Supersedes. None. This ADR captures a decision previously embedded in commits, lessons-learned documents, and ad hoc operator knowledge.

Context

The CCAT step-ca endpoint must be reachable by two populations of clients: machines and operators on Uni Köln subnets, and external partners (REUNA, partner workstations off the university network). step-cli clients pin trust to the CCAT root via step ca bootstrap --fingerprint … and then refuse to talk to any TLS endpoint whose chain does not lead to that root.
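To make the pin concrete: the --fingerprint value is the hex-encoded SHA-256 digest of the root certificate's DER encoding (the value step certificate fingerprint reports). The following is a minimal sketch using only the Python standard library, assuming a PEM file that holds exactly one certificate; root_fingerprint and fingerprint_matches are hypothetical helper names, not part of step-cli.

```python
import hashlib
import ssl


def root_fingerprint(pem_text: str) -> str:
    """Hex SHA-256 digest of a single PEM certificate's DER bytes.

    Assumes pem_text holds exactly one certificate starting at the first
    character; ssl.PEM_cert_to_DER_cert raises ValueError if the
    BEGIN/END CERTIFICATE markers are missing.
    """
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    return hashlib.sha256(der).hexdigest()


def fingerprint_matches(pem_text: str, pinned: str) -> bool:
    """Compare with a pinned fingerprint, ignoring case and ':' separators."""
    normalized = pinned.replace(":", "").strip().lower()
    return root_fingerprint(pem_text) == normalized
```

For example, calling root_fingerprint on the contents of ~/.step/certs/root_ca.crt should reproduce the fingerprint handed to clients for bootstrap, provided the file holds only the CCAT root.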

We worked through three approaches before landing on the current one, each ruled out by a hard constraint:

1. Direct exposure of step-ca on :9000 to all clients. Uni Köln IT firewalls drop :9000 between subnets even within the university network, so cross-subnet university clients (let alone external partners) cannot reach it at all; the port is reachable only from input-b’s own /24.
2. Let’s Encrypt on the ca.ccat.uni-koeln.de vhost via nginx-proxy. This is clean from a deployment standpoint (acme-companion handles every other vhost this way), but clients would then bootstrap against an LE-issued cert whose chain does not lead to the CCAT root. step-cli’s pinned trust model treats this as a chain mismatch, and there is no clean way to ask clients to trust both the CCAT root and a public CA for the same hostname.
3. Bridge the two roots client-side by appending the system CA bundle to ~/.step/certs/root_ca.crt. We tried this: it works for step ssh login (OIDC flow) but fails for JWK-flow commands such as step ssh certificate, which read root_ca.crt on an internal cert-chain code path that rejects multi-PEM input. It is also ergonomically miserable, because every laptop needs the hack reapplied after each step ca bootstrap --force. The full back-and-forth is recorded in docs/source/ceremony/lessons-learned-cutover-2026-05-04.md.
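The multi-PEM failure from the client-side bridging attempt is easy to detect after the fact. Assuming a clean bootstrap against the single CCAT root leaves exactly one certificate in ~/.step/certs/root_ca.crt, a hypothetical check like the one below flags laptops that still carry the appended system bundle. count_pem_certs and check_root_file are illustrative names, not step-cli features.

```python
import re
from pathlib import Path

# Matches one complete PEM certificate block (BEGIN ... END), across lines.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_pem_certs(pem_text: str) -> int:
    """Count complete PEM certificate blocks in a text blob."""
    return len(PEM_CERT_RE.findall(pem_text))


def check_root_file(path: Path) -> str:
    """Classify a step-cli root file: 'missing', 'empty', 'ok' or 'multi-pem'.

    Hypothetical helper: 'multi-pem' is the state left behind by appending
    the system CA bundle, which the JWK-flow commands reject.
    """
    if not path.exists():
        return "missing"
    n = count_pem_certs(path.read_text(encoding="ascii", errors="replace"))
    if n == 0:
        return "empty"
    return "ok" if n == 1 else "multi-pem"
```

A result of "multi-pem" for Path.home() / ".step" / "certs" / "root_ca.crt" suggests the bridging hack is still in place; re-running step ca bootstrap --force replaces the file.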

Decision

step-ca lives behind nginx-proxy with a per-vhost cert split:

The ca.ccat.uni-koeln.de vhost serves a CCAT-rooted cert (issued by step-ca itself via the prod-services JWK provisioner and written to /opt/proxy/certs/ca.ccat.uni-koeln.de.{crt,key}). This vhost is opted out of acme-companion.
Every other vhost served by nginx-proxy keeps Let’s Encrypt via acme-companion.
step-ca’s own port :9000 remains open on input-b but is firewalled to the CIDRs in the ca_allowed_source_cidrs group var (by default the Uni Köln /16). It is used only by the issuance/renewal scripts that run on input-b itself. Cross-subnet clients always use :443.
Access to ca.ccat.uni-koeln.de:443 is enforced by a policy file kept in the repo at proxy/data/vhost.d/ca.ccat.uni-koeln.de: an explicit allowlist of partner CIDRs followed by deny all. Adding a partner is a PR plus an nginx -s reload.
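The allowlist uses nginx access-module syntax: allow/deny directives are checked in order and the first match wins, which is why deny all must come last. A hypothetical generator like the one below could render the file from a list of partner CIDRs, validating each entry and always appending deny all; so the output fails closed even when the list is empty. render_allowlist is an illustrative name, and the CIDRs in the example are RFC 5737 documentation ranges, not real partner networks.

```python
import ipaddress
from typing import Iterable


def render_allowlist(partner_cidrs: Iterable[str]) -> str:
    """Render nginx allow/deny lines for the CA vhost.

    Each CIDR is validated with ipaddress (strict=True rejects entries with
    host bits set, e.g. '192.0.2.1/24'); duplicates are dropped while
    preserving order. The final line is always 'deny all;', so an empty
    list yields a fully closed vhost.
    """
    seen = []
    for raw in partner_cidrs:
        net = ipaddress.ip_network(raw.strip(), strict=True)
        if net not in seen:
            seen.append(net)
    lines = [f"allow {net};" for net in seen]
    lines.append("deny all;")
    return "\n".join(lines) + "\n"
```

For example, render_allowlist(["192.0.2.0/24", "198.51.100.0/24"]) returns the three lines allow 192.0.2.0/24;, allow 198.51.100.0/24; and deny all;. Generating rather than hand-editing keeps a malformed CIDR from reaching the proxy, though the reviewed PR remains the actual policy gate.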

Consequences

Accepted, positive:

All cross-subnet clients use a single bootstrap path: step ca bootstrap --ca-url https://ca.ccat.uni-koeln.de (no explicit port, no client-side trust hacks, no per-client variance).
Adding a partner is a PR plus a proxy reload. There is no per-partner cert ceremony.
The policy enforcement point (the vhost.d allowlist) is in git, code-reviewed, and auditable through commit history.

Accepted, costs:

The ca.ccat.uni-koeln.de vhost cert needs its own renewal lifecycle: step-ca-vhost-renew.timer (every 12 h) calling step-ca/renew-vhost-cert.sh, plus an emergency re-issue path via step-ca/issue-vhost-cert.sh. acme-companion does not manage this cert; if anyone re-enables it for this vhost, its ACME issuance will overwrite the CCAT cert.
Port :9000 is unusable cross-subnet by design. Operators outside Uni Köln cannot issue or renew the vhost cert without first getting onto input-b (which they must do for the rest of CCAT operations anyway).
The vhost.d allowlist is the only access control on the public CA endpoint. If it is misconfigured (e.g., lost during a proxy config refactor), the CA endpoint becomes world-reachable. The default config explicitly fails closed (deny all after the allow lines) to mitigate this; that protects against a broken or incomplete allowlist, not against the file being dropped altogether.
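Two of these costs can be guarded by small automated checks. The sketch below assumes (a) a CI lint that parses the vhost.d allowlist, accepts only allow/deny lines and comments, and insists on deny all last, and (b) a renewal policy that renews once two thirds of the cert lifetime has elapsed. That threshold is an assumption for illustration, not necessarily what step-ca/renew-vhost-cert.sh does, and all function names here are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Directive lines expected in the CA vhost.d allowlist, e.g. "allow 192.0.2.0/24;".
_DIRECTIVE_RE = re.compile(r"^(allow|deny)\s+(\S+);$")


def allowlist_problems(text: str) -> list[str]:
    """Return reasons the allowlist text is not fail-closed (empty list = OK).

    Hypothetical lint: skips blank lines and '#' comments, requires every
    other line to be an allow/deny directive, and requires the last
    directive to be 'deny all;'. It cannot detect the file being deleted;
    a CI job would also need to assert that the file exists.
    """
    problems = []
    directives = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = _DIRECTIVE_RE.match(line)
        if m is None:
            problems.append(f"line {lineno}: unexpected content {line!r}")
            continue
        directives.append((m.group(1), m.group(2)))
    if not directives or directives[-1] != ("deny", "all"):
        problems.append("last directive is not 'deny all;'")
    # First match wins in nginx, so any 'allow all;' opens the endpoint.
    if ("allow", "all") in directives:
        problems.append("'allow all;' makes the endpoint world-reachable")
    return problems


def renewal_due(not_before: datetime, not_after: datetime,
                now: datetime, fraction: float = 2 / 3) -> bool:
    """True once `fraction` of the cert lifetime has elapsed (assumed policy)."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


def timer_covers_window(lifetime: timedelta, interval: timedelta,
                        fraction: float = 2 / 3) -> bool:
    """True if the renewal window is at least one timer interval long.

    A timer firing every `interval` lands at least once in any window that
    long, so renewal cannot be skipped entirely (ignoring failed runs and
    timer jitter).
    """
    return lifetime * (1 - fraction) >= interval
```

With the assumed 2/3 threshold the renewal window is one third of the lifetime, so any vhost cert lifetime under 36 h would leave a window shorter than the 12 h timer interval, and timer_covers_window would return False.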

Cross-references

CCAT Certificate Authority — Architecture and Design: current-state explanation of the trust posture, including why the CA vhost opts out of LE.
CA day-to-day operations: operator how-to for the vhost cert lifecycle (timer, inspection, adding a partner subnet).
CA rotation and disaster recovery: runbook for routine renewal and emergency re-issue of the vhost cert.
Client setup — SSH with step-ca certificates: partner-facing bootstrap procedure, including the off-network tunnel option.
Lessons learned — Phase 2 HSM cutover 2026-05-04: full retrospective of approaches (1)–(3) above and how each was ruled out.