ADR-0001: Per-vhost cert split for the CCAT step-ca endpoint

Status. Accepted, 2026-05-05.

Supersedes. None. Captures a decision previously embedded in commits, lessons-learned, and ad-hoc operator knowledge.

Context

The CCAT step-ca endpoint must be reachable by clients across two populations: machines and operators on Uni Köln subnets, and external partners (REUNA, partner workstations off the uni network). step-cli clients pin trust to the CCAT root via step ca bootstrap --fingerprint and then refuse to talk to a TLS endpoint whose chain doesn’t lead to that root.

We worked through three approaches before landing on the current one, each ruled out by a hard constraint:

  1. Direct exposure of step-ca on :9000 to all clients. Uni Köln IT firewalls drop :9000 between subnets even within the university network, so cross-subnet uni clients (let alone external partners) cannot reach it at all. :9000 is reachable only from input-b’s own /24.

  2. Let’s Encrypt on the ca.ccat.uni-koeln.de vhost via nginx-proxy. Clean from a deployment standpoint (acme-companion handles every other vhost this way), but it would require bootstrapping clients against an LE-issued cert. step-cli’s pinned trust model treats this as a chain mismatch — there’s no clean way to ask clients to trust both a CCAT root and a public CA for the same hostname.

  3. Bridge the two roots client-side (append the system CA bundle to ~/.step/certs/root_ca.crt). We tried this; it works for step ssh login (OIDC flow) but fails for the JWK-flow commands like step ssh certificate, which read root_ca.crt for an internal cert-chain code path and reject multi-PEM input. It is also ergonomically miserable: every laptop needs the hack reapplied after each step ca bootstrap --force. The full ping-pong is in docs/source/ceremony/lessons-learned-cutover-2026-05-04.md.
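As a rough illustration of the multi-PEM problem in approach 3, the sketch below counts certificate blocks in a root file. A bootstrapped root_ca.crt is assumed to hold exactly one certificate, so a higher count suggests the system CA bundle was appended. The function names are hypothetical and this is not part of step-cli.

```python
# Hypothetical helper: detect a root_ca.crt that carries more than one PEM block.
PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def count_pem_certs(pem_text):
    """Count certificate blocks by their PEM header line."""
    return pem_text.count(PEM_BEGIN)


def is_single_root(pem_text):
    """True when the text holds exactly one certificate block."""
    return count_pem_certs(pem_text) == 1
```

An operator could run this over the contents of ~/.step/certs/root_ca.crt to spot a laptop that still carries the workaround.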

Decision

step-ca lives behind nginx-proxy with a per-vhost cert split:

  • The ca.ccat.uni-koeln.de vhost serves a CCAT-rooted cert (issued by step-ca itself via the prod-services JWK provisioner, written to /opt/proxy/certs/ca.ccat.uni-koeln.de.{crt,key}). acme-companion is opted out for this vhost.

  • Every other vhost served by nginx-proxy keeps Let’s Encrypt via acme-companion.

  • step-ca’s own port :9000 remains open on input-b but is firewalled to the CIDRs in the ca_allowed_source_cidrs group var (by default the Uni Köln /16). It is used only by the same-host issuance/renewal scripts that run on input-b; cross-subnet clients always use :443.

  • Access to ca.ccat.uni-koeln.de:443 is policy-enforced in-repo at proxy/data/vhost.d/ca.ccat.uni-koeln.de: an explicit allowlist of partner CIDRs followed by deny all. Adding a partner is a PR plus an nginx -s reload.
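The allowlist relies on nginx checking allow and deny rules in order and stopping at the first match. The sketch below models that evaluation in Python to show why the trailing deny all matters. The CIDRs are placeholder documentation ranges, not real partner networks, and is_allowed is a hypothetical name.

```python
import ipaddress

# Placeholder rules in file order; the real entries live in
# proxy/data/vhost.d/ca.ccat.uni-koeln.de.
RULES = [
    ("allow", "192.0.2.0/24"),      # hypothetical partner network
    ("allow", "198.51.100.17/32"),  # hypothetical partner workstation
    ("deny", "all"),
]


def is_allowed(client_ip, rules=RULES):
    """Apply the first matching rule, as nginx does for allow/deny."""
    addr = ipaddress.ip_address(client_ip)
    for action, target in rules:
        if target == "all" or addr in ipaddress.ip_network(target):
            return action == "allow"
    # nginx lets a request through when no rule matches,
    # which is why the list must end with an explicit deny all.
    return True
```

With the rules above, an address inside 192.0.2.0/24 is admitted and any other address hits deny all. Dropping the last rule would admit every address that matches no allow line.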

Consequences

Accepted, positive:

  • All cross-subnet clients use a single bootstrap path: step ca bootstrap --ca-url https://ca.ccat.uni-koeln.de. No port, no client-side trust hacks, no per-client variance.

  • Adding a partner is a PR plus a proxy reload. There is no per-partner cert ceremony.

  • The policy enforcement point (vhost.d allowlist) is in git, code-reviewed, and auditable through commit history.

Accepted, costs:

  • The ca.ccat.uni-koeln.de vhost cert needs its own renewal lifecycle: step-ca-vhost-renew.timer (every 12 h) calls step-ca/renew-vhost-cert.sh, and step-ca/issue-vhost-cert.sh provides an emergency re-issue path. acme-companion does not manage this cert; if anyone re-enables it for this vhost, the resulting Let’s Encrypt issuance will overwrite the CCAT cert.

  • :9000 is unusable cross-subnet by design. Operators outside Uni Köln cannot issue or renew the vhost cert directly without first reaching input-b (which they have to do for the rest of CCAT operations anyway).

  • The vhost.d allowlist is the only access control on the public CA endpoint. If it’s misconfigured (e.g. lost during a proxy config refactor), the CA endpoint becomes world-reachable. The default config explicitly fails closed (deny all after the allow lines) to mitigate this.
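One way to guard against the misconfiguration risk above is a small lint that could run in review or CI. The sketch below checks that a vhost.d file contains at least one access rule and that the last one is deny all. The fails_closed name and the idea of wiring it into CI are assumptions, not something the repository is known to do.

```python
import re

# Matches uncommented "allow <target>;" / "deny <target>;" lines.
ACCESS_RULE = re.compile(r"^\s*(allow|deny)\s+([^;\s]+)\s*;", re.MULTILINE)


def fails_closed(vhost_conf_text):
    """True if the file has access rules and the last one is 'deny all'."""
    rules = ACCESS_RULE.findall(vhost_conf_text)
    return bool(rules) and rules[-1] == ("deny", "all")
```

An empty or missing file, or one where deny all has been commented out, is reported as not failing closed. The check is deliberately narrow: it does not validate CIDR syntax or detect rules placed in other included files.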

Cross-references