# Lessons learned: root ceremony 2026-04-29

This document captures the surprises and corrections from CCAT's first HSM root ceremony. The [playbook](playbook.md) has been updated to bake these in, so a future ceremony following the playbook should not re-encounter them. This page exists so future operators understand the *why* behind the playbook's odd-looking choices, and so a future incident-responder reading the audit trail knows what to expect.

Run details:

- **Date:** 2026-04-29
- **Operators:** two-person rule, ceremony witnessed
- **HSMs:** Nitrokey HSM 2 ×2 (serials on the paper sheet in the safe)
- **Software pinned:** step-cli, step-kms-plugin, opensc; versions in the supplies-USB `VERSIONS.txt`
- **Live media:** Ubuntu 24.04 LTS Desktop, RAM-only

## 1. OpenSC virtual-token quirk: the label gets renamed

When you run `sc-hsm-tool --initialize --label "ccat-root"`, the **CKA_LABEL stored on the card** is `ccat-root`. But OpenSC's PKCS#11 module (`opensc-pkcs11.so`) presents *one PKCS#11 token per PIN type*, and **renames** them. The token surfaced for User-PIN access is named **`ccat-root (UserPIN)`** at the URI layer.

`pkcs11-tool --token-label ccat-root` does prefix-matching, so it works without the suffix. `step-cli` does **exact** matching, so its URI must include the full string with spaces and parens percent-encoded:

| char | escape |
|---|---|
| space | `%20` |
| `(` | `%28` |
| `)` | `%29` |

i.e. `token=ccat-root%20%28UserPIN%29`, and similarly `token=ccat-intermediate%20%28UserPIN%29` for HSM #2.

**Playbook decision:** use `serial=$SERIAL` instead of `token=`. The device serial is recorded on the paper sheet in §4, doesn't change on re-init, and needs no URL encoding. The token form still works for operators who prefer it; `serial=` is just cleaner.

## 2. `step certificate create` URI shape

This took multiple iterations.
The shape that actually works:

```
step certificate create --profile root-ca --not-after 240960h \
  --no-password --insecure \
  --kms "pkcs11:module-path=...;serial=$SERIAL?pin-value=$PIN" \
  --key "pkcs11:id=01" \
  "CCAT Observatory Root CA" \
  root_ca.crt
```

Things that **do not work**:

- `--key` with the full URI (`pkcs11:token=...;object=...;id=%01`): step's `getPublicKey` rejects it with "uri not matching." `--key "pkcs11:id=01"` (minimal, just the object id) is the form that resolves.
- `pin-source=/path/to/pin-file` in the URI: did not log into the token reliably in our run. `pin-value=$PIN` (with `$PIN` captured via `read -rs PIN` so the literal value never lands in shell history) is what worked.
- `--kms-key-id` flag: does not exist in this step-cli version. Earlier docs reference it; ignore.
- Three positionals for `create` (subject, crt-file, key-file) when the key already exists on the HSM: `<key-file>` is *output-only*. Use the `--key` flag and only two positionals.

The smallstep doc example pattern (two positionals + `--kms` + `--key` flags) is what works:

```
step certificate create --kms 'pkcs11:...' --key 'pkcs11:id=NNNN' '<subject>' <crt-file>
```

## 3. `step certificate create` reflex password prompt

When step generates a self-signed cert from an HSM key, it still prompts to encrypt the (non-existent) output key file. Suppress with `--no-password --insecure`. The `--insecure` flag is just step's required acknowledgement that `--no-password` was a deliberate choice; it does not weaken the resulting cert. The cert is public PEM either way.

## 4. `step certificate sign` argument order

The sign command's positional handling is opposite to `create`:

| Command | 3rd positional `<key-file>` | Existing-key URI goes |
|---|---|---|
| `step certificate create` | output (write a new key here) | `--key` flag |
| `step certificate sign` | issuing CA's existing private key | 3rd positional, directly |

So sign uses three positionals with the URI in slot 3, no `--key` flag at all.
But also: **all flags must come BEFORE all positionals** in this step-cli version. With positionals first, step rejected `root_ca.crt` with `scheme is missing`; it was applying URI parsing to the cert positional because a `--kms` flag followed. Adding `file://$PWD/root_ca.crt` as a workaround failed with `scheme not expected`. The actual fix is to reorder so flags come first:

```
step certificate sign \
  --profile intermediate-ca \
  --not-after 87600h \
  --kms "pkcs11:module-path=...;serial=$SERIAL?pin-value=$PIN" \
  intermediate.csr \
  root_ca.crt \
  "pkcs11:id=01" \
  > intermediate_ca.crt
```

`root_ca.crt` is a bare relative path, no `file://` prefix.

## 5. `step certificate sign` writes to stdout

Because `sign` writes the certificate to stdout, there is no output-file password prompt to suppress, so `--no-password --insecure` are not needed (and not used) in §12, unlike §7/§10.

## 6. SSH CA pubkey export: version-portable conversion

`step-kms-plugin key URI` outputs PKCS#8 PEM by default. The `--format ssh` flag was added in later step-kms-plugin releases and was missing from the version on the supplies USB. The portable path:

```
step-kms-plugin key "pkcs11:..." > /tmp/.ca.pem
step crypto key format --ssh /tmp/.ca.pem > ca.pub
rm /tmp/.ca.pem
```

`step crypto key format --ssh` is part of step-cli (already on the USB). For older step-cli that lacks it, fall back to OpenSSH's `ssh-keygen -i -m PKCS8 -f /tmp/.ca.pem`: same result, different binary.

The expected output is a single line starting with `ecdsa-sha2-nistp384 AAAA...` for the P-384 keys we use.

## 7. Stick-swap gate procedure

The pre-ceremony plan was to leave both HSMs plugged in and rely on token labels to address each. That's brittle because:

- Before §5 (init), token labels don't exist yet.
- After §5/§8 (inits), `sc-hsm-tool --initialize` and `pkcs11-tool` without explicit slot filtering pick whichever HSM `pcscd` enumerated first, so the operator could initialise the wrong stick as root.
- Cross-talk between sticks during a multi-step session is silent.

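
The wrong-stick risk in the bullets above can also be caught in software by refusing to continue unless exactly one device serial is visible. This is a minimal sketch, not part of the playbook. It assumes that `pkcs11-tool --list-token-slots` prints a `serial num` line for each token, with the serial as the last field. `count_serials` and `require_single_hsm` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: count DISTINCT serials in a pkcs11-tool token listing
# read from stdin. Assumes each token block contains a "serial num : <value>"
# line; virtual tokens that share one card share one serial, so they count once.
count_serials() {
    awk '/serial num/ && !seen[$NF]++ { n++ } END { print n + 0 }'
}

# Hypothetical gate: succeed only when exactly one distinct serial is listed.
require_single_hsm() {
    n=$(count_serials)
    if [ "$n" -ne 1 ]; then
        echo "expected exactly 1 HSM serial, found $n" >&2
        return 1
    fi
}
```

A possible invocation is `pkcs11-tool --module "$MODULE" --list-token-slots | require_single_hsm`, where `$MODULE` is an assumed variable holding the path to `opensc-pkcs11.so`. A check like this only confirms that one stick is present; it cannot confirm that it is the right one.
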
The playbook now uses a **single-stick-at-a-time** discipline. At each phase boundary, a "Gate: only HSM #N plugged in" marker tells the operator to:

1. Unplug whichever HSM is currently in.
2. Plug in **only** HSM #N.
3. Capture the serial: `SERIAL=$(pkcs11-tool ... | awk '/serial num/{print $NF; exit}')`
4. Operator reads `$SERIAL` aloud (serials are non-secret); witness compares against the paper sheet's HSM #N row.
5. Both confirm "HSM #N, serial matches" before continuing.

The captured `$SERIAL` shell variable then drives every subsequent URI in the phase via `serial=$SERIAL`, eliminating ambiguity.

## 8. Phase grouping reduces 7 swaps to 2

The pre-ceremony plan was to alternate HSMs for each step:

```
init#1, init#2, keygen#1, keygen#2, cert#1, csr#2, sign#1, ssh-keys#2
```

That's 7 swaps. Each swap is a chance to plug in the wrong stick or confuse `pcscd`. The dependency graph allows grouping:

- **Phase A, HSM #1 (root):** §5 init, §6 keygen, §7 root cert
- **Phase B, HSM #2 (intermediate):** §8 init, §9 keygen, §10 CSR, §11 SSH CA keys
- **Phase C, HSM #1 (root):** §12 sign intermediate CSR
- **Phase D, paper:** §13 fingerprint (no HSM)

**2 swaps total.** The playbook is structured around these phases.

## 9. Read-aloud rule scoped to non-secrets only

Initial threat model: "read everything aloud so the witness verifies." That leaks PINs, SO-PINs, and the root fingerprint to ambient microphones (phones, smartwatches, voice assistants in the room). Refined rule:

| Value | Verbalisation rule | Reason |
|---|---|---|
| Device serial | **Spoken aloud** at every gate | Non-secret (printed on Nitrokey case); witness needs to hear it to cross-check the right stick is plugged in. |
| User PIN | Silent; visual verification only | Audio leak risk; the User-PIN retry counter allows only 3 tries. |
| SO PIN | Silent; visual verification only | Same; the SO-PIN counter allows 15 attempts, then the device is bricked. |
| Root fingerprint | Silent; visual verification on screen vs paper | Same audio-leak hygiene; even though the fingerprint is *eventually* public, leaking it before distribution gives an attacker the value to substitute. |

## 10. PIN format, length, and retry counters

SC-HSM 2 (Nitrokey HSM 2) PIN behaviour, set during `sc-hsm-tool --initialize`:

| PIN | Format | Length | Retry counter | After exhaustion |
|---|---|---|---|---|
| User PIN (`--pin`) | ASCII, case-sensitive | 6–15 chars | 3 attempts (default; configurable via `--pin-retries`) | User PIN blocked; SO-PIN can unblock |
| SO PIN (`--so-pin`) | hex, case-insensitive on the wire | exactly 16 hex chars (= 8 bytes) | 15 attempts | Device permanently bricked |

User PIN content discipline (per the PIN sheet's expanded guidance):

- **Allowed:** `A-Z a-z 0-9 - _ .`
- **Avoid:** shell metacharacters; layout-dependent symbols (`@ ^ ~ : / + =` differ between US and DE keyboards on the Live USB); visually-ambiguous glyphs (`l 1 I`, `O 0`, `B 8`).

SO PIN: generate with `openssl rand -hex 8`. Never use the sc-hsm-tool factory default `3537363231383830`.

## 11. PKCS#11 slot indices are non-stable

The original §4 had operators record `slot: ____` alongside `serial: ____` on the paper sheet. Slot indices re-enumerate on every plug-in event, so they have **no stable meaning** across the ceremony, let alone across years of safe storage. Slots are not recorded on the sheet; only the serial is.

## 12. Single USB at install vs three USBs at ceremony

The supplies USB ships pre-built (this is the one with `MANIFEST.sha256` and the air-gap manifest-verification model). The boot USB is a vanilla Ubuntu LTS Live image. The export USB is empty until the ceremony's §14, when it receives the public artefacts (cert + SSH CA pubkeys + fingerprint). Three USBs total.
The original plan conflated the supplies and export USBs into one; the actual model needs them separate so the supplies USB stays sealed for re-verification at the *next* ceremony, and the export USB is the only stick that touches a networked machine afterwards.

## What the playbook does not yet capture

A short list of things that surfaced during the ceremony but didn't warrant a procedure change:

- The Nitrokey docs page on the supplies-USB shows token URI in `UserPIN (label)` form (PIN type first), but the actual OpenSC rendering on Ubuntu 24.04 is `