Lessons learned — root ceremony 2026-04-29#
This document captures the surprises and corrections from CCAT’s first HSM root ceremony. The playbook has been updated to bake these in, so a future ceremony following the playbook should not re-encounter them. This page exists so future operators understand the why behind the playbook’s odd-looking choices — and so a future incident-responder reading the audit trail knows what to expect.
Run details:
Date: 2026-04-29
Operators: two-person rule, ceremony witnessed
HSMs: Nitrokey HSM 2 ×2 (serials on the paper sheet in the safe)
Software pinned: step-cli, step-kms-plugin, opensc (versions in the supplies-USB VERSIONS.txt)
Live media: Ubuntu 24.04 LTS Desktop, RAM-only
1. OpenSC virtual-token quirk: the label gets renamed#
When you run sc-hsm-tool --initialize --label "ccat-root", the
CKA_LABEL stored on the card is ccat-root. But OpenSC’s PKCS#11
module (opensc-pkcs11.so) presents one PKCS#11 token per PIN type,
and renames them. The token surfaced for User-PIN access is named
ccat-root (UserPIN) at the URI layer.
pkcs11-tool --token-label ccat-root does prefix-matching, so it
works without the suffix. step-cli does exact matching, so its
URI must include the full string with spaces and parens
percent-encoded:
| char | escape |
|---|---|
| space | %20 |
| ( | %28 |
| ) | %29 |
i.e. token=ccat-root%20%28UserPIN%29, and similarly
token=ccat-intermediate%20%28UserPIN%29 for HSM #2.
Playbook decision: use serial=$SERIAL instead of token=. The
device serial is recorded on the paper sheet in §4, doesn’t change on
re-init, and needs no URL encoding. The token form still works for
operators who prefer it; serial= is just cleaner.
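For operators who do use the token= form, the escaping can be scripted rather than typed by hand. The following is a minimal sketch in POSIX shell, not part of the playbook; the function name pkcs11_escape is hypothetical, and it conservatively percent-encodes every character outside the URI "unreserved" set, which covers the space and parentheses discussed above.

```shell
# Hypothetical helper: percent-encode a token label for a pkcs11: URI.
# Everything except A-Z a-z 0-9 . _ ~ - is encoded as %XX.
pkcs11_escape() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # string without its first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

pkcs11_escape "ccat-root (UserPIN)"   # prints ccat-root%20%28UserPIN%29
```

The sketch handles ASCII labels only, which is all the ceremony uses.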
2. step certificate create URI shape#
This took multiple iterations. The shape that actually works:
step certificate create --profile root-ca --not-after 240960h \
--no-password --insecure \
--kms "pkcs11:module-path=...;serial=$SERIAL?pin-value=$PIN" \
--key "pkcs11:id=01" \
"CCAT Observatory Root CA" \
root_ca.crt
Things that do not work:
--key with the full URI (pkcs11:token=...;object=...;id=%01): step’s getPublicKey rejects it with “uri not matching.” --key "pkcs11:id=01" (minimal, just the object id) is the form that resolves.
pin-source=/path/to/pin-file in the URI: did not log into the token reliably in our run. pin-value=$PIN (with $PIN captured via read -rs PIN so the literal value never lands in shell history) is what worked.
--kms-key-id flag: does not exist in this step-cli version. Earlier docs reference it; ignore.
Three positionals for create (subject, crt-file, key-file) when the key already exists on the HSM: <key-file> is output-only. Use the --key flag and only two positionals.
The smallstep doc example pattern (two positionals + --kms + --key
flags) is what works:
step certificate create --kms 'pkcs11:...' --key 'pkcs11:id=NNNN' '<subject>' <out-file>
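The --kms URI shape above can be assembled by a small helper so that the same string is reused across a phase. This is a sketch under assumptions: the function name kms_uri is hypothetical, the module path is passed in by the caller (it is elided in this document and stays elided here), and the helper refuses any serial or PIN that would need percent-encoding instead of trying to encode it.

```shell
# Hypothetical helper: build the --kms URI in the serial= / pin-value= shape.
#   $1 = path to the PKCS#11 module (caller-supplied; not filled in here)
#   $2 = device serial captured at the gate
#   $3 = User PIN
kms_uri() {
  module=$1
  serial=$2
  pin=$3
  [ -n "$serial" ] && [ -n "$pin" ] || return 1
  case $serial$pin in
    *[!A-Za-z0-9._-]*)
      echo "serial or PIN contains a character that needs percent-encoding" >&2
      return 1 ;;
  esac
  printf 'pkcs11:module-path=%s;serial=%s?pin-value=%s\n' "$module" "$serial" "$pin"
}

# Interactive use (bash), keeping the PIN out of shell history:
#   read -rs PIN
#   KMS=$(kms_uri "$MODULE" "$SERIAL" "$PIN")
#   step certificate create --kms "$KMS" --key 'pkcs11:id=01' ...
```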
3. step certificate create reflex password prompt#
When step generates a self-signed cert from an HSM key, it still
prompts to encrypt the (non-existent) output key file. Suppress with
--no-password --insecure. The --insecure flag is just step’s
required acknowledgement that --no-password was a deliberate choice;
it does not weaken the resulting cert. The cert is public PEM either
way.
4. step certificate sign argument order#
The sign command’s positional handling is opposite to create:
| Command | 3rd positional | Existing-key URI goes |
|---|---|---|
| step certificate create | output (write a new key here) | --key flag |
| step certificate sign | issuing CA’s existing private key | 3rd positional, directly |
So sign uses three positionals with the URI in slot 3, no --key
flag at all.
But also: all flags must come BEFORE all positionals in this
step-cli version. With positionals first, step rejected
root_ca.crt with scheme is missing — it was applying URI parsing
to the cert positional because a --kms flag followed.
Adding file://$PWD/root_ca.crt as a workaround failed with scheme not expected. The actual fix is to reorder so flags come first:
step certificate sign \
--profile intermediate-ca \
--not-after 87600h \
--kms "pkcs11:module-path=...;serial=$SERIAL?pin-value=$PIN" \
intermediate.csr \
root_ca.crt \
"pkcs11:id=01" \
> intermediate_ca.crt
root_ca.crt is a bare relative path, no file:// prefix.
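The flags-before-positionals rule lends itself to a pre-flight check on a drafted command line. The sketch below is illustrative only: flags_first is a hypothetical name, and the list of value-taking flags is an assumption limited to the flags that appear in this document, so it would need extending for any other flag.

```shell
# Hypothetical lint: succeed only if every --flag precedes the first positional.
# Flags known to take a value consume the following argument.
flags_first() {
  seen_positional=0
  skip_next=0
  for arg in "$@"; do
    if [ "$skip_next" -eq 1 ]; then
      skip_next=0            # this argument is a flag's value, not a positional
      continue
    fi
    case $arg in
      --profile|--not-after|--kms|--key)
        [ "$seen_positional" -eq 0 ] || return 1
        skip_next=1 ;;
      --*)
        [ "$seen_positional" -eq 0 ] || return 1 ;;
      *)
        seen_positional=1 ;;
    esac
  done
  return 0
}

flags_first --profile intermediate-ca --not-after 87600h --kms 'pkcs11:...' \
  intermediate.csr root_ca.crt 'pkcs11:id=01' && echo "order ok"
flags_first intermediate.csr root_ca.crt 'pkcs11:id=01' --kms 'pkcs11:...' \
  || echo "flag after positional"
```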
5. step certificate sign writes to stdout#
Because sign writes the certificate to stdout rather than to a file it creates itself, there is no output-file password prompt to suppress, so --no-password --insecure are not needed (and not used) in §12, unlike §7/§10.
6. SSH CA pubkey export — version-portable conversion#
step-kms-plugin key URI outputs PKCS#8 PEM by default. The
--format ssh flag was added in later step-kms-plugin releases and
was missing from the version on the supplies USB. The portable path:
step-kms-plugin key "pkcs11:..." > /tmp/.ca.pem
step crypto key format --ssh /tmp/.ca.pem > ca.pub
rm /tmp/.ca.pem
step crypto key format --ssh is part of step-cli (already on the
USB). For older step-cli that lacks it, fall back to OpenSSH’s
ssh-keygen -i -m PKCS8 -f <pem> — same result, different binary.
The expected output is a single line starting with
ecdsa-sha2-nistp384 AAAA... for the P-384 keys we use.
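The expected-output check can be made mechanical. This is a minimal sketch, assuming the converted key has been written to a file such as ca.pub; the function name check_ssh_ca_pub is hypothetical, and the check only confirms the shape of the line (one line, P-384 key type, base64 blob starting with AAAA), not that the key matches the HSM.

```shell
# Hypothetical check: the file must be exactly one line that starts with
# "ecdsa-sha2-nistp384 AAAA".
check_ssh_ca_pub() {
  file=$1
  lines=$(wc -l < "$file" | tr -d ' ')
  [ "$lines" -eq 1 ] || return 1
  case $(cat "$file") in
    'ecdsa-sha2-nistp384 AAAA'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage after the conversion step:
#   check_ssh_ca_pub ca.pub && echo "ca.pub has the expected shape"
```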
7. Stick-swap gate procedure#
The pre-ceremony plan was to leave both HSMs plugged in and rely on token labels to address each. That’s brittle because:
Before §5 (init), token labels don’t exist yet.
After §5/8 (inits), sc-hsm-tool --initialize and pkcs11-tool without explicit slot filtering pick whichever HSM pcscd enumerated first, so an operator could initialise the wrong stick as root.
Cross-talk between sticks during a multi-step session is silent.
The playbook now uses a single-stick-at-a-time discipline. At each phase boundary, a “Gate: only HSM #N plugged in” marker tells the operator to:
Unplug whichever HSM is currently in.
Plug in only HSM #N.
Capture the serial:
SERIAL=$(pkcs11-tool ... | awk '/serial num/{print $NF; exit}')
Operator reads $SERIAL aloud (serials are non-secret); witness compares against the paper sheet’s HSM #N row.
Both confirm “HSM #N, serial matches” before continuing.
The captured $SERIAL shell variable then drives every subsequent
URI in the phase via serial=$SERIAL, eliminating ambiguity.
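The gate could additionally be enforced in the shell, so that a second plugged-in stick fails loudly instead of being skipped by the awk exit. The sketch below is an assumption-laden illustration: single_serial is a hypothetical name, and it reads a slot listing on stdin (for example from something like pkcs11-tool --list-slots) rather than invoking the tool itself. Because OpenSC surfaces one token per PIN type, one stick may list the same serial more than once, so the check counts distinct serials.

```shell
# Hypothetical gate check: print the serial only if the listing on stdin
# contains exactly one distinct "serial num" value; otherwise fail.
single_serial() {
  awk '/serial num/ { seen[$NF] = 1 }
       END {
         n = 0
         for (s in seen) { n++; last = s }
         if (n != 1) exit 1      # zero sticks, or more than one
         print last
       }'
}

# Usage at a gate (the pkcs11-tool arguments are left to the playbook):
#   SERIAL=$(pkcs11-tool ... | single_serial) || echo "gate failed: check sticks"
```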
8. Phase grouping reduces 7 swaps to 2#
The pre-ceremony plan was to alternate HSMs for each step:
init#1, init#2, keygen#1, keygen#2, cert#1, csr#2, sign#1, ssh-keys#2
That’s 7 swaps. Each swap is a chance to plug in the wrong stick or
confuse pcscd. The dependency graph allows grouping:
Phase A — HSM #1 (root): §5 init, §6 keygen, §7 root cert
Phase B — HSM #2 (intermediate): §8 init, §9 keygen, §10 CSR, §11 SSH CA keys
Phase C — HSM #1 (root): §12 sign intermediate CSR
Phase D — paper: §13 fingerprint (no HSM)
2 swaps total. The playbook is structured around these phases.
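The swap counts can be checked with a few lines of shell. In this sketch each argument is the HSM number used by one step, in order; count_swaps is a hypothetical helper name.

```shell
# Count how many times consecutive steps use a different HSM.
count_swaps() {
  swaps=0
  prev=
  for hsm in "$@"; do
    if [ -n "$prev" ] && [ "$hsm" != "$prev" ]; then
      swaps=$((swaps + 1))
    fi
    prev=$hsm
  done
  echo "$swaps"
}

count_swaps 1 2 1 2 1 2 1 2   # alternating plan: prints 7
count_swaps 1 1 1 2 2 2 2 1   # phases A (3 steps), B (4 steps), C (1 step): prints 2
```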
9. Read-aloud rule scoped to non-secrets only#
Initial threat model: “read everything aloud so the witness verifies.” That leaks PINs, SO-PINs, and the root fingerprint to ambient microphones (phones, smartwatches, voice assistants in the room).
Refined rule:
| Value | Verbalisation rule | Reason |
|---|---|---|
| Device serial | Spoken aloud at every gate | Non-secret (printed on Nitrokey case); witness needs to hear it to cross-check that the right stick is plugged in. |
| User PIN | Silent; visual verification only | Audio leak risk; the User PIN retry counter allows only 3 tries. |
| SO PIN | Silent; visual verification only | Same; the SO-PIN counter allows 15 attempts, after which the card is bricked. |
| Root fingerprint | Silent; visual verification on screen vs paper | Same audio-leak hygiene; even though the fingerprint is eventually public, leaking it before distribution gives an attacker the value to substitute. |
10. PIN format, length, and retry counters#
SC-HSM 2 (Nitrokey HSM 2) PIN behaviour, set during
sc-hsm-tool --initialize:
| PIN | Format | Length | Retry counter | After exhaustion |
|---|---|---|---|---|
| User PIN | ASCII, case-sensitive | 6–15 chars | 3 attempts (default; configurable at initialisation) | User PIN blocked; SO-PIN can unblock |
| SO PIN | hex, case-insensitive on the wire | exactly 16 hex chars (= 8 bytes) | 15 attempts | Device permanently bricked |
User PIN content discipline (per the PIN sheet’s expanded guidance):
Allowed: A-Z a-z 0-9 - _ .
Avoid: shell metacharacters; layout-dependent symbols (@ ^ ~ : / + = differ between US and DE keyboards on the Live USB); visually-ambiguous glyphs (l 1 I, O 0, B 8).
SO PIN: generate with openssl rand -hex 8. Never use the sc-hsm-tool
factory default 3537363231383830.
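The format rules above can be checked before a PIN is ever sent to the card, which matters given the small retry counters. This is a sketch, not playbook text: the function names are hypothetical, and the User PIN check covers only length and the allowed character set (it does not try to flag visually-ambiguous glyphs, since those include ordinary digits).

```shell
# Hypothetical validators for the PIN rules in this section.
valid_user_pin() {
  pin=$1
  len=${#pin}
  [ "$len" -ge 6 ] && [ "$len" -le 15 ] || return 1
  case $pin in
    *[!A-Za-z0-9._-]*) return 1 ;;   # anything outside the allowed set
  esac
  return 0
}

valid_so_pin() {
  so=$1
  [ "${#so}" -eq 16 ] || return 1
  case $so in
    *[!0-9A-Fa-f]*) return 1 ;;      # must be hex only
  esac
  [ "$so" != 3537363231383830 ] || return 1   # reject the factory default
  return 0
}

valid_so_pin 3537363231383830 || echo "factory default SO PIN rejected"
```

Both functions take the PIN as an argument for brevity; in a real session the value would come from a variable filled by read -rs so it never appears on a command line.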
11. PKCS#11 slot indices are non-stable#
The original §4 had operators record slot: ____ alongside serial: ____ on the paper sheet. Slot indices re-enumerate on every plug-in
event, so they have no stable meaning across the ceremony — let
alone across years of safe storage. Slots are not recorded on the
sheet; only the serial is.
12. Single USB at install vs three USBs at ceremony#
The supplies USB ships pre-built (this is the one with MANIFEST.sha256
and the air-gap manifest-verification model). The boot USB is a
vanilla Ubuntu LTS Live image. The export USB is empty until the
ceremony’s §14, when it receives the public artefacts (cert + SSH CA
pubkeys + fingerprint).
Three USBs total. The original plan conflated the supplies and export USBs into one; the actual model needs them separate so the supplies USB stays sealed for re-verification at the next ceremony, and the export USB is the only stick that touches a networked machine afterwards.
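Re-verification of the sealed supplies USB could look like the sketch below. It rests on assumptions not stated in this document: that MANIFEST.sha256 is in the format written by sha256sum, with paths relative to the USB root, and that the mount point is passed in by the operator. The function name verify_manifest is hypothetical.

```shell
# Hypothetical re-verification: check every file listed in MANIFEST.sha256.
#   $1 = mount point of the supplies USB (operator-supplied)
verify_manifest() {
  # Subshell so the caller's working directory is left unchanged.
  ( cd "$1" && sha256sum -c MANIFEST.sha256 > /dev/null )
}

# Usage:
#   verify_manifest /path/to/supplies-usb && echo "manifest verified"
```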
What the playbook does not yet capture#
A short list of things that surfaced during the ceremony but didn’t warrant a procedure change:
The Nitrokey docs page on the supplies-USB shows token URI in UserPIN (label) form (PIN type first), but the actual OpenSC rendering on Ubuntu 24.04 is <label> (UserPIN) (label first). The Nitrokey doc is wrong / out-of-date for this driver.
gnutls-bin (provides p11tool) is not on the original supplies USB. prepare-ceremony-usb.sh now installs it; the 2026-04-29 ceremony got by without it.
libengine-pkcs11-openssl was not on the supplies USB but has been added to the script as the OpenSSL-engine fallback in case step-cli ever fully fails to drive PKCS#11 in a future ceremony.