Lessons learned: root ceremony 2026-04-29

This document captures the surprises and corrections from CCAT’s first HSM root ceremony. The playbook has been updated to bake these in, so a future ceremony following the playbook should not re-encounter them. This page exists so future operators understand the why behind the playbook’s odd-looking choices — and so a future incident-responder reading the audit trail knows what to expect.

Run details:

  • Date: 2026-04-29

  • Operators: two-person rule, ceremony witnessed

  • HSMs: Nitrokey HSM 2 ×2 (serials on the paper sheet in the safe)

  • Software pinned: step-cli, step-kms-plugin, opensc — versions in the supplies-USB VERSIONS.txt

  • Live media: Ubuntu 24.04 LTS Desktop, RAM-only

1. OpenSC virtual-token quirk: the label gets renamed

When you run sc-hsm-tool --initialize --label "ccat-root", the token label stored on the card is ccat-root. But OpenSC’s PKCS#11 module (opensc-pkcs11.so) presents one PKCS#11 token per PIN type, and renames them. The token surfaced for User-PIN access is named ccat-root (UserPIN) at the URI layer.

pkcs11-tool --token-label ccat-root does prefix-matching, so it works without the suffix. step-cli does exact matching, so its URI must include the full string with spaces and parens percent-encoded:

| char | escape |
| --- | --- |
| space | %20 |
| ( | %28 |
| ) | %29 |

That is, token=ccat-root%20%28UserPIN%29, and similarly token=ccat-intermediate%20%28UserPIN%29 for HSM #2.
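The escaping above can be sketched as a small POSIX shell helper. The function name pct_encode is hypothetical (it is not part of step-cli or OpenSC), and the sketch assumes an ASCII-only label. It escapes every character outside the RFC 3986 unreserved set, which covers the three characters in the table.

```shell
# pct_encode LABEL: percent-encode everything except A-Z a-z 0-9 . _ ~ -
# Hypothetical helper; assumes an ASCII-only token label.
pct_encode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # the string minus its first character
        c=${s%"$rest"}     # the first character
        s=$rest
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. space -> %20
        esac
    done
    printf '%s\n' "$out"
}

pct_encode 'ccat-root (UserPIN)'    # prints ccat-root%20%28UserPIN%29
```

The result can be dropped straight into a token= URI component. With serial= none of this is needed, which is part of why the playbook prefers it.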

Playbook decision: use serial=$SERIAL instead of token=. The device serial is recorded on the paper sheet in §4, doesn’t change on re-init, and needs no URL encoding. The token form still works for operators who prefer it; serial= is just cleaner.

2. step certificate create URI shape

This took multiple iterations. The shape that actually works:

step certificate create --profile root-ca --not-after 240960h \
    --no-password --insecure \
    --kms "pkcs11:module-path=...;serial=$SERIAL?pin-value=$PIN" \
    --key "pkcs11:id=01" \
    "CCAT Observatory Root CA" \
    root_ca.crt

Things that do not work:

  • --key with the full URI (pkcs11:token=...;object=...;id=%01) — step’s getPublicKey rejects it with “uri not matching.” --key "pkcs11:id=01" (minimal, just the object id) is the form that resolves.

  • pin-source=/path/to/pin-file in the URI — did not log into the token reliably in our run. pin-value=$PIN (with $PIN captured via read -rs PIN so the literal value never lands in shell history) is what worked.

  • --kms-key-id flag — does not exist in this step-cli version. Earlier docs reference it; ignore.

  • Three positionals for create (subject, crt-file, key-file) when the key already exists on the HSM: <key-file> is output-only. Use the --key flag and only two positionals.

The smallstep doc example pattern (two positionals + --kms + --key flags) is what works:

step certificate create --kms 'pkcs11:...' --key 'pkcs11:id=NNNN' '<subject>' <out-file>
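A minimal sketch of assembling the --kms URI from its parts, so the same string can be reused in every command of a phase. The helper name kms_uri is hypothetical, and the module path is taken as a parameter because it is elided above. The character check is an added assumption: it rejects a serial or PIN containing anything outside A-Z a-z 0-9 - _ . (the User PIN set this document allows), since other characters would need escaping inside a URI.

```shell
# kms_uri MODULE SERIAL PIN: print the --kms URI in the shape shown above.
# Hypothetical helper; MODULE is whatever module path the playbook specifies.
kms_uri() {
    module=$1
    serial=$2
    pin=$3
    # Refuse empty values.
    [ -n "$serial" ] && [ -n "$pin" ] || return 1
    # Refuse characters that would need URI escaping.
    case $serial$pin in
        *[!A-Za-z0-9._-]*)
            echo "kms_uri: unsafe character in serial or PIN" >&2
            return 1 ;;
    esac
    printf 'pkcs11:module-path=%s;serial=%s?pin-value=%s\n' "$module" "$serial" "$pin"
}
```

In the ceremony shell this might be used as KMS=$(kms_uri "$MODULE" "$SERIAL" "$PIN") and then passed as --kms "$KMS", where MODULE is a hypothetical variable holding the module path from the playbook.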

3. step certificate create reflex password prompt

When step generates a self-signed cert from an HSM key, it still prompts to encrypt the (non-existent) output key file. Suppress with --no-password --insecure. The --insecure flag is just step’s required acknowledgement that --no-password was a deliberate choice; it does not weaken the resulting cert. The cert is public PEM either way.

4. step certificate sign argument order

The sign command’s positional handling is opposite to create:

| Command | 3rd positional <key-file> | Existing-key URI goes |
| --- | --- | --- |
| step certificate create | output (write a new key here) | --key <URI> flag |
| step certificate sign | issuing CA’s existing private key | 3rd positional, directly |

So sign uses three positionals with the URI in slot 3, no --key flag at all.

But also: all flags must come BEFORE all positionals in this step-cli version. With positionals first, step rejected root_ca.crt with “scheme is missing”: it was applying URI parsing to the cert positional because a --kms flag followed.

Adding file://$PWD/root_ca.crt as a workaround failed with “scheme not expected”. The actual fix is to reorder so flags come first:

step certificate sign \
    --profile intermediate-ca \
    --not-after 87600h \
    --kms "pkcs11:module-path=...;serial=$SERIAL?pin-value=$PIN" \
    intermediate.csr \
    root_ca.crt \
    "pkcs11:id=01" \
    > intermediate_ca.crt

root_ca.crt is a bare relative path, no file:// prefix.

5. step certificate sign writes to stdout

Because step certificate sign writes the certificate to stdout, there is no output-file password prompt to suppress. --no-password --insecure are therefore not needed (and not used) in §12, unlike §7/§10.

6. SSH CA pubkey export: version-portable conversion

step-kms-plugin key <URI> outputs PKCS#8 PEM by default. The --format ssh flag was added in later step-kms-plugin releases and was missing from the version on the supplies USB. The portable path:

step-kms-plugin key "pkcs11:..." > /tmp/.ca.pem
step crypto key format --ssh /tmp/.ca.pem > ca.pub
rm /tmp/.ca.pem

step crypto key format --ssh is part of step-cli (already on the USB). For older step-cli that lacks it, fall back to OpenSSH’s ssh-keygen -i -m PKCS8 -f <pem> — same result, different binary.

The expected output is a single line starting with ecdsa-sha2-nistp384 AAAA... for the P-384 keys we use.
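The expected-output check can be sketched as a shell function. The name check_ssh_ca_pub is hypothetical and not part of step-cli. It only tests the shape described above (exactly one line, starting with ecdsa-sha2-nistp384 AAAA) and does not validate the key material itself.

```shell
# check_ssh_ca_pub FILE: succeed only if FILE is a single line that starts
# with "ecdsa-sha2-nistp384 AAAA". Hypothetical helper; shape check only.
check_ssh_ca_pub() {
    file=$1
    [ -r "$file" ] || { echo "$file: not readable" >&2; return 1; }
    lines=$(grep -c '' "$file")   # counts lines, including an unterminated last line
    if [ "$lines" -ne 1 ]; then
        echo "$file: expected 1 line, found $lines" >&2
        return 1
    fi
    case $(cat "$file") in
        'ecdsa-sha2-nistp384 AAAA'*) return 0 ;;
        *) echo "$file: not an ecdsa-sha2-nistp384 public key line" >&2
           return 1 ;;
    esac
}
```

Running check_ssh_ca_pub ca.pub after the conversion would catch the case where the PEM was copied to ca.pub unconverted.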

7. Stick-swap gate procedure

The pre-ceremony plan was to leave both HSMs plugged in and rely on token labels to address each. That’s brittle because:

  • Before §5 (init), token labels don’t exist yet.

  • After §5/§8 (inits), sc-hsm-tool --initialize and pkcs11-tool without explicit slot filtering pick whichever HSM pcscd enumerated first, so an operator could initialise the wrong stick as root.

  • Cross-talk between sticks during a multi-step session is silent.

The playbook now uses a single-stick-at-a-time discipline. At each phase boundary, a “Gate: only HSM #N plugged in” marker tells the operator to:

  1. Unplug whichever HSM is currently in.

  2. Plug in only HSM #N.

  3. Capture the serial: SERIAL=$(pkcs11-tool ... | awk '/serial num/{print $NF; exit}')

  4. Operator reads $SERIAL aloud (serials are non-secret); witness compares against the paper sheet’s HSM #N row.

  5. Both confirm “HSM #N, serial matches” before continuing.

The captured $SERIAL shell variable then drives every subsequent URI in the phase via serial=$SERIAL, eliminating ambiguity.
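A stricter variant of the serial capture in step 3 can be sketched as follows. It is an illustration, not the playbook's command: the function name single_serial is hypothetical, and it assumes the slot listing contains one "serial num" line per token, as the awk pattern in step 3 implies. Instead of taking the first match, it fails when zero or more than one distinct serial is visible, which turns an accidental second stick into a hard stop.

```shell
# single_serial: read pkcs11-tool slot-listing text on stdin and print the
# token serial, but only if exactly one distinct serial is present.
# Hypothetical helper; exits non-zero when 0 or 2+ distinct serials are seen.
single_serial() {
    awk '
        /serial num/ && !($NF in seen) { seen[$NF] = 1; n++; serial = $NF }
        END { if (n != 1) exit 1; print serial }
    '
}
```

It would replace the awk stage of the step 3 pipeline. If the command substitution fails, $SERIAL stays empty and the gate should not be passed.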

8. Phase grouping reduces 7 swaps to 2

The pre-ceremony plan was to alternate HSMs for each step:

init#1, init#2, keygen#1, keygen#2, cert#1, csr#2, sign#1, ssh-keys#2

That’s 7 swaps. Each swap is a chance to plug in the wrong stick or confuse pcscd. The dependency graph allows grouping:

  • Phase A — HSM #1 (root): §5 init, §6 keygen, §7 root cert

  • Phase B — HSM #2 (intermediate): §8 init, §9 keygen, §10 CSR, §11 SSH CA keys

  • Phase C — HSM #1 (root): §12 sign intermediate CSR

  • Phase D — paper: §13 fingerprint (no HSM)

2 swaps total. The playbook is structured around these phases.
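The swap counts can be checked with a few lines of shell. In this sketch (the function name count_swaps is hypothetical), each argument is the HSM number used by one step, in order, and a swap is counted whenever the number changes. Phase D is omitted because it uses no HSM.

```shell
# count_swaps HSM...: count how often consecutive steps use a different HSM.
count_swaps() {
    prev=
    swaps=0
    for hsm in "$@"; do
        if [ -n "$prev" ] && [ "$hsm" != "$prev" ]; then
            swaps=$((swaps + 1))
        fi
        prev=$hsm
    done
    echo "$swaps"
}

count_swaps 1 2 1 2 1 2 1 2    # pre-ceremony plan: prints 7
count_swaps 1 1 1 2 2 2 2 1    # phases A (3 steps), B (4 steps), C (1 step): prints 2
```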

9. Read-aloud rule scoped to non-secrets only

Initial threat model: “read everything aloud so the witness verifies.” That leaks PINs, SO-PINs, and the root fingerprint to ambient microphones (phones, smartwatches, voice assistants in the room).

Refined rule:

| Value | Verbalisation rule | Reason |
| --- | --- | --- |
| Device serial | Spoken aloud at every gate | Non-secret (printed on the Nitrokey case); the witness needs to hear it to cross-check that the right stick is plugged in. |
| User PIN | Silent; visual verification only | Audio-leak risk; the User PIN retry counter allows only 3 tries. |
| SO PIN | Silent; visual verification only | Same; the SO-PIN retry counter allows 15 attempts, after which the device is permanently bricked. |
| Root fingerprint | Silent; visual verification on screen vs paper | Same audio-leak hygiene; even though the fingerprint is eventually public, leaking it before distribution gives an attacker the value to substitute. |

10. PIN format, length, and retry counters

SC-HSM 2 (Nitrokey HSM 2) PIN behaviour, set during sc-hsm-tool --initialize:

| PIN | Format | Length | Retry counter | After exhaustion |
| --- | --- | --- | --- | --- |
| User PIN (--pin) | ASCII, case-sensitive | 6–15 chars | 3 attempts (default; configurable via --pin-retry) | User PIN blocked; SO-PIN can unblock |
| SO PIN (--so-pin) | hex, case-insensitive on the wire | exactly 16 hex chars (= 8 bytes) | 15 attempts | Device permanently bricked |

User PIN content discipline (per the PIN sheet’s expanded guidance):

  • Allowed: A-Z a-z 0-9 - _ .

  • Avoid: shell metacharacters; layout-dependent symbols (@ ^ ~ : / + = differ between US and DE keyboards on the Live USB); visually-ambiguous glyphs (l 1 I, O 0, B 8).

SO PIN: generate with openssl rand -hex 8. Never use the sc-hsm-tool factory default 3537363231383830.
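The PIN rules above can be sketched as two pre-flight checks to run before sc-hsm-tool --initialize. Both function names are hypothetical. Treating the "avoid" glyphs as a hard rejection for the User PIN is an assumption of this sketch; the guidance above only says to avoid them.

```shell
# valid_user_pin PIN: 6-15 chars from A-Z a-z 0-9 - _ . and none of the
# visually ambiguous glyphs (l 1 I O 0 B 8). Hypothetical helper.
valid_user_pin() {
    pin=$1
    len=${#pin}
    [ "$len" -ge 6 ] && [ "$len" -le 15 ] || return 1
    case $pin in
        *[!A-Za-z0-9._-]*) return 1 ;;   # outside the allowed set
        *[l1IO0B8]*) return 1 ;;         # ambiguous glyph (assumed hard reject)
    esac
    return 0
}

# valid_so_pin PIN: exactly 16 hex chars and not the factory default.
valid_so_pin() {
    so=$1
    [ "${#so}" -eq 16 ] || return 1
    case $so in
        *[!0-9A-Fa-f]*) return 1 ;;      # non-hex character
        3537363231383830) return 1 ;;    # sc-hsm-tool factory default
    esac
    return 0
}
```

A value produced by openssl rand -hex 8 always has the right length and alphabet for valid_so_pin, so for a generated SO PIN the function mainly guards against a mistyped or default value.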

11. PKCS#11 slot indices are non-stable

The original §4 had operators record slot: ____ alongside serial: ____ on the paper sheet. Slot indices re-enumerate on every plug-in event, so they have no stable meaning across the ceremony — let alone across years of safe storage. Slots are not recorded on the sheet; only the serial is.

12. Single USB at install vs three USBs at ceremony

The supplies USB ships pre-built (this is the one with MANIFEST.sha256 and the air-gap manifest-verification model). The boot USB is a vanilla Ubuntu LTS Live image. The export USB is empty until the ceremony’s §14, when it receives the public artefacts (cert + SSH CA pubkeys + fingerprint).

Three USBs total. The original plan conflated the supplies and export USBs into one; the actual model needs them separate so the supplies USB stays sealed for re-verification at the next ceremony, and the export USB is the only stick that touches a networked machine afterwards.

What the playbook does not yet capture

A short list of things that surfaced during the ceremony but didn’t warrant a procedure change:

  • The Nitrokey docs page on the supplies-USB shows token URI in UserPIN (label) form (PIN type first), but the actual OpenSC rendering on Ubuntu 24.04 is <label> (UserPIN) (label first). The Nitrokey doc is wrong / out-of-date for this driver.

  • gnutls-bin (provides p11tool) is not on the original supplies USB. prepare-ceremony-usb.sh now installs it; the 2026-04-29 ceremony got by without it.

  • libengine-pkcs11-openssl was not on the supplies USB either; it has been added to the script as the OpenSSL-engine fallback in case step-cli ever fully fails to drive PKCS#11 in a future ceremony.