# Lessons learned — root ceremony 2026-04-29

This document captures the surprises and corrections from CCAT's first
HSM root ceremony. The [playbook](playbook.md) has been updated to bake
these in, so a future ceremony following the playbook should not
re-encounter them. This page exists so future operators understand the
*why* behind the playbook's odd-looking choices — and so a future
incident-responder reading the audit trail knows what to expect.

Run details:
- **Date:** 2026-04-29
- **Operators:** two-person rule, ceremony witnessed
- **HSMs:** Nitrokey HSM 2 ×2 (serials on the paper sheet in the safe)
- **Software pinned:** step-cli, step-kms-plugin, opensc — versions in the supplies-USB `VERSIONS.txt`
- **Live media:** Ubuntu 24.04 LTS Desktop, RAM-only

## 1. OpenSC virtual-token quirk: the label gets renamed

When you run `sc-hsm-tool --initialize --label "ccat-root"`, the
**token label stored on the card** is `ccat-root`. But OpenSC's PKCS#11
module (`opensc-pkcs11.so`) presents *one PKCS#11 token per PIN type*,
and **renames** them. The token surfaced for User-PIN access is named
**`ccat-root (UserPIN)`** at the URI layer.

`pkcs11-tool --token-label ccat-root` does prefix-matching, so it
works without the suffix. `step-cli` does **exact** matching, so its
URI must include the full string with spaces and parens
percent-encoded:

| char | escape |
|---|---|
| space | `%20` |
| `(` | `%28` |
| `)` | `%29` |

That is, `token=ccat-root%20%28UserPIN%29`, and similarly
`token=ccat-intermediate%20%28UserPIN%29` for HSM #2.
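
For operators who do use the `token=` form, the encoding in the table can be produced mechanically rather than by hand. A minimal sketch, assuming an ASCII-only label; `pct_encode` is a hypothetical helper name, not part of step-cli or OpenSC:

```shell
# Hypothetical helper: percent-encode every character outside the
# RFC 3986 unreserved set (A-Z a-z 0-9 . _ ~ -). Assumes ASCII input.
# The body runs in a subshell so its variables do not leak.
pct_encode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # the string minus its first character
        c=${s%"$rest"}     # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. space becomes %20
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

pct_encode "ccat-root (UserPIN)"   # prints ccat-root%20%28UserPIN%29
```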

**Playbook decision:** use `serial=$SERIAL` instead of `token=`. The
device serial is recorded on the paper sheet in §4, doesn't change on
re-init, and needs no URL encoding. The token form still works for
operators who prefer it; `serial=` is just cleaner.

## 2. `step certificate create` URI shape

This took multiple iterations. The shape that actually works:

```
step certificate create --profile root-ca --not-after 240960h \
    --no-password --insecure \
    --kms "pkcs11:module-path=...;serial=$SERIAL?pin-value=$PIN" \
    --key "pkcs11:id=01" \
    "CCAT Observatory Root CA" \
    root_ca.crt
```

Things that **do not work**:

- `--key` with the full URI (`pkcs11:token=...;object=...;id=%01`) — step's `getPublicKey` rejects it with "uri not matching." `--key "pkcs11:id=01"` (minimal, just the object id) is the form that resolves.
- `pin-source=/path/to/pin-file` in the URI — did not log into the token reliably in our run. `pin-value=$PIN` (with `$PIN` captured via `read -rs PIN` so the literal value never lands in shell history) is what worked.
- `--kms-key-id` flag — does not exist in this step-cli version. Earlier docs reference it; ignore.
- Three positionals for `create` (subject, crt-file, key-file) when the key already exists on the HSM — `<key-file>` is *output-only*. Use the `--key` flag and only two positionals.

The smallstep doc example pattern (two positionals + `--kms` + `--key`
flags) is what works:

```
step certificate create --kms 'pkcs11:...' --key 'pkcs11:id=NNNN' '<subject>' <out-file>
```
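
The `--kms` URI has three variable parts: module path, serial, and PIN. A small sketch of assembling it in one place so every command in a phase uses the identical string. `kms_uri` and `$MODULE` are hypothetical names introduced here; the real module path is whatever the playbook specifies.

```shell
# Hypothetical helper: build the --kms URI in the shape that worked in our run.
#   $1 = path to the PKCS#11 module (placeholder; take it from the playbook)
#   $2 = device serial captured at the gate
#   $3 = User PIN
kms_uri() {
    printf 'pkcs11:module-path=%s;serial=%s?pin-value=%s\n' "$1" "$2" "$3"
}

# Interactive use, shown as comments because it needs an operator and an HSM:
#   read -rs PIN                        # value never lands in shell history
#   KMS=$(kms_uri "$MODULE" "$SERIAL" "$PIN")
#   step certificate create ... --kms "$KMS" --key "pkcs11:id=01" ...
```

This does no escaping, which is only safe because the User PIN character set recommended in §10 contains nothing that is special inside a URI.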

## 3. `step certificate create` reflex password prompt

When step generates a self-signed cert from an HSM key, it still
prompts to encrypt the (non-existent) output key file. Suppress with
`--no-password --insecure`. The `--insecure` flag is just step's
required acknowledgement that `--no-password` was a deliberate choice;
it does not weaken the resulting cert. The cert is public PEM either
way.

## 4. `step certificate sign` argument order

The sign command's positional handling is opposite to `create`:

| Command | 3rd positional `<key-file>` | Existing-key URI goes |
|---|---|---|
| `step certificate create` | output (write a new key here) | `--key <URI>` flag |
| `step certificate sign` | issuing CA's existing private key | 3rd positional, directly |

So sign uses three positionals with the URI in slot 3, no `--key`
flag at all.

But also: **all flags must come BEFORE all positionals** in this
step-cli version. With positionals first, step rejected
`root_ca.crt` with `scheme is missing` — it was applying URI parsing
to the cert positional because a `--kms` flag followed.

Adding `file://$PWD/root_ca.crt` as a workaround failed with `scheme
not expected`. The actual fix is to reorder so flags come first:

```
step certificate sign \
    --profile intermediate-ca \
    --not-after 87600h \
    --kms "pkcs11:module-path=...;serial=$SERIAL?pin-value=$PIN" \
    intermediate.csr \
    root_ca.crt \
    "pkcs11:id=01" \
    > intermediate_ca.crt
```

`root_ca.crt` is a bare relative path, no `file://` prefix.
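
The `--not-after` values on this page are given in hours, which are hard to sanity-check by eye. A quick conversion sketch (`hours_to_days` is a hypothetical helper, not a step-cli feature):

```shell
# Hypothetical helper: convert a validity period in hours to whole days.
# Integer division is exact here because both values are multiples of 24.
hours_to_days() {
    echo "$(( $1 / 24 ))"
}

hours_to_days 240960   # root cert:         prints 10040 (about 27.5 years)
hours_to_days 87600    # intermediate cert: prints 3650  (ten 365-day years)
```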

## 5. `step certificate sign` writes to stdout

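`step certificate sign` writes the signed certificate to stdout and
never creates a key file, so there is no output-file password prompt to
suppress. `--no-password --insecure` are therefore not needed (and not
used) in §12, unlike §7/§10.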
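
Because the certificate arrives on stdout, a plain `>` redirect creates (or truncates) the output file even when the command fails, which can leave an empty `intermediate_ca.crt` behind. One possible guard, sketched here with a hypothetical `write_on_success` wrapper (not part of step-cli), is to capture stdout in a temp file and rename it only on success:

```shell
# Hypothetical wrapper: run a command, keep its stdout only if it succeeds.
#   $1 = final output file, remaining args = the command to run
# The body runs in a subshell, so `exit` only leaves the function.
write_on_success() (
    out=$1
    shift
    tmp=$(mktemp "$out.XXXXXX") || exit 1
    if "$@" > "$tmp"; then
        mv -- "$tmp" "$out"
    else
        status=$?
        rm -f -- "$tmp"      # never leave a partial or empty cert behind
        exit "$status"
    fi
)

# Intended use (needs the HSM, so shown as a comment):
#   write_on_success intermediate_ca.crt step certificate sign <flags> <positionals>
```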

## 6. SSH CA pubkey export — version-portable conversion

`step-kms-plugin key URI` outputs PKCS#8 PEM by default. The
`--format ssh` flag was added in later step-kms-plugin releases and
was missing from the version on the supplies USB. The portable path:

```
step-kms-plugin key "pkcs11:..." > /tmp/.ca.pem
step crypto key format --ssh /tmp/.ca.pem > ca.pub
rm /tmp/.ca.pem
```

`step crypto key format --ssh` is part of step-cli (already on the
USB). For older step-cli that lacks it, fall back to OpenSSH's
`ssh-keygen -i -m PKCS8 -f <pem>` — same result, different binary.

The expected output is a single line starting with
`ecdsa-sha2-nistp384 AAAA...` for the P-384 keys we use.
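
That expectation is easy to check before the file goes onto the export USB. A minimal sketch; `check_ssh_ca_pub` is a hypothetical helper, and it only checks the shape of the file, not that the key is the right one:

```shell
# Hypothetical check: the file must be exactly one line and start with the
# P-384 SSH key type followed by the base64 "AAAA" prefix.
check_ssh_ca_pub() (
    file=$1
    # grep -c '' counts lines, including a last line without a newline.
    lines=$(grep -c '' "$file") || exit 1
    [ "$lines" -eq 1 ] && grep -q '^ecdsa-sha2-nistp384 AAAA' "$file"
)

# Intended use:
#   check_ssh_ca_pub ca.pub || echo "ca.pub does not look like an SSH P-384 key" >&2
```

A PEM file that slipped through unconverted fails both conditions: it has several lines and starts with `-----BEGIN`.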

## 7. Stick-swap gate procedure

The pre-ceremony plan was to leave both HSMs plugged in and rely on
token labels to address each. That's brittle because:

- Before §5 (init), token labels don't exist yet.
- After the inits (§5 and §8), `sc-hsm-tool --initialize` and `pkcs11-tool`
  without explicit slot filtering pick whichever HSM `pcscd` enumerated
  first, so an operator could initialise the wrong stick as root.
- Cross-talk between sticks during a multi-step session is silent.

The playbook now uses a **single-stick-at-a-time** discipline. At each
phase boundary, a "Gate: only HSM #N plugged in" marker tells the
operator to:

1. Unplug whichever HSM is currently in.
2. Plug in **only** HSM #N.
3. Capture the serial: `SERIAL=$(pkcs11-tool ... | awk '/serial num/{print $NF; exit}')`
4. Operator reads `$SERIAL` aloud (serials are non-secret); witness
   compares against the paper sheet's HSM #N row.
5. Both confirm "HSM #N, serial matches" before continuing.

The captured `$SERIAL` shell variable then drives every subsequent
URI in the phase via `serial=$SERIAL`, eliminating ambiguity.
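
The gate steps could be backed by a small script check in addition to the spoken confirmation, never instead of it. A sketch under stated assumptions: `extract_serial` and `gate_check` are hypothetical helper names, the `pkcs11-tool` arguments stay elided as in step 3, and the expected serial is typed in from the paper sheet by the witness.

```shell
# Hypothetical helper: pull the serial out of pkcs11-tool's listing on stdin,
# using the same awk filter as step 3.
extract_serial() {
    awk '/serial num/{print $NF; exit}'
}

# Hypothetical helper: compare the captured serial with the paper sheet.
#   $1 = serial read from the device, $2 = serial typed from the sheet
gate_check() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        printf 'GATE OK: serial %s matches\n' "$1"
    else
        printf 'GATE FAIL: device=%s sheet=%s\n' "${1:-<none>}" "$2" >&2
        return 1
    fi
}

# Intended use at a gate (needs the HSM, so shown as comments):
#   SERIAL=$(pkcs11-tool ... | extract_serial)
#   gate_check "$SERIAL" "$SHEET_SERIAL" || echo "STOP: wrong stick" >&2
```

An empty capture (no stick plugged in, or `pcscd` not seeing it) fails the gate rather than silently matching an empty expected value.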

## 8. Phase grouping reduces 7 swaps to 2

The pre-ceremony plan was to alternate HSMs for each step:

```
init#1, init#2, keygen#1, keygen#2, cert#1, csr#2, sign#1, ssh-keys#2
```

That's 7 swaps. Each swap is a chance to plug in the wrong stick or
confuse `pcscd`. The dependency graph allows grouping:

- **Phase A — HSM #1 (root):** §5 init, §6 keygen, §7 root cert
- **Phase B — HSM #2 (intermediate):** §8 init, §9 keygen, §10 CSR, §11 SSH CA keys
- **Phase C — HSM #1 (root):** §12 sign intermediate CSR
- **Phase D — paper:** §13 fingerprint (no HSM)

**2 swaps total.** The playbook is structured around these phases.
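
The swap counts can be checked by listing which HSM each step needs and counting the changes. A small sketch; `count_swaps` is a hypothetical helper written for this page:

```shell
# Hypothetical helper: given the HSM number needed by each step in order,
# count how often the plugged-in stick has to change.
count_swaps() (
    swaps=0
    prev=
    for hsm in "$@"; do
        if [ -n "$prev" ] && [ "$hsm" != "$prev" ]; then
            swaps=$((swaps + 1))
        fi
        prev=$hsm
    done
    printf '%s\n' "$swaps"
)

count_swaps 1 2 1 2 1 2 1 2   # pre-ceremony alternating plan: prints 7
count_swaps 1 1 1 2 2 2 2 1   # phases A, B, C (§5 through §12): prints 2
```

Phase D needs no HSM, so it adds no swap.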

## 9. Read-aloud rule scoped to non-secrets only

Initial threat model: "read everything aloud so the witness verifies."
That leaks PINs, SO-PINs, and the root fingerprint to ambient
microphones (phones, smartwatches, voice assistants in the room).

Refined rule:

| Value | Verbalisation rule | Reason |
|---|---|---|
| Device serial | **Spoken aloud** at every gate | Non-secret (printed on Nitrokey case); witness needs to hear it to cross-check the right stick is plugged in. |
| User PIN | Silent, visual verification only | Audio leak risk; the User PIN retry counter allows only 3 attempts. |
| SO PIN | Silent, visual verification only | Same audio-leak risk; the SO-PIN retry counter allows 15 attempts, after which the device is permanently bricked. |
| Root fingerprint | Silent, visual verification on screen vs paper | Same audio-leak hygiene; even though the fingerprint is *eventually* public, leaking it before distribution gives an attacker the value to substitute. |

## 10. PIN format, length, and retry counters

SC-HSM 2 (Nitrokey HSM 2) PIN behaviour, set during
`sc-hsm-tool --initialize`:

| PIN | Format | Length | Retry counter | After exhaustion |
|---|---|---|---|---|
| User PIN (`--pin`) | ASCII, case-sensitive | 6–15 chars | 3 attempts (default; configurable via `--pin-retries`) | User PIN blocked; SO-PIN can unblock |
| SO PIN (`--so-pin`) | hex, case-insensitive on the wire | exactly 16 hex chars (= 8 bytes) | 15 attempts | Device permanently bricked |

User PIN content discipline (per the PIN sheet's expanded guidance):
- **Allowed:** `A-Z a-z 0-9 - _ .`
- **Avoid:** shell metacharacters; layout-dependent symbols
  (`@ ^ ~ : / + =` differ between US and DE keyboards on the Live USB);
  visually-ambiguous glyphs (`l 1 I`, `O 0`, `B 8`).

SO PIN: generate with `openssl rand -hex 8`. Never use the sc-hsm-tool
factory default `3537363231383830`.
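
The format rules above can be checked before a PIN is committed to the card, which avoids burning retry attempts on a typo later. A minimal sketch; `valid_user_pin` and `valid_so_pin` are hypothetical helpers, and they enforce only the allowed character set and lengths, not the "avoid" guidance on ambiguous glyphs:

```shell
# Hypothetical check: User PIN is 6-15 characters from A-Z a-z 0-9 - _ .
valid_user_pin() (
    LC_ALL=C                               # plain ASCII ranges only
    case $1 in
        *[!A-Za-z0-9._-]*) exit 1 ;;       # contains a disallowed character
    esac
    [ "${#1}" -ge 6 ] && [ "${#1}" -le 15 ]
)

# Hypothetical check: SO PIN is exactly 16 hex characters and is not the
# sc-hsm-tool factory default.
valid_so_pin() (
    LC_ALL=C
    case $1 in
        3537363231383830) exit 1 ;;        # factory default, never use
        *[!0-9A-Fa-f]*)   exit 1 ;;        # contains a non-hex character
    esac
    [ "${#1}" -eq 16 ]
)

# Intended use (values are typed silently, per the read-aloud rule):
#   valid_user_pin "$PIN"    || echo "User PIN breaks the format rules" >&2
#   valid_so_pin   "$SO_PIN" || echo "SO PIN breaks the format rules"   >&2
```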

## 11. PKCS#11 slot indices are non-stable

The original §4 had operators record `slot: ____` alongside `serial:
____` on the paper sheet. Slot indices re-enumerate on every plug-in
event, so they have **no stable meaning** across the ceremony — let
alone across years of safe storage. Slots are not recorded on the
sheet; only the serial is.

## 12. Single USB at install vs three USBs at ceremony

The supplies USB ships pre-built (this is the one with `MANIFEST.sha256`
and the air-gap manifest-verification model). The boot USB is a
vanilla Ubuntu LTS Live image. The export USB is empty until the
ceremony's §14, when it receives the public artefacts (cert + SSH CA
pubkeys + fingerprint).

Three USBs total. The original plan conflated the supplies and export
USBs into one; the actual model needs them separate so the supplies
USB stays sealed for re-verification at the *next* ceremony, and the
export USB is the only stick that touches a networked machine
afterwards.
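
Re-verifying the sealed supplies USB could look roughly like the following. This is a sketch that assumes `MANIFEST.sha256` sits at the top of the stick and is in `sha256sum` format with paths relative to that directory; neither detail is confirmed here, and `verify_manifest` is a hypothetical helper name.

```shell
# Hypothetical helper: check every file listed in MANIFEST.sha256.
#   $1 = mount point of the supplies USB
# --strict rejects malformed manifest lines; --quiet prints only failures.
verify_manifest() (
    cd "$1" || exit 1
    sha256sum --check --strict --quiet MANIFEST.sha256
)

# Intended use (mount point is a placeholder):
#   verify_manifest /path/to/supplies-usb && echo "supplies USB verified"
```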

## What the playbook does not yet capture

A short list of things that surfaced during the ceremony but didn't
warrant a procedure change:

- The Nitrokey docs page on the supplies USB shows the token URI in
  `UserPIN (label)` form (PIN type first), but the actual OpenSC
  rendering on Ubuntu 24.04 is `<label> (UserPIN)` (label first). The
  Nitrokey doc is wrong / out-of-date for this driver.
- `gnutls-bin` (provides `p11tool`) was not on the original supplies
  USB. `prepare-ceremony-usb.sh` now installs it; the 2026-04-29
  ceremony got by without it.
- `libengine-pkcs11-openssl` was not on the supplies USB either. It has
  been added to the script as an OpenSSL-engine fallback in case
  step-cli ever fails outright to drive PKCS#11 in a future ceremony.