Skip to content

[Guardian] Add rotate_kps: ceremony-mode KP rotation#592

Open
mskd12 wants to merge 3 commits into
mainfrom
deepak/iop-85-kp-rotation-v2
Open

[Guardian] Add rotate_kps: ceremony-mode KP rotation#592
mskd12 wants to merge 3 commits into
mainfrom
deepak/iop-85-kp-rotation-v2

Conversation

@mskd12
Copy link
Copy Markdown
Contributor

@mskd12 mskd12 commented May 22, 2026

Summary

Adds the ceremony-mode rotate_kps gRPC handler that rotates the KP set holding the existing BTC key, without changing the key itself.

Each current KP submits a RotateKpsRequest carrying their HPKE-encrypted old share plus the desired new KP set (OpenPGP certs + new N/T). Once T submissions arrive with matching state, the enclave reconstructs the BTC key in memory, re-splits it with fresh randomness, encrypts the new shares to the new KPs' OpenPGP certs, and writes a CeremonyLogMessage at sharing_seq = prev + 1 to ceremony/ in S3. The enclave never persists the reconstructed key. Asymmetric (old_n, old_t)(new_n, new_t) is supported.

Built on the OpenPGP/YubiKey share model (KPs hold shares on YubiKeys): new shares are delivered as ASCII-armored PGP messages, same as setup_new_key.

Notable design choices

  • One ceremony per enclave instance. setup_new_key (genesis) and rotate_kps (rotation) are the two ceremony-mode operations, and an enclave performs exactly one. A shared ceremony_complete guard (generalized from setup_new_key's lock) is held across each flow: the two can't interleave, and once a rotation finalizes, the remaining old KPs' submissions are rejected rather than re-splitting the key again at the same sharing_seq. On completion the accumulated shares are cleared from the scratchpad.
  • Current-side state is operator-supplied at operator_init (commitments, N, T, sharing_seq via SecretSharingInstance). KPs don't resubmit it in RotateKpsRequest; the enclave reads its stored instance. KPs verify off-enclave (S3 + GuardianInfo) that the operator gave the right state — that pre-flight lives in the KP CLI (deferred, see below).
  • RotateKpsState has private fields + a validating new(): enforces certs.len() == new_num_shares, valid (N, T), and distinct certs. It also sorts the new cert set into a canonical order, so the state digest — which all T old KPs must agree on, and which is bound as HPKE AAD on the encrypted old share — is independent of the order each KP lists the certs in.
  • combine_shares returns k256::SecretKey, with a k256_sk_to_btc_keypair helper for the BTC callers.
  • ShareCommitments::verify_share method and Enclave::check_or_set_state_hash consolidate logic shared with provisioner_init.
  • "Ceremony" naming. The secret-sharing log is now CeremonyLogMessage / LogMessage::Ceremony, written to ceremony/ in S3; "setup mode" is now "ceremony mode", including the env var SETUP_MODECEREMONY_MODE. (SecretSharingInstance, the scheme, keeps its name.)

⚠️ Deployment: the SETUP_MODECEREMONY_MODE env-var rename must be applied to any infra/runbook that sets it, or a ceremony enclave silently boots in normal mode.

Testing

  • rotate.rs flow tests (against a mock S3 logger that captures every PutObject): happy path (symmetric + asymmetric N/T), duplicate share id, state-hash mismatch (panics), share not matching commitments, before operator_init, and rejection of submissions after the rotation completes. Happy-path assertions check the captured ceremony/ log — sharing_seq=1, new N/T, one PGP-armored share per new KP.
  • RotateKpsState::new unit tests (wrong cert count, duplicate certs, digest order-independence) live in hashi-types next to the struct.
  • TODO: strengthen the happy-path tests (and setup_new_key's) to decrypt the armored shares with the new KPs' PGP keys and verify they reconstruct the original BTC key — needs a PGP-decrypt test helper.

Follow-ups

  • KP CLI that builds + submits RotateKpsRequest (off-enclave pre-flight checks). [IOP-409]
  • Don't persist encrypted shares in S3 — return to the operator or serve via get_guardian_info.

Test plan

  • hashi-guardian + hashi-types unit tests pass locally
  • CI green

🤖 Generated with Claude Code

@mskd12 mskd12 requested a review from bmwill as a code owner May 22, 2026 16:51
@mskd12 mskd12 requested review from benr-ml and dlukegordon May 22, 2026 16:57
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch from 964eb7a to 2505ecb Compare May 22, 2026 16:58
mskd12 added a commit that referenced this pull request May 22, 2026
Prep for KP rotation. No new RPC surface.
  
  Moves the secret-sharing scheme from compile-time constants
  (`THRESHOLD = 3`, `NUM_OF_SHARES = 5`) to runtime-configurable
  operator-supplied state. N and T now flow through
  `OperatorInitRequest.secret_sharing_config`, bundled with the
  share commitments and a new `sharing_seq` (versions instances:
  setup writes 0, future rotations append prev+1).

  Adds a new flat `secret_sharing/` S3 log stream — entries are
  `SecretSharingLogMessage { encrypted_shares, secret_sharing_config }`
  written by `setup_new_key` today and by `rotate_kps` in the
  follow-up. KPs read the lex-last entry to learn the current
  authoritative scheme and fetch their encrypted shares.

  `GuardianInfo` returns the full `SecretSharingConfig` so KPs can
  cross-check the enclave's stored state against S3 off-enclave.
  Crypto tests parameterized over (2,2), (3,2), (5,3), (10,7).
  Object locks unified at 7 days; session IDs truncated to 16
  hex chars in S3 keys.

  Stacked PR #592 adds `rotate_kps` on top.
Base automatically changed from deepak/iop-85-kp-prep to main May 22, 2026 18:53
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch 2 times, most recently from 3de5d7e to 4a399b9 Compare May 22, 2026 19:13
@mskd12 mskd12 mentioned this pull request May 22, 2026
3 tasks
@mskd12 mskd12 marked this pull request as draft May 22, 2026 21:57
Comment thread crates/hashi-types/src/guardian/mod.rs Outdated
pub struct RotateKpsState {
/// HPKE public keys for the new KP set. Length equals `new_num_shares`.
new_kp_pubkeys: Vec<EncPubKeyBytes>,
new_num_shares: usize,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed, isn't that new_kp_pubkeys.len()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is technically redundant. i prefer to have it tho for ease of understanding.

Copy link
Copy Markdown
Contributor

@benr-ml benr-ml May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, pls document that reason

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the introduction of SecretSharingParams, the num_shares field has gone a level deeper. do you still want me to add some doc?

Comment thread crates/hashi-types/src/guardian/mod.rs Outdated

impl RotateKpsState {
pub fn new(
new_kp_pubkeys: Vec<EncPubKeyBytes>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - sort new_kp_pubkeys as the canonical rep of this set

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

/// 1. reconstructs the BTC key in memory,
/// 2. re-splits it with fresh randomness for `state.new_kp_pubkeys`,
/// 3. writes `SecretSharingLogMessage` with `sharing_seq = prev + 1`.
/// New KPs fetch their encrypted shares from there.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of putting those encryptions in public files, also for setup (maybe let's add a todo to improve that later)

Copy link
Copy Markdown
Contributor Author

@mskd12 mskd12 May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, two options:

  1. have a separate log message with just the encryptions and a shorter object lock duration (say a month) + auto delete turned on
  2. rely on the operator to call rotate_kps & setup_new_key and distribute the encryptions back to kp's. kp's check if these decrypted shares match the commitments on S3 and complain if not.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

option 2 sounds better (maybe log to S3 the hash of each encryption)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, a related thing we could do is have a getter endpoint in guardian that returns the ciphertexts. Even if the operator loses the ciphertexts, we can still retrieve them. I will make these changes in a subsequent PR.

info!("Total shares received: {count}/{old_t}.");

// 5) On threshold: reconstruct, re-split, emit new SecretSharingLogMessage
if count >= old_t {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this can be invoked more than once? both in the happy and unhappy flows:

  • (old_num_shares - old_t + 1) times - when all old parties call it
  • replay attack old messages

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean by "replay attack old messages". the first is a problem that i fixed with a "finish" flag.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a lock/mutex on this function?
can it be that:

  • n requests reach line 81 and then all of them enter the if, invoking finalize_rotation n-t-1 times

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added a ceremony_complete lock on line 34 yesterday

mskd12 and others added 2 commits May 26, 2026 17:32
Adds the rotate_kps gRPC handler. Each current KP submits an encrypted
old share along with a state portion describing the rotation target
(new KP pubkeys, new N, new T, current commitments, current seq).
T-of-N digest-matched across submissions; the same digest is bound as
HPKE AAD on the encrypted share.

On reaching the current threshold the enclave:
- reconstructs the BTC key in memory from the old shares,
- re-splits it with fresh randomness for the new KP set using the
  new (n, t),
- writes CurrentKeyState { seq = current_seq + 1, encrypted_shares,
  secret_sharing_config } to key_state/.

Asymmetric rotation (new N/T differs from old) is supported.

Cross-checks: the KP-supplied current_share_commitments must match
what the enclave was given at operator_init (via SecretSharingConfig);
each old share is verified against those commitments before reaching
the digest-match step.

Tests cover happy path (symmetric + asymmetric), dup share, state
mismatch panic, share-not-matching-commitments, state commitments
not matching enclave, wrong pubkey count, duplicate new pubkeys, and
pre-operator-init rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Scratchpad: clarify that `shares` and `state_hash` cover both
  provisioner_init (non-setup) and rotate_kps (setup) flows. Drop the
  unrelated std::sync::Mutex investigation TODO on `shares`.
- main: bind thread_rng() once and pass &mut rng to both keypair ctors
  for consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mskd12 added a commit that referenced this pull request May 26, 2026
Once the threshold is reached and the new shares are logged, the
remaining old KPs would keep calling rotate_kps and re-enter the
finalize branch, re-splitting the key into a fresh share set at the
same sharing_seq. Add a `rotate_kps_complete` flag, set under the
shares lock after finalization, that rejects further submissions —
mirroring setup_new_key's one-shot semantics. Addresses PR #592 review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch from 9a710f0 to d7f9eb0 Compare May 26, 2026 21:58
mskd12 added a commit that referenced this pull request May 27, 2026
Review follow-ups for PR #592:

- One-shot ceremony: generalize setup_new_key's completion lock into a
  shared `ceremony_complete` (renamed from setup_new_key_lock) that both
  setup_new_key and rotate_kps hold across their flow. A setup-mode
  enclave now runs exactly one ceremony (genesis OR rotation, never
  both); once a rotation finalizes, the remaining old KPs' submissions
  are rejected instead of re-splitting the key at the same sharing_seq.
- Rename the secret-sharing log to "ceremony": SecretSharingLogMessage
  -> CeremonyLogMessage, LogMessage::SecretSharing -> ::Ceremony,
  log_secret_sharing() -> log_ceremony(), S3 prefix secret_sharing/ ->
  ceremony/. (SecretSharingInstance, the scheme, keeps its name.)
- rotate_kps RPC handler uses require_setup_mode() like the others.
- Remove the now-dead parse_enc_pubkeys (no callers since the PGP
  migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch from d7f9eb0 to 39a2d8d Compare May 27, 2026 14:31
mskd12 added a commit that referenced this pull request May 27, 2026
Review follow-ups for PR #592:

- One-shot ceremony: generalize setup_new_key's completion lock into a
  shared `ceremony_complete` (renamed from setup_new_key_lock) that both
  setup_new_key and rotate_kps hold across their flow. A guardian runs
  exactly one ceremony per instance (genesis OR rotation, never both);
  once a rotation finalizes, the remaining old KPs' submissions are
  rejected instead of re-splitting the key at the same sharing_seq.
- "Ceremony" naming throughout:
  - log: SecretSharingLogMessage -> CeremonyLogMessage,
    LogMessage::SecretSharing -> ::Ceremony, log_secret_sharing() ->
    log_ceremony(), S3 prefix secret_sharing/ -> ceremony/.
  - mode: "setup mode" -> "ceremony mode", incl. the env var
    SETUP_MODE -> CEREMONY_MODE. Deployments setting SETUP_MODE must
    switch to CEREMONY_MODE, else the guardian boots in normal mode.
  - SecretSharingInstance (the scheme) keeps its name.
- rotate_kps RPC handler uses require_ceremony_mode() like the others.
- Remove the now-dead parse_enc_pubkeys (no callers since the PGP migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch from 39a2d8d to bcd2ae3 Compare May 27, 2026 14:42
mskd12 added a commit that referenced this pull request May 27, 2026
Review follow-ups for PR #592:

- One-shot ceremony: generalize setup_new_key's completion lock into a
  shared `ceremony_complete` (renamed from setup_new_key_lock) that both
  setup_new_key and rotate_kps hold across their flow. A guardian runs
  exactly one ceremony per instance (genesis OR rotation, never both);
  once a rotation finalizes, the remaining old KPs' submissions are
  rejected instead of re-splitting the key at the same sharing_seq.
- "Ceremony" naming throughout:
  - log: SecretSharingLogMessage -> CeremonyLogMessage,
    LogMessage::SecretSharing -> ::Ceremony, log_secret_sharing() ->
    log_ceremony(), S3 prefix secret_sharing/ -> ceremony/.
  - mode: "setup mode" -> "ceremony mode", incl. the env var
    SETUP_MODE -> CEREMONY_MODE. Deployments setting SETUP_MODE must
    switch to CEREMONY_MODE, else the guardian boots in normal mode.
  - SecretSharingInstance (the scheme) keeps its name.
- rotate_kps RPC handler uses require_ceremony_mode() like the others.
- Remove the now-dead parse_enc_pubkeys (no callers since the PGP migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch from bcd2ae3 to 06e2f2e Compare May 27, 2026 14:51
mskd12 added a commit that referenced this pull request May 27, 2026
Review follow-ups for PR #592:

- One-shot ceremony: generalize setup_new_key's completion lock into a
  shared `ceremony_complete` (renamed from setup_new_key_lock) that both
  setup_new_key and rotate_kps hold across their flow. A guardian runs
  exactly one ceremony per instance (genesis OR rotation, never both);
  once a rotation finalizes, the remaining old KPs' submissions are
  rejected instead of re-splitting the key at the same sharing_seq.
- "Ceremony" naming throughout:
  - log: SecretSharingLogMessage -> CeremonyLogMessage,
    LogMessage::SecretSharing -> ::Ceremony, log_secret_sharing() ->
    log_ceremony(), S3 prefix secret_sharing/ -> ceremony/.
  - mode: "setup mode" -> "ceremony mode", incl. the env var
    SETUP_MODE -> CEREMONY_MODE. Deployments setting SETUP_MODE must
    switch to CEREMONY_MODE, else the guardian boots in normal mode.
  - SecretSharingInstance (the scheme) keeps its name.
- rotate_kps RPC handler uses require_ceremony_mode() like the others.
- Remove the now-dead parse_enc_pubkeys (no callers since the PGP migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch from 06e2f2e to 9f4960f Compare May 27, 2026 14:52
@mskd12 mskd12 marked this pull request as ready for review May 27, 2026 14:54
mskd12 added a commit that referenced this pull request May 27, 2026
Review follow-ups for PR #592:

- One-shot ceremony: generalize setup_new_key's completion lock into a
  shared `ceremony_complete` (renamed from setup_new_key_lock) that both
  setup_new_key and rotate_kps hold across their flow. A guardian runs
  exactly one ceremony per instance (genesis OR rotation, never both);
  once a rotation finalizes, the remaining old KPs' submissions are
  rejected instead of re-splitting the key at the same sharing_seq.
- RotateKpsState canonicalizes the new KP cert set (sorted in `new`), so
  its digest — which all T old KPs must agree on — is independent of the
  order each submits the certs in.
- "Ceremony" naming throughout:
  - log: SecretSharingLogMessage -> CeremonyLogMessage,
    LogMessage::SecretSharing -> ::Ceremony, log_secret_sharing() ->
    log_ceremony(), S3 prefix secret_sharing/ -> ceremony/.
  - mode: "setup mode" -> "ceremony mode", incl. the env var
    SETUP_MODE -> CEREMONY_MODE. Deployments setting SETUP_MODE must
    switch to CEREMONY_MODE, else the guardian boots in normal mode.
  - SecretSharingInstance (the scheme) keeps its name.
- rotate_kps RPC handler uses require_ceremony_mode() like the others.
- Remove the now-dead parse_enc_pubkeys (no callers since the PGP migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch from 9f4960f to e1825ec Compare May 27, 2026 15:06
@mskd12 mskd12 requested a review from benr-ml May 27, 2026 15:07
@mskd12 mskd12 changed the title [Guardian] Add rotate_kps: setup-mode KP rotation [Guardian] Add rotate_kps: ceremony-mode KP rotation May 27, 2026
Review follow-ups for PR #592:

- One-shot ceremony: generalize setup_new_key's completion lock into a
  shared `ceremony_complete` (renamed from setup_new_key_lock) that both
  setup_new_key and rotate_kps hold across their flow. A guardian runs
  exactly one ceremony per instance (genesis OR rotation, never both);
  once a rotation finalizes, the remaining old KPs' submissions are
  rejected instead of re-splitting the key at the same sharing_seq.
- RotateKpsState canonicalizes the new KP cert set (sorted in `new`), so
  its digest — which all T old KPs must agree on — is independent of the
  order each submits the certs in.
- "Ceremony" naming throughout:
  - log: SecretSharingLogMessage -> CeremonyLogMessage,
    LogMessage::SecretSharing -> ::Ceremony, log_secret_sharing() ->
    log_ceremony(), S3 prefix secret_sharing/ -> ceremony/.
  - mode: "setup mode" -> "ceremony mode", incl. the env var
    SETUP_MODE -> CEREMONY_MODE. Deployments setting SETUP_MODE must
    switch to CEREMONY_MODE, else the guardian boots in normal mode.
  - SecretSharingInstance (the scheme) keeps its name.
- rotate_kps RPC handler uses require_ceremony_mode() like the others.
- Remove the now-dead parse_enc_pubkeys (no callers since the PGP migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mskd12 mskd12 force-pushed the deepak/iop-85-kp-rotation-v2 branch from e1825ec to 848d58e Compare May 27, 2026 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants