[leader] drive guardian committee handoff #568

Merged: 0xsiddharthks merged 6 commits into main from siddharth/guardian-committee-handoff on May 28, 2026

@0xsiddharthks 0xsiddharthks commented May 19, 2026

Builds on top of #569. On each tick, the leader drives the guardian's committee forward to match the on-chain epoch.

Changes

  • SignCommitteeTransition RPC:
    • each peer rebuilds the transition payload from on-chain state at from_epoch + 1 and signs with the historical from_epoch BLS key.
    • uses Hashi::sign_message_proto_at_epoch for signing with a historical epoch's key rather than the current one.
  • leader loop:
    • Each leader tick spawns a bounded reconcile task to check / update the guardian epoch.
  • add a metric hashi_guardian_current_committee_epoch for the guardian's reported epoch.

@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-committee-handoff branch from 1efe9d7 to db82103 Compare May 19, 2026 12:03
@0xsiddharthks 0xsiddharthks changed the base branch from main to siddharth/guardian-update-committee-rpc May 19, 2026 12:03
@0xsiddharthks 0xsiddharthks changed the title [guardian] signed committee handoff via UpdateCommittee RPC [guardian] drive committee handoff each leader tick May 19, 2026
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-update-committee-rpc branch from 8bcd31d to 31419f7 Compare May 20, 2026 20:54
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-committee-handoff branch from db82103 to 7d1e4c9 Compare May 20, 2026 20:54
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-update-committee-rpc branch from 31419f7 to c8a0fa0 Compare May 27, 2026 06:21
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-committee-handoff branch from 7d1e4c9 to d6e7673 Compare May 27, 2026 06:21
@0xsiddharthks 0xsiddharthks changed the title [guardian] drive committee handoff each leader tick [leader] drive guardian committee handoff May 27, 2026
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-committee-handoff branch from 7d8b386 to 3043f68 Compare May 27, 2026 11:54
@0xsiddharthks 0xsiddharthks marked this pull request as ready for review May 27, 2026 15:20
@0xsiddharthks 0xsiddharthks requested a review from bmwill as a code owner May 27, 2026 15:20
Comment thread crates/hashi/src/leader/mod.rs Outdated
bmwill
bmwill previously requested changes May 27, 2026
Comment thread crates/hashi-types/proto/sui/hashi/v1alpha/bridge_service.proto Outdated
from_epoch,
hashi_epoch, "Driving guardian committee handoff"
);
let signed = Self::collect_committee_transition_signatures(inner, from_epoch).await?;
bmwill (Contributor) commented:

So one challenge with this is that we're expecting a previous committee to ratchet the guardian up to a new epoch. Assuming this happens quickly after epoch change this should be fine, and we can always recover by having the provisioners explicitly do the ratcheting themselves. But you need to worry about the "leader" maybe not even being a part of that previous committee.

0xsiddharthks (Contributor, Author) replied:

@bmwill wait i think the leader doesn't have to be part of the outgoing committee, since it only fans out the SignCommitteeTransition over peer RPCs to the outgoing committee's members and aggregates a threshold cert.

Each follower signs with the historical epoch key it kept in db.signing_keys, so a hashi-server that has rotated out of the active committee can still sign for an epoch it was in.

so as long as enough former committee members are online i think we should be fine, right?
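To make the reply above concrete, here is a minimal sketch of a node keeping one signing key per epoch and signing with a historical one. It is not the actual hashi code: `EpochKey`, `SigningKeys`, and `sign_at_epoch` are hypothetical stand-ins for `db.signing_keys` and `Hashi::sign_message_proto_at_epoch`, and the "signature" is a toy byte concatenation rather than BLS.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a per-epoch BLS signing key.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct EpochKey(pub u64);

/// Hypothetical stand-in for `db.signing_keys`: one key per epoch this node
/// served in, retained even after the node rotates out of the committee.
#[derive(Default)]
pub struct SigningKeys {
    by_epoch: BTreeMap<u64, EpochKey>,
}

impl SigningKeys {
    pub fn insert(&mut self, epoch: u64, key: EpochKey) {
        self.by_epoch.insert(epoch, key);
    }

    /// Look up the key for a specific, possibly historical, epoch.
    pub fn key_for(&self, epoch: u64) -> Option<&EpochKey> {
        self.by_epoch.get(&epoch)
    }
}

/// Toy signing: a real node would produce a BLS signature here.
/// Fails if this node never held a key for `epoch`.
pub fn sign_at_epoch(keys: &SigningKeys, epoch: u64, msg: &[u8]) -> Result<Vec<u8>, String> {
    let key = keys
        .key_for(epoch)
        .ok_or_else(|| format!("no signing key retained for epoch {epoch}"))?;
    let mut out = key.0.to_le_bytes().to_vec();
    out.extend_from_slice(msg);
    Ok(out)
}
```

Under these assumptions, a node that served in epoch 7 can still answer a transition request for epoch 7 after rotating out, while a request for an epoch it never served in returns an error instead of falling back to the current key.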

Comment thread crates/hashi/src/leader/mod.rs Outdated
Comment thread crates/hashi/src/leader/mod.rs Outdated
Comment thread crates/hashi/src/leader/mod.rs Outdated
Comment thread crates/hashi/src/leader/mod.rs
Comment thread crates/hashi/src/leader/mod.rs
Comment thread crates/hashi/src/leader/mod.rs Outdated
Comment thread crates/hashi/src/leader/mod.rs Outdated
Comment thread crates/hashi/src/withdrawals.rs Outdated
Base automatically changed from siddharth/guardian-update-committee-rpc to main May 28, 2026 05:03
0xsiddharthks added a commit that referenced this pull request May 28, 2026
- Route guardian RPC metrics through the existing
  RpcMetricsMakeCallbackHandler middleware via a new
  GuardianClient::with_metrics setter (mirrors grpc::client::Client),
  and drop the explicit time_guardian_rpc wrapper and the now-unused
  GUARDIAN_RPC_METHOD_UPDATE_COMMITTEE constant.
- Fix the sparse-epoch handling: leader and followers pick the next
  on-chain committee via range((from_epoch + 1)..).next() instead of
  from_epoch + 1. Aligns with the guardian-side fix on PR #569.
- Rename maybe_reconcile_guardian_committee to
  check_reconcile_guardian_committee to match the file's check_* helpers.
- Only spawn the reconcile task when the hashi epoch advances
  (track last_guardian_reconcile_epoch).
- Hoist the initial GetGuardianInfo out of the loop; reuse the
  current_committee_epoch from each UpdateCommittee response.
- Switch the inflight-task guard to is_some() so a completed-but-
  unconsumed reconcile result is never dropped.
- Warn when the guardian's epoch runs ahead of hashi's, and bail when
  UpdateCommittee omits current_committee_epoch.
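The sparse-epoch fix in this commit message can be illustrated with a small sketch. It assumes the on-chain committees are held in a `BTreeMap` keyed by epoch; the function name `next_committee_epoch` and the generic committee type are hypothetical, not taken from the PR.

```rust
use std::collections::BTreeMap;

/// Returns the first epoch strictly after `from_epoch` that has an on-chain
/// committee, or `None` if there is none. Unlike `from_epoch + 1`, this
/// tolerates gaps in the epoch sequence.
pub fn next_committee_epoch<C>(committees: &BTreeMap<u64, C>, from_epoch: u64) -> Option<u64> {
    // `checked_add` avoids overflow when `from_epoch` is `u64::MAX`.
    let start = from_epoch.checked_add(1)?;
    committees.range(start..).next().map(|(epoch, _)| *epoch)
}
```

With committees recorded at epochs 3, 5, and 9, a lookup from epoch 3 yields 5, whereas indexing at `from_epoch + 1` would look for a committee at epoch 4 and find nothing.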
The guardian's committee is set once at ProvisionerInit and can never
change, so signature verification fails as soon as hashi rotates past
the bootstrap epoch. The guardian-side fix (new `UpdateCommittee` RPC
and `current_committee_epoch` reporting) lands in the stacked PR
underneath this one. This PR wires the hashi-server side that drives
the handoff.

- Each leader tick spawns a bounded one-shot reconcile task. It reads
  the guardian's `current_committee_epoch`, and for each missing step
  fans out `SignCommitteeTransition` across the OUTGOING committee,
  aggregates a BLS cert with each member's historical per-epoch BLS
  signing key from `db.signing_keys`, and sends an `UpdateCommittee`
  to advance the guardian by one epoch.
- The new committee in the transition is reconstructed by each signer
  from on-chain state at `from_epoch + 1` — no committee bytes travel
  on the inter-node wire, so the leader can't get peers to sign
  attacker-crafted committees.
- Idempotency lives on the guardian side, so leader churn / lost RPC
  results are safe — the next leader simply repeats.
- New metric `hashi_guardian_current_committee_epoch` mirrors the
  guardian's reported epoch.
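The reconcile flow described in this commit message can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's implementation: `MockGuardian`, `reconcile`, and the `max_steps` bound are hypothetical, signature collection is reduced to a comment, and the next epoch is chosen with a range lookup (as in the later sparse-epoch fix) rather than `from_epoch + 1`.

```rust
use std::collections::BTreeMap;

/// Hypothetical guardian stand-in that tracks only its committee epoch.
pub struct MockGuardian {
    pub current_committee_epoch: u64,
}

impl MockGuardian {
    /// Applies `from -> to` when `from` is the current epoch. A replay of an
    /// already-applied transition is a no-op, which models guardian-side
    /// idempotency. Returns the guardian's epoch after the call.
    pub fn update_committee(&mut self, from: u64, to: u64) -> Result<u64, String> {
        if from == self.current_committee_epoch && to > from {
            self.current_committee_epoch = to;
        } else if to > self.current_committee_epoch {
            return Err(format!("unexpected transition {from} -> {to}"));
        }
        Ok(self.current_committee_epoch)
    }
}

/// Advances the guardian one committee at a time until it matches
/// `hashi_epoch` or `max_steps` transitions have been attempted.
pub fn reconcile(
    guardian: &mut MockGuardian,
    on_chain: &BTreeMap<u64, ()>,
    hashi_epoch: u64,
    max_steps: usize,
) -> Result<u64, String> {
    let mut current = guardian.current_committee_epoch;
    for _ in 0..max_steps {
        if current >= hashi_epoch {
            break;
        }
        // Next on-chain committee after `current`, tolerating sparse epochs.
        let Some((&next, _)) = on_chain.range(current + 1..=hashi_epoch).next() else {
            break;
        };
        // A real leader would fan out SignCommitteeTransition to the committee
        // of epoch `current` here and aggregate the signatures into a cert.
        current = guardian.update_committee(current, next)?;
    }
    Ok(current)
}
```

Because the mock guardian ignores replays, a second leader repeating the same steps after a lost response leaves the epoch unchanged, which is the property the commit message relies on for leader churn.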
Apply review feedback over the original commit: factor the duplicated
"time + record outcome" pattern in `reconcile_guardian_committee` into a
small helper, and shorten doc comments throughout to match neighboring
RPC handlers and signing helpers.

No behavior change.
Drop the `maybe_reconcile_guardian_committee` and
`validate_and_sign_committee_transition` doc blocks (siblings have
none), pull `reconcile_guardian_committee`, `time_guardian_rpc`, and
`sign_message_proto_at_epoch` docs to one line, and shorten the inline
`ProvisionerInit` and `Bail out` comments. Also trim the proto-side
`SignCommitteeTransition` rationale.
Clearing last_guardian_reconcile_epoch when the reconcile task errors
keeps the original behavior of retrying a failed handoff on the next
checkpoint (e.g. transient guardian downtime), while a successful
reconcile still holds the gate until the hashi epoch advances.
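The gating behavior described in these commit messages (spawn only when the hashi epoch advances, never while a task is in flight, and retry after a failure) might look roughly like the sketch below. `ReconcileGate` and `ReconcileOutcome` are hypothetical names, and the in-flight task handle is modeled as an `Option<u64>` holding the epoch it was spawned for.

```rust
/// Hypothetical outcome of a finished reconcile task.
pub enum ReconcileOutcome {
    Success,
    Failed,
}

/// Hypothetical model of the leader-side gate around the reconcile task.
#[derive(Default)]
pub struct ReconcileGate {
    last_guardian_reconcile_epoch: Option<u64>,
    /// Stands in for the task handle; `Some` until the result is consumed.
    inflight: Option<u64>,
}

impl ReconcileGate {
    /// Called on each tick. Returns true if a reconcile task should be spawned.
    pub fn should_spawn(&mut self, hashi_epoch: u64) -> bool {
        // Never start a second task while one is pending or unconsumed.
        if self.inflight.is_some() {
            return false;
        }
        // Skip if this epoch (or a later one) was already reconciled.
        if self.last_guardian_reconcile_epoch.map_or(false, |e| e >= hashi_epoch) {
            return false;
        }
        self.last_guardian_reconcile_epoch = Some(hashi_epoch);
        self.inflight = Some(hashi_epoch);
        true
    }

    /// Called when the task result is consumed. A failure clears the gate so
    /// the next tick retries; a success holds it until the epoch advances.
    pub fn on_finished(&mut self, outcome: ReconcileOutcome) {
        self.inflight = None;
        if matches!(outcome, ReconcileOutcome::Failed) {
            self.last_guardian_reconcile_epoch = None;
        }
    }
}
```

In this model a failed reconcile at epoch 5 is retried on the next tick at epoch 5, while a successful one blocks further spawns until the epoch reaches 6.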
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-committee-handoff branch from ab1cc55 to 55cfd4d Compare May 28, 2026 14:18
@0xsiddharthks 0xsiddharthks requested a review from bmwill May 28, 2026 14:28
@0xsiddharthks 0xsiddharthks dismissed bmwill’s stale review May 28, 2026 15:02

Addressed comments, and changes approved by luke. Dismissing to unblock the merge, since GitHub is blocking it because of this stale review.

@0xsiddharthks 0xsiddharthks merged commit 5116130 into main May 28, 2026
5 checks passed
@0xsiddharthks 0xsiddharthks deleted the siddharth/guardian-committee-handoff branch May 28, 2026 15:02
jessiemongeon1 added a commit to jessiemongeon1/hashi that referenced this pull request May 28, 2026
Automated update based on: New SignCommitteeTransition RPC added where peers sign with historical epoch BLS keys. Leader now drives guardian committee handoff to match on-chain epoch.