feat(frost): multi-seat per-seat ROAST coordinator deps (7.3 PR2b-1.5)#4085
Merged
mswilkison merged 8 commits intoJun 19, 2026
Conversation
…oundation)
PR2b-1.5 multi-seat foundation (consult-locked Option A, Codex+Gemini converged).
A multi-seat operator runs one signer goroutine per local seat, but ROAST deps
were registered ONCE process-wide with a Coordinator bound to a single
SelfMember; whichever local seat is the elected ROAST coordinator for an attempt
could only aggregate if it happened to be deps.SelfMember, else AggregateBundle
returned ErrNotAggregator -> no transition bundle -> the next retry fail-closed.
This commit replaces the single registry slot with a per-member map and adds the
per-seat API, WITHOUT changing single-seat behavior (the call-sites still use the
legacy helpers; commit 2 threads the per-member ones through):
- roastRetryRegistrationByMember map[MemberIndex]RoastRetryDeps
- RegisterRoastRetryCoordinatorForMember(member, deps) -- enforces
deps.SelfMember == member (the coordinator is bound to deps.SelfMember at
construction, so a mismatch would let it aggregate as the wrong seat; rejected)
- RegisteredRoastRetryCoordinatorForMember(member)
- RoastRetryActiveForMember(member) (readiness AND producer AND this member
registered) -- per-seat activation so an unregistered seat stays on legacy
rather than fail-closing
- back-compat: RegisterRoastRetryCoordinator(deps) registers under
deps.SelfMember; RegisteredRoastRetryCoordinator()/RoastRetryActive() keep the
legacy "any registered" semantics (single-seat unchanged; the ~20 existing
callers + default-build stubs are untouched)
Single-seat is byte-identical (legacy wrapper round-trips one entry). Per-member
overwrite/coexist, self-member-mismatch rejection, and per-member activation are
tested; full signing suite green in all three producer states + -race.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lection (7.3 PR2b-1.5, activation)
Activates multi-seat ROAST using commit 1's per-member registry (Option A). Every
ROAST handle mint/consume now uses THIS seat's coordinator, so whichever local
seat is the elected coordinator for an attempt collects + aggregates with its own
binding (it no longer needs to be the single process-wide deps.SelfMember):
- signing_loop.go: active-attempt gate -> RoastRetryActiveForMember(member)
- roast_selection_consume: RoastRetryActiveForMember(member) +
RegisteredRoastRetryCoordinatorForMember(member) before NextAttempt
- roast_observe_attempt: RoastRetryActiveForMember(request.MemberIndex) + per-member
deps before BeginAttempt
- transition controller: build the exchange from RegisteredRoastRetryCoordinator
ForMember(template.MemberIndex)
- transition exchange: the elected-collector check is now `elected == e.member`
(the unambiguous seat identity) rather than deps.SelfMember
- BeginOrchestrationForSession gains a member param and mints the handle from that
member's coordinator; the executor entry passes request.MemberIndex; the
interactive drive re-reads deps for request.MemberIndex
Single-seat is byte-identical (one registered member -> the per-member lookup
returns the same deps the legacy any-entry path did). The Phase-4 coarse evidence
path (submitSnapshotIfActive / the session-keyed coarse handle registry) keeps the
legacy any-entry lookup -- its multi-seat collision is a SEPARATE, pre-existing
latent issue (the coarse receive-loop binding registry), out of scope for the
transition-aggregation fix and untouched here.
Verified: gofmt; vet 5 combos (incl. cgo); full pkg/frost/signing in all 3 producer
states; full pkg/tbtc tagged 208s + frost_roast_retry-only 146.7s + default 146.6s;
-race on the changed paths. The ~7 BeginOrchestrationForSession test callers updated
mechanically.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… PR2b-1.5) The PR2b-1.5 acceptance test + Codex's local-fanout guardrail. An operator controls multiple local seats, each with its own per-member coordinator (shared operator key); the deterministically-elected seat aggregates the transition bundle with ITS own coordinator and stores the next-attempt record. Pre-fix, a single process-wide coordinator bound to a different SelfMember returned ErrNotAggregator here, so no bundle was produced and the next retry fail-closed. Documents the local-fanout assumption at the test: in production the per-seat exchanges share ONE wallet BroadcastChannel, and a node's own broadcast reaches its OTHER local seats' subscribers via the channel's self-delivery (libp2p FloodSub delivers a node's publishes to its own subscriptions; the membership validator passes own-author). That self-delivery is a transport-contract assumption to confirm at prod-wiring; if a transport does not self-deliver, explicit local fanout is the remedy. The aggregation fix is independent of how the sibling snapshot arrives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Folds my /review of the PR2b-1.5 commits. The highest-value finding was a real
miss: the multi-seat acceptance test was VACUOUS -- for its context the elected
coordinator is member 1 (= the legacy process-default SelfMember) and it only built
the elected's exchange, so it passed even against the pre-fix shared-coordinator
code.
- Rewrite TestRoastTransitionExchange_MultiSeatElectedSeatAggregates: build an
exchange + observe binding for EVERY local seat from its OWN per-member deps,
then assert (a) the elected seat aggregates with its own coordinator (record +
one bundle) AND (b) the NON-elected siblings do NOT aggregate (no bundle). (b)
is the non-vacuous half: pre-fix all seats shared one coordinator bound to the
elected member, so a sibling's AggregateBundle would have run as the elected
member and broadcast a bundle; per-member coordinators make it ErrNotAggregator.
Also assert a sibling stores its own record from the elected's bundle (the
sibling-unparking path).
- RegisterRoastRetryCoordinatorForMember now rejects member 0 (members are 1-based;
a selfMember-0 coordinator is the never-aggregating sentinel, so registering one
would silently mis-bind).
- signing_loop.go: the active-attempt gate comment said "deterministic, group-wide
RoastRetryActive" but it is now the PER-SEAT RoastRetryActiveForMember; corrected.
- Extract readinessAndProducerReady() shared by RoastRetryActive and
RoastRetryActiveForMember (was a copy-pasted 2-check prefix).
Adjudicated NOT to fold (documented): submitSnapshotIfActive / the session-keyed
coarse handle registry remain any-entry -- a pre-existing latent multi-seat issue
in the coarse receive-loop binding registry (no live binding producer wired), out of
scope for the transition-aggregation fix; the silent SelfMember-mismatch reject is
fail-safe to legacy; the local-fanout / aggregation-window / all-local-test-proxy
items are the documented transport assumptions to verify at prod-wiring.
Verified: gofmt; build 4 combos; full pkg/frost/signing 3 producer states + -race;
pkg/tbtc ROAST sanity (tagged + frost_roast_retry-only).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…egistration (7.3 PR2b-1.5, Codex P2) Folds Codex's review of PR #4085. Gemini approved; Codex found 3 P2s -- P2-1 is a real fracture I introduced when threading per-member deps. P2-1 (partial-activation fracture): commit 2 made the SELECTOR's activation gate per-member (RoastRetryActiveForMember). A multi-seat operator with member A registered and member B not would then drive A via the transition (ROAST) and B via the legacy shuffle for the SAME attempt -> divergent included sets -> fracture. The legacy-fallback sentinel is only safe when UNIFORM. Fix: the selector's ROAST-vs-legacy decision is now PROCESS-level (RoastRetryActive, any-entry), and a missing per-member coordinator (partial registration -- a wiring bug) FAILS CLOSED, never legacy (a legacy fallback for the unregistered seat while a sibling selects from the transition would split the included set). The per-member loop/observe gates stay as-is: an unregistered seat cannot observe and fails closed at the selector, so it never selects a divergent set. P2-2 (stale deps on reject): RegisterRoastRetryCoordinatorForMember now DELETES any existing entry when it rejects a registration (member 0 or SelfMember mismatch), so a bad re-registration deactivates the seat (fail-safe to legacy) instead of leaving stale deps active. P2-3 (coarse submitSnapshotIfActive any-entry): NOT folded -- pre-existing Phase-4 coarse evidence path (the session-keyed handle registry is not member-keyed, and no live binding producer is wired, so it is inert). A partial fix (per-member coordinator only) would pair it with a still-session-keyed sibling handle and be worse; the full fix needs the session-handle registry member-keyed too -- a separate scope and a coarse-path prod-wiring prerequisite. Gemini concurred it is out of scope here. Tests: PartialRegistrationFailsClosed (member 1 registered, select member 2 -> fail closed, not legacy); RejectDropsExistingEntry. Verified: gofmt; build 4 combos; full pkg/frost/signing 3 producer states + -race; full pkg/tbtc tagged 207s + frost_roast_retry-only 146.9s + default 146.9s (the earlier ROAST_ONLY/DEFAULT failures were timing flakes -- both pass clean on re-run). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
….5, Codex re-review P2) submitSnapshotIfActive is not yet member-aware: it looks up deps via the any-entry RegisteredRoastRetryCoordinator and keys the drive handle by sessionID alone. Under a multi-seat operator that would attribute one local seat's evidence to an arbitrary sibling (any-entry deps) and collide their drive handles (sessionID-keyed binding) -- Codex's re-review P2-A/P2-B. The session-handle binding cannot be cleanly member-keyed in this PR: binding validation reads the member-independent attempt context, so keying handles by member would split that read. The coarse/drive evidence is also inert in 1b/1.5 (no production caller submits; it is consumed only by the PR2b-2 blame bridge). So guard the path: a new registeredRoastRetryMemberCount() (real under frost_roast_retry, stubbed 0 in the default build) lets submitSnapshotIfActive no-op when more than one local seat is registered. This fails SAFE (disabling an inert path) rather than mis-attributing evidence; single-seat is unchanged. The full member-aware coarse/drive path (member-keyed deps + handle) is deferred to PR2b-2, where this evidence is actually consumed. Verification: gofmt clean; builds green across default / frost_native / frost_roast_retry / frost_native+frost_roast_retry / frost_native+frost_tbtc_signer cgo; pkg/frost/signing green across all 4 producer states + -race. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-1.5, Codex re-review P2) BeginOrchestrationForSession is member-aware (RegisteredRoastRetryCoordinatorForMember), but when THIS seat was unregistered it returned the ErrNoRoastRetryCoordinatorRegistered legacy-fallback sentinel unconditionally. For a multi-seat operator that registered some but not all seats, the unregistered seat would then run the coarse/legacy primitive while registered sibling seats drive bound ROAST messages -- mixed bound/unbound for the same attempt, a group fracture (Codex re-review of #4085). The legacy-fallback sentinel is only safe as a UNIFORM, group-wide decision: i.e. when NO seat is registered anywhere (registeredRoastRetryMemberCount() == 0). When a sibling seat IS registered, advertising the fallback for an unregistered seat is non-uniform within the operator and fractures the attempt, so fail CLOSED with a non-sentinel error (the dispatcher treats it as terminal and aborts Execute rather than falling through to coarse). Also fail closed for FULLY-registered multi-seat (count > 1): the session-handle binding (SetCurrentAttemptHandleForSession) is still keyed by sessionID alone, so two local seats in one attempt would collide. Member-keyed handles are deferred to PR2b-2; until then any multi-seat operator fails closed rather than mis-bind. Mirrors the coarse evidence path's multi-seat guard (submitSnapshotIfActive) in this same PR. Single-seat is unchanged. Tests: partial multi-seat (Codex's case) and fully-registered multi-seat both fail closed with a non-sentinel error and create no session binding. Verified: gofmt clean; builds green on default / frost_native / frost_roast_retry / frost_native+frost_roast_retry; pkg/frost/signing green across all 4 producer states + -race. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (7.3 PR2b-1.5, Codex re-review P2) The multi-seat fail-closed guard in BeginOrchestrationForSession returned a bare non-sentinel error. That error flows through the tBTC executor into signingRetryLoop as a signingAttemptFn error, which the loop treats as a FAILED ATTEMPT: it drives transitionController.OnAttemptFailed, reportAttemptOutcome(false), and retries (`continue`). Because multi-seat is a STATIC condition no future attempt can resolve, the node would spin futile attempts until the signing timeout and synthesize garbage failed-attempt transitions, instead of failing closed (Codex re-review of #4085). Coarse fallback is not an option: interactive<->coarse mixing fractures the group. Fix (Codex+Gemini consult -> Option B, terminal error classification): - Add an exported ErrTerminalSigningFailure sentinel in pkg/frost/signing, documented as the THIRD disposition in the orchestration error taxonomy: STATIC -> coarse fallback; RUNTIME -> no fallback but RETRY (may be transient); TERMINAL -> ABORT the retry loop (static, futile to retry, unsafe to fall back). - BeginOrchestrationForSession wraps ErrTerminalSigningFailure for both multi-seat fail-closed branches (count>1, and the partial count==1 unregistered-seat case); count==0 stays the STATIC ErrNoRoastRetryCoordinatorRegistered fallback sentinel. - signingRetryLoop matches it via errors.Is and exits immediately (return nil, err) BEFORE the retry/transition machinery -- no OnAttemptFailed pollution, no spin. The loop only classifies retry disposition; FROST owns the terminal classification. Tests: orchestration fail-closed tests assert the terminal classification; a new signing_loop test proves a terminal attempt error aborts after exactly one attempt (no retry), while the existing RUNTIME signing-error case still retries. Verified: gofmt clean; builds green on default / frost_native / frost_roast_retry / frost_native+frost_roast_retry; pkg/frost/signing green across all 4 producer states + -race; full pkg/tbtc default suite green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2cc36ef
into
feat/frost-schnorr-migration-scaffold
15 of 16 checks passed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
RFC-21 Phase 7.3 PR2b-1.5 — multi-seat ROAST per-seat coordinator deps. A
multi-seat operator (one node controlling multiple signing-group members) runs one
signer goroutine per local seat, but ROAST deps were registered ONCE process-wide
(one
Coordinatorbound to oneSelfMember). So when the elected ROASTcoordinator for a failed attempt was a local seat OTHER than
deps.SelfMember,that seat could neither collect snapshots nor aggregate (
AggregateBundle→ErrNotAggregator) → no transition bundle → the next retry fail-closed for thewhole group.
Fix (consult-locked Option A — Codex+Gemini converged): a member-keyed registry
(
map[MemberIndex]RoastRetryDeps) + per-member API, so each local seat uses ITS OWNcoordinator (bound to that member). Whichever local seat is elected aggregates with
its own binding.
Gated + dormant. Behind
frost_native && frost_roast_retry+ the readiness envgate; no production code registers a coordinator (the path is inert until post-audit
prod-wiring). Single-seat is byte-identical — the legacy any-entry helpers are
kept as back-compat wrappers, so the ~20 test callers + default stubs are untouched.
Commits
4829c9223) — member-keyed registry +…ForMemberAPI +back-compat wrappers +
RoastRetryActiveForMember3b0017c4e) — thread per-member deps through observe / exchange(
elected == e.member) / selector / orchestration / interactive drive /controller / loop
e031eca19) + non-vacuous rewrite + review folds(
649ca2802) — the elected seat aggregates with its own coordinator AND thesiblings do NOT; member-0 reject; doc/DRY cleanups
Fracture-safety
A multi-seat operator's seats are cryptographically indistinguishable from
independent single-seat nodes: each has its own isolated coordinator instance,
observe handle, and exchange. Signing a bundle "as" the elected local member
(CoordinatorID = elected) with the operator-scoped key is sound — the payload
commits to the member, and the operator key verifies for multiple member IDs.
Out of scope / deferred (documented)
BroadcastChannel; asibling's broadcast reaches the elected seat via libp2p FloodSub self-delivery
(
channel.deliverfans to all handlers, no self-filter). A transport-contractassumption to confirm at prod-wiring; explicit local fanout is the remedy if a
transport does not self-deliver.
submitSnapshotIfActive/ the session-keyed coarsehandle registry) keeps the legacy any-entry lookup — a pre-existing latent
multi-seat issue (the coarse receive-loop binding registry is not member-keyed;
no live binding producer is wired). Separate from the transition-aggregation fix.
Hard gates (pre-production)
frost-secp256k1-tr =3.0.0external audit + recovery-leaf decision.Prod-wiring prerequisite: the step that first registers a ROAST coordinator must
register per-seat (one per local member); single-seat-only registration would
re-introduce the multi-seat fail-close.
Testing
gofmt; build + vet across default /
frost_native/frost_roast_retry-only /frost_native frost_roast_retry/frost_native frost_tbtc_signer cgo;-race;full
pkg/frost/signing× 3 producer states; fullpkg/tbtctagged 208s +frost_roast_retry-only 146.7s + default 146.6s.🤖 Generated with Claude Code