Skip to content

feat(frost): multi-seat per-seat ROAST coordinator deps (7.3 PR2b-1.5)#4085

Merged
mswilkison merged 8 commits into
feat/frost-schnorr-migration-scaffoldfrom
feat/frost-7.3-multiseat-per-seat-deps
Jun 19, 2026
Merged

feat(frost): multi-seat per-seat ROAST coordinator deps (7.3 PR2b-1.5)#4085
mswilkison merged 8 commits into
feat/frost-schnorr-migration-scaffoldfrom
feat/frost-7.3-multiseat-per-seat-deps

Conversation

@mswilkison

Copy link
Copy Markdown
Contributor

Summary

RFC-21 Phase 7.3 PR2b-1.5 — multi-seat ROAST per-seat coordinator deps. A
multi-seat operator (one node controlling multiple signing-group members) runs one
signer goroutine per local seat, but ROAST deps were registered ONCE process-wide
(one Coordinator bound to one SelfMember). So when the elected ROAST
coordinator for a failed attempt was a local seat OTHER than deps.SelfMember,
that seat could neither collect snapshots nor aggregate (AggregateBundle
ErrNotAggregator) → no transition bundle → the next retry fail-closed for the
whole group.

Fix (consult-locked Option A — Codex+Gemini converged): a member-keyed registry
(map[MemberIndex]RoastRetryDeps) + per-member API, so each local seat uses ITS OWN
coordinator (bound to that member). Whichever local seat is elected aggregates with
its own binding.

Gated + dormant. Behind frost_native && frost_roast_retry + the readiness env
gate; no production code registers a coordinator (the path is inert until post-audit
prod-wiring). Single-seat is byte-identical — the legacy any-entry helpers are
kept as back-compat wrappers, so the ~20 test callers + default stubs are untouched.

Commits

  • Foundation (4829c9223) — member-keyed registry + …ForMember API +
    back-compat wrappers + RoastRetryActiveForMember
  • Activation (3b0017c4e) — thread per-member deps through observe / exchange
    (elected == e.member) / selector / orchestration / interactive drive /
    controller / loop
  • Acceptance test (e031eca19) + non-vacuous rewrite + review folds
    (649ca2802) — the elected seat aggregates with its own coordinator AND the
    siblings do NOT; member-0 reject; doc/DRY cleanups

Fracture-safety

A multi-seat operator's seats are cryptographically indistinguishable from
independent single-seat nodes: each has its own isolated coordinator instance,
observe handle, and exchange. Signing a bundle "as" the elected local member
(CoordinatorID = elected) with the operator-scoped key is sound — the payload
commits to the member, and the operator key verifies for multiple member IDs.

Out of scope / deferred (documented)

  • Local fanout — the per-seat exchanges share one wallet BroadcastChannel; a
    sibling's broadcast reaches the elected seat via libp2p FloodSub self-delivery
    (channel.deliver fans to all handlers, no self-filter). A transport-contract
    assumption to confirm at prod-wiring; explicit local fanout is the remedy if a
    transport does not self-deliver.
  • Coarse evidence path (submitSnapshotIfActive / the session-keyed coarse
    handle registry) keeps the legacy any-entry lookup — a pre-existing latent
    multi-seat issue (the coarse receive-loop binding registry is not member-keyed;
    no live binding producer is wired). Separate from the transition-aggregation fix.

Hard gates (pre-production)

frost-secp256k1-tr =3.0.0 external audit + recovery-leaf decision.
Prod-wiring prerequisite: the step that first registers a ROAST coordinator must
register per-seat (one per local member); single-seat-only registration would
re-introduce the multi-seat fail-close.

Testing

gofmt; build + vet across default / frost_native / frost_roast_retry-only /
frost_native frost_roast_retry / frost_native frost_tbtc_signer cgo; -race;
full pkg/frost/signing × 3 producer states; full pkg/tbtc tagged 208s +
frost_roast_retry-only 146.7s + default 146.6s.

🤖 Generated with Claude Code

mswilkison and others added 4 commits June 18, 2026 22:32
…oundation)

PR2b-1.5 multi-seat foundation (consult-locked Option A, Codex+Gemini converged).
A multi-seat operator runs one signer goroutine per local seat, but ROAST deps
were registered ONCE process-wide with a Coordinator bound to a single
SelfMember; whichever local seat is the elected ROAST coordinator for an attempt
could only aggregate if it happened to be deps.SelfMember, else AggregateBundle
returned ErrNotAggregator -> no transition bundle -> the next retry fail-closed.

This commit replaces the single registry slot with a per-member map and adds the
per-seat API, WITHOUT changing single-seat behavior (the call-sites still use the
legacy helpers; commit 2 threads the per-member ones through):

  - roastRetryRegistrationByMember map[MemberIndex]RoastRetryDeps
  - RegisterRoastRetryCoordinatorForMember(member, deps) -- enforces
    deps.SelfMember == member (the coordinator is bound to deps.SelfMember at
    construction, so a mismatch would let it aggregate as the wrong seat; rejected)
  - RegisteredRoastRetryCoordinatorForMember(member)
  - RoastRetryActiveForMember(member) (readiness AND producer AND this member
    registered) -- per-seat activation so an unregistered seat stays on legacy
    rather than fail-closing
  - back-compat: RegisterRoastRetryCoordinator(deps) registers under
    deps.SelfMember; RegisteredRoastRetryCoordinator()/RoastRetryActive() keep the
    legacy "any registered" semantics (single-seat unchanged; the ~20 existing
    callers + default-build stubs are untouched)

Single-seat is byte-identical (legacy wrapper round-trips one entry). Per-member
overwrite/coexist, self-member-mismatch rejection, and per-member activation are
tested; full signing suite green in all three producer states + -race.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lection (7.3 PR2b-1.5, activation)

Activates multi-seat ROAST using commit 1's per-member registry (Option A). Every
ROAST handle mint/consume now uses THIS seat's coordinator, so whichever local
seat is the elected coordinator for an attempt collects + aggregates with its own
binding (it no longer needs to be the single process-wide deps.SelfMember):

  - signing_loop.go: active-attempt gate -> RoastRetryActiveForMember(member)
  - roast_selection_consume: RoastRetryActiveForMember(member) +
    RegisteredRoastRetryCoordinatorForMember(member) before NextAttempt
  - roast_observe_attempt: RoastRetryActiveForMember(request.MemberIndex) + per-member
    deps before BeginAttempt
  - transition controller: build the exchange from RegisteredRoastRetryCoordinator
    ForMember(template.MemberIndex)
  - transition exchange: the elected-collector check is now `elected == e.member`
    (the unambiguous seat identity) rather than deps.SelfMember
  - BeginOrchestrationForSession gains a member param and mints the handle from that
    member's coordinator; the executor entry passes request.MemberIndex; the
    interactive drive re-reads deps for request.MemberIndex

Single-seat is byte-identical (one registered member -> the per-member lookup
returns the same deps the legacy any-entry path did). The Phase-4 coarse evidence
path (submitSnapshotIfActive / the session-keyed coarse handle registry) keeps the
legacy any-entry lookup -- its multi-seat collision is a SEPARATE, pre-existing
latent issue (the coarse receive-loop binding registry), out of scope for the
transition-aggregation fix and untouched here.

Verified: gofmt; vet 5 combos (incl. cgo); full pkg/frost/signing in all 3 producer
states; full pkg/tbtc tagged 208s + frost_roast_retry-only 146.7s + default 146.6s;
-race on the changed paths. The ~7 BeginOrchestrationForSession test callers updated
mechanically.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… PR2b-1.5)

The PR2b-1.5 acceptance test + Codex's local-fanout guardrail. An operator controls
multiple local seats, each with its own per-member coordinator (shared operator
key); the deterministically-elected seat aggregates the transition bundle with ITS
own coordinator and stores the next-attempt record. Pre-fix, a single process-wide
coordinator bound to a different SelfMember returned ErrNotAggregator here, so no
bundle was produced and the next retry fail-closed.

Documents the local-fanout assumption at the test: in production the per-seat
exchanges share ONE wallet BroadcastChannel, and a node's own broadcast reaches its
OTHER local seats' subscribers via the channel's self-delivery (libp2p FloodSub
delivers a node's publishes to its own subscriptions; the membership validator
passes own-author). That self-delivery is a transport-contract assumption to confirm
at prod-wiring; if a transport does not self-deliver, explicit local fanout is the
remedy. The aggregation fix is independent of how the sibling snapshot arrives.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Folds my /review of the PR2b-1.5 commits. The highest-value finding was a real
miss: the multi-seat acceptance test was VACUOUS -- for its context the elected
coordinator is member 1 (= the legacy process-default SelfMember) and it only built
the elected's exchange, so it passed even against the pre-fix shared-coordinator
code.

  - Rewrite TestRoastTransitionExchange_MultiSeatElectedSeatAggregates: build an
    exchange + observe binding for EVERY local seat from its OWN per-member deps,
    then assert (a) the elected seat aggregates with its own coordinator (record +
    one bundle) AND (b) the NON-elected siblings do NOT aggregate (no bundle). (b)
    is the non-vacuous half: pre-fix all seats shared one coordinator bound to the
    elected member, so a sibling's AggregateBundle would have run as the elected
    member and broadcast a bundle; per-member coordinators make it ErrNotAggregator.
    Also assert a sibling stores its own record from the elected's bundle (the
    sibling-unparking path).
  - RegisterRoastRetryCoordinatorForMember now rejects member 0 (members are 1-based;
    a selfMember-0 coordinator is the never-aggregating sentinel, so registering one
    would silently mis-bind).
  - signing_loop.go: the active-attempt gate comment said "deterministic, group-wide
    RoastRetryActive" but it is now the PER-SEAT RoastRetryActiveForMember; corrected.
  - Extract readinessAndProducerReady() shared by RoastRetryActive and
    RoastRetryActiveForMember (was a copy-pasted 2-check prefix).

Adjudicated NOT to fold (documented): submitSnapshotIfActive / the session-keyed
coarse handle registry remain any-entry -- a pre-existing latent multi-seat issue
in the coarse receive-loop binding registry (no live binding producer wired), out of
scope for the transition-aggregation fix; the silent SelfMember-mismatch reject is
fail-safe to legacy; the local-fanout / aggregation-window / all-local-test-proxy
items are the documented transport assumptions to verify at prod-wiring.

Verified: gofmt; build 4 combos; full pkg/frost/signing 3 producer states + -race;
pkg/tbtc ROAST sanity (tagged + frost_roast_retry-only).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e82e2881-19d9-44e4-9cac-ccd810d0f946

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/frost-7.3-multiseat-per-seat-deps

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

mswilkison and others added 4 commits June 19, 2026 00:12
…egistration (7.3 PR2b-1.5, Codex P2)

Folds Codex's review of PR #4085. Gemini approved; Codex found 3 P2s -- P2-1 is a
real fracture I introduced when threading per-member deps.

P2-1 (partial-activation fracture): commit 2 made the SELECTOR's activation gate
per-member (RoastRetryActiveForMember). A multi-seat operator with member A
registered and member B not would then drive A via the transition (ROAST) and B via
the legacy shuffle for the SAME attempt -> divergent included sets -> fracture. The
legacy-fallback sentinel is only safe when UNIFORM. Fix: the selector's
ROAST-vs-legacy decision is now PROCESS-level (RoastRetryActive, any-entry), and a
missing per-member coordinator (partial registration -- a wiring bug) FAILS CLOSED,
never legacy (a legacy fallback for the unregistered seat while a sibling selects
from the transition would split the included set). The per-member loop/observe gates
stay as-is: an unregistered seat cannot observe and fails closed at the selector, so
it never selects a divergent set.

P2-2 (stale deps on reject): RegisterRoastRetryCoordinatorForMember now DELETES any
existing entry when it rejects a registration (member 0 or SelfMember mismatch), so
a bad re-registration deactivates the seat (fail-safe to legacy) instead of leaving
stale deps active.

P2-3 (coarse submitSnapshotIfActive any-entry): NOT folded -- pre-existing Phase-4
coarse evidence path (the session-keyed handle registry is not member-keyed, and no
live binding producer is wired, so it is inert). A partial fix (per-member
coordinator only) would pair it with a still-session-keyed sibling handle and be
worse; the full fix needs the session-handle registry member-keyed too -- a separate
scope and a coarse-path prod-wiring prerequisite. Gemini concurred it is out of
scope here.

Tests: PartialRegistrationFailsClosed (member 1 registered, select member 2 ->
fail closed, not legacy); RejectDropsExistingEntry. Verified: gofmt; build 4 combos;
full pkg/frost/signing 3 producer states + -race; full pkg/tbtc tagged 207s +
frost_roast_retry-only 146.9s + default 146.9s (the earlier ROAST_ONLY/DEFAULT
failures were timing flakes -- both pass clean on re-run).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
….5, Codex re-review P2)

submitSnapshotIfActive is not yet member-aware: it looks up deps via the
any-entry RegisteredRoastRetryCoordinator and keys the drive handle by sessionID
alone. Under a multi-seat operator that would attribute one local seat's evidence
to an arbitrary sibling (any-entry deps) and collide their drive handles
(sessionID-keyed binding) -- Codex's re-review P2-A/P2-B.

The session-handle binding cannot be cleanly member-keyed in this PR: binding
validation reads the member-independent attempt context, so keying handles by
member would split that read. The coarse/drive evidence is also inert in 1b/1.5
(no production caller submits; it is consumed only by the PR2b-2 blame bridge).

So guard the path: a new registeredRoastRetryMemberCount() (real under
frost_roast_retry, stubbed 0 in the default build) lets submitSnapshotIfActive
no-op when more than one local seat is registered. This fails SAFE (disabling an
inert path) rather than mis-attributing evidence; single-seat is unchanged. The
full member-aware coarse/drive path (member-keyed deps + handle) is deferred to
PR2b-2, where this evidence is actually consumed.

Verification: gofmt clean; builds green across default / frost_native /
frost_roast_retry / frost_native+frost_roast_retry / frost_native+frost_tbtc_signer
cgo; pkg/frost/signing green across all 4 producer states + -race.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-1.5, Codex re-review P2)

BeginOrchestrationForSession is member-aware (RegisteredRoastRetryCoordinatorForMember),
but when THIS seat was unregistered it returned the ErrNoRoastRetryCoordinatorRegistered
legacy-fallback sentinel unconditionally. For a multi-seat operator that registered some
but not all seats, the unregistered seat would then run the coarse/legacy primitive while
registered sibling seats drive bound ROAST messages -- mixed bound/unbound for the same
attempt, a group fracture (Codex re-review of #4085).

The legacy-fallback sentinel is only safe as a UNIFORM, group-wide decision: i.e. when NO
seat is registered anywhere (registeredRoastRetryMemberCount() == 0). When a sibling seat
IS registered, advertising the fallback for an unregistered seat is non-uniform within the
operator and fractures the attempt, so fail CLOSED with a non-sentinel error (the
dispatcher treats it as terminal and aborts Execute rather than falling through to coarse).

Also fail closed for FULLY-registered multi-seat (count > 1): the session-handle binding
(SetCurrentAttemptHandleForSession) is still keyed by sessionID alone, so two local seats
in one attempt would collide. Member-keyed handles are deferred to PR2b-2; until then any
multi-seat operator fails closed rather than mis-bind. Mirrors the coarse evidence path's
multi-seat guard (submitSnapshotIfActive) in this same PR. Single-seat is unchanged.

Tests: partial multi-seat (Codex's case) and fully-registered multi-seat both fail closed
with a non-sentinel error and create no session binding.

Verified: gofmt clean; builds green on default / frost_native / frost_roast_retry /
frost_native+frost_roast_retry; pkg/frost/signing green across all 4 producer states + -race.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (7.3 PR2b-1.5, Codex re-review P2)

The multi-seat fail-closed guard in BeginOrchestrationForSession returned a bare
non-sentinel error. That error flows through the tBTC executor into signingRetryLoop
as a signingAttemptFn error, which the loop treats as a FAILED ATTEMPT: it drives
transitionController.OnAttemptFailed, reportAttemptOutcome(false), and retries
(`continue`). Because multi-seat is a STATIC condition no future attempt can resolve,
the node would spin futile attempts until the signing timeout and synthesize garbage
failed-attempt transitions, instead of failing closed (Codex re-review of #4085).
Coarse fallback is not an option: interactive<->coarse mixing fractures the group.

Fix (Codex+Gemini consult -> Option B, terminal error classification):
- Add an exported ErrTerminalSigningFailure sentinel in pkg/frost/signing, documented
  as the THIRD disposition in the orchestration error taxonomy: STATIC -> coarse
  fallback; RUNTIME -> no fallback but RETRY (may be transient); TERMINAL -> ABORT the
  retry loop (static, futile to retry, unsafe to fall back).
- BeginOrchestrationForSession wraps ErrTerminalSigningFailure for both multi-seat
  fail-closed branches (count>1, and the partial count==1 unregistered-seat case);
  count==0 stays the STATIC ErrNoRoastRetryCoordinatorRegistered fallback sentinel.
- signingRetryLoop matches it via errors.Is and exits immediately (return nil, err)
  BEFORE the retry/transition machinery -- no OnAttemptFailed pollution, no spin. The
  loop only classifies retry disposition; FROST owns the terminal classification.

Tests: orchestration fail-closed tests assert the terminal classification; a new
signing_loop test proves a terminal attempt error aborts after exactly one attempt
(no retry), while the existing RUNTIME signing-error case still retries.

Verified: gofmt clean; builds green on default / frost_native / frost_roast_retry /
frost_native+frost_roast_retry; pkg/frost/signing green across all 4 producer states
+ -race; full pkg/tbtc default suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mswilkison mswilkison merged commit 2cc36ef into feat/frost-schnorr-migration-scaffold Jun 19, 2026
15 of 16 checks passed
@mswilkison mswilkison deleted the feat/frost-7.3-multiseat-per-seat-deps branch June 19, 2026 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant