Commit 832d529

docs(rfc): lock RFC-21 Phase-3 design decisions
Promotes the resolved Phase-3 design decisions (settled in the 2026-05-22 cross-team review) from the Open Questions section into a dedicated Resolved Decisions section. Four targeted edits:

1. Cross-process coordinator agreement -- replaces the all-to-all-with-local-union recommendation (which silently assumed synchronous gossip) with coordinator-proposed aggregation on a dedicated topic, signed with the operator key, with receiver-side bundle verification for censorship detection. Documents the rejected alternatives and the liveness/safety properties.

2. AttemptSeed source -- the DkgGroupPublicKey input to the seed derivation comes from the FFI signer material at attempt construction time, not from a wallet registry lookup. Removes hot-path async coupling and respects layering between core signing and application state.

3. SelectCoordinator seed bridging -- BeginAttempt wraps the legacy int64-seeded SelectCoordinator with a sterile, named adapter that folds the new [32]byte AttemptSeed into the legacy parameter shape. Bridge is exhaustively tested so later edits cannot accidentally desynchronise it.

4. Silence-parking transience -- Layer B exclusion policy now states explicitly that silence-based parking is single-attempt only with no escalation, so a peer falsely labelled silent (late delivery, coordinator censorship) is reinstated by the very next attempt. Permanent exclusion only follows from overflow or non-transport reject events, neither of which can fire on a slow-but-honest peer.

Also: removes a stale "(see open question 1)" reference in Layer A, and adds compact decision blocks for the remaining Phase-3 questions (signer-material binding, key reuse, JSON format, message-size budget).

Open questions reduced to three: persistence across restart (Phase 5+), FFI surface guidance (follows L5 pattern from PR #425 / #3961), and AttemptContextHash backward-compat horizon (Phase 6+).

No code changes. Implementation PRs reference these decisions in their descriptions.
1 parent 266272c commit 832d529

1 file changed

Lines changed: 180 additions & 38 deletions

docs/rfc/rfc-21-roast-coordinator-retry-and-transition-evidence.adoc

@@ -198,6 +198,14 @@ session inputs; it is never chosen, only derived. Any signer can
198198
recompute it from the session header and verify the coordinator's
199199
participant selection.
200200

201+
*`DkgGroupPublicKey` source.* The runtime extracts `DkgGroupPublicKey`
202+
from the FFI signer material at attempt construction time -- the same
203+
material that already carries the DKG-validated group public key and is
204+
required at signature-verification time anyway. Do not re-read it from
205+
the wallet registry: the FFI material is the canonical hot-path source,
206+
removes async/DB lookup latency, and preserves separation between the
207+
core signing protocol and application state.
208+
201209
=== Layer A: Receiver transition evidence (M4)
202210

203211
The three `select { default }` drops become:
@@ -248,8 +256,9 @@ type categoryQuota struct {
248256
The point is to produce a fixed-size attestation, not to log
249257
everything forever. Per-attempt evidence is at most
250258
`O(|IncludedSet| * sum(quotas))` bytes -- bounded, predictable, and
251-
small enough to be signed and broadcast as a single message
252-
(see open question 1).
259+
small enough to be signed and broadcast as a single message. The
260+
broadcast mechanism is the coordinator-aggregated `TransitionMessage`
261+
defined in the Resolved decisions section.
253262

254263
=== Layer B: Coordinator state (joining M4 and M7)
255264

@@ -278,6 +287,26 @@ type Coordinator interface {
278287
context from the previous attempt's evidence. It is deterministic given
279288
`(AttemptContext, TransitionEvidence)` -- two coordinators with the same
280289
verified inputs agree on the next attempt without further coordination.
290+
291+
The verified-inputs requirement is critical: gossip is eventually
292+
consistent, but `NextAttempt` is a synchronous state transition. Two
293+
honest signers fed differently-timed evidence sets produce divergent
294+
contexts. To prevent that, the *evidence input itself* is an
295+
authoritative `TransitionMessage` produced by the current attempt's
296+
coordinator (the "coordinator-aggregation" model defined in the
297+
Resolved decisions section); see that section for the full
298+
agreement-flow specification.
299+
300+
*Seed-bridging.* The legacy `pkg/frost/roast/coordinator.go::SelectCoordinator`
301+
helper accepts an `int64` seed plus an attempt number. `BeginAttempt`
302+
wraps it with a sterile bridge that folds the new `[32]byte`
303+
`AttemptSeed` into the legacy parameter shape -- for example, taking
304+
the first 8 bytes as a big-endian `int64`. The bridge is a
305+
non-cryptographic adapter for the deterministic shuffle: equivalent
306+
seed bytes must produce the same legacy `int64` on every honest
307+
signer. The bridge is named, isolated, and exhaustively tested so
308+
later edits cannot accidentally desynchronise it.
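A minimal sketch of the seed bridge, assuming the "first 8 bytes as a big-endian `int64`" folding given as the example above. The names `AttemptSeed` (as a Go type) and `legacySeedFromAttemptSeed` are hypothetical; the RFC does not fix the adapter's name or signature.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// AttemptSeed mirrors the RFC's [32]byte seed; the type name is an assumption.
type AttemptSeed [32]byte

// legacySeedFromAttemptSeed (hypothetical name) folds the 32-byte seed into
// the legacy int64 shape by reading the first 8 bytes as a big-endian uint64
// and reinterpreting the bits as int64. It is a non-cryptographic adapter:
// bytes 8..31 are ignored, and equal seed bytes always yield the same int64.
func legacySeedFromAttemptSeed(seed AttemptSeed) int64 {
	return int64(binary.BigEndian.Uint64(seed[:8]))
}

func main() {
	var seed AttemptSeed
	seed[7] = 0x2a // least significant byte of the big-endian prefix
	fmt.Println(legacySeedFromAttemptSeed(seed)) // prints 42
}
```

Because the conversion is a fixed-width big-endian read, it does not depend on host byte order, which is the property the bridge tests would need to pin down.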
309+
281310
The exclusion policy is:
282311

283312
. Senders with `OverflowCount >= overflowExclusionThreshold` during the
@@ -286,7 +315,14 @@ The exclusion policy is:
286315
reasons are moved to `ExcludedSet` (validation blamable).
287316
. Senders with deadline-expiry only -- silent peers -- are moved to a
288317
*parked* set that the next attempt skips but the attempt after that
289-
retries (to tolerate transient outages).
318+
retries (to tolerate transient outages). Silence parking is
319+
*strictly transient*: a single attempt's worth of skip, no escalation.
320+
A peer falsely labelled silent because their contribution arrived
321+
late (or because a malicious coordinator censored it) is not
322+
permanently penalised -- they are reinstated by the very next
323+
attempt. Permanent exclusion only follows from overflow or non-
324+
transport reject events, neither of which can fire on a slow-but-
325+
honest peer.
290326
. If `IncludedSet` minus exclusions drops below the threshold `t`, the
291327
coordinator returns `ErrAttemptInfeasible` and the session is
292328
declared failed for this signer set.
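The exclusion policy above can be sketched as a pure function over the verified evidence. Everything except `ErrAttemptInfeasible`, `OverflowCount`, and `overflowExclusionThreshold` is a hypothetical name, the threshold value is invented for illustration, and the feasibility check treats parked peers as unavailable for the next attempt, which is one reading of "`IncludedSet` minus exclusions".

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

var ErrAttemptInfeasible = errors.New("attempt infeasible")

// Assumed value; the RFC does not state the threshold in this section.
const overflowExclusionThreshold = 3

// senderEvidence is a hypothetical per-sender summary of verified evidence.
type senderEvidence struct {
	OverflowCount       int
	NonTransportRejects int
	DeadlineExpired     bool
}

type attemptSets struct {
	Included, Parked, Excluded []string
}

// nextAttemptSets classifies the current attempt's senders and reinstates
// peers parked by the previous attempt (parking lasts one attempt only).
func nextAttemptSets(included, parkedLast []string, ev map[string]senderEvidence, t int) (attemptSets, error) {
	var next attemptSets
	for _, id := range included {
		e := ev[id]
		switch {
		case e.OverflowCount >= overflowExclusionThreshold, e.NonTransportRejects > 0:
			next.Excluded = append(next.Excluded, id) // permanent
		case e.DeadlineExpired:
			next.Parked = append(next.Parked, id) // silent: skip one attempt
		default:
			next.Included = append(next.Included, id)
		}
	}
	next.Included = append(next.Included, parkedLast...) // no escalation
	sort.Strings(next.Included)
	if len(next.Included) < t {
		return attemptSets{}, ErrAttemptInfeasible
	}
	return next, nil
}

func main() {
	ev := map[string]senderEvidence{
		"b": {OverflowCount: 3},
		"c": {DeadlineExpired: true},
	}
	first, err := nextAttemptSets([]string{"a", "b", "c", "d"}, nil, ev, 2)
	fmt.Println(first, err)
	second, err := nextAttemptSets(first.Included, first.Parked, nil, 2)
	fmt.Println(second, err)
}
```

In the usage above, silent peer `c` is parked for one attempt and is back in the included set on the following one, while `b` stays out.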
@@ -404,42 +440,148 @@ choices in their PR descriptions and reviews.
404440
only when the supporting evidence is attached. The RFC does not
405441
promise an early flip.
406442

407-
== Open questions
443+
== Resolved decisions
444+
445+
The decisions in this section were settled in a Phase-3 design review
446+
(2026-05-22) with cross-team protocol-owner input. They are listed
447+
here so subsequent implementation PRs can reference them.
448+
449+
=== Cross-process coordinator agreement
450+
451+
*Decision: coordinator-proposed aggregation on a dedicated topic,
452+
signed with the operator key, with receiver-side bundle verification
453+
for censorship detection.*
454+
455+
The earlier draft of this RFC carried "all-to-all signed-evidence
456+
gossip with local union" as the recommended path. That recommendation
457+
silently assumed gossip is synchronously consistent across the signer
458+
set; in practice gossip is eventually consistent, so two honest
459+
signers can hold divergent evidence sets at the moment the attempt
460+
times out. Applying the deterministic `NextAttempt` function to
461+
divergent inputs produces divergent next-attempt contexts and
462+
fractures the signing group.
463+
464+
The replacement flow is:
465+
466+
. *Observation.* Each signer's `EvidenceRecorder` (Phase 2)
467+
produces a per-attempt local-evidence snapshot.
468+
. *Submission.* Each signer signs its snapshot with its operator
469+
key (the same key `pkg/net` already uses to attribute network
470+
messages) and broadcasts it on a dedicated evidence topic.
471+
. *Aggregation.* The current attempt's elected coordinator
472+
(the deterministic `SelectCoordinator` output) collects the
473+
signed snapshots, builds a canonical bundle, signs the bundle,
474+
and broadcasts it as a `TransitionMessage`.
475+
. *Verification.* Every receiver validates the bundle's
476+
coordinator signature, validates each contained snapshot's
477+
operator signature, *and verifies that its own observations
478+
appear in the bundle*. A coordinator that omits an honest
479+
peer's signed snapshot is caught here.
480+
. *Transition.* Receivers feed the verified bundle into
481+
`NextAttempt`. Because the bundle is the authoritative input,
482+
all honest receivers compute the same next-attempt context.
483+
484+
A peer that signs conflicting snapshots is slashable -- the
485+
signature is the binding. A coordinator that signs an inconsistent
486+
bundle (omits observations, alters counts, etc.) is detected at
487+
verification step (4) and the next-attempt coordinator handles the
488+
exclusion.
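The verification step (4) can be sketched as follows. This is an illustration under stated assumptions, not the wire format: `signedSnapshot`, `transitionMessage`, `bundleBytes`, and `verifyBundle` are hypothetical names, the length-prefixed encoding is a toy canonical form, and Ed25519 from the Go standard library stands in for the operator key, whose actual scheme in `pkg/net` is not specified here.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"encoding/binary"
	"errors"
	"fmt"
)

var (
	errBadCoordinatorSig = errors.New("bad coordinator signature")
	errBadSnapshotSig    = errors.New("bad snapshot signature")
	errCensored          = errors.New("own snapshot missing from bundle")
)

// signedSnapshot is a hypothetical shape for one signer's signed evidence.
type signedSnapshot struct {
	Signer  string
	Payload []byte
	Sig     []byte
}

// transitionMessage is a hypothetical shape for the coordinator's bundle.
type transitionMessage struct {
	Coordinator string
	Snapshots   []signedSnapshot
	Sig         []byte
}

// bundleBytes length-prefixes every field so the signed bytes are unambiguous.
func bundleBytes(snaps []signedSnapshot) []byte {
	var buf bytes.Buffer
	put := func(b []byte) {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(b)))
		buf.Write(n[:])
		buf.Write(b)
	}
	for _, s := range snaps {
		put([]byte(s.Signer))
		put(s.Payload)
		put(s.Sig)
	}
	return buf.Bytes()
}

// verifyBundle checks the coordinator signature, every snapshot signature,
// and that the receiver's own snapshot is present in the bundle.
func verifyBundle(msg transitionMessage, keys map[string]ed25519.PublicKey, own signedSnapshot) error {
	ck, ok := keys[msg.Coordinator]
	if !ok || !ed25519.Verify(ck, bundleBytes(msg.Snapshots), msg.Sig) {
		return errBadCoordinatorSig
	}
	found := false
	for _, s := range msg.Snapshots {
		k, ok := keys[s.Signer]
		if !ok || !ed25519.Verify(k, s.Payload, s.Sig) {
			return errBadSnapshotSig
		}
		if s.Signer == own.Signer && bytes.Equal(s.Payload, own.Payload) {
			found = true
		}
	}
	if !found {
		return errCensored
	}
	return nil
}

// runDemo verifies an honest bundle and a censoring bundle as signer "b".
func runDemo() (honest, censored error) {
	keys := map[string]ed25519.PublicKey{}
	privs := map[string]ed25519.PrivateKey{}
	for _, id := range []string{"a", "b", "c"} {
		pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
		if err != nil {
			panic(err)
		}
		keys[id], privs[id] = pub, priv
	}
	snap := func(id string) signedSnapshot {
		p := []byte("evidence-from-" + id)
		return signedSnapshot{Signer: id, Payload: p, Sig: ed25519.Sign(privs[id], p)}
	}
	bundle := func(snaps ...signedSnapshot) transitionMessage {
		return transitionMessage{Coordinator: "a", Snapshots: snaps,
			Sig: ed25519.Sign(privs["a"], bundleBytes(snaps))}
	}
	own := snap("b")
	honest = verifyBundle(bundle(snap("a"), own, snap("c")), keys, own)
	censored = verifyBundle(bundle(snap("a"), snap("c")), keys, own)
	return honest, censored
}

func main() {
	honest, censored := runDemo()
	fmt.Println("honest bundle:", honest)
	fmt.Println("censoring bundle:", censored)
}
```

The censoring bundle carries a valid coordinator signature and valid snapshot signatures, so only the own-observation check catches the omission; that check is purely local to the receiver.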
489+
490+
Alternatives considered (rejected):
491+
492+
. *All-to-all signed-evidence gossip with local union.* Original
493+
recommendation. Rejected because gossip's eventual-consistency
494+
semantics let honest signers reach the deterministic
495+
`NextAttempt` boundary with divergent inputs, producing
496+
divergent outputs.
497+
. *Piggy-back on existing FROST broadcast channel.* Rejected
498+
because it couples evidence rate limits to protocol round-trip
499+
rate limits, and re-uses a topic with different traffic
500+
characteristics.
501+
. *Coordinator-only authoritative without aggregation.* Rejected
502+
because losing the all-signer signed attestations also loses
503+
the audit trail. The aggregation model keeps the per-signer
504+
signatures inside the bundle, so the audit trail survives.
505+
506+
Liveness: a malicious coordinator can withhold the
507+
`TransitionMessage`, stalling the transition. ROAST handles this
508+
the same way it handles a malicious signer: the attempt times
509+
out, the next attempt elects a different coordinator (the
510+
`SelectCoordinator` output is deterministic but rotates with the
511+
attempt number), and the new coordinator drives the transition.
512+
The malicious coordinator's evidence is itself parked or
513+
excluded by the new coordinator's bundle, ending the loop.
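The rotation property can be illustrated with a toy selector. This is not the `pkg/frost/roast` implementation: `selectCoordinatorSketch` is a hypothetical stand-in that assumes a seeded shuffle indexed by attempt number, which is one simple way to get an output that is deterministic for a given seed yet changes on every consecutive attempt.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// selectCoordinatorSketch (hypothetical) sorts the members so every signer
// starts from the same order, shuffles them with a PRNG seeded by the legacy
// int64 seed, and indexes the result by the attempt number.
func selectCoordinatorSketch(members []string, seed int64, attempt uint) string {
	if len(members) == 0 {
		return ""
	}
	ordered := append([]string(nil), members...) // do not mutate the caller's slice
	sort.Strings(ordered)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(ordered), func(i, j int) {
		ordered[i], ordered[j] = ordered[j], ordered[i]
	})
	return ordered[int(attempt)%len(ordered)]
}

func main() {
	members := []string{"p1", "p2", "p3", "p4"}
	for attempt := uint(0); attempt < 4; attempt++ {
		fmt.Println(attempt, selectCoordinatorSketch(members, 42, attempt))
	}
}
```

With more than one member, consecutive attempts never pick the same coordinator in this sketch, so a coordinator that withholds the `TransitionMessage` loses the role as soon as the attempt times out.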
514+
515+
Safety: any honest signer that verifies a bundle and computes
516+
`NextAttempt(ctx, bundle)` produces the same context as any other
517+
honest signer that verifies the same bundle. Safety reduces to
518+
"is the bundle correctly verified" -- a local check, not a
519+
network-consistency requirement.
520+
521+
This design satisfies the formal verified-inputs requirement of
522+
the deterministic `NextAttempt` policy specified in Layer B.
523+
524+
=== Source of `DkgGroupPublicKey` for the seed
525+
526+
*Decision: extract from FFI signer material at attempt construction.*
527+
528+
The DKG-validated group public key is already present in the FFI
529+
signer material (it is required at signature-verification time
530+
anyway), so the seed derivation can take it from there. The
531+
wallet registry is *not* consulted on the hot path; doing so
532+
would introduce async lookup latency and entangle the core
533+
signing protocol with application state. See Shared types above
534+
for the derivation contract.
535+
536+
=== `AttemptContext` ↔ `NativeExecutionFFISigningRequest` binding
537+
538+
*Decision: extend the request struct with an `AttemptContext`
539+
field; the context is Go-side orchestration only.*
540+
541+
The context does not cross the CGO/Rust boundary into the
542+
`tbtc-signer` engine -- the engine remains a pure signing
543+
primitive. Go-side coordinator wiring populates the context;
544+
existing call sites construct attempt-zero contexts inline
545+
during Phase 4.
546+
547+
=== `SelectCoordinator` retention
548+
549+
*Decision: keep the existing helper; bridge the seed type inside
550+
`BeginAttempt`.*
551+
552+
The deterministic shuffle is correct in isolation. The bridge
553+
folds the new `[32]byte` `AttemptSeed` into the legacy `int64`
554+
parameter shape with a sterile, named adapter (see Layer B).
555+
556+
=== Evidence-signing key
557+
558+
*Decision: reuse the existing operator key.*
559+
560+
The operator key already binds every other gossip message a
561+
keep-core node emits via `pkg/net`. Layering a second key
562+
surface specifically for evidence signing is premature
563+
optimization given the current key model.
564+
565+
=== Evidence message format
566+
567+
*Decision: JSON payload wrapped in the existing `pkg/net/gen/pb`
568+
envelope, routed via the `net.Message` interface.*
569+
570+
This matches the FROST/tbtc-signer protocol messages (Phase 1B)
571+
and inherits the network layer's operator-key signing
572+
automatically. Raw JSON does not appear on the wire.
573+
574+
=== Maximum evidence-message size
575+
576+
*Decision: single `TransitionMessage` per transition; no
577+
chunking.*
578+
579+
Under coordinator-aggregation, the per-transition payload is
580+
`O(N)` not `O(N^2)`. At a 100-signer group with all four
581+
quotas saturated the JSON-encoded bundle is ~10-20 KiB,
582+
comfortably within libp2p's per-message limits.
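As a back-of-the-envelope check on the figures above, the following sketch derives the per-signer share implied by a 10-20 KiB bundle at 100 signers and extrapolates linearly. The helper names are hypothetical, and the 200-signer figure is only what `O(N)` scaling would imply if the per-signer cost stayed constant; it is not a measurement.

```go
package main

import "fmt"

const kib = 1024

// perSignerBytes returns the average share of the bundle per signer.
func perSignerBytes(bundleBytes, signers int) float64 {
	return float64(bundleBytes) / float64(signers)
}

// linearEstimate scales a bundle size known at baseSigners to n signers,
// assuming the payload grows as O(N).
func linearEstimate(baseBytes, baseSigners, n int) int {
	return baseBytes * n / baseSigners
}

func main() {
	lo, hi := 10*kib, 20*kib
	fmt.Printf("per signer: %.1f-%.1f bytes\n",
		perSignerBytes(lo, 100), perSignerBytes(hi, 100))
	fmt.Printf("at 200 signers: %d-%d KiB\n",
		linearEstimate(lo, 100, 200)/kib, linearEstimate(hi, 100, 200)/kib)
}
```

The stated range works out to roughly 100-200 bytes per signer with all four quotas saturated.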
408583

409-
. *Cross-process coordinator agreement.* Today each signer runs its own
410-
process; the coordinator state machine is per-process. We assume
411-
that two honest signers, fed the same `TransitionEvidence` from a
412-
shared gossip layer, produce the same `NextAttempt`. Without
413-
agreement on the evidence input, the deterministic function still
414-
produces divergent outputs -- node A excludes peer X (saw overflow),
415-
node B does not (didn't), and the next-attempt sets disagree. This
416-
defeats the whole point of the layered design.
417-
+
418-
*Recommended path (signed-evidence gossip):* every observer signs the
419-
evidence it produced with its operator key and broadcasts the
420-
attestation on a dedicated evidence topic. Honest signers feed only
421-
*verified attestations* into the deterministic
422-
`NextAttempt`, taking the union over signed observations and applying
423-
the same exclusion thresholds. Two honest signers thus consume the
424-
same input set and produce the same output. A peer that signs
425-
conflicting evidence is itself slashable -- the signature is the
426-
binding.
427-
+
428-
Options considered:
429-
.. Piggy-back on existing FROST broadcast channel -- simplest but
430-
couples evidence to protocol round-trips and re-uses a topic with
431-
different rate-limit characteristics.
432-
.. *Dedicated evidence broadcast topic with signed attestations
433-
(recommended).* Cleaner separation, more wiring; the wiring is
434-
what the design owes the protocol.
435-
.. Coordinator-only authoritative -- only the elected coordinator
436-
produces evidence and other signers verify but don't recompute.
437-
Closest to the paper but loses redundancy.
438-
+
439-
The recommendation is the recommended *entering* Phase 3. The final
440-
decision is still owed and is the question that most needs
441-
design-time review with threshold-network/keep-core protocol owners
442-
before Phase 3 lands.
584+
== Open questions
443585

444586
. *Persistence across signer restart.* If a signer crashes mid-attempt,
445587
does it lose its evidence? The paper assumes persistent state. For
