Commit 5fa67b1

hardening(frost/roast): verifiable blame — accuser quorum for exclusion; overflow can only park (#4029)
Stacked on #3866 (base: `feat/frost-schnorr-migration-scaffold`). Implements item 2 of the review feedback on the FROST/ROAST stack (verifiable blame, not counted blame).

## Problem

`NextAttempt` permanently excluded members on unverifiable observer counters (reject/conflict threshold 1, overflow 4), **summed across observers and across reasons**, so a single byzantine observer fabricating counters could permanently exclude any honest member, and by repetition grind the included set below threshold (`ErrAttemptInfeasible`). That inverts ROAST's robustness guarantee: the design's whole point is liveness with t honest of n, and the blame layer handed a liveness-and-membership veto to any single member. Bundle evidence (`OverflowEntry`/`RejectEntry`/`ConflictEntry`) is observer-signed *claims*: nothing in a counter lets a third party re-check that the accused misbehaved.

## What changed

**Accuser-quorum gate (`ExclusionAccuserQuorum`).** An accusation only produces action when made by at least `f+1 = groupSize − threshold + 1` distinct credible observers (f = the byzantine tolerance). At f+1, at least one accuser is honest under the protocol's own t-of-n assumption, so the group may act as if the fault were verified. Real faults reach quorum naturally (contributions are broadcast, so every honest member observes the same bytes), while f colluding members can never reach f+1 by fabrication. Production shape (n=100, t=51): quorum 50 vs. 49 worst-case byzantine.

**Counting hygiene.** Observers count once per accused per category regardless of claimed `Count` magnitude; reject reasons no longer multiply accusers; categories tally independently (reject + conflict claims no longer sum); only previous-`IncludedSet` members are credible accusers; accusations against non-original-set members are ignored.

**Overflow can never be permanent.** Transport pressure is observable only at the transport layer and can never be made self-incriminating. An *established* (quorum-corroborated) overflow accusation now parks the member for one attempt (the same transient mechanics as silence parking) instead of excluding forever. **Sub-quorum claims are ignored entirely**, not parked: acting on a single unverifiable claim would let one byzantine observer impose an attempt of liveness cost on any honest member at will. Established reject/conflict accusations still exclude permanently, and the policy remains a pure deterministic function of `(prev, bundle, threshold)`.

## Why quorum rather than self-incriminating proofs now

The review's endgame is proof-carrying blame: the accused's own two operator-signed conflicting payloads (conflicts), or their signed contribution plus a re-checkable deterministic validation failure (rejects). That requires wire-format and verification-routine changes (the current bundle carries only counters). The quorum gate delivers the safety property (fabricated blame can never become permanent, and the grinding-to-infeasibility vector is closed) with no wire change, and is the correct *floor* even after proofs land (proof-verified entries can then bypass the quorum per category). RFC-21 Layer B now documents the policy, the rationale, and that roadmap; the residual cost (sub-quorum-observed faults burn retry attempts instead of excluding) is explicitly folded into the serial-attempt latency budget.

## Tests

New regression coverage: quorum boundary at f vs f+1 for both permanent categories; fabricated-blame grinding across six attempts (single byzantine accuser, max counters, honest members never move); count-magnitude fabrication; cross-category non-summing; reason non-multiplication; non-credible accusers; non-original accused; established-overflow park-and-reinstate cycle; production-shape quorum pin `(100, 51) = 50`. Existing overflow/categories/soak tests updated to the new semantics. `go test ./pkg/frost/... ./pkg/tbtc/...` passes; `go vet` clean.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
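
The quorum arithmetic in the description can be checked with a minimal sketch. The function below is a hypothetical stand-in, not the repository's `ExclusionAccuserQuorum`; it assumes only the formula stated above (`groupSize - threshold + 1`) and plain `int` arguments.

```go
package main

import "fmt"

// exclusionAccuserQuorum is a hypothetical stand-in for the
// ExclusionAccuserQuorum helper named in the commit message.
// It returns f+1, where f = groupSize - threshold is the number of
// byzantine members the t-of-n assumption tolerates.
func exclusionAccuserQuorum(groupSize, threshold int) int {
	return groupSize - threshold + 1
}

func main() {
	// Production shape from the description: n=100, t=51.
	// f = 49 colluding members stay one short of the quorum.
	fmt.Println(exclusionAccuserQuorum(100, 51)) // 50

	// Shape used by the soak test in this commit: n=5, t=3.
	fmt.Println(exclusionAccuserQuorum(5, 3)) // 3
}
```

Because the value is derived from the group shape alone, every honest member computes the same quorum without negotiation.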
2 parents b7317f2 + e0ce74b commit 5fa67b1

5 files changed

Lines changed: 789 additions & 184 deletions

docs/rfc/rfc-21-roast-coordinator-retry-and-transition-evidence.adoc

Lines changed: 71 additions & 32 deletions
@@ -307,44 +307,83 @@ seed bytes must produce the same legacy `int64` on every honest
 signer. The bridge is named, isolated, and exhaustively tested so
 later edits cannot accidentally desynchronise it.
 
-The exclusion policy is:
-
-. Senders with `OverflowCount >= overflowExclusionThreshold` during the
-attempt window are moved to `ExcludedSet` (transport blamable).
-. Senders with at least one confirmed reject event for non-transport
-reasons are moved to `ExcludedSet` (validation blamable).
+The exclusion policy (verifiable-blame revision) starts from one
+observation: every evidence entry in the bundle is an observer-signed
+*claim*, not a self-incriminating proof. Nothing in an overflow,
+reject, or conflict counter lets a third party re-check that the
+accused actually misbehaved. A policy that permanently excludes on
+unverifiable counters inverts ROAST's robustness guarantee -- a single
+byzantine observer could fabricate evidence against honest members
+and grind the `IncludedSet` toward `ErrAttemptInfeasible`. Permanent
+exclusion therefore requires the accusation to be *established*:
+corroborated by a quorum of distinct accusers large enough that, under
+the protocol's own t-of-n honesty assumption, at least one accuser is
+honest.
+
+. Accusation tallying: each bundle snapshot is one observer's claim
+set. Only observers in the previous attempt's `IncludedSet` are
+credible; each observer counts at most once per accused member per
+category regardless of the claimed count magnitude; accusations
+against members outside the original signer set are ignored. An
+accusation is established when at least
+`ExclusionAccuserQuorum(originalGroupSize, threshold)
+= originalGroupSize - threshold + 1` (that is, `f+1`) distinct
+observers make it. A real fault is observed by every honest member
+processing the broadcast, so established faults reach the quorum
+naturally; fabricated ones cannot, because at most `f` members are
+byzantine.
+. Established reject or conflict accusations (validation /
+equivocation blamable) move the accused to `ExcludedSet`
+permanently. The categories are tallied independently and never
+summed with each other.
+. Established overflow accusations (transport blamable) move the
+accused to the *parked* set for one attempt -- never to
+`ExcludedSet`. Transport pressure is observable only at the
+transport layer and can never be made self-incriminating, so
+overflow may cost an attempt of liveness but not permanence.
 . Senders with deadline-expiry only -- silent peers -- are moved to a
 *parked* set that the next attempt skips but the attempt after that
-retries (to tolerate transient outages). Silence parking is
-*strictly transient*: a single attempt's worth of skip, no escalation.
-A peer falsely labelled silent because their contribution arrived
-late (or because a malicious coordinator censored it) is not
-permanently penalised -- they are reinstated by the very next
-attempt. Permanent exclusion only follows from overflow or non-
-transport reject events, neither of which can fire on a slow-but-
-honest peer.
+retries (to tolerate transient outages). Parking is *strictly
+transient*: a single attempt's worth of skip, no escalation. A peer
+falsely labelled silent because their contribution arrived late (or
+because a malicious coordinator censored it) is not permanently
+penalised -- they are reinstated by the very next attempt.
+Silence is detected as *bundle absence* (the member submitted no
+evidence snapshot for the transition). A member that submits its
+snapshot while withholding its signing contribution is therefore
+not parked by this policy: signing-silence is invisible to the
+transition layer, costs the attempt, and is bounded only by the
+retry budget (Annex B) until t-of-included finalize lands.
 . If `IncludedSet` minus exclusions drops below the threshold `t`, the
 coordinator returns `ErrAttemptInfeasible` and the session is
 declared failed for this signer set.
 
-The thresholds are *fixed constants* in the initial design, picked to
-be evidently small relative to the per-attempt deadline and the
-`expectedMessagesCount*4+1` channel capacity:
-
-[source,go]
-----
-const (
-	overflowExclusionThreshold = 4 // overflow events per attempt window
-	rejectExclusionThreshold = 1 // any confirmed non-transport reject
-	silenceParkingThreshold = 1 // any deadline expiry parks for 1 attempt
-)
-----
-
-Making them constants up-front means honest signers do not need to
-negotiate them. If production telemetry indicates a constant is wrong
-for the attempt's wall-clock bound, the change is a routine code
-update that ships through Phase 7's manifest gate -- not a runtime
-parameter that drift can desynchronise.
+The quorum is a *derived constant* of the key-group shape -- for the
+production 51-of-100 group it is 50 -- so honest signers do not need
+to negotiate it and drift cannot desynchronise it. Sub-quorum claims
+are deliberately ignored rather than parked: acting on a single
+unverifiable claim would let one byzantine observer cost any honest
+member an attempt of liveness at will.
+
+*Verifiability roadmap.* Permanent exclusion on a *single* piece of
+evidence becomes sound once the wire format carries
+self-incriminating proof: for conflicts, the accused's own two
+operator-signed payloads with identical (attempt, sender) and
+different bytes; for rejects, the accused's operator-signed
+contribution plus a deterministic validation failure any member can
+re-run. When those land, the per-category quorum gate can be relaxed
+to proof-verified entries. The cost of the interim quorum policy is
+bounded: a fault observed by fewer than `f+1` honest members (for
+example a targeted, per-recipient equivocation) is not permanently
+excluded and instead burns retry attempts, which the serial-attempt
+latency analysis already budgets for. Near the assumption boundary
+the gate is intentionally demanding -- at worst-case `f` only `t`
+honest observers exist, so establishment needs all but `2t-n-1` of
+them (50 of 51 at the production shape) to have observed the fault
+and landed snapshots in the bundle. In that regime the quorum acts
+as a fabrication firewall rather than a working exclusion mechanism;
+restoring per-category exclusion under heavy attack is exactly what
+proof-carrying blame is for.
 
 === Layer C: Retry orchestration (M7)
 

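The tallying rules in the RFC text above (distinct credible observers, once per accused per category, categories never summed) can be illustrated with a small sketch. This is not the `NextAttempt` implementation: the `Category`, `Claim`, and `establish` names are hypothetical, members are plain `int`s, and the bundle is reduced to a flat list of claims.

```go
package main

import (
	"fmt"
	"sort"
)

// Category is a hypothetical label for the three evidence kinds.
type Category int

const (
	Overflow Category = iota
	Reject
	Conflict
)

// Claim is one observer's accusation. Count is the claimed magnitude,
// which the tally deliberately ignores.
type Claim struct {
	Observer, Accused int
	Category          Category
	Count             int
}

// establish returns, per category, the accused members whose accusation
// was made by at least quorum distinct credible observers.
// credible models the previous attempt's IncludedSet; original models
// the original signer set.
func establish(claims []Claim, credible, original map[int]bool, quorum int) map[Category][]int {
	type key struct {
		cat     Category
		accused int
	}
	accusers := map[key]map[int]bool{}
	for _, c := range claims {
		if !credible[c.Observer] || !original[c.Accused] {
			continue // non-credible accuser or non-original accused
		}
		k := key{c.Category, c.Accused}
		if accusers[k] == nil {
			accusers[k] = map[int]bool{}
		}
		accusers[k][c.Observer] = true // set semantics: one vote per observer
	}
	out := map[Category][]int{}
	for k, obs := range accusers {
		if len(obs) >= quorum {
			out[k.cat] = append(out[k.cat], k.accused)
		}
	}
	for cat := range out {
		sort.Ints(out[cat]) // deterministic order
	}
	return out
}

func main() {
	members := map[int]bool{1: true, 2: true, 3: true, 4: true, 5: true}
	quorum := 5 - 3 + 1 // f+1 for a 3-of-5 group
	claims := []Claim{
		// Four distinct observers report overflow against member 3.
		{1, 3, Overflow, 1}, {2, 3, Overflow, 1}, {4, 3, Overflow, 1}, {5, 3, Overflow, 1},
		// One observer fabricates large counters in two categories.
		{1, 2, Reject, 1000}, {1, 2, Conflict, 1000},
		// Observer 9 is not credible; member 7 is not in the original set.
		{9, 4, Reject, 1}, {1, 7, Reject, 1}, {2, 7, Reject, 1}, {4, 7, Reject, 1},
	}
	est := establish(claims, members, members, quorum)
	fmt.Println("park for one attempt:", est[Overflow])
	fmt.Println("exclude permanently:", append(est[Reject], est[Conflict]...))
}
```

In this sketch only the overflow accusation against member 3 is established, so member 3 would be parked for one attempt, and nobody is excluded. The single observer's reject and conflict claims against member 2 stay at one accuser per category regardless of the claimed count.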
pkg/frost/roast/multi_coordinator_soak_test.go

Lines changed: 13 additions & 5 deletions
@@ -265,13 +265,15 @@ func TestSoak_CleanAttemptPreservesIncludedSet(t *testing.T) {
 	}
 }
 
-func TestSoak_OverflowEvidenceExcludesPermanently(t *testing.T) {
+func TestSoak_EstablishedOverflowParksTransiently(t *testing.T) {
 	members := []group.MemberIndex{1, 2, 3, 4, 5}
 	nodes := newSoakHarness(t, members)
 	prev := soakStartingContext(t, members)
 
-	// Four observers report 1 overflow each against member 3.
-	// Total 4 = OverflowExclusionThreshold.
+	// Four distinct observers report overflow against member 3:
+	// 4 >= ExclusionAccuserQuorum(5, 3) = 3, so the accusation is
+	// established -- but transport blame is unverifiable in
+	// principle, so it parks transiently instead of excluding.
 	overflow := map[group.MemberIndex][]group.MemberIndex{
 		1: {3},
 		2: {3},
@@ -280,8 +282,14 @@ func TestSoak_OverflowEvidenceExcludesPermanently(t *testing.T) {
 	}
 	next, _ := soakAttempt(t, nodes, prev, nil, overflow, 3)
 
-	if !containsMember(next.ExcludedSet, 3) {
-		t.Fatalf("member 3 must be excluded; got %v", next.ExcludedSet)
+	if !containsMember(next.TransientlyParked, 3) {
+		t.Fatalf("member 3 must be parked; got %v", next.TransientlyParked)
+	}
+	if containsMember(next.ExcludedSet, 3) {
+		t.Fatalf(
+			"overflow must never permanently exclude; got %v",
+			next.ExcludedSet,
+		)
 	}
 	if containsMember(next.IncludedSet, 3) {
 		t.Fatal("member 3 must not be in next IncludedSet")
