Commit 934d1a6

mswilkison and claude committed
hardening(frost/roast): require accuser quorum for exclusion; overflow can only park
The NextAttempt exclusion policy permanently excluded members on unverifiable observer counters: reject/conflict threshold 1, overflow 4, summed across observers and categories so a single byzantine observer could fabricate evidence and grind honest members out of the included set toward ErrAttemptInfeasible -- inverting ROAST's robustness guarantee.

Bundle evidence entries are observer-signed claims, not self-incriminating proofs, so the policy now refuses to take permanent action on any accusation that is not group-established:

- ExclusionAccuserQuorum(groupSize, threshold) = f+1 distinct accusers, where f = groupSize - threshold is the byzantine tolerance. At f+1 at least one accuser is honest under the protocol's own t-of-n assumption. Real faults reach the quorum naturally because contributions are broadcast and every honest member observes them; fabricated ones cannot.
- Accusers are counted distinctly (one per observer per accused per category); claimed count magnitudes are no longer summed into blame. Reject reasons no longer multiply accusers, and categories are tallied independently instead of summing into each other.
- Only members of the previous IncludedSet are credible accusers; accusations against members outside the original signer set are ignored.
- Established overflow accusations (transport-blamable) now park transiently instead of excluding permanently: transport pressure can never be made self-incriminating, so it may cost an attempt of liveness but not permanence.
- Established reject/conflict accusations still exclude permanently.
- Sub-quorum claims are ignored entirely so a single byzantine observer cannot even impose parking liveness costs.

RFC-21 Layer B is updated to match, including the verifiability roadmap: once the wire format carries self-incriminating proof (the accused's own signed conflicting bytes; a re-checkable invalid contribution), single-proof exclusion becomes sound and the quorum gate can be relaxed per category.

New regression coverage: quorum boundary at f vs f+1 for both permanent categories, fabricated-blame grinding across six attempts, count-magnitude fabrication, cross-category non-summing, non-credible accusers, non-original accused, established-overflow park-and-reinstate cycle, and the production-shape quorum pin (100, 51) = 50.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
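
The quorum arithmetic described in the message can be sketched as follows. This is a minimal illustration, not the committed code: the function name and argument order come from the commit message, while the body, the `int` parameter types, and the absence of input validation are assumptions.

```go
package main

import "fmt"

// ExclusionAccuserQuorum returns f+1, where f = groupSize - threshold is
// the number of byzantine members the t-of-n assumption tolerates.
// Name and argument order follow the commit message; the body is an
// assumed sketch and does not validate its inputs.
func ExclusionAccuserQuorum(groupSize, threshold int) int {
	f := groupSize - threshold
	return f + 1
}

func main() {
	// Production-shape pin quoted in the commit message: (100, 51) = 50.
	fmt.Println(ExclusionAccuserQuorum(100, 51))
	// Shape used by the soak test in this commit: (5, 3) = 3.
	fmt.Println(ExclusionAccuserQuorum(5, 3))
}
```

With at most f byzantine members, any set of f+1 distinct accusers contains at least one honest member, which is why the boundary tests sit at f versus f+1.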
1 parent b7317f2 commit 934d1a6

5 files changed

Lines changed: 762 additions & 184 deletions

docs/rfc/rfc-21-roast-coordinator-retry-and-transition-evidence.adoc

Lines changed: 58 additions & 32 deletions
@@ -307,44 +307,70 @@ seed bytes must produce the same legacy `int64` on every honest
 signer. The bridge is named, isolated, and exhaustively tested so
 later edits cannot accidentally desynchronise it.
 
-The exclusion policy is:
-
-. Senders with `OverflowCount >= overflowExclusionThreshold` during the
-attempt window are moved to `ExcludedSet` (transport blamable).
-. Senders with at least one confirmed reject event for non-transport
-reasons are moved to `ExcludedSet` (validation blamable).
+The exclusion policy (verifiable-blame revision) starts from one
+observation: every evidence entry in the bundle is an observer-signed
+*claim*, not a self-incriminating proof. Nothing in an overflow,
+reject, or conflict counter lets a third party re-check that the
+accused actually misbehaved. A policy that permanently excludes on
+unverifiable counters inverts ROAST's robustness guarantee -- a single
+byzantine observer could fabricate evidence against honest members
+and grind the `IncludedSet` toward `ErrAttemptInfeasible`. Permanent
+exclusion therefore requires the accusation to be *established*:
+corroborated by a quorum of distinct accusers large enough that, under
+the protocol's own t-of-n honesty assumption, at least one accuser is
+honest.
+
+. Accusation tallying: each bundle snapshot is one observer's claim
+set. Only observers in the previous attempt's `IncludedSet` are
+credible; each observer counts at most once per accused member per
+category regardless of the claimed count magnitude; accusations
+against members outside the original signer set are ignored. An
+accusation is established when at least
+`ExclusionAccuserQuorum(originalGroupSize, threshold)
+= originalGroupSize - threshold + 1` (that is, `f+1`) distinct
+observers make it. A real fault is observed by every honest member
+processing the broadcast, so established faults reach the quorum
+naturally; fabricated ones cannot, because at most `f` members are
+byzantine.
+. Established reject or conflict accusations (validation /
+equivocation blamable) move the accused to `ExcludedSet`
+permanently. The categories are tallied independently and never
+summed with each other.
+. Established overflow accusations (transport blamable) move the
+accused to the *parked* set for one attempt -- never to
+`ExcludedSet`. Transport pressure is observable only at the
+transport layer and can never be made self-incriminating, so
+overflow may cost an attempt of liveness but not permanence.
 . Senders with deadline-expiry only -- silent peers -- are moved to a
 *parked* set that the next attempt skips but the attempt after that
-retries (to tolerate transient outages). Silence parking is
-*strictly transient*: a single attempt's worth of skip, no escalation.
-A peer falsely labelled silent because their contribution arrived
-late (or because a malicious coordinator censored it) is not
-permanently penalised -- they are reinstated by the very next
-attempt. Permanent exclusion only follows from overflow or non-
-transport reject events, neither of which can fire on a slow-but-
-honest peer.
+retries (to tolerate transient outages). Parking is *strictly
+transient*: a single attempt's worth of skip, no escalation. A peer
+falsely labelled silent because their contribution arrived late (or
+because a malicious coordinator censored it) is not permanently
+penalised -- they are reinstated by the very next attempt.
 . If `IncludedSet` minus exclusions drops below the threshold `t`, the
 coordinator returns `ErrAttemptInfeasible` and the session is
 declared failed for this signer set.
 
-The thresholds are *fixed constants* in the initial design, picked to
-be evidently small relative to the per-attempt deadline and the
-`expectedMessagesCount*4+1` channel capacity:
-
-[source,go]
-----
-const (
-    overflowExclusionThreshold = 4 // overflow events per attempt window
-    rejectExclusionThreshold   = 1 // any confirmed non-transport reject
-    silenceParkingThreshold    = 1 // any deadline expiry parks for 1 attempt
-)
-----
-
-Making them constants up-front means honest signers do not need to
-negotiate them. If production telemetry indicates a constant is wrong
-for the attempt's wall-clock bound, the change is a routine code
-update that ships through Phase 7's manifest gate -- not a runtime
-parameter that drift can desynchronise.
+The quorum is a *derived constant* of the key-group shape -- for the
+production 51-of-100 group it is 50 -- so honest signers do not need
+to negotiate it and drift cannot desynchronise it. Sub-quorum claims
+are deliberately ignored rather than parked: acting on a single
+unverifiable claim would let one byzantine observer cost any honest
+member an attempt of liveness at will.
+
+*Verifiability roadmap.* Permanent exclusion on a *single* piece of
+evidence becomes sound once the wire format carries
+self-incriminating proof: for conflicts, the accused's own two
+operator-signed payloads with identical (attempt, sender) and
+different bytes; for rejects, the accused's operator-signed
+contribution plus a deterministic validation failure any member can
+re-run. When those land, the per-category quorum gate can be relaxed
+to proof-verified entries. The cost of the interim quorum policy is
+bounded: a fault observed by fewer than `f+1` honest members (for
+example a targeted, per-recipient equivocation) is not permanently
+excluded and instead burns retry attempts, which the serial-attempt
+latency analysis already budgets for.
 
 === Layer C: Retry orchestration (M7)
 
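
The Layer B tallying rules in this hunk can be sketched as below. This is an illustration under stated assumptions rather than the repository's implementation: the types `Member`, `Category`, `Claim`, and `Verdict` and the function `Tally` are hypothetical names, and only the rules themselves (credible observers, distinct accusers per accused per category, the `f+1` quorum, overflow parks while reject and conflict exclude) are taken from the RFC text.

```go
package main

import "fmt"

// All names below are hypothetical; they are not taken from pkg/frost/roast.
type Member int
type Category int

const (
	Reject Category = iota
	Conflict
	Overflow
)

// Claim is one observer's accusation. Count is the claimed magnitude,
// which the policy deliberately ignores.
type Claim struct {
	Observer, Accused Member
	Category          Category
	Count             int
}

type Verdict struct {
	Excluded map[Member]bool // established reject or conflict
	Parked   map[Member]bool // established overflow
}

func toSet(ms []Member) map[Member]bool {
	s := make(map[Member]bool, len(ms))
	for _, m := range ms {
		s[m] = true
	}
	return s
}

// Tally counts distinct credible accusers per (accused, category) and acts
// only on accusations that reach the f+1 quorum.
func Tally(claims []Claim, prevIncluded, original []Member, threshold int) Verdict {
	credible, signers := toSet(prevIncluded), toSet(original)
	quorum := len(original) - threshold + 1

	type key struct {
		accused Member
		cat     Category
	}
	accusers := map[key]map[Member]bool{}
	for _, c := range claims {
		if !credible[c.Observer] || !signers[c.Accused] {
			continue // non-credible accuser or non-original accused
		}
		k := key{c.Accused, c.Category}
		if accusers[k] == nil {
			accusers[k] = map[Member]bool{}
		}
		accusers[k][c.Observer] = true // one vote per observer; Count unused
	}

	v := Verdict{Excluded: map[Member]bool{}, Parked: map[Member]bool{}}
	for k, obs := range accusers {
		if len(obs) < quorum {
			continue // sub-quorum claims are ignored entirely
		}
		if k.cat == Overflow {
			v.Parked[k.accused] = true
		} else {
			v.Excluded[k.accused] = true
		}
	}
	return v
}

func main() {
	all := []Member{1, 2, 3, 4, 5}
	claims := []Claim{
		{1, 2, Reject, 1000}, // lone fabricated claim with an inflated count
		{1, 3, Conflict, 1}, {2, 3, Conflict, 1}, {4, 3, Conflict, 1},
		{1, 4, Overflow, 1}, {2, 4, Overflow, 1}, {3, 4, Overflow, 1},
		{9, 2, Reject, 1}, // observer 9 is not in the previous IncludedSet
	}
	v := Tally(claims, all, all, 3)
	fmt.Println("excluded:", v.Excluded, "parked:", v.Parked)
}
```

In this example the quorum is 3, so member 3 is excluded, member 4 is parked, and member 2 is untouched even though one observer claimed a count of 1000 against it.
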
pkg/frost/roast/multi_coordinator_soak_test.go

Lines changed: 13 additions & 5 deletions
@@ -265,13 +265,15 @@ func TestSoak_CleanAttemptPreservesIncludedSet(t *testing.T) {
 	}
 }
 
-func TestSoak_OverflowEvidenceExcludesPermanently(t *testing.T) {
+func TestSoak_EstablishedOverflowParksTransiently(t *testing.T) {
 	members := []group.MemberIndex{1, 2, 3, 4, 5}
 	nodes := newSoakHarness(t, members)
 	prev := soakStartingContext(t, members)
 
-	// Four observers report 1 overflow each against member 3.
-	// Total 4 = OverflowExclusionThreshold.
+	// Four distinct observers report overflow against member 3:
+	// 4 >= ExclusionAccuserQuorum(5, 3) = 3, so the accusation is
+	// established -- but transport blame is unverifiable in
+	// principle, so it parks transiently instead of excluding.
 	overflow := map[group.MemberIndex][]group.MemberIndex{
 		1: {3},
 		2: {3},
@@ -280,8 +282,14 @@ func TestSoak_OverflowEvidenceExcludesPermanently(t *testing.T) {
 	}
 	next, _ := soakAttempt(t, nodes, prev, nil, overflow, 3)
 
-	if !containsMember(next.ExcludedSet, 3) {
-		t.Fatalf("member 3 must be excluded; got %v", next.ExcludedSet)
+	if !containsMember(next.TransientlyParked, 3) {
+		t.Fatalf("member 3 must be parked; got %v", next.TransientlyParked)
+	}
+	if containsMember(next.ExcludedSet, 3) {
+		t.Fatalf(
+			"overflow must never permanently exclude; got %v",
+			next.ExcludedSet,
+		)
 	}
 	if containsMember(next.IncludedSet, 3) {
 		t.Fatal("member 3 must not be in next IncludedSet")
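
The park-and-reinstate cycle this test pins, together with the RFC's infeasibility rule, can be sketched as follows. The field names `IncludedSet`, `ExcludedSet`, and `TransientlyParked` and the error name `ErrAttemptInfeasible` appear in the diff; everything else is hypothetical, including `AttemptContext`, `NextContext`, and the use of plain `int` in place of `group.MemberIndex`. Two further assumptions are labeled in the code: exclusion takes precedence over parking, and feasibility is checked after both are removed.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAttemptInfeasible is named in the RFC; its definition here is assumed.
var ErrAttemptInfeasible = errors.New("attempt infeasible")

// AttemptContext is a hypothetical stand-in for the per-attempt state.
type AttemptContext struct {
	IncludedSet, ExcludedSet, TransientlyParked []int
}

// NextContext derives the next attempt's sets. Exclusions accumulate across
// attempts; parking lasts one attempt because the previous parked set is
// never carried forward.
func NextContext(original []int, prev AttemptContext,
	newlyExcluded, newlyParked []int, threshold int) (AttemptContext, error) {

	excluded := map[int]bool{}
	for _, m := range prev.ExcludedSet {
		excluded[m] = true
	}
	for _, m := range newlyExcluded {
		excluded[m] = true
	}
	parked := map[int]bool{}
	for _, m := range newlyParked {
		parked[m] = true
	}

	var next AttemptContext
	for _, m := range original {
		switch {
		case excluded[m]: // assumption: exclusion takes precedence over parking
			next.ExcludedSet = append(next.ExcludedSet, m)
		case parked[m]:
			next.TransientlyParked = append(next.TransientlyParked, m)
		default: // includes members parked in prev: they are reinstated
			next.IncludedSet = append(next.IncludedSet, m)
		}
	}
	// Assumption: feasibility is judged on the set left after both
	// exclusions and parking are applied.
	if len(next.IncludedSet) < threshold {
		return next, ErrAttemptInfeasible
	}
	return next, nil
}

func main() {
	original := []int{1, 2, 3, 4, 5}
	start := AttemptContext{IncludedSet: original}

	a1, _ := NextContext(original, start, nil, []int{3}, 3) // member 3 parked
	a2, _ := NextContext(original, a1, nil, nil, 3)         // member 3 reinstated
	fmt.Println(a1.IncludedSet, a1.TransientlyParked)
	fmt.Println(a2.IncludedSet, a2.TransientlyParked)

	_, err := NextContext(original, a2, []int{1, 2, 3}, nil, 3)
	fmt.Println(err)
}
```
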
