
Commit 760bc01

AIQnetLab and claude committed
fix: v27 — gate quorum vote-collection on deterministic N-2 committee (HOLE1-5)
HOLE5: handle_timeout_vote / ProducerVote / aggregated-TC reject non-committee voters via a single deterministic_eligible_ids() source so the BFT threshold numerator and denominator share one set. Fixes the permanent halt after super-node registration (non-committee votes desynced view-change -> fork at h=53731).

HOLE1: deterministic ML-DSA-65 identity from mnemonic + pinned GENESIS_CONSENSUS_PKS + fail-closed boot KAT.

HOLE2: two-sided NTP-anchored block-timestamp clamp.

HOLE3: rollback-aware recent-microblock read-through cache.

HOLE4: verify->apply forward-progress range-sync escalation.

Trim oversized inline comment banners to essence across the tree.

cargo check --workspace + release build clean; 147/147 lib tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7fb853a commit 760bc01
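The HOLE5 fix rests on one invariant: the set that decides which votes count and the set whose size defines the quorum must be the same set. A minimal Rust sketch of that invariant follows. `EligibleSet`, `VoteTally`, and the quorum rule `n - floor((n - 1) / 3)` are hypothetical stand-ins chosen for illustration; the actual `deterministic_eligible_ids()` signature and threshold formula are not shown in this commit.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical stand-in for the set returned by `deterministic_eligible_ids()`:
/// the committee derived from the N-2 macroblock, identical on every honest node.
pub struct EligibleSet {
    ids: BTreeSet<String>,
}

impl EligibleSet {
    pub fn new<I: IntoIterator<Item = String>>(ids: I) -> Self {
        Self { ids: ids.into_iter().collect() }
    }

    /// Assumed BFT quorum n - f with f = floor((n - 1) / 3); 0 for an empty set.
    pub fn quorum(&self) -> usize {
        let n = self.ids.len();
        n - n.saturating_sub(1) / 3
    }
}

/// Hypothetical vote collector whose numerator (voters) is always a
/// subset of its denominator (the committee).
pub struct VoteTally<'a> {
    committee: &'a EligibleSet,
    voters: HashSet<String>,
}

impl<'a> VoteTally<'a> {
    pub fn new(committee: &'a EligibleSet) -> Self {
        Self { committee, voters: HashSet::new() }
    }

    /// Returns true only if the vote was newly counted: non-committee voters
    /// (e.g. a freshly registered super node) and duplicates are rejected.
    pub fn add_vote(&mut self, voter: &str) -> bool {
        if !self.committee.ids.contains(voter) {
            return false;
        }
        self.voters.insert(voter.to_string())
    }

    /// Fail closed: an empty committee never reaches quorum.
    pub fn has_quorum(&self) -> bool {
        let q = self.committee.quorum();
        q > 0 && self.voters.len() >= q
    }
}
```

Because `add_vote` consults the same `EligibleSet` that `quorum` measures, a vote from outside the committee can neither inflate the count nor shift the threshold, which is the desync the commit message describes.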

24 files changed

Lines changed: 1961 additions & 6393 deletions


Cargo.lock

Lines changed: 13 additions & 0 deletions

core/qnet-consensus/src/commit_reveal.rs

Lines changed: 37 additions & 148 deletions
@@ -351,52 +351,18 @@ impl CommitRevealConsensus {
 // v15.15: idempotent path now UPGRADES participants when the existing
 // round was auto-created from an early-arriving commit.
 if self.rounds.contains_key(&round_number) {
-// ═══════════════════════════════════════════════════════════════════
-// v15.15: PARTICIPANTS UPGRADE on idempotent re-entry.
-//
-// Why this matters:
-// `process_commit` auto-creates a round entry when a commit
-// arrives BEFORE the local node has called start_round_at_height
-// for that round (race between P2P delivery and local consensus
-// tick). The auto-create path uses `vec![commit.node_id.clone()]`
-// as a stub participants list — i.e. participants.len() == 1.
-//
-// When the producer-side macroblock loop later calls
-// start_round_at_height with the FULL committee
-// (genesis_node_count() = 5, or VRF committee up to MAX_VALIDATORS
-// = 1000), the previous version returned without updating the
-// stub. That stub then propagated into finalize_round_by_number,
-// which derives the canonical 2f+1 byzantine threshold from
-// `participants.len().max(commits.len())`. With participants=1
-// and commits=2 the threshold collapsed to 2 — far below the
-// real 2f+1 of 4 (genesis) or 668 (1000-validator committee).
-// Producer then finalized macroblocks with sub-quorum data which
-// the validator-side strict 2f+1 check rejected, halting the
-// chain.
-//
-// Fix:
-// When the new participants list is larger than the existing
-// stub, replace the participants vector. We only ever GROW the
-// list — never shrink — so a later attempt with an incomplete
-// peer view cannot weaken an already-validated committee.
-//
-// Correctness:
-// * Participants is metadata used to derive the 2f+1 threshold;
-// swapping it does not invalidate already-collected commits
-// or reveals (those are signature-bound to round_number,
-// not to the participants list).
-// * Authoritative committee_size for genesis epochs (mb_idx ≤ 2)
-// is `genesis_node_count()` baked into the binary. For mb_idx ≥ 3
-// it's the N-2 macroblock's eligible_producers snapshot.
-// Both sources are chain-anchored and identical across honest
-// nodes — the upgrade does not introduce non-determinism.
-//
-// Scalability:
-// At 1000-validator committees the participants vector is ~64KB
-// (1000 × ~64-byte node_ids). Swap is a Vec move — O(1).
-// Happens once per round at most. No locking beyond the existing
-// `&mut self` borrow on the consensus engine.
-// ═══════════════════════════════════════════════════════════════════
+// v15.15: participants upgrade on idempotent re-entry. process_commit
+// auto-creates a round with a STUB participants vec![commit.node_id]
+// (len 1) when a commit arrives before local start_round_at_height
+// (P2P-vs-tick race). If not upgraded the stub flows into
+// finalize_round_by_number which derives 2f+1 from
+// participants.len().max(commits.len()) → threshold collapses (e.g. 2
+// « real 4/668) → producer finalizes sub-quorum → validator 2f+1
+// rejects → halt. Fix: when the new committee is LARGER, replace
+// (grow-only — a later incomplete peer view can't weaken a validated
+// committee). Safe: commits/reveals are sig-bound to round_number, not
+// participants; committee_size chain-anchored (genesis_node_count mb<=2
+// / N-2 snapshot mb>=3) → deterministic. O(1) Vec move, once/round.
 if let Some(existing) = self.rounds.get_mut(&round_number) {
 if existing.participants.len() < participants.len() {
 let prev = existing.participants.len();
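The grow-only upgrade described in that comment can be modelled in a few lines. This is a minimal sketch: `Round`, `stub_from_commit`, `upgrade_participants`, and `threshold` are hypothetical names, and the quorum rule `n - floor((n - 1) / 3)` is an assumption picked because it reproduces the collapsed threshold of 2 and the genesis threshold of 4 cited in the comment; the exact formula inside `finalize_round_by_number` is not visible in this diff.

```rust
/// Hypothetical round state: only the fields the upgrade touches.
pub struct Round {
    pub participants: Vec<String>,
    pub commits: usize,
}

impl Round {
    /// Auto-created stub: a commit arrived before start_round_at_height,
    /// so the only known participant is the commit's sender.
    pub fn stub_from_commit(node_id: &str) -> Self {
        Round { participants: vec![node_id.to_string()], commits: 1 }
    }

    /// Grow-only upgrade: replace the participants list only when the new
    /// committee is strictly larger. Returns true if it was replaced.
    pub fn upgrade_participants(&mut self, committee: Vec<String>) -> bool {
        if self.participants.len() < committee.len() {
            // Moves the new Vec in; no element is copied.
            self.participants = committee;
            true
        } else {
            false
        }
    }

    /// Threshold derived from participants.len().max(commits), as the
    /// comment describes, with an assumed n - floor((n - 1) / 3) quorum.
    pub fn threshold(&self) -> usize {
        let n = self.participants.len().max(self.commits);
        n - n.saturating_sub(1) / 3
    }
}
```

Collected commits stay untouched by the swap, which mirrors the comment's point that commits and reveals are bound to `round_number` rather than to the participants list.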
@@ -1080,40 +1046,15 @@ impl CommitRevealConsensus {
 })
 }
 
-/// PRODUCTION v2.32: DETERMINISTIC + UNPREDICTABLE leader selection
-/// ═══════════════════════════════════════════════════════════════════════════
-///
-/// PROBLEM (v2.30): reveal_data varies between nodes → FORK!
-/// PROBLEM (v2.31): No beacon → leader predictable (DoS risk)
-///
-/// SOLUTION (v2.32):
-/// - Use randomness_beacon from MacroBlock N-2 as entropy source
-/// - Beacon is accumulated reveal_data from previous epochs
-/// - Unpredictable until N-2 finalized, then deterministic for all nodes
-/// - Fallback to Genesis seed for first 2 epochs (no N-2 yet)
-///
-/// ENTROPY SOURCES (all deterministic across nodes):
-/// 1. prev_randomness_beacon (from MacroBlock N-2) - unpredictable!
-/// 2. round_number (same on all nodes)
-/// 3. sorted participant list (same on all nodes)
+/// Deterministic + unpredictable leader selection.
 ///
-/// SCALABILITY: Works with 1000 validators per round
-/// ═══════════════════════════════════════════════════════════════════════════
-/// PRODUCTION v2.40.3: XOR-based leader selection with CURRENT epoch entropy
-/// ═══════════════════════════════════════════════════════════════════════════
-///
-/// PROBLEM (v2.32-v2.40.2): Beacon N-2 is PUBLIC after MacroBlock N-2 finalized!
-/// → Leader for epoch N is PREDICTABLE → DDoS attack possible!
-///
-/// SOLUTION (v2.40.3): Use CURRENT reveals as PRIMARY entropy source
-/// 1. current_beacon = XOR(all reveal nonces in THIS round) - UNPREDICTABLE!
-/// 2. prev_beacon (N-2) = historical entropy accumulation
-/// 3. Combined: hash(current_beacon, prev_beacon, round, participants)
-///
-/// SECURITY: Leader cannot be predicted until ALL reveals are collected!
-/// - Even if attacker knows 4/5 reveals, the 5th reveal changes the beacon
-/// - 1-bit bias attack possible (last revealer) but not practical for leader selection
-/// ═══════════════════════════════════════════════════════════════════════════
+/// Entropy = hash(current_beacon, prev_beacon, round, sorted_participants)
+/// where current_beacon = XOR(all reveal nonces in THIS round) is the
+/// primary source. A public N-2 beacon alone makes the leader predictable
+/// (DDoS); using current reveals means the leader cannot be predicted
+/// until ALL reveals are collected (last revealer has a 1-bit bias only,
+/// impractical for leader selection). Deterministic across nodes once
+/// reveals are in. Works at 1000 validators/round.
 fn select_leader(&self, reveals: &HashMap<String, Reveal>) -> Option<String> {
 if reveals.is_empty() {
 return None;
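The new doc comment compresses the entropy construction into a single formula. The sketch below spells it out under stated assumptions: reveal nonces are modelled as 32-byte arrays keyed by node id, `fnv1a64` is a hypothetical non-cryptographic stand-in for the SHA3 hashing the real code performs with the `sha3` crate, and the final `hash % n` pick over sorted participants is an illustrative simplification, not necessarily what `select_leader` does past the lines shown.

```rust
use std::collections::HashMap;

/// FNV-1a 64-bit: a NON-cryptographic placeholder for SHA3-512, used only so
/// this sketch needs no external crates. Not suitable for real leader selection.
fn fnv1a64(chunks: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for chunk in chunks {
        for &b in *chunk {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

/// current_beacon = XOR of every reveal nonce in this round. XOR is
/// commutative, so HashMap iteration order cannot change the result.
fn current_beacon(reveals: &HashMap<String, [u8; 32]>) -> [u8; 32] {
    let mut acc = [0u8; 32];
    for nonce in reveals.values() {
        for (a, b) in acc.iter_mut().zip(nonce.iter()) {
            *a ^= b;
        }
    }
    acc
}

/// Hypothetical pick: hash(current_beacon, prev_beacon, round, sorted participants).
fn select_leader(
    reveals: &HashMap<String, [u8; 32]>,
    prev_beacon: &[u8; 32],
    round: u64,
) -> Option<String> {
    if reveals.is_empty() {
        return None;
    }
    // Sort so every node feeds the hash the same byte sequence.
    let mut participants: Vec<&String> = reveals.keys().collect();
    participants.sort();

    let beacon = current_beacon(reveals);
    let round_bytes = round.to_le_bytes();
    let mut chunks: Vec<&[u8]> = vec![&beacon[..], &prev_beacon[..], &round_bytes[..]];
    for id in &participants {
        chunks.push(id.as_bytes());
    }
    let idx = (fnv1a64(&chunks) % participants.len() as u64) as usize;
    Some(String::clone(participants[idx]))
}
```

Changing any single nonce changes `current_beacon`, so the input to the hash is unknown until the last reveal arrives; that is the unpredictability property the comment relies on.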
@@ -1621,42 +1562,18 @@ impl CommitRevealConsensus {
 return None;
 }
 
-// ═══════════════════════════════════════════════════════════════════════
-// v15.2: ROUND-ROBIN LEADER ROTATION — mirrors the microblock producer
-// rotation and the macroblock initiator picker in `should_initiate_consensus`.
-//
-// Formula:
+// Round-robin leader rotation (same model as the microblock producer
+// and macroblock initiator pickers — one rotation model, identical
+// failover across both tiers):
 // base_idx = SHA3-512(entropy ‖ height ‖ sorted_participants) % N
-// leader = sorted_participants[ (base_idx + round) % N ]
-//
-// Why this replaced the previous hash-with-round approach:
-// The old compute mixed `round` INTO the hash input, which meant every
-// view-change gave a fresh random pick from N candidates. A dead or
-// partitioned validator could be re-selected multiple rounds in a
-// row with probability 1/N each round — livelock when that node
-// kept being picked. Round-robin advances by exactly one slot per
-// view-change, so after N rounds every candidate has had its turn.
-// Guaranteed progress even against hostile leader hashing.
-//
-// Symmetry: matches
-// * `select_microblock_producer_with_round` at the microblock layer
-// * `should_initiate_consensus` at the macroblock initiator layer
-// Three leader decisions, one rotation model. Identical failover
-// guarantees across both consensus tiers.
-//
-// Safety:
-// * `base_idx` derives only from on-chain/entropy inputs shared by
-// every honest node at the same (height, participants, beacon).
-// * `round` comes from `HIGHEST_CERTIFIED_ROUND[mb]`, advanced only
-// by 2f+1 Dilithium3-signed TimeoutVotes, so no ≤ f adversary can
-// skew it.
-// * Sorted participants list is the same canonical committee view
-// used by the initiator picker.
-//
-// Scalability: O(N) hash prep + O(1) modular arithmetic. At the
-// MAX_VALIDATORS=1000 committee cap this is sub-millisecond. No
-// allocation per round beyond the one-time sorted participant vector.
-// ═══════════════════════════════════════════════════════════════════════
+// leader = sorted_participants[(base_idx + round) % N]
+// Replaces the old hash-with-round: mixing round INTO the hash re-picked
+// a dead validator w.p. 1/N every view-change → livelock. Round-robin
+// advances exactly one slot per view-change → every candidate has a
+// turn within N rounds (progress even against hostile hashing).
+// Safety: base_idx from shared on-chain entropy; round from
+// HIGHEST_CERTIFIED_ROUND[mb], advanced only by 2f+1 signed votes
+// (no ≤f skew). O(N) hash prep + O(1) arithmetic.
 
 use sha3::{Sha3_512, Digest};
 let mut hasher = Sha3_512::new();
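The rotation formula is easiest to check in isolation. A minimal sketch, assuming a hypothetical `round_robin_leader` function, an opaque `entropy` byte slice, and a non-cryptographic `fnv1a64` placeholder in place of SHA3-512: `base_idx` is fixed per height and the index then advances one slot per round, so any N consecutive rounds visit every participant exactly once, which is the liveness property the comment claims.

```rust
/// NON-cryptographic placeholder for SHA3-512 (keeps the sketch dependency-free).
fn fnv1a64(chunks: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for chunk in chunks {
        for &b in *chunk {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

/// Hypothetical helper implementing
///   base_idx = H(entropy ‖ height ‖ sorted_participants) % N
///   leader   = sorted_participants[(base_idx + round) % N]
fn round_robin_leader(
    entropy: &[u8],
    height: u64,
    round: u64,
    participants: &[String],
) -> Option<String> {
    if participants.is_empty() {
        return None;
    }
    // Canonical order so every honest node hashes identical input.
    let mut sorted: Vec<&String> = participants.iter().collect();
    sorted.sort();
    let n = sorted.len() as u64;

    // `round` is deliberately NOT hashed: it only shifts the index.
    let height_bytes = height.to_le_bytes();
    let mut chunks: Vec<&[u8]> = vec![entropy, &height_bytes[..]];
    for id in &sorted {
        chunks.push(id.as_bytes());
    }
    let base_idx = fnv1a64(&chunks) % n;
    // Reduce round modulo n first so the sum stays below 2n.
    let idx = (base_idx + round % n) % n;
    Some(String::clone(sorted[idx as usize]))
}
```

Hashing `round` into the input, as the old code did, would make each view-change an independent draw; keeping it outside the hash is what turns the pick into a rotation.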
@@ -1717,40 +1634,12 @@ impl CommitRevealConsensus {
 Some(selected_leader)
 }
 
-// ════════════════════════════════════════════════════════════════════════
-// v15.10 STAGE-2B: SHARD-AWARE LEADER COMPUTATION
-// ────────────────────────────────────────────────────────────────────────
-// Wraps `compute_leader_for_round` with optional shard awareness. When
-// the global `ShardCommitteeCache` carries an assignment for the
-// requested epoch, the sub-committee for `shard_id` is consulted
-// instead of the global participants list — every shard runs
-// round-robin leader rotation INDEPENDENTLY within its own
-// committee, which is the precondition for parallel per-shard
-// microblock production once Stage-2B activates fully.
-//
-// FALLBACK PATH
-// ────────────────────────────────────────────────────────────────────────
-// When the cache holds no assignment (the canonical case before
-// Stage-2B activation) OR the assignment carries `num_shards == 1`
-// (single-shard configuration), the call delegates straight to
-// `compute_leader_for_round` with the supplied global participants.
-// This means call sites can adopt the shard-aware API today and
-// get bit-for-bit identical behaviour to the legacy path until
-// operators bump `num_shards`.
-//
-// VALIDATOR-ONLY PARTICIPANTS
-// ────────────────────────────────────────────────────────────────────────
-// The `participants` slice MUST contain only Genesis + Super node
-// ids. Light wallets are HTTP-API clients, never validators, never
-// appear here. The committee assignment honours the same invariant
-// — see `assign_committees`.
-//
-// SCALABILITY (1 000+ super-node committees, 256 shards)
-// ────────────────────────────────────────────────────────────────────────
-// At the cap (1 000 validators ÷ 256 shards ≈ 4 validators per
-// shard), per-shard sort + hash is sub-millisecond. The cache
-// lookup is a single parking_lot RwLock read; no allocation on
-// the hot path.
+// Shard-aware wrapper over compute_leader_for_round: if the
+// ShardCommitteeCache has an assignment for the epoch, leader rotation
+// runs independently within shard_id's sub-committee; otherwise (no
+// assignment or num_shards==1) it delegates to the global path with
+// bit-for-bit identical behaviour. `participants` MUST be Genesis+Super
+// ids only (no light wallets) — same invariant as assign_committees.
 pub fn compute_shard_aware_leader_for_round(
 &self,
 height: u64,
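The wrapper's fallback contract (no assignment, or `num_shards == 1`, yields exactly the global result) is simple to express. A sketch under assumptions: `ShardAssignment` and the epoch-keyed `HashMap` are hypothetical stand-ins for `ShardCommitteeCache` (which the removed comment describes as sitting behind a `parking_lot` `RwLock`; a plain map keeps this dependency-free), `leader` is a toy sorted round-robin standing in for `compute_leader_for_round`, and returning `None` for an unknown `shard_id` is an assumed fail-closed choice.

```rust
use std::collections::HashMap;

/// Hypothetical per-epoch shard assignment.
pub struct ShardAssignment {
    pub num_shards: u16,
    /// committees[shard_id] = validator ids (Genesis + Super nodes only).
    pub committees: Vec<Vec<String>>,
}

/// Toy stand-in for compute_leader_for_round: sorted round-robin, no entropy.
fn leader(round: u64, participants: &[String]) -> Option<String> {
    if participants.is_empty() {
        return None;
    }
    let mut sorted: Vec<&String> = participants.iter().collect();
    sorted.sort();
    let idx = (round % sorted.len() as u64) as usize;
    Some(String::clone(sorted[idx]))
}

/// Hypothetical shard-aware wrapper.
pub fn shard_aware_leader(
    cache: &HashMap<u64, ShardAssignment>,
    epoch: u64,
    shard_id: u16,
    round: u64,
    global_participants: &[String],
) -> Option<String> {
    match cache.get(&epoch) {
        // Sharded: rotate independently within this shard's sub-committee.
        // An unknown shard_id yields None instead of falling back (assumption).
        Some(a) if a.num_shards > 1 => {
            let committee = a.committees.get(shard_id as usize)?;
            leader(round, committee)
        }
        // No assignment, or a single shard: identical to the global path.
        _ => leader(round, global_participants),
    }
}
```

Routing both fallback cases through the unchanged global call is what makes the "bit-for-bit identical" claim hold by construction rather than by testing.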
