Commit 8e90cd7

AIQnetLab and claude committed
fix: v23.1 — BFT-certified rotation hardening + cryptographic binding
Five concrete defects in v23 closed by direct cross-file audit. Together they bring the microblock layer to canonical BFT-PoS L1 standards.

1. timeout_round cryptographically bound to block identity
   * MicroBlock::hash and EfficientMicroBlock::hash now include timeout_round in the SHA3-256 digest.
   * sign_microblock_with_dilithium / verify_microblock_signature include timeout_round in the signing payload; payload tag bumped to "Block_Sig_v23.1" (breaking, requires clean restart).
   * Closes the surface where a peer-relay could mutate timeout_round in transit and storage-L4 anti-fork would treat the mutated block as idempotent re-save.

2. Producer selection uses STRICT 2f+1 certified-only rotation
   * New get_certified_rotation_round(mb_idx) = HIGHEST_CERTIFIED_ROUND − baseline (no f+1 adopted input).
   * Old get_effective_rotation_round marked #[deprecated]; its max(certified, adopted) semantics re-introduced the h=556 split-brain class under partial gossip propagation. v23 had accidentally adopted that path; v23.1 reverts to v15.13's supermajority-only rule.

3. Authenticity gate on block.timeout_round (block_pipeline.rs)
   * Hard-reject when claimed round exceeds local_certified by more than TIMEOUT_ROUND_DRIFT_WINDOW=3. Bounds Byzantine claim inflation that could lock the rotation baseline.

4. Sticky leader within view (node.rs STICKY_LEADER_PER_VIEW)
   * Once a fallback (timeout_round > 0) successfully produces in leadership_round L, it sticks for the remainder of L. Replaces the v23 thrash pattern where every successful fallback was followed by a retry of the failed primary on the next height (6-sec stall per block while primary offline). Sticky lock is released only when certified advances to a different round OR the 30-block view ends.

5. Bounded memory for the two new rotation DashMaps
   * LAST_TIMEOUT_EMIT_PER_MB and STICKY_LEADER_PER_VIEW pruned by the existing cleanup_old_timeout_data sweep, sharing the same active-macroblock-window retention contract as the rest of the rotation state.

BREAKING (signing-payload tag change):
Existing peers running v22.1 or earlier will fail signature verification on v23.1-signed blocks and vice versa. Operational path is a coordinated clean restart with QNET_BOOTSTRAP_FRESH=1 on every genesis node and a wiped Explorer database.

Verification:
* cargo check --release --tests: 0 warnings, 0 errors.
* cargo test --release tests_v23_rotation_round: 2/2 passed.
* cargo build --release --bin qnet-node: exit 0, 22 MB binary.

Files touched (4): core/qnet-state/src/block.rs, development/qnet-integration/src/{block_pipeline.rs, node.rs, unified_p2p.rs}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
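
The difference between the certified-only rule and the deprecated max(certified, adopted) rule in item 2 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the simplified signatures, the saturating subtraction, and the `pick_producer` round-robin are hypothetical and are not taken from node.rs or unified_p2p.rs. Only the two rule definitions come from the commit message.

```rust
/// v23.1 rule as described in the commit message: certified round minus baseline.
/// `saturating_sub` is an assumption for the case where baseline > certified.
pub fn certified_rotation_round(highest_certified: u64, baseline: u64) -> u64 {
    highest_certified.saturating_sub(baseline)
}

/// Deprecated rule: also lets a round adopted on f+1 votes raise the result.
pub fn effective_rotation_round(highest_certified: u64, adopted: u64, baseline: u64) -> u64 {
    highest_certified.max(adopted).saturating_sub(baseline)
}

/// Hypothetical stand-in for producer selection: round-robin shifted by the round.
/// `committee_size` must be non-zero.
pub fn pick_producer(height: u64, round: u64, committee_size: u64) -> u64 {
    height.wrapping_add(round) % committee_size
}
```

In this sketch, two nodes that share the same certified round but differ in what partial gossip let them adopt agree under the first rule and can pick different producers under the second, which is the kind of divergence the commit message attributes to the old path.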
1 parent b3bd4fe commit 8e90cd7

4 files changed

Lines changed: 729 additions & 383 deletions

core/qnet-state/src/block.rs

Lines changed: 44 additions & 2 deletions
@@ -957,14 +957,45 @@ impl MicroBlock {
     }

     /// Calculate microblock hash
+    ///
+    /// ═══════════════════════════════════════════════════════════════════════
+    /// v23.1: TIMEOUT_ROUND INCLUDED IN HASH (consensus-relevant binding)
+    /// ═══════════════════════════════════════════════════════════════════════
+    /// The `timeout_round` field is consensus-relevant state: it determines
+    /// which producer is legitimately elected at this height via the pure
+    /// function `select_microblock_producer_with_round(h, candidates, round)`.
+    /// It also drives `record_finalized_round` → `LAST_FINALIZED_ROUND_PER_MB`
+    /// → `get_certified_rotation_round` for all subsequent heights in the
+    /// macroblock window. Any field that influences consensus decisions
+    /// MUST be cryptographically bound to the block.
+    ///
+    /// Pre-v23.1 the field was omitted from the hash because v22 hardcoded
+    /// it to 0 (single value, omission harmless). v23 restored real values
+    /// (0, 1, 2, ...) for BFT-certified rotation; the omission then became
+    /// a real attack surface: a peer-relay or man-in-the-middle could mutate
+    /// `timeout_round` without breaking the hash → storage L4 anti-fork
+    /// guard would silently treat the mutated block as idempotent re-save
+    /// → divergent baseline tracking across nodes → potential leader-
+    /// selection divergence on subsequent heights.
+    ///
+    /// Including `timeout_round` in the hash closes that surface: any
+    /// mutation produces a different hash → storage L4 detects it as
+    /// equivocation (different content at same height) → reject + record
+    /// evidence for slashing.
+    ///
+    /// Scalability: one extra `to_le_bytes` + `hasher.update` per block.
+    /// Negligible cost at any committee size.
+    /// ═══════════════════════════════════════════════════════════════════════
     pub fn hash(&self) -> [u8; 32] {
         let mut hasher = Sha3_256::new();
         hasher.update(&self.height.to_le_bytes());
         hasher.update(&self.timestamp.to_le_bytes());
         hasher.update(&self.previous_hash);
         hasher.update(&self.merkle_root);
         hasher.update(self.producer.as_bytes());
-
+        // v23.1: bind timeout_round to block identity (see header above).
+        hasher.update(&self.timeout_round.to_le_bytes());
+
         let result = hasher.finalize();
         let mut hash = [0u8; 32];
         hash.copy_from_slice(&result);
@@ -1123,14 +1154,25 @@ impl EfficientMicroBlock {
     }

     /// Calculate efficient microblock hash
+    ///
+    /// v23.1: Mirror of `MicroBlock::hash`: includes `timeout_round` in the
+    /// digest so that storage-layer hash identity matches between the full
+    /// `MicroBlock` and its `EfficientMicroBlock` representation. Without
+    /// this mirror, a block loaded as `MicroBlock` and a block loaded as
+    /// `EfficientMicroBlock` would produce different hashes for the same
+    /// on-disk bytes, breaking the storage-L4 anti-fork guard's identity
+    /// comparison across read paths. See `MicroBlock::hash` header for the
+    /// full consensus-binding rationale.
     pub fn hash(&self) -> [u8; 32] {
         let mut hasher = Sha3_256::new();
         hasher.update(&self.height.to_le_bytes());
         hasher.update(&self.timestamp.to_le_bytes());
         hasher.update(&self.previous_hash);
         hasher.update(&self.merkle_root);
         hasher.update(self.producer.as_bytes());
-
+        // v23.1: bind timeout_round to block identity (see MicroBlock::hash header).
+        hasher.update(&self.timeout_round.to_le_bytes());
+
         let result = hasher.finalize();
         let mut hash = [0u8; 32];
         hash.copy_from_slice(&result);
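
The effect of the two hash changes in block.rs can be reproduced in a small, dependency-free sketch. It assumes field types that the diff does not show (`u64` height, timestamp and round, 32-byte hashes, `String` producer), uses a hypothetical `Header` struct rather than the real `MicroBlock`, and substitutes 64-bit FNV-1a for SHA3-256 purely so the example compiles without external crates. FNV-1a is not collision-resistant and is not a suggestion for production use.

```rust
/// 64-bit FNV-1a, used here only as a crate-free stand-in for SHA3-256.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical header; field types are assumptions, not taken from block.rs.
pub struct Header {
    pub height: u64,
    pub timestamp: u64,
    pub previous_hash: [u8; 32],
    pub merkle_root: [u8; 32],
    pub producer: String,
    pub timeout_round: u64,
}

impl Header {
    /// Fields hashed before v23.1, in the order shown in the diff.
    fn common_bytes(&self) -> Vec<u8> {
        let mut buf = Vec::new();
        buf.extend_from_slice(&self.height.to_le_bytes());
        buf.extend_from_slice(&self.timestamp.to_le_bytes());
        buf.extend_from_slice(&self.previous_hash);
        buf.extend_from_slice(&self.merkle_root);
        buf.extend_from_slice(self.producer.as_bytes());
        buf
    }

    /// Pre-v23.1 shape: timeout_round is not part of the digest.
    pub fn digest_without_round(&self) -> u64 {
        fnv1a(&self.common_bytes())
    }

    /// v23.1 shape: timeout_round is appended as little-endian bytes.
    pub fn digest_with_round(&self) -> u64 {
        let mut buf = self.common_bytes();
        buf.extend_from_slice(&self.timeout_round.to_le_bytes());
        fnv1a(&buf)
    }
}

/// Builds a sample header that differs only in `timeout_round`.
pub fn sample_header(timeout_round: u64) -> Header {
    Header {
        height: 556,
        timestamp: 1_700_000_000,
        previous_hash: [0x11; 32],
        merkle_root: [0x22; 32],
        producer: String::from("node_a"),
        timeout_round,
    }
}
```

Two headers that differ only in `timeout_round` share a digest under the old shape and get distinct digests under the new one, which is what lets a storage-layer identity check tell a relayed mutation apart from an idempotent re-save.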

development/qnet-integration/src/block_pipeline.rs

Lines changed: 68 additions & 5 deletions
@@ -1822,15 +1822,78 @@ impl BlockPipeline {
         //
         // Scalability: O(1) cache lookup. Identical cost at 5 or 5000 validators.
         if !snap.is_syncing() && mb.height > 0 {
+            // ═══════════════════════════════════════════════════════════════
+            // v23.1: BFT-CERTIFIED ROUND AUTHENTICITY GATE
+            // ═══════════════════════════════════════════════════════════════
+            // A block claims to have been produced at rotation round
+            // `mb.timeout_round`. Verify the claim is plausible against
+            // this node's local view of supermajority-certified rounds
+            // for the containing macroblock.
+            //
+            // Allow a small forward drift (TIMEOUT_ROUND_DRIFT_WINDOW)
+            // to absorb honest gossip propagation latency: a producer
+            // can legitimately see 2f+1 votes for round R before this
+            // node's local DashMap has been updated by the same gossip
+            // stream. After the drift window, the claim is implausibly
+            // far ahead of any cert this node could ever have seen,
+            // so it must come from a Byzantine signer (authentic
+            // producer with valid Dilithium3 key but signing an
+            // unsupportable claim).
+            //
+            // Why this matters
+            // ────────────────
+            // After v23.1's timeout_round binding in hash+signature
+            // (block.rs:hash, sign_microblock_with_dilithium), the
+            // producer's claim is CRYPTOGRAPHICALLY ATTESTED: it
+            // cannot be mutated in transit. But a Byzantine producer
+            // can still SIGN an arbitrary round claim. Without this
+            // gate, downstream code (notably `record_finalized_round`
+            // called at apply) would advance `LAST_FINALIZED_ROUND_PER_MB`
+            // to the Byzantine value, locking out future rotation
+            // until 2f+1 honest evidence catches up to the inflated
+            // baseline. This is a DoS class, bounded here.
+            //
+            // The v15.0 `rotation_backfill_request` path below still
+            // fires as a soft signal: if the producer's claim is
+            // legitimate (cert exists somewhere), peer-side retrieval
+            // catches our local certified up to match.
+            //
+            // Drift window = 3: covers ~3 gossip RTTs at the
+            // 1000-validator committee cap (log_5 propagation depth
+            // ≈ 4 hops × ~50ms each = 200ms; cross-region asymmetry
+            // could extend this to ~1s; 3 rounds × 5s emit grace
+            // covers the worst-case propagation race).
+            //
+            // Scalability: one O(1) DashMap read per block ingest.
+            // Identical cost at 5 or 10 000 super-nodes.
+            // ═══════════════════════════════════════════════════════════════
+            const TIMEOUT_ROUND_DRIFT_WINDOW: u64 = 3;
+            let mb_idx = mb.height / 90;
+            let local_certified =
+                crate::unified_p2p::highest_certified_round_for(mb_idx);
+            if mb.timeout_round > local_certified.saturating_add(TIMEOUT_ROUND_DRIFT_WINDOW) {
+                if is_warn() {
+                    println!(
+                        "[WARN][PIPELINE] timeout_round_implausible h={} mb={} claimed={} local_certified={} drift_window={} action=hard_reject from={}",
+                        mb.height, mb_idx, mb.timeout_round,
+                        local_certified, TIMEOUT_ROUND_DRIFT_WINDOW,
+                        decoded.from_peer,
+                    );
+                }
+                metrics.verify_failed.fetch_add(1, Ordering::Relaxed);
+                continue;
+            }
+
             if let Some((expected, expected_round)) = crate::node::get_expected_producer(mb.height) {
                 if mb.producer != expected {
                     if mb.timeout_round != expected_round {
                         // Category A: Timeout divergence (different round claimed).
-                        // Without an ingest-side VRF re-derivation we cannot
-                        // declare this invalid; signature + hash chain + 2f+1
-                        // macroblock commit still enforce correctness, and
-                        // the BFT-driven rotation converges once all nodes
-                        // have gossiped their signed TimeoutVotes.
+                        // Bounded above by the v23.1 authenticity gate; this branch
+                        // covers honest gossip-window divergence (within drift) where
+                        // the claim is plausible but doesn't match our cached view.
+                        // Signature + hash chain + macroblock 2f+1 commit still enforce
+                        // correctness; the BFT-driven rotation converges once vote
+                        // gossip propagates.
                         if is_info() {
                             println!("[INFO][PIPELINE] timeout_divergence h={} our_round={} block_round={} our_prod={} block_prod={}",
                                 mb.height, expected_round, mb.timeout_round, expected, mb.producer);
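
The gate's comparison reduces to a small pure predicate, sketched below so the boundary cases are easy to see. The helper names `macroblock_index` and `round_claim_is_plausible`, and the constant name `MICROBLOCKS_PER_MACROBLOCK` for the literal 90 in the diff, are hypothetical. The drift window value and the `saturating_add` comparison mirror the hunk above.

```rust
/// Same value as the constant introduced in the hunk above.
pub const TIMEOUT_ROUND_DRIFT_WINDOW: u64 = 3;

/// Hypothetical name for the literal `90` used in `mb.height / 90`.
pub const MICROBLOCKS_PER_MACROBLOCK: u64 = 90;

/// Index of the macroblock window that contains `height`.
pub fn macroblock_index(height: u64) -> u64 {
    height / MICROBLOCKS_PER_MACROBLOCK
}

/// Returns true when the claimed round is at most `TIMEOUT_ROUND_DRIFT_WINDOW`
/// ahead of the locally certified round. `saturating_add` keeps the bound from
/// wrapping around when `local_certified` is close to `u64::MAX`.
pub fn round_claim_is_plausible(claimed: u64, local_certified: u64) -> bool {
    claimed <= local_certified.saturating_add(TIMEOUT_ROUND_DRIFT_WINDOW)
}
```

With nothing certified locally, a claim of round 3 passes and a claim of round 4 is rejected. A plain `+` in place of `saturating_add` would panic in debug builds or wrap in release builds at the top of the `u64` range, and a wrapped bound would reject honest claims.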
