
Commit b19c872

AIQnetLab and claude committed
fix: v21 — close macroblock-boundary timeout-cert circular deadlock
Three surgical changes to the BFT vote-pool semantics. Together they close the deadlock observed at the first macroblock boundary with a failed primary producer (mb_idx=4 on the 5-node testnet; h=360 stuck for 12.7 h).

A1. Forward-looking TimeoutVote target (node.rs:17749)

The vote's mb_idx now derives from next_height / 90 instead of microblock_height / 90. At a macroblock boundary the old expression resolved to the PREVIOUS macroblock (already finalised), so voters could never reach 2f+1 on the new one. The receiver already accepts forward votes within local_mb + 50 (unified_p2p.rs::handle_timeout_vote), so this is an emitter-only change.

A2. Vote-pool fallback in pipeline cert check (block_pipeline.rs:1761, +2 helpers in unified_p2p.rs)

When the AggregatedTimeoutCert for (mb_idx, round) is missing, consult the live TIMEOUT_VOTES pool. If 2f+1 Dilithium3-signed votes from distinct voters are present, admit the block: the cert is only an aggregated view of those same signed messages (same trust source, same threshold). Otherwise, fall back to the original defer-and-request-backfill path.

New SimplifiedP2P helpers:

    count_timeout_votes_in_pool(mb_idx, round) -> usize
    has_two_f_plus_one_timeout_votes(mb_idx, round, threshold) -> bool

Each admit is logged at INFO with a boundary flag showing whether the bypass fired at h % 90 == 0 (the legitimate cert-aggregation race) or mid-macroblock (worth an operator's attention).

B1. Heartbeat-driven forward TimeoutVote emit (node.rs, +92 lines)

The existing heartbeat_fast_path detector (3 s silence threshold) previously triggered only empty-slot attestation. It now also emits a signed TimeoutVote at certified_round + 1, gated on proposed_timeout_round == 0, is_synced_enough, and (microblock_height > 0 || genesis_era_dead_producer), so the new path never double-fires with the legacy stall-driven emit. broadcast_timeout_vote dedupes via TIMEOUT_VOTED_HEIGHTS, so redundant emits across ticks or paths collapse to one effective vote per (mb_idx, round, voter).
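For illustration, a minimal sketch of the two A2 helpers, assuming the TIMEOUT_VOTES pool can be modelled as a map from (mb_idx, round) to the set of distinct voter IDs whose Dilithium3 signatures were already checked at gossip ingest. The `VotePool` type, its `insert_verified` method and the `String` voter IDs are hypothetical stand-ins; the real pool lives in unified_p2p.rs (the diff comments describe it as a DashMap), which this page does not show.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the TIMEOUT_VOTES pool: one bucket per
/// (mb_idx, round), each holding the IDs of voters whose signed votes
/// were already verified at gossip ingest (verification is not modelled).
#[derive(Default)]
pub struct VotePool {
    buckets: HashMap<(u64, u64), HashSet<String>>,
}

impl VotePool {
    /// Record an already-verified vote. Re-inserting the same voter into the
    /// same bucket is a no-op, so counts below are distinct-voter counts.
    pub fn insert_verified(&mut self, mb_idx: u64, round: u64, voter_id: &str) {
        self.buckets
            .entry((mb_idx, round))
            .or_default()
            .insert(voter_id.to_string());
    }

    /// Mirrors `count_timeout_votes_in_pool(mb_idx, round) -> usize`:
    /// one map lookup plus one `len()`.
    pub fn count_timeout_votes_in_pool(&self, mb_idx: u64, round: u64) -> usize {
        self.buckets
            .get(&(mb_idx, round))
            .map_or(0, |voters| voters.len())
    }

    /// Mirrors `has_two_f_plus_one_timeout_votes(mb_idx, round, threshold) -> bool`.
    pub fn has_two_f_plus_one_timeout_votes(&self, mb_idx: u64, round: u64, threshold: usize) -> bool {
        self.count_timeout_votes_in_pool(mb_idx, round) >= threshold
    }
}
```

Keying buckets by (mb_idx, round) is what gives the independent per-round buckets the regression tests cover, and counting set members rather than messages is what makes the count a distinct-voter count.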
Safety
------
A1: same Dilithium3 signature, same (mb_idx, round, voter_id) anti-replay tracker, same 2f+1 threshold for cert generation.
A2: pool entries are the same Dilithium3-signed messages the cert aggregator consumes (verified at gossip ingest). No new gate, just direct access.
B1: same signing path, same broadcast path, same per-voter dedup.
The cryptographic floor (signature math + 2f+1) is unchanged across all three fixes.

Scalability
-----------
Per-node steady-state cost: zero added bandwidth, zero added storage, O(1) hot-path cost for the A2 lookup. The profile is identical from 5-node genesis to 1M super-nodes.

Tests
-----
* 6 new regression tests in tests_v21_a2_vote_pool covering the pool helpers (zero/exact/below/at/above threshold, independent per-round buckets).
* Full suite: qnet-consensus 73 passed, qnet-integration 149 passed (was 143, +6 new), 12 ignored (hardware bench), 0 failed across both crates.
* cargo build --release: clean, 0 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d9c7a89 commit b19c872

3 files changed

Lines changed: 450 additions & 23 deletions


development/qnet-integration/src/block_pipeline.rs

Lines changed: 94 additions & 18 deletions
```diff
@@ -1769,30 +1769,106 @@ impl BlockPipeline {
 true
 };
 if !cert_present {
-// Defer block: cert not yet propagated to us. Trigger
-// backfill request and put block in the deferred buffer to
-// be re-checked on the next pipeline pass.
-if let Some(ref p2p) = unified_p2p {
-p2p.request_timeout_proofs(mb_idx_for_cert, mb_idx_for_cert);
-}
-if deferred.len() < DEFERRED_MAX {
-if is_debug() {
-println!(
-"[DBG][PIPELINE] block_deferred_for_cert h={} round={} mb_idx={} buf={}",
-mb.height, mb.timeout_round, mb_idx_for_cert, deferred.len()
-);
-}
-deferred.insert(mb.height, decoded);
+// ═══════════════════════════════════════════════════════════
+// v21 (A2): VOTE-POOL FALLBACK — boundary grace
+// ═══════════════════════════════════════════════════════════
+// The aggregated cert may not have been generated or
+// gossipped yet, but the underlying TimeoutVotes — each
+// Dilithium3-verified at ingest by `handle_timeout_vote`
+// — may already be in the local pool. The cert is just an
+// aggregated view of those votes; if 2f+1 are present in
+// the pool, we have equivalent cryptographic evidence.
+//
+// Why this is needed
+// ──────────────────
+// Forensic case h=360 on the 5-node testnet showed the
+// failure mode:
+// * h=360 first block of new macroblock; primary
+// timed out → failover producer emitted block with
+// timeout_round=R>0;
+// * receivers required AggregatedTimeoutCert for
+// (mb_idx=4, R) before applying;
+// * cert generation requires 2f+1 votes for
+// (mb_idx=4, R) gossipped to at least one node
+// which then aggregates and re-broadcasts;
+// * during the race window between last vote arriving
+// and cert re-gossip, every receiver's pipeline
+// deferred the block — even though the underlying
+// votes WERE locally present.
+//
+// Treating the pool as equivalent evidence closes that
+// race. No new attack surface: the votes counted are
+// the same Dilithium3-signed messages that feed cert
+// generation; threshold (2f+1) is the same.
+//
+// Pattern
+// ───────
+// "cert is a view, not a gate" — same data, two access
+// paths. Aligns with production-grade BFT semantics
+// where vote pool is the canonical source of truth and
+// the aggregated form is a transport optimisation.
+//
+// Scalability
+// ───────────
+// One DashMap shard read + one HashMap len() — O(1)
+// hot-path cost. At 1M super-nodes the inner HashMap
+// is bounded by `MAX_VALIDATORS = 1000` per slot, so
+// the count operation is a constant.
+// ═══════════════════════════════════════════════════════════
+let pool_has_quorum = if let Some(ref p2p) = unified_p2p {
+let total = p2p.get_active_validator_count();
+// Same threshold formula used everywhere in the
+// codebase: `(N * 2 + 2) / 3` = ceil(2N/3) = 2f+1.
+let two_f_plus_1 = (total * 2 + 2) / 3;
+p2p.has_two_f_plus_one_timeout_votes(
+mb_idx_for_cert,
+mb.timeout_round,
+two_f_plus_1,
+)
 } else {
+false
+};
+
+if pool_has_quorum {
+// Pool evidence equivalent to cert. Fall through to
+// subsequent verify steps. Boundary flag in the log
+// helps operators distinguish the legitimate macroblock-
+// boundary race window from steady-state mid-macroblock
+// catches (the latter is unusual and worth noting).
+let at_boundary = mb.height % 90 == 0;
 if is_info() {
 println!(
-"[INFO][PIPELINE] deferred_full h={} round={} dropped (buf={})",
-mb.height, mb.timeout_round, DEFERRED_MAX
+"[INFO][PIPELINE] cert_pool_grace_admit h={} mb_idx={} round={} boundary={} \
+reason=2fplus1_votes_in_local_pool hint=cert_aggregation_race_bypassed",
+mb.height, mb_idx_for_cert, mb.timeout_round, at_boundary
 );
 }
-metrics.verify_failed.fetch_add(1, Ordering::Relaxed);
+} else {
+// Defer block: neither cert nor enough pool votes yet.
+// Trigger backfill request and put block in the
+// deferred buffer to be re-checked on the next pass.
+if let Some(ref p2p) = unified_p2p {
+p2p.request_timeout_proofs(mb_idx_for_cert, mb_idx_for_cert);
+}
+if deferred.len() < DEFERRED_MAX {
+if is_debug() {
+println!(
+"[DBG][PIPELINE] block_deferred_for_cert h={} round={} mb_idx={} buf={}",
+mb.height, mb.timeout_round, mb_idx_for_cert, deferred.len()
+);
+}
+deferred.insert(mb.height, decoded);
+} else {
+if is_info() {
+println!(
+"[INFO][PIPELINE] deferred_full h={} round={} dropped (buf={})",
+mb.height, mb.timeout_round, DEFERRED_MAX
+);
+}
+metrics.verify_failed.fetch_add(1, Ordering::Relaxed);
+}
+continue;
 }
-continue;
 }
 }
 
```
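The threshold line in this hunk computes `(N * 2 + 2) / 3`, which is ⌈2N/3⌉ under integer division. It equals 2f+1 exactly when N = 3f+1 (with f = ⌊(N-1)/3⌋); for other N it comes out as 2f+2, one vote stricter, so on the 5-node testnet (f = 1) the pool fallback needs 4 distinct votes rather than 3. The sketch below reproduces that arithmetic and the hunk's three outcomes (admit, defer, drop). `CertCheck`, `cert_check`, `two_f_plus_one` and the `deferred_max` parameter are hypothetical names, and DEFERRED_MAX's real value is not shown. The sketch also assumes that a zero threshold (empty validator set) never admits; the diff does not show how the real helper treats that case.

```rust
/// Threshold exactly as written in the diff: ceil(2N/3) via integer division.
pub fn quorum_threshold(total_validators: usize) -> usize {
    (total_validators * 2 + 2) / 3
}

/// Classic 2f+1 with f = floor((N - 1) / 3), for comparison only.
pub fn two_f_plus_one(total_validators: usize) -> usize {
    2 * (total_validators.saturating_sub(1) / 3) + 1
}

/// Hypothetical summary of the three outcomes of the A2 branch.
#[derive(Debug, PartialEq, Eq)]
pub enum CertCheck {
    /// Cert present, or enough distinct verified votes in the local pool.
    Admit,
    /// Neither available yet: request backfill, re-check on the next pass.
    Defer,
    /// Deferred buffer full: block dropped and counted as a verify failure.
    Dropped,
}

pub fn cert_check(
    cert_present: bool,
    pool_votes: usize,
    total_validators: usize,
    deferred_len: usize,
    deferred_max: usize,
) -> CertCheck {
    if cert_present {
        return CertCheck::Admit;
    }
    let threshold = quorum_threshold(total_validators);
    // Assumption: a zero threshold (no known validators) never counts as quorum.
    if threshold > 0 && pool_votes >= threshold {
        return CertCheck::Admit;
    }
    if deferred_len < deferred_max {
        CertCheck::Defer
    } else {
        CertCheck::Dropped
    }
}
```

Because the formula never drops below 2f+1 for any N, using it for the pool fallback cannot weaken the quorum relative to the cert path; at worst it asks for one extra vote.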

development/qnet-integration/src/node.rs

Lines changed: 145 additions & 5 deletions
```diff
@@ -17694,11 +17694,59 @@ impl BlockchainNode {
 // extra second of stall past the adaptive grace window.
 const TIMEOUT_VOTE_INTERVAL: u64 = 1;
 
-// v4.2: Timeout votes/certificates keyed by MACROBLOCK INDEX,
-// not exact microblock height. This ensures nodes at different
-// microblock heights within the same macroblock can still form
-// quorum and produce a TimeoutCertificate.
-let timeout_mb_index = microblock_height / 90;
+// ═══════════════════════════════════════════════════════════════
+// v21 (A1): FORWARD-LOOKING TIMEOUT VOTE TARGET
+// ═══════════════════════════════════════════════════════════════
+// The timeout vote MUST address the macroblock whose producer is
+// currently failing — the macroblock that contains the BLOCK
+// we are waiting for, not the macroblock our local tip is in.
+//
+// At a macroblock boundary, this distinction is what bridges
+// boot-N to boot-N+1:
+//
+// local_height = 359 → next_height = 360
+// old logic: mb_idx = 359 / 90 = 3 (PREVIOUS macroblock)
+// v21 logic: mb_idx = 360 / 90 = 4 (NEXT macroblock —
+// the one whose producer
+// is failing right now)
+//
+// Why the old indexing caused a circular deadlock at the very
+// first failed-primary boundary:
+// * Voter at h=359 emitted votes for mb_idx=3.
+// * mb_idx=3 was already finalised (it ended at h=359 itself).
+// * No node accumulated evidence for mb_idx=4.
+// * The block at h=360 (produced by failover at round>0)
+// required an AggregatedTimeoutCert for (mb_idx=4, round)
+// to apply.
+// * Cert never reached 2f+1 — voters were all voting for
+// mb_idx=3 instead.
+// * Network stalled at h=359 with no path to recover.
+//
+// Targeting `next_height / 90` makes vote semantics align with
+// production-grade BFT vote-pool patterns: a vote is a
+// standalone cryptographic claim about a future round, not a
+// function of the voter's local state. The receiver-side
+// already accepts votes for `mb_idx ≤ local_mb + 50` — see
+// `unified_p2p.rs::handle_timeout_vote` lookahead window — so
+// emitter-side change is sufficient and self-contained.
+//
+// Safety invariants preserved
+// ─────────────────────────────
+// * Vote remains Dilithium3-signed by the voter — emitter
+// identity gated as before.
+// * 2f+1 supermajority threshold unchanged.
+// * VRF determinism for producer selection unchanged.
+// * `voted_for_round` per-voter dedup still bounds emit rate.
+//
+// Scalability
+// ───────────
+// No additional bandwidth: same vote payload, same broadcast
+// path, same gossip fan-out. The change shifts WHEN the vote
+// is emitted (one slot earlier in the boundary case) and
+// WHICH mb_idx it targets, not the cost of emitting it.
+// Identical performance from 5 to 1M super-nodes.
+// ═══════════════════════════════════════════════════════════════
+let timeout_mb_index = next_height / 90;
 
 // v5.4: Efficient certificate lookup (replaces bounded loop)
 let certified_timeout_round = if let Some(p2p) = &unified_p2p {
```
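A minimal sketch of the A1 index change, assuming next_height = microblock_height + 1 as in the comment's 359 → 360 example. It models only the upper bound of the receiver's `local_mb + 50` lookahead; whatever lower bound handle_timeout_vote applies is not shown in this diff. All function and constant names are hypothetical.

```rust
/// Microblocks per macroblock, matching the literal `90` used in the diff.
pub const MICROBLOCKS_PER_MACROBLOCK: u64 = 90;

/// Forward lookahead in macroblocks, per the commit message (`local_mb + 50`).
pub const LOOKAHEAD_MACROBLOCKS: u64 = 50;

/// Pre-v21 target: the macroblock containing the local tip.
pub fn old_timeout_mb_index(microblock_height: u64) -> u64 {
    microblock_height / MICROBLOCKS_PER_MACROBLOCK
}

/// v21 (A1) target: the macroblock containing the block being waited for.
/// Assumption: next_height is simply the local tip plus one.
pub fn v21_timeout_mb_index(microblock_height: u64) -> u64 {
    let next_height = microblock_height.saturating_add(1);
    next_height / MICROBLOCKS_PER_MACROBLOCK
}

/// Upper-bound check only: would a receiver at `local_mb` accept a vote
/// addressed to `vote_mb_idx`?
pub fn within_lookahead(vote_mb_idx: u64, local_mb: u64) -> bool {
    vote_mb_idx <= local_mb.saturating_add(LOOKAHEAD_MACROBLOCKS)
}
```

The two targets differ only when the local tip is the last block of a macroblock (h = 89, 179, 269, 359, ...); everywhere else they agree, which is why the change only alters behaviour at the boundary. A receiver still at mb_idx=3 accepts the new mb_idx=4 vote because 4 falls well inside its lookahead.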
```diff
@@ -18154,6 +18202,98 @@ impl BlockchainNode {
 );
 }
 
+// ═══════════════════════════════════════════════════════════════
+// v21 (B1): HEARTBEAT-DRIVEN FORWARD TIMEOUT VOTE EMIT
+// ═══════════════════════════════════════════════════════════════
+// When heartbeat absence is detected from the expected producer
+// for `next_height`, emit a TimeoutVote IMMEDIATELY rather than
+// waiting for `local_delay > timeout_grace_period`. This shaves
+// ~5-10 seconds off the failover path because the vote starts
+// propagating ~3 s after heartbeat-silence detection instead of
+// after a full slot-grace window.
+//
+// Bridges the existing empty-slot attestation mechanism (which
+// already fires on heartbeat_fast_path) into the TimeoutVote /
+// cert-aggregation path, so the same observed producer failure
+// produces evidence on BOTH consensus channels:
+//
+// * empty_slot_failover_round (attestation-based —
+// accelerates microblock-level skip)
+// * HIGHEST_CERTIFIED_ROUND (cert-based —
+// drives macroblock-rotation round advancement)
+//
+// Without this cross-wiring, a heartbeat-detected producer
+// failure triggered ONLY the attestation channel; the
+// TimeoutVote stream had to wait for the legacy
+// `local_delay > grace_period` gate, which at macroblock
+// boundaries created a window where attestations advanced but
+// the cert chain did not — leaving the cert-presence pipeline
+// gate stalling blocks (forensic case h=360 at the first
+// macroblock-boundary primary failure on the testnet).
+//
+// Gated on `proposed_timeout_round == 0` so this path never
+// double-fires with the legacy stall-driven emit (which only
+// runs when proposed_timeout_round > 0). Once `local_delay`
+// crosses the grace threshold, control switches cleanly to the
+// legacy path with no overlap.
+//
+// Safety
+// ──────
+// Same Dilithium3 signature, same `(mb_idx, round, voter_id)`
+// anti-replay tracker, same 2f+1 supermajority threshold for
+// cert generation. `broadcast_timeout_vote` itself dedupes via
+// `TIMEOUT_VOTED_HEIGHTS` so repeated invocations within the
+// same tick are no-ops. The cryptographic floor is unchanged.
+//
+// Scalability
+// ───────────
+// One conditional Dilithium3 sign (~3 ms) + one broadcast when
+// heartbeat goes silent — same per-event cost as the legacy
+// emit, just earlier in the timeline. Identical performance
+// profile from 5 to 1M super-nodes.
+// ═══════════════════════════════════════════════════════════════
+if heartbeat_fast_path
+&& proposed_timeout_round == 0
+&& is_synced_enough
+&& (microblock_height > 0 || genesis_era_dead_producer)
+{
+let target_round = certified_timeout_round.saturating_add(1);
+if let Some(p2p) = &unified_p2p {
+let last_block_hash = storage.get_latest_macroblock_hash()
+.unwrap_or([0u8; 32]);
+let vote_msg = format!(
+"TIMEOUT:{}:{}:{}",
+timeout_mb_index, target_round, hex::encode(&last_block_hash)
+);
+if let Some(crypto) = try_get_quantum_crypto() {
+match crypto.create_consensus_signature(&node_id, &vote_msg).await {
+Ok(sig) => {
+p2p.broadcast_timeout_vote(
+timeout_mb_index,
+target_round,
+last_block_hash,
+sig.signature.as_bytes().to_vec(),
+);
+if is_info() {
+println!(
+"[INFO][TIMEOUT] heartbeat_driven_emit mb={} round={} reason=producer_silent_fast_path",
+timeout_mb_index, target_round
+);
+}
+}
+Err(e) => {
+if is_warn() {
+println!(
+"[WARN][TIMEOUT] heartbeat_driven_sign_fail err={}",
+e
+);
+}
+}
+}
+}
+}
+}
+
 // v14.8.11: drift self-pause vote gate REMOVED. A
 // drifted node still contributes TimeoutVotes because
 // its `proposed_timeout_round` is clamped by the
```
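The B1 gate and its dedup can be sketched as follows. The predicate copies the four conditions from the hunk, `target_round` mirrors the `saturating_add(1)` step, and the payload mirrors the `format!("TIMEOUT:{}:{}:{}", ...)` signing string, with lowercase hex written by hand instead of the `hex` crate. Signing and broadcasting are left out. `EmitGate`, `VotedTracker` and the u64 field types are assumptions; the tracker keys on (mb_idx, round, voter) because that is the granularity the commit message attributes to TIMEOUT_VOTED_HEIGHTS, whose real key type is not shown.

```rust
use std::collections::HashSet;

/// Hypothetical bundle of the inputs the B1 gate reads in the diff.
#[derive(Clone, Copy)]
pub struct EmitGate {
    pub heartbeat_fast_path: bool,
    pub proposed_timeout_round: u64,
    pub is_synced_enough: bool,
    pub microblock_height: u64,
    pub genesis_era_dead_producer: bool,
}

/// Same four conditions as the `if` guarding the heartbeat-driven emit.
pub fn should_emit_heartbeat_vote(g: &EmitGate) -> bool {
    g.heartbeat_fast_path
        && g.proposed_timeout_round == 0
        && g.is_synced_enough
        && (g.microblock_height > 0 || g.genesis_era_dead_producer)
}

/// Round the vote targets: one past the highest certified round.
pub fn target_round(certified_timeout_round: u64) -> u64 {
    certified_timeout_round.saturating_add(1)
}

/// Signing payload in the same "TIMEOUT:{mb}:{round}:{hash_hex}" shape.
pub fn timeout_vote_payload(mb_idx: u64, round: u64, last_block_hash: &[u8; 32]) -> String {
    let hash_hex: String = last_block_hash.iter().map(|b| format!("{:02x}", b)).collect();
    format!("TIMEOUT:{}:{}:{}", mb_idx, round, hash_hex)
}

/// Hypothetical dedup: at most one effective emit per (mb_idx, round, voter).
#[derive(Default)]
pub struct VotedTracker {
    seen: HashSet<(u64, u64, String)>,
}

impl VotedTracker {
    /// True the first time a key is seen (broadcast), false on any repeat.
    pub fn first_emit(&mut self, mb_idx: u64, round: u64, voter_id: &str) -> bool {
        self.seen.insert((mb_idx, round, voter_id.to_string()))
    }
}
```

With this shape, the heartbeat path and the legacy stall path can both call into the same tracker: whichever reaches a given (mb_idx, round) first broadcasts, and the other becomes a no-op.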
