Skip to content

Commit b3224b6

Browse files
AIQnetLabclaude
andcommitted
feat: v16.1 identity anchoring + escalation ladder + network heartbeat + 2f+1 rollback
Forensic root causes (12-hour deadlock at h=781): * node_001 keypair regenerated post-restart while peer registries retained the original PK -> 720+ sig_invalid via pk_mismatch hard reject * state machine cycled VALIDATING->ERROR{recoverable=true}->VALIDATING for ~40000 iterations with `recoverable` flag having zero consumers * producer_silent watchdog tracked LOCAL producer task, not remote producer * hash_chain_break at f+1=2 triggered destructive rollback (BFT violation) -> resync re-downloaded forked branch -> cascade h=333->331, h=368->364 * empty-slot attestation gate `microblock_height > 0` blocked genesis-era failover when initial producer was dead from boot Phase 1: Identity Key Anchoring (Cluster A, 5 fixes) * genesis_constants.rs: load_genesis_anchor_pks_from_file + install_genesis_anchors_at_startup + get_genesis_anchor_pk * consensus_crypto.rs: get_consensus_pk_anchor + genesis_anchor_pks_len * node.rs::start: anchors install BEFORE RPC/P2P * node.rs::create_genesis_registration_txs: embed anchored PK in TX + recompute hash so all nodes share canonical (node_id -> PK) binding * node.rs::initialize_wallet_identity: refuse-on-mismatch with FATAL panic + operator restoration hint Phase 2: State Machine Liveness (Cluster B, 4 fixes) * node.rs::set_node_state: ERROR_CYCLE_COUNT + 4-stage escalation ladder (force_round @3, resync @10, peer_refresh @30, halt @120 cycles) * node.rs:19459: bft_wait_start corrected log labels + bft_wait_timeout with real received vs threshold values * node.rs:19488: live `responses` field update inside poll loop * node.rs:17850: macroblock-match escape clause in vote gate -- a node within 1 macroblock (90 microblocks) of best peer is on canonical finalized chain by 2f+1 commit-reveal construction; allow voting Phase 3: Producer Failover (Cluster C, 2 fixes) * unified_p2p.rs: NetworkMessage::ProducerHeartbeat + Dilithium3-verified handler + REMOTE_PRODUCER_HEARTBEAT_{MS,OBSERVED_MS} + broadcast_producer_heartbeat parallel fan-out * node.rs:19154: 1/sec throttled broadcast when elected (next_block_height) * node.rs:17726: smart genesis-era gate -- microblock_height > 0 OR (microblock_height == 0 AND now > genesis_ts + 3*grace_period); prevents premature failover on startup, restores genesis-era recovery * node.rs:17756: heartbeat fast-path -- last_remote_producer_heartbeat_age_ms > 3000 triggers immediate empty-slot attestation broadcast Phase 4: Fork Recovery Hardening (Cluster D, 2 fixes) * block_pipeline.rs::record_hash_chain_break_witness: SPLIT thresholds: f+1 -> fork_detection_signal (advisory log + mark peer as fork-source); 2f+1 -> minority_fork_confirmed (destructive rollback only) * block_pipeline.rs: FORKED_PEER_COOLDOWN map + mark_peer_as_fork_source + is_peer_in_fork_cooldown + cleanup_forked_peer_cooldown (5-min window) * unified_p2p.rs::get_sync_peers_filtered_by_height: two-pass canonical- aware selection -- clean peers first, tagged peers fall back only when clean exhausted (liveness over caution) * node.rs:17608: CHRONIC_STALL_REQUESTED wired into chronic stall path * node.rs:16601: HALT_REQUESTED -> process::exit(1) at loop top * unified_p2p.rs:20081: PEER_REFRESH_REQUESTED -> forced peer exchange with doubled breadth on signal Formal invariants enforced: * INV-1: identity-key binding immutable after anchor install * INV-2: 2f+1 supermajority required for any destructive state mutation * INV-3: single source cannot trigger failover (2f+1 attestation gate) * LIV-1: stuck Error state bounded <= 120 cycles before halt * LIV-2: dead-producer detection bounded <= 3 sec via heartbeat * LIV-3: cascade rollback math-impossible (cooldown breaks loop) Operator procedure (mandatory before redeploy): 1. Run cluster once with new binary, collect each node's pk_hash from [INFO][KEY] keypair_ready logs 2. Build /app/data/genesis_anchors.json with all 5 hex-encoded PKs 3. Distribute file to ALL 5 nodes (consistency mandatory) 4. Backup dilithium_keypair.bin per node (loss = anchor mismatch panic) 5. Restart cluster -- anchors install before P2P, embed in genesis TX, immutable binding for network lifetime Scalability: registries 50k-cap, committee bounded <= 1000, all hot paths O(1) DashMap, network heartbeat 1msg/sec total. Designed for 100k+ super-node deployment. Build: cargo build --release exit 0 (13m 13s, 0 warnings). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c5d0523 commit b3224b6

5 files changed

Lines changed: 1195 additions & 56 deletions

File tree

core/qnet-consensus/src/consensus_crypto.rs

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -349,6 +349,23 @@ lazy_static::lazy_static! {
349349
parking_lot::RwLock::new(std::collections::HashMap::new());
350350
}
351351

352+
/// Read-only access to the genesis anchor for a single identity. Returns
353+
/// None when no anchor map is installed (cold boot before anchor file is
354+
/// loaded) or when `node_id` is not a genesis identity.
355+
///
356+
/// Used by the integration layer at `initialize_wallet_identity` to refuse
357+
/// boot when the locally-loaded keypair does not match the anchored PK,
358+
/// preventing the v15.x pk_mismatch class of incidents.
359+
pub fn get_consensus_pk_anchor(node_id: &str) -> Option<Vec<u8>> {
360+
GENESIS_ANCHOR_PKS.read().get(node_id).cloned()
361+
}
362+
363+
/// Number of installed genesis anchors. 0 when no anchor file has been
364+
/// loaded yet — used by callers to decide whether to enforce strict binding.
365+
pub fn genesis_anchor_pks_len() -> usize {
366+
GENESIS_ANCHOR_PKS.read().len()
367+
}
368+
352369
/// Install the genesis anchor PK map. Called exactly once at process start
353370
/// by the integration layer, BEFORE any `register_consensus_pk_with_proof`
354371
/// call, with the deterministic genesis PKs for the 5 anchor nodes.

development/qnet-integration/src/block_pipeline.rs

Lines changed: 146 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -122,15 +122,45 @@ static HASH_CHAIN_BREAK_WITNESSES: once_cell::sync::Lazy<
122122
> = once_cell::sync::Lazy::new(dashmap::DashMap::new);
123123

124124
/// Record that `peer_id` reported a hash_chain_break at `height`.
125-
/// If the set of distinct witnesses reaches f+1 (not 2f+1), signal fork
126-
/// recovery. f+1 is the "at least one honest witness" threshold and is
127-
/// the canonical bar for fork DETECTION (as opposed to the 2f+1 COMMIT
128-
/// threshold).
125+
///
126+
/// v16.1: SPLIT THRESHOLD MODEL. Industry-standard BFT chains separate
127+
/// fork DETECTION (advisory signal, low threshold) from fork ROLLBACK
128+
/// (state-mutating action, supermajority threshold):
129+
///
130+
/// * f+1 = "at least one honest witness exists" — sufficient to RAISE
131+
/// a soft-warning signal, prompt deeper investigation, increase peer
132+
/// diversity in resync sources. NEVER rolls back state.
133+
///
134+
/// * 2f+1 = "byzantine supermajority" — required for any state
135+
/// mutation including destructive rollback. Below this threshold
136+
/// a malicious f-node colluding with f-1 honest witnesses on a
137+
/// transient timestamp/round mismatch could force everyone else to
138+
/// unwind their canonical chain.
139+
///
140+
/// Forensic motivation (v15.x h=334..h=368 cascade):
141+
/// The legacy code used f+1 (=2 for n=5) for both detection AND
142+
/// destructive rollback. A single byzantine producer (node_001 with
143+
/// pk_mismatch broken signatures) plus one honest peer reporting a
144+
/// harmless `block_round=2` divergence was enough to roll the local
145+
/// tip back. Resync then re-downloaded the SAME forked branch from
146+
/// the same peer set (no canonical-chain filter), and the witness
147+
/// counter trip again at the new height — repeat for hours, walking
148+
/// the chain backwards.
149+
///
150+
/// Splitting the thresholds plus the peer-cooldown (Phase 4.B) breaks
151+
/// the cascade: a single dishonest peer cannot induce rollback even
152+
/// if f honest peers happen to agree on the same wrong-looking hash;
153+
/// the rollback path requires a real BFT-supermajority of distinct
154+
/// peer_ids.
129155
///
130156
/// Rate-limit semantics: once FORK_RECOVERY_HEIGHT is non-zero we don't
131157
/// overwrite it with a different (lower) height — the main loop consumes
132158
/// it first. This prevents flapping when two heights both accumulate
133159
/// witnesses during a partition.
160+
///
161+
/// Scalability: per-height witness sets are bounded by the active
162+
/// validator-set count (≤ MAX_VALIDATORS = 1000 in committee). Cleanup
163+
/// sweep evicts entries below current chain tip.
134164
pub fn record_hash_chain_break_witness(height: u64, peer_id: &str) {
135165
if peer_id.is_empty() || peer_id == "self" {
136166
return;
@@ -150,14 +180,43 @@ pub fn record_hash_chain_break_witness(height: u64, peer_id: &str) {
150180
let n = qnet_consensus::consensus_crypto::consensus_pk_registry_len();
151181
if n >= 3 { n } else { 5 }
152182
};
153-
// f+1 = ceil(n/3): guarantees at least one honest witness.
183+
184+
// f+1 = ceil(n/3): "at least one honest witness" — DETECTION ONLY.
154185
let threshold_f_plus_1 = (total_validators.saturating_add(2)) / 3;
155-
// Floor at 2 so at any registry size ≥ 4 the threshold is ≥ 2; below
156-
// 4 we still need at least 2 distinct reporters to avoid single-peer
157-
// false positives.
158-
let threshold = threshold_f_plus_1.max(2);
186+
let detection_threshold = threshold_f_plus_1.max(2);
187+
188+
// 2f+1 = canonical BFT supermajority — destructive ROLLBACK threshold.
189+
// Formula `(2n+2)/3` is the canonical ceiling form; works uniformly
190+
// for n=5 as well as n=1_000_000.
191+
let threshold_2f_plus_1 = ((total_validators.saturating_mul(2)).saturating_add(2)) / 3;
192+
let rollback_threshold = threshold_2f_plus_1.max(3);
193+
194+
// Stage 1: SOFT detection signal at f+1.
195+
// Logged once per crossing so operators see partial agreement without
196+
// triggering rollback. The signal is also picked up by Phase 4.B
197+
// peer-cooldown which tags every peer that supplied a forked block at
198+
// this height as "potentially-byzantine" for canonical-aware sync
199+
// peer selection — without losing those peers as gossip sources.
200+
if witnesses == detection_threshold && witnesses < rollback_threshold {
201+
if is_warn() {
202+
println!(
203+
"[WARN][PIPELINE] fork_detection_signal h={} witnesses={} threshold_f_plus_1={} action=advisory_only_no_rollback",
204+
height, witnesses, detection_threshold
205+
);
206+
}
207+
// Mark all current witnesses as "fork-source" so the canonical-
208+
// aware sync peer selection prefers other peers when refilling
209+
// the local chain at this height.
210+
if let Some(set) = HASH_CHAIN_BREAK_WITNESSES.get(&height) {
211+
for w in set.iter() {
212+
mark_peer_as_fork_source(w.key());
213+
}
214+
}
215+
}
159216

160-
if witnesses >= threshold {
217+
// Stage 2: HARD destructive rollback at 2f+1.
218+
// Only at this point do we commit to actually deleting blocks.
219+
if witnesses >= rollback_threshold {
161220
let rollback_to = height.saturating_sub(1);
162221
// Only raise the signal — never lower. The main loop consumes it
163222
// under the same atomic swap that clears the tracker.
@@ -166,14 +225,89 @@ pub fn record_hash_chain_break_witness(height: u64, peer_id: &str) {
166225
FORK_RECOVERY_HEIGHT.store(rollback_to, Ordering::SeqCst);
167226
if is_warn() {
168227
println!(
169-
"[WARN][PIPELINE] minority_fork_detected h={} rollback_to={} witnesses={} threshold={} (f+1)",
170-
height, rollback_to, witnesses, threshold
228+
"[WARN][PIPELINE] minority_fork_confirmed h={} rollback_to={} witnesses={} threshold_2f_plus_1={} action=destructive_rollback",
229+
height, rollback_to, witnesses, rollback_threshold
171230
);
172231
}
173232
}
174233
}
175234
}
176235

236+
// ═══════════════════════════════════════════════════════════════════════════
237+
// v16.1: FORKED PEER COOLDOWN
238+
// ═══════════════════════════════════════════════════════════════════════════
239+
// Peers that supplied blocks of a branch we just rolled back from (or which
240+
// triggered the f+1 fork-detection signal) are tagged here for a bounded
241+
// cooldown window. The canonical-aware sync peer selector reads this map
242+
// and de-prioritises tagged peers until the cooldown expires — letting the
243+
// resync pull from peers on the canonical branch instead of refetching
244+
// the same forked blocks.
245+
//
246+
// Bounded retention: 5-minute cooldown per peer. Auto-evicted on next
247+
// fork event for that peer (refresh) or via the periodic sweep below.
248+
// At 100k super-node deployment this map is bounded by the union of
249+
// recent fork participants — typically << 1000 entries.
250+
// ═══════════════════════════════════════════════════════════════════════════
251+
252+
const FORKED_PEER_COOLDOWN_MS: u64 = 5 * 60 * 1000; // 5 min
253+
254+
static FORKED_PEER_COOLDOWN: once_cell::sync::Lazy<dashmap::DashMap<String, u64>> =
255+
once_cell::sync::Lazy::new(dashmap::DashMap::new);
256+
257+
/// Mark `peer_id` as having supplied a forked-branch block. Used by the
258+
/// canonical-aware sync peer selector to prefer other peers during the
259+
/// cooldown window. Idempotent — refreshes timestamp on repeated hits.
260+
pub fn mark_peer_as_fork_source(peer_id: &str) {
261+
if peer_id.is_empty() || peer_id == "self" {
262+
return;
263+
}
264+
let now_ms = std::time::SystemTime::now()
265+
.duration_since(std::time::UNIX_EPOCH)
266+
.map(|d| d.as_millis() as u64)
267+
.unwrap_or(0);
268+
FORKED_PEER_COOLDOWN.insert(peer_id.to_string(), now_ms);
269+
}
270+
271+
/// Returns true while `peer_id` is within the fork-cooldown window. The
272+
/// canonical-aware sync peer selector skips peers for which this returns
273+
/// true; if the entire candidate set is in cooldown, the selector falls
274+
/// back to the full set rather than starving sync (preferring suspect
275+
/// peers over no peers at all when liveness is at stake).
276+
pub fn is_peer_in_fork_cooldown(peer_id: &str) -> bool {
277+
let entry = match FORKED_PEER_COOLDOWN.get(peer_id) {
278+
Some(e) => e,
279+
None => return false,
280+
};
281+
let marked_at = *entry.value();
282+
drop(entry);
283+
let now_ms = std::time::SystemTime::now()
284+
.duration_since(std::time::UNIX_EPOCH)
285+
.map(|d| d.as_millis() as u64)
286+
.unwrap_or(0);
287+
let in_cooldown = now_ms.saturating_sub(marked_at) < FORKED_PEER_COOLDOWN_MS;
288+
if !in_cooldown {
289+
// Lazy eviction — opportunistically clean expired entries on
290+
// every read. Avoids a separate cleanup task at the cost of a
291+
// single DashMap remove per expiration check.
292+
FORKED_PEER_COOLDOWN.remove(peer_id);
293+
}
294+
in_cooldown
295+
}
296+
297+
/// Periodic sweep called from the existing cleanup task. Removes entries
298+
/// older than the cooldown window so the map stays bounded under sustained
299+
/// fork activity. O(N) over current map size; runs at low cadence (the
300+
/// caller's existing 5-minute sweep is sufficient).
301+
pub fn cleanup_forked_peer_cooldown() {
302+
let now_ms = std::time::SystemTime::now()
303+
.duration_since(std::time::UNIX_EPOCH)
304+
.map(|d| d.as_millis() as u64)
305+
.unwrap_or(0);
306+
FORKED_PEER_COOLDOWN.retain(|_, marked_at| {
307+
now_ms.saturating_sub(*marked_at) < FORKED_PEER_COOLDOWN_MS
308+
});
309+
}
310+
177311
/// Periodic cleanup of stale witness entries below `min_height`.
178312
/// Called by unified_p2p cleanup tasks.
179313
pub fn cleanup_break_tracker(min_height: u64) {

development/qnet-integration/src/genesis_constants.rs

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -159,3 +159,123 @@ pub fn get_all_vrf_keys() -> HashMap<String, Vec<u8>> {
159159
VRF_PK_REGISTRY.read().clone()
160160
}
161161

162+
// =========================================================================
163+
// v16.1: GENESIS DILITHIUM ANCHOR LOADER (chain-anchored identity binding)
164+
// =========================================================================
165+
//
166+
// Identity-key binding for the 5 genesis bootstrap nodes is anchored at boot
167+
// time from a JSON file shipped with the deployment. Once installed via
168+
// `consensus_crypto::set_genesis_anchor_pks`, the anchor map is immutable and
169+
// guards every subsequent registration: any PK that does not match the anchor
170+
// is rejected as a squat attempt.
171+
//
172+
// File format (`/app/data/genesis_anchors.json`):
173+
// { "genesis_node_001": "<hex_1952_bytes>", ... "genesis_node_005": "..." }
174+
//
175+
// Operator workflow:
176+
// 1. On a clean cluster, every bootstrap node generates its own keypair
177+
// (lazy, on first start) under `/app/data/keys/dilithium_keypair.bin`.
178+
// 2. Operator collects each node's PK (hex from `pk_hash` log or RPC) and
179+
// writes them all into ONE JSON file deployed to every node BEFORE the
180+
// first restart that loads anchors.
181+
// 3. Subsequent restarts read the file and install anchors at startup,
182+
// BEFORE P2P comes online — closes the trust-on-first-verify race that
183+
// caused the v15.x pk_mismatch deadlock.
184+
//
185+
// Operational property: keypair files MUST be backed up. If a node's
186+
// `dilithium_keypair.bin` is lost while the anchor map still binds the old
187+
// PK, the node refuses to start (via `initialize_wallet_identity`'s strict
188+
// guard) — operator must restore from backup.
189+
//
190+
// Scalability: anchors are bounded to the 5 genesis identities. For
191+
// thousands of super-node operators, identity-key binding is established via
192+
// signed `NodeRegistration` transactions (already implemented at
193+
// `cache_node_registrations_from_transactions_with_dashmap`), which carry
194+
// `dilithium_public_key` in TX payload and feed `register_consensus_pk_from_chain`.
195+
// =========================================================================
196+
197+
/// Default location of the genesis anchors JSON file inside the container.
198+
pub const GENESIS_ANCHORS_PATH: &str = "/app/data/genesis_anchors.json";
199+
200+
/// Load genesis Dilithium3 anchor PKs from `path`. Returns empty map if file
201+
/// missing or malformed (logged as WARN, not fatal — boot proceeds without
202+
/// anchors so a fresh cluster can complete first-time keygen + anchor write).
203+
///
204+
/// Format: JSON object `{ node_id: pk_hex_1952_bytes }`. Each PK MUST decode
205+
/// to exactly 1952 bytes; invalid entries are skipped with WARN.
206+
pub fn load_genesis_anchor_pks_from_file(path: &str) -> HashMap<String, Vec<u8>> {
207+
use std::fs;
208+
let raw = match fs::read_to_string(path) {
209+
Ok(s) => s,
210+
Err(_) => {
211+
// Not present is normal for first cluster boot — operator writes
212+
// the file after collecting PKs. Don't WARN at this stage.
213+
return HashMap::new();
214+
}
215+
};
216+
217+
let parsed: HashMap<String, String> = match serde_json::from_str(&raw) {
218+
Ok(m) => m,
219+
Err(e) => {
220+
eprintln!("[WARN][GENESIS] anchors_parse_fail path={} err={}", path, e);
221+
return HashMap::new();
222+
}
223+
};
224+
225+
let mut out = HashMap::with_capacity(parsed.len());
226+
for (node_id, pk_hex) in parsed {
227+
match hex::decode(&pk_hex) {
228+
Ok(bytes) if bytes.len() == 1952 => {
229+
out.insert(node_id, bytes);
230+
}
231+
Ok(bytes) => {
232+
eprintln!(
233+
"[WARN][GENESIS] anchor_invalid_size node={} got={} expected=1952",
234+
node_id, bytes.len()
235+
);
236+
}
237+
Err(e) => {
238+
eprintln!("[WARN][GENESIS] anchor_hex_decode_fail node={} err={}", node_id, e);
239+
}
240+
}
241+
}
242+
out
243+
}
244+
245+
/// One-shot startup hook: load anchors from default path and install into
246+
/// the consensus-layer registry. Idempotent — second call is a no-op once
247+
/// the consensus layer has installed a non-empty anchor map.
248+
///
249+
/// Returns the count of anchors installed (0 if file missing — caller may
250+
/// proceed without anchors during first-cluster keygen, then call again
251+
/// after the file is written by operator).
252+
///
253+
/// MUST be called BEFORE any P2P traffic is accepted (specifically, before
254+
/// the first `VrfLeaderClaim` or `VrfKeyAnnounce` could trigger
255+
/// `register_consensus_pk_from_chain` in a TOFV path).
256+
pub fn install_genesis_anchors_at_startup() -> usize {
257+
let map = load_genesis_anchor_pks_from_file(GENESIS_ANCHORS_PATH);
258+
if map.is_empty() {
259+
// First-boot path: no anchor file yet. Caller logs the appropriate
260+
// INFO; we return 0 so caller can decide whether to fail or proceed.
261+
return 0;
262+
}
263+
let count = map.len();
264+
let installed = qnet_consensus::consensus_crypto::set_genesis_anchor_pks(map);
265+
if installed {
266+
println!("[INFO][GENESIS] anchors_installed count={} src={}", count, GENESIS_ANCHORS_PATH);
267+
count
268+
} else {
269+
// Already installed (immutable). Treat as success, but log so an
270+
// operator restart with a different file is visible.
271+
println!("[INFO][GENESIS] anchors_already_installed count={}", count);
272+
count
273+
}
274+
}
275+
276+
/// Lookup the anchored PK for a given genesis node_id. Returns None if no
277+
/// anchor map is installed, or if the node_id is not a genesis identity.
278+
pub fn get_genesis_anchor_pk(node_id: &str) -> Option<Vec<u8>> {
279+
qnet_consensus::consensus_crypto::get_consensus_pk_anchor(node_id)
280+
}
281+

0 commit comments

Comments
 (0)