Commit 8cc116a

AIQnetLab and claude committed
fix(sync): authenticated signed-head tip oracle — close served-ceiling false-synced
A follower derived its network tip from the heights of blocks it PULLED (served-ceiling), so its peer-height view tracked its own sync progress, not the real tip: it falsely flipped synced and stalled below the network.

Source the tip ONLY from authenticated heads, never served blocks:
- SIGNED_HEAD_MAX fed by Dilithium-verified HealthPing heads (immune to the per-peer down-clamp that erased the tip);
- get_best_peer_height = max(BEST_PEER_HEIGHT, SIGNED_HEAD_MAX);
- relay genesis heads transitively to non-genesis neighbours only (per-origin monotonic-ts dedup, excludes origin+sender); reaches deep followers with zero fan-in onto the 5 genesis;
- drop served-block height attestation at the 4 sites (liveness only);
- the hint trusts the signed head, the genesis HTTP probe is a cold-start fallback only.

Unforgeable (registry-bound PK, body PK ignored), QC-frontier-floored (lie-high chases a phantom tail to STALL_ABORT, no state injection), genesis-inert at bootstrap.

cargo check clean, 185 lib tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 92f2089 commit 8cc116a
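
Reviewer note: the rule in the commit message (only verified heads raise the tip, served blocks count for liveness only, and the reported tip is the max of the two counters) can be sketched in isolation. This is a minimal sketch, not the crate's code: `TipOracle`, `set_peer_view`, and `on_served_block` are hypothetical names standing in for the global `BEST_PEER_HEIGHT` / `SIGNED_HEAD_MAX` atomics, and signature verification is assumed to have happened before `on_verified_head` is called.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the two global atomics named in the commit.
pub struct TipOracle {
    best_peer_height: AtomicU64, // per-peer view; may be clamped down
    signed_head_max: AtomicU64,  // only ever raised by verified heads
}

impl TipOracle {
    pub const fn new() -> Self {
        Self {
            best_peer_height: AtomicU64::new(0),
            signed_head_max: AtomicU64::new(0),
        }
    }

    /// Call only after the head's signature has been verified (not modelled here).
    pub fn on_verified_head(&self, height: u64) {
        self.signed_head_max.fetch_max(height, Ordering::Relaxed);
    }

    /// Hypothetical setter modelling a recalculated (possibly lower) peer view.
    pub fn set_peer_view(&self, height: u64) {
        self.best_peer_height.store(height, Ordering::Relaxed);
    }

    /// A served block refreshes liveness only; its height is deliberately ignored.
    pub fn on_served_block(&self, _height: u64) {}

    /// Reported tip: the peer view, floored by the highest verified head.
    pub fn best_peer_height(&self) -> u64 {
        self.best_peer_height
            .load(Ordering::Relaxed)
            .max(self.signed_head_max.load(Ordering::Relaxed))
    }
}
```

Because `signed_head_max` only moves through `fetch_max`, lowering the peer view cannot pull the reported tip below the highest verified head, which is the "immune to the down-clamp" property the message claims.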

2 files changed

Lines changed: 65 additions & 31 deletions

development/qnet-integration/src/sync_manager.rs

Lines changed: 5 additions & 2 deletions
@@ -292,8 +292,11 @@ impl SyncManager {
 let local_h = self.coordinator.chain_height();
 let best = self.p2p.get_best_peer_height();
 
-// If single peer reports height >100 blocks ahead, verify against bootstrap
-if best > 0 && best <= local_h + 100 {
+// best is floored by the authenticated signed-head tip (get_best_peer_height). Once any signed
+// head exists, trust it directly — it is unforgeable (Dilithium) and the QC frontier floors the
+// bulk target, so no genesis HTTP fan-in is needed. The probe below is the cold-start fallback
+// only, before the first head arrives (SIGNED_HEAD_MAX == 0).
+if best > 0 && crate::unified_p2p::SIGNED_HEAD_MAX.load(std::sync::atomic::Ordering::Relaxed) > 0 {
     return best;
 }
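
Reviewer note: the decision this hunk introduces can be read as "trust the floored tip once any signed head exists, otherwise fall back to the probe". A minimal sketch under assumptions: `sync_target_hint` is a hypothetical free function, and `probe` is a hypothetical closure standing in for the genesis HTTP probe, whose real behaviour is not shown in this diff.

```rust
/// Returns the height the sync loop should aim for.
/// `probe` is a hypothetical stand-in for the cold-start genesis HTTP probe.
pub fn sync_target_hint<F: FnOnce() -> u64>(best: u64, signed_head_max: u64, probe: F) -> u64 {
    if best > 0 && signed_head_max > 0 {
        // At least one authenticated head has arrived: use the floored tip directly.
        return best;
    }
    // Cold start: no authenticated head yet, so ask the fallback.
    probe()
}
```

Taking the probe as a closure means it is never invoked once a signed head is present, which mirrors the "no genesis HTTP fan-in" goal stated in the comment.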

development/qnet-integration/src/unified_p2p.rs

Lines changed: 60 additions & 29 deletions
@@ -190,6 +190,12 @@ static LAST_CANONICAL_VALIDATOR_COUNT: Lazy<Arc<AtomicU64>> =
 // Updated atomically when peer heartbeats arrive (update_peer_last_seen).
 // Replaces O(N) scan of active_full_super_nodes on every consensus tick.
 pub static BEST_PEER_HEIGHT: AtomicU64 = AtomicU64::new(0);
+/// Network-tip oracle: highest height from an authenticated (Dilithium-signed) HealthPing head,
+/// direct or relayed. NEVER fed by served-block heights, so a follower's own sync progress cannot
+/// poison it; the genesis (always present) keep it at the true tip. Floors get_best_peer_height.
+pub static SIGNED_HEAD_MAX: AtomicU64 = AtomicU64::new(0);
+/// Per-origin last accepted signed-head ts: monotonic anti-replay + relay dedup (gossip each origin/ts once).
+static LAST_HEAD_TS: Lazy<DashMap<String, u64>> = Lazy::new(DashMap::new);
 
 // Pending-gap queue (multiple disjoint gaps, lock-free). The pre-v24
 // single Mutex<(u64,u64)> slot tracked only ONE gap, so a second gap
@@ -314,9 +320,9 @@ const SOFT_LIMIT_PENDING_SYNC_BLOCKS: usize = 1600;
 
 // v30.A3: freshness window for peer height attestation. A peer's
 // last_block_height is consulted for `network_height` ONLY if its
-// last_height_attested_at falls within this window. Attestation is set by
-// authenticated signal paths (applied block, certified shred, signed
-// HealthPing) — NEVER by empty-batch echo or gossip-relayed claims.
+// last_height_attested_at falls within this window. Attestation is set ONLY by
+// the authenticated signed-head (signed HealthPing / verified handshake) — never
+// by served-block heights (an availability fact, not a tip) or empty-batch echo.
 // 120 s ≈ 1 macroblock (90 microblocks × 1 s slot + jitter): a peer that
 // hasn't emitted a signed height in 2 minutes is treated as height-unknown,
 // preventing stale or poisoned values from steering sync indefinitely.
@@ -1935,8 +1941,8 @@ pub struct PeerInfo {
 pub last_block_height: u64,
 
 // v30.A3: wall-clock secs of the last height-attesting event for this peer
-// (signed HealthPing, applied block, certified shred). Heights that were
-// populated via empty-batch echo or unauthenticated paths leave this at 0.
+// (signed HealthPing / verified handshake ONLY — served-block heights no
+// longer attest a tip). Unauthenticated paths leave this at 0.
 // `get_max_peer_height` filters on freshness against this — stale or
 // unattested entries are excluded from `network_height` consensus, which
 // collapses the empty-batch → cache-poisoning → permanent sync-mode loop.
@@ -8310,7 +8316,8 @@ impl SimplifiedP2P {
 // accidentally showed correct height via unwrap_or(local_height) fallback.
 let producer_id = if let Some(ref cert) = assembly.certificate {
     let pid = cert.node_id.clone();
-    self.update_peer_last_seen_with_height(&pid, Some(height), false);
+    // Liveness only: a reconstructed block height is an availability fact, not the peer tip.
+    self.update_peer_last_seen(&pid);
     pid
 } else {
     "shred_protocol".to_string()
@@ -8404,7 +8411,8 @@ impl SimplifiedP2P {
 // must update peer heights for correct network_height tracking
 let producer_id = if let Some(ref cert) = assembly.certificate {
     let pid = cert.node_id.clone();
-    self.update_peer_last_seen_with_height(&pid, Some(height), false);
+    // Liveness only: a reconstructed block height is an availability fact, not the peer tip.
+    self.update_peer_last_seen(&pid);
     pid
 } else {
     "shred_protocol-rs".to_string()
@@ -8761,9 +8769,9 @@ impl SimplifiedP2P {
 /// This provides real-time network height without HTTP calls
 ///
 /// v30.A3: only entries attested within PEER_HEIGHT_ATTEST_TTL_SECS are
-/// included. Attestation is set exclusively by authenticated paths
-/// (applied block, certified shred, signed HealthPing); stale or
-/// gossip-only entries are excluded — closes the empty-batch self-poison
+/// included. Attestation is set exclusively by the authenticated signed-head
+/// (signed HealthPing / verified handshake — served-block heights do not
+/// attest a tip); stale or gossip-only entries are excluded — closes the empty-batch self-poison
 /// loop and rejects single-peer height claims as a network-wide signal.
 ///
 /// v30.A2: requires ≥ 2 distinct attested peers (besides local) before
@@ -12967,8 +12975,8 @@ impl SimplifiedP2P {
 // Legacy commit/reveal (pre-v2) — ignored: macroblock consensus is Checkpoint-BFT only.
 NetworkMessage::ConsensusCommit { .. } | NetworkMessage::ConsensusReveal { .. } => {}
 NetworkMessage::Block { height, data, block_type } => {
-    // CRITICAL FIX: Update last_seen AND height for the peer who sent the block
-    self.update_peer_last_seen_with_height(from_peer, Some(height), false);
+    // Liveness only: the relayed block height is an availability fact, not the peer tip.
+    self.update_peer_last_seen(from_peer);
 
     // Log only every 10th block
     if height % 10 == 0 {
@@ -13245,7 +13253,7 @@ impl SimplifiedP2P {
 // v9.1: Rate limit BEFORE Dilithium3 verification (~35ms CPU each).
 // HealthPings arrive every 10s per peer → max 6/min is generous.
 // Without this, an attacker floods pings to burn CPU on sig verification.
-if self.is_consensus_rate_limited(from_peer, "health_ping", 12) {
+if self.is_consensus_rate_limited(from_peer, "health_ping", 60) {
     return;
 }

@@ -13276,6 +13284,20 @@ impl SimplifiedP2P {
 // Clock drift only means the message took a detour, not that
 // the height is wrong. This is critical for syncing nodes.
 self.update_peer_last_seen_with_height(&from, Some(height), true);
+// Authenticated head = the tip oracle (never a served-block height). Relay genesis
+// heads transitively (per-origin monotonic-ts dedup) so a deep follower learns the
+// real tip from any peer — no direct-genesis dependency, no HTTP fan-in. O(5N).
+SIGNED_HEAD_MAX.fetch_max(height, std::sync::atomic::Ordering::Relaxed);
+let head_ts_new = timestamp > LAST_HEAD_TS.get(&from).map(|e| *e.value()).unwrap_or(0);
+if head_ts_new {
+    LAST_HEAD_TS.insert(from.clone(), timestamp);
+    if crate::genesis_constants::is_legacy_genesis_node(&from) {
+        self.relay_signed_head(NetworkMessage::HealthPing {
+            from: from.clone(), timestamp, height,
+            signature: signature.clone(), public_key: public_key.clone(),
+        }, &from, from_peer, 6);
+    }
+}
 if crate::node::is_debug() && height % 100 == 0 {
     println!("[DBG][P2P] health_ping from={} h={} sig=verified age={}s", from, height, age_secs);
 }
@@ -15823,12 +15845,30 @@ impl SimplifiedP2P {
 
     // Take K closest neighbors
     let k_neighbors: Vec<_> = peers.into_iter().take(k).collect();
-
+
     for peer in k_neighbors {
         self.send_network_message(&peer.addr, message.clone());
     }
 }
 
+/// Relay a verified genesis signed-head to NON-genesis neighbors only, excluding the origin and
+/// the immediate sender. The genesis mesh already exchanges heads via direct emit, so relaying back
+/// to it is pure fan-in; restricting to non-genesis k-closest pushes the tip OUTWARD to deep
+/// followers with zero fan-in onto the 5 genesis at thousands-of-joiner scale.
+fn relay_signed_head(&self, message: NetworkMessage, origin_id: &str, sender_addr: &str, k: usize) {
+    let mut peers: Vec<_> = self.connected_peers_lockfree.iter()
+        .map(|r| r.value().clone())
+        .filter(|p| p.id != origin_id
+            && p.addr != sender_addr
+            && !crate::genesis_constants::is_legacy_genesis_node(&p.id))
+        .collect();
+    if peers.is_empty() { return; }
+    peers.sort_by_key(|p| p.bucket_index);
+    for peer in peers.into_iter().take(k) {
+        self.send_network_message(&peer.addr, message.clone());
+    }
+}
+
 // ═══════════════════════════════════════════════════════════════════════════
 // v5.1: KADEMLIA DHT — FIND_NODE iterative lookup + periodic bucket refresh
 // ═══════════════════════════════════════════════════════════════════════════
@@ -18343,7 +18383,10 @@ impl SimplifiedP2P {
 /// O(1) — reads global AtomicU64 updated on every heartbeat.
 /// Used to determine if THIS node is synced enough to participate in consensus.
 pub fn get_best_peer_height(&self) -> u64 {
+    // Floor by the authenticated signed-head tip: served-block heights (BEST_PEER_HEIGHT) can only
+    // attest a peer HAD block N (an availability fact <= our own crawl), never its true head.
     BEST_PEER_HEIGHT.load(std::sync::atomic::Ordering::Relaxed)
+        .max(SIGNED_HEAD_MAX.load(std::sync::atomic::Ordering::Relaxed))
 }
 
 /// v9.5: Recalculate BEST_PEER_HEIGHT from scratch by scanning all connected peers.
@@ -18808,21 +18851,9 @@ impl SimplifiedP2P {
 ///
 /// v2.104: FIXED - On backpressure, cleanup stale entries first instead of dropping
 pub fn handle_blocks_batch(&self, blocks: Vec<(u64, Vec<u8>)>, from_height: u64, to_height: u64, sender_id: String) {
-    // v30.A1: attest sender height ONLY from real delivered blocks. The
-    // previous implementation echoed the request's `to_height` back into
-    // `last_block_height`, so empty batches (peer has nothing in range)
-    // raised the cached height to the requester's own asking ceiling — a
-    // self-amplifying loop that locks the network into permanent SYNC mode
-    // (every poll re-requests the same window, empty responses keep
-    // re-confirming the phantom ceiling). With this fix:
-    // * non-empty batch → attest max(block.height) — the only authentic
-    // proof of peer height is a block the sender actually possesses;
-    // * empty batch → no height attestation, refresh liveness only.
-    if let Some(max_block_h) = blocks.iter().map(|(h, _)| *h).max() {
-        self.update_peer_last_seen_with_height(&sender_id, Some(max_block_h), false);
-    } else {
-        self.update_peer_last_seen(&sender_id);
-    }
+    // Liveness only: a delivered block proves the sender HAD that height (an availability fact
+    // bounded by our own request window), never its tip. The network tip comes from signed heads.
+    self.update_peer_last_seen(&sender_id);
 
     // v2.104: BACKPRESSURE - Check queue size and cleanup if needed
     let queue_size = get_pending_sync_count();
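
Reviewer note: the relay path in this commit combines two small rules: accept a head only when its timestamp is strictly newer than the last one accepted from the same origin, and forward it to at most `k` non-genesis peers, excluding the origin and the immediate sender. A minimal single-threaded sketch of both, under assumptions: `Peer`, `HeadDedup`, and `relay_targets` are hypothetical names, `is_genesis` is a plain flag standing in for `is_legacy_genesis_node`, and a `HashMap` replaces the concurrent `DashMap` used in the diff.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified peer record (the real PeerInfo has more fields).
#[derive(Clone, Debug, PartialEq)]
pub struct Peer {
    pub id: String,
    pub addr: String,
    pub bucket_index: usize,
    pub is_genesis: bool,
}

/// Per-origin monotonic timestamp filter.
#[derive(Default)]
pub struct HeadDedup {
    last_ts: HashMap<String, u64>,
}

impl HeadDedup {
    /// Returns true only if `ts` is strictly newer than the last accepted
    /// timestamp for `origin` (unknown origins start at 0), and records it.
    pub fn accept(&mut self, origin: &str, ts: u64) -> bool {
        let prev = self.last_ts.entry(origin.to_string()).or_insert(0);
        if ts > *prev {
            *prev = ts;
            true
        } else {
            false
        }
    }
}

/// Pick up to `k` relay targets: non-genesis peers, excluding the origin and
/// the immediate sender, ordered by ascending bucket index.
pub fn relay_targets(peers: &[Peer], origin_id: &str, sender_addr: &str, k: usize) -> Vec<Peer> {
    let mut out: Vec<Peer> = peers
        .iter()
        .filter(|p| p.id != origin_id && p.addr != sender_addr && !p.is_genesis)
        .cloned()
        .collect();
    out.sort_by_key(|p| p.bucket_index);
    out.truncate(k);
    out
}
```

In this sketch the check and the update happen under one `&mut self` borrow, so they cannot interleave. The diff performs them as a separate `get` and `insert` on a shared map, so whether two concurrent handlers could both accept the same timestamp depends on how the handler is scheduled, which the diff does not show.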
