Commit b3224b6
feat: v16.1 identity anchoring + escalation ladder + network heartbeat + 2f+1 rollback
Forensic root causes (12-hour deadlock at h=781):
* node_001 keypair regenerated post-restart while peer registries retained
the original PK -> 720+ sig_invalid via pk_mismatch hard reject
* state machine cycled VALIDATING->ERROR{recoverable=true}->VALIDATING for
~40000 iterations with `recoverable` flag having zero consumers
* producer_silent watchdog tracked LOCAL producer task, not remote producer
* hash_chain_break at f+1=2 triggered destructive rollback (BFT violation)
-> resync re-downloaded forked branch -> cascade h=333->331, h=368->364
* empty-slot attestation gate `microblock_height > 0` blocked genesis-era
failover when initial producer was dead from boot
Phase 1: Identity Key Anchoring (Cluster A, 5 fixes)
* genesis_constants.rs: load_genesis_anchor_pks_from_file +
install_genesis_anchors_at_startup + get_genesis_anchor_pk
* consensus_crypto.rs: get_consensus_pk_anchor + genesis_anchor_pks_len
* node.rs::start: anchors install BEFORE RPC/P2P
* node.rs::create_genesis_registration_txs: embed anchored PK in TX +
recompute hash so all nodes share canonical (node_id -> PK) binding
* node.rs::initialize_wallet_identity: refuse-on-mismatch with FATAL
panic + operator restoration hint
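The refuse-on-mismatch rule can be sketched as a pure comparison between the anchored PK and the locally loaded PK. This is a minimal illustration, not the code in node.rs: `AnchorCheck`, `check_identity_anchor`, and the plain `HashMap` registry are hypothetical names chosen for the example, and the short byte vectors stand in for real Dilithium3 public keys.

```rust
use std::collections::HashMap;

/// Result of comparing a node's local public key with its genesis anchor.
/// (Hypothetical type for illustration; not taken from the commit.)
#[derive(Debug, PartialEq, Eq)]
pub enum AnchorCheck {
    /// Local key equals the anchored key; startup may continue.
    Match,
    /// No anchor is installed for this node id.
    NoAnchor,
    /// Keys differ; the caller is expected to refuse to start.
    Mismatch { expected_hex: String, actual_hex: String },
}

/// Lowercase hex encoding, used only to build an operator-readable hint.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Compare `local_pk` with the anchor registered for `node_id`.
/// The function only reports; whether to panic is left to the caller.
pub fn check_identity_anchor(
    anchors: &HashMap<String, Vec<u8>>,
    node_id: &str,
    local_pk: &[u8],
) -> AnchorCheck {
    match anchors.get(node_id) {
        None => AnchorCheck::NoAnchor,
        Some(pk) if pk.as_slice() == local_pk => AnchorCheck::Match,
        Some(pk) => AnchorCheck::Mismatch {
            expected_hex: to_hex(pk),
            actual_hex: to_hex(local_pk),
        },
    }
}
```

A caller following the commit's policy would turn `Mismatch` into a fatal panic that tells the operator to restore the backed-up keypair file, instead of silently generating a new key.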
Phase 2: State Machine Liveness (Cluster B, 4 fixes)
* node.rs::set_node_state: ERROR_CYCLE_COUNT + 4-stage escalation ladder
(force_round @3, resync @10, peer_refresh @30, halt @120 cycles)
* node.rs:19459: bft_wait_start corrected log labels + bft_wait_timeout
with real received vs threshold values
* node.rs:19488: live `responses` field update inside poll loop
* node.rs:17850: macroblock-match escape clause in vote gate -- a node
within 1 macroblock (90 microblocks) of best peer is on canonical
finalized chain by 2f+1 commit-reveal construction; allow voting
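A sketch of the 4-stage escalation ladder, using the cycle thresholds listed above (3, 10, 30, 120). The `Escalation` enum and `ErrorCycleTracker` type are hypothetical; the sketch also assumes each of the first three stages fires once, exactly when its threshold is reached, while halt stays asserted from 120 onward. The commit message does not say which of these behaviours the real code uses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Action requested by the ladder for a given error-cycle count.
/// (Hypothetical names; only the thresholds come from the commit message.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Escalation {
    NoAction,
    ForceRound,
    Resync,
    PeerRefresh,
    Halt,
}

/// Map a cycle count to a ladder stage.
/// Assumption: stages 1-3 fire only at the exact threshold; halt is sticky.
pub fn escalation_for(cycles: u64) -> Escalation {
    match cycles {
        n if n >= 120 => Escalation::Halt,
        30 => Escalation::PeerRefresh,
        10 => Escalation::Resync,
        3 => Escalation::ForceRound,
        _ => Escalation::NoAction,
    }
}

/// Counts consecutive Error-state cycles, in the spirit of ERROR_CYCLE_COUNT.
pub struct ErrorCycleTracker {
    count: AtomicU64,
}

impl ErrorCycleTracker {
    pub const fn new() -> Self {
        Self { count: AtomicU64::new(0) }
    }

    /// Record one more pass through the Error state and return the action due.
    pub fn record_error_cycle(&self) -> Escalation {
        let n = self.count.fetch_add(1, Ordering::SeqCst) + 1;
        escalation_for(n)
    }

    /// Reset after the node makes real progress (assumed reset point).
    pub fn reset(&self) {
        self.count.store(0, Ordering::SeqCst);
    }
}
```

Giving the counter a consumer is the point: the original `recoverable` flag had none, so the loop could spin indefinitely. Here every cycle yields a value the caller has to act on.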
Phase 3: Producer Failover (Cluster C, 4 fixes)
* unified_p2p.rs: NetworkMessage::ProducerHeartbeat + Dilithium3-verified
handler + REMOTE_PRODUCER_HEARTBEAT_{MS,OBSERVED_MS} +
broadcast_producer_heartbeat parallel fan-out
* node.rs:19154: 1/sec throttled broadcast when elected (next_block_height)
* node.rs:17726: smart genesis-era gate -- microblock_height > 0 OR
(microblock_height == 0 AND now > genesis_ts + 3*grace_period); prevents
premature failover on startup, restores genesis-era recovery
* node.rs:17756: heartbeat fast-path -- last_remote_producer_heartbeat_age_ms
> 3000 triggers immediate empty-slot attestation broadcast
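The two failover conditions above can be written as small predicates. This is a sketch only: the `FailoverInputs` struct is hypothetical, all timestamps are assumed to be in milliseconds, and a missing heartbeat is modelled as `None` (the commit message does not say how "never observed" is represented). How the real code combines the two predicates is likewise not stated, so they are kept separate.

```rust
/// Inputs for the empty-slot failover decision.
/// (Hypothetical struct; field units are assumed to be milliseconds.)
pub struct FailoverInputs {
    pub microblock_height: u64,
    pub now_ms: u64,
    pub genesis_ts_ms: u64,
    pub grace_period_ms: u64,
    /// Age of the last remote producer heartbeat; None if none was seen yet.
    pub heartbeat_age_ms: Option<u64>,
}

/// Genesis-era gate: open once any microblock exists, or once the chain is
/// still at height 0 but more than three grace periods have passed since
/// genesis.
pub fn genesis_era_gate_open(i: &FailoverInputs) -> bool {
    let deadline = i
        .genesis_ts_ms
        .saturating_add(i.grace_period_ms.saturating_mul(3));
    i.microblock_height > 0 || i.now_ms > deadline
}

/// Heartbeat fast-path: the remote producer has been silent for more than
/// 3000 ms. A never-observed heartbeat does not trigger it in this sketch.
pub fn heartbeat_fast_path(i: &FailoverInputs) -> bool {
    matches!(i.heartbeat_age_ms, Some(age) if age > 3000)
}
```

The strict `>` comparisons mirror the wording in the list: a node exactly at the deadline, or with a heartbeat exactly 3000 ms old, does not fail over yet.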
Phase 4: Fork Recovery Hardening (Cluster D, 6 fixes)
* block_pipeline.rs::record_hash_chain_break_witness: SPLIT thresholds:
f+1 -> fork_detection_signal (advisory log + mark peer as fork-source);
2f+1 -> minority_fork_confirmed (destructive rollback only)
* block_pipeline.rs: FORKED_PEER_COOLDOWN map + mark_peer_as_fork_source +
is_peer_in_fork_cooldown + cleanup_forked_peer_cooldown (5-min window)
* unified_p2p.rs::get_sync_peers_filtered_by_height: two-pass canonical-
aware selection -- clean peers first, tagged peers fall back only when
clean exhausted (liveness over caution)
* node.rs:17608: CHRONIC_STALL_REQUESTED wired into chronic stall path
* node.rs:16601: HALT_REQUESTED -> process::exit(1) at loop top
* unified_p2p.rs:20081: PEER_REFRESH_REQUESTED -> forced peer exchange
with doubled breadth on signal
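The fork-source cooldown and the two-pass peer selection can be sketched together. The commit describes a DashMap-backed FORKED_PEER_COOLDOWN map; this illustration swaps in a plain `std::collections::HashMap` so it stays dependency-free, passes the clock in explicitly, and uses hypothetical names (`ForkedPeerCooldown`, `select_sync_peers`, `tracked_peers`). Only the 5-minute window and the "clean peers first, tagged peers as fallback" rule come from the message.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks peers recently tagged as the source of a forked branch.
pub struct ForkedPeerCooldown {
    window: Duration,
    marked_at: HashMap<String, Instant>,
}

impl ForkedPeerCooldown {
    pub fn new(window: Duration) -> Self {
        Self { window, marked_at: HashMap::new() }
    }

    /// Tag `peer` as a fork source starting at `now`.
    pub fn mark_peer_as_fork_source(&mut self, peer: &str, now: Instant) {
        self.marked_at.insert(peer.to_string(), now);
    }

    /// True while less than `window` has elapsed since the peer was tagged.
    pub fn is_peer_in_fork_cooldown(&self, peer: &str, now: Instant) -> bool {
        self.marked_at
            .get(peer)
            .map_or(false, |&t| now.saturating_duration_since(t) < self.window)
    }

    /// Drop entries whose cooldown has expired.
    pub fn cleanup_forked_peer_cooldown(&mut self, now: Instant) {
        let window = self.window;
        self.marked_at
            .retain(|_, t| now.saturating_duration_since(*t) < window);
    }

    /// Number of peers currently tracked (tagged and not yet cleaned up).
    pub fn tracked_peers(&self) -> usize {
        self.marked_at.len()
    }
}

/// Two-pass selection: return clean peers if any exist; otherwise fall back
/// to the full candidate list so sync can still make progress.
pub fn select_sync_peers<'a>(
    candidates: &'a [String],
    cooldown: &ForkedPeerCooldown,
    now: Instant,
) -> Vec<&'a str> {
    let clean: Vec<&'a str> = candidates
        .iter()
        .map(String::as_str)
        .filter(|p| !cooldown.is_peer_in_fork_cooldown(*p, now))
        .collect();
    if clean.is_empty() {
        candidates.iter().map(String::as_str).collect()
    } else {
        clean
    }
}
```

The fallback branch is the "liveness over caution" trade-off: a node whose only reachable peers are all tagged will still sync from them instead of stalling.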
Formal invariants enforced:
* INV-1: identity-key binding immutable after anchor install
* INV-2: 2f+1 supermajority required for any destructive state mutation
* INV-3: single source cannot trigger failover (2f+1 attestation gate)
* LIV-1: stuck Error state bounded <= 120 cycles before halt
* LIV-2: dead-producer detection bounded <= 3 sec via heartbeat
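* LIV-3: cascade rollback loop broken by fork-source cooldown (resync
prefers peers not tagged as fork sources, so it does not re-download the
forked branch while a clean peer is available)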
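INV-2 rests on simple quorum arithmetic, sketched below. The sketch assumes the standard BFT bound f = floor((n - 1) / 3), which gives f = 1 for the 5-node cluster and matches the "f+1=2" figure in the root-cause list. `ForkVerdict`, `max_faulty`, and `classify_hash_chain_break` are hypothetical names; the two verdict labels echo fork_detection_signal and minority_fork_confirmed.

```rust
/// Verdict for a given number of independent hash_chain_break witnesses.
#[derive(Debug, PartialEq, Eq)]
pub enum ForkVerdict {
    /// Fewer than f+1 witnesses: could all be faulty, so no action.
    BelowThreshold,
    /// At least f+1 witnesses: at least one is honest. Advisory only.
    ForkDetectionSignal,
    /// At least 2f+1 witnesses: supermajority, destructive rollback allowed.
    MinorityForkConfirmed,
}

/// Maximum number of faulty nodes tolerated among `n` (assumes n >= 3f + 1).
pub fn max_faulty(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Classify a witness count against the f+1 and 2f+1 thresholds.
pub fn classify_hash_chain_break(n: usize, witnesses: usize) -> ForkVerdict {
    let f = max_faulty(n);
    if witnesses >= 2 * f + 1 {
        ForkVerdict::MinorityForkConfirmed
    } else if witnesses >= f + 1 {
        ForkVerdict::ForkDetectionSignal
    } else {
        ForkVerdict::BelowThreshold
    }
}
```

With n = 5, two witnesses now only raise the advisory signal, and three are needed before any rollback. Before this commit the destructive path ran at two.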
Operator procedure (mandatory before redeploy):
1. Run cluster once with new binary, collect each node's pk_hash from
[INFO][KEY] keypair_ready logs
2. Build /app/data/genesis_anchors.json with all 5 hex-encoded PKs
3. Distribute file to ALL 5 nodes (consistency mandatory)
4. Backup dilithium_keypair.bin per node (loss = anchor mismatch panic)
5. Restart cluster -- anchors install before P2P, embed in genesis TX,
immutable binding for network lifetime
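Before distributing the anchors file in step 3, a quick sanity check on the collected PKs can catch copy-paste mistakes. The layout of genesis_anchors.json is not described here, so the sketch below deliberately skips JSON parsing and validates a list of hex strings that have already been extracted. `validate_anchor_set` and `AnchorSetError` are hypothetical helpers, not part of the commit.

```rust
use std::collections::HashSet;

/// Problems detectable in a candidate anchor set without any key material.
#[derive(Debug, PartialEq, Eq)]
pub enum AnchorSetError {
    WrongCount { expected: usize, found: usize },
    InvalidHex { index: usize },
    Duplicate { index: usize },
}

/// Check that there are exactly `expected_nodes` entries, each a non-empty,
/// even-length hex string, with no two nodes sharing the same key.
pub fn validate_anchor_set(
    hex_pks: &[&str],
    expected_nodes: usize,
) -> Result<(), AnchorSetError> {
    if hex_pks.len() != expected_nodes {
        return Err(AnchorSetError::WrongCount {
            expected: expected_nodes,
            found: hex_pks.len(),
        });
    }
    let mut seen = HashSet::new();
    for (index, pk) in hex_pks.iter().enumerate() {
        let well_formed = !pk.is_empty()
            && pk.len() % 2 == 0
            && pk.bytes().all(|b| b.is_ascii_hexdigit());
        if !well_formed {
            return Err(AnchorSetError::InvalidHex { index });
        }
        // Compare case-insensitively so "AB" and "ab" count as the same key.
        if !seen.insert(pk.to_ascii_lowercase()) {
            return Err(AnchorSetError::Duplicate { index });
        }
    }
    Ok(())
}
```

A duplicate entry usually means one node's key was pasted twice. Shipping such a file would bind two node ids to the same PK on every node.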
Scalability: registries 50k-cap, committee bounded <= 1000, all hot
paths O(1) DashMap, network heartbeat 1msg/sec total. Designed for
100k+ super-node deployment.
Build: cargo build --release exit 0 (13m 13s, 0 warnings).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c5d0523 commit b3224b6
5 files changed
Lines changed: 1195 additions & 56 deletions
File tree
- core/qnet-consensus/src
- development/qnet-integration/src