fix(cluster): spec-5.15 Hardening v1.3 — cold-bootstrap proof requires fresh-alive, not just a valid slot
v1.2 anchored the co-boot quorum on a valid voting-disk slot (generation > 0 at
epoch INITIAL), but a slot with generation > 0 does not by itself prove liveness:
a CRASHED peer leaves a stale slot behind at epoch INITIAL. decide_quorum_view's P2.1
heartbeat-freshness gate already excludes such stale slots from the alive_bitmap,
but the v1.2 bootstrap proof read the raw observed slot (no freshness) — so a node
could reach quorum with self + a stale peer slot and fail-open (latch BOOTSTRAP
without a live co-boot quorum).
Fix: publish the per-node FRESH-ALIVE signal (decide_quorum_view's alive_bitmap)
into the reconfig region, and require it in the cold-bootstrap proof: fresh-alive
AND generation > 0 AND epoch INITIAL. Anchored on the durable voting-disk
heartbeat (not live CSSD), so it rejects stale slots WITHOUT reintroducing the
v1.2 IC-churn race (the disk heartbeat keeps flowing while CSSD/tier1 churns).
Quorum threshold and rejoiner predicate unchanged.
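
For illustration only, the v1.3 proof predicate can be sketched as below. This is a minimal sketch, not the code in this commit: SketchSlot, SKETCH_MAX_NODES, SKETCH_EPOCH_INITIAL and both function names are hypothetical stand-ins, and the numeric value chosen for the INITIAL epoch is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real constants and slot layout are not shown
 * in this commit message. */
#define SKETCH_MAX_NODES     8
#define SKETCH_EPOCH_INITIAL 0u   /* assumed encoding of "epoch INITIAL" */

typedef struct SketchSlot
{
    uint64_t generation;   /* > 0 means the slot has been written */
    uint32_t epoch;        /* membership epoch recorded in the slot */
} SketchSlot;

/* A node counts toward the cold-bootstrap proof only if all three hold:
 * fresh-alive AND generation > 0 AND epoch INITIAL. */
static bool
sketch_counts_for_cold_bootstrap(const SketchSlot *slot, bool fresh_alive)
{
    return fresh_alive
        && slot->generation > 0
        && slot->epoch == SKETCH_EPOCH_INITIAL;
}

/* Count the nodes that satisfy the predicate. The caller compares the
 * result against the (unchanged) quorum threshold. */
static int
sketch_count_coboot(const SketchSlot slots[], const bool fresh[], int n)
{
    int count = 0;

    assert(n >= 0 && n <= SKETCH_MAX_NODES);
    for (int i = 0; i < n; i++)
    {
        if (sketch_counts_for_cold_bootstrap(&slots[i], fresh[i]))
            count++;
    }
    return count;
}
```

Under this sketch, a crashed peer's slot (generation > 0, epoch INITIAL, fresh_alive false) contributes nothing, which is the case v1.2 counted by mistake.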
New ReconfigShmem field observed_fresh_alive[CLUSTER_MAX_NODES] (atomic, default
0 = fail-closed); qvotec publishes it each poll from decide_quorum_view.
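
A rough sketch of the publish/read pattern for the new field follows. It assumes the alive set arrives as a bitmap with one bit per node and uses C11 <stdatomic.h> as a stand-in for whatever atomic primitive the tree actually uses; SketchReconfigShmem, SKETCH_MAX_NODES and the sketch_* functions are hypothetical names, not the real API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 8   /* hypothetical; stands in for CLUSTER_MAX_NODES */

typedef struct SketchReconfigShmem
{
    /* Zero-initialized storage reads as 0 = not fresh-alive (fail-closed). */
    atomic_uint_fast32_t observed_fresh_alive[SKETCH_MAX_NODES];
} SketchReconfigShmem;

/* Publisher side (one call per poll): copy the alive bitmap into the
 * per-node flags, one bit per node. */
static void
sketch_publish_fresh_alive(SketchReconfigShmem *shm, uint64_t alive_bitmap)
{
    for (int node = 0; node < SKETCH_MAX_NODES; node++)
        atomic_store(&shm->observed_fresh_alive[node],
                     (alive_bitmap >> node) & 1u);
}

/* Reader side: a node that was never published, or was last published
 * as stale, reads back as not fresh-alive. */
static int
sketch_is_fresh_alive(SketchReconfigShmem *shm, int node)
{
    assert(node >= 0 && node < SKETCH_MAX_NODES);
    return atomic_load(&shm->observed_fresh_alive[node]) != 0;
}
```

Because every poll rewrites every flag, a peer whose heartbeat stops is cleared on the next publish rather than lingering as alive.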
Unit: test_cluster_reconfig U21 (stale slot must not count, TDD red->green) +
U19/U20 updated to fresh-alive semantics; UT_PLAN 43->45 (also fixes a pre-existing
plan/count mismatch). No on-disk/wire/catalog/GUC change; no catversion bump.
Spec: spec-5.15-online-declared-node-join-membership.md (Hardening v1.3)