Commit 04e8462

Author: SqlRush
test(cluster): spec-5.19 D2 — 4-node reconfig fault matrix (Component A)
spec-5.16 (join GRD/PCM remaster) landed on main, completing the reconfig band, so the gated Component A now follows: the first 4-node combination test of the Stage 5 reconfiguration band (5.13 clean-leave / 5.14 fail-stop / 5.15 join / 5.16 join-remaster / 5.18 leave-remove). It verifies the combination at 4 nodes (does not re-prove each sub-spec's own invariants).

Cells (fault x phase), HG#1/#2/#3: membership+epoch monotone, no split-master, no false-visible, fenced/removed write fail-closed:
- C1 clean_leave x idle (REQUIRED): leaver commits, coordinator observes clean_leave, epoch monotone.
- C2 fail_stop x idle (REQUIRED): survivors detect DEAD, epoch monotone.
- C3 fail_stop x under_write_load (REQUIRED): single coordinator (no split-master), survivor write consistent (no false-visible).
- C4 leave_remove x idle (REQUIRED) + HG#3: clean-leave drain -> permanent removal accepted (cluster.online_node_removal=on), epoch monotone; removed node fail-closes writes ("this node is write-fenced"). The 4-node multi-survivor membership-shrink ACK (5.18 Hardening v1.1 survivor-local-apply) is substrate-noted, authoritatively proven at 2-node (t/325).
- C5/C6 join + join_remaster x idle (PASS-or-SKIP): online peer-restart rejoin substrate-limited at 4-node -> honest SKIP; epoch monotone through the absent transition.

Shmem-resilient: on a host whose macOS SHMMNI is saturated by concurrent test sessions, a cell whose 4-node cluster cannot allocate its SysV interlock segments environment-SKIPs (L239) instead of failing; never a faked pass (rule 8.A/8.B). CI runners have ample shmem and run every cell.

nightly: stage5-integrated-acceptance shard extended 327-330 -> 327-331.

Spec: spec-5.19-stage5-integrated-acceptance.md
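The cell matrix and its PASS/SKIP/FAIL rules can be illustrated with a minimal Python sketch. This is not the test added by the commit (that file is not shown here); `Cell`, `Outcome`, `epoch_monotone`, and `judge` are hypothetical names, and "monotone" is assumed to mean non-decreasing, which is how "epoch monotone through the absent transition" reads.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    SKIP = "skip"
    FAIL = "fail"


@dataclass(frozen=True)
class Cell:
    name: str
    fault: str
    phase: str
    required: bool  # REQUIRED cells vs. PASS-or-SKIP cells


# Cells as listed in the commit message (fault x phase).
CELLS = [
    Cell("C1", "clean_leave", "idle", True),
    Cell("C2", "fail_stop", "idle", True),
    Cell("C3", "fail_stop", "under_write_load", True),
    Cell("C4", "leave_remove", "idle", True),
    Cell("C5", "join", "idle", False),
    Cell("C6", "join_remaster", "idle", False),
]


def epoch_monotone(epochs):
    """Return True if the observed epoch sequence never decreases (assumed meaning)."""
    return all(a <= b for a, b in zip(epochs, epochs[1:]))


def judge(cell, ran, invariants_ok, env_limited=False):
    """Classify one cell; a cell that did not run is never reported as PASS."""
    if env_limited:
        # Hypothetical flag: the host could not allocate shared memory segments.
        return Outcome.SKIP
    if not ran:
        # Only PASS-or-SKIP cells may skip for substrate reasons.
        return Outcome.FAIL if cell.required else Outcome.SKIP
    return Outcome.PASS if invariants_ok else Outcome.FAIL
```

Under these assumptions, `judge(CELLS[4], ran=False, invariants_ok=False)` yields a SKIP for the substrate-limited join cell, while a REQUIRED cell that neither ran nor was environment-limited is a FAIL.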
1 parent c9ac561 commit 04e8462

2 files changed

Lines changed: 415 additions & 2 deletions

.github/workflows/nightly.yml

Lines changed: 3 additions & 2 deletions
@@ -154,8 +154,9 @@ jobs:
 # + pgbench + crash-recovery) gets its own shard so the multi-node
 # legs run in parallel and do not extend the ges-locking shard
 # wallclock. t/327 HW workload / t/328 MG-B write perf / t/329 MG-D
-# heap-ITL WAL measure / t/330 production-bench-subset.
-- { name: stage5-integrated-acceptance, ranges: "327-330", unit: false, regress: false }
+# heap-ITL WAL measure / t/330 production-bench-subset / t/331 4-node
+# reconfig fault matrix.
+- { name: stage5-integrated-acceptance, ranges: "327-331", unit: false, regress: false }
 steps:
 - name: Checkout
 uses: actions/checkout@v4
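How the workflow consumes the `ranges` string is not shown in this commit. As a rough sketch, assuming the value is an inclusive range of test numbers (which matches the comment naming t/327 through t/331), the effect of the change can be checked in Python. `expand_ranges` is a hypothetical helper, not part of the repository.

```python
def expand_ranges(spec):
    """Expand a string such as "327-331" or "10,12-14" into a sorted list of ints.

    Assumption: ranges are inclusive and comma-separated.
    """
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, sep, high = part.partition("-")
        start = int(low)
        end = int(high) if sep else start
        if end < start:
            raise ValueError(f"descending range: {part!r}")
        numbers.update(range(start, end + 1))
    return sorted(numbers)


# Tests added to the shard by the change, under the inclusive-range assumption.
added = set(expand_ranges("327-331")) - set(expand_ranges("327-330"))
```

With that reading, `added` contains only 331, the new 4-node reconfig fault matrix test.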
