
Commit fa889e7

SqlRush committed
feat(cluster): spec-5.57 cross-instance CR read-path coordinator boundary
Make the "cross-node table content read is unavailable" wall an explicit, observable, frozen read-path coordinator boundary (CR-9) instead of six scattered fail-closed + forward-link sites. This commit is contract + hardening + observability only: the cross-instance data plane stays fail-closed and is forwarded to Stage 6 (#119). The value gate proves that the data plane is unreachable in Stage 5.5 (a legal NO-GO).

- D2 (Q11-A): a CR-side remote-UBA pre-check classifies the segment owner BEFORE the undo read. A runtime-warm cross-instance origin (class③) fails closed with the canonical 53R9G via a shared refusal routine, instead of being conflated with own-instance retention-recycled undo (53R9F). NON-DEGRADABLE: the ereport fires under any GUC value. The return-0 contract of cluster_undo_get_record (recovery / SRF callers) is preserved, not blanket-converted to ERROR.
- D3: independent observability region "pgrac cluster cr coordinator", 4 counters, and pg_cluster_state category 'cr_coord'; GUCs cluster.cross_instance_cr_coordinator (off/boundary/forward) and cluster.cross_instance_cr_probe. The GUC gates only the observability surface, never the fail-closed boundary (rule 8.A).
- D1: unify the W1/W2/W3 (+ stale GCS) forward-links onto Spec: spec-5.57.
- D4/D5/D6/D7/D8: freeze the version-selection + requester-side fabrication contract, the cross-instance key/epoch identity contract (origin is a routing dimension, not a CR key field), the C4 class③ invalidation contract, and the wire payload byte layout (spec text only, no struct/enum; AD-013).
- D0: scripts/perf/run-cross-instance-cr-probe.sh measure-leg (static value gate + soundness-gate checklist + self-test).
- D9: cluster_unit test_cluster_cr_coordinator (classifier / counters / namespace / GUC defaults) + cluster_tap t/318 (real 2-node fail-closed boundary e2e: 53R9G + counters, probe, forward LOG-once, off-mode non-degradability, SRF return-0 audit, measure-leg).

catversion not bumped (no catalog / SRF / wire ABI change). Region +1 / category +1 baselines regrounded.
Spec: spec-5.57-cross-instance-cr-current-coordinator.md
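The D2/D3 ordering (classify the segment owner first, refuse class③ before any undo read, and keep the GUC out of that decision) can be illustrated with a small self-contained model. This is a hedged sketch, not the pgrac code: `OriginClass`, `CoordMode`, `ReadOutcome`, `precheck()` and `only_node3_materialized()` are hypothetical names, and the real refusal is an `ereport(ERROR)` with SQLSTATE 53R9G rather than a returned enum value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins; these are NOT the pgrac type or function names. */
typedef enum { ORIGIN_OWN, ORIGIN_MATERIALIZED_REMOTE, ORIGIN_RUNTIME_REMOTE } OriginClass;
typedef enum { MODE_OFF, MODE_BOUNDARY, MODE_FORWARD } CoordMode;
typedef enum { READ_OWN_UNDO, READ_LOCAL_MERGED_TREE, REFUSE_53R9G } ReadOutcome;

/* Classify the undo segment owner relative to the local node. */
static OriginClass
classify_origin(int origin, int local, bool (*is_materialized)(int))
{
    assert(is_materialized != NULL);
    if (origin == local)
        return ORIGIN_OWN;                  /* class ①: own-instance undo */
    if (is_materialized(origin))
        return ORIGIN_MATERIALIZED_REMOTE;  /* class ②: merged-recovery tree */
    return ORIGIN_RUNTIME_REMOTE;           /* class ③: no wire fetch exists */
}

/*
 * Decide the outcome BEFORE any undo read. The coordinator mode is passed in
 * only to make the point explicit: it never changes the decision, because the
 * GUC gates counters, not the fail-closed boundary.
 */
static ReadOutcome
precheck(int origin, int local, bool (*is_materialized)(int), CoordMode mode)
{
    (void) mode;
    switch (classify_origin(origin, local, is_materialized)) {
    case ORIGIN_OWN:
        return READ_OWN_UNDO;
    case ORIGIN_MATERIALIZED_REMOTE:
        return READ_LOCAL_MERGED_TREE;
    case ORIGIN_RUNTIME_REMOTE:
        break;
    }
    return REFUSE_53R9G;    /* class ③, and anything unexpected, fails closed */
}

/* Example predicate: pretend only node 3 was materialized by merged recovery. */
static bool
only_node3_materialized(int origin)
{
    return origin == 3;
}
```

Returning an outcome instead of raising an error keeps the sketch testable. In the commit itself the class③ branch calls a noreturn routine, so nothing after the pre-check runs for a runtime-warm remote origin.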
1 parent 95de913 commit fa889e7

29 files changed

Lines changed: 1299 additions & 67 deletions
scripts/perf/run-cross-instance-cr-probe.sh

Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
+#!/bin/bash
+#-------------------------------------------------------------------------
+#
+# run-cross-instance-cr-probe.sh -- spec-5.57 D0 measure-leg (value gate +
+# soundness gate) for the cross-instance CR/current read-path coordinator
+# boundary.
+#
+# This is a MEASURE-ONLY tool (zero product code). It answers the spec-5.57
+# value-gate question: "is the class③ runtime-warm-remote CR data plane reachable
+# / worth opening in Stage 5.5?" The answer is determined by STATIC evidence that
+# survives in CI (no harness needed): the codebase has NO over-the-wire remote
+# undo fetch path and the CR walker fail-closes class③ with 53R9G. Therefore the
+# value-gate verdict is NO-GO-in-5.5 (data plane -> Stage 6 #119) -- a LEGAL NO-GO
+# (an evidence-based verdict, not a scope cut), same paradigm as spec-5.50/5.55/5.56.
+#
+# It ALSO prints the L257 soundness-gate checklist (§3.4) with a per-item ruling:
+# 5.57 verifies failure-direction (GREEN, the fail-closed boundary) here; the
+# other four gates are Stage 6 data-plane ship gates (listed, owned by Stage 6).
+#
+# An OPTIONAL --dynamic leg (off by default) also prints how to run the 2-node
+# TAP probe (needs the local IPC::Run harness) that confirms cross-node reads fail closed.
+#
+# Author: SqlRush <sqlrush@gmail.com>
+# Portions Copyright (c) 2026, pgrac contributors
+#
+# Spec: spec-5.57-cross-instance-cr-current-coordinator.md (FROZEN v1.0)
+#
+# IDENTIFICATION
+#     scripts/perf/run-cross-instance-cr-probe.sh
+#
+# NOTES
+#     pgrac-original. --self-test parses the --static output (L223: validate
+#     CONTENT, not mere file existence) and is the CI-safe entry point.
+#
+#-------------------------------------------------------------------------
+set -euo pipefail
+
+PROGNAME="run-cross-instance-cr-probe.sh"
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SRCROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+
+CR_C="$SRCROOT/src/backend/cluster/cluster_cr.c"
+UNDO_C="$SRCROOT/src/backend/cluster/cluster_undo_record.c"
+WAIT_H="$SRCROOT/src/include/utils/wait_event.h"
+
+usage() {
+    cat <<EOF
+$PROGNAME -- spec-5.57 D0 cross-instance CR measure-leg.
+
+Usage:
+$PROGNAME --static Emit the static value-gate + soundness-gate evidence.
+$PROGNAME --self-test Run --static, parse the output, assert the verdict (CI-safe).
+$PROGNAME --dynamic (optional) Run --static, then print how to run the 2-node fail-closed TAP probe.
+$PROGNAME --help
+EOF
+}
+
+# ---- static value gate -------------------------------------------------
+# Each GATE line is machine-parseable: "GATE <name> <PASS|FAIL> <detail>".
+emit_static() {
+    echo "# spec-5.57 D0 measure-leg: cross-instance CR value gate (STATIC evidence)"
+    echo "# srcroot=$SRCROOT"
+
+    # E1: no over-the-wire remote undo fetch -- cluster_undo_get_record returns 0
+    # for a non-own, non-materialized (runtime-warm) origin.
+    if grep -q "runtime cross-instance undo read not supported" "$UNDO_C" 2>/dev/null; then
+        echo "GATE no_remote_undo_fetch PASS cluster_undo_get_record returns 0 for runtime cross-instance (no wire fetch)"
+    else
+        echo "GATE no_remote_undo_fetch FAIL marker not found in cluster_undo_record.c"
+    fi
+
+    # E2: the CR walker fail-closes class③ with the canonical 53R9G (not a silent
+    # fallback, not 53R9F snapshot-too-old) at the remote-undo-read leg.
+    if grep -q "encountered at the remote-undo-read" "$CR_C" 2>/dev/null \
+            && grep -q "CR_COORD_ORIGIN_RUNTIME_REMOTE" "$CR_C" 2>/dev/null; then
+        echo "GATE class3_fail_closed PASS CR walker fail-closes class③ with ERRCODE_CLUSTER_CR_CROSS_INSTANCE_UNSUPPORTED (53R9G)"
+    else
+        echo "GATE class3_fail_closed FAIL pre-check marker not found in cluster_cr.c"
+    fi
+
+    # E3: GCS ships only current blocks; the CR-ship wait events are reserved /
+    # dormant (declared but no producer) -- no CR-image wire path.
+    if grep -q "WAIT_EVENT_BUFFER_SHIP_CR_BUILD" "$WAIT_H" 2>/dev/null \
+            && ! grep -rq "WAIT_EVENT_BUFFER_SHIP_CR_BUILD" "$SRCROOT/src/backend/cluster/cluster_gcs_block.c" 2>/dev/null; then
+        echo "GATE no_cr_image_wire PASS WAIT_EVENT_BUFFER_SHIP_CR_BUILD/SEND reserved but dormant (no GCS producer)"
+    else
+        echo "GATE no_cr_image_wire FAIL CR-ship wait event has a producer or is missing"
+    fi
+
+    # Value-gate verdict (printed unconditionally; --self-test checks that all three GATE lines PASS).
+    echo "VERDICT value_gate NO_GO_IN_5_5 data plane forwarded to Stage 6 (#119 undo-block Cache Fusion); spec-5.57 ships boundary + contract only (legal NO-GO)"
+
+    echo "# spec-5.57 §3.4 soundness-gate checklist (L257): 5.57 verifies failure-direction; the rest are Stage 6 ship gates."
+    echo "SOUNDNESS failure_direction VERIFIED_5_57 any uncertainty -> ERROR 53R9G, never visible (CR walker pre-check; TAP fail-closed e2e)"
+    echo "SOUNDNESS base_availability STAGE6_OWNS current block image + WAL-before-ship (feature-019)"
+    echo "SOUNDNESS metadata_availability STAGE6_OWNS remote undo image is real evidence (not ITL stamp hint); INDOUBT -> fail-closed"
+    echo "SOUNDNESS replay_determinism STAGE6_OWNS v2 column-diff cross-node reconstruct determinism (spec-4.9ab reader)"
+    echo "SOUNDNESS invalidation_completeness STAGE6_OWNS remaster/DROP/TRUNCATE/relfilenode-reuse/retention + fence_epoch invalidation"
+}
+
+# ---- self-test: parse --static, assert the verdict ---------------------
+self_test() {
+    local out rc=0
+    out="$(emit_static)"
+
+    echo "$out"
+    echo "# ---- self-test assertions ----"
+
+    # every GATE line must be PASS
+    if echo "$out" | grep '^GATE ' | grep -qv ' PASS '; then
+        echo "not ok - some GATE line is not PASS"
+        rc=1
+    else
+        echo "ok - all GATE lines PASS"
+    fi
+
+    # the value-gate verdict must be the legal NO-GO
+    if echo "$out" | grep -q '^VERDICT value_gate NO_GO_IN_5_5'; then
+        echo "ok - value gate verdict is NO_GO_IN_5_5 (data plane -> Stage 6)"
+    else
+        echo "not ok - value gate verdict missing/unexpected"
+        rc=1
+    fi
+
+    # failure-direction must be verified by 5.57 itself
+    if echo "$out" | grep -q '^SOUNDNESS failure_direction VERIFIED_5_57'; then
+        echo "ok - failure-direction soundness gate verified in 5.57"
+    else
+        echo "not ok - failure-direction soundness gate not verified"
+        rc=1
+    fi
+
+    # exactly 5 soundness gates listed (§3.4)
+    local n_sound
+    n_sound="$(echo "$out" | grep -c '^SOUNDNESS ' || true)"
+    if [ "$n_sound" -eq 5 ]; then
+        echo "ok - 5 soundness gates listed (§3.4)"
+    else
+        echo "not ok - expected 5 soundness gates, found $n_sound"
+        rc=1
+    fi
+
+    if [ "$rc" -eq 0 ]; then
+        echo "# self-test PASSED"
+    else
+        echo "# self-test FAILED"
+    fi
+    return "$rc"
+}
+
+# ---- optional dynamic 2-node fail-closed probe -------------------------
+dynamic_probe() {
+    echo "# spec-5.57 D0 dynamic leg: a 2-node fail-closed probe lives in the TAP"
+    echo "# harness (src/test/cluster_tap/t/318_*.pl). The data plane is"
+    echo "# unreachable, so the only observable runtime outcome is fail-closed"
+    echo "# 53R9G + cross_instance_cr_refused/remote_undo_read_refused counters."
+    echo "# Run it via: cd src/test/cluster_tap && PERL5LIB=\$HOME/perl5/lib/perl5 \\"
+    echo "#   prove -v t/318_cluster_5_57_cross_instance_boundary.pl"
+}
+
+case "${1:---help}" in
+    --static) emit_static ;;
+    --self-test) self_test ;;
+    --dynamic) emit_static; echo; dynamic_probe ;;
+    --help | -h) usage ;;
+    *) usage; exit 2 ;;
+esac
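The GATE / VERDICT / SOUNDNESS lines emitted by `--static` form a small line protocol. As an illustration only (the script itself checks it with `grep` in `self_test()`), here is a C sketch of the same acceptance rules. `ProbeSummary` and the `probe_summary_*` functions are hypothetical names, and unlike the shell version this sketch also rejects a transcript that contains no GATE line at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one --static transcript, mirroring self_test(). */
typedef struct {
    int  n_gate;          /* lines starting with "GATE " */
    int  n_gate_not_pass; /* GATE lines whose status field is not PASS */
    int  n_soundness;     /* lines starting with "SOUNDNESS " */
    bool verdict_no_go;   /* saw "VERDICT value_gate NO_GO_IN_5_5" */
    bool failure_dir_ok;  /* saw "SOUNDNESS failure_direction VERIFIED_5_57" */
} ProbeSummary;

static bool
starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Feed one output line; GATE lines look like "GATE <name> <PASS|FAIL> <detail>". */
static void
probe_summary_add_line(ProbeSummary *sum, const char *line)
{
    assert(sum != NULL && line != NULL);
    if (starts_with(line, "GATE ")) {
        const char *status = strchr(line + 5, ' ');  /* space after <name> */

        sum->n_gate++;
        /* The status must be exactly "PASS", followed by a space or end of line. */
        if (status == NULL || strncmp(status + 1, "PASS", 4) != 0
            || (status[5] != ' ' && status[5] != '\0'))
            sum->n_gate_not_pass++;
    } else if (starts_with(line, "SOUNDNESS ")) {
        sum->n_soundness++;
        if (starts_with(line, "SOUNDNESS failure_direction VERIFIED_5_57"))
            sum->failure_dir_ok = true;
    } else if (starts_with(line, "VERDICT value_gate NO_GO_IN_5_5")) {
        sum->verdict_no_go = true;
    }
    /* "#" comment lines and anything else are ignored. */
}

/* Same acceptance rule as self_test(), plus "at least one GATE line". */
static bool
probe_summary_ok(const ProbeSummary *sum)
{
    return sum->n_gate > 0 && sum->n_gate_not_pass == 0
        && sum->verdict_no_go && sum->failure_dir_ok && sum->n_soundness == 5;
}
```

Matching the status as a whole field avoids relying on the substring " PASS " appearing anywhere in the line, which is what the `grep -qv ' PASS '` check in the script does.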

src/backend/cluster/Makefile

Lines changed: 1 addition & 0 deletions
@@ -162,6 +162,7 @@ OBJS = \
     cluster_cr_admit_stat.o \
     cluster_cr_tuple.o \
     cluster_cr_tuple_stat.o \
+    cluster_cr_coordinator_stat.o \
     cluster_cr_srf.o \
     cluster_resolver_cache.o \
     cluster_tt_slot.o \

src/backend/cluster/cluster_cr.c

Lines changed: 118 additions & 25 deletions
@@ -49,7 +49,8 @@
 #include "cluster/cluster_cr_admit.h" /* spec-5.52 D2: insert-side admission gate */
 #include "cluster/cluster_cr_apply.h"
 #include "cluster/cluster_cr_cache.h"
-#include "cluster/cluster_cr_pool.h" /* spec-5.51 D4: shared L2 CR pool */
+#include "cluster/cluster_cr_coordinator_stat.h" /* spec-5.57 D2/D3: coordinator boundary */
+#include "cluster/cluster_cr_pool.h" /* spec-5.51 D4: shared L2 CR pool */
 #include "cluster/cluster_cr_tuple.h" /* spec-5.54: tuple-level / verdict-only fast path */
 #include "cluster/cluster_conf.h" /* spec-3.24 D1: cluster_conf_has_peers */
 #include "cluster/cluster_guc.h" /* cluster_cr_chain_walk_max_steps, cluster_node_id */
@@ -237,6 +238,55 @@ cr_scratch_ensure(void)
 }
 
 
+/*
+ * cr_coordinator_refuse_runtime_remote -- spec-5.57 D2: the single fail-closed
+ * refusal routine for a class③ (runtime-warm remote) CR origin. Shared by the
+ * CR-side pre-check (the real boundary, on a real remote UBA) AND the W2 test
+ * injection (a synthetic class③), so the injection exercises the exact runtime
+ * behavior the production pre-check uses.
+ *
+ * One class③ refusal is BOTH the boundary headline (cross_instance_cr_refused)
+ * and specifically the remote-undo-read leg (remote_undo_read_refused); both
+ * bump by construction. The rare W1 header-mismatch belt bumps only
+ * cross_instance_cr_refused, giving the invariant
+ * cross_instance_cr_refused >= remote_undo_read_refused.
+ *
+ * NON-DEGRADABLE (rule 8.A, §2.2): the ereport fires under ANY GUC value; the
+ * GUC only gates the advisory counters / probe / LOG-once. This function never
+ * returns (it always ereport(ERROR)s). The data plane is Stage 6 (#119).
+ */
+static void
+pg_attribute_noreturn() cr_coordinator_refuse_runtime_remote(int origin_node)
+{
+    if (cluster_cross_instance_cr_coordinator != CR_COORD_MODE_OFF) {
+        cluster_cr_coordinator_stat_bump(CR_COORD_CROSS_INSTANCE_CR_REFUSED);
+        cluster_cr_coordinator_stat_bump(CR_COORD_REMOTE_UNDO_READ_REFUSED);
+    }
+    /* D0 measure-leg: count the class③ hit (behavior unchanged -- still fails). */
+    if (cluster_cross_instance_cr_probe)
+        cluster_cr_coordinator_stat_bump(CR_COORD_CROSS_INSTANCE_BOUNDARY_PROBE);
+    /* forward mode: LOG-once that the data plane is Stage 6 (L213: once per
+     * backend so the hot path is never flooded). */
+    if (cluster_cross_instance_cr_coordinator == CR_COORD_MODE_FORWARD) {
+        static bool forward_logged = false;
+
+        if (!forward_logged) {
+            forward_logged = true;
+            elog(LOG, "cluster.cross_instance_cr_coordinator=forward is a contract "
+                 "placeholder: cross-instance CR/undo data plane lands in Stage 6 "
+                 "(#119); reads stay fail-closed (Spec: spec-5.57)");
+        }
+    }
+    ereport(ERROR, (errcode(ERRCODE_CLUSTER_CR_CROSS_INSTANCE_UNSUPPORTED),
+                    errmsg("cluster CR cross-instance UBA encountered at the remote-undo-read "
+                           "leg (origin_node_id=%d, local=%d)",
+                           origin_node, cluster_node_id),
+                    errhint("Own-instance CR only unless the origin was materialized by merged "
+                            "recovery; the runtime cross-instance CR/undo data plane lands in "
+                            "Stage 6 (#119 undo-block Cache Fusion); see Spec: spec-5.57.")));
+}
+
+
 /* ============================================================
  * Test injection hooks (spec-3.9 Step 7; SKIP-style precondition)
  * ============================================================ */
@@ -260,11 +310,11 @@ cr_check_error_injections(void)
                  "cluster_inject_fault('cr_snapshot_too_old','none',0).")));
 
     if (cluster_cr_injection_armed("cr_cross_instance", &param))
-        ereport(ERROR,
-                (errcode(ERRCODE_CLUSTER_CR_CROSS_INSTANCE_UNSUPPORTED),
-                 errmsg("cluster CR cross-instance UBA (injected; origin_node_id=%u, local=%d)",
-                        (uint32)param, cluster_node_id),
-                 errhint("test injection cr_cross_instance; spec-3.9 is own-instance only.")));
+        /* spec-5.57 D2/D3: synthetic class③ refusal -- drive the SAME fail-closed
+         * routine the production pre-check uses (53R9G + both coordinator counters
+         * + probe/forward), so the TAP injection legs exercise the real boundary
+         * behavior deterministically (param = synthetic origin_node_id). */
+        cr_coordinator_refuse_runtime_remote((int)param);
 
     if (cluster_cr_injection_armed("cr_corruption", &param)) {
         const char *kind = (param == 1) ? "uba_decode"
@@ -341,6 +391,39 @@ cr_walk_chain(char *scratch_page, UBA start_uba, SCN read_scn,
             ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED),
                             errmsg("cluster CR encountered a malformed UBA in the undo chain")));
 
+        /*
+         * spec-5.57 D2 (Q11-A): CR-side remote-UBA pre-check -- the read-path
+         * coordinator boundary, applied BEFORE the undo read. Derive the
+         * segment owner from the UBA and classify it (§0.1). A runtime-warm
+         * cross-instance origin (class③: not own, not merged-materialized) is
+         * the net-new boundary: fail closed HERE with the canonical 53R9G,
+         * rather than letting cluster_undo_get_record() return 0 and conflate it
+         * with own-instance retention-recycled undo (53R9F below). This is the
+         * W3 hardening: the remote-undo-read leg now fails closed with the SAME
+         * errcode as the W1 walker wall (errcode consolidation, CR-9). It is
+         * NON-DEGRADABLE -- the ereport fires under any GUC value; the GUC only
+         * gates the advisory counters (rule 8.A, §2.2). Own-instance is the
+         * common OLTP path: classify returns OWN and we fall straight through.
+         * The data plane (real remote undo fetch) is Stage 6 (#119).
+         * See Spec: spec-5.57 §3.1 (W3) / §2.1 (three roles).
+         */
+        {
+            NodeId cr_origin = uba_origin_node_id(uba);
+            ClusterCrCoordOriginClass cr_origin_class
+                = cluster_cr_coordinator_classify_origin(cr_origin);
+
+            if (cr_origin_class == CR_COORD_ORIGIN_RUNTIME_REMOTE) {
+                /* class③: fail closed via the shared refusal routine (53R9G +
+                 * both coordinator counters; non-degradable, §2.2). */
+                cr_coordinator_refuse_runtime_remote((int)cr_origin);
+            } else if (cr_origin_class == CR_COORD_ORIGIN_MATERIALIZED_REMOTE) {
+                /* class②: merged-materialized remote, served from the local tree
+                 * (already shipped, spec-4.5a D8). Count the serve (advisory). */
+                if (cluster_cross_instance_cr_coordinator != CR_COORD_MODE_OFF)
+                    cluster_cr_coordinator_stat_bump(CR_COORD_MATERIALIZED_REMOTE_SERVED);
+            }
+        }
+
         len = cluster_undo_get_record(uba, record_buf.data, sizeof(record_buf.data));
         if (len == 0)
             ereport(ERROR,
@@ -361,30 +444,40 @@ cr_walk_chain(char *scratch_page, UBA start_uba, SCN read_scn,
 
         /* Own-instance, or a merged-materialized remote instance whose undo
          * lives in the local pg_undo/instance_<origin> tree (spec-4.5a D8).
-         * Anything else stays the spec-3.9 fail-closed.
+         * Anything else stays fail-closed (Spec: spec-5.57 §3.1 W1).
+         *
+         * This is the W1 belt: the spec-5.57 D2 segment-derived pre-check above
+         * (on uba_origin_node_id) catches the common class③ case BEFORE the undo
+         * read, so this header-origin check now fires only on the rare segment-
+         * vs-header mismatch (D8 §2.5 cross-check) -- defense-in-depth, never a
+         * silent assumption.
          *
          * spec-5.56 C4 (reconfig contract, §3.3): this carve-out is ALSO the
          * fail-closed boundary that keeps the THIRD origin class — runtime warm
          * remote (not own, not merged-materialized) — OUT of the CR pool: it never
-         * constructs (ERROR below) so it never caches. The two pool-eligible
-         * classes are reconfig-INVARIANT and need NO membership/remaster
-         * invalidation: (①) own-instance pages/undo are unchanged by reconfig; (②)
-         * merged-materialized remote undo is durable in the local tree with a
-         * reconfig-invariant merge_recovered_lsn authority, and an origin rejoin's
-         * NEW writes are new versions => new key => MISS (already fenced by C1/key).
-         * read_scn is a GLOBAL SCN (AD-008), not a membership epoch (INV-C2). The
-         * runtime-warm-remote class's reconfig/remaster invalidation is forwarded to
-         * spec-5.57 (where construct stops fail-closing it); until then this ERROR
-         * is the C4 class-③ guard (INV-C3), not a silent assumption. */
+         * constructs (ERROR) so it never caches. The two pool-eligible classes are
+         * reconfig-INVARIANT and need NO membership/remaster invalidation: (①)
+         * own-instance pages/undo are unchanged by reconfig; (②) merged-materialized
+         * remote undo is durable in the local tree with a reconfig-invariant
+         * merge_recovered_lsn authority, and an origin rejoin's NEW writes are new
+         * versions => new key => MISS (already fenced by C1/key). read_scn is a
+         * GLOBAL SCN (AD-008), not a membership epoch (INV-C2). spec-5.57 freezes
+         * this class③ fail-closed as the read-path coordinator boundary (CR-9); the
+         * runtime-warm-remote data plane lands in Stage 6 (#119). */
         if (hdr->origin_node_id != (uint16)cluster_node_id
-            && !cluster_merged_instance_is_materialized((int)hdr->origin_node_id))
-            ereport(ERROR, (errcode(ERRCODE_CLUSTER_CR_CROSS_INSTANCE_UNSUPPORTED),
-                            errmsg("cluster CR cross-instance UBA encountered "
-                                   "(origin_node_id=%u, local=%d)",
-                                   hdr->origin_node_id, cluster_node_id),
-                            errhint("Own-instance CR only unless the origin was materialized by "
-                                    "merged recovery; runtime cross-instance CR is Stage 4 "
-                                    "(Cache Fusion CR coordinator).")));
+            && !cluster_merged_instance_is_materialized((int)hdr->origin_node_id)) {
+            if (cluster_cross_instance_cr_coordinator != CR_COORD_MODE_OFF)
+                cluster_cr_coordinator_stat_bump(CR_COORD_CROSS_INSTANCE_CR_REFUSED);
+            ereport(ERROR,
+                    (errcode(ERRCODE_CLUSTER_CR_CROSS_INSTANCE_UNSUPPORTED),
+                     errmsg("cluster CR cross-instance UBA encountered "
+                            "(origin_node_id=%u, local=%d)",
+                            hdr->origin_node_id, cluster_node_id),
+                     errhint("Own-instance CR only unless the origin was materialized by "
+                             "merged recovery; the runtime cross-instance CR/undo data plane "
+                             "lands in Stage 6 (#119 undo-block Cache Fusion); see "
+                             "Spec: spec-5.57.")));
+        }
 
         /* I-chain-1: normal SCN stop. */
         if (scn_time_cmp(hdr->write_scn, read_scn) <= 0)
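The counter semantics described in the refusal routine's header comment can be checked with a small model: a pre-check refusal bumps both counters, the W1 belt bumps only the headline, forward mode logs once per backend, and 53R9G is raised in every mode. Everything below is hypothetical: `CoordStats`, `BackendState`, `SQLSTATE_53R9G` and the two `refuse_*` functions only mirror the behavior shown in the diff. The real code bumps shared counters via `cluster_cr_coordinator_stat_bump()` and raises `ereport(ERROR)` instead of returning a code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the coordinator GUC modes and the four counters. */
typedef enum { MODE_OFF, MODE_BOUNDARY, MODE_FORWARD } CoordMode;
enum { SQLSTATE_53R9G = 1 };  /* stand-in for the ereport(ERROR) the real code raises */

typedef struct {
    uint64_t cross_instance_cr_refused;  /* boundary headline */
    uint64_t remote_undo_read_refused;   /* pre-check (remote-undo-read) leg */
    uint64_t boundary_probe;             /* only when the probe GUC is on */
    uint64_t materialized_remote_served; /* class② serves; unused in this model */
    int      forward_log_count;          /* how often the forward-mode LOG fired */
} CoordStats;

/* Models the backend-local "static bool forward_logged" flag. */
typedef struct { bool forward_logged; } BackendState;

/* Pre-check refusal: both counters (mode != off), probe, LOG-once, then 53R9G. */
static int
refuse_runtime_remote(CoordStats *st, BackendState *be, CoordMode mode, bool probe)
{
    assert(st != NULL && be != NULL);
    if (mode != MODE_OFF) {
        st->cross_instance_cr_refused++;
        st->remote_undo_read_refused++;
    }
    if (probe)
        st->boundary_probe++;
    if (mode == MODE_FORWARD && !be->forward_logged) {
        be->forward_logged = true;
        st->forward_log_count++;
    }
    return SQLSTATE_53R9G;  /* unconditional: no mode turns the refusal off */
}

/* W1 header-mismatch belt: bumps only the headline counter, then 53R9G. */
static int
refuse_header_mismatch(CoordStats *st, CoordMode mode)
{
    assert(st != NULL);
    if (mode != MODE_OFF)
        st->cross_instance_cr_refused++;
    return SQLSTATE_53R9G;
}

/* The invariant stated in the refusal routine's header comment. */
static bool
invariant_holds(const CoordStats *st)
{
    return st->cross_instance_cr_refused >= st->remote_undo_read_refused;
}
```

Every path that bumps `remote_undo_read_refused` also bumps `cross_instance_cr_refused` under the same mode condition, so the invariant holds for any sequence of calls and any mode, including `off`, where neither counter moves but the refusal still happens.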
