Skip to content

Commit 2056233

Browse files
author
SqlRush
committed
test(cluster): spec-5.58 acceptance — opus review fixes (2 P1 + 4 P2)
Post-ship opus code review (0 P0, suite confirmed a genuine non-fake gate): false-visible caught by the t/319 differential, boundary degradation by t/321 (code-verified unconditional 53R9G), stale-reuse by the in-range forwarded t/306. The findings were coverage-claim overstatements, not correctness holes. P1-1 (t/319): the header claimed a shared-L2-pool cross-backend HIT, but the actual L4 assertion is cr_tuple_verdict_count (a plain verdict SELECT does NOT warm the pool -- the fast path short-circuits the full-block construct). Rewrote the header/abstract to state the real proof + that the pool HIT is gated by t/303/t/306 (re-run in nightly 276-323); deleted the dead pool_hit sub. P1-2 (t/322): L3/L4/L5 are CURRENT-snapshot reads that do not enter the CR construct/pool path, so they are DDL-correctness smoke checks, not CR-pool stale-reuse gates. Softened the docstring + labels accordingly and stated that the genuine relfilenode-reuse stale-serve gate is t/306 (re-run by 5.58); L6 remains the real held-snapshot historical CR read. P2-1 (t/320 L5): tightened the clean-construct leg from an or-tautology to is(eclean,'') so it can fail if the band leaks a masked error. P2-2 (t/319): added a non-empty guard (>300 rows) so the master differential cannot pass on empty input. P2-3 (t/319): assert the pre-churn relation fits the sampled 4-block range so the cap cannot silently drop authoritative rows. P2-5: dropped unused Time::HiRes imports (t/319/320/322/323). Tests-only; no behavior change. Spec: spec-5.58 (FROZEN v1.0)
1 parent 614c1f2 commit 2056233

4 files changed

Lines changed: 60 additions & 39 deletions

src/test/cluster_tap/t/319_cluster_5_58_cr_fullstack_differential.pl

Lines changed: 22 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -22,19 +22,25 @@
2222
# (cluster_cr.c:1958) = L1 cache -> L2 pool -> reconstruct, i.e. the LIVE
2323
# optimized band.
2424
# * two backends (R1, R2) take REPEATABLE READ snapshots in the same
25-
# quiescent window so they share one read_scn (5.50 §17.3): R1's read
26-
# publishes the CR image to the shared L2 pool, R2's read HITS it
27-
# cross-backend -> proves the pool really fired (not all-fallback).
25+
# quiescent window so they share one read_scn (5.50 §17.3) -> cross-backend
26+
# equivalence of the live optimized read. (The tuple fast path serves
27+
# these verdicts WITHOUT warming the L2 pool; the pool cross-backend HIT is
28+
# gated by the dedicated t/303 + t/306, re-run in nightly range 276-323.)
2829
#
2930
# Assertions (own-instance, single all-on node):
3031
# L1 R1 and R2 share one read_scn (quiescent window).
3132
# L2 the REAL optimized read == the AUTHORITATIVE SRF image, exact visible
3233
# (id,v) set, for every sampled block (the master differential, HG#1).
3334
# L3 cross-backend equivalence: R1's optimized read == R2's optimized read
3435
# (no false-visible / extra / missing row across backends).
35-
# L4 the band REALLY fired: the shared L2 pool recorded a cross-backend HIT
36-
# (R2 served from R1's published image) -- the equivalence is over the
37-
# live optimized path (L250).
36+
# L4 the band REALLY fired: the tuple-level fast path served a DECIDED
37+
# verdict (cr_tuple_verdict_count advanced -- a distinct counter from the
38+
# fallback outcomes), so the differential above is over the LIVE
39+
# optimized path, not an all-fallback no-op (L250). (A plain verdict
40+
# SELECT does NOT warm the shared L2 pool -- the fast path short-circuits
41+
# the full-block construct -- so the pool cross-backend HIT is gated by
42+
# t/303 + t/306, which spec-5.58 re-runs in nightly range 276-323, not
43+
# here.)
3844
# L5 determinism: re-reading is identical across >=5 runs (no flake).
3945
#
4046
# IDENTIFICATION
@@ -53,7 +59,6 @@
5359
use lib "$FindBin::RealBin/../lib";
5460

5561
use Test::More;
56-
use Time::HiRes qw(usleep);
5762
use PgracClusterNode;
5863

5964
# ---- all-on profile: the whole CR read-path band live ---------------------
@@ -83,7 +88,6 @@ sub cr_counter
8388
qq{SELECT COALESCE((SELECT value::bigint FROM pg_cluster_state
8489
WHERE category='$cat' AND key='$key'), 0)}) + 0);
8590
}
86-
sub pool_hit { return cr_counter('cr_pool', 'hit_count'); }
8791
sub tuple_verdict { return cr_counter('cr', 'cr_tuple_verdict_count'); }
8892

8993
# Authoritative ground truth: raw full-block reconstruct (no cache) as a sorted
@@ -111,6 +115,12 @@ sub auth_digest
111115

112116
my $maxblk = $node->safe_psql('postgres',
113117
q{SELECT GREATEST(0, (pg_relation_size('t_diff') / current_setting('block_size')::int)::int - 1)});
118+
# P2-3 (review): the authoritative SRF is sampled over blocks 0..maxblk while the
119+
# optimized read is the whole table; assert the pre-churn relation fits within the
120+
# sampled range so the 4-block cap below can never silently drop rows from the
121+
# authoritative side (which would make L2a fail for the wrong reason).
122+
cmp_ok($maxblk, '<=', 3,
123+
"L1 pre-churn t_diff fits in the sampled block range (maxblk=$maxblk <= 3)");
114124
$maxblk = 3 if $maxblk > 3; # sample the first few blocks
115125

116126
# Two readers take REPEATABLE READ snapshots in the same quiescent window: their
@@ -154,6 +164,10 @@ sub auth_digest
154164
# Normalize: both are sorted (id,v) over the same visible set; compare as sets.
155165
my %rset = map { $_ => 1 } split /\|/, $real1;
156166
my %aset = map { $_ => 1 } split /\|/, $auth;
167+
# P2-2 (review): the master differential must not pass on empty input (both sets
168+
# empty would make the diff asserts trivially green).
169+
cmp_ok(scalar keys %rset, '>', 300,
170+
'L2 sanity: the optimized read returned the expected ~400 pre-churn rows (not empty)');
157171
my $only_real = join(',', grep { !$aset{$_} } sort keys %rset);
158172
my $only_auth = join(',', grep { !$rset{$_} } sort keys %aset);
159173

src/test/cluster_tap/t/320_cluster_5_58_cr_failclosed_direction.pl

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,6 @@
4646
use lib "$FindBin::RealBin/../lib";
4747

4848
use Test::More;
49-
use Time::HiRes qw(usleep);
5049
use PgracClusterNode;
5150

5251
# all-on band profile
@@ -127,12 +126,15 @@ sub cr_counter
127126
}
128127
is($corrupt_hits, 4, 'L4 cr_corruption fails closed data_corrupted for all 4 subkinds (band on)');
129128

130-
# --- L5: clean construct (no injection) is deterministic CR behavior ---
129+
# --- L5: clean construct (no injection, max read_scn) SUCCEEDS -------------
130+
# All injections disarmed and read_scn = INT64_MAX (the watermark can never
131+
# exceed it, so no 53R9F), so the construct must succeed with NO error -- this
132+
# leg can actually fail if the band leaks a stale/masked error after the armed
133+
# cases (review P2-1: was an or-tautology that passed on any CR-shaped error).
131134
$s->query("SELECT cluster_cr_test_construct('t_fc'::regclass,0,0,9223372036854775807)");
132135
my $eclean = $s->{stderr}; $s->{stderr} = '';
133-
ok($eclean eq '' || $eclean =~ /(53R9F|53R9G|XX001|cluster CR)/,
134-
'L5 a clean construct is deterministic CR behavior, not a leftover/masked error '
135-
. '(the band did not enter a broken state)');
136+
is($eclean, '',
137+
'L5 a clean construct succeeds with no error (the band did not enter a broken state)');
136138
$s->quit;
137139

138140
# --- L6: the fail-closed paths really ran (counters advanced) ---

src/test/cluster_tap/t/322_cluster_5_58_cr_no_stale_reuse.pl

Lines changed: 31 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -4,27 +4,34 @@
44
# 322_cluster_5_58_cr_no_stale_reuse.pl
55
# spec-5.58 D2 (HG#2) — no stale reuse under lifecycle churn, full band on.
66
#
7-
# With the WHOLE CR read-path band live (shared L2 pool + admission + tuple
8-
# fast path + per-rel generation + resolver + L1 cache), prove that lifecycle
9-
# churn that frees/rewrites a relfilenode (TRUNCATE / VACUUM FULL / CLUSTER /
10-
# DROP+recreate) NEVER lets a cached CR image be served stale: a read after
11-
# the churn reflects the NEW relation state, never the pre-churn image, and
12-
# the pool/cache invalidation (epoch bump on relfilenode unlink) really fires.
13-
# This is the combination-level check (5.53 relfilenode-reuse fence + 5.56
14-
# lifecycle epoch bump live at once).
7+
# With the WHOLE CR read-path band live, prove that lifecycle churn that
8+
# frees/rewrites a relfilenode (TRUNCATE / VACUUM FULL / DROP+recreate) leaves
9+
# the relation correct and the band live -- DDL correctness with the band on,
10+
# plus a held-snapshot historical CR read that still reconstructs correctly
11+
# after the churn.
12+
#
13+
# Scope (review P1-2, honest): L3/L4/L5 are CURRENT-snapshot reads; a current
14+
# read sees live tuples via native MVCC and does NOT enter the CR construct /
15+
# pool path (only an as-of-snapshot HISTORICAL read does -- L6), so these legs
16+
# cannot serve a cached CR image stale and are DDL-correctness smoke checks,
17+
# not CR-pool stale-reuse gates. The genuine CR-pool relfilenode-reuse
18+
# stale-serve gate (shared-read_scn cross-backend pool HIT -> epoch bump on
19+
# unlink -> the stale-epoch image MISSES) is spec-5.53 t/306, which spec-5.58
20+
# re-runs in nightly range 276-323; the tuple fast path never reads the pool,
21+
# so its isolation coverage == its combination coverage on this axis.
1522
#
1623
# L1 band-all-on node up.
17-
# L2 warm the band: a long-snapshot CR read populates the pool/cache with
18-
# the relation's images.
19-
# L3 TRUNCATE + reinsert DIFFERENT data -> a fresh read reflects the new
20-
# data, never the warmed (pre-truncate) image (no stale serve).
21-
# L4 VACUUM FULL (relfilenode rewrite, same data) -> the optimized read
22-
# still equals the authoritative reconstruct (differential preserved
23-
# across a rewrite).
24-
# L5 DROP + recreate + insert -> a read of the recreated relation reflects
25-
# ONLY the new data (no over-hit from the old relfilenode's pool entries).
26-
# L6 the pool epoch really bumped across the churn (relfilenode-unlink
27-
# invalidation fired -- the no-stale guarantee is enforced, not luck).
24+
# L2 a long-snapshot CR read sees the pre-churn relation (band engaged).
25+
# L3 TRUNCATE + reinsert DIFFERENT data -> a current read reflects ONLY the
26+
# new data (DDL correctness, band on).
27+
# L4 VACUUM FULL (relfilenode rewrite) -> a current read reflects ONLY the
28+
# new data exactly (no loss / no duplicate across the rewrite).
29+
# L5 DROP + recreate + insert -> a current read reflects ONLY the reborn
30+
# data (DDL correctness, band on).
31+
# L6 a held-snapshot HISTORICAL CR read after all the churn STILL
32+
# reconstructs the pre-update image exactly, and the tuple fast path
33+
# still fires -- the band is live + correct after lifecycle churn (not a
34+
# broken state).
2835
#
2936
# IDENTIFICATION
3037
# src/test/cluster_tap/t/322_cluster_5_58_cr_no_stale_reuse.pl
@@ -42,7 +49,6 @@
4249
use lib "$FindBin::RealBin/../lib";
4350

4451
use Test::More;
45-
use Time::HiRes qw(usleep);
4652
use PgracClusterNode;
4753

4854
my $node = PgracClusterNode->new('cr_nsr');
@@ -90,7 +96,7 @@ sub tuple_verdict
9096
$r->query_safe('ROLLBACK');
9197
$r->quit;
9298
chomp($warm);
93-
is($warm, '300', 'L2 long-snapshot CR read warmed the band (sees all 300 pre-churn rows)');
99+
is($warm, '300', 'L2 long-snapshot CR read sees all 300 pre-churn rows (band engaged)');
94100

95101
# ---- L3: TRUNCATE + reinsert different -> no stale serve -------------------
96102
$node->safe_psql('postgres', 'TRUNCATE t_nsr');
@@ -100,7 +106,7 @@ sub tuple_verdict
100106
q{SELECT count(*) || ',' || count(*) FILTER (WHERE v LIKE 'NEW_%') || ',' ||
101107
count(*) FILTER (WHERE v LIKE 'orig_%' OR v LIKE 'upd%') FROM t_nsr});
102108
is($after_trunc, '50,50,0',
103-
'L3 after TRUNCATE+reinsert: read reflects ONLY the new 50 rows, NO stale pre-truncate image');
109+
'L3 after TRUNCATE+reinsert: current read reflects ONLY the new 50 rows (DDL correctness, band on)');
104110

105111
# ---- L4: VACUUM FULL (rewrite) -> differential preserved -------------------
106112
$node->safe_psql('postgres', 'VACUUM FULL t_nsr');
@@ -109,7 +115,7 @@ sub tuple_verdict
109115
my $expect_vac = $node->safe_psql('postgres',
110116
q{SELECT string_agg(g||':'||'NEW_'||g, '|' ORDER BY g) FROM generate_series(1,50) g});
111117
is($after_vac, $expect_vac,
112-
'L4 after VACUUM FULL (relfilenode rewrite): every row exactly the new data (no stale, no loss)');
118+
'L4 after VACUUM FULL (relfilenode rewrite): current read is exactly the new data (no loss, no duplicate)');
113119

114120
# ---- L5: DROP + recreate + insert -> no over-hit --------------------------
115121
$node->safe_psql('postgres', 'DROP TABLE t_nsr');
@@ -119,8 +125,8 @@ sub tuple_verdict
119125
my $after_drop = $node->safe_psql('postgres',
120126
q{SELECT count(*) || ',' || count(*) FILTER (WHERE v LIKE 'REBORN_%') FROM t_nsr});
121127
is($after_drop, '20,20',
122-
'L5 after DROP+recreate+insert: read reflects ONLY the reborn 20 rows '
123-
. '(no over-hit from the old relfilenode pool entries)');
128+
'L5 after DROP+recreate+insert: current read reflects ONLY the reborn 20 rows '
129+
. '(DDL correctness, band on)');
124130

125131
# ---- L6: the band is STILL live + correct after all the churn --------------
126132
# A fresh long-snapshot CR read on the reborn relation reconstructs correctly

src/test/cluster_tap/t/323_cluster_5_58_cr_real_benefit_path.pl

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,6 @@
4747
use lib "$FindBin::RealBin/../lib";
4848

4949
use Test::More;
50-
use Time::HiRes qw(usleep);
5150
use PgracClusterNode;
5251

5352
my $node = PgracClusterNode->new('cr_rb');

0 commit comments

Comments
 (0)