Commit 5cc307c

author
SqlRush
committed
feat(cluster): spec-5.55 shared resolver cache — CR Source 3 by-xid search-shortcut memo
A per-instance shared-memory memo that caches the position the last
own-instance WRAP_ANY by-xid durable scan (CR Source 3) matched --
(xid_epoch, raw_xid, origin) -> {seg, slot, wrap} -- so a peer backend
re-validates that one slot in O(1) via cluster_tt_slot_durable_lookup
instead of re-running the O(segments) scan.

Six 8.A safety gates (all default-off; entries=0 is byte-identical):
1. memo non-authoritative: the scn is always a fresh durable re-read.
2. exact-(xid,wrap) re-validation: a recycled/wrap-bumped slot fails ->
   re-scan.
3. acceptance same-segment rerun: a memo hit runs the PHYSICALLY SAME
   cluster_cr_accept_resolved_scn helper (wrap_suspect WRAP_ANY + current
   horizon + sticky retention reliability) the fresh scan runs ->
   verdict-equivalent by construction; the memo never short-circuits
   acceptance.
4. xid_epoch fence: a single monotonic own-instance FullTransactionId
   epoch makes a hint from epoch E a key MISS in E+1, before raw_xid can
   be reused.
5. cross-instance fail-closed: own-instance scans only.
6. COMMITTED-only.

Two modes (cluster.resolver_cache_enabled trust /
cluster.resolver_cache_measure diagnostic), both PGC_POSTMASTER, default
off. D1 extends the by-xid resolver with optional matched-slot out params
(NULL = legacy, no TTSlot ABI / catversion change).

§0.6 value gate (measure-first): re-probe hit rate is workload-dependent
(high under retained-snapshot multi-reader; the recycle-heavy pure-write
leg re-scans safely via gate 2). Default off; the non-zero default +
sizing are forward 5.58.

Spec: spec-5.55-shared-resolver-cache.md (FROZEN v1.0)

Landed onto post-5.15 main (cherry-pick of v0.115.0-stage5.55 094b84c):
- renumber t/315_cluster_5_55_resolver_cache -> t/316 (resolves collision
  with spec-5.15's t/315_cluster_5_15_online_rejoin; nightly range
  276-315 -> 276-316)
- reground pg_cluster_state category-count baselines 48 -> 49 across
  t/017/t/023/t/024/t/030/t/204/t/205/t/206 (reconfig_join from spec-5.15 +
  resolver_cache from spec-5.55 are both always-present categories)

1 parent a3ab9b5 commit 5cc307c

27 files changed

Lines changed: 1701 additions & 59 deletions

src/backend/cluster/Makefile

Lines changed: 1 addition & 0 deletions
@@ -163,6 +163,7 @@ OBJS = \
 	cluster_cr_tuple.o \
 	cluster_cr_tuple_stat.o \
 	cluster_cr_srf.o \
+	cluster_resolver_cache.o \
 	cluster_tt_slot.o \
 	cluster_tt_2pc.o \
 	cluster_tt_2pc_record.o \

src/backend/cluster/cluster_cr.c

Lines changed: 79 additions & 15 deletions
@@ -60,6 +60,7 @@
 #include "cluster/cluster_touched_peers.h" /* PGRAC: spec-5.14 D2 class 4 */
 #include "cluster/cluster_itl_slot.h"
 #include "cluster/cluster_shmem.h"
+#include "cluster/cluster_resolver_cache.h" /* spec-5.55 D0: by-xid scan measure hook */
 #include "cluster/cluster_tt_durable.h" /* spec-3.11 D6: watermark by-xid resolve */
 #include "cluster/cluster_tt_slot.h" /* spec-3.22: retention_off_recycle_count */
 #include "cluster/cluster_undo_retention.h" /* spec-3.22: retention horizon proof */
@@ -1322,6 +1323,36 @@ cluster_cr_count_xmax_scan_unavail_or_no_proof(void)
     pg_atomic_fetch_add_u64(&CRShared->cr_xmax_scan_unavail_or_no_proof_count, 1);
 }

+/*
+ * cluster_cr_accept_resolved_scn -- spec-4.8 D3 acceptance gate for a Source 3
+ * by-xid RESOLVED match, extracted (spec-5.55 D6) so the fresh-scan path AND a
+ * shared-resolver-cache memo hit run the PHYSICALLY SAME gate (gate (3),
+ * verdict-equivalent by construction -- they cannot drift).
+ *
+ * The durable scan runs WRAP_ANY (a recycled scratch ITL slot carries no
+ * binding wrap), so a single COMMITTED match cannot tell a genuine commit from a
+ * 2^32-wrapped raw-xid collision. When retention is unreliable AND the match is
+ * below the CURRENT horizon it is wrap-suspect -> fail closed (Rule 8.A: a wrong
+ * deleter scn would false-hide a live row); the existing
+ * wrap_generation_disambiguated observability is bumped. The reliability proof
+ * is the EXACT sticky condition (horizon enabled AND no retention-off recycle
+ * this incarnation), not an abstraction -- losing the sticky leg would re-trust
+ * a retention-off-window recycle. Returns true to ACCEPT (RESOLVED).
+ */
+static bool
+cluster_cr_accept_resolved_scn(SCN scn)
+{
+    bool retention_reliable = cluster_undo_retention_horizon_enabled
+                              && cluster_tt_slot_retention_off_recycle_count() == 0;
+
+    if (cluster_tt_recovery_wrap_suspect(CLUSTER_TT_WRAP_ANY, scn, cluster_undo_retention_horizon(),
+                                         retention_reliable)) {
+        cluster_tt_recovery_count_wrap_generation_disambiguated();
+        return false; /* suspect -> fail closed */
+    }
+    return true;
+}
+
 /*
  * cluster_cr_resolve_xmax_commit_scn -- resolve the EXACT commit_scn of a
  * committed own-instance deleter (cr_xmax) recorded on a CR image, for the
@@ -1347,6 +1378,11 @@ cluster_cr_resolve_xmax_commit_scn(const char *cr_page, uint8 itl_idx, Transacti
 {
     Page page = (Page)cr_page; /* read-only ITL access on the scratch image */
     uint32 expected_wrap = CLUSTER_TT_WRAP_ANY;
+    /* spec-5.55 D1: Source 3 RESOLVED also reports the matched durable slot
+     * identity, cached as a position hint by the shared resolver cache (D5). */
+    uint16 resolved_seg = 0;
+    uint16 resolved_slot = 0;
+    uint16 resolved_wrap = 0;

     *out_scn = InvalidScn;

@@ -1397,33 +1433,61 @@ cluster_cr_resolve_xmax_commit_scn(const char *cr_page, uint8 itl_idx, Transacti
         }
     }

+    /*
+     * spec-5.55 D6: shared resolver cache search-shortcut. In TRUST mode a memo
+     * hit re-validates the single hint slot in O(1) (gate (1)+(2)+(4)) and runs the
+     * PHYSICALLY SAME acceptance gate the fresh scan runs (gate (3)), resolving
+     * WITHOUT the O(segments) scan below -- verdict-equivalent by construction. In
+     * MEASURE/off mode the authoritative scan still runs (measure still records the
+     * would-hit counters). origin == own node (Source 3 is own-instance, gate (5)).
+     */
+    {
+        SCN hint_scn = InvalidScn;
+
+        if (cluster_resolver_cache_probe((uint16)cluster_node_id, cr_xmax, &hint_scn)) {
+            bool accepted = cluster_cr_accept_resolved_scn(hint_scn);
+
+            cluster_resolver_cache_count_acceptance(accepted);
+            if (cluster_resolver_cache_trust()) {
+                if (!accepted) {
+                    *out_scn = InvalidScn; /* same fail-closed as the fresh-scan branch */
+                    return CLUSTER_CR_XMAX_INVALID_OR_AMBIGUOUS;
+                }
+                *out_scn = hint_scn; /* gate (1): durable re-read, verdict-equivalent */
+                return CLUSTER_CR_XMAX_RESOLVED_SCN;
+            }
+            /* MEASURE: never trust the hint -- fall through to the authoritative scan. */
+        }
+    }
+
     /*
      * Source 3: durable TT by exact xid (survives ITL slot recycle). spec-3.22:
      * consume the finer-grained resolve enum so a 0-match (RECYCLED_ZERO_MATCH ->
      * provably below horizon, IF the gate's retention proof holds) is no longer
      * conflated with a delayed-cleanout / wrap / unreadable miss (all fail closed).
      */
-    switch (cluster_tt_slot_durable_resolve_by_xid(cr_xmax, expected_wrap, out_scn)) {
+    switch (cluster_tt_slot_durable_resolve_by_xid(cr_xmax, expected_wrap, out_scn, &resolved_seg,
+                                                   &resolved_slot, &resolved_wrap)) {
     case CLUSTER_TT_DURABLE_RESOLVED_SCN:
         /*
-         * spec-4.8 D3 (task#90): the durable scan above ran WRAP_ANY (a
-         * recycled scratch ITL slot carries no binding wrap here), so a single
-         * COMMITTED match cannot tell a genuine commit from a 2^32-wrapped
-         * raw-xid collision. When retention is unreliable AND the match is
-         * below the horizon it is wrap-suspect -> fail closed (narrowed
-         * AMBIGUOUS_WRAP), never resolve to its commit_scn (Rule 8.A: a wrong
-         * deleter scn would false-hide a live row). With retention reliable a
-         * below-horizon collision's slot is already recycled (0-match), so a
-         * below-horizon 1-match is a legit recycle-lag commit -> trusted.
+         * spec-4.8 D3 acceptance (now the shared cluster_cr_accept_resolved_scn
+         * helper, spec-5.55 D6, gate (3)): the durable scan ran WRAP_ANY so a
+         * single COMMITTED match below the horizon with unreliable retention is
+         * wrap-suspect -> fail closed (Rule 8.A: a wrong deleter scn would false-
+         * hide a live row). A memo hit above ran the SAME helper.
          */
-        if (cluster_tt_recovery_wrap_suspect(
-                expected_wrap, *out_scn, cluster_undo_retention_horizon(),
-                cluster_undo_retention_horizon_enabled
-                    && cluster_tt_slot_retention_off_recycle_count() == 0)) {
+        if (!cluster_cr_accept_resolved_scn(*out_scn)) {
             *out_scn = InvalidScn;
-            cluster_tt_recovery_count_wrap_generation_disambiguated();
             return CLUSTER_CR_XMAX_INVALID_OR_AMBIGUOUS;
         }
+        /*
+         * spec-5.55 D5: cache the matched {seg,slot,wrap} position hint (own-
+         * instance gate (5) + COMMITTED gate (6) + acceptance-passed) so a peer
+         * backend can re-validate it in O(1) instead of re-scanning every segment.
+         * No-op unless the memo region is live.
+         */
+        cluster_resolver_cache_install((uint16)cluster_node_id, cr_xmax, resolved_seg,
+                                       resolved_slot, resolved_wrap, *out_scn);
         return CLUSTER_CR_XMAX_RESOLVED_SCN; /* *out_scn set by the resolve */
     case CLUSTER_TT_DURABLE_RECYCLED_ZERO_MATCH:
         *out_scn = InvalidScn;

src/backend/cluster/cluster_debug.c

Lines changed: 29 additions & 0 deletions
@@ -99,6 +99,7 @@ PG_FUNCTION_INFO_V1(cluster_dump_state);
 #include "cluster/cluster_cr_pool.h" /* cluster_cr_pool_* counters (spec-5.51 D9) */
 #include "cluster/cluster_cr_admit.h" /* cluster_cr_admit_stat_* counters (spec-5.52 D9) */
 #include "cluster/cluster_cr_tuple.h" /* cluster_cr_tuple_stat_* counters (spec-5.54 D5) */
+#include "cluster/cluster_resolver_cache.h" /* cluster_resolver_cache_* counters (spec-5.55 D8) */
 #include "cluster/cluster_wal_state.h" /* wal_state registry dump (spec-4.2 D5) */
 #include "cluster/cluster_wal_thread.h" /* wal_thread dump accessors (spec-4.1 D7) */
 #include "cluster/cluster_tt_durable.h" /* cluster_tt_durable_* counters (spec-3.11 D8) */
@@ -2436,6 +2437,34 @@ dump_cr(ReturnSetInfo *rsinfo)
              fmt_int64((int64)cluster_cr_admit_stat_count(CR_ADMIT_REASON_REJECT_RELCAP)));
     emit_row(rsinfo, "cr_pool", "admit_reject_pressure",
              fmt_int64((int64)cluster_cr_admit_stat_count(CR_ADMIT_REASON_REJECT_PRESSURE)));
+
+    /* spec-5.55 D8: shared resolver cache (CR Source 3 by-xid search-shortcut)
+     * counters. All 0 unless resolver_cache_enabled / _measure is on. These feed
+     * the §0.6 value gate: redundancy = key_present / lookup, re-probe hit rate =
+     * hit / key_present, acceptance pass rate = acceptance_pass / hit. */
+    emit_row(rsinfo, "resolver_cache", "lookup",
+             fmt_int64((int64)cluster_resolver_cache_lookup_count()));
+    emit_row(rsinfo, "resolver_cache", "key_present",
+             fmt_int64((int64)cluster_resolver_cache_key_present_count()));
+    emit_row(rsinfo, "resolver_cache", "epoch_miss",
+             fmt_int64((int64)cluster_resolver_cache_epoch_miss_count()));
+    emit_row(rsinfo, "resolver_cache", "hit", fmt_int64((int64)cluster_resolver_cache_hit_count()));
+    emit_row(rsinfo, "resolver_cache", "revalidate_miss",
+             fmt_int64((int64)cluster_resolver_cache_revalidate_miss_count()));
+    emit_row(rsinfo, "resolver_cache", "acceptance_pass",
+             fmt_int64((int64)cluster_resolver_cache_acceptance_pass_count()));
+    emit_row(rsinfo, "resolver_cache", "acceptance_failclosed",
+             fmt_int64((int64)cluster_resolver_cache_acceptance_failclosed_count()));
+    emit_row(rsinfo, "resolver_cache", "install",
+             fmt_int64((int64)cluster_resolver_cache_install_count()));
+    emit_row(rsinfo, "resolver_cache", "evict",
+             fmt_int64((int64)cluster_resolver_cache_evict_count()));
+    emit_row(rsinfo, "resolver_cache", "nonown_skip",
+             fmt_int64((int64)cluster_resolver_cache_nonown_skip_count()));
+    emit_row(rsinfo, "resolver_cache", "nonterminal_skip",
+             fmt_int64((int64)cluster_resolver_cache_nonterminal_skip_count()));
+    emit_row(rsinfo, "resolver_cache", "live_entries",
+             fmt_int64((int64)cluster_resolver_cache_live_entries()));
 }


src/backend/cluster/cluster_guc.c

Lines changed: 41 additions & 0 deletions
@@ -46,6 +46,7 @@
 #include "cluster/cluster_cr_cache.h" /* cluster_cr_cache_max_blocks (spec-3.10 D4) */
 #include "cluster/cluster_grd.h" /* spec-5.10 starvation-protection shared flag */
 #include "cluster/cluster_cr_pool.h" /* cluster_shared_cr_pool_* (spec-5.51 D8) */
+#include "cluster/cluster_resolver_cache.h" /* cluster_shared_resolver_cache_* (spec-5.55 D7) */
 #include "cluster/cluster_guc.h"
 #include "cluster/cluster_hang.h" /* CLUSTER_HANG_MAX_SAMPLES (spec-5.11 D7) */
 #include "cluster/cluster_hang_resolve.h" /* HANG_RESOLVE_* + disposition GUCs (spec-5.12 D6) */
@@ -2591,6 +2592,46 @@ cluster_init_guc(void)
                      "affects hit/miss. The threshold is pending spec-5.58 calibration."),
         &cluster_cr_pool_admit_pressure_ratio, 0, 0, 100000, PGC_SIGHUP, 0, NULL, NULL, NULL);

+    /* spec-5.55 D7: shared resolver cache (CR Source 3 by-xid search-shortcut
+     * memo). Both PGC_POSTMASTER: the entry count + measure switch are final at
+     * shmem reservation. Default entries 0 / measure off = true zero memory (the
+     * region is registered but reserves 0 bytes), so the spec-3.22 by-xid path is
+     * byte-identical. v1 ships in MEASURE mode only (the value gate, §0.6): it
+     * always re-runs the authoritative scan and never trusts the hint -- the
+     * trust path (flip-on) is gated on measured redundancy + re-probe hit rate. */
+    DefineCustomBoolVariable(
+        "cluster.resolver_cache_enabled",
+        gettext_noop("Enable spec-5.55 shared resolver cache TRUST mode (skip the by-xid scan on a "
+                     "re-validated + accepted hint)."),
+        gettext_noop("Spec-5.55. Default off. When on (with resolver_cache_entries > 0), a CR "
+                     "Source 3 by-xid resolution that hits the shared memo re-validates the hint "
+                     "slot in O(1) and re-runs the SAME wrap_suspect acceptance as a fresh scan, "
+                     "resolving WITHOUT the O(segments) scan (verdict-equivalent by construction). "
+                     "The recommended non-zero default is bound to the §0.6 value gate evidence "
+                     "(spec-5.58). PGC_POSTMASTER: requires a restart."),
+        &cluster_resolver_cache_enabled, false, PGC_POSTMASTER, 0, NULL, NULL, NULL);
+
+    DefineCustomBoolVariable(
+        "cluster.resolver_cache_measure",
+        gettext_noop("Enable spec-5.55 shared resolver cache MEASURE mode (value gate, no trust)."),
+        gettext_noop("Spec-5.55 §0.6. Default off. When on (with resolver_cache_entries > 0), CR "
+                     "Source 3 records whether its own-instance by-xid scan result was already "
+                     "memoized by a peer backend and whether an O(1) re-validation + acceptance "
+                     "would have passed -- the cross-backend redundancy + re-probe hit rate that "
+                     "gate the trust path. Never changes a visibility verdict (the authoritative "
+                     "scan always runs). Orthogonal to resolver_cache_enabled. PGC_POSTMASTER."),
+        &cluster_resolver_cache_measure, false, PGC_POSTMASTER, 0, NULL, NULL, NULL);
+
+    DefineCustomIntVariable(
+        "cluster.resolver_cache_entries",
+        gettext_noop("Shared resolver cache hint-slot count (0 = disabled / zero memory)."),
+        gettext_noop("Spec-5.55 D3/D7. Default 0 (true zero memory; the region is registered but "
+                     "reserves 0 bytes). PGC_POSTMASTER: requires a restart. The recommended "
+                     "non-zero default + sizing are bound to the §0.6 measure-leg value gate "
+                     "evidence (spec-5.58); resolver_cache_measure must also be on to allocate. "
+                     "Each hint slot costs a few dozen bytes of shared memory."),
+        &cluster_shared_resolver_cache_entries, 0, 0, 1048576, PGC_POSTMASTER, 0, NULL, NULL, NULL);
+
     DefineCustomIntVariable(
         "cluster.boc_sweep_interval_ms",
         gettext_noop("walwriter BOC sweep staleness target in milliseconds."),

src/backend/cluster/cluster_remote_xact.c

Lines changed: 1 addition & 1 deletion
@@ -292,7 +292,7 @@ cluster_remote_outcome_durable_checked(int origin_node, TransactionId xid, SCN *
      */
     if (!outcome_wrap_valid
         || cluster_tt_slot_durable_resolve_by_xid_origin(origin_node, xid, (uint32)outcome_wrap,
-                                                         &durable_scn)
+                                                         &durable_scn, NULL, NULL, NULL)
                != CLUSTER_TT_DURABLE_RESOLVED_SCN
         || durable_scn != outcome_scn) {
         if (RemoteXactShared != NULL)
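
The one-line change above passes `NULL, NULL, NULL` for the new matched-slot out params, which is the "NULL = legacy" convention from D1 in the commit message. Below is a minimal sketch of that optional out-parameter pattern over a toy slot table. The names `Slot`, `ResolveResult`, and `resolve_by_xid` are hypothetical and the linear scan stands in for the real durable resolver, whose internals are not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { RESOLVE_MISS, RESOLVE_OK } ResolveResult;

/* Toy durable slot; the real TTSlot layout is not part of this diff. */
typedef struct {
    uint32_t xid;
    uint16_t wrap;
    uint64_t commit_scn;
    bool committed;
} Slot;

/* Scan every segment for a committed slot holding xid. out_scn is required;
 * out_seg/out_slot/out_wrap are optional and skipped when NULL, so callers
 * written before the extra params existed only need to append NULLs. */
static ResolveResult
resolve_by_xid(const Slot *slots, size_t nseg, size_t nslot, uint32_t xid,
               uint64_t *out_scn, uint16_t *out_seg, uint16_t *out_slot,
               uint16_t *out_wrap)
{
    assert(slots != NULL && out_scn != NULL);
    for (size_t seg = 0; seg < nseg; seg++) {
        for (size_t i = 0; i < nslot; i++) {
            const Slot *s = &slots[seg * nslot + i];

            if (!s->committed || s->xid != xid)
                continue;
            *out_scn = s->commit_scn;
            if (out_seg != NULL)
                *out_seg = (uint16_t)seg;
            if (out_slot != NULL)
                *out_slot = (uint16_t)i;
            if (out_wrap != NULL)
                *out_wrap = s->wrap;
            return RESOLVE_OK;
        }
    }
    return RESOLVE_MISS;
}
```

A caller that wants to install a position hint passes real pointers; a caller that only needs the scn passes NULL for all three and sees the same result as before the parameters were added.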
