Commit bff7be6

Restore contsel/contjoinsel for containment & key-existence operators (#2356)
The containment (`@>`, `<@`, `@>>`, `<<@`) and key-existence (`?`, `?|`, `?&`) operators on `agtype` were bound to `matchingsel`/`matchingjoinsel` on the PG14+ source tree. `matchingsel` is built for pattern operators (LIKE/regex) and during planning invokes the operator's underlying function (`agtype_contains`) once per `pg_statistic` MCV. With realistic statistics targets that produces a planner-time regression that dominates simple OLTP-style point queries.

Restore the lighter `contsel`/`contjoinsel` helpers used by PostgreSQL core's jsonb operators (`@>`, `<@`, `?` on jsonb), which matches upstream's long-standing precedent for the same operator family.

Changes:

* `sql/agtype_operators.sql`, `sql/agtype_exists.sql`: 10 operators flipped from `matchingsel`/`matchingjoinsel` to `contsel`/`contjoinsel`.
* `age--1.7.0--y.y.y.sql`: appended `ALTER OPERATOR ... SET (RESTRICT, JOIN)` for all 10 operators so existing installs flip on `ALTER EXTENSION age UPDATE`.
* `regress/sql/containment_selectivity.sql` (+ `expected/.out`): pin the bindings via `pg_operator`, plus a "no leaked matchingsel" aggregate guard and functional smoke for all 10 operators. The guard catches future regressions if a new operator is added without the right selectivity helper.
* `regress/expected/cypher_match.out`, `regress/expected/cypher_vle.out`: refresh expected to reflect new (and better) plan shapes that the lower-selectivity helper produces: `test_enable_containment` now picks Nested Loop + Index Only Scans over a Seq Scan/Hash Join, and two `MATCH p=...` and `show_list_use_vle` queries flip row order (queries had no `ORDER BY`; result set is unchanged, only ordering).
* `Makefile`: register `containment_selectivity` in `REGRESS`.

Validation:

* Build: clean, `-Werror`.
* Regression: 36/37 tests pass under `EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm"`. Only `age_upgrade` fails; that failure is pre-existing on master at 774e781 (verified by `git stash && installcheck`).
* Reporter's exact methodology (LDBC-SNB-style snb_graph + pgbench on `bench_message_content`) reproduces the regression and the fix:

  | Metric                      | matchingsel | contsel | Delta  |
  |-----------------------------|-------------|---------|--------|
  | EXPLAIN planning time (ms)  | 1.42        | 0.97    | -32%   |
  | EXPLAIN execution time (ms) | 0.34        | 0.31    | ~equal |
  | pgbench TPS (8c x 30s)      | 5247        | 7378    | +40.6% |

  Run with `default_statistics_target = 1000` to populate MCV lists, matching the reporter's analyzed-graph conditions.
* Upgrade path: validated end-to-end during the benchmark. Operator bindings were flipped from `matchingsel` to `contsel` via the same `ALTER OPERATOR` statements the upgrade SQL ships, while operators remained functional throughout.

Driver workflows (python/go/node/jdbc) intentionally not run: this PR only adjusts pg_operator selectivity metadata. There is no C, type, or protocol change that drivers could observe.

Closes #2356.
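The planning-time row of the table above can be approximated on any analyzed graph with plain `EXPLAIN`. The following is only a minimal sketch: the graph schema `snb_graph`, the label table `"Message"`, and the filter value are hypothetical placeholders rather than names taken from this commit, and absolute timings will differ by machine and data set.

```sql
-- Sketch only: snb_graph."Message" and the filter value are hypothetical
-- placeholders; substitute a label table from your own graph.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Collect larger MCV lists, mirroring the conditions stated in the commit message.
SET default_statistics_target = 1000;
ANALYZE snb_graph."Message";

-- SUMMARY ON prints "Planning Time" without ANALYZE, so the statement is
-- planned but not executed.
EXPLAIN (SUMMARY ON)
SELECT id
FROM snb_graph."Message"
WHERE properties @> '{"content": "hello"}'::agtype;
```

Running the same `EXPLAIN` before and after `ALTER EXTENSION age UPDATE` gives a rough before/after comparison; repeating it several times helps separate planner cost from catalog-cache warm-up.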
1 parent e9ef30b commit bff7be6

10 files changed

Lines changed: 324 additions & 45 deletions

Makefile

Lines changed: 2 additions & 1 deletion
@@ -180,7 +180,8 @@ REGRESS = scan \
           map_projection \
           direct_field_access \
           security \
-          reserved_keyword_alias
+          reserved_keyword_alias \
+          containment_selectivity
 
 ifneq ($(EXTRA_TESTS),)
 REGRESS += $(EXTRA_TESTS)

age--1.7.0--y.y.y.sql

Lines changed: 35 additions & 0 deletions
@@ -459,3 +459,38 @@ BEGIN
 END LOOP;
 END;
 $$;
+
+
+--
+-- Issue #2356: restore lightweight selectivity functions for containment
+-- and key-existence operators.
+--
+-- The PG14+ branches of AGE bound RESTRICT=matchingsel / JOIN=matchingjoinsel
+-- on @>, <@, @>>, <<@, ?, ?|, ?&. matchingsel is built for pattern operators
+-- (LIKE / regex) and invokes the operator's underlying support function on
+-- pg_statistic MCVs during planning. For agtype that re-runs agtype_contains
+-- per MCV, which can dominate planning time on point queries (TPS regression
+-- reported on PG18). PostgreSQL core itself binds @>/<@/? on jsonb to
+-- contsel/contjoinsel for the same reason; this aligns AGE with that
+-- precedent.
+--
+ALTER OPERATOR ag_catalog.@>(agtype, agtype)
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.<@(agtype, agtype)
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.@>>(agtype, agtype)
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.<<@(agtype, agtype)
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.?(agtype, text)
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.?(agtype, agtype)
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.?|(agtype, text[])
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.?|(agtype, agtype)
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.?&(agtype, text[])
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
+ALTER OPERATOR ag_catalog.?&(agtype, agtype)
+    SET (RESTRICT = contsel, JOIN = contjoinsel);
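
Because `ALTER OPERATOR` is ordinary transactional DDL, the effect of a binding on planning time can be compared without leaving any change behind. The sketch below is an assumption-laden illustration, not part of this commit: it presumes a PostgreSQL version that provides `matchingsel`, a role that owns the operator, and a hypothetical label table `snb_graph."Message"`.

```sql
-- Sketch only: requires ownership of the operator; snb_graph."Message" and
-- the filter value are hypothetical placeholders.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

BEGIN;

-- Temporarily put the previous estimator back on one operator.
ALTER OPERATOR ag_catalog.@>(agtype, agtype)
    SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);

-- Plan (but do not execute) a point query under the old binding.
EXPLAIN (SUMMARY ON)
SELECT id
FROM snb_graph."Message"
WHERE properties @> '{"content": "hello"}'::agtype;

-- Discard the catalog change so the contsel binding stays in effect.
ROLLBACK;
```

Comparing the reported "Planning Time" with the same `EXPLAIN` run outside the transaction gives a per-query view of the difference; it is not a substitute for a throughput benchmark.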
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+/*
+ * Regression coverage for issue #2356:
+ * The containment (@>, <@, @>>, <<@) and key-existence (?, ?|, ?&)
+ * operators on agtype must be bound to the lightweight selectivity
+ * helpers contsel / contjoinsel during planning. Earlier PG14+
+ * branches used matchingsel / matchingjoinsel, which caused planning
+ * to invoke agtype_contains() against pg_statistic MCVs and produced
+ * a 30%+ planning-time regression on point queries (severe TPS drop
+ * reported on the PG18 branch).
+ *
+ * This test pins the bindings by querying pg_operator directly. If
+ * someone re-introduces matchingsel here, the test diff is loud and
+ * precise.
+ */
+LOAD 'age';
+SET search_path TO ag_catalog;
+-- Selectivity helpers for the four containment operators.
+SELECT o.oprname,
+       pg_catalog.format_type(o.oprleft, NULL) AS lhs,
+       pg_catalog.format_type(o.oprright, NULL) AS rhs,
+       o.oprrest::text AS restrict_fn,
+       o.oprjoin::text AS join_fn
+FROM pg_catalog.pg_operator o
+JOIN pg_catalog.pg_namespace n ON n.oid = o.oprnamespace
+WHERE n.nspname = 'ag_catalog'
+  AND o.oprname IN ('@>', '<@', '@>>', '<<@')
+ORDER BY o.oprname, lhs, rhs;
+ oprname |  lhs   |  rhs   | restrict_fn |   join_fn   
+---------+--------+--------+-------------+-------------
+ <<@     | agtype | agtype | contsel     | contjoinsel
+ <@      | agtype | agtype | contsel     | contjoinsel
+ @>      | agtype | agtype | contsel     | contjoinsel
+ @>>     | agtype | agtype | contsel     | contjoinsel
+(4 rows)
+
+-- Selectivity helpers for all key-existence operator overloads
+-- (right-hand side may be text, text[], or agtype).
+SELECT o.oprname,
+       pg_catalog.format_type(o.oprleft, NULL) AS lhs,
+       pg_catalog.format_type(o.oprright, NULL) AS rhs,
+       o.oprrest::text AS restrict_fn,
+       o.oprjoin::text AS join_fn
+FROM pg_catalog.pg_operator o
+JOIN pg_catalog.pg_namespace n ON n.oid = o.oprnamespace
+WHERE n.nspname = 'ag_catalog'
+  AND o.oprname IN ('?', '?|', '?&')
+ORDER BY o.oprname, lhs, rhs;
+ oprname |  lhs   |  rhs   | restrict_fn |   join_fn   
+---------+--------+--------+-------------+-------------
+ ?       | agtype | agtype | contsel     | contjoinsel
+ ?       | agtype | text   | contsel     | contjoinsel
+ ?&      | agtype | agtype | contsel     | contjoinsel
+ ?&      | agtype | text[] | contsel     | contjoinsel
+ ?|      | agtype | agtype | contsel     | contjoinsel
+ ?|      | agtype | text[] | contsel     | contjoinsel
+(6 rows)
+
+-- Scoped guard for issue #2356: assert that none of the specific containment
+-- and key-existence operators on agtype are bound to matchingsel /
+-- matchingjoinsel. We deliberately limit the check to these operator names
+-- (rather than every operator in ag_catalog) so unrelated operators that
+-- legitimately use matchingsel for their own semantics are not affected by
+-- this regression test.
+SELECT COUNT(*) AS leaked_matchingsel_bindings
+FROM pg_catalog.pg_operator o
+JOIN pg_catalog.pg_namespace n ON n.oid = o.oprnamespace
+WHERE n.nspname = 'ag_catalog'
+  AND o.oprname IN ('@>', '<@', '@>>', '<<@', '?', '?|', '?&')
+  AND (o.oprrest::text = 'matchingsel'
+       OR o.oprjoin::text = 'matchingjoinsel');
+ leaked_matchingsel_bindings 
+-----------------------------
+                           0
+(1 row)
+
+-- Smoke test: each operator still works functionally. Selectivity binding
+-- only affects the planner; this guards against an inadvertent operator
+-- removal as part of any future cleanup.
+SELECT '{"a":1,"b":2}'::agtype @> '{"a":1}'::agtype AS contains_yes;
+ contains_yes 
+--------------
+ t
+(1 row)
+
+SELECT '{"a":1}'::agtype <@ '{"a":1,"b":2}'::agtype AS contained_yes;
+ contained_yes 
+---------------
+ t
+(1 row)
+
+SELECT '{"a":{"b":1}}'::agtype @>> '{"a":{"b":1}}'::agtype AS top_contains_yes;
+ top_contains_yes 
+------------------
+ t
+(1 row)
+
+SELECT '{"a":{"b":1}}'::agtype <<@ '{"a":{"b":1}}'::agtype AS top_contained_yes;
+ top_contained_yes 
+-------------------
+ t
+(1 row)
+
+SELECT '{"a":1}'::agtype ? 'a'::text AS exists_text_yes;
+ exists_text_yes 
+-----------------
+ t
+(1 row)
+
+SELECT '{"a":1}'::agtype ? '"a"'::agtype AS exists_agtype_yes;
+ exists_agtype_yes 
+-------------------
+ t
+(1 row)
+
+SELECT '{"a":1,"b":2}'::agtype ?| ARRAY['a','c'] AS exists_any_text_yes;
+ exists_any_text_yes 
+---------------------
+ t
+(1 row)
+
+SELECT '{"a":1,"b":2}'::agtype ?| '["a","c"]'::agtype AS exists_any_agtype_yes;
+ exists_any_agtype_yes 
+-----------------------
+ t
+(1 row)
+
+SELECT '{"a":1,"b":2}'::agtype ?& ARRAY['a','b'] AS exists_all_text_yes;
+ exists_all_text_yes 
+---------------------
+ t
+(1 row)
+
+SELECT '{"a":1,"b":2}'::agtype ?& '["a","b"]'::agtype AS exists_all_agtype_yes;
+ exists_all_agtype_yes 
+-----------------------
+ t
+(1 row)
+

regress/expected/cypher_match.out

Lines changed: 14 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -2404,21 +2404,21 @@ SELECT * FROM cypher('cypher_match', $$ MATCH (a {name:a.name}) MATCH (a {age:a.
24042404
{"id": 281474976710659, "label": "", "properties": {"age": 3, "name": "orphan"}}::vertex
24052405
(3 rows)
24062406

2407-
SELECT * FROM cypher('cypher_match', $$ MATCH p=(a)-[u {relationship: u.relationship}]->(b) RETURN p $$) as (a agtype);
2407+
SELECT * FROM cypher('cypher_match', $$ MATCH p=(a)-[u {relationship: u.relationship}]->(b) RETURN p ORDER BY id(u) $$) as (a agtype);
24082408
a
24092409
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
24102410
[{"id": 281474976710661, "label": "", "properties": {"age": 4, "name": "T"}}::vertex, {"id": 4785074604081153, "label": "knows", "end_id": 281474976710666, "start_id": 281474976710661, "properties": {"years": 3, "relationship": "friends"}}::edge, {"id": 281474976710666, "label": "", "properties": {"age": 6}}::vertex]::path
24112411
[{"id": 281474976710659, "label": "", "properties": {"age": 3, "name": "orphan"}}::vertex, {"id": 4785074604081154, "label": "knows", "end_id": 281474976710666, "start_id": 281474976710659, "properties": {"years": 4, "relationship": "enemies"}}::edge, {"id": 281474976710666, "label": "", "properties": {"age": 6}}::vertex]::path
24122412
(2 rows)
24132413

2414-
SELECT * FROM cypher('cypher_match', $$ MATCH p=(a)-[u {relationship: u.relationship, years: u.years}]->(b) RETURN p $$) as (a agtype);
2414+
SELECT * FROM cypher('cypher_match', $$ MATCH p=(a)-[u {relationship: u.relationship, years: u.years}]->(b) RETURN p ORDER BY id(u) $$) as (a agtype);
24152415
a
24162416
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
24172417
[{"id": 281474976710661, "label": "", "properties": {"age": 4, "name": "T"}}::vertex, {"id": 4785074604081153, "label": "knows", "end_id": 281474976710666, "start_id": 281474976710661, "properties": {"years": 3, "relationship": "friends"}}::edge, {"id": 281474976710666, "label": "", "properties": {"age": 6}}::vertex]::path
24182418
[{"id": 281474976710659, "label": "", "properties": {"age": 3, "name": "orphan"}}::vertex, {"id": 4785074604081154, "label": "knows", "end_id": 281474976710666, "start_id": 281474976710659, "properties": {"years": 4, "relationship": "enemies"}}::edge, {"id": 281474976710666, "label": "", "properties": {"age": 6}}::vertex]::path
24192419
(2 rows)
24202420

2421-
SELECT * FROM cypher('cypher_match', $$ MATCH p=(a {name:a.name})-[u {relationship: u.relationship}]->(b {age:b.age}) RETURN p $$) as (a agtype);
2421+
SELECT * FROM cypher('cypher_match', $$ MATCH p=(a {name:a.name})-[u {relationship: u.relationship}]->(b {age:b.age}) RETURN p ORDER BY id(u) $$) as (a agtype);
24222422
a
24232423
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
24242424
[{"id": 281474976710661, "label": "", "properties": {"age": 4, "name": "T"}}::vertex, {"id": 4785074604081153, "label": "knows", "end_id": 281474976710666, "start_id": 281474976710661, "properties": {"years": 3, "relationship": "friends"}}::edge, {"id": 281474976710666, "label": "", "properties": {"age": 6}}::vertex]::path
@@ -3398,19 +3398,17 @@ SELECT count(*) FROM cypher('test_enable_containment', $$ MATCH p=(x:Customer)-[
33983398
(1 row)
33993399

34003400
SELECT * FROM cypher('test_enable_containment', $$ EXPLAIN (costs off) MATCH (x:Customer)-[:bought ={store: 'Amazon', addr:{city: 'Vancouver', street: 30}}]->(y:Product) RETURN 0 $$) as (a agtype);
3401-
QUERY PLAN
3402-
-------------------------------------------------------------------------------------------------------------------------------
3403-
Hash Join
3404-
Hash Cond: (y.id = _age_default_alias_0.end_id)
3405-
-> Seq Scan on "Product" y
3406-
-> Hash
3407-
-> Hash Join
3408-
Hash Cond: (x.id = _age_default_alias_0.start_id)
3409-
-> Seq Scan on "Customer" x
3410-
-> Hash
3411-
-> Seq Scan on bought _age_default_alias_0
3412-
Filter: (properties @>> '{"addr": {"city": "Vancouver", "street": 30}, "store": "Amazon"}'::agtype)
3413-
(10 rows)
3401+
QUERY PLAN
3402+
-------------------------------------------------------------------------------------------------------------------
3403+
Nested Loop
3404+
-> Nested Loop
3405+
-> Seq Scan on bought _age_default_alias_0
3406+
Filter: (properties @>> '{"addr": {"city": "Vancouver", "street": 30}, "store": "Amazon"}'::agtype)
3407+
-> Index Only Scan using "Customer_pkey" on "Customer" x
3408+
Index Cond: (id = _age_default_alias_0.start_id)
3409+
-> Index Only Scan using "Product_pkey" on "Product" y
3410+
Index Cond: (id = _age_default_alias_0.end_id)
3411+
(8 rows)
34143412

34153413
SELECT * FROM cypher('test_enable_containment', $$ EXPLAIN (costs off) MATCH (x:Customer ={school: { name: 'XYZ College',program: { major: 'Psyc', degree: 'BSc'} },phone: [ 123456789, 987654321, 456987123 ]}) RETURN 0 $$) as (a agtype);
34163414
QUERY PLAN

regress/expected/cypher_vle.out

Lines changed: 4 additions & 4 deletions
@@ -691,7 +691,7 @@ BEGIN
 RETURN QUERY
 SELECT * FROM cypher('mygraph', $CYPHER$
 MATCH (h:head {name: $list_name})-[e:next*]->(v:node)
-RETURN v
+RETURN v ORDER BY id(v)
 $CYPHER$, ag_param) AS (node agtype);
 END $$;
 -- create a list
@@ -726,8 +726,8 @@ SELECT prepend_node('list01', 'b');
 SELECT * FROM show_list_use_vle('list01');
                                        node
 -----------------------------------------------------------------------------------
- {"id": 1407374883553282, "label": "node", "properties": {"content": "b"}}::vertex
  {"id": 1407374883553281, "label": "node", "properties": {"content": "a"}}::vertex
+ {"id": 1407374883553282, "label": "node", "properties": {"content": "b"}}::vertex
 (2 rows)
 
 -- prepend a node 'c'
@@ -741,9 +741,9 @@ SELECT prepend_node('list01', 'c');
 SELECT * FROM show_list_use_vle('list01');
                                        node
 -----------------------------------------------------------------------------------
- {"id": 1407374883553283, "label": "node", "properties": {"content": "c"}}::vertex
- {"id": 1407374883553282, "label": "node", "properties": {"content": "b"}}::vertex
  {"id": 1407374883553281, "label": "node", "properties": {"content": "a"}}::vertex
+ {"id": 1407374883553282, "label": "node", "properties": {"content": "b"}}::vertex
+ {"id": 1407374883553283, "label": "node", "properties": {"content": "c"}}::vertex
 (3 rows)
 
 DROP FUNCTION show_list_use_vle;