
Commit 4c87ddf

Richard Guo authored and kongfanshen committed
Support "Right Semi Join" plan shapes
Hash joins can support semijoin with the LHS input on the right, using the existing logic for inner join, combined with the assurance that only the first match for each inner tuple is considered, which can be achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very useful in some cases since we may now have the option to hash the smaller table instead of the larger.

Merge join could likely support "Right Semi Join" too. However, the benefit of swapping inputs tends to be small here, so we do not address that in this patch.

Note that this patch also modifies a test query in join.sql to ensure it continues testing as intended. With this patch the original query would result in a right-semi-join rather than semi-join, compromising its original purpose of testing the fix for neqjoinsel's behavior for semi-joins.

Author: Richard Guo
Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li
Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com
(cherry picked from commit aa86129)
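A minimal C sketch of the equivalence described above, under stated assumptions: rows are plain `int` keys, the join qual is key equality, and `semi_join` and `right_semi_join` are hypothetical helpers, not PostgreSQL functions. The semi join keeps each LHS row that has at least one RHS match. The right semi join swaps the inputs, scans the RHS as the outer side, and emits an LHS (inner) row only on its first match, tracked by a per-row flag standing in for `HEAP_TUPLE_HAS_MATCH`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: rows are int keys, join qual is key equality. */

/* SEMI: outer = lhs. Emit each lhs row that has at least one match. */
static size_t
semi_join(const int *lhs, size_t nl, const int *rhs, size_t nr, int *out)
{
    size_t      nout = 0;

    for (size_t i = 0; i < nl; i++)
        for (size_t j = 0; j < nr; j++)
            if (lhs[i] == rhs[j])
            {
                out[nout++] = lhs[i];
                break;          /* one match is enough */
            }
    return nout;
}

/*
 * RIGHT_SEMI with inputs swapped: outer = rhs, inner = lhs. Every inner
 * row is checked for each outer row, but an inner row is emitted only the
 * first time it matches, tracked by matched[] (the role of the match flag).
 */
static size_t
right_semi_join(const int *outer, size_t no, const int *inner, size_t ni,
                bool *matched, int *out)
{
    size_t      nout = 0;

    for (size_t j = 0; j < ni; j++)
        matched[j] = false;

    for (size_t i = 0; i < no; i++)
        for (size_t j = 0; j < ni; j++)
            if (!matched[j] && outer[i] == inner[j])
            {
                matched[j] = true;
                out[nout++] = inner[j];
            }
    return nout;
}
```

With LHS `{1, 2, 3, 4, 2}` and RHS `{2, 2, 4, 4, 4, 5}`, `semi_join` yields `2, 4, 2` and `right_semi_join(rhs, 6, lhs, 5, ...)` yields `2, 2, 4`: the same rows, in outer-scan order. Duplicate LHS rows survive in both, because each inner row carries its own flag.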
1 parent c9dc773 commit 4c87ddf

9 files changed

Lines changed: 75 additions & 29 deletions


src/backend/commands/explain.c

Lines changed: 3 additions & 0 deletions
@@ -2183,6 +2183,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 case JOIN_LASJ_NOTIN:
     jointype = "Left Anti Semi (Not-In)";
     break;
+case JOIN_RIGHT_SEMI:
+    jointype = "Right Semi";
+    break;
 case JOIN_RIGHT_ANTI:
     jointype = "Right Anti";
     break;
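The new case only gives the join type a label for EXPLAIN output. A sketch of that mapping follows; `SketchJoinType` is a hypothetical trimmed-down enum, not the real `JoinType`, and the `"???"` fallback is this sketch's own choice for unrecognized values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of join types, for illustration only. */
typedef enum SketchJoinType
{
    SK_JOIN_INNER,
    SK_JOIN_SEMI,
    SK_JOIN_LASJ_NOTIN,
    SK_JOIN_RIGHT_SEMI,
    SK_JOIN_RIGHT_ANTI
} SketchJoinType;

/* Map a join type to the label EXPLAIN prints for it. */
static const char *
join_type_label(SketchJoinType jt)
{
    switch (jt)
    {
        case SK_JOIN_INNER:
            return "Inner";
        case SK_JOIN_SEMI:
            return "Semi";
        case SK_JOIN_LASJ_NOTIN:
            return "Left Anti Semi (Not-In)";
        case SK_JOIN_RIGHT_SEMI:
            return "Right Semi";
        case SK_JOIN_RIGHT_ANTI:
            return "Right Anti";
    }
    return "???";               /* unrecognized value */
}
```

For example, `strcmp(join_type_label(SK_JOIN_RIGHT_SEMI), "Right Semi") == 0` holds. In a text-format plan, a hash join of this type is shown as `Hash Right Semi Join`.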

src/backend/executor/nodeHashjoin.c

Lines changed: 12 additions & 3 deletions
@@ -688,6 +688,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
     }
 }
 
+/*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+if (node->js.jointype == JOIN_RIGHT_SEMI &&
+    HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+    continue;
+
 /*
  * We've got a match, but still need to test non-hashed quals.
  * ExecScanHashBucket already set up all the state needed to
@@ -704,10 +712,10 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
 {
     node->hj_MatchedOuter = true;
 
-
     /*
-     * This is really only needed if HJ_FILL_INNER(node), but
-     * we'll avoid the branch and just set it always.
+     * This is really only needed if HJ_FILL_INNER(node) or if
+     * we are in a right-semijoin, but we'll avoid the branch
+     * and just set it always.
      */
     if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
         HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -1024,6 +1032,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 {
     case JOIN_INNER:
     case JOIN_SEMI:
+    case JOIN_RIGHT_SEMI:
         break;
     case JOIN_LEFT:
     case JOIN_ANTI:
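The two ExecHashJoinImpl changes work together: a bucket entry that already carries the match flag is skipped before the non-hashed quals are evaluated, and the flag is set only once those quals pass. The sketch below models that probe loop in plain C. `InnerTuple`, `HashTable`, `probe_right_semi`, the fixed bucket count, and the `<>` condition standing in for a non-hashed qual are all hypothetical, and a `bool` field replaces the real `HEAP_TUPLE_HAS_MATCH` header bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 8              /* hypothetical fixed bucket count */

typedef struct InnerTuple
{
    int         key;            /* hashed join key */
    int         payload;        /* column tested by the non-hashed qual */
    bool        has_match;      /* stands in for HEAP_TUPLE_HAS_MATCH */
    struct InnerTuple *next;    /* next entry in the same bucket */
} InnerTuple;

typedef struct HashTable
{
    InnerTuple *buckets[NBUCKETS];
} HashTable;

static unsigned
bucket_of(int key)
{
    return (unsigned) key % NBUCKETS;
}

/* Build phase: push an inner tuple onto its bucket chain, unmatched. */
static void
ht_insert(HashTable *ht, InnerTuple *tup)
{
    unsigned    b = bucket_of(tup->key);

    tup->has_match = false;
    tup->next = ht->buckets[b];
    ht->buckets[b] = tup;
}

/*
 * Probe phase for one outer row. Appends to out[] each inner tuple that
 * matches for the first time; the non-hashed qual is the hypothetical
 * "outer.payload <> inner.payload".
 */
static void
probe_right_semi(HashTable *ht, int okey, int opayload,
                 const InnerTuple **out, size_t *nout)
{
    for (InnerTuple *t = ht->buckets[bucket_of(okey)]; t != NULL; t = t->next)
    {
        if (t->key != okey)
            continue;           /* same bucket, different key */

        /* Right semi join: only the first match per inner tuple counts. */
        if (t->has_match)
            continue;

        /* The non-hashed qual must pass before this counts as a match. */
        if (opayload == t->payload)
            continue;

        t->has_match = true;    /* set only after the qual has passed */
        out[(*nout)++] = t;
    }
}
```

Because the flag is set only after the `<>` qual passes, an inner tuple whose first key match fails that qual remains eligible for a later outer row; marking it earlier would silently drop it from the result.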

src/backend/optimizer/path/joinpath.c

Lines changed: 30 additions & 12 deletions
@@ -338,8 +338,8 @@ add_paths_to_joinrel(PlannerInfo *root,
  * sorted. This includes both nestloops and mergejoins where the outer
  * path is already ordered. Again, skip this if we can't mergejoin.
  * (That's okay because we know that nestloop can't handle
- * right/right-anti/full joins at all, so it wouldn't work in the
- * prohibited cases either.)
+ * right/right-anti/right-semi/full joins at all, so it wouldn't work in
+ * the prohibited cases either.)
  */
 if (mergejoin_allowed)
     match_unsorted_outer(root, joinrel, outerrel, innerrel,
@@ -1897,6 +1897,13 @@ match_unsorted_outer(PlannerInfo *root,
 if (jointype == JOIN_DEDUP_SEMI || jointype == JOIN_DEDUP_SEMI_REVERSE)
     jointype = JOIN_INNER;
 
+/*
+ * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+ * join.
+ */
+if (jointype == JOIN_RIGHT_SEMI)
+    return;
+
 /*
  * Nestloop only supports inner, left, semi, and anti joins. Also, if we
  * are doing a right, right-anti or full mergejoin, we must use *all* the
@@ -2501,12 +2508,13 @@ hash_inner_and_outer(PlannerInfo *root,
  * total inner path will also be parallel-safe, but if not, we'll
  * have to search for the cheapest safe, unparameterized inner
  * path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-semi or right-anti join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
  */
 if (save_jointype == JOIN_FULL ||
     save_jointype == JOIN_RIGHT ||
+    save_jointype == JOIN_RIGHT_SEMI ||
     save_jointype == JOIN_RIGHT_ANTI)
     cheapest_safe_inner = NULL;
 else if (cheapest_total_inner->parallel_safe)
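The reason given in this comment, that no one process has all the match bits, can be seen in a toy model. Assume (hypothetically) that each parallel worker builds its own copy of the inner side, so each copy carries its own match flags, and that the outer rows are split between workers. In this C sketch a nested-loop probe stands in for a real hash table, and `worker_right_semi` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NINNER 3                /* hypothetical inner-side size */

/*
 * One hypothetical worker: it holds a private copy of the inner side, and
 * therefore private match flags, and probes only its share of the outer
 * rows. Returns how many inner rows it emits for a right semi join.
 */
static size_t
worker_right_semi(const int inner[NINNER], const int *outer_part,
                  size_t nouter)
{
    bool        matched[NINNER] = {false};  /* private to this worker */
    size_t      emitted = 0;

    for (size_t i = 0; i < nouter; i++)
        for (size_t j = 0; j < NINNER; j++)
            if (!matched[j] && inner[j] == outer_part[i])
            {
                matched[j] = true;
                emitted++;
            }
    return emitted;
}
```

With inner rows `{1, 2, 3}` and outer rows `{2, 3, 2, 3}`, a single process emits 2 rows, but two workers that each see half of the outer rows emit 4 in total, because inner rows 2 and 3 are each emitted once per worker. Per the comment, this is why full, right, right-semi and right-anti joins set `cheapest_safe_inner` to NULL and skip that form of parallelism.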
@@ -2533,13 +2541,13 @@ hash_inner_and_outer(PlannerInfo *root,
  * Returns a list of RestrictInfo nodes for those clauses.
  *
  * *mergejoin_allowed is normally set to true, but it is set to false if
- * this is a right/right-anti/full join and there are nonmergejoinable join
- * clauses. The executor's mergejoin machinery cannot handle such cases, so
- * we have to avoid generating a mergejoin plan. (Note that this flag does
- * NOT consider whether there are actually any mergejoinable clauses. This is
- * correct because in some cases we need to build a clauseless mergejoin.
- * Simply returning NIL is therefore not enough to distinguish safe from
- * unsafe cases.)
+ * this is a right-semi join, or this is a right/right-anti/full join and
+ * there are nonmergejoinable join clauses. The executor's mergejoin
+ * machinery cannot handle such cases, so we have to avoid generating a
+ * mergejoin plan. (Note that this flag does NOT consider whether there are
+ * actually any mergejoinable clauses. This is correct because in some
+ * cases we need to build a clauseless mergejoin. Simply returning NIL is
+ * therefore not enough to distinguish safe from unsafe cases.)
  *
  * We also mark each selected RestrictInfo to show which side is currently
  * being considered as outer. These are transient markings that are only
@@ -2563,6 +2571,16 @@ select_mergejoin_clauses(PlannerInfo *root,
 bool have_nonmergeable_joinclause = false;
 ListCell *l;
 
+/*
+ * For now we do not support RIGHT_SEMI join in mergejoin: the benefit of
+ * swapping inputs tends to be small here.
+ */
+if (jointype == JOIN_RIGHT_SEMI)
+{
+    *mergejoin_allowed = false;
+    return NIL;
+}
+
 foreach(l, restrictlist)
 {
     RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
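After this change, the rule in the `select_mergejoin_clauses` header comment can be restated as a small predicate. The sketch uses a hypothetical enum subset (`SkJoinType`) and a hypothetical function name; it mirrors the comment's wording rather than reproducing `select_mergejoin_clauses`, which also builds and returns the clause list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of join types relevant to the rule. */
typedef enum SkJoinType
{
    SK_INNER,
    SK_LEFT,
    SK_FULL,
    SK_RIGHT,
    SK_SEMI,
    SK_RIGHT_SEMI,
    SK_RIGHT_ANTI
} SkJoinType;

/*
 * Mergejoin is disallowed for any right-semi join, and for right,
 * right-anti and full joins that have a non-mergejoinable join clause.
 */
static bool
mergejoin_allowed_for(SkJoinType jt, bool have_nonmergeable_joinclause)
{
    if (jt == SK_RIGHT_SEMI)
        return false;
    if (have_nonmergeable_joinclause &&
        (jt == SK_RIGHT || jt == SK_RIGHT_ANTI || jt == SK_FULL))
        return false;
    return true;
}
```

Note the asymmetry: a right-semi join is excluded even when every clause is mergejoinable, matching the early `return NIL` added to `select_mergejoin_clauses`.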

src/backend/optimizer/path/joinrels.c

Lines changed: 12 additions & 0 deletions
@@ -1128,6 +1128,18 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
                      JOIN_SEMI, sjinfo,
                      restrictlist);
 
+/*
+ * Also consider "Right Semi Join" plan shapes, with the inputs
+ * swapped so that the LHS (rel1) becomes the hash/build side.
+ * This lets us hash the smaller table. Like JOIN_SEMI above,
+ * this does not add a UniquePath, so (unlike the GPDB
+ * JOIN_DEDUP_SEMI paths below) it is safe to consider
+ * unconditionally.
+ */
+add_paths_to_joinrel(root, joinrel, rel2, rel1,
+                     JOIN_RIGHT_SEMI, sjinfo,
+                     restrictlist);
+
 if (root->upd_del_replicated_table > 0 &&
     (bms_is_member(root->upd_del_replicated_table, rel1->relids) ||
      bms_is_member(root->upd_del_replicated_table, rel2->relids)))
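Adding the swapped `JOIN_RIGHT_SEMI` paths next to the `JOIN_SEMI` ones gives cost-based path selection a choice of which side to hash. The toy model below makes that concrete; the cost formula, its weights, and the names `toy_hash_cost` and `cheaper_shape` are invented for illustration and are not the planner's cost model.

```c
#include <assert.h>

typedef enum SkSemiShape
{
    SK_SEMI,                    /* outer = rel1, hash rel2 */
    SK_RIGHT_SEMI               /* outer = rel2, hash rel1 */
} SkSemiShape;

/* Toy cost: building the hash table costs more per row than probing it. */
static double
toy_hash_cost(double build_rows, double probe_rows)
{
    return 2.0 * build_rows + 1.0 * probe_rows;
}

/* rel1 is the semijoin's LHS; rel2 is its RHS (e.g. the EXISTS subquery). */
static SkSemiShape
cheaper_shape(double rel1_rows, double rel2_rows)
{
    double      semi = toy_hash_cost(rel2_rows, rel1_rows);
    double      right_semi = toy_hash_cost(rel1_rows, rel2_rows);

    return (right_semi < semi) ? SK_RIGHT_SEMI : SK_SEMI;
}
```

Under these made-up weights, a 100-row LHS semijoined against a 10,000-row subquery favors the right-semi shape (cost 10,200 versus 20,100), because the small LHS becomes the build side.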

src/backend/optimizer/path/pathkeys.c

Lines changed: 3 additions & 0 deletions
@@ -1425,6 +1425,9 @@ build_join_pathkeys(PlannerInfo *root,
                    JoinType jointype,
                    List *outer_pathkeys)
 {
+    /* RIGHT_SEMI should not come here */
+    Assert(jointype != JOIN_RIGHT_SEMI);
+
     if (jointype == JOIN_FULL ||
         jointype == JOIN_RIGHT ||
         jointype == JOIN_RIGHT_ANTI)

src/backend/optimizer/prep/prepjointree.c

Lines changed: 3 additions & 3 deletions
@@ -430,8 +430,8 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
  * point of the available_rels machinations is to ensure that we only
  * pull up quals for which that's okay.
  *
- * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI, or
- * JOIN_RIGHT_ANTI jointypes here.
+ * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI,
+ * JOIN_RIGHT_SEMI, or JOIN_RIGHT_ANTI jointypes here.
  */
 switch (j->jointype)
 {
@@ -3277,7 +3277,7 @@ reduce_outer_joins_pass2(Node *jtnode,
      * These could only have been introduced by pull_up_sublinks,
      * so there's no way that upper quals could refer to their
      * righthand sides, and no point in checking. We don't expect
-     * to see JOIN_RIGHT_ANTI yet.
+     * to see JOIN_RIGHT_SEMI or JOIN_RIGHT_ANTI yet.
      */
     break;
 default:

src/include/nodes/nodes.h

Lines changed: 5 additions & 4 deletions
@@ -1026,6 +1026,7 @@ typedef enum JoinType
 JOIN_LASJ_NOTIN,  /* Left Anti Semi Join with Not-In semantics:
                      If any NULL values are produced by inner side,
                      return no join results. Otherwise, same as LASJ */
+JOIN_RIGHT_SEMI,  /* 1 copy of each RHS row that has match(es) */
 JOIN_RIGHT_ANTI,  /* 1 copy of each RHS row that has no match */
 
 /*
@@ -1054,10 +1055,10 @@ typedef enum JoinType
 
 /*
  * OUTER joins are those for which pushed-down quals must behave differently
- * from the join's own quals. This is in fact everything except INNER and
- * SEMI joins. However, this macro must also exclude the JOIN_UNIQUE symbols
- * since those are temporary proxies for what will eventually be an INNER
- * join.
+ * from the join's own quals. This is in fact everything except INNER, SEMI
+ * and RIGHT_SEMI joins. However, this macro must also exclude the
+ * JOIN_UNIQUE symbols since those are temporary proxies for what will
+ * eventually be an INNER join.
  *
  * Note: semijoins are a hybrid case, but we choose to treat them as not
  * being outer joins. This is okay principally because the SQL syntax makes
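The comment defines outer joins by exclusion. A bitmask over enum values is one compact way to encode such a membership test in C; the sketch applies it to a hypothetical subset of join types (`SkJoinType`, `SK_IS_OUTER_JOIN`) and follows the rule exactly as the comment states it, including treating `JOIN_LASJ_NOTIN` as outer since the comment lists no exception for it. The real macro's member list in this tree is not shown in the diff.

```c
#include <assert.h>

/* Hypothetical subset of join types; member order is illustrative only. */
typedef enum SkJoinType
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_FULL,
    SK_JOIN_RIGHT,
    SK_JOIN_SEMI,
    SK_JOIN_ANTI,
    SK_JOIN_LASJ_NOTIN,
    SK_JOIN_RIGHT_SEMI,
    SK_JOIN_RIGHT_ANTI,
    SK_JOIN_UNIQUE_OUTER,
    SK_JOIN_UNIQUE_INNER
} SkJoinType;

/* Every value must fit in a 32-bit unsigned mask. */
static_assert(SK_JOIN_UNIQUE_INNER < 32, "join types must fit in the bitmask");

/*
 * Outer join = every type except INNER, SEMI, RIGHT_SEMI and the
 * JOIN_UNIQUE proxies, which are simply left out of the mask.
 */
#define SK_IS_OUTER_JOIN(jt) \
    (((1u << (jt)) & \
      ((1u << SK_JOIN_LEFT) | \
       (1u << SK_JOIN_FULL) | \
       (1u << SK_JOIN_RIGHT) | \
       (1u << SK_JOIN_ANTI) | \
       (1u << SK_JOIN_LASJ_NOTIN) | \
       (1u << SK_JOIN_RIGHT_ANTI))) != 0)
```

Adding a new non-outer type such as a right semi join then needs no change to the mask at all, which is why the nodes.h change in this commit only touches the comment.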

src/include/nodes/pathnodes.h

Lines changed: 3 additions & 3 deletions
@@ -3275,9 +3275,9 @@ typedef struct PlaceHolderVar
  * min_lefthand and min_righthand for higher joins.)
  *
  * jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching
- * the inputs to make it a LEFT JOIN. It's never JOIN_RIGHT_ANTI either.
- * So the allowed values of jointype in a join_info_list member are only
- * LEFT, FULL, SEMI, or ANTI.
+ * the inputs to make it a LEFT JOIN. It's never JOIN_RIGHT_SEMI or
+ * JOIN_RIGHT_ANTI either. So the allowed values of jointype in a
+ * join_info_list member are only LEFT, FULL, SEMI, or ANTI.
  *
  * ojrelid is the RT index of the join RTE representing this outer join,
  * if there is one. It is zero when jointype is INNER or SEMI, and can be
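The comment relies on a RIGHT JOIN being recorded as a LEFT JOIN with its inputs switched. A minimal sketch of that normalization follows, with a hypothetical `SkJoin` struct whose inputs are reduced to small integer ids; it illustrates the idea only and is not the planner's code. `JOIN_RIGHT_SEMI` never appears in `join_info_list` either: in this commit it is produced only when join paths are generated with the inputs swapped, via the `add_paths_to_joinrel(root, joinrel, rel2, rel1, JOIN_RIGHT_SEMI, ...)` call in joinrels.c.

```c
#include <assert.h>

typedef enum SkJoinType
{
    SK_LEFT,
    SK_RIGHT,
    SK_FULL,
    SK_SEMI,
    SK_ANTI
} SkJoinType;

/* Hypothetical join description; inputs are reduced to small integer ids. */
typedef struct SkJoin
{
    SkJoinType  jointype;
    int         lhs;
    int         rhs;
} SkJoin;

/* Record "A RIGHT JOIN B" as "B LEFT JOIN A"; other join types pass through. */
static SkJoin
normalize_right_join(SkJoin j)
{
    if (j.jointype == SK_RIGHT)
    {
        int         tmp = j.lhs;

        j.lhs = j.rhs;
        j.rhs = tmp;
        j.jointype = SK_LEFT;
    }
    /* After normalization, RIGHT never survives. */
    assert(j.jointype != SK_RIGHT);
    return j;
}
```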

src/test/regress/sql/join.sql

Lines changed: 4 additions & 4 deletions
@@ -214,10 +214,10 @@ SELECT *
 -- semijoin selectivity for <>
 --
 explain (costs off)
-select * from int4_tbl i4, tenk1 a
-where exists(select * from tenk1 b
-             where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
-      and i4.f1 = a.tenthous;
+select * from tenk1 a, tenk1 b
+where exists(select * from tenk1 c
+             where b.twothousand = c.twothousand and b.fivethous <> c.fivethous)
+      and a.tenthous = b.tenthous and a.tenthous < 5000;
 
 
 --
