Skip to content

[Enhancement] Support bare-field join criteria join on <field> shorthand#5517

Merged
Swiddis merged 2 commits into
opensearch-project:mainfrom
Hanyu-W:ppl-join-shorthand
Jun 11, 2026
Merged

[Enhancement] Support bare-field join criteria join on <field> shorthand#5517
Swiddis merged 2 commits into
opensearch-project:mainfrom
Hanyu-W:ppl-join-shorthand

Conversation

@Hanyu-W

@Hanyu-W Hanyu-W commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Description

Lets join on/where criteria accept a bare field name (or an AND chain of bare fields) as shorthand for the qualified equality on that common field. Before this change, the only way to join on a shared column was to spell out both qualifiers:

source=t1 | inner join on a t2          == on t1.a = t2.a
source=t1 | inner join on a AND b t2     == on t1.a = t2.a AND t1.b = t2.b
source=t1 | join on a t2                 == on t1.a = t2.a   (no prefix/alias)

The shorthand behaves exactly like the explicit on l.f = r.f it sugars: it keeps both key columns (the right key surfaced as <rightAlias>.f, or <rightTable>.f when no alias is given). This is intentionally different from the field-list join join f t2, which merges the duplicate key into a single column.

Two things made this more than a one-line grammar tweak, and shaped the design:

  1. Grammar ambiguity. on is a keyword that can also be an identifier, so the no-prefix form join on a t2 was greedily consumed by the field-list alternative — on parsed as a field name, failing with field [on] not found. Reordering the joinCommand alternatives so the criteria alternative is listed first makes ANTLR ALL(*) prefer reading on as the criteria keyword on genuine ambiguity, with zero change to the field-list form's behavior.
  2. Qualifier resolution. A bare on a carries no table qualifier, but the keep-both planner branch needs left.a = right.a. Resolution is done in the planner (CalciteRelNodeVisitor.visitJoin), where both inputs are already on the RelBuilder stack: each bare field is resolved by stack position (LEFT.f = RIGHT.f) via the existing buildJoinConditionByFieldName helper. The unresolved AST keeps the criteria exactly as the user wrote it (on a), so explain/anonymizer reflect the original query. Because matching is positional rather than by qualifier, a self-join join on f t is a genuine cross-scan equi-join (=($7, $15)), not the f = f tautology a name-based rewrite would collapse to.

The keep-both column-rename logic is unchanged — it works off the output row types, independent of how the join condition was built. A mixed condition that isn't all bare fields (e.g. on a AND b.x = c.y) is detected as not-bare and falls through to the normal analyzeJoinCondition path untouched.

Resolves #5484.

Grammar precedence change

This PR reorders the two joinCommand alternatives so the criteria alternative is listed first:

joinCommand
   // Criteria alt listed first - so `join on a` reads `on` as the criteria keyword, not a field.
   : sqlLikeJoinType? JOIN (joinOption)* sideAlias joinHintList? joinCriteria right = tableOrSubqueryClause
   | JOIN (joinOption)* (fieldList)? right = tableOrSubqueryClause
   ;

This is broader than the new shorthand. ANTLR's ALL(*) picks the first-listed alternative on genuine ambiguity, so the reorder changes the parse path of all join queries, not just the new bare-field form. Without it, the no-prefix form join on a <right> would let the field-list alternative greedily consume on as a field name instead of as the criteria keyword.

Existing behavior was re-verified, not assumed. Two guard tests pin the unchanged paths:

  • testJoinFieldListStillParsesAsFieldListsource=t1 | join a t2 still parses as a field-list join (getJoinFields() == [a], no condition).
  • testJoinPrefixWithoutCriteriaKeywordIsSyntaxErrorsource=t1 | inner join a t2 (a prefix with a bare field but no on/where) is still a SyntaxCheckException.

Edge case (verified). ON and WHERE are both reachable as bare-field identifiers via keywordsCanBeId, and joinCriteria: (ON | WHERE) logicalExpression treats them identically. So a field literally named on or where in the no-prefix form (join on t2 / join where t2) shifts from a field-list parse to a criteria parse. This is acceptable: on/where are keyword-like, and their use as bare field-list identifiers in a join prefix was already ambiguous/undocumented.

Changes

Area File Change
Grammar ppl/src/main/antlr/OpenSearchPPLParser.g4 Reorder joinCommand so the criteria alternative precedes the field-list alternative, so join on a reads on as the criteria keyword instead of a field
Parser ppl/.../ppl/parser/AstBuilder.java Leave a bare single-part field (or AND chain) as the join condition rather than treating it as a field list — the unresolved AST reflects what the user wrote
Planner core/.../calcite/CalciteRelNodeVisitor.java visitJoin criteria branch detects a bare-field condition and builds the equi-join by stack position (LEFT.f = RIGHT.f) via the existing buildJoinConditionByFieldName; otherwise falls through to analyzeJoinCondition unchanged
Util core/.../calcite/utils/JoinAndLookupUtils.java collectBareFields detects a bare single-part field / AND-chain and returns Optional<List<String>> (the name list is assembled only when every node is a bare field)
Docs docs/user/ppl/cmd/join.md Shorthand examples in both syntax sections, <joinCriteria> parameter description, and the field-dedup limitation note
Tests ppl/.../calcite/CalcitePPLJoinTest.java, ppl/.../parser/AstBuilderTest.java, ppl/.../utils/PPLQueryDataAnonymizerTest.java, core/.../calcite/utils/JoinAndLookupUtilsTest.java Plan-shape, AST-parse, anonymizer, and bare-field-detector coverage
QA (IT) integ-test/.../calcite/remote/CalcitePPLJoinIT.java End-to-end coverage on a live cluster

Testing

CalcitePPLJoinIT on a live cluster, exercising the new shorthand against pre-change behavior:

Test Exercises Before After
testJoinWithImplicitField inner join on name — keep-both; asserts shorthand output is identical to explicit on l.name = r.name; result asserted in sort order via verifyDataRowsInOrder
testJoinNoPrefixImplicitField join on name — no prefix; on parsed as the criteria keyword, identical to the explicit form (not the field-list merge)
testLeftJoinWithImplicitField left join on name — unmatched-left rows keep the left key (no NULL key; keep-both does not coalesce)
testAliasedSelfJoinWithImplicitField self-join on a unique key — a real equi-join (8 rows), not the 64-row cross product a tautology would give

Before = the same query run against pre-change code (parse/plan error — the bare field was read as a field name, e.g. field [on] not found).

Unit suites (all green, run locally):

  • CalcitePPLJoinTest: 57 — plan shape for single/multi-field, alias, where, no-prefix, left/semi/anti, self-join (real equi-join), max=N dedup wrapper, mixed-condition fall-through, and not-on-both-sides cases.
  • AstBuilderTest: 158 — rewrite-vs-passthrough decisions (bare field kept; qualified/comparison criteria untouched; field-list still parses as a field list; prefix-without-criteria is a syntax error).
  • PPLQueryDataAnonymizerTest: 119 — shorthand anonymizes correctly (renders on identifier).
  • JoinAndLookupUtilsTest: 5collectBareFields over single field, AND-chain, qualified field, mixed condition, and the no-partial-state case.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 19e6693)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The collectBareFields method does not prevent duplicate field names from being added to the out list. If the join condition is on a AND a, the list will contain ["a", "a"], which may lead to duplicate join conditions like a = a AND a = a. This could cause unexpected behavior or inefficiency in the generated query plan.

static boolean collectBareFields(UnresolvedExpression expr, List<String> out) {
  if (expr instanceof And and) {
    return collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
  }
  if (expr instanceof Field field
      && field.getField() instanceof QualifiedName qn
      && qn.getParts().size() == 1) {
    out.add(qn.getParts().get(0));
    return true;
  }
  return false;
}
Possible Issue

When bareFields is empty after collectBareFields returns true, the reduce operation produces Optional.empty(), causing orElse(context.relBuilder.literal(true)) to generate a tautology join condition. This can occur if the input expression is a degenerate AND chain with no actual fields. The result is a cross join instead of the intended equi-join, potentially causing a massive performance issue or incorrect results.

bareFields.stream()
    .map(f -> buildJoinConditionByFieldName(context, f))
    .reduce(context.rexBuilder::and)
    .orElse(context.relBuilder.literal(true));

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 19e6693
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent partial list modification on failure

The method modifies the out list even when returning false, which can leave it in an
inconsistent state. If the left side of an And succeeds but the right side fails,
the list will contain partial results. Clear the list or use a temporary collection
when the method returns false.

core/src/main/java/org/opensearch/sql/calcite/utils/JoinAndLookupUtils.java [49-60]

 static boolean collectBareFields(UnresolvedExpression expr, List<String> out) {
   if (expr instanceof And and) {
-    return collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
+    int sizeBefore = out.size();
+    if (!collectBareFields(and.getLeft(), out)) {
+      out.subList(sizeBefore, out.size()).clear();
+      return false;
+    }
+    if (!collectBareFields(and.getRight(), out)) {
+      out.subList(sizeBefore, out.size()).clear();
+      return false;
+    }
+    return true;
   }
   if (expr instanceof Field field
       && field.getField() instanceof QualifiedName qn
       && qn.getParts().size() == 1) {
     out.add(qn.getParts().get(0));
     return true;
   }
   return false;
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a critical issue where the out list can be left in an inconsistent state if collectBareFields returns false after partially populating it. The improved code properly cleans up partial modifications by tracking the size before recursion and clearing added elements on failure.

Medium
Prevent empty field list cross join

The orElse(context.relBuilder.literal(true)) fallback creates a tautology when
bareFields is empty, which would produce a cross join instead of failing. Since the
grammar requires at least one field in the join criteria, an empty list should be
treated as an error rather than silently creating an unintended cross join.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1893-1901]

 List<String> bareFields = new ArrayList<>();
 if (JoinAndLookupUtils.collectBareFields(node.getJoinCondition().get(), bareFields)) {
   ...
+  if (bareFields.isEmpty()) {
+    throw new IllegalStateException("Join criteria must contain at least one field");
+  }
   joinCondition =
       bareFields.stream()
           .map(f -> buildJoinConditionByFieldName(context, f))
           .reduce(context.rexBuilder::and)
-          .orElse(context.relBuilder.literal(true));
+          .get();
Suggestion importance[1-10]: 7

__

Why: The suggestion identifies a potential issue where an empty bareFields list would create a tautology condition. However, the grammar and collectBareFields logic should prevent this scenario from occurring in practice, making this more of a defensive check than a critical bug fix.

Medium

Previous suggestions

Suggestions up to commit 374d412
CategorySuggestion                                                                                                                                    Impact
Possible issue
Clear partial results on validation failure

The method returns false when encountering non-bare-field expressions (like
comparisons or OR), but doesn't validate that the accumulated out list is non-empty
before returning. If a mixed expression like a AND (b = c) is encountered, the
method will return false after partially populating out, leaving it in an
inconsistent state that could cause incorrect behavior in the caller.

core/src/main/java/org/opensearch/sql/ast/tree/Join.java [108-120]

 private static boolean collectBareFields(UnresolvedExpression expr, List<String> out) {
+  int initialSize = out.size();
   // AND is left-recursive (a AND b == And(a,b)); recurse both sides to flatten.
   if (expr instanceof And and) {
-    return collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
+    boolean result = collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
+    if (!result) {
+      out.subList(initialSize, out.size()).clear();
+    }
+    return result;
   }
   if (expr instanceof Field field
       && field.getField() instanceof QualifiedName qn
       && qn.getParts().size() == 1) {
     out.add(qn.getParts().get(0));
     return true;
   }
   return false;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential issue where out could be left in an inconsistent state if a mixed expression is encountered. Clearing partial results on validation failure ensures the list remains consistent, improving robustness.

Medium
General
Prevent stack overflow in recursion

The recursive traversal lacks depth protection, which could cause a stack overflow
if the plan tree is deeply nested or contains cycles. Although cycles are unlikely
in a well-formed AST, adding a depth limit provides a safety guard against malformed
input or implementation bugs.

core/src/main/java/org/opensearch/sql/ast/tree/Join.java [155-164]

 private static Relation findRelation(UnresolvedPlan plan) {
+  return findRelation(plan, 0);
+}
+
+private static Relation findRelation(UnresolvedPlan plan, int depth) {
+  if (depth > 100) {
+    throw new SemanticCheckException("join plan tree exceeds maximum depth");
+  }
   if (plan instanceof Relation relation) {
     return relation;
   }
   List<? extends Node> children = plan.getChild();
   if (children.size() == 1 && children.get(0) instanceof UnresolvedPlan child) {
-    return findRelation(child);
+    return findRelation(child, depth + 1);
   }
   return null;
 }
Suggestion importance[1-10]: 6

__

Why: Adding a depth limit to the recursive findRelation method is a reasonable safeguard against stack overflow from deeply nested or malformed plan trees. While unlikely in practice, this defensive measure improves robustness with minimal overhead.

Low
Add explicit empty list validation

The orElseThrow will never be reached because rewriteBareFieldCriteria calls this
method only after collectBareFields returns true, which guarantees at least one
field in the list. However, if the list is empty due to a logic error elsewhere, the
exception message won't be thrown, making debugging harder. Add an explicit
precondition check at the method entry.

core/src/main/java/org/opensearch/sql/ast/tree/Join.java [123-136]

 private static UnresolvedExpression buildQualifiedEqualityConjunction(
     String leftQual, String rightQual, List<String> fieldNames) {
+  if (fieldNames.isEmpty()) {
+    throw new SemanticCheckException("join 'on' shorthand requires at least one field");
+  }
   return fieldNames.stream()
       .map(
           f ->
               (UnresolvedExpression)
                   new Compare(
                       "=",
                       new Field(QualifiedName.of(leftQual, f)),
                       new Field(QualifiedName.of(rightQual, f))))
       .reduce((l, r) -> new And(l, r))
-      .orElseThrow(
-          () -> new SemanticCheckException("join 'on' shorthand requires at least one field"));
+      .get();
Suggestion importance[1-10]: 5

__

Why: While the orElseThrow is indeed unreachable given the current call path, adding an explicit precondition check at method entry improves defensive programming and makes the code more maintainable. The suggestion correctly replaces .orElseThrow() with .get() after the check.

Low

@RyanL1997 RyanL1997 added bugFix enhancement New feature or request and removed bugFix labels Jun 4, 2026
@RyanL1997 RyanL1997 changed the title feat(ppl): support bare-field join criteria join on <field> shorthand [Enhancement] Support bare-field join criteria join on <field> shorthand Jun 4, 2026
new Field(QualifiedName.of(rightQual, f))))
.reduce((l, r) -> new And(l, r))
.orElseThrow(
() -> new SemanticCheckException("join 'on' shorthand requires at least one field"));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the only error case that can throw here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were two (this one, and an empty-list guard above it). Both are removed in the redesign, so neither throws anymore.

private static String qualifierOf(UnresolvedPlan plan, String side) {
Relation relation = findRelation(plan);
if (relation == null) {
throw new SemanticCheckException(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice detail on the error message!

consider using the ReportBuilder to split out suggestions from the problem

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That error path is going away — once matching is done by position in the planner, there's no "must be a named table" restriction, so the exception it was attached to is deleted.

if (rightAlias.isEmpty() && this.right instanceof SubqueryAlias alias) {
rightAlias = Optional.of(alias.getAlias());
}
rewriteBareFieldCriteria();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is putting it here the right boundary over resolving these fields in the constructor (or even frontloading it to the AST builder)?

Putting the rewrite here introduces a weird boundary, where now Join plays the role of dynamically parsing itself during attachment based directly on AST grammar choices. That multiple places comment "we actually resolve these fields in attach()" also hints that this is unintuitive.

It might be clearer to put this in the constructor. Alternatively, it's a little more work, but we could consider moving all of leftAlias, rightAlias, joinFields and similar into a dedicated JoinClause data type that supports operations like clause.expandBareFields() and clause.resolveAliases()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, same concern I called out here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — that's the fix I'm making. I'm going to move the expansion into visitJoin instead of attach(), so the AST keeps query how wrote it and the analysis happens in the planner where it belongs. Explain output will now match the original query too.

Yeah, I'll make the change Ryan suggested to fix this.

Comment thread docs/user/ppl/cmd/join.md Outdated
| `source=table1 \| join right=tt on table1.id=t2.id [ source=table2 as t2 \| eval b = id ] \| eval a = 1` | `table1.id, tt.id, tt.b, a` |

* **Field deduplication in extended syntax** – When using the extended syntax with a field list, duplicate field names in the output are deduplicated according to the `overwrite` option.
* **Field deduplication in extended syntax** – When using the extended syntax with a field list (`<join-field-list>`), duplicate field names are deduplicated according to the `overwrite` option. The bare-field `on`/`where` shorthand keeps both key columns, unlike the field-list form.

@Swiddis Swiddis Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the query grammar at the top of the doc wasn't updated to show what join-field-list is

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll cut it because it isn't adding anything valuable to the docs.

@RyanL1997 RyanL1997 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Hanyu-W , thanks for the change, and just left some reviews.

Comment thread core/src/main/java/org/opensearch/sql/ast/tree/Join.java Outdated
if (rightAlias.isEmpty() && this.right instanceof SubqueryAlias alias) {
rightAlias = Optional.of(alias.getAlias());
}
rewriteBareFieldCriteria();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performing semantic rewriting inside the AST node's attach() mixes tree construction with analysis, which usually lives in the analyzer layer here. The unresolved tree no longer reflects what the user wrote, which can make explain output and debugging confusing. Was an analyzer pass considered?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — that's the fix I'm making. I'm going to move the expansion into visitJoin instead of attach(), so the AST
keeps query how wrote it and the analysis happens in the planner where it belongs. Explain output will now match the original query too.

Comment thread core/src/main/java/org/opensearch/sql/ast/tree/Join.java Outdated
Comment thread core/src/main/java/org/opensearch/sql/ast/tree/Join.java Outdated
Comment thread core/src/main/java/org/opensearch/sql/ast/tree/Join.java Outdated
Comment thread ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.java Outdated
Comment thread ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.java Outdated
Let `on`/`where` join criteria accept a bare field name (or `AND` of bare
fields) as shorthand for the qualified equality on that common field:

  source=t1 | inner join on a t2          == on t1.a = t2.a
  source=t1 | inner join on a AND b t2    == on t1.a = t2.a AND t1.b = t2.b
  source=t1 | join on a t2                == on t1.a = t2.a   (no prefix/alias)

The shorthand behaves exactly like the explicit `on l.f = r.f` criteria: it
KEEPS BOTH key columns (the right key renamed `<rightAlias>.f`, or
`<rightTable>.f` when no alias is given). This differs from the field-list
join (`join f t2`), which merges the duplicate key to a single column.

Implementation:
- Grammar: reorder the joinCommand alternatives so the criteria alternative
  is listed first. This lets the no-prefix form `join on a <right>` parse `on`
  as the criteria keyword instead of the field-list alternative greedily
  consuming `on`/`where` as a field name. ANTLR ALL(*) picks the first-listed
  alternative on genuine ambiguity.
- AstBuilder: a bare single-part field (or AND-chain of them) is left as the
  join condition verbatim -- the unresolved AST reflects what the user wrote.
- Planner (CalciteRelNodeVisitor.visitJoin): the criteria branch detects a
  bare-field condition and builds the equi-join from it, resolving each field
  by stack position (LEFT.f = RIGHT.f) via the existing
  buildJoinConditionByFieldName helper. No new planner code, and no AST
  mutation. Because resolution is positional rather than by qualifier, a
  self-join `join on f t` is a genuine cross-scan equi-join rather than the
  `f = f` tautology a name-based rewrite would collapse to.

Adds AstBuilder, Calcite plan, anonymizer, and integration tests; updates the
join command user manual.

Signed-off-by: Hanyu Wei <weihanyu@amazon.com>
@Hanyu-W Hanyu-W force-pushed the ppl-join-shorthand branch from 374d412 to 19e6693 Compare June 5, 2026 16:36
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 19e6693

@anasalkouz anasalkouz added the PPL Piped processing language label Jun 5, 2026
@anasalkouz anasalkouz added the calcite calcite migration releated label Jun 5, 2026
@Hanyu-W

Hanyu-W commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

📝 Implementation note — the description above predates the redesign.

Heads up for anyone reading top-to-bottom: after the review feedback, I moved the bare-field expansion out of the AST (Join.attach()) and into the planner, so the original description's "Changes" table and its "planner 100% unchanged" line no longer match the committed code. I'm leaving the original post as-is for review-thread continuity; here's the accurate summary of what's on the branch now (19e6693):

Design (as committed):

  • The bare-field condition is kept verbatim in the unresolved AST, so explain/anonymizer reflect what the user wrote. Join.java is unchanged vs main.
  • Expansion happens in CalciteRelNodeVisitor.visitJoin: the criteria branch detects a bare single-part field / AND-chain via a new JoinAndLookupUtils.collectBareFields, then resolves each field positionally through the existing buildJoinConditionByFieldNamefield(2,0,f) = field(2,1,f), i.e. left-input.f = right-input.f by stack position, not by qualifier.
  • Because resolution is positional, a self-join join on f t is a real cross-scan equi-join (=($7,$15)), not the f = f tautology a name-qualified rewrite would collapse to — this is the issue @RyanL1997 flagged, now fixed by construction. There is no SemanticCheckException / qualifierOf / findRelation / rewriteBareFieldCriteria (all removed along with the AST rewrite).

Files actually changed (git diff main...HEAD):

File Change
ppl/src/main/antlr/OpenSearchPPLParser.g4 Reorder joinCommand so the criteria alt precedes the field-list alt (so join on a reads on as the criteria keyword, not a field)
ppl/.../ppl/parser/AstBuilder.java Keep the bare on <field> (or AND chain) verbatim instead of folding it into the field list
core/.../calcite/CalciteRelNodeVisitor.java visitJoin criteria branch: build the equi-join positionally for a bare-field condition; otherwise fall through to analyzeJoinCondition unchanged
core/.../calcite/utils/JoinAndLookupUtils.java New collectBareFields helper (true only for a bare single-part Field or an AND-chain of them)
docs/user/ppl/cmd/join.md Shorthand examples + <joinCriteria> parameter wording
test files CalcitePPLJoinTest, AstBuilderTest, PPLQueryDataAnonymizerTest, CalcitePPLJoinIT

Join.java is not touched — contrary to the "Rewrite" row in the original Changes table.

Tests (re-run locally just now, all green):

  • CalcitePPLJoinTest: 53/53 (13 new) — plan shape for single/multi-field, alias, where, no-prefix, left/semi, the self-join real equi-join =($7,$15), aliased/unaliased subquery, and not-on-both-sides error.
  • AstBuilderTest: 158/158 (11 new) — asserts the bare field is kept verbatim (not rewritten); qualified/comparison criteria untouched; field-list still parses as a field list; prefix-without-criteria is a syntax error.
  • PPLQueryDataAnonymizerTest: 119/119 (1 new) — anonymizes the un-rewritten condition (on identifier).
  • CalcitePPLJoinIT: 4 new methods, green in CI's build-linux (…, integration) jobs on this commit.

Comment thread core/src/main/java/org/opensearch/sql/calcite/utils/JoinAndLookupUtils.java Outdated
Comment thread ppl/src/main/antlr/OpenSearchPPLParser.g4
Follow-ups from the second review round on opensearch-project#5517:

- collectBareFields: replace the boolean-return + out-param pattern with an
  Optional<List<String>> return (RyanL1997). The name list is now assembled
  only when every node in the AND-tree is a bare field, so the "valid only
  when true" partial-state footgun is gone. Update the visitJoin caller.
- Add JoinAndLookupUtilsTest covering single field, AND-chain, qualified
  field, mixed condition, and the no-partial-state case.
- Add testJoinMixedConditionFallsThrough / testJoinMixedConditionWithInequality
  confirming a mixed condition (e.g. `on a AND b.x = c.y`) is not detected as
  bare and falls through to analyzeJoinCondition unchanged (RyanL1997).
- Add testJoinWithImplicitFieldMaxGreaterThanZero and testAntiJoinWithImplicitField
  for the max=N dedup wrapper and the anti-join early-return branch.
- CalcitePPLJoinIT.testJoinWithImplicitField: assert the sorted result with
  verifyDataRowsInOrder so the `| sort name, occupation` is actually validated
  instead of being silently order-independent.
- Tighten the not-on-both-sides assertion to IllegalArgumentException with the
  field-not-found substring; correct the comment to note the phrase is
  Calcite's own wording and may shift on a Calcite upgrade.

Signed-off-by: Hanyu Wei <weihanyu@amazon.com>
@Swiddis Swiddis merged commit d08df37 into opensearch-project:main Jun 11, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support implicit field joins when fields match across indices

4 participants