Skip to content

[BugFix] Fix multiple appendpipe error while revisiting parent AST#5322

Merged
LantaoJin merged 7 commits into
opensearch-project:mainfrom
songkant-aws:bug-06b-appendpipe-planner-5173
Apr 10, 2026
Merged

[BugFix] Fix multiple appendpipe error while revisiting parent AST#5322
LantaoJin merged 7 commits into
opensearch-project:mainfrom
songkant-aws:bug-06b-appendpipe-planner-5173

Conversation

@songkant-aws

@songkant-aws songkant-aws commented Apr 8, 2026

Copy link
Copy Markdown
Collaborator

Description

Chained appendpipe queries can produce literal-only projections during prepare-time field trimming. Calcite may simplify those projections into Values, which can trigger planner mismatch assertions during execution. Preserve the Project shape for this narrow case in OpenSearchRelFieldTrimmer.

Related Issues

Resolves #5173

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 9124702)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix planner mismatch by overriding trimUnusedFields and preserving hint strategies

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java

Sub-PR theme: Add regression tests for multiple appendpipe scenarios (issue

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendPipeCommandIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAppendPipeTest.java
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5173.yml

⚡ Recommended focus areas for review

Cluster Reuse Risk

In the overridden trimUnusedFields, the converter is constructed using root.rel.getCluster() to reuse the incoming RelNode's cluster. While the comment explains the rationale, this could cause subtle issues if the cluster's planner state is mutated during trimming, potentially affecting the original RelNode tree. This should be carefully validated to ensure no side effects on shared cluster state.

protected RelRoot trimUnusedFields(RelRoot root) {
  final SqlToRelConverter.Config config =
      SqlToRelConverter.config()
          .withTrimUnusedFields(shouldTrim(root.rel))
          .withExpand(THREAD_EXPAND.get())
          .withInSubQueryThreshold(requireNonNull(THREAD_INSUBQUERY_THRESHOLD.get()));
  // PPL analyzes into a pre-built RelNode before prepareStatement(rel). Reuse the incoming
  // RelNode's cluster here so prepare-time trimming does not create replacement nodes under a
  // different planner than the rest of the tree.
  final SqlToRelConverter converter =
      new OpenSearchSqlToRelConverter(
          this,
          getSqlValidator(),
          catalogReader,
          root.rel.getCluster(),
          convertletTable,
          config);
  final boolean ordered = !root.collation.getFieldCollations().isEmpty();
  final boolean dml = SqlKind.DML.contains(root.kind);
  return root.withRel(converter.trimUnusedFields(dml || ordered, root.rel));
}
Copied Logic

The shouldTrim method and the trimUnusedFields override appear to be copied/adapted from Calcite's internal CalcitePrepareImpl. The comment about "3 joins" threshold is a heuristic from Calcite internals. If Calcite upgrades change this logic, the copied version here could diverge silently. Consider referencing or delegating to the parent implementation where possible.

private static boolean shouldTrim(RelNode rootRel) {
  // For now, don't trim if there are more than 3 joins. The projects
  // near the leaves created by trim migrate past joins and seem to
  // prevent join-reordering.
  return THREAD_TRIM.get() || RelOptUtil.countJoins(rootRel) < 2;
}
Unused Parameter

The private constructor OpenSearchSqlToRelConverter has a parameter named ignored (boolean) which is only used to differentiate the constructor signature. This is a code smell — consider using a different pattern (e.g., a factory method or a named inner builder) to avoid confusion and potential misuse.

private OpenSearchSqlToRelConverter(
    ViewExpander viewExpander,
    @Nullable SqlValidator validator,
    CatalogReader catalogReader,
    RelOptCluster cluster,
    SqlRexConvertletTable convertletTable,
    Config effectiveConfig,
    boolean ignored) {
  super(viewExpander, validator, catalogReader, cluster, convertletTable, effectiveConfig);
Hardcoded Expected Values

The integration test testDoubleAppendPipeWithFilter expects specific row results including rows(15224, "M") appearing twice, but the comment says "1 (M filter from cumulative 3 rows)". This logic depends on the order of appendpipe evaluation and the cumulative nature of the pipe. If the behavior changes or data ordering is non-deterministic, the test may become flaky.

public void testDoubleAppendPipeWithFilter() throws IOException {
  JSONObject actual =
      executeQuery(
          String.format(
              Locale.ROOT,
              "source=%s | stats sum(age) as sum_age by gender"
                  + " | appendpipe [ where gender = 'F' ]"
                  + " | appendpipe [ where gender = 'M' ]",
              TEST_INDEX_ACCOUNT));
  // 2 original + 1 (F filter from original) + 1 (M filter from cumulative 3 rows)
  verifyDataRows(actual, rows(14947, "F"), rows(15224, "M"), rows(14947, "F"), rows(15224, "M"));
}

@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 9124702

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix join-count threshold mismatch with comment

The comment says "more than 3 joins" but the condition < 2 means trimming is skipped
when there are 2 or more joins, which contradicts the comment. The threshold should
be < 4 (or the comment should say "more than 1 join") to match the stated intent.

core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java [396-401]

 private static boolean shouldTrim(RelNode rootRel) {
   // For now, don't trim if there are more than 3 joins. The projects
   // near the leaves created by trim migrate past joins and seem to
   // prevent join-reordering.
-  return THREAD_TRIM.get() || RelOptUtil.countJoins(rootRel) < 2;
+  return THREAD_TRIM.get() || RelOptUtil.countJoins(rootRel) < 4;
 }
Suggestion importance[1-10]: 6

__

Why: The comment says "more than 3 joins" but < 2 skips trimming for 2+ joins, which contradicts the stated intent. Changing to < 4 would align the code with the comment, though the actual desired threshold may depend on the original Calcite source this was adapted from.

Low
General
Replace meaningless boolean marker with typed enum

The ignored boolean parameter is a code smell — it exists solely to differentiate
the private constructor from the public one, but it carries no semantic meaning and
can confuse future maintainers. Consider using a private marker enum or a named
factory method instead to make the intent explicit.

core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java [424-431]

+private enum ConstructorMarker { INSTANCE }
+
 private OpenSearchSqlToRelConverter(
     ViewExpander viewExpander,
     @Nullable SqlValidator validator,
     CatalogReader catalogReader,
     RelOptCluster cluster,
     SqlRexConvertletTable convertletTable,
     Config effectiveConfig,
-    boolean ignored) {
+    ConstructorMarker ignored) {
Suggestion importance[1-10]: 3

__

Why: The ignored boolean parameter is a minor code smell, but using a private enum marker is a style improvement with low functional impact. The suggestion is valid but offers only marginal maintainability benefit.

Low
Verify correctness of expected test data rows

The comment says the second appendpipe filters gender = 'M' from the cumulative 3
rows (original 2 + 1 F-filtered), so the expected result should be 3 rows (original
2 + 1 F-filtered row) plus 2 M-filtered rows from the 3-row set, totalling 5 rows.
The current assertion of 4 rows may be incorrect — verify the expected behavior and
row count carefully against the actual query semantics.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendPipeCommandIT.java [158]

+// Verify expected row count and content match actual appendpipe semantics
 verifyDataRows(actual, rows(14947, "F"), rows(15224, "M"), rows(14947, "F"), rows(15224, "M"));
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify the row count but the improved_code is identical to the existing_code (just adds a comment), providing no actual fix. The concern about row count may be valid but the suggestion doesn't provide a concrete correction.

Low

Previous suggestions

Suggestions up to commit 0c3a695
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix join count threshold mismatch with comment

The comment says "more than 3 joins" but the condition < 2 means trimming is skipped
when there are 2 or more joins, which is inconsistent with the stated threshold of
3. The condition should be < 3 (or <= 3) to match the comment's intent of skipping
trim only when there are more than 3 joins.

core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java [395-400]

 private static boolean shouldTrim(RelNode rootRel) {
   // For now, don't trim if there are more than 3 joins. The projects
   // near the leaves created by trim migrate past joins and seem to
   // prevent join-reordering.
-  return THREAD_TRIM.get() || RelOptUtil.countJoins(rootRel) < 2;
+  return THREAD_TRIM.get() || RelOptUtil.countJoins(rootRel) < 3;
 }
Suggestion importance[1-10]: 6

__

Why: The comment says "more than 3 joins" but the condition < 2 would skip trimming for 2+ joins, not 3+. The condition should be < 3 to match the stated intent. This is a real inconsistency between the comment and the code logic.

Low
General
Guard against null thread-local value

THREAD_EXPAND.get() and THREAD_INSUBQUERY_THRESHOLD.get() may return null if the
thread-local values have not been initialized, which would cause a
NullPointerException. THREAD_INSUBQUERY_THRESHOLD.get() is already guarded with
requireNonNull, but THREAD_EXPAND.get() is not. Consider adding a null check or
default value for THREAD_EXPAND.get() as well.

core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java [372-379]

 @Override
 protected RelRoot trimUnusedFields(RelRoot root) {
   final SqlToRelConverter.Config config =
       SqlToRelConverter.config()
           .withTrimUnusedFields(shouldTrim(root.rel))
-          .withExpand(THREAD_EXPAND.get())
+          .withExpand(requireNonNull(THREAD_EXPAND.get()))
           .withInSubQueryThreshold(requireNonNull(THREAD_INSUBQUERY_THRESHOLD.get()));
Suggestion importance[1-10]: 4

__

Why: Adding requireNonNull to THREAD_EXPAND.get() is a defensive improvement, but since the parent class's trimUnusedFields also uses THREAD_EXPAND without null-checking, this is likely already handled by the initialization pattern of these thread-locals elsewhere in the codebase.

Low
Suggestions up to commit e07ed44
CategorySuggestion                                                                                                                                    Impact
General
Handle wrapped inputs in fixed row count check

The isFixedRowCount method only checks direct Values or Aggregate inputs, but does
not handle cases where the input is a Project wrapping a Values or Aggregate. This
could cause shouldAvoidSimplifyValues to return false when it should return true,
potentially leading to incorrect simplification behavior. Consider recursively
checking through Project nodes to reach the underlying Values or Aggregate.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchRelFieldTrimmer.java [170-179]

 private boolean isFixedRowCount(RelNode input) {
   if (input instanceof Values) {
     return true;
   }
   if (input instanceof Aggregate aggregate) {
     return aggregate.getGroupSet().isEmpty()
         && aggregate.getGroupType() == Aggregate.Group.SIMPLE;
   }
+  if (input instanceof Project project) {
+    return isFixedRowCount(project.getInput());
+  }
   return false;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion is logically sound — isFixedRowCount could miss cases where a Project wraps a Values or Aggregate. However, the method is specifically designed to guard against Calcite's simplifyValues optimization in the context of field trimming, and the current cases (Values and groupless Aggregate) cover the known problematic patterns. The improvement is reasonable but addresses an edge case not demonstrated to cause issues.

Low
Possible issue
Replace assert with explicit runtime validation

The assert statements used to validate that correlationIds.size() == 1 are only
active when assertions are enabled (i.e., with the -ea JVM flag). In production
environments where assertions are disabled, this check is silently skipped, and the
code may behave incorrectly if there are multiple correlation IDs. Consider
replacing the assert with an explicit runtime check (e.g., throwing an
IllegalStateException) to ensure correctness in all environments.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchRelFieldTrimmer.java [65-71]

 List<RexSubQuery> subQueries = RexUtil.SubQueryCollector.collect(project);
 Set<CorrelationId> correlationIds = RelOptUtil.getVariablesUsed(subQueries);
 ImmutableBitSet requiredColumns = ImmutableBitSet.of();
 if (!correlationIds.isEmpty()) {
-  assert correlationIds.size() == 1;
+  if (correlationIds.size() != 1) {
+    throw new IllegalStateException("Expected at most one correlation ID, but found: " + correlationIds.size());
+  }
   requiredColumns = RelOptUtil.correlationColumns(correlationIds.iterator().next(), project);
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion is valid — assert statements are disabled in production JVMs by default. However, this pattern (using assert for internal invariants) is common in Calcite's own codebase, and the scenario of multiple correlation IDs in a single project is extremely unlikely. The improvement is minor and defensive rather than fixing a real bug.

Low
Suggestions up to commit 48bbc4b
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid pushing shared RelNode reference onto stack

Calling peek() and then push() with the same node pushes a duplicate reference to
the same RelNode onto the stack. This means the subquery will operate on a shared
reference rather than an independent copy, which could cause issues when the
subquery modifies or consumes the node. Consider using a proper copy or snapshot of
the current plan before pushing it for the subquery to consume.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [305-307]

 RelNode mainNode = context.relBuilder.peek();
-context.relBuilder.push(mainNode);
+context.relBuilder.push(mainNode.copy(mainNode.getTraitSet(), mainNode.getInputs()));
 node.getSubQuery().accept(this, context);
Suggestion importance[1-10]: 4

__

Why: The suggestion raises a valid concern about pushing a shared RelNode reference. However, in Calcite's RelBuilder, peek() followed by push() is a common pattern to duplicate a reference on the stack for branching purposes, and RelNode trees are typically immutable — the subquery visitor builds new nodes on top. The fix may not be necessary and could introduce issues if copy() doesn't behave as expected in all cases.

Low
General
Clarify misleading test comment on row count

The comment says "1 (M filter from cumulative 3 rows)" but the second appendpipe
operates on the cumulative result of 3 rows (2 original + 1 F), so filtering by
gender = 'M' from those 3 rows would yield 1 row (the original M row). However, the
first appendpipe adds the F row again, so the cumulative set before the second
appendpipe has 2 original + 1 F = 3 rows, of which 1 is M. The expected count of 4
may be correct, but the comment is misleading and should be clarified to avoid
confusion.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendPipeCommandIT.java [148-149]

-// 2 original + 1 (F filter) + 1 (M filter from cumulative 3 rows) = 4 rows
+// 2 original (F, M) + 1 (F filter appended) + 1 (M filter from 3 cumulative rows) = 4 rows
 verifyNumOfRows(actual, 4);
Suggestion importance[1-10]: 2

__

Why: The suggestion only improves a comment's clarity without changing any logic. The improved_code is nearly identical to the existing_code with a minor rewording, offering minimal value.

Low

visitAppendPipe re-visits parent AST in new planner context, causing
assertion failure. Use relBuilder stack directly instead of AST
re-visitation to stay within the same planner context.

Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Chained appendpipe queries can produce literal-only projections
during prepare-time field trimming. Calcite may simplify those
projections into Values, which can trigger planner mismatch
assertions during execution. Preserve the Project shape for this
narrow case in OpenSearchRelFieldTrimmer and add YAML REST
regression coverage for double and triple appendpipe.

Signed-off-by: Songkan Tang <songkant@amazon.com>
@songkant-aws songkant-aws force-pushed the bug-06b-appendpipe-planner-5173 branch from 48bbc4b to e07ed44 Compare April 8, 2026 08:58
@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e07ed44

Comment on lines +128 to +130
openSearchRelBuilder.push(newInput);
openSearchRelBuilder.project(newProjects, newRowType.getFieldNames(), false, correlationIds);
return result(openSearchRelBuilder.build(), mapping, project);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In trimFields(Project, ...), the old openSearchRelBuilder.project(...) call could trigger Calcite's internal simplifyValues() when trimming produced a literal-only projection over a fixed-row-count input. That rewrite could replace the Project with Values, which is the shape that led to the planner mismatch in chained appendpipe.

LantaoJin
LantaoJin previously approved these changes Apr 9, 2026
@qianheng-aws

qianheng-aws commented Apr 9, 2026

Copy link
Copy Markdown
Collaborator

As discussed offline, it's root cause is inconsistent planner/cluster are being used in trim phase and optimize phase.

In original calcite, one prepare statement will be used through the whole process. While in PPL, we create 2 different prepares for analyzing and executing(including trimming, optimizing and planning, only trimming uses the planner from this new created prepare while others reuse the planner in RelNode).

The ideal solution should be reusing the same planner for trimming as well.

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions

github-actions Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0c3a695

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions

github-actions Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9124702

@songkant-aws

songkant-aws commented Apr 9, 2026

Copy link
Copy Markdown
Collaborator Author

@qianheng-aws As suggested, prepare-time trimming now reuses the incoming RelNode's cluster so it stays aligned with the analyzed plan. The change also preserves pre-registered hint strategies required by PPL custom aggregate hints. If we find other prepare-time state that also needs to stay aligned with the incoming RelNode, we can address those cases incrementally.

@qianheng-aws

Copy link
Copy Markdown
Collaborator

LGTM

@LantaoJin LantaoJin merged commit 582b4d2 into opensearch-project:main Apr 10, 2026
38 of 39 checks passed
@LantaoJin

Copy link
Copy Markdown
Member

@songkant-aws the multiple LogicalUnions could be optimized as discussed offline?

@songkant-aws

Copy link
Copy Markdown
Collaborator Author

@LantaoJin I checked Calcite already has some optimization rules for multiple LogicalUnions. Sometimes it can merge multiple unions into one.

@songkant-aws songkant-aws deleted the bug-06b-appendpipe-planner-5173 branch April 10, 2026 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Double appendpipe causes 'different planner' assertion due to AST re-visitation

3 participants