Skip to content

Make addcoltotals/addtotals summary row backend-deterministic#5477

Merged
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:fix/ppl-addcoltotals-analytics-route
May 29, 2026
Merged

Make addcoltotals/addtotals summary row backend-deterministic#5477
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:fix/ppl-addcoltotals-analytics-route

Conversation

@ahkcs

@ahkcs ahkcs commented May 27, 2026

Copy link
Copy Markdown
Collaborator

What's broken

Two bugs in the lowering of addcoltotals (and addtotals col=true). Both manifest on the analytics-engine route and were hidden on the v2/Calcite path by single-node execution semantics.

  1. The summary row lands in a random position. addcoltotals lowers to UNION ALL(data, single-row SUM). The user docs say the summary row is "the last row." SQL UNION ALL makes no ordering guarantee — DataFusion does not preserve input order, so the summary row comes out anywhere. Tests asserting "last row is the summary" fail non-deterministically.
  2. HTTP 500 when labelfield names a new column. The lowering types that new column as VARCHAR(n) where n = max(len(label), len(labelfield)). substrait-java emits a VarChar(n) literal; upstream DataFusion's datafusion-substrait consumer does not implement length-bounded VarChar literals → Unsupported literal_type.

Both fixes are in CalciteRelNodeVisitor#buildAddRowTotalAggregate (the method shared by visitAddColTotals and visitAddTotals for col=true).


Bug 1 — summary row in random position

Step 1 — what addcoltotals does mechanically

testAddColTotalsTotalWithTotalField:

source=accounts | where age > 25 | fields age, balance | addcoltotals

Lowering produces:

LogicalUnion(all=[true])
  LogicalProject(age, balance)                       ← N data rows
    <upstream filter + scan>
  LogicalProject(age, balance)                       ← exactly 1 summary row
    LogicalAggregate(group=[{}], age=[SUM(age)], balance=[SUM(balance)])
      <upstream filter + scan>

Step 2 — on v2 (serial), UNION ALL preserves input order

The Calcite enumerable adapter executes UNION ALL by emitting the first input's tuples, then the second's:

rows[0]      data
rows[1]      data
…
rows[n-2]    data
rows[n-1]    summary   ← always last

verifyColTotals (the test helper) inspects rows[length-1] and asserts rows[length-1].age == sum(rows[0..length-2].age). Passes.

Step 3 — on DataFusion (parallel), UNION ALL doesn't preserve input order

SQL UNION ALL gives no ordering guarantee, and DataFusion's UnionExec interleaves its inputs' partitions as they're ready. One observed run:

rows[0]    data
rows[1]    data
…
rows[k]    summary   ← arbitrary position
…
rows[n-1]  data      ← just another account

Step 4 — exactly what the IT fails on

expected:<35.0> but was:<48323.0>

Working backwards: let T be the true age-total (the summary row's age). The misplaced summary row sits inside rows[0..length-2], so the client's cColTotals = sum(rows[0..length-2].age) = T + (T − 35) = 2T − 35. Solving 48323 = 2T − 35 gives T ≈ 24179. The numbers prove the bug is "summary row not last," not a wrong total. Across 3 runs we saw 2/6, 4/6, 4/6 — different tests failing each time.

Step 5 — a downstream PPL sort can't fix this

Adding | sort age after addcoltotals reorders by age, but the summary's age (24179) is a real numeric value indistinguishable from a data row, so the summary still doesn't land in any predictable position. There's no field that says "I'm the summary." The lowering itself has to encode it.

Where the offending union(true) is emitted

CalciteRelNodeVisitor#buildAddRowTotalAggregate:

RelNode totalsRow = context.relBuilder.build();
// 4. Union original data with totals row
context.relBuilder.push(originalData);
context.relBuilder.push(totalsRow);
context.relBuilder.union(true); // Use UNION ALL to preserve order

End-to-end path:

PPL "| addcoltotals"
   ↓  parser → AST AddColTotals
visitAddColTotals → buildAddRowTotalAggregate
   ↓  push(data) + push(totalsRow) + union(true)   ← no marker, no sort
LogicalUnion(all=[true])
   ↓  Calcite → Substrait
DataFusion's UnionExec emits tuples in arrival order across partitions

The comment // Use UNION ALL to preserve order was wishful — SQL UNION ALL has no ordering guarantee.

Fix

Tag each branch with a constant ordering key (0 on data rows, 1 on the summary row), UNION ALL, sort by the key, then projectExcept to drop it. The summary row is now > every data row on the sort key, so it sorts last on every backend:

RelNode totalsRow = context.relBuilder.build();
// 4. Union original data with totals row.
// UNION ALL does not guarantee row order, and some backends (e.g. DataFusion on the
// analytics-engine route) do not preserve input order, so the summary row can land
// anywhere. Tag each branch with a constant ordering key (0 = data rows, 1 = summary row),
// union, sort on it so the summary row is deterministically last, then drop the helper.
context.relBuilder.push(originalData);
context.relBuilder.projectPlus(
    context.relBuilder.alias(context.relBuilder.literal(0), ORDER_COLUMN_FOR_ADDCOLTOTALS));
RelNode dataWithOrder = context.relBuilder.build();
context.relBuilder.push(totalsRow);
context.relBuilder.projectPlus(
    context.relBuilder.alias(context.relBuilder.literal(1), ORDER_COLUMN_FOR_ADDCOLTOTALS));
RelNode totalsWithOrder = context.relBuilder.build();
context.relBuilder.push(dataWithOrder);
context.relBuilder.push(totalsWithOrder);
context.relBuilder.union(true); // UNION ALL
context.relBuilder.sort(context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));
context.relBuilder.projectExcept(context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));

Reuses the sort + projectExcept idiom that streamstats already uses in the same visitor. New PlanUtils.ORDER_COLUMN_FOR_ADDCOLTOTALS = "_addcoltotals_order_". No behavior change on v2 (which already happened to produce this order).


Bug 2 — Unsupported literal_type: VarChar on a new labelfield

Step 1 — what labelfield='Grand Total' triggers in the lowering

testAddColTotalsWithCustomLabel:

source=accounts | where age > 25 | head 2 | fields age, balance
  | addcoltotals label='Sum' labelfield='Grand Total'

labelfield='Grand Total' names a new column (doesn't exist in accounts). buildAddRowTotalAggregate creates that column with type VARCHAR(labelLength), where labelLength = max(len(label), len(labelfield)) — here max(3, 11) = 11 — and writes 'Sum':VARCHAR(11) in the summary row:

LogicalProject(age, balance, Grand_Total=[null:VARCHAR(11)])      ← data rows
LogicalProject(age, balance, Grand_Total=['Sum':VARCHAR(11)])     ← summary row

Step 2 — substrait-java renders bounded VARCHAR as Substrait VarChar(N)

substrait-java's type/literal converter maps:

  • Calcite VARCHAR with explicit length N → Substrait VarChar(N)
  • Calcite VARCHAR with no length → Substrait String (unbounded Utf8)

So 'Sum':VARCHAR(11) becomes a VarChar { value: "Sum", length: 11 } literal in the Substrait plan.

Step 3 — DataFusion's Substrait consumer doesn't implement VarChar(N) literals

Upstream datafusion-substrait's from_substrait_literal handles only String for string literals; VarChar and FixedChar fall through to not_impl_err!. The cluster returns:

HTTP 500
"details": "This feature is not implemented: Unsupported literal_type:
            Some(VarChar(VarChar { value: \"Sum\", length: 11 }))"

No rows are ever returned. Fails on every run (deterministic, unlike Bug 1).

Step 4 — why existing-field labels work and new-field labels don't

OpenSearchTypeFactory#createRelDataType maps every OpenSearch STRING field (text/keyword) to createSqlType(VARCHAR) with no explicit length:

// OpenSearchTypeFactory.java:170
case STRING:
  return TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR, nullable);   // no length

A label placed into an existing string column inherits that unbounded type → Substrait String → DataFusion accepts. testAddColTotalsWithLabelAndLabelField, which uses labelfield='firstname' (existing keyword field), works for exactly that reason. Only the new-labelfield path was creating bounded VARCHAR — inconsistent with how every other string column in the system is typed.

Where the bounded VARCHAR(labelLength) is emitted

CalciteRelNodeVisitor#buildAddRowTotalAggregate:

int labelLength =
    (labelField != null) && (labelField.length() > label.length())
        ? labelField.length()
        : label.length();

RelDataType labelVarcharType =
    context.relBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR, labelLength);

labelVarcharType then feeds two sites: makeNullLiteral(labelVarcharType) for the new label column on the data branch, and (via fieldDataType.getType()) makeLiteral(label, …) for the label value on the summary branch.

Fix

Drop the explicit length. Type the synthesized label column the same way OpenSearchTypeFactory types every other string column — unbounded VARCHAR:

// Type the synthesized label column as an unbounded VARCHAR, matching how OpenSearchTypeFactory
// maps STRING fields. A length-bounded VARCHAR(n) would render as a Substrait VarChar(n)
// literal, which the DataFusion backend (analytics-engine route) does not support.
RelDataType labelVarcharType =
    context.relBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR);

labelLength is removed — it served no purpose other than producing the bounded type. The literal now renders as Substrait String and DataFusion accepts it. No behavior change on v2.


Results

IT on analytics-engine route, 3 runs each Before After
CalciteAddColTotalsCommandIT (6 tests, all exercise the fix path) flaky 2/6, 4/6, 4/6 stable 6/6, 6/6, 6/6
CalciteAddTotalsCommandITtestAddTotalsWithLabelAndLabelField (the only col=true test; new labelfield so exercises both fixes) stable pass, pass, pass (other 10 col=false tests are unaffected and also pass)
v2/Calcite IT + unit tests + explain tests green green

Testing

  • CalcitePPLAddColTotalsTest 8/8, CalcitePPLAddTotalsTest 12/12 (regenerated logical-plan + Spark-SQL expectations; result rows unchanged).
  • CalciteExplainIT#testaddColTotalsExplain / testaddTotalsExplain ✅ (regenerated explain_add_col_totals.yaml / explain_add_totals.yaml).
  • CalciteAddColTotalsCommandIT 6/6 stable across 3 runs on analytics route; 6/6 on v2 ✅
  • CalciteAddTotalsCommandIT 11/11 stable across 3 runs on analytics route; 11/11 on v2 ✅
  • :core:spotlessCheck + :ppl:spotlessCheck clean ✅

Check List

  • New functionality includes testing.
  • Behavior matches existing documentation (no doc change needed).
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit df0456f)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to df0456f

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Validate field reference after sort operation

The projectExcept operation may fail if ORDER_COLUMN_FOR_ADDCOLTOTALS doesn't exist
in the projection list after the sort. Verify that the field reference is valid
after the sort operation, or use an alternative approach like explicitly listing all
fields except the ordering column to ensure robustness.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3484-3496]

 context.relBuilder.push(originalData);
 context.relBuilder.projectPlus(
     context.relBuilder.alias(context.relBuilder.literal(0), ORDER_COLUMN_FOR_ADDCOLTOTALS));
 RelNode dataWithOrder = context.relBuilder.build();
 context.relBuilder.push(totalsRow);
 context.relBuilder.projectPlus(
     context.relBuilder.alias(context.relBuilder.literal(1), ORDER_COLUMN_FOR_ADDCOLTOTALS));
 RelNode totalsWithOrder = context.relBuilder.build();
 context.relBuilder.push(dataWithOrder);
 context.relBuilder.push(totalsWithOrder);
 context.relBuilder.union(true); // UNION ALL
 context.relBuilder.sort(context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));
-context.relBuilder.projectExcept(context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));
+// Get all fields except the ordering column
+List<RexNode> projectFields = new ArrayList<>();
+RelDataType rowType = context.relBuilder.peek().getRowType();
+for (int i = 0; i < rowType.getFieldCount() - 1; i++) {
+  projectFields.add(context.relBuilder.field(i));
+}
+context.relBuilder.project(projectFields);
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about projectExcept potentially failing if the field doesn't exist, but the improved code is more verbose and less maintainable than the existing approach. The projectExcept method is a standard Calcite operation that should handle the field reference correctly. The suggestion would be more valuable if there was evidence of actual failures.

Low

Previous suggestions

Suggestions up to commit df0456f
CategorySuggestion                                                                                                                                    Impact
General
Ensure ordering column removal is robust

The projectExcept operation may fail if ORDER_COLUMN_FOR_ADDCOLTOTALS doesn't exist
in the projection. Verify that the field reference is valid after the sort
operation, or use an index-based approach to remove the ordering column more
reliably.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3484-3496]

 context.relBuilder.push(originalData);
 context.relBuilder.projectPlus(
     context.relBuilder.alias(context.relBuilder.literal(0), ORDER_COLUMN_FOR_ADDCOLTOTALS));
 RelNode dataWithOrder = context.relBuilder.build();
 context.relBuilder.push(totalsRow);
 context.relBuilder.projectPlus(
     context.relBuilder.alias(context.relBuilder.literal(1), ORDER_COLUMN_FOR_ADDCOLTOTALS));
 RelNode totalsWithOrder = context.relBuilder.build();
 context.relBuilder.push(dataWithOrder);
 context.relBuilder.push(totalsWithOrder);
 context.relBuilder.union(true); // UNION ALL
 context.relBuilder.sort(context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));
-context.relBuilder.projectExcept(context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));
+// Project all fields except the last one (ORDER_COLUMN_FOR_ADDCOLTOTALS)
+int fieldCount = context.relBuilder.peek().getRowType().getFieldCount();
+List<RexNode> projectFields = new ArrayList<>();
+for (int i = 0; i < fieldCount - 1; i++) {
+  projectFields.add(context.relBuilder.field(i));
+}
+context.relBuilder.project(projectFields);
Suggestion importance[1-10]: 7

__

Why: The suggestion addresses a potential robustness issue with projectExcept. Using an index-based approach to remove the ordering column is more reliable than field name reference, especially after sort operations. However, this is a defensive improvement rather than fixing a critical bug, as projectExcept should work correctly if the field exists.

Medium
Suggestions up to commit 170d132
CategorySuggestion                                                                                                                                    Impact
General
Verify sort direction is deterministic

The sort operation should specify a sort direction explicitly. While ASC is likely
the default, explicitly specifying it ensures consistent behavior across different
Calcite configurations and makes the intent clear that the summary row (tagged with
1) should appear last.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3502-3503]

-context.relBuilder.sort(context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));
+context.relBuilder.sort(
+    context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));
 context.relBuilder.projectExcept(context.relBuilder.field(ORDER_COLUMN_FOR_ADDCOLTOTALS));
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that the sort direction should be explicit for clarity and consistency. However, the improved_code is identical to the existing_code, and the PR already includes a comment explaining the sorting behavior. The suggestion asks for verification rather than proposing a concrete fix, so it receives a moderate score.

Low

@ahkcs ahkcs added PPL Piped processing language enhancement New feature or request labels May 28, 2026
@ahkcs ahkcs force-pushed the fix/ppl-addcoltotals-analytics-route branch from 170d132 to e8678df Compare May 28, 2026 16:48
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e8678df

penghuo
penghuo previously approved these changes May 29, 2026
Comment on lines +3375 to +3377
// Type the synthesized label column as an unbounded VARCHAR, matching how OpenSearchTypeFactory
// maps STRING fields. A length-bounded VARCHAR(n) would render as a Substrait VarChar(n)
// literal, which the DataFusion backend (analytics-engine route) does not support.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete comments. mention DF/Substrait is misleading.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to remove/trim comments

addcoltotals (and addtotals with col=true) lowers to UNION ALL(data,
single SUM row). SQL UNION ALL has no ordering guarantee: the v2 engine
preserves input order incidentally, but the DataFusion backend on the
analytics-engine route does not, so the summary row landed in a
non-deterministic position. The command contract and user docs require the
summary row to be last.

Separately, the synthesized column for a new `labelfield` was typed as a
bounded VARCHAR(n), which renders as a Substrait VarChar(n) literal that the
DataFusion substrait consumer rejects ("Unsupported literal_type: VarChar").

Both fixes are in CalciteRelNodeVisitor.buildAddRowTotalAggregate:
- Tag each union branch with a constant ordering key (0 = data rows,
  1 = summary row), sort on it, then drop the helper column, so the summary
  row is deterministically last regardless of backend. Reuses the existing
  streamstats sort-then-projectExcept idiom; adds PlanUtils constant
  ORDER_COLUMN_FOR_ADDCOLTOTALS.
- Type the synthesized label column as an unbounded VARCHAR, matching how
  OpenSearchTypeFactory maps STRING fields.

Regenerated the affected CalcitePPLAddColTotalsTest / CalcitePPLAddTotalsTest
plan and Spark-SQL expectations and the explain_add_col_totals /
explain_add_totals YAML plans; result rows are unchanged.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the fix/ppl-addcoltotals-analytics-route branch from e8678df to df0456f Compare May 29, 2026 18:16
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit df0456f

1 similar comment
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit df0456f

@ahkcs ahkcs requested a review from penghuo May 29, 2026 18:18
@ahkcs ahkcs merged commit 585fa9e into opensearch-project:main May 29, 2026
77 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants