Skip to content

Branch percentile and sum-null IT expectations for the analytics-engine route#5522

Merged
penghuo merged 1 commit into
opensearch-project:mainfrom
ahkcs:analytics-percentile-sumnull-it-parity
Jun 8, 2026
Merged

Branch percentile and sum-null IT expectations for the analytics-engine route#5522
penghuo merged 1 commit into
opensearch-project:mainfrom
ahkcs:analytics-percentile-sumnull-it-parity

Conversation

@ahkcs

@ahkcs ahkcs commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Description

CalcitePPLAggregationIT (and its bucketSize=2 subclass CalcitePPLAggregationPaginatingIT) hard-coded three expectations from the Calcite DSL-pushdown path. When the same suite is routed through the analytics-engine (DataFusion) backend with -Dtests.analytics.parquet_indices=true, those expectations are wrong even though the backend behaves correctly:

  • testPercentilepercentile() is approximate. DataFusion's t-digest interpolation returns 46576 for percentile(balance, 90), where the OpenSearch/Calcite percentile_approx returns 48086 (p50 agrees on both at 32838). Both are valid approximations of the same percentile; the value was confirmed deterministic across repeated runs.
  • testSumNull / testSumGroupByNullValueSUM over an all-null bucket is null per the SQL spec. The DSL-pushdown path returns 0 instead — a known pushdown quirk tracked in [BUG] Sum multiple null values should return null instead of 0 #3408. DataFusion follows the spec like the Calcite-no-pushdown path and returns null.

The fix branches the expected values on the existing isAnalyticsParquetIndicesEnabled() helper, exactly the pattern already used in StatsCommandIT.testSumWithNull. No production code changes — these are test-expectation corrections only, and the assertions remain unchanged for the default Calcite path.

Pass rate (these test methods)

Test method analytics route before analytics route after Calcite/v2 route
testPercentile ✅ (unchanged)
testSumNull ✅ (unchanged)
testSumGroupByNullValue ✅ (unchanged)
same 3 via CalcitePPLAggregationPaginatingIT ✅ (unchanged)

Verified by running :integ-test:integTestRemote against a 9-plugin analytics cluster both with and without -Dtests.analytics.parquet_indices=true.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…ne route

CalcitePPLAggregationIT.testPercentile, testSumNull, and testSumGroupByNullValue
hard-coded expectations from the Calcite DSL-pushdown path, so they failed when
run through the analytics-engine (DataFusion) backend via
-Dtests.analytics.parquet_indices=true:

- percentile() is approximate. DataFusion's t-digest interpolation returns 46576
  for percentile(balance, 90) where the OpenSearch/Calcite percentile_approx
  returns 48086 (p50 agrees). Both are valid approximations.
- SUM over an all-null bucket is null per the SQL spec. The DSL-pushdown path
  returns 0 (a known quirk, opensearch-project#3408); DataFusion follows the spec like
  Calcite-no-pushdown and returns null.

Branch the expected values on the existing isAnalyticsParquetIndicesEnabled()
helper, matching the pattern already used in StatsCommandIT.testSumWithNull. No
production code change; both engine paths now pass.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Extract magic numbers to constants

Consider using a constant or configuration value for the expected percentile values
instead of hardcoding them. This makes the test more maintainable and easier to
update if the approximation algorithm changes in the future.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java [986]

-int expectedP90 = isAnalyticsParquetIndicesEnabled() ? 46576 : 48086;
+private static final int ANALYTICS_ENGINE_P90 = 46576;
+private static final int CALCITE_P90 = 48086;
+int expectedP90 = isAnalyticsParquetIndicesEnabled() ? ANALYTICS_ENGINE_P90 : CALCITE_P90;
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies the line and proposes extracting magic numbers to constants, which improves code maintainability. However, this is a minor code style improvement with limited impact on functionality, and the current inline approach with a clear comment is already reasonably maintainable for test code.

Low

@ahkcs ahkcs added the enhancement New feature or request label Jun 8, 2026
@penghuo penghuo merged commit c6360e9 into opensearch-project:main Jun 8, 2026
40 of 42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants