Branch percentile and sum-null IT expectations for the analytics-engine route #5522
Merged
penghuo merged 1 commit on Jun 8, 2026
…ne route

CalcitePPLAggregationIT.testPercentile, testSumNull, and testSumGroupByNullValue hard-coded expectations from the Calcite DSL-pushdown path, so they failed when run through the analytics-engine (DataFusion) backend via -Dtests.analytics.parquet_indices=true:

- percentile() is approximate. DataFusion's t-digest interpolation returns 46576 for percentile(balance, 90) where the OpenSearch/Calcite percentile_approx returns 48086 (p50 agrees). Both are valid approximations.
- SUM over an all-null bucket is null per the SQL spec. The DSL-pushdown path returns 0 (a known quirk, opensearch-project#3408); DataFusion follows the spec like Calcite-no-pushdown and returns null.

Branch the expected values on the existing isAnalyticsParquetIndicesEnabled() helper, matching the pattern already used in StatsCommandIT.testSumWithNull. No production code change; both engine paths now pass.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
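To make the SUM difference described in the commit message concrete, here is a minimal Java sketch. It is not code from either engine. `sqlSum` follows the SQL rule (nulls are skipped, and an input with no non-null values yields null), while `zeroSeededSum` only mimics the observable result of the DSL-pushdown path (0 for an all-null bucket). How that path arrives at 0 internally is not covered by this PR, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class SumNullSemanticsSketch {

    // SQL-style SUM: skip nulls; return null when there is no non-null input.
    static Long sqlSum(List<Long> values) {
        Long total = null;
        for (Long v : values) {
            if (v == null) {
                continue;
            }
            if (total == null) {
                total = v;
            } else {
                total = total + v;
            }
        }
        return total;
    }

    // Illustrative only: an accumulator that starts at 0, so an all-null
    // bucket reports 0 instead of null (the result seen on the pushdown path).
    static long zeroSeededSum(List<Long> values) {
        long total = 0L;
        for (Long v : values) {
            if (v != null) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> allNull = Arrays.asList((Long) null, (Long) null);
        List<Long> mixed = Arrays.asList(10L, null, 5L);
        System.out.println("sqlSum(allNull)        = " + sqlSum(allNull));
        System.out.println("zeroSeededSum(allNull) = " + zeroSeededSum(allNull));
        System.out.println("sqlSum(mixed)          = " + sqlSum(mixed));
    }
}
```

The two functions agree whenever a bucket holds at least one non-null value, which is why only the all-null cases (testSumNull, testSumGroupByNullValue) need engine-specific expectations.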
Swiddis
approved these changes
Jun 8, 2026
penghuo
approved these changes
Jun 8, 2026
Description
CalcitePPLAggregationIT (and its bucketSize=2 subclass CalcitePPLAggregationPaginatingIT) hard-coded three expectations from the Calcite DSL-pushdown path. When the same suite is routed through the analytics-engine (DataFusion) backend with -Dtests.analytics.parquet_indices=true, those expectations are wrong even though the backend behaves correctly:

- testPercentile: percentile() is approximate. DataFusion's t-digest interpolation returns 46576 for percentile(balance, 90), where the OpenSearch/Calcite percentile_approx returns 48086 (p50 agrees on both at 32838). Both are valid approximations of the same percentile; the value was confirmed deterministic across repeated runs.
- testSumNull / testSumGroupByNullValue: SUM over an all-null bucket is null per the SQL spec. The DSL-pushdown path returns 0 instead, a known pushdown quirk tracked in "[BUG] Sum multiple null values should return null instead of 0" (#3408). DataFusion follows the spec like the Calcite-no-pushdown path and returns null.

The fix branches the expected values on the existing isAnalyticsParquetIndicesEnabled() helper, exactly the pattern already used in StatsCommandIT.testSumWithNull. No production code changes; these are test-expectation corrections only, and the assertions remain unchanged for the default Calcite path.

Pass rate (these test methods)

- testPercentile
- testSumNull
- testSumGroupByNullValue
- CalcitePPLAggregationPaginatingIT

Verified by running :integ-test:integTestRemote against a 9-plugin analytics cluster both with and without -Dtests.analytics.parquet_indices=true.

Check List

--signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
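The branching described in this PR can be sketched roughly as follows. This is a standalone illustration, not the actual test code. The local `isAnalyticsParquetIndicesEnabled()` is a hypothetical stand-in that assumes the real helper reflects the tests.analytics.parquet_indices system property, and the class and the `expected...` methods are invented for the example. The numeric values are the ones quoted in the description.

```java
import java.util.Arrays;
import java.util.List;

class ExpectationBranchSketch {

    // Hypothetical stand-in for the suite's helper. Assumption: the real
    // helper reflects the -Dtests.analytics.parquet_indices flag.
    static boolean isAnalyticsParquetIndicesEnabled() {
        return Boolean.parseBoolean(
            System.getProperty("tests.analytics.parquet_indices", "false"));
    }

    // Expected [p50, p90] of balance. p50 is 32838 on both routes; p90
    // depends on which approximate percentile implementation answers.
    static List<Integer> expectedPercentiles(boolean analyticsEnabled) {
        int p50 = 32838;
        int p90 = analyticsEnabled ? 46576 : 48086;
        return Arrays.asList(p50, p90);
    }

    // Expected SUM over an all-null bucket: null on the analytics-engine
    // route (SQL semantics), 0 on the default DSL-pushdown route.
    static Integer expectedSumOverAllNullBucket(boolean analyticsEnabled) {
        return analyticsEnabled ? null : Integer.valueOf(0);
    }

    public static void main(String[] args) {
        boolean analytics = isAnalyticsParquetIndicesEnabled();
        System.out.println("analytics route enabled: " + analytics);
        System.out.println("expected percentiles:    " + expectedPercentiles(analytics));
        System.out.println("expected all-null SUM:   " + expectedSumOverAllNullBucket(analytics));
    }
}
```

Keeping the branch in the expected values, rather than in the query or the assertion itself, means the default Calcite path keeps asserting exactly what it asserted before.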