Skip to content

Commit c6360e9

Browse files
authored
Branch percentile and sum-null IT expectations for the analytics-engine route (#5522)
CalcitePPLAggregationIT.testPercentile, testSumNull, and testSumGroupByNullValue hard-coded expectations from the Calcite DSL-pushdown path, so they failed when run through the analytics-engine (DataFusion) backend via -Dtests.analytics.parquet_indices=true: - percentile() is approximate. DataFusion's t-digest interpolation returns 46576 for percentile(balance, 90) where the OpenSearch/Calcite percentile_approx returns 48086 (p50 agrees). Both are valid approximations. - SUM over an all-null bucket is null per the SQL spec. The DSL-pushdown path returns 0 (a known quirk, #3408); DataFusion follows the spec like Calcite-no-pushdown and returns null. Branch the expected values on the existing isAnalyticsParquetIndicesEnabled() helper, matching the pattern already used in StatsCommandIT.testSumWithNull. No production code change; both engine paths now pass. Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent e5de8f6 commit c6360e9

1 file changed

Lines changed: 14 additions & 4 deletions

File tree

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -980,7 +980,11 @@ public void testPercentile() throws IOException {
980980
"source=%s | stats percentile(balance, 50) as p50, percentile(balance, 90) as p90",
981981
TEST_INDEX_BANK));
982982
verifySchema(actual, schema("p50", "bigint"), schema("p90", "bigint"));
983-
verifyDataRows(actual, rows(32838, 48086));
983+
// percentile() is approximate. The analytics-engine backend (DataFusion) uses a different
984+
// t-digest interpolation than the Calcite/OpenSearch percentile_approx implementation, so p90
985+
// lands on a different value (p50 agrees). Both are valid approximations.
986+
int expectedP90 = isAnalyticsParquetIndicesEnabled() ? 46576 : 48086;
987+
verifyDataRows(actual, rows(32838, expectedP90));
984988
}
985989

986990
@Test
@@ -990,14 +994,18 @@ public void testSumGroupByNullValue() throws IOException {
990994
String.format(
991995
"source=%s | stats sum(balance) as a by age", TEST_INDEX_BANK_WITH_NULL_VALUES));
992996
verifySchema(response, schema("a", null, "bigint"), schema("age", null, "int"));
997+
// SUM of an all-null bucket is null per the SQL spec. The DSL-pushdown path returns 0 instead
998+
// (a known pushdown quirk); the analytics-engine backend (DataFusion) follows the spec like
999+
// Calcite-no-pushdown and returns null. See testSumNull and #3408.
1000+
Object emptySum = (isPushdownDisabled() || isAnalyticsParquetIndicesEnabled()) ? null : 0;
9931001
verifyDataRows(
9941002
response,
995-
rows(isPushdownDisabled() ? null : 0, null),
1003+
rows(emptySum, null),
9961004
rows(32838, 28),
9971005
rows(39225, 32),
9981006
rows(4180, 33),
9991007
rows(48086, 34),
1000-
rows(isPushdownDisabled() ? null : 0, 36));
1008+
rows(emptySum, 36));
10011009
}
10021010

10031011
@Test
@@ -1061,7 +1069,9 @@ public void testSumNull() throws IOException {
10611069
+ " ],\n"
10621070
+ " \"datarows\": [\n"
10631071
+ " [\n"
1064-
+ (isPushdownDisabled() ? " null\n" : " 0\n")
1072+
+ ((isPushdownDisabled() || isAnalyticsParquetIndicesEnabled())
1073+
? " null\n"
1074+
: " 0\n")
10651075
+ " ]\n"
10661076
+ " ],\n"
10671077
+ " \"total\": 1,\n"

0 commit comments

Comments
 (0)