Commit c4583ab

Stabilize more PPL ITs on the analytics-engine route (sort/streamstats/IP-UDT/metadata/strip-verifier)
Brings 16 more PPL IT classes to parity on the analytics-engine route (-Dtests.analytics.parquet_indices=true). Route-only divergences are gated AE-only via @RequiresCapability + a matching build.gradle excludeTestsMatching; the v2/Calcite path runs every test unchanged.

Strip-verifier (the #5541 guardrail):
- AnalyticsUnsupportedFieldStripVerifyIT was failing because 8 datasets carry a multi-value JSON array for a scalar-mapped field, which the parquet store rejects at bulk load. That's a cardinality limitation, not an unsupported field *type*, so it's out of scope for the type strip, the same situation as the existing `join` out-of-scope skip. Added a curated MULTI_VALUE_DATASETS allowlist + safeToSkipForMultiValueLoad that skips only the exact multi-value signature on a known dataset; any other failure still surfaces loudly, and Legs 2-3 still type-check every index that loads.

Init-load contamination (not divergences; fixed, not gated):
- CalciteWhereCommandIT failed on testDoubleEqual* because init() loaded game_of_thrones (base) and deep_nested (subclass), both multi-value datasets whose bulk-load failure aborted init() and mislabeled the first test. Guarded both loads with isAnalyticsParquetIndicesEnabled(); no test in the hierarchy queries them on the AE route. 32/32 now pass.

Engine divergences gated (new capabilities):
- SORT_TIE_ORDER_UNSTABLE: sort on a non-unique key leaves ties in an engine-dependent order (3 CalcitePPLSortIT tests)
- INVALID_DATETIME_ERROR_SHAPE: dayname/monthname over an invalid literal throw a different message shape (2 tests)
- RAND_SEED_UNSUPPORTED: seeded RAND(seed) is rejected on AE
- IP_UDT_BINARY_REPRESENTATION: the IP UDT is materialized as BINARY, so cast(... as IP) and cidrmatch over an IP column fail (2 tests)
- TIME_TYPE_WIDENED_TO_TIMESTAMP: a TIME field reads back as TIMESTAMP, defeating TIMEDIFF's [TIME,TIME] signature
- BINARY_FIELD_STRIPPED: binary fields are stripped at load
- VALUES_LIMIT_NOT_HONORED: values()/list() ignore the configured limit
- INDEX_METADATA: _index metadata not exposed (sibling of ID_METADATA)
- CROSS_INDEX_OBJECT_LEAF_MERGE: an object leaf in only some wildcard member indices resolves to FIELD_NOT_FOUND
- TEXT_KEYWORD_PUSHDOWN_REWRITE: like() doesn't rewrite to .keyword in the explain plan (no Lucene term-pushdown)
- LUCENE_PUSHDOWN_EXPLAIN: a test asserting a Lucene SORT-> pushdown fragment can't match the DataFusion plan

Reused existing capabilities:
- WILDCARD_COLUMN_ORDER: streamstats carries all source columns through; AE returns them in a different order (4 CalciteReverseCommandIT tests)
- HEAD_WITHOUT_STABLE_SORT: head N without a stable sort (testHeadThenSort, testAppendWithMergedColumn)
- DEDUP_NONDETERMINISTIC: consecutive dedup has no working V2 fallback on the AE route

Out of scope:
- FieldsCommandIT.testEnhancedFieldsWhenCalciteDisabled asserts the Calcite-DISABLED error; the AE route is always Calcite-enabled. build.gradle exclude only.

Results (this batch, on the AE route): 16 classes, 373 run, 0 failures (was 24 failures). V2 baseline: 408 run, 0 failures, 2 pre-existing/by-design skips (none from these gates).

Signed-off-by: Kai Huang <ahkcs@amazon.com>
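The gating described in the commit message pairs an annotation on the test method with a route check. The real `RequiresCapability` and `Capability` types in `org.opensearch.sql.util` are not part of this diff, so the following is only a minimal sketch under assumptions: the annotation shape, the `UNAVAILABLE_ON_AE` set, the `shouldRun` helper, and the `SampleIT` class are all hypothetical stand-ins.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.EnumSet;
import java.util.Set;

public class CapabilityGateSketch {

  /** Hypothetical stand-in for Capability; only two names from the commit are listed. */
  enum Capability {
    SORT_TIE_ORDER_UNSTABLE,
    RAND_SEED_UNSUPPORTED
  }

  /** Hypothetical stand-in for @RequiresCapability, mirroring the value/note usage in the diff. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface RequiresCapability {
    Capability value();

    String note() default "";
  }

  /** Assumption: every gated capability is treated as unavailable on the AE route. */
  static final Set<Capability> UNAVAILABLE_ON_AE = EnumSet.allOf(Capability.class);

  /** Returns true if the named public test method should run on the given route. */
  static boolean shouldRun(Class<?> testClass, String methodName, boolean analyticsRoute) {
    Method method;
    try {
      method = testClass.getMethod(methodName);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("no such test method: " + methodName, e);
    }
    RequiresCapability gate = method.getAnnotation(RequiresCapability.class);
    if (gate == null || !analyticsRoute) {
      return true; // ungated tests, and every test off the AE route, run unchanged
    }
    return !UNAVAILABLE_ON_AE.contains(gate.value());
  }

  /** Invented test class used only to exercise the sketch. */
  public static class SampleIT {
    @RequiresCapability(value = Capability.SORT_TIE_ORDER_UNSTABLE, note = "tie order differs")
    public void gatedTest() {}

    public void plainTest() {}
  }

  public static void main(String[] args) {
    System.out.println(shouldRun(SampleIT.class, "gatedTest", true));
    System.out.println(shouldRun(SampleIT.class, "gatedTest", false));
    System.out.println(shouldRun(SampleIT.class, "plainTest", true));
  }
}
```

In this sketch the annotation alone decides skipping; the commit additionally lists each gated test in build.gradle, so the two mechanisms are kept in step by hand.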
1 parent 08d8ba1 commit c4583ab

18 files changed

Lines changed: 364 additions & 3 deletions

integ-test/build.gradle

Lines changed: 48 additions & 0 deletions
@@ -1285,6 +1285,54 @@ task integTestRemote(type: RestIntegTestTask) {
       excludeTestsMatching '*CalciteChartCommandIT.testChartLimitTopWithUseOther'
       excludeTestsMatching '*CalciteChartCommandIT.testChartLimitBottomWithUseOther'
       excludeTestsMatching '*CalciteChartCommandIT.testChartLimitTopWithMinAgg'
+
+      // === Excludes: sort on a non-unique key leaves ties in an engine-dependent order ===
+      excludeTestsMatching '*CalcitePPLSortIT.testSortWithNullValue'
+      excludeTestsMatching '*CalcitePPLSortIT.testSortAgeAndFieldsNameAge'
+      excludeTestsMatching '*CalcitePPLSortIT.testSortWithAutoCast'
+
+      // === Excludes: streamstats carries all columns through; AE reorders them ===
+      excludeTestsMatching '*CalciteReverseCommandIT.testStreamstatsWithReverse'
+      excludeTestsMatching '*CalciteReverseCommandIT.testStreamstatsByWithReverse'
+      excludeTestsMatching '*CalciteReverseCommandIT.testStreamstatsWindowWithReverse'
+      excludeTestsMatching '*CalciteReverseCommandIT.testStreamstatsWithSortThenReverse'
+
+      // === Excludes: invalid-datetime error-message shape differs on AE ===
+      excludeTestsMatching '*CalcitePPLBuiltinDatetimeFunctionInvalidIT.testDAYNAMEInvalid'
+      excludeTestsMatching '*CalcitePPLBuiltinDatetimeFunctionInvalidIT.testMONTHNAMEInvalid'
+
+      // === Excludes: seeded RAND(seed) unsupported on AE ===
+      excludeTestsMatching '*MathematicalFunctionIT.testRand'
+
+      // === Excludes: IP UDT is materialized as BINARY/byte[] on AE ===
+      excludeTestsMatching '*CastFunctionIT.testCastToIP'
+      excludeTestsMatching '*CalcitePPLAppendCommandIT.testAppendSchemaMergeWithIpUDT'
+
+      // === Excludes: append head-without-stable-sort + TIME-widened-to-TIMESTAMP signature ===
+      excludeTestsMatching '*CalcitePPLAppendCommandIT.testAppendWithMergedColumn'
+      excludeTestsMatching '*CalcitePPLBuiltinFunctionsNullIT.testTimediffNull'
+
+      // === Excludes: consecutive dedup has no working V2 fallback on AE ===
+      excludeTestsMatching '*CalciteDedupCommandIT.testConsecutiveDedup'
+
+      // === Excludes: binary field stripped at load; values() ignores configured limit ===
+      excludeTestsMatching '*CalciteMultiValueStatsIT.testListFunctionWithBinary'
+      excludeTestsMatching '*CalciteMultiValueStatsIT.testValuesFunctionRespectsConfiguredLimit'
+
+      // === Excludes: _id/_index metadata not exposed; cross-index object-leaf merge ===
+      excludeTestsMatching '*FieldsCommandIT.testMetadataFields'
+      excludeTestsMatching '*FieldsCommandIT.testMergedObjectFields'
+      // OOS: asserts the Calcite-disabled error, but the AE route is always Calcite-enabled.
+      excludeTestsMatching '*FieldsCommandIT.testEnhancedFieldsWhenCalciteDisabled'
+
+      // === Excludes: head N without a stable sort returns a non-deterministic row set ===
+      excludeTestsMatching '*SortCommandIT.testHeadThenSort'
+
+      // === Excludes: like() doesn't rewrite to .keyword in the explain plan on AE ===
+      excludeTestsMatching '*LikeQueryIT.test_convert_field_text_to_keyword'
+
+      // === Excludes: asserts a Lucene pushdown fragment absent on the AE route ===
+      excludeTestsMatching '*CalciteSortCommandIT.testPushdownSortCastToDoubleExpression'
     }
   }
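The exclude patterns above all start with `*` and end with `ClassName.methodName`. As a rough illustration of how such a wildcard pattern selects tests, here is a simplified matcher. It is not Gradle's actual filter implementation, whose rules are richer; `toRegex` and `isExcluded` are hypothetical helpers that treat `*` as "any run of characters" and everything else literally.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ExcludePatternSketch {

  /** Converts a '*'-wildcard pattern to a regex; all other characters match literally. */
  static Pattern toRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    for (char c : glob.toCharArray()) {
      if (c == '*') {
        if (literal.length() > 0) {
          regex.append(Pattern.quote(literal.toString()));
          literal.setLength(0);
        }
        regex.append(".*");
      } else {
        literal.append(c);
      }
    }
    if (literal.length() > 0) {
      regex.append(Pattern.quote(literal.toString()));
    }
    return Pattern.compile(regex.toString());
  }

  /** True if any pattern matches "<fully.qualified.Class>.<method>" in full. */
  static boolean isExcluded(List<String> patterns, String className, String methodName) {
    String target = className + "." + methodName;
    for (String pattern : patterns) {
      if (toRegex(pattern).matcher(target).matches()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> patterns = List.of("*CalcitePPLSortIT.testSortWithNullValue");
    String cls = "org.opensearch.sql.calcite.remote.CalcitePPLSortIT";
    System.out.println(isExcluded(patterns, cls, "testSortWithNullValue"));
    System.out.println(isExcluded(patterns, cls, "testSortWithStrCast"));
  }
}
```

Under this simplified matcher, a pattern such as `*SortCommandIT.testHeadThenSort` also matches a class whose name merely ends in `SortCommandIT`, because the leading `*` absorbs any prefix.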

integ-test/src/test/java/org/opensearch/sql/calcite/remote/AnalyticsUnsupportedFieldStripVerifyIT.java

Lines changed: 49 additions & 0 deletions
@@ -70,6 +70,32 @@ public class AnalyticsUnsupportedFieldStripVerifyIT extends PPLIntegTestCase {
    */
   private static final Set<String> OUT_OF_SCOPE_TYPES = Set.of("join");
 
+  /**
+   * Datasets that carry a multi-value JSON array for a scalar-mapped field (e.g. a {@code text} or
+   * {@code long} field given several values in a doc). The parquet/composite store rejects these at
+   * bulk load with {@code Cannot accept multiple values for field} — a cardinality limitation, not
+   * an unsupported field *type*, so it is out of scope for the type strip ({@link
+   * org.opensearch.sql.legacy.TestUtils.AnalyticsIndexConfig}) the same way {@link
+   * #OUT_OF_SCOPE_TYPES} is. Tracked by the {@code MULTI_VALUE_FIELD_LOAD} capability. An index
+   * here is skipped only when its failure is the exact multi-value signature AND it is on this
+   * curated list (see {@link #safeToSkipForMultiValueLoad}); any other failure surfaces loudly.
+   * Legs 2-3 still type-check the live mapping of every index that loads, so a missed type-strip
+   * can't hide.
+   */
+  private static final Set<String> MULTI_VALUE_DATASETS =
+      Set.of(
+          "GAME_OF_THRONES",
+          "NESTED",
+          "NESTED_WITH_QUOTES",
+          "DEEP_NESTED",
+          "NESTED_WITH_NULLS",
+          "GRAPH_AIRPORTS",
+          "ARRAY",
+          "OTELLOGS");
+
+  /** Cluster's per-item bulk error when a doc supplies an array to a scalar-mapped field. */
+  private static final String MULTI_VALUE_SIGNATURE = "Cannot accept multiple values for field";
+
   @Override
   public void init() throws Exception {
     super.init();
@@ -109,6 +135,12 @@ public void everyDatasetIngestsCleanlyOnAnalyticsEngine() throws IOException {
           // unsupported type we're responsible for — not our concern. Skip.
           continue;
         }
+        if (safeToSkipForMultiValueLoad(e, index)) {
+          // Load failed with the multi-value signature on a known multi-value-array-into-scalar
+          // dataset (a cardinality limitation, not an unsupported type) — out of scope for the type
+          // strip, tracked by MULTI_VALUE_FIELD_LOAD. Skip.
+          continue;
+        }
         failures.add(
             "["
                 + index.name()
@@ -380,6 +412,23 @@ private static boolean safeToSkipForOutOfScopeType(Throwable t, String mapping)
     return !mappingContainsUnsupportedType(mapping);
   }
 
+  /**
+   * Safe to skip an index whose load failed, only when BOTH hold: (a) the error is exactly the
+   * multi-value bulk signature ({@code Cannot accept multiple values for field}), AND (b) the index
+   * is on the curated {@link #MULTI_VALUE_DATASETS} allowlist. Unlike {@link
+   * #safeToSkipForOutOfScopeType} this does NOT also require the raw mapping to be free of
+   * unsupported types: several of these datasets legitimately declare {@code nested} fields that
+   * the type strip removes at load, while the multi-value failure is on a separate scalar leaf. The
+   * curated allowlist plus the exact failure signature is the masking guard — an unanticipated
+   * failure (different message, or a dataset not on the list) is never skipped and surfaces loudly.
+   */
+  private static boolean safeToSkipForMultiValueLoad(Throwable t, Index index) {
+    String msg = rootMessage(t);
+    return msg != null
+        && msg.contains(MULTI_VALUE_SIGNATURE)
+        && MULTI_VALUE_DATASETS.contains(index.name());
+  }
+
   /** True if the raw mapping JSON declares any {@link #UNSUPPORTED} field type at any depth. */
   private static boolean mappingContainsUnsupportedType(String mapping) {
     if (mapping == null || mapping.isEmpty()) {
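The two-condition guard in `safeToSkipForMultiValueLoad` can be exercised in isolation. The sketch below is a standalone approximation, not the test class itself: `rootMessage` is not shown in this diff, so the version here (walk the cause chain and return the innermost message) is an assumption, the allowlist is shortened, and a plain `String` stands in for the `Index` enum.

```java
import java.util.Set;

public class MultiValueSkipSketch {

  // Shortened copy of the allowlist, for illustration only.
  static final Set<String> MULTI_VALUE_DATASETS = Set.of("GAME_OF_THRONES", "DEEP_NESTED");

  static final String MULTI_VALUE_SIGNATURE = "Cannot accept multiple values for field";

  /** Assumed behavior of rootMessage: the message of the innermost cause. */
  static String rootMessage(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null && current.getCause() != current) {
      current = current.getCause();
    }
    return current.getMessage();
  }

  /** Skip only when the signature matches AND the dataset is on the allowlist. */
  static boolean safeToSkip(Throwable t, String datasetName) {
    String msg = rootMessage(t);
    return msg != null
        && msg.contains(MULTI_VALUE_SIGNATURE)
        && MULTI_VALUE_DATASETS.contains(datasetName);
  }

  public static void main(String[] args) {
    Throwable multiValue =
        new RuntimeException(
            "bulk failed", new IllegalStateException(MULTI_VALUE_SIGNATURE + " [titles]"));
    Throwable other = new RuntimeException("mapper_parsing_exception");

    System.out.println(safeToSkip(multiValue, "GAME_OF_THRONES")); // both conditions hold
    System.out.println(safeToSkip(multiValue, "BANK")); // dataset not on the allowlist
    System.out.println(safeToSkip(other, "GAME_OF_THRONES")); // different failure message
  }
}
```

Requiring both conditions means a new dataset that starts failing with the same message, or a listed dataset that fails for a different reason, is still reported.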

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteDedupCommandIT.java

Lines changed: 8 additions & 0 deletions
@@ -5,8 +5,11 @@
 
 package org.opensearch.sql.calcite.remote;
 
+import static org.opensearch.sql.util.Capability.DEDUP_NONDETERMINISTIC;
+
 import java.io.IOException;
 import org.opensearch.sql.ppl.DedupCommandIT;
+import org.opensearch.sql.util.RequiresCapability;
 
 public class CalciteDedupCommandIT extends DedupCommandIT {
   @Override
@@ -15,6 +18,11 @@ public void init() throws Exception {
     enableCalcite();
   }
 
+  @RequiresCapability(
+      value = DEDUP_NONDETERMINISTIC,
+      note =
+          "consecutive dedup falls back to V2 on the Calcite path, but the AE route has no working"
+              + " V2 fallback (DEDUP_NONDETERMINISTIC).")
   @Override
   public void testConsecutiveDedup() throws IOException {
     withFallbackEnabled(

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultiValueStatsIT.java

Lines changed: 11 additions & 0 deletions
@@ -10,6 +10,8 @@
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_CALCS;
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_DATATYPE_NONNUMERIC;
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_DATATYPE_NUMERIC;
+import static org.opensearch.sql.util.Capability.BINARY_FIELD_STRIPPED;
+import static org.opensearch.sql.util.Capability.VALUES_LIMIT_NOT_HONORED;
 import static org.opensearch.sql.util.MatcherUtils.rows;
 import static org.opensearch.sql.util.MatcherUtils.schema;
 import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;
@@ -21,6 +23,7 @@
 import org.json.JSONObject;
 import org.junit.jupiter.api.Test;
 import org.opensearch.sql.ppl.PPLIntegTestCase;
+import org.opensearch.sql.util.RequiresCapability;
 
 public class CalciteMultiValueStatsIT extends PPLIntegTestCase {
 
@@ -169,6 +172,9 @@ public void testListFunctionWithIP() throws IOException {
   }
 
   @Test
+  @RequiresCapability(
+      value = BINARY_FIELD_STRIPPED,
+      note = "binary_value is stripped at load on the AE route (BINARY_FIELD_STRIPPED).")
   public void testListFunctionWithBinary() throws IOException {
     JSONObject response =
         executeQuery(
@@ -420,6 +426,11 @@ public void testValuesFunctionWithUnlimitedValues() throws IOException {
   }
 
   @Test
+  @RequiresCapability(
+      value = VALUES_LIMIT_NOT_HONORED,
+      note =
+          "values() ignores plugins.ppl.values.max.limit on the AE route"
+              + " (VALUES_LIMIT_NOT_HONORED).")
   public void testValuesFunctionRespectsConfiguredLimit() throws IOException, InterruptedException {
     // Test 1: Set limit to 3 and verify only 3 values are returned
     updateClusterSettings(new ClusterSetting(TRANSIENT, "plugins.ppl.values.max.limit", "3"));

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendCommandIT.java

Lines changed: 13 additions & 0 deletions
@@ -8,6 +8,8 @@
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_ACCOUNT;
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_BANK;
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_WEBLOGS;
+import static org.opensearch.sql.util.Capability.HEAD_WITHOUT_STABLE_SORT;
+import static org.opensearch.sql.util.Capability.IP_UDT_BINARY_REPRESENTATION;
 import static org.opensearch.sql.util.MatcherUtils.rows;
 import static org.opensearch.sql.util.MatcherUtils.schema;
 import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;
@@ -22,6 +24,7 @@
 import org.opensearch.client.ResponseException;
 import org.opensearch.sql.common.setting.Settings;
 import org.opensearch.sql.ppl.PPLIntegTestCase;
+import org.opensearch.sql.util.RequiresCapability;
 
 public class CalcitePPLAppendCommandIT extends PPLIntegTestCase {
   @Override
@@ -195,6 +198,11 @@ public void testAppendDifferentIndex() throws IOException {
   }
 
   @Test
+  @RequiresCapability(
+      value = HEAD_WITHOUT_STABLE_SORT,
+      note =
+          "head 5 over the two-branch append has no globally-unique sort key, so the"
+              + " surviving/ordered rows diverge on the AE route (HEAD_WITHOUT_STABLE_SORT).")
   public void testAppendWithMergedColumn() throws IOException {
     JSONObject actual =
         executeQuery(
@@ -258,6 +266,11 @@ public void testAppendSchemaMergeWithTimestampUDT() throws IOException {
   }
 
   @Test
+  @RequiresCapability(
+      value = IP_UDT_BINARY_REPRESENTATION,
+      note =
+          "cidrmatch over an appended IP column hits the IP-UDT-as-byte[] gap on the AE route"
+              + " (IP_UDT_BINARY_REPRESENTATION).")
   public void testAppendSchemaMergeWithIpUDT() throws IOException {
     JSONObject actual =
         executeQuery(
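The IP_UDT_BINARY_REPRESENTATION note says the IP value surfaces as `byte[]`, which is why string-oriented operations such as cidrmatch no longer apply directly. The commit does not state the byte layout, so the sketch below assumes the common one (the raw 4- or 16-byte address in network order) and shows what decoding and a byte-level CIDR check would look like under that assumption. `toText` and `cidrMatch` are hypothetical helpers, not functions from the plugin.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class IpBytesSketch {

  /** Decodes a raw 4-byte (IPv4) or 16-byte (IPv6) address into its textual form. */
  static String toText(byte[] raw) {
    try {
      // getByAddress(byte[]) only validates the length; it performs no name lookup.
      return InetAddress.getByAddress(raw).getHostAddress();
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("expected a 4- or 16-byte address", e);
    }
  }

  /** True if addr falls inside network/prefixLen, comparing raw bytes bit by bit. */
  static boolean cidrMatch(byte[] addr, byte[] network, int prefixLen) {
    if (addr.length != network.length) {
      return false; // IPv4 and IPv6 never match each other in this sketch
    }
    if (prefixLen < 0 || prefixLen > addr.length * 8) {
      throw new IllegalArgumentException("prefix length out of range: " + prefixLen);
    }
    int fullBytes = prefixLen / 8;
    for (int i = 0; i < fullBytes; i++) {
      if (addr[i] != network[i]) {
        return false;
      }
    }
    int remainingBits = prefixLen % 8;
    if (remainingBits == 0) {
      return true;
    }
    int mask = (0xFF << (8 - remainingBits)) & 0xFF;
    return (addr[fullBytes] & mask) == (network[fullBytes] & mask);
  }

  public static void main(String[] args) {
    byte[] host = {(byte) 192, (byte) 168, 1, 1};
    byte[] net = {(byte) 192, (byte) 168, 0, 0};
    System.out.println(toText(host));
    System.out.println(cidrMatch(host, net, 16));
  }
}
```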

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBuiltinDatetimeFunctionInvalidIT.java

Lines changed: 12 additions & 0 deletions
@@ -6,12 +6,14 @@
 package org.opensearch.sql.calcite.remote;
 
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_DATE_FORMATS_WITH_NULL;
+import static org.opensearch.sql.util.Capability.INVALID_DATETIME_ERROR_SHAPE;
 import static org.opensearch.sql.util.MatcherUtils.verifyErrorMessageContains;
 
 import org.junit.jupiter.api.Test;
 import org.opensearch.sql.exception.ExpressionEvaluationException;
 import org.opensearch.sql.legacy.SQLIntegTestCase;
 import org.opensearch.sql.ppl.PPLIntegTestCase;
+import org.opensearch.sql.util.RequiresCapability;
 
 public class CalcitePPLBuiltinDatetimeFunctionInvalidIT extends PPLIntegTestCase {
   @Override
@@ -216,6 +218,11 @@ public void testDAYInvalid() {
   }
 
   @Test
+  @RequiresCapability(
+      value = INVALID_DATETIME_ERROR_SHAPE,
+      note =
+          "dayname/monthname over an invalid datetime literal throws a different error-message"
+              + " shape on the AE route (INVALID_DATETIME_ERROR_SHAPE).")
   public void testDAYNAMEInvalid() {
 
     Throwable e1 =
@@ -760,6 +767,11 @@ public void testMONTH_OF_YEARInvalid() {
   }
 
   @Test
+  @RequiresCapability(
+      value = INVALID_DATETIME_ERROR_SHAPE,
+      note =
+          "dayname/monthname over an invalid datetime literal throws a different error-message"
+              + " shape on the AE route (INVALID_DATETIME_ERROR_SHAPE).")
   public void testMONTHNAMEInvalid() {
 
     Throwable e1 =

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBuiltinFunctionsNullIT.java

Lines changed: 7 additions & 0 deletions
@@ -8,6 +8,7 @@
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_DATE_FORMATS_WITH_NULL;
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_NULL_MISSING;
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_STATE_COUNTRY_WITH_NULL;
+import static org.opensearch.sql.util.Capability.TIME_TYPE_WIDENED_TO_TIMESTAMP;
 import static org.opensearch.sql.util.MatcherUtils.*;
 import static org.opensearch.sql.util.MatcherUtils.rows;
 
@@ -17,6 +18,7 @@
 import org.junit.jupiter.api.Test;
 import org.opensearch.sql.exception.ExpressionEvaluationException;
 import org.opensearch.sql.ppl.PPLIntegTestCase;
+import org.opensearch.sql.util.RequiresCapability;
 
 public class CalcitePPLBuiltinFunctionsNullIT extends PPLIntegTestCase {
   @Override
@@ -887,6 +889,11 @@ public void testTimeToSecNull() throws IOException {
   }
 
   @Test
+  @RequiresCapability(
+      value = TIME_TYPE_WIDENED_TO_TIMESTAMP,
+      note =
+          "the TIME field reads back as TIMESTAMP on the AE route, defeating TIMEDIFF's [TIME,TIME]"
+              + " signature (TIME_TYPE_WIDENED_TO_TIMESTAMP).")
   public void testTimediffNull() throws IOException {
     JSONObject actual =
         executeQuery(

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSortIT.java

Lines changed: 17 additions & 0 deletions
@@ -7,6 +7,7 @@
 
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_BANK;
 import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_BANK_WITH_NULL_VALUES;
+import static org.opensearch.sql.util.Capability.SORT_TIE_ORDER_UNSTABLE;
 import static org.opensearch.sql.util.MatcherUtils.rows;
 import static org.opensearch.sql.util.MatcherUtils.schema;
 import static org.opensearch.sql.util.MatcherUtils.verifyDataRowsInOrder;
@@ -16,6 +17,7 @@
 import org.json.JSONObject;
 import org.junit.jupiter.api.Test;
 import org.opensearch.sql.ppl.PPLIntegTestCase;
+import org.opensearch.sql.util.RequiresCapability;
 
 public class CalcitePPLSortIT extends PPLIntegTestCase {
 
@@ -165,6 +167,11 @@ public void testSortAgeAndFieldsAge() throws IOException {
   }
 
   @Test
+  @RequiresCapability(
+      value = SORT_TIE_ORDER_UNSTABLE,
+      note =
+          "sort -age leaves the age=36 tie in an engine-dependent order on the AE route"
+              + " (SORT_TIE_ORDER_UNSTABLE).")
   public void testSortAgeAndFieldsNameAge() throws IOException {
     JSONObject actual =
         executeQuery(
@@ -200,6 +207,11 @@ public void testSortAgeNameAndFieldsNameAge() throws IOException {
   }
 
   @Test
+  @RequiresCapability(
+      value = SORT_TIE_ORDER_UNSTABLE,
+      note =
+          "sort balance leaves the three null-balance rows in an engine-dependent order on the AE"
+              + " route (SORT_TIE_ORDER_UNSTABLE).")
   public void testSortWithNullValue() throws IOException {
     JSONObject result =
         executeQuery(
@@ -315,6 +327,11 @@ public void testSortWithStrCast() throws IOException {
   }
 
   @Test
+  @RequiresCapability(
+      value = SORT_TIE_ORDER_UNSTABLE,
+      note =
+          "sort AUTO(age) leaves the age=36 tie in an engine-dependent order on the AE route"
+              + " (SORT_TIE_ORDER_UNSTABLE).")
   public void testSortWithAutoCast() throws IOException {
     JSONObject result =
         executeQuery(
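The SORT_TIE_ORDER_UNSTABLE gates all describe the same effect: when the sort key is not unique, the relative order of tied rows depends on the order in which the engine happens to present them. The sketch below reproduces that with plain Java collections. The rows and names are invented for illustration and are not taken from the test datasets; two differently ordered inputs stand in for two engines scanning the same data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortTieSketch {

  /** Invented row type: a name plus the (non-unique) sort key. */
  static final class Row {
    final String name;
    final int age;

    Row(String name, int age) {
      this.name = name;
      this.age = age;
    }
  }

  /** Equivalent of "sort -age": descending by age, ties left unresolved. */
  static final Comparator<Row> AGE_DESC = Comparator.comparingInt((Row r) -> r.age).reversed();

  /** Same ordering with a unique tiebreaker appended. */
  static final Comparator<Row> AGE_DESC_THEN_NAME = AGE_DESC.thenComparing(r -> r.name);

  /** Sorts a copy of the rows and returns the names in result order. */
  static List<String> sortedNames(List<Row> rows, Comparator<Row> order) {
    List<Row> copy = new ArrayList<>(rows);
    copy.sort(order); // List.sort is stable, so tied rows keep their input order
    List<String> names = new ArrayList<>();
    for (Row row : copy) {
      names.add(row.name);
    }
    return names;
  }

  public static void main(String[] args) {
    // Same rows, different arrival order.
    List<Row> scanA = List.of(new Row("alice", 36), new Row("bob", 36), new Row("carol", 33));
    List<Row> scanB = List.of(new Row("bob", 36), new Row("alice", 36), new Row("carol", 33));

    System.out.println(sortedNames(scanA, AGE_DESC)); // tie order follows scanA
    System.out.println(sortedNames(scanB, AGE_DESC)); // tie order follows scanB
    System.out.println(sortedNames(scanA, AGE_DESC_THEN_NAME));
    System.out.println(sortedNames(scanB, AGE_DESC_THEN_NAME)); // same as the line above
  }
}
```

A test that asserts row order over a non-unique key is therefore only portable across engines if the query adds a tiebreaker or the assertion ignores order within ties; the commit instead gates those tests on the AE route.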
