Skip to content

Commit ab4e7c6

Browse files
committed
[analytics-engine] Address review: path-aware strip, bulk-error fail-loud, stronger verifier
Resolves the correctness holes raised in review. Blocking 1 — path-aware mapping+source strip. stripUnsupportedMappingFields now returns exact dropped PATHS (e.g. [location, point]) instead of only top-level names; stripBulkFields removes those paths recursively from _source, descending through objects AND arrays of objects, preserving siblings (location.city etc.). Adds unit coverage for a complex_geo shape and arrays of objects. Blocking 2 — fail loud on bulk item failures. loadDataByRestClient now parses the _bulk response and throws on errors=true (naming the index + first item errors), for ALL test bulk loads. Silent partial ingestion no longer slips through a 200 status. Blocking 3 — verifier proves the invariant. AnalyticsUnsupportedFieldStripVerifyIT deletes each index first so the real load path always runs (not short-circuited by isIndexExist), then per dataset checks: clean live mapping, no stripped path left in sampled _source, and doc-count == source-doc count (tolerating a cluster-side count error on system indices with a transparent logged skip). NB1 — GeoPointFormatsIT exclude moved under the parsed analyticsEnabled gate (was gated on mere property presence, so =false wrongly excluded it). NB2 — out-of-scope skip (join) now refuses to skip if the mapping also declares any unsupported type, so a mixed failure can't be hidden. binary is now conditional: kept when store:true (engine reads VARBINARY), but stripped when store:false because the parquet store can't create it ("Unable to derive source for [X] with store disabled", verified on the AE cluster). Confirmed end-to-end: the verifier passes over all Index datasets on a force-routed AE cluster. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
1 parent efdcc7e commit ab4e7c6

4 files changed

Lines changed: 530 additions & 117 deletions

File tree

integ-test/build.gradle

Lines changed: 8 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1033,16 +1033,6 @@ task integTestRemote(type: RestIntegTestTask) {
10331033
// to the analytics-engine planner via index settings.
10341034
if (System.getProperty("tests.analytics.parquet_indices") != null) {
10351035
systemProperty 'tests.analytics.parquet_indices', System.getProperty("tests.analytics.parquet_indices")
1036-
1037-
// === Class-level excludes: PPL ITs doomed by the unsupported-field strip ===
1038-
// On the analytics-engine route, TestUtils.AnalyticsIndexConfig strips
1039-
// nested/geo_point/geo_shape/alias fields from every test mapping AND its bulk data, because
1040-
// the parquet/composite store can't scan them. Tests whose queries reference such a field
1041-
// therefore can't pass — the data no longer exists. Exclude them here rather than asserting.
1042-
//
1043-
// GeoPointFormatsIT: EVERY test reads a geo_point field (point / location.point / nested geo
1044-
// in geopoint_index_mapping + complex_geo_index_mapping) — whole class is doomed.
1045-
exclude 'org/opensearch/sql/ppl/GeoPointFormatsIT.class'
10461036
}
10471037

10481038
// Primary-shard count for analytics-backed test indices (default 1). Set to e.g. 3 to
@@ -1075,12 +1065,15 @@ task integTestRemote(type: RestIntegTestTask) {
10751065
excludeTestsMatching '*OperatorIT.testGteOperator'
10761066
excludeTestsMatching '*OperatorIT.testNotOperator'
10771067

1078-
// === Method-level excludes: PPL tests querying a stripped field ===
1079-
// These reference a nested/geo_point/alias-typed field that the unsupported-field strip
1080-
// removes from the test data+mapping on the AE route, so they can't pass. (Verified
1081-
// field-by-field against the index mappings; only the doomed methods are excluded — the
1082-
// rest of each class still runs.)
1068+
// === Excludes: PPL ITs doomed by the unsupported-field strip ===
1069+
// On the AE route TestUtils.AnalyticsIndexConfig strips nested/geo_point/geo_shape/alias
1070+
// fields from every test mapping AND its bulk data (the parquet/composite store can't
1071+
// scan them). A test whose query references such a field can't pass — the data is gone.
1072+
// Verified field-by-field against the index mappings; only doomed work is excluded.
1073+
// Gated on the parsed boolean so `-Dtests.analytics.parquet_indices=false` excludes none.
10831074
//
1075+
// GeoPointFormatsIT: EVERY test reads a geo_point field — whole class doomed.
1076+
excludeTestsMatching 'org.opensearch.sql.ppl.GeoPointFormatsIT'
10841077
// datatypes_index_mapping: nested_value=nested, geo_point_value=geo_point.
10851078
excludeTestsMatching 'org.opensearch.sql.ppl.DataTypeIT.test_nonnumeric_data_types'
10861079
excludeTestsMatching 'org.opensearch.sql.ppl.SystemFunctionIT.typeof_opensearch_types'

0 commit comments

Comments
 (0)