Skip to content

Stabilize order-dependent PPL ITs with explicit sort for multi-shard analytics runs#5537

Merged
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:stabilize-order-dependent-ppl-its
Jun 10, 2026
Merged

Stabilize order-dependent PPL ITs with explicit sort for multi-shard analytics runs#5537
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:stabilize-order-dependent-ppl-its

Conversation

@ahkcs

@ahkcs ahkcs commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Description

When PPL integration-test indices are created with more than one primary shard on the analytics-engine route, the engine merges per-shard Arrow batches in arrival order (RowProducingSink.feed() has no shard-ordinal tiebreaker, unlike the Lucene path's TopDocs.merge() shardIndex ordering). Any test whose asserted rows depend on document encounter order — head without sort, first-row assertions after patterns/grok/eval, trendline over unsorted input, result-set truncation — fails on a 3-shard run even though it passes on 1 shard.

This PR stabilizes every such test that can be fixed by adding an explicit | sort <unique key> to the query:

  • bank-based tests (bank.json is account_number-ascending): expectations unchanged.
  • accounts-based tests (accounts.json is NOT account_number-ascending): expectations updated to the sorted rows — deterministic on every route.
  • | sort placed before any fields projection that drops the sort key (verified deterministic through the projection on the multi-shard route).

It also adds a tests.analytics.num_shards system property (default 1, so existing runs are unchanged) wired through integTestRemote and TestUtils.AnalyticsIndexConfig, to make multi-shard coverage runs reproducible:

./gradlew :integ-test:integTestRemote \
  -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9300 -Dtests.clustername=runTask \
  -Dtests.analytics.parquet_indices=true -Dtests.analytics.num_shards=3

Results (suite failures on the affected classes)

Class 3-shard before 3-shard after 1-shard regular route
CalciteHeadCommandIT (+ v2 HeadCommandIT) 4 0 0 0
TrailingPipeIT 5+1 0 0 0
CalcitePPLEnhancedCoalesceIT 14 6 (pre-existing) 8 → 4 0
CalcitePPLTrendlineIT 7 1 (pre-existing) 1 0
CalcitePPLGrokIT 2 0 0 0
CalciteMVAppendFunctionIT 7 4 (pre-existing) 4 0
CalciteBinCommandIT 8+1 6 (pre-existing) 6 0
CalcitePPLPatternsIT 13 10 (see notes) 5 0
CalciteMultisearchCommandIT 6 5 (see notes) 6 0
CalcitePPLAggregationIT (×2 incl. Paginating variant) 37+37 34+35 (see notes) 10+10 0
CalciteQueryStringIT 4 4 (engine-side, see notes) 1 0
CalciteStreamstatsCommandIT 25 24 (engine-side, see notes) 23 0

"+1" = same-family tests that passed the baseline run only by lucky arrival order and were fixed here too. All remaining failures pre-exist this PR (most also fail on 1 shard) or are multi-shard engine gaps that cannot be fixed test-side:

  • stats first()/last()/take() stay non-deterministic even with a preceding sort — per-shard partial accumulators merge in arrival order at the coordinator. The existing OrdinalAppendingSink shard ordinals are not used by the gather stage; an engine-side ordinal merge would make those tests pass unmodified.
  • streamstats computes windows per shard fragment even with a global sort preceding it.
  • multisearch same-index subsearches execute the first subsearch's filter for both branches (silent row duplication — | multisearch [where M] [where F] | stats count() returns 2×M on the analytics route).
  • multi-shard 500s: derive_schema_from_partial_plan protobuf decode failure, Substrait Float64/Int64 schema mismatch for eval unix_timestamp(...) | stats ... by, TRY_CAST List(Struct) → Utf8 for BRAIN patterns.
  • query_string intermittently drops one row across shards (16→15).

These are tracked separately for upstream engine fixes.

Related Issues

Part of the analytics-engine PPL compatibility effort.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff or -s.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…analytics runs

The analytics engine merges per-shard Arrow batches in arrival order (no
shard-ordinal tiebreaker), so any IT whose asserted rows depend on document
encounter order fails when test indices have more than one primary shard.
Add an explicit sort on a unique key to the affected queries so results are
deterministic on every route; where the sorted result differs from insertion
order (accounts.json is not account_number-ascending), update expectations
to the sorted rows.

Also add the tests.analytics.num_shards system property (default 1, wired
through integTestRemote and TestUtils.AnalyticsIndexConfig) so multi-shard
coverage runs can be reproduced:

  ./gradlew :integ-test:integTestRemote \
    -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9300 \
    -Dtests.clustername=runTask \
    -Dtests.analytics.parquet_indices=true -Dtests.analytics.num_shards=3

Verified on three routes: 3-shard analytics, 1-shard analytics, and the
regular Calcite route (self-managed cluster, all green).

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@github-actions

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate positive shard count

Add validation to ensure the parsed number of shards is positive. Invalid or
negative values could cause index creation failures or unexpected behavior in
multi-shard test scenarios.

integ-test/src/test/java/org/opensearch/sql/legacy/TestUtils.java [78-80]

 public static int getNumShards() {
-  return Integer.parseInt(System.getProperty(NUM_SHARDS_PROP, "1"));
+  int numShards = Integer.parseInt(System.getProperty(NUM_SHARDS_PROP, "1"));
+  if (numShards <= 0) {
+    throw new IllegalArgumentException("Number of shards must be positive, got: " + numShards);
+  }
+  return numShards;
 }
Suggestion importance[1-10]: 7

__

Why: Adding validation for positive shard count prevents potential index creation failures from invalid configuration. However, this is primarily a defensive check for test configuration rather than a critical bug fix, as the system property is typically controlled by test infrastructure.

Medium

@ahkcs ahkcs added the enhancement New feature or request label Jun 10, 2026
@ahkcs ahkcs merged commit 90358dc into opensearch-project:main Jun 10, 2026
41 of 42 checks passed
@ahkcs ahkcs deleted the stabilize-order-dependent-ppl-its branch June 10, 2026 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants