Stabilize order-dependent PPL ITs with explicit sort for multi-shard analytics runs by ahkcs · Pull Request #5537 · opensearch-project/sql

ahkcs · 2026-06-10T18:49:49Z

Description

When PPL integration-test indices are created with more than one primary shard on the analytics-engine route, the engine merges per-shard Arrow batches in arrival order (RowProducingSink.feed() has no shard-ordinal tiebreaker, unlike the Lucene path's TopDocs.merge() shardIndex ordering). Any test whose asserted rows depend on document encounter order — head without sort, first-row assertions after patterns/grok/eval, trendline over unsorted input, result-set truncation — fails on a 3-shard run even though it passes on 1 shard.

This PR stabilizes every such test that can be fixed by adding an explicit | sort <unique key> to the query:

bank-based tests (bank.json is account_number-ascending): expectations unchanged.
accounts-based tests (accounts.json is NOT account_number-ascending): expectations updated to the sorted rows — deterministic on every route.
| sort placed before any fields projection that drops the sort key (verified deterministic through the projection on the multi-shard route).

It also adds a tests.analytics.num_shards system property (default 1, so existing runs are unchanged) wired through integTestRemote and TestUtils.AnalyticsIndexConfig, to make multi-shard coverage runs reproducible:

./gradlew :integ-test:integTestRemote \
  -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9300 -Dtests.clustername=runTask \
  -Dtests.analytics.parquet_indices=true -Dtests.analytics.num_shards=3

Results (suite failures on the affected classes)

Class	3-shard before	3-shard after	1-shard
CalciteHeadCommandIT (+ v2 HeadCommandIT)	4	0	0
TrailingPipeIT	5+1	0	0
CalcitePPLEnhancedCoalesceIT	14	6 (pre-existing)	8 → 4
CalcitePPLTrendlineIT	7	1 (pre-existing)	1
CalcitePPLGrokIT	2	0	0
CalciteMVAppendFunctionIT	7	4 (pre-existing)	4
CalciteBinCommandIT	8+1	6 (pre-existing)	6
CalcitePPLPatternsIT	13	10 (see notes)	5
CalciteMultisearchCommandIT	6	5 (see notes)	6
CalcitePPLAggregationIT (×2 incl. Paginating variant)	37+37	34+35 (see notes)	10+10
CalciteQueryStringIT	4	4 (engine-side, see notes)	1
CalciteStreamstatsCommandIT	25	24 (engine-side, see notes)	23

"+1" = same-family tests that passed the baseline run only by lucky arrival order and were fixed here too. All remaining failures pre-exist this PR (most also fail on 1 shard) or are multi-shard engine gaps that cannot be fixed test-side:

stats first()/last()/take() stay non-deterministic even with a preceding sort — per-shard partial accumulators merge in arrival order at the coordinator. The existing OrdinalAppendingSink shard ordinals are not used by the gather stage; an engine-side ordinal merge would make those tests pass unmodified.
streamstats computes windows per shard fragment even with a global sort preceding it.
multisearch same-index subsearches execute the first subsearch's filter for both branches (silent row duplication — | multisearch [where M] [where F] | stats count() returns 2×M on the analytics route).
multi-shard 500s: derive_schema_from_partial_plan protobuf decode failure, Substrait Float64/Int64 schema mismatch for eval unix_timestamp(...) | stats ... by, TRY_CAST List(Struct) → Utf8 for BRAIN patterns.
query_string intermittently drops one row across shards (16→15).

These are tracked separately for upstream engine fixes.

Related Issues

Part of the analytics-engine PPL compatibility effort.

Check List

New functionality includes testing.
New functionality has been documented.
Commits are signed per the DCO using --signoff or -s.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…analytics runs The analytics engine merges per-shard Arrow batches in arrival order (no shard-ordinal tiebreaker), so any IT whose asserted rows depend on document encounter order fails when test indices have more than one primary shard. Add an explicit sort on a unique key to the affected queries so results are deterministic on every route; where the sorted result differs from insertion order (accounts.json is not account_number-ascending), update expectations to the sorted rows. Also add the tests.analytics.num_shards system property (default 1, wired through integTestRemote and TestUtils.AnalyticsIndexConfig) so multi-shard coverage runs can be reproduced: ./gradlew :integ-test:integTestRemote \ -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9300 \ -Dtests.clustername=runTask \ -Dtests.analytics.parquet_indices=true -Dtests.analytics.num_shards=3 Verified on three routes: 3-shard analytics, 1-shard analytics, and the regular Calcite route (self-managed cluster, all green). Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-06-10T18:51:13Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

github-actions · 2026-06-10T18:51:31Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Validate positive shard count Add validation to ensure the parsed number of shards is positive. Invalid or negative values could cause index creation failures or unexpected behavior in multi-shard test scenarios. integ-test/src/test/java/org/opensearch/sql/legacy/TestUtils.java [78-80] public static int getNumShards() { - return Integer.parseInt(System.getProperty(NUM_SHARDS_PROP, "1")); + int numShards = Integer.parseInt(System.getProperty(NUM_SHARDS_PROP, "1")); + if (numShards <= 0) { + throw new IllegalArgumentException("Number of shards must be positive, got: " + numShards); + } + return numShards; } Suggestion importance[1-10]: 7 __ Why: Adding validation for positive shard count prevents potential index creation failures from invalid configuration. However, this is primarily a defensive check for test configuration rather than a critical bug fix, as the system property is typically controlled by test infrastructure.	Medium

ahkcs requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners June 10, 2026 18:49

ahkcs added the enhancement New feature or request label Jun 10, 2026

Swiddis approved these changes Jun 10, 2026

View reviewed changes

vinaykpud approved these changes Jun 10, 2026

View reviewed changes

RyanL1997 approved these changes Jun 10, 2026

View reviewed changes

ahkcs merged commit 90358dc into opensearch-project:main Jun 10, 2026
41 of 42 checks passed

ahkcs deleted the stabilize-order-dependent-ppl-its branch June 10, 2026 23:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stabilize order-dependent PPL ITs with explicit sort for multi-shard analytics runs#5537

Stabilize order-dependent PPL ITs with explicit sort for multi-shard analytics runs#5537
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:stabilize-order-dependent-ppl-its

ahkcs commented Jun 10, 2026

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

ahkcs commented Jun 10, 2026

Description

Results (suite failures on the affected classes)

Related Issues

Check List

Uh oh!

github-actions Bot commented Jun 10, 2026

PR Reviewer Guide 🔍

Uh oh!

github-actions Bot commented Jun 10, 2026

PR Code Suggestions ✨

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants