
Extended UT Framework for Functional Correctness of query conversion#21060

Closed
nssuresh2007 wants to merge 1 commit into opensearch-project:feature/datafusion from nssuresh2007:feature/datafusion

Conversation

@nssuresh2007

Changes to test the end-to-end DSL query conversion without an OpenSearch cluster. The initial commit adds support for match_all and terms with an aggregation.

Description

This feature introduces a golden-file-based unit test framework for the OpenSearch DSL query executor plugin (sandbox/plugins/dsl-query-executor/). The framework validates the correctness of both the forward conversion path (DSL → RelNode) and the reverse conversion path (RelNode execution results → SearchResponse DSL output) by comparing actual outputs against pre-approved golden files. Each golden file encodes a complete test case: the input DSL, the expected RelNode logical plan, simulated execution rows, and the expected DSL output.
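Based on the required fields enforced by the loader (testName, indexName, indexMapping, inputDsl, expectedRelNodePlan, executionFieldNames, executionRows, expectedOutputDsl, plus planType), a golden file might look roughly like the sketch below; all values are illustrative and not taken from the PR's actual test resources:

```json
{
  "testName": "match_all_hits",
  "indexName": "products",
  "indexMapping": { "properties": { "name": { "type": "keyword" }, "price": { "type": "double" } } },
  "planType": "HITS",
  "inputDsl": { "query": { "match_all": {} } },
  "expectedRelNodePlan": "LogicalProject(name=[$0], price=[$1])\n  LogicalTableScan(table=[[products]])",
  "executionFieldNames": ["name", "price"],
  "executionRows": [["laptop", 999.0], ["phone", 499.0]],
  "expectedOutputDsl": { "hits": { "total": { "value": 2, "relation": "eq" } } }
}
```

Keeping all four parts in one file is what makes each test case self-contained and reviewable in isolation.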

Requirements

  • As a developer, I want each golden file to contain all inputs and expected outputs for a single test case, so that test cases are self-contained and easy to review in code reviews.
  • As a developer, I want the test framework to automatically discover and parse golden files, so that adding a new test case requires only adding a new golden file.
  • As a developer, I want to validate that a given DSL input produces the expected Calcite logical plan, so that regressions in the conversion pipeline are caught automatically.
  • As a developer, I want to validate that simulated execution results are correctly converted back into a SearchResponse, so that regressions in the response building logic are caught automatically.
  • As a developer, I want to verify that the forward and reverse paths are consistent within a golden file, so that the entire DSL → RelNode → execution → DSL output pipeline is validated end-to-end.
  • As a developer, I want golden file test cases covering the hits pipeline scenarios, so that query translation correctness is validated for common query types.
  • As a developer, I want golden file test cases covering the aggregation pipeline scenarios, so that aggregation translation correctness is validated for supported metric and bucket types.
  • As a developer, I want a way to regenerate golden files when intentional changes are made to the conversion logic, so that maintaining golden files does not become a burden.
  • As a developer, I want the golden file tests to run as fast unit tests without requiring a running OpenSearch cluster, so that they can be part of the standard build cycle.
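The regeneration requirement above is typically wired through a JVM flag checked by the updater. A minimal sketch of that gating, assuming a system property named tests.golden.update (the actual property read by GoldenFileUpdater may differ):

```java
public class GoldenUpdateFlag {
    // Hypothetical sketch: gate golden-file regeneration behind a JVM system
    // property so normal test runs only assert and never rewrite files.
    // The property name "tests.golden.update" is an assumption, not
    // necessarily what GoldenFileUpdater.isUpdateMode() actually reads.
    static boolean isUpdateMode() {
        return Boolean.getBoolean("tests.golden.update");
    }

    public static void main(String[] args) {
        // With the property unset this prints false: tests compare against
        // the checked-in golden files instead of rewriting them.
        System.out.println(isUpdateMode());
    }
}
```

A build invocation could then pass -Dtests.golden.update=true to rewrite all golden files in one run, while the default build keeps asserting against the checked-in files.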

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@nssuresh2007 nssuresh2007 requested a review from a team as a code owner March 31, 2026 10:51
@github-actions
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Golden File Framework Infrastructure (POJO, Loader, Updater, CalciteInfra, Design Doc)

Relevant files:

  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/GoldenTestCase.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/GoldenFileLoader.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/GoldenFileUpdater.java
  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/CalciteTestInfra.java
  • sandbox/plugins/dsl-query-executor/src/test/design.md

Sub-PR theme: Golden File Test Cases and Test Runner

Relevant files:

  • sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/DslGoldenFileTests.java
  • sandbox/plugins/dsl-query-executor/src/test/resources/golden/match_all_hits.json
  • sandbox/plugins/dsl-query-executor/src/test/resources/golden/terms_with_avg_aggregation.json

⚡ Recommended focus areas for review

Missing Aggregation Output

The golden file terms_with_avg_aggregation.json has expectedOutputDsl that only contains an empty hits section with no aggregation buckets. The reverse path test for aggregation queries should validate the actual aggregation output (buckets with brand keys, avg_price values, doc_counts), but the expected output appears to be a placeholder that doesn't reflect the actual aggregation results from the executionRows.

import java.util.List;
import java.util.Map;

/**
 * Golden file tests for the DSL query executor plugin.
 *
 * <p>Each test method loads its own specific golden file by name, keeping
 * dependencies explicit. Tests validate the forward path (DSL → RelNode),
 * reverse path (ExecutionResult → SearchResponse), and field name consistency.
 */
public class DslGoldenFileTests extends OpenSearchTestCase {

Update Mode Race Condition

In runForwardPathTest, update mode calls GoldenFileUpdater.update(filePath, actualPlan, tc.getExpectedOutputDsl()) — passing the existing expected output DSL rather than the actual computed output. This means update mode for the forward path test will not regenerate the expectedOutputDsl with real values, potentially writing stale data back to the golden file.

if (GoldenFileUpdater.isUpdateMode()) {
    Path filePath = goldenFilePath(goldenFileName);
    GoldenFileUpdater.update(filePath, actualPlan, tc.getExpectedOutputDsl());
} else {

Mutable Map Comparison

stripNonDeterministicFields mutates tc.getExpectedOutputDsl() directly, which modifies the GoldenTestCase object's internal state. If the same test case object were reused (e.g., in runAllChecks which calls both runReversePathTest and runConsistencyCheck), the stripped map could cause unexpected behavior. Consider working on a defensive copy instead.

Map<String, Object> expectedOutput = tc.getExpectedOutputDsl();
stripNonDeterministicFields(actualOutput);
stripNonDeterministicFields(expectedOutput);

Missing planType Validation

The validate method checks all required fields except planType, which is used in DslGoldenFileTests via QueryPlans.Type.valueOf(tc.getPlanType()). A missing or invalid planType in a golden file will cause a NullPointerException or IllegalArgumentException at test runtime rather than a clear validation error at load time.

private static void validate(GoldenTestCase testCase, Path filePath) {
    requireNonNull(testCase.getTestName(), "testName", filePath);
    requireNonNull(testCase.getIndexName(), "indexName", filePath);
    requireNonNull(testCase.getIndexMapping(), "indexMapping", filePath);
    requireNonNull(testCase.getInputDsl(), "inputDsl", filePath);
    requireNonNull(testCase.getExpectedRelNodePlan(), "expectedRelNodePlan", filePath);
    requireNonNull(testCase.getExecutionFieldNames(), "executionFieldNames", filePath);
    requireNonNull(testCase.getExecutionRows(), "executionRows", filePath);
    requireNonNull(testCase.getExpectedOutputDsl(), "expectedOutputDsl", filePath);
}

Incorrect Expected Output

The expectedOutputDsl for match_all_hits.json shows "hits": [] with "total": {"value": 0}, but executionRows contains 2 rows (laptop and phone). The expected output does not reflect the actual execution rows, suggesting the golden file's expected output is a placeholder and the reverse path test may be asserting against incorrect expected values.

"expectedOutputDsl": {
  "num_reduce_phases": 0,
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": []
  }
}
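If the golden file were updated to reflect its two executionRows, the expected output would presumably look closer to the following sketch; the _source field name ("name") is illustrative, since the actual shape depends on the executionFieldNames in the file:

```json
"expectedOutputDsl": {
  "num_reduce_phases": 0,
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": 0.0,
    "hits": [
      { "_source": { "name": "laptop" } },
      { "_source": { "name": "phone" } }
    ]
  }
}
```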

@github-actions
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Fix stale value passed in forward-path update mode

In update mode for the forward path, tc.getExpectedOutputDsl() is passed as the
output DSL, but this is the old/existing value from the golden file rather than the
actual computed output. The actual output DSL should be preserved from the reverse
path run, or the forward-path update should only update expectedRelNodePlan while
leaving expectedOutputDsl unchanged. Passing the stale expected value will silently
overwrite the file with no real change to expectedOutputDsl.

sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/DslGoldenFileTests.java [74-77]

 if (GoldenFileUpdater.isUpdateMode()) {
     Path filePath = goldenFilePath(goldenFileName);
-    GoldenFileUpdater.update(filePath, actualPlan, tc.getExpectedOutputDsl());
+    // Only update the relNodePlan; preserve existing expectedOutputDsl
+    GoldenFileUpdater.update(filePath, actualPlan, GoldenFileLoader.load(filePath).getExpectedOutputDsl());
 } else {
Suggestion importance[1-10]: 6


Why: In update mode for the forward path, tc.getExpectedOutputDsl() passes the old/stale value from the golden file rather than the actual computed output. While this is a real issue, the impact is limited since the forward path test is only meant to update expectedRelNodePlan, and the reverse path test handles expectedOutputDsl updates separately. The improved code re-reads the file which is redundant since tc already has the value.

Low
Guard against null planType causing NPE

tc.getPlanType() may be null if the planType field is missing from the golden file,
since GoldenFileLoader.validate() does not require planType. Calling
QueryPlans.Type.valueOf(null) will throw a NullPointerException. Add a null check or
include planType in the required field validation.

sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/DslGoldenFileTests.java [106-107]

-QueryPlans.Type expectedType = QueryPlans.Type.valueOf(tc.getPlanType());
+String planTypeStr = tc.getPlanType();
+if (planTypeStr == null) {
+    throw new IllegalStateException("Golden file " + goldenFileName + " missing required field: planType");
+}
+QueryPlans.Type expectedType = QueryPlans.Type.valueOf(planTypeStr);
 QueryPlans.QueryPlan plan = plans.get(expectedType).get(0);
Suggestion importance[1-10]: 5


Why: planType is not validated as a required field in GoldenFileLoader.validate(), so a null value could cause an NPE when QueryPlans.Type.valueOf(null) is called. Adding a null check or including planType in validation would prevent this, though in practice all golden files in the PR include planType.

Low
Add missing empty-list guard in reverse and consistency helpers

In runReversePathTest and runConsistencyCheck, plans.get(expectedType).get(0) is
called without first checking whether the list is empty, which would throw an
IndexOutOfBoundsException. The emptiness check (assertFalse) is only present in
runForwardPathTest. Add the same guard in the other two helpers.

sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/DslGoldenFileTests.java [106-108]

 QueryPlans.Type expectedType = QueryPlans.Type.valueOf(tc.getPlanType());
 List<QueryPlans.QueryPlan> matchingPlans = plans.get(expectedType);
 assertFalse("No " + expectedType + " plan produced for " + goldenFileName, matchingPlans.isEmpty());
+QueryPlans.QueryPlan plan = matchingPlans.get(0);
 
-RelNode relNode = matchingPlans.get(0).relNode();
-
Suggestion importance[1-10]: 5


Why: runReversePathTest and runConsistencyCheck call .get(0) on the plans list without checking if it's empty, which could throw IndexOutOfBoundsException. The forward path has this guard but the other two helpers don't, creating an inconsistency that could produce confusing test failures.

Low
General
Avoid mutating shared expected output map

stripNonDeterministicFields mutates the map returned by tc.getExpectedOutputDsl(),
which is the same map held inside the GoldenTestCase object. If the test is ever
reused or the golden file is loaded once and run multiple times, the expected map
will be permanently modified. Work on a defensive copy instead.

sandbox/plugins/dsl-query-executor/src/test/java/org/opensearch/dsl/golden/DslGoldenFileTests.java [206-210]

 private void stripNonDeterministicFields(Map<String, Object> responseMap) {
     responseMap.remove("took");
     responseMap.remove("timed_out");
     responseMap.remove("_shards");
 }
 
+// In runReversePathTest, use copies:
+// Map<String, Object> expectedOutput = new java.util.LinkedHashMap<>(tc.getExpectedOutputDsl());
+// Map<String, Object> actualOutputCopy = new java.util.LinkedHashMap<>(actualOutput);
+// stripNonDeterministicFields(actualOutputCopy);
+// stripNonDeterministicFields(expectedOutput);
+
Suggestion importance[1-10]: 3


Why: The stripNonDeterministicFields method mutates the map from tc.getExpectedOutputDsl() directly. However, since each test loads its own GoldenTestCase instance via GoldenFileLoader.load(), the mutation doesn't affect other tests. The improved_code snippet is also not a clean replacement of the existing_code, making it harder to apply.

Low

@github-actions
Contributor

❌ Gradle check result for b859be5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@vinaykpud
Contributor

Thanks @nssuresh2007

This PR adds a "golden file" testing layer that reads JSON files and validates:

  • Forward path: SearchSourceBuilder → SearchSourceConverter.convert() → RelNode.explain() → assert matches expected plan string
  • Reverse path: mock rows → SearchResponseBuilder.build() → assert JSON matches expected output

I think it's worth considering how this fits with what we already have:

  1. Unit tests per converter/translator: these test individual modules for each DSL construct and RelNode construct.
  2. internalClusterTest integration tests (DslQueryIT, DslAggregationIT, DslSortIT, DslProjectIT): these test real execution against a real cluster.

The golden file tests chain SearchSourceConverter + SearchResponseBuilder but still use mocked infrastructure (no cluster), so there's a fair bit of overlap with the existing unit tests. One concern is that the maintenance burden could grow over time: for example, if a RelNode plan shape changes due to a refactor (even though semantics are correct), all golden files would need updating.

A couple of alternatives worth considering:

We could extend the existing internalClusterTest suite for each new query shape. These already test real execution against a single-node cluster:

// In DslQueryIT.java
public void testRangeQuery() {
    SearchResponse resp = executeQuery(indexName,
        "{\"query\":{\"range\":{\"price\":{\"gte\":100}}}}");
    assertEquals(RestStatus.OK, resp.status());
    // validate actual hits from real execution
}

If we want JSON-driven test cases for convenience, we could build that on top of integration test infrastructure. Two options that come to mind:

  1. Java REST QA test : similar to the PPL query tests in https://github.com/opensearch-project/OpenSearch/tree/feature/datafusion/sandbox/qa/analytics-engine-rest. Tests queries against a
    real cluster for a standard set of query shapes.
  2. YAML REST QA test : since the DSL query executor intercepts standard _search via SearchActionFilter, we're essentially testing the search API with different query bodies. YAML REST tests could be a natural fit here with no custom framework needed (same pattern as https://github.com/opensearch-project/OpenSearch/tree/main/rest-api-spec).

What do you think? Happy to discuss which approach makes the most sense here.

@nssuresh2007
Author

@vinaykpud thanks for the feedback.

These extended unit tests are needed to certify the functional correctness of the DSL plugin as a standalone testing framework, without a dependency on an OpenSearch cluster. This requirement is not satisfied by the existing tests:

  1. Unit tests - very class-level focused; they do not verify the functional correctness of the entire query conversion flow.
  2. Integration tests - run against an actual cluster. While integration tests can cover these scenarios, they can still fail when the underlying analytics or backend plugin has issues, since they focus on end-to-end testing. They cannot provide standalone certification of the DSL plugin's functional correctness.

Hence this extended UT framework is necessary to validate that the logic we are adding now is correct. Otherwise, we would only catch issues very late in the development lifecycle, once all downstream plugins are functional and testable. Also note that this is not a replacement for UT or IT; it is complementary to both.

Please let me know if that helps to address your query.

@nssuresh2007 nssuresh2007 changed the title DSL Golden File Testing Framework (to validate query conversion end to end) Extended UT Framework for Functional Correctness of query conversion Apr 3, 2026
@github-actions
Contributor

github-actions Bot commented Apr 5, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit b03d5c1.

  • sandbox/plugins/dsl-query-executor/build.gradle:27 (high): New dependency added: 'com.google.guava:guava:${versions.guava}' via custom 'calciteCompile' configuration. Per mandatory rule, all dependency additions must be flagged regardless of apparent legitimacy. Maintainers should verify the resolved artifact matches the expected Guava release.
  • sandbox/plugins/dsl-query-executor/build.gradle:32 (high): New test dependency added: 'com.fasterxml.jackson.core:jackson-databind:${versions.jackson_databind}'. Per mandatory rule, all dependency additions must be flagged. Maintainers should verify the resolved artifact is the expected Jackson Databind release.
  • sandbox/plugins/dsl-query-executor/build.gradle:18 (medium): A custom Gradle configuration 'calciteCompile' is introduced specifically to bypass the existing 'compileClasspath { exclude group: "com.google.guava" }' exclusion rule. Adding the Guava dependency via a separate configuration that is then appended to compileClasspath circumvents the project-wide exclusion policy. While the comment provides a rationale, this pattern warrants review to confirm it does not introduce unwanted Guava classes into production compile or runtime scope.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 2 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

ExecutionResult result = new ExecutionResult(plan, rows);

// Build SearchResponse
var response = SearchResponseBuilder.build(List.of(result), 0L);
Contributor


For the forward path we'll use SearchSourceBuilder to convert to Calcite and hand off, but for the reverse path, is the plan to get results back as SearchResponseBuilder / SearchResponse to maintain result compatibility?

For SQL / PPL, we might directly get back the rows, hence the question.

Author


but for the reverse path, the plan is to get results as SearchResponseBuilder / SearchResponse to maintain result compatibility?

Yes, that is the plan

Path filePath = goldenFilePath(goldenFileName);
GoldenFileUpdater.update(filePath, actualPlan, tc.getExpectedOutputDsl());
} else {
assertEquals(
Contributor


In the RelNode plan, should we validate the schema as well at each level?

Author

@nssuresh2007 nssuresh2007 Apr 12, 2026


Comment on lines +78 to +87
private static void validate(GoldenTestCase testCase, Path filePath) {
requireNonNull(testCase.getTestName(), "testName", filePath);
requireNonNull(testCase.getIndexName(), "indexName", filePath);
requireNonNull(testCase.getIndexMapping(), "indexMapping", filePath);
requireNonNull(testCase.getInputDsl(), "inputDsl", filePath);
requireNonNull(testCase.getExpectedRelNodePlan(), "expectedRelNodePlan", filePath);
requireNonNull(testCase.getExecutionFieldNames(), "executionFieldNames", filePath);
requireNonNull(testCase.getExecutionRows(), "executionRows", filePath);
requireNonNull(testCase.getExpectedOutputDsl(), "expectedOutputDsl", filePath);
}
Contributor


Instead of this, can we use @NonNull in the GoldenTestCase class itself?

Author


Currently I am not able to find any suitable library providing a NonNull annotation. Also, other places in OpenSearch currently use Objects.requireNonNull only.

stripNonDeterministicFields(actualOutput);
stripNonDeterministicFields(expectedOutput);

assertEquals(
Contributor


Bucket order also might not be deterministic in cases where OpenSearch uses a top-K approach at the shard level, e.g. a terms bucket aggregation with the size parameter.
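One way to keep such assertions stable is to canonicalize bucket order before comparing. A sketch in plain Java (canonicalize is a hypothetical helper, not code from this PR):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class BucketComparison {
    // Sketch: sort buckets by their "key" before comparing, since the bucket
    // order produced by a shard-level top-K reduce may not be deterministic.
    static List<Map<String, Object>> canonicalize(List<Map<String, Object>> buckets) {
        List<Map<String, Object>> copy = new ArrayList<>(buckets);
        copy.sort(Comparator.comparing((Map<String, Object> b) -> String.valueOf(b.get("key"))));
        return copy;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> fromShardA = List.of(
            Map.of("key", "phone", "doc_count", 1),
            Map.of("key", "laptop", "doc_count", 1));
        List<Map<String, Object>> fromShardB = List.of(
            Map.of("key", "laptop", "doc_count", 1),
            Map.of("key", "phone", "doc_count", 1));
        // Order-insensitive equality: prints true even though the raw
        // bucket order differs between the two lists.
        System.out.println(canonicalize(fromShardA).equals(canonicalize(fromShardB)));
    }
}
```

The same approach would only be applied to bucket lists whose order is documented as unstable; ordered aggregations should still be compared as-is.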

Author

@nssuresh2007 nssuresh2007 Apr 12, 2026


* Removes non-deterministic fields (took, _shards) from a serialized
* SearchResponse map so that comparisons are stable.
*/
private void stripNonDeterministicFields(Map<String, Object> responseMap) {
Contributor


Will we also have to strip the score, as it won't be supported?

Author

@nssuresh2007 nssuresh2007 Apr 12, 2026


Changes to test the end to end DSL query conversion without OpenSearch cluster.
Currently adds support for match_all and terms with agg for the initial commit.
@nssuresh2007 nssuresh2007 reopened this Apr 12, 2026
@nssuresh2007 nssuresh2007 deleted the branch opensearch-project:feature/datafusion April 12, 2026 14:48
@nssuresh2007 nssuresh2007 deleted the feature/datafusion branch April 12, 2026 14:48
nssuresh2007 added a commit to nssuresh2007/OpenSearch that referenced this pull request Apr 12, 2026
nssuresh2007 added a commit to nssuresh2007/OpenSearch that referenced this pull request Apr 12, 2026
@nssuresh2007
Author

Apologies, since the forked repo branch got deleted by mistake, I raised a new PR after re-creating it and addressed the comments there: #21210

nssuresh2007 added a commit to nssuresh2007/OpenSearch that referenced this pull request Apr 14, 2026
nssuresh2007 added a commit to nssuresh2007/OpenSearch that referenced this pull request Apr 23, 2026
Signed-off-by: Suresh N S <nssuresh@amazon.com>
nssuresh2007 added a commit to nssuresh2007/OpenSearch that referenced this pull request Apr 29, 2026
nssuresh2007 added a commit to nssuresh2007/OpenSearch that referenced this pull request May 5, 2026


4 participants