feat(integ-test): add analytics-engine IT compatibility testing scripts

dai-chen · dai-chen · commit 38ea5a1aff2e · 2026-05-19T05:10:21.000Z
Scripts to run SQL V2 + Legacy IT suite against analytics-engine (DataFusion)
path on a local sandbox cluster with parquet-backed indices, and generate a
bucketed compatibility report — without modifying production code.

Includes:
- start-cluster.sh: start sandbox cluster with all 9 required plugins
- run-sql-compat-test.sh: run filtered ITs + generate report
- publish-sql-plugin.sh: build and install SQL plugin to maven local
- generate-report.py: parse JUnit XML into bucketed markdown report
- README.md: full SOP with prerequisites, quick start, and PR testing
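
For reviewers unfamiliar with the JUnit XML layout, here is a minimal sketch of the pass/fail/skip tally that generate-report.py performs per test case. The sample XML and the `tally` helper are hypothetical and only illustrate the idea; the real script also filters by package and buckets failure messages.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample input; real files are produced by the Gradle IT run.
SAMPLE = """<testsuite name="org.opensearch.sql.sql.ExampleIT">
  <testcase classname="org.opensearch.sql.sql.ExampleIT" name="testPass"/>
  <testcase classname="org.opensearch.sql.sql.ExampleIT" name="testFail">
    <failure message="400 Bad Request: unsupported syntax"/>
  </testcase>
  <testcase classname="org.opensearch.sql.sql.ExampleIT" name="testSkip">
    <skipped/>
  </testcase>
</testsuite>"""


def tally(xml_text):
    """Count passed, failed, and skipped test cases in one JUnit XML document."""
    counts = Counter()
    for tc in ET.fromstring(xml_text).iter("testcase"):
        if tc.find("skipped") is not None:
            counts["skipped"] += 1
        elif tc.find("failure") is not None or tc.find("error") is not None:
            # Both <failure> and <error> children count as a failed test.
            counts["failed"] += 1
        else:
            counts["passed"] += 1
    return counts


print(dict(tally(SAMPLE)))
```

A test case with neither a `<skipped>`, `<failure>`, nor `<error>` child is treated as passed, which matches the order of checks in the script.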
diff --git a/analytics-engine-test/.agent-prompt.md b/analytics-engine-test/.agent-prompt.md
@@ -0,0 +1,64 @@
+# Analytics Engine IT — Agent Prompt
+
+## What This Is
+Scripts to run SQL V2 + Legacy IT suite against analytics-engine (DataFusion)
+and generate a compatibility report — without modifying SQL plugin production code.
+
+## Before Starting — Ask the User For:
+
+1. **OS_REPO** — path to their OpenSearch core checkout (must be on `main` branch)
+2. **JAVA25** — path to JDK 25 (default: `/usr/lib/jvm/java-25-amazon-corretto`)
+3. **Whether the native lib is built** — check if `$OS_REPO/sandbox/libs/dataformat-native/rust/target/release/libopensearch_native.so` exists. If not, build it:
+   ```bash
+   cd "$OS_REPO/sandbox/libs/dataformat-native/rust" && cargo build --release
+   ```
+   (~20 min first time)
+4. **Any PR to cherry-pick** — if testing a specific PR's impact, get the PR number
+
+## Execution Order
+
+```bash
+export OS_REPO=<user-provided path>
+export JAVA25=<user-provided path>
+
+# 1. Publish SQL plugin to maven local
+./analytics-engine-test/publish-sql-plugin.sh
+
+# 2. Start cluster (runs in foreground — use a background process or separate terminal)
+./analytics-engine-test/start-cluster.sh
+
+# 3. Wait for cluster: curl -s localhost:9200/_cluster/health
+
+# 4. Run tests + generate report
+./analytics-engine-test/run-sql-compat-test.sh
+
+# Report at: integ-test/build/reports/analytics-engine-compatibility/REPORT.md
+```
+
+## If Testing a PR
+
+```bash
+git fetch upstream pull/<N>/head:pr-<N>
+MERGE_BASE=$(git merge-base pr-<N> upstream/main)
+# Cherry-pick only the PR's own commits (on top of main)
+git log --oneline $MERGE_BASE..pr-<N>  # review
+git cherry-pick $MERGE_BASE..pr-<N>
+
+# Rebuild + republish + restart cluster + rerun tests (steps 1-4 above)
+
+# Revert after:
+git reset --hard HEAD~<number-of-cherry-picked-commits>
+```
+
+## Troubleshooting
+
+| Symptom | Fix |
+|---------|-----|
+| `release version 25 not supported` | Set `JAVA_HOME=$JAVA25` for the cluster build |
+| `multiple committer factories found: []` | Missing `analytics-backend-lucene` plugin |
+| `StreamTransportService` null / Guice errors | `run.gradle` missing `transport.stream.enabled=true` (script auto-patches) |
+| `NoSuchElementException` in `NativeAllocatorConfig` | Rebuild native lib: `cargo build --release` |
+| Index creation 500 errors | Cluster needs all 9 plugins — check `curl localhost:9200/_cat/plugins` |
+
+## Full Details
+See `README.md` in this directory.
diff --git a/analytics-engine-test/README.md b/analytics-engine-test/README.md
@@ -0,0 +1,116 @@
+# Analytics Engine IT Compatibility Testing
+
+Run the full SQL V2 + Legacy IT suite against the analytics-engine (DataFusion) path on a local sandbox cluster and generate a bucketed compatibility report — **without modifying SQL plugin production code**.
+
+## Prerequisites
+
+- **JDK 21** — for building the SQL plugin
+- **JDK 25** — for running the OpenSearch sandbox cluster (set `JAVA25` env var)
+- **Rust toolchain** — for building the native parquet library (one-time)
+- **OpenSearch core checkout** on `main` (set `OS_REPO` env var)
+- **This SQL plugin repo** checked out
+
+## How It Works
+
+1. Every test index is created with parquet-backed settings (`tests.analytics.parquet_indices=true`)
+2. The cluster setting `plugins.calcite.analytics.force_routing=true` routes ALL queries through analytics-engine
+3. The analytics-engine (DataFusion) executes queries instead of the standard Calcite/DSL path
+4. Tests that pass = compatible with analytics-engine; tests that fail = gaps to fill
+
+## Quick Start
+
+```bash
+# Set environment
+export OS_REPO=~/path/to/OpenSearch        # OpenSearch core checkout (main branch)
+export JAVA25=/usr/lib/jvm/java-25-amazon-corretto  # JDK 25 path
+
+# 1. Build native library (one-time, ~20 min)
+(cd "$OS_REPO/sandbox/libs/dataformat-native/rust" &&
+  cargo build --release)
+
+# 2. Publish SQL plugin to maven local
+./analytics-engine-test/publish-sql-plugin.sh
+
+# 3. Start cluster (in a separate terminal)
+./analytics-engine-test/start-cluster.sh
+
+# 4. Run tests + generate report (in another terminal)
+./analytics-engine-test/run-sql-compat-test.sh
+```
+
+## Scripts
+
+| Script | Purpose |
+|--------|---------|
+| `publish-sql-plugin.sh` | Build SQL plugin zip → install to `~/.m2` |
+| `start-cluster.sh` | Start sandbox cluster with all 9 plugins (foreground) |
+| `run-sql-compat-test.sh` | Run SQL V2 + Legacy ITs → generate report |
+| `generate-report.py` | Parse JUnit XML → bucketed markdown report |
+
+## Cluster Plugins (9 total)
+
+| Plugin | Source | Purpose |
+|--------|--------|---------|
+| `opensearch-job-scheduler` | Maven snapshots | Required by SQL plugin |
+| `arrow-base` | Core `:plugins:` | Arrow memory/classloader |
+| `arrow-flight-rpc` | Core `:plugins:` | Arrow Flight transport |
+| `composite-engine` | Sandbox | Composite index engine |
+| `parquet-data-format` | Sandbox | Parquet data format |
+| `analytics-engine` | Sandbox | Query routing hub |
+| `analytics-backend-datafusion` | Sandbox | DataFusion execution |
+| `analytics-backend-lucene` | Sandbox | Lucene committer factory |
+| `opensearch-sql-plugin` | Maven local (`~/.m2`) | SQL/PPL plugin under test |
+
+## Key Configuration
+
+| Setting | Where | Purpose |
+|---------|-------|---------|
+| `tests.analytics.parquet_indices=true` | Gradle sys prop | Inject parquet settings into all test indices |
+| `plugins.calcite.analytics.force_routing=true` | Cluster setting | Route ALL queries through analytics-engine |
+| `opensearch.experimental.feature.transport.stream.enabled=true` | JVM flag (run.gradle patch) | Enable StreamTransportService for analytics-engine |
+| `opensearch.experimental.feature.pluggable.dataformat.enabled=true` | JVM flag (auto-set) | Enable parquet data format |
+
+## Testing a PR
+
+To test a PR's impact on compatibility:
+
+```bash
+# Cherry-pick PR commits
+git fetch upstream pull/<PR_NUMBER>/head:pr-<PR_NUMBER>
+git cherry-pick "$(git merge-base pr-<PR_NUMBER> upstream/main)"..pr-<PR_NUMBER>
+
+# Rebuild + republish
+./analytics-engine-test/publish-sql-plugin.sh
+
+# Restart cluster (Ctrl+C the old one first)
+./analytics-engine-test/start-cluster.sh
+
+# Rerun tests
+./analytics-engine-test/run-sql-compat-test.sh
+```
+
+## Report Location
+
+After running: `integ-test/build/reports/analytics-engine-compatibility/REPORT.md`
+
+## Cleanup
+
+```bash
+# Revert run.gradle patch in OpenSearch repo (git -C leaves the current directory unchanged)
+git -C "$OS_REPO" checkout gradle/run.gradle
+
+# Remove cherry-picked commits (if any)
+git reset --hard HEAD~N
+
+# Delete PR branch
+git branch -D pr-<PR_NUMBER>
+```
+
+## Known Failure Patterns
+
+| Pattern | Meaning | Count (baseline) |
+|---------|---------|-----------------|
+| "Other Error" (500) | Analytics-engine doesn't support the query pattern | ~421 |
+| "Result Mismatch" | Query runs but returns different results | ~5 |
+| "Bad Request" (400) | SQL syntax/feature not supported in unified path | ~3 |
+| Skipped (legacy) | Tests skipped due to legacy engine deprecation | ~470 |
diff --git a/analytics-engine-test/generate-report.py b/analytics-engine-test/generate-report.py
@@ -0,0 +1,152 @@
+#!/usr/bin/env python3
+"""Parse JUnit XML results and generate a bucketed analytics-engine compatibility report."""
+import xml.etree.ElementTree as ET
+import os, sys, glob
+from collections import defaultdict
+
+def categorize(msg):
+    if "DataFormatAwareEngine" in msg:
+        return "Direct Shard Op"
+    elif "400 Bad Request" in msg:
+        return "Bad Request"
+    elif "AssertionError" in msg or "expected" in msg.lower():
+        return "Result Mismatch"
+    elif "timeout" in msg.lower() or "timed out" in msg.lower():
+        return "Timeout"
+    elif "NullPointerException" in msg:
+        return "NPE"
+    elif "index_not_found" in msg:
+        return "Index Setup"
+    else:
+        return "Other Error"
+
+def main():
+    if len(sys.argv) < 3:
+        print(f"Usage: {sys.argv[0]} <results-dir> <output.md>")
+        sys.exit(1)
+
+    results_dir, output_path = sys.argv[1], sys.argv[2]
+    classes = defaultdict(lambda: {"passed": 0, "failed": 0, "skipped": 0, "failures": []})
+
+    for xml_file in sorted(glob.glob(os.path.join(results_dir, "*.xml"))):
+        try:
+            tree = ET.parse(xml_file)
+            root = tree.getroot()
+            classname = root.get("name", "")
+            if not (classname.startswith("org.opensearch.sql.sql.") or
+                    classname.startswith("org.opensearch.sql.legacy.")):
+                continue
+            for tc in root.findall(".//testcase"):
+                tc_class = tc.get("classname", classname)
+                if not (tc_class.startswith("org.opensearch.sql.sql.") or
+                        tc_class.startswith("org.opensearch.sql.legacy.")):
+                    continue
+                failure = tc.find("failure")
+                error = tc.find("error")
+                skipped_el = tc.find("skipped")
+                short_class = tc_class.replace("org.opensearch.sql.", "")
+                if skipped_el is not None:
+                    classes[short_class]["skipped"] += 1
+                elif failure is not None or error is not None:
+                    classes[short_class]["failed"] += 1
+                    msg = (failure if failure is not None else error).get("message", "")[:200]
+                    classes[short_class]["failures"].append((tc.get("name"), msg))
+                else:
+                    classes[short_class]["passed"] += 1
+        except Exception as e:
+            print(f"WARN: skipping {xml_file}: {e}", file=sys.stderr)
+
+    total_p = sum(c["passed"] for c in classes.values())
+    total_f = sum(c["failed"] for c in classes.values())
+    total_s = sum(c["skipped"] for c in classes.values())
+    total = total_p + total_f + total_s
+
+    if total == 0:
+        print("ERROR: No test results found for sql.sql.* or sql.legacy.*", file=sys.stderr)
+        sys.exit(1)
+
+    fail_cats = defaultdict(int)
+    for c in classes.values():
+        for _, msg in c["failures"]:
+            fail_cats[categorize(msg)] += 1
+
+    sql_v2 = {k: v for k, v in classes.items() if k.startswith("sql.")}
+    legacy = {k: v for k, v in classes.items() if k.startswith("legacy.")}
+
+    def area_totals(d):
+        p = sum(c["passed"] for c in d.values())
+        f = sum(c["failed"] for c in d.values())
+        s = sum(c["skipped"] for c in d.values())
+        return p, f, s
+
+    sql_p, sql_f, sql_s = area_totals(sql_v2)
+    leg_p, leg_f, leg_s = area_totals(legacy)
+
+    lines = []
+    w = lines.append
+
+    w("# SQL V2 + Legacy IT — Analytics Engine Compatibility Report\n")
+    w(f"**Total Tests:** {total} | ✅ Passed: {total_p} ({total_p/total*100:.1f}%) | "
+      f"❌ Failed: {total_f} ({total_f/total*100:.1f}%) | ⏭️ Skipped: {total_s} ({total_s/total*100:.1f}%)\n")
+    w("")
+    w("| Area | Passed | Failed | Skipped | Pass Rate |")
+    w("|------|-------:|-------:|--------:|----------:|")
+    if sql_p + sql_f > 0:
+        w(f"| SQL V2 (`sql.sql.*`) | {sql_p} | {sql_f} | {sql_s} | {sql_p/(sql_p+sql_f)*100:.1f}% |")
+    if leg_p + leg_f > 0:
+        w(f"| Legacy (`sql.legacy.*`) | {leg_p} | {leg_f} | {leg_s} | {leg_p/(leg_p+leg_f)*100:.1f}% |")
+    w("")
+
+    w("## Failure Categories\n")
+    w("| Category | Count | % |")
+    w("|----------|------:|--:|")
+    for cat, cnt in sorted(fail_cats.items(), key=lambda x: -x[1]):
+        w(f"| {cat} | {cnt} | {cnt/total_f*100:.1f}% |")
+    w("")
+
+    w("## ✅ Fully Passing Test Classes\n")
+    fully_passing = sorted([(k, v) for k, v in classes.items() if v["failed"] == 0 and v["passed"] > 0],
+                           key=lambda x: -x[1]["passed"])
+    for name, v in fully_passing:
+        w(f"- **{name}** ({v['passed']} tests)")
+    w("")
+
+    w("## 🟡 Partially Passing (>50%)\n")
+    w("| Class | Passed | Failed | Rate |")
+    w("|-------|-------:|-------:|-----:|")
+    partial = sorted([(k, v) for k, v in classes.items()
+                      if v["failed"] > 0 and v["passed"] > 0 and v["passed"]/(v["passed"]+v["failed"]) > 0.5],
+                     key=lambda x: -x[1]["passed"]/(x[1]["passed"]+x[1]["failed"]))
+    for name, v in partial:
+        rate = v["passed"]/(v["passed"]+v["failed"])*100
+        w(f"| {name} | {v['passed']} | {v['failed']} | {rate:.0f}% |")
+    w("")
+
+    w("## ❌ Failing Test Classes\n")
+    w("| Class | Passed | Failed | Skipped | Top Failure |")
+    w("|-------|-------:|-------:|--------:|-------------|")
+    failing = sorted([(k, v) for k, v in classes.items()
+                      if v["failed"] > 0 and (v["passed"] == 0 or v["passed"]/(v["passed"]+v["failed"]) <= 0.5)],
+                     key=lambda x: -x[1]["failed"])
+    for name, v in failing:
+        top_fail = categorize(v["failures"][0][1]) if v["failures"] else "?"
+        w(f"| {name} | {v['passed']} | {v['failed']} | {v['skipped']} | {top_fail} |")
+    w("")
+
+    w("## 🔍 Result Mismatch Details\n")
+    for name, v in sorted(classes.items()):
+        mismatches = [(t, m) for t, m in v["failures"] if categorize(m) == "Result Mismatch"]
+        if mismatches:
+            w(f"### {name}")
+            for test, msg in mismatches[:5]:
+                w(f"- `{test}`: {msg[:150]}")
+            w("")
+
+    os.makedirs(os.path.dirname(os.path.abspath(output_path)), exist_ok=True)
+    with open(output_path, "w", encoding="utf-8") as f:
+        f.write("\n".join(lines))
+    print(f"Report written to {output_path}")
+    print(f"  Total: {total} | Pass: {total_p} ({total_p/total*100:.1f}%) | Fail: {total_f} | Skip: {total_s}")
+
+if __name__ == "__main__":
+    main()
diff --git a/analytics-engine-test/publish-sql-plugin.sh b/analytics-engine-test/publish-sql-plugin.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+# Builds the SQL plugin zip and installs it to ~/.m2 for OpenSearch run.gradle resolution.
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+SQL_REPO="${SQL_REPO:-$(cd "$SCRIPT_DIR/.." && pwd)}"
+VERSION="${SQL_VERSION:-3.7.0.0-SNAPSHOT}"
+ARTIFACT="opensearch-sql-plugin"
+GROUP_PATH="org/opensearch/plugin"
+M2_DIR="$HOME/.m2/repository/$GROUP_PATH/$ARTIFACT/$VERSION"
+
+echo "=== Building SQL plugin ==="
+cd "$SQL_REPO"
+./gradlew :plugin:bundlePlugin
+
+# The zip is named opensearch-sql-<version>.zip (not opensearch-sql-plugin-)
+ZIP="plugin/build/distributions/opensearch-sql-$VERSION.zip"
+if [[ ! -f "$ZIP" ]]; then
+  echo "ERROR: Expected zip not found at $ZIP" >&2
+  echo "Available:" >&2
+  ls plugin/build/distributions/ >&2
+  exit 1
+fi
+
+echo "=== Installing to Maven local ==="
+mkdir -p "$M2_DIR"
+cp "$ZIP" "$M2_DIR/$ARTIFACT-$VERSION.zip"
+
+cat > "$M2_DIR/$ARTIFACT-$VERSION.pom" << EOF
+<?xml version="1.0" encoding="UTF-8"?>
+<project>
+  <modelVersion>4.0.0</modelVersion>
+  <groupId>org.opensearch.plugin</groupId>
+  <artifactId>$ARTIFACT</artifactId>
+  <version>$VERSION</version>
+  <packaging>zip</packaging>
+</project>
+EOF
+
+rm -f "$M2_DIR/$ARTIFACT-$VERSION.module"
+rm -f "$M2_DIR/_remote.repositories"
+
+echo "✅ Published $ARTIFACT-$VERSION.zip to $M2_DIR"
diff --git a/analytics-engine-test/run-sql-compat-test.sh b/analytics-engine-test/run-sql-compat-test.sh
diff --git a/analytics-engine-test/start-cluster.sh b/analytics-engine-test/start-cluster.sh