Default plugins.calcite.enabled=true on the unified query path (opensearch-project#5413)

ahkcs · web-flow · commit 59c728ba78a3 · 2026-05-06T16:33:17.000-07:00
* Default plugins.calcite.enabled=true on the unified query path The unified query handler (`RestUnifiedQueryAction` → `TransportPPLQueryAction` → analytics-engine) builds its `UnifiedQueryContext` with an empty `Settings` map; nothing wires the cluster setting through. PPL parsing is delegated to the same `AstBuilder` the v2 path uses, and that builder gates `table` (and other Calcite-only commands) on `Settings.Key.CALCITE_ENGINE_ENABLED`. With no setting propagated, the gate sees `null`, fails the `Boolean.TRUE.equals` check, and throws UnsupportedOperationException: Table command is supported only when plugins.calcite.enabled=true even when the cluster setting is true, blocking every `table` query routed through the analytics path under `tests.analytics.force_routing=true`. The unified path is by definition Calcite-based — every query reaching `UnifiedQueryContext` flows through Calcite's planner. Default `CALCITE_ENGINE_ENABLED=true` in `buildSettings()` when the underlying map doesn't have the key. This unblocks `table` and any other AstBuilder gate that defends the same toggle, without changing v2 behavior (v2 constructs `AstBuilder` with cluster `Settings`, not the unified context). Also refresh the PPL analytics-engine routing dev doc to document the unified-context dependency and switch the per-command verification recipe from the bundle-branch `analyticsCompatibilityTest` task to the standard `integTestRemote -Dtests.analytics.{force_routing,parquet_indices}=true` — the bundle-branch task is purely for the daily coverage-report sweep. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Drop dev docs from this PR — track separately These local routing-and-coverage notes are useful but belong in their own review thread (or stay as untracked working notes), not in the unified-context fix. Keeping them on disk via .gitignore-style untrack so the PR stays focused on the single api/ change. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Address @dai-chen: add CALCITE_ENGINE_ENABLED to default settings map @dai-chen suggested either adding the setting to the default list at UnifiedQueryContext.java:123 or passing it via the .setting() API on the builder, since only the unified PPL path requires it. Going with option 1 (default-list entry) — it composes the same way every other planning-required default does, and avoids forcing every caller (the production REST handler and every IT) to remember a single magic .setting() call. The IT at UnifiedQueryOpenSearchIT.java:51 was the precedent for the .setting() approach but only one IT needs it; the production caller (RestUnifiedQueryAction) wouldn't have a natural place to wire this without either repeating it everywhere or routing through a helper. Removes the conditional in buildSettings().getSettingValue() — the default map now carries the value the same way QUERY_SIZE_LIMIT, PPL_SUBSEARCH_MAXOUT, and PPL_JOIN_SUBSEARCH_MAXOUT do. Signed-off-by: Kai Huang <ahkcs@amazon.com> * CalcitePPLRenameIT: switch to column-name-aware row matcher Previously, {@code verifyStandardDataRows} (and a handful of bespoke {@code verifyDataRows} calls in this file) compared rows positionally — they assumed the engine emits columns in the {@code _source} iteration order the v2 / Lucene path produces. Under the analytics-engine route the parquet-backed reader returns columns in storage order, so the same 4 canonical state_country rows came back as {@code [70,"USA",4,"Jake", "California",2023]} instead of {@code [Jake,USA,California,4,2023,70]} and 12 of 22 tests failed despite the data being identical. Replace with a column-name-keyed expected-row map. The helper reads the actual schema from the response, looks up each canonical value by column name, places it at the corresponding schema position, then defers to {@code verifyDataRows} as before. The contract is identical to the existing {@code verifySchema} matcher — both are set-equality on column names — so the test no longer leaks the engine's emission order into the assertion. Each call site passes the canonical column-name list (with rename substitutions where applicable). Tests that don't rename age keep calling the no-arg form. Both paths now pass: * Analytics-engine route (`tests.analytics.force_routing=true`): 20 / 22 (the remaining 2 are `testRenameInAgg` / `testRenameWithBackticksInAgg`, blocked on the Substrait isthmus AVG-binding follow-up tracked in opensearch-project/OpenSearch#21521). * v2 / Calcite route (default routing): 20 / 22 (same two tests fail with `NoClassDefFoundError: LevenshteinDistance` only when running against the OS-core `:run` cluster — that bundle is missing commons-text. CI's own integ-test cluster bundles commons-text via the SQL plugin's classloader and isn't affected.) Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com>
diff --git a/api/src/main/java/org/opensearch/sql/api/UnifiedQueryContext.java b/api/src/main/java/org/opensearch/sql/api/UnifiedQueryContext.java
@@ -5,6 +5,7 @@
 
 package org.opensearch.sql.api;
 
+import static org.opensearch.sql.common.setting.Settings.Key.CALCITE_ENGINE_ENABLED;
 import static org.opensearch.sql.common.setting.Settings.Key.PPL_JOIN_SUBSEARCH_MAXOUT;
 import static org.opensearch.sql.common.setting.Settings.Key.PPL_SUBSEARCH_MAXOUT;
 import static org.opensearch.sql.common.setting.Settings.Key.QUERY_SIZE_LIMIT;
@@ -115,13 +116,21 @@ public static class Builder {
     /**
      * Setting values with defaults from SysLimit.DEFAULT. Only includes planning-required settings
      * to avoid coupling with OpenSearchSettings.
+     *
+     * <p>{@link Settings.Key#CALCITE_ENGINE_ENABLED} defaults to {@code true} here because the
+     * unified query path is by definition Calcite-based — every query reaching this context flows
+     * through Calcite's planner, never the v2 engine. The PPL {@link
+     * org.opensearch.sql.api.parser.PPLQueryParser} reuses the v2 {@code AstBuilder}, which gates
+     * Calcite-only commands (e.g. {@code visitTableCommand}) on this setting; without the default,
+     * those commands fail at parse time even when the cluster setting is true.
      */
     private final Map<Settings.Key, Object> settings =
         new HashMap<Settings.Key, Object>(
             Map.of(
                 QUERY_SIZE_LIMIT, SysLimit.DEFAULT.querySizeLimit(),
                 PPL_SUBSEARCH_MAXOUT, SysLimit.DEFAULT.subsearchLimit(),
-                PPL_JOIN_SUBSEARCH_MAXOUT, SysLimit.DEFAULT.joinSubsearchLimit()));
+                PPL_JOIN_SUBSEARCH_MAXOUT, SysLimit.DEFAULT.joinSubsearchLimit(),
+                CALCITE_ENGINE_ENABLED, true));
 
     /**
      * Sets the query language frontend to be used.
diff --git a/integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLRenameIT.java b/integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLRenameIT.java
@@ -14,6 +14,10 @@
 import static org.opensearch.sql.util.MatcherUtils.verifySchemaInOrder;
 
 import java.io.IOException;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import org.hamcrest.Matcher;
+import org.json.JSONArray;
 import org.json.JSONObject;
 import org.junit.Test;
 import org.opensearch.sql.ppl.PPLIntegTestCase;
@@ -40,7 +44,7 @@ public void testRename() throws IOException {
         schema("country", "string"),
         schema("year", "int"),
         schema("month", "int"));
-    verifyStandardDataRows(result);
+    verifyStandardDataRows(result, "name", "country", "state", "month", "year", "renamed_age");
   }
 
   @Test
@@ -77,7 +81,7 @@ public void testRenameToMetaField() throws IOException {
         schema("country", "string"),
         schema("year", "int"),
         schema("month", "int"));
-    verifyStandardDataRows(result);
+    verifyStandardDataRows(result, "name", "country", "state", "month", "year", "_ID");
   }
 
   @Test
@@ -156,7 +160,7 @@ public void testRenameWildcardFields() throws IOException {
         schema("country", "string"),
         schema("year", "int"),
         schema("month", "int"));
-    verifyStandardDataRows(result);
+    verifyStandardDataRows(result, "nAME", "country", "state", "month", "year", "age");
   }
 
   @Test
@@ -171,7 +175,7 @@ public void testRenameMultipleWildcardFields() throws IOException {
         schema("couNTry", "string"),
         schema("year", "int"),
         schema("moNTh", "int"));
-    verifyStandardDataRows(result);
+    verifyStandardDataRows(result, "name", "couNTry", "state", "moNTh", "year", "age");
   }
 
   @Test
@@ -186,7 +190,7 @@ public void testRenameWildcardPrefix() throws IOException {
         schema("country", "string"),
         schema("year", "int"),
         schema("month", "int"));
-    verifyStandardDataRows(result);
+    verifyStandardDataRows(result, "new_na", "country", "state", "month", "year", "age");
   }
 
   @Test
@@ -212,7 +216,7 @@ public void testRenameMultipleWildcards() throws IOException {
         schema("country", "string"),
         schema("year", "int"),
         schema("MoNtH", "int"));
-    verifyStandardDataRows(result);
+    verifyStandardDataRows(result, "name", "country", "state", "MoNtH", "year", "age");
   }
 
   @Test
@@ -261,12 +265,14 @@ public void testRenamingToExistingField() throws IOException {
         schema("country", "string"),
         schema("year", "int"),
         schema("month", "int"));
-    verifyDataRows(
+    // After `rename name as age`, the original name column overwrites the original age column;
+    // the (number) age values are gone and only the (string) name values remain under "age".
+    verifyDataRowsByColumn(
         result,
-        rows("Jake", "USA", "California", 4, 2023),
-        rows("Hello", "USA", "New York", 4, 2023),
-        rows("John", "Canada", "Ontario", 4, 2023),
-        rows("Jane", "Canada", "Quebec", 4, 2023));
+        rowOf("age", "Jake", "country", "USA", "state", "California", "month", 4, "year", 2023),
+        rowOf("age", "Hello", "country", "USA", "state", "New York", "month", 4, "year", 2023),
+        rowOf("age", "John", "country", "Canada", "state", "Ontario", "month", 4, "year", 2023),
+        rowOf("age", "Jane", "country", "Canada", "state", "Quebec", "month", 4, "year", 2023));
   }
 
   @Test
@@ -296,12 +302,12 @@ public void testRenamingNonExistentFieldToExistingField() throws IOException {
         schema("country", "string"),
         schema("year", "int"),
         schema("month", "int"));
-    verifyDataRows(
+    verifyDataRowsByColumn(
         result,
-        rows("Jake", "USA", "California", 4, 2023),
-        rows("Hello", "USA", "New York", 4, 2023),
-        rows("John", "Canada", "Ontario", 4, 2023),
-        rows("Jane", "Canada", "Quebec", 4, 2023));
+        rowOf("name", "Jake", "country", "USA", "state", "California", "month", 4, "year", 2023),
+        rowOf("name", "Hello", "country", "USA", "state", "New York", "month", 4, "year", 2023),
+        rowOf("name", "John", "country", "Canada", "state", "Ontario", "month", 4, "year", 2023),
+        rowOf("name", "Jane", "country", "Canada", "state", "Quebec", "month", 4, "year", 2023));
   }
 
   @Test
@@ -345,7 +351,7 @@ public void testMultipleRenameWithoutComma() throws IOException {
         schema("location", "string"),
         schema("year", "int"),
         schema("month", "int"));
-    verifyStandardDataRows(result);
+    verifyStandardDataRows(result, "user_name", "location", "state", "month", "year", "user_age");
   }
 
   @Test
@@ -363,15 +369,103 @@ public void testRenameMixedCommaAndSpace() throws IOException {
         schema("location", "string"),
         schema("year", "int"),
         schema("month", "int"));
-    verifyStandardDataRows(result);
+    verifyStandardDataRows(result, "user_name", "location", "state", "month", "year", "user_age");
+  }
+
+  /**
+   * Build a {@code column -> value} map from interleaved varargs ({@code key1, val1, key2, val2,
+   * ...}). Preserves insertion order so the expected-row mapping reads naturally at the call site.
+   */
+  private static Map<String, Object> rowOf(Object... pairs) {
+    if (pairs.length % 2 != 0) {
+      throw new IllegalArgumentException("rowOf expects an even number of args (key, value, ...)");
+    }
+    Map<String, Object> row = new LinkedHashMap<>();
+    for (int i = 0; i < pairs.length; i += 2) {
+      row.put((String) pairs[i], pairs[i + 1]);
+    }
+    return row;
   }
 
   private void verifyStandardDataRows(JSONObject result) {
-    verifyDataRows(
-        result,
-        rows("Jake", "USA", "California", 4, 2023, 70),
-        rows("Hello", "USA", "New York", 4, 2023, 30),
-        rows("John", "Canada", "Ontario", 4, 2023, 25),
-        rows("Jane", "Canada", "Quebec", 4, 2023, 20));
+    verifyStandardDataRows(result, "name", "country", "state", "month", "year", "age");
+  }
+
+  /**
+   * Verify the four canonical state_country rows independently of column order.
+   *
+   * <p>The schema check above ({@code verifySchema}) is set-equality on column names; the data row
+   * check {@code verifyDataRows} is positional. The two paths the analytics-engine route can take
+   * return columns in different orders (parquet preserves storage order, the v2 / Lucene path
+   * preserves {@code _source} iteration order), and either is valid given the contract {@code
+   * verifySchema} declares. To avoid baking either order into the test, this helper takes the
+   * canonical-position column names as varargs and reorders the canonical row values to match
+   * whatever column order the response actually returned.
+   *
+   * @param result the response JSON
+   * @param canonicalColumns the column names of the four canonical rows in {@code (name-or-renamed,
+   *     country-or-renamed, state, month, year, age-or-renamed)} order. Pass the rename target
+   *     where applicable.
+   */
+  private void verifyStandardDataRows(JSONObject result, String... canonicalColumns) {
+    if (canonicalColumns.length != 6) {
+      throw new IllegalArgumentException(
+          "verifyStandardDataRows expects 6 canonical column names; got "
+              + canonicalColumns.length);
+    }
+    Object[][] canonicalValues =
+        new Object[][] {
+          {"Jake", "USA", "California", 4, 2023, 70},
+          {"Hello", "USA", "New York", 4, 2023, 30},
+          {"John", "Canada", "Ontario", 4, 2023, 25},
+          {"Jane", "Canada", "Quebec", 4, 2023, 20}
+        };
+    Map<String, Object>[] expectedRows = new LinkedHashMap[canonicalValues.length];
+    for (int i = 0; i < canonicalValues.length; i++) {
+      Map<String, Object> row = new LinkedHashMap<>();
+      for (int c = 0; c < canonicalColumns.length; c++) {
+        row.put(canonicalColumns[c], canonicalValues[i][c]);
+      }
+      expectedRows[i] = row;
+    }
+    verifyDataRowsByColumn(result, expectedRows);
+  }
+
+  /**
+   * Match expected rows against the response by column name, ignoring the response's column
+   * emission order. For each expected row (a {@code column-name -> value} map), the value at each
+   * schema position is looked up by name. Tests using this helper become engine-order agnostic: a
+   * parquet-backed response and a Lucene-backed response yield the same assertion outcome as long
+   * as the column-name-to-value mapping agrees.
+   */
+  @SafeVarargs
+  @SuppressWarnings("varargs")
+  private final void verifyDataRowsByColumn(
+      JSONObject result, Map<String, Object>... expectedRows) {
+    JSONArray schema = result.getJSONArray("schema");
+    int n = schema.length();
+    String[] columnOrder = new String[n];
+    for (int i = 0; i < n; i++) {
+      columnOrder[i] = schema.getJSONObject(i).getString("name");
+    }
+    @SuppressWarnings({"unchecked", "rawtypes"})
+    Matcher<JSONArray>[] rowMatchers = new Matcher[expectedRows.length];
+    for (int r = 0; r < expectedRows.length; r++) {
+      Object[] reordered = new Object[n];
+      for (int c = 0; c < n; c++) {
+        if (!expectedRows[r].containsKey(columnOrder[c])) {
+          throw new IllegalArgumentException(
+              "Expected row at index "
+                  + r
+                  + " is missing canonical value for response column ["
+                  + columnOrder[c]
+                  + "]; provided keys: "
+                  + expectedRows[r].keySet());
+        }
+        reordered[c] = expectedRows[r].get(columnOrder[c]);
+      }
+      rowMatchers[r] = rows(reordered);
+    }
+    verifyDataRows(result, rowMatchers);
   }
 }