Skip to content

Commit 59c728b

Browse files
authored
Default plugins.calcite.enabled=true on the unified query path (opensearch-project#5413)
* Default plugins.calcite.enabled=true on the unified query path The unified query handler (`RestUnifiedQueryAction` → `TransportPPLQueryAction` → analytics-engine) builds its `UnifiedQueryContext` with an empty `Settings` map; nothing wires the cluster setting through. PPL parsing is delegated to the same `AstBuilder` the v2 path uses, and that builder gates `table` (and other Calcite-only commands) on `Settings.Key.CALCITE_ENGINE_ENABLED`. With no setting propagated, the gate sees `null`, fails the `Boolean.TRUE.equals` check, and throws UnsupportedOperationException: Table command is supported only when plugins.calcite.enabled=true even when the cluster setting is true, blocking every `table` query routed through the analytics path under `tests.analytics.force_routing=true`. The unified path is by definition Calcite-based — every query reaching `UnifiedQueryContext` flows through Calcite's planner. Default `CALCITE_ENGINE_ENABLED=true` in `buildSettings()` when the underlying map doesn't have the key. This unblocks `table` and any other AstBuilder gate that defends the same toggle, without changing v2 behavior (v2 constructs `AstBuilder` with cluster `Settings`, not the unified context). Also refresh the PPL analytics-engine routing dev doc to document the unified-context dependency and switch the per-command verification recipe from the bundle-branch `analyticsCompatibilityTest` task to the standard `integTestRemote -Dtests.analytics.{force_routing,parquet_indices}=true` — the bundle-branch task is purely for the daily coverage-report sweep. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Drop dev docs from this PR — track separately These local routing-and-coverage notes are useful but belong in their own review thread (or stay as untracked working notes), not in the unified-context fix. Keeping them on disk via .gitignore-style untrack so the PR stays focused on the single api/ change. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Address @dai-chen: add CALCITE_ENGINE_ENABLED to default settings map @dai-chen suggested either adding the setting to the default list at UnifiedQueryContext.java:123 or passing it via the .setting() API on the builder, since only the unified PPL path requires it. Going with option 1 (default-list entry) — it composes the same way every other planning-required default does, and avoids forcing every caller (the production REST handler and every IT) to remember a single magic .setting() call. The IT at UnifiedQueryOpenSearchIT.java:51 was the precedent for the .setting() approach but only one IT needs it; the production caller (RestUnifiedQueryAction) wouldn't have a natural place to wire this without either repeating it everywhere or routing through a helper. Removes the conditional in buildSettings().getSettingValue() — the default map now carries the value the same way QUERY_SIZE_LIMIT, PPL_SUBSEARCH_MAXOUT, and PPL_JOIN_SUBSEARCH_MAXOUT do. Signed-off-by: Kai Huang <ahkcs@amazon.com> * CalcitePPLRenameIT: switch to column-name-aware row matcher Previously, {@code verifyStandardDataRows} (and a handful of bespoke {@code verifyDataRows} calls in this file) compared rows positionally — they assumed the engine emits columns in the {@code _source} iteration order the v2 / Lucene path produces. Under the analytics-engine route the parquet-backed reader returns columns in storage order, so the same 4 canonical state_country rows came back as {@code [70,"USA",4,"Jake", "California",2023]} instead of {@code [Jake,USA,California,4,2023,70]} and 12 of 22 tests failed despite the data being identical. Replace with a column-name-keyed expected-row map. The helper reads the actual schema from the response, looks up each canonical value by column name, places it at the corresponding schema position, then defers to {@code verifyDataRows} as before. The contract is identical to the existing {@code verifySchema} matcher — both are set-equality on column names — so the test no longer leaks the engine's emission order into the assertion. Each call site passes the canonical column-name list (with rename substitutions where applicable). Tests that don't rename age keep calling the no-arg form. Both paths now pass: * Analytics-engine route (`tests.analytics.force_routing=true`): 20 / 22 (the remaining 2 are `testRenameInAgg` / `testRenameWithBackticksInAgg`, blocked on the Substrait isthmus AVG-binding follow-up tracked in opensearch-project/OpenSearch#21521). * v2 / Calcite route (default routing): 20 / 22 (same two tests fail with `NoClassDefFoundError: LevenshteinDistance` only when running against the OS-core `:run` cluster — that bundle is missing commons-text. CI's own integ-test cluster bundles commons-text via the SQL plugin's classloader and isn't affected.) Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent 6ffba26 commit 59c728b

2 files changed

Lines changed: 128 additions & 25 deletions

File tree

api/src/main/java/org/opensearch/sql/api/UnifiedQueryContext.java

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55

66
package org.opensearch.sql.api;
77

8+
import static org.opensearch.sql.common.setting.Settings.Key.CALCITE_ENGINE_ENABLED;
89
import static org.opensearch.sql.common.setting.Settings.Key.PPL_JOIN_SUBSEARCH_MAXOUT;
910
import static org.opensearch.sql.common.setting.Settings.Key.PPL_SUBSEARCH_MAXOUT;
1011
import static org.opensearch.sql.common.setting.Settings.Key.QUERY_SIZE_LIMIT;
@@ -115,13 +116,21 @@ public static class Builder {
115116
/**
116117
* Setting values with defaults from SysLimit.DEFAULT. Only includes planning-required settings
117118
* to avoid coupling with OpenSearchSettings.
119+
*
120+
* <p>{@link Settings.Key#CALCITE_ENGINE_ENABLED} defaults to {@code true} here because the
121+
* unified query path is by definition Calcite-based — every query reaching this context flows
122+
* through Calcite's planner, never the v2 engine. The PPL {@link
123+
* org.opensearch.sql.api.parser.PPLQueryParser} reuses the v2 {@code AstBuilder}, which gates
124+
* Calcite-only commands (e.g. {@code visitTableCommand}) on this setting; without the default,
125+
* those commands fail at parse time even when the cluster setting is true.
118126
*/
119127
private final Map<Settings.Key, Object> settings =
120128
new HashMap<Settings.Key, Object>(
121129
Map.of(
122130
QUERY_SIZE_LIMIT, SysLimit.DEFAULT.querySizeLimit(),
123131
PPL_SUBSEARCH_MAXOUT, SysLimit.DEFAULT.subsearchLimit(),
124-
PPL_JOIN_SUBSEARCH_MAXOUT, SysLimit.DEFAULT.joinSubsearchLimit()));
132+
PPL_JOIN_SUBSEARCH_MAXOUT, SysLimit.DEFAULT.joinSubsearchLimit(),
133+
CALCITE_ENGINE_ENABLED, true));
125134

126135
/**
127136
* Sets the query language frontend to be used.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLRenameIT.java

Lines changed: 118 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,10 @@
1414
import static org.opensearch.sql.util.MatcherUtils.verifySchemaInOrder;
1515

1616
import java.io.IOException;
17+
import java.util.LinkedHashMap;
18+
import java.util.Map;
19+
import org.hamcrest.Matcher;
20+
import org.json.JSONArray;
1721
import org.json.JSONObject;
1822
import org.junit.Test;
1923
import org.opensearch.sql.ppl.PPLIntegTestCase;
@@ -40,7 +44,7 @@ public void testRename() throws IOException {
4044
schema("country", "string"),
4145
schema("year", "int"),
4246
schema("month", "int"));
43-
verifyStandardDataRows(result);
47+
verifyStandardDataRows(result, "name", "country", "state", "month", "year", "renamed_age");
4448
}
4549

4650
@Test
@@ -77,7 +81,7 @@ public void testRenameToMetaField() throws IOException {
7781
schema("country", "string"),
7882
schema("year", "int"),
7983
schema("month", "int"));
80-
verifyStandardDataRows(result);
84+
verifyStandardDataRows(result, "name", "country", "state", "month", "year", "_ID");
8185
}
8286

8387
@Test
@@ -156,7 +160,7 @@ public void testRenameWildcardFields() throws IOException {
156160
schema("country", "string"),
157161
schema("year", "int"),
158162
schema("month", "int"));
159-
verifyStandardDataRows(result);
163+
verifyStandardDataRows(result, "nAME", "country", "state", "month", "year", "age");
160164
}
161165

162166
@Test
@@ -171,7 +175,7 @@ public void testRenameMultipleWildcardFields() throws IOException {
171175
schema("couNTry", "string"),
172176
schema("year", "int"),
173177
schema("moNTh", "int"));
174-
verifyStandardDataRows(result);
178+
verifyStandardDataRows(result, "name", "couNTry", "state", "moNTh", "year", "age");
175179
}
176180

177181
@Test
@@ -186,7 +190,7 @@ public void testRenameWildcardPrefix() throws IOException {
186190
schema("country", "string"),
187191
schema("year", "int"),
188192
schema("month", "int"));
189-
verifyStandardDataRows(result);
193+
verifyStandardDataRows(result, "new_na", "country", "state", "month", "year", "age");
190194
}
191195

192196
@Test
@@ -212,7 +216,7 @@ public void testRenameMultipleWildcards() throws IOException {
212216
schema("country", "string"),
213217
schema("year", "int"),
214218
schema("MoNtH", "int"));
215-
verifyStandardDataRows(result);
219+
verifyStandardDataRows(result, "name", "country", "state", "MoNtH", "year", "age");
216220
}
217221

218222
@Test
@@ -261,12 +265,14 @@ public void testRenamingToExistingField() throws IOException {
261265
schema("country", "string"),
262266
schema("year", "int"),
263267
schema("month", "int"));
264-
verifyDataRows(
268+
// After `rename name as age`, the original name column overwrites the original age column;
269+
// the (number) age values are gone and only the (string) name values remain under "age".
270+
verifyDataRowsByColumn(
265271
result,
266-
rows("Jake", "USA", "California", 4, 2023),
267-
rows("Hello", "USA", "New York", 4, 2023),
268-
rows("John", "Canada", "Ontario", 4, 2023),
269-
rows("Jane", "Canada", "Quebec", 4, 2023));
272+
rowOf("age", "Jake", "country", "USA", "state", "California", "month", 4, "year", 2023),
273+
rowOf("age", "Hello", "country", "USA", "state", "New York", "month", 4, "year", 2023),
274+
rowOf("age", "John", "country", "Canada", "state", "Ontario", "month", 4, "year", 2023),
275+
rowOf("age", "Jane", "country", "Canada", "state", "Quebec", "month", 4, "year", 2023));
270276
}
271277

272278
@Test
@@ -296,12 +302,12 @@ public void testRenamingNonExistentFieldToExistingField() throws IOException {
296302
schema("country", "string"),
297303
schema("year", "int"),
298304
schema("month", "int"));
299-
verifyDataRows(
305+
verifyDataRowsByColumn(
300306
result,
301-
rows("Jake", "USA", "California", 4, 2023),
302-
rows("Hello", "USA", "New York", 4, 2023),
303-
rows("John", "Canada", "Ontario", 4, 2023),
304-
rows("Jane", "Canada", "Quebec", 4, 2023));
307+
rowOf("name", "Jake", "country", "USA", "state", "California", "month", 4, "year", 2023),
308+
rowOf("name", "Hello", "country", "USA", "state", "New York", "month", 4, "year", 2023),
309+
rowOf("name", "John", "country", "Canada", "state", "Ontario", "month", 4, "year", 2023),
310+
rowOf("name", "Jane", "country", "Canada", "state", "Quebec", "month", 4, "year", 2023));
305311
}
306312

307313
@Test
@@ -345,7 +351,7 @@ public void testMultipleRenameWithoutComma() throws IOException {
345351
schema("location", "string"),
346352
schema("year", "int"),
347353
schema("month", "int"));
348-
verifyStandardDataRows(result);
354+
verifyStandardDataRows(result, "user_name", "location", "state", "month", "year", "user_age");
349355
}
350356

351357
@Test
@@ -363,15 +369,103 @@ public void testRenameMixedCommaAndSpace() throws IOException {
363369
schema("location", "string"),
364370
schema("year", "int"),
365371
schema("month", "int"));
366-
verifyStandardDataRows(result);
372+
verifyStandardDataRows(result, "user_name", "location", "state", "month", "year", "user_age");
373+
}
374+
375+
/**
376+
* Build a {@code column -> value} map from interleaved varargs ({@code key1, val1, key2, val2,
377+
* ...}). Preserves insertion order so the expected-row mapping reads naturally at the call site.
378+
*/
379+
private static Map<String, Object> rowOf(Object... pairs) {
380+
if (pairs.length % 2 != 0) {
381+
throw new IllegalArgumentException("rowOf expects an even number of args (key, value, ...)");
382+
}
383+
Map<String, Object> row = new LinkedHashMap<>();
384+
for (int i = 0; i < pairs.length; i += 2) {
385+
row.put((String) pairs[i], pairs[i + 1]);
386+
}
387+
return row;
367388
}
368389

369390
private void verifyStandardDataRows(JSONObject result) {
370-
verifyDataRows(
371-
result,
372-
rows("Jake", "USA", "California", 4, 2023, 70),
373-
rows("Hello", "USA", "New York", 4, 2023, 30),
374-
rows("John", "Canada", "Ontario", 4, 2023, 25),
375-
rows("Jane", "Canada", "Quebec", 4, 2023, 20));
391+
verifyStandardDataRows(result, "name", "country", "state", "month", "year", "age");
392+
}
393+
394+
/**
395+
* Verify the four canonical state_country rows independently of column order.
396+
*
397+
* <p>The schema check above ({@code verifySchema}) is set-equality on column names; the data row
398+
* check {@code verifyDataRows} is positional. The two paths the analytics-engine route can take
399+
* return columns in different orders (parquet preserves storage order, the v2 / Lucene path
400+
* preserves {@code _source} iteration order), and either is valid given the contract {@code
401+
* verifySchema} declares. To avoid baking either order into the test, this helper takes the
402+
* canonical-position column names as varargs and reorders the canonical row values to match
403+
* whatever column order the response actually returned.
404+
*
405+
* @param result the response JSON
406+
* @param canonicalColumns the column names of the four canonical rows in {@code (name-or-renamed,
407+
* country-or-renamed, state, month, year, age-or-renamed)} order. Pass the rename target
408+
* where applicable.
409+
*/
410+
private void verifyStandardDataRows(JSONObject result, String... canonicalColumns) {
411+
if (canonicalColumns.length != 6) {
412+
throw new IllegalArgumentException(
413+
"verifyStandardDataRows expects 6 canonical column names; got "
414+
+ canonicalColumns.length);
415+
}
416+
Object[][] canonicalValues =
417+
new Object[][] {
418+
{"Jake", "USA", "California", 4, 2023, 70},
419+
{"Hello", "USA", "New York", 4, 2023, 30},
420+
{"John", "Canada", "Ontario", 4, 2023, 25},
421+
{"Jane", "Canada", "Quebec", 4, 2023, 20}
422+
};
423+
Map<String, Object>[] expectedRows = new LinkedHashMap[canonicalValues.length];
424+
for (int i = 0; i < canonicalValues.length; i++) {
425+
Map<String, Object> row = new LinkedHashMap<>();
426+
for (int c = 0; c < canonicalColumns.length; c++) {
427+
row.put(canonicalColumns[c], canonicalValues[i][c]);
428+
}
429+
expectedRows[i] = row;
430+
}
431+
verifyDataRowsByColumn(result, expectedRows);
432+
}
433+
434+
/**
435+
* Match expected rows against the response by column name, ignoring the response's column
436+
* emission order. For each expected row (a {@code column-name -> value} map), the value at each
437+
* schema position is looked up by name. Tests using this helper become engine-order agnostic: a
438+
* parquet-backed response and a Lucene-backed response yield the same assertion outcome as long
439+
* as the column-name-to-value mapping agrees.
440+
*/
441+
@SafeVarargs
442+
@SuppressWarnings("varargs")
443+
private final void verifyDataRowsByColumn(
444+
JSONObject result, Map<String, Object>... expectedRows) {
445+
JSONArray schema = result.getJSONArray("schema");
446+
int n = schema.length();
447+
String[] columnOrder = new String[n];
448+
for (int i = 0; i < n; i++) {
449+
columnOrder[i] = schema.getJSONObject(i).getString("name");
450+
}
451+
@SuppressWarnings({"unchecked", "rawtypes"})
452+
Matcher<JSONArray>[] rowMatchers = new Matcher[expectedRows.length];
453+
for (int r = 0; r < expectedRows.length; r++) {
454+
Object[] reordered = new Object[n];
455+
for (int c = 0; c < n; c++) {
456+
if (!expectedRows[r].containsKey(columnOrder[c])) {
457+
throw new IllegalArgumentException(
458+
"Expected row at index "
459+
+ r
460+
+ " is missing canonical value for response column ["
461+
+ columnOrder[c]
462+
+ "]; provided keys: "
463+
+ expectedRows[r].keySet());
464+
}
465+
reordered[c] = expectedRows[r].get(columnOrder[c]);
466+
}
467+
rowMatchers[r] = rows(reordered);
468+
}
469+
verifyDataRows(result, rowMatchers);
376470
}
377471
}

0 commit comments

Comments
 (0)