Skip to content

Commit 1183e73

Browse files
authored
feat(ppl): wire patterns command for analytics-engine dashboard route (#5467)
* feat(api): add PATTERN_* settings defaults to UnifiedQueryContext PPL `patterns` command's AstBuilder reads cluster settings for method/mode/ max_sample_count/buffer_limit/show_numbered_token defaults when the query omits them. Without these in the analytics-engine path's settings map, the parser reads null, falls into `PatternMethod.valueOf("NULL")`, and every `patterns` query without an explicit `method=` or `mode=` argument fails at parse time with `No enum constant PatternMethod.NULL`. Mirrors the OpenSearchSettings defaults (SIMPLE_PATTERN / LABEL / 10 / 100000 / false). Part of the analytics-engine route support for the `patterns` command. Signed-off-by: Kai Huang <ahkcs@amazon.com> * feat(core): emit 4-arg regexp_replace with 'g' flag for SIMPLE patterns `buildParseRelNode` for `ParseMethod.PATTERNS` lowered through PPL's REPLACE handler, which always emits Calcite's 3-arg `REGEXP_REPLACE_3`. That works on the V2 / Calcite path (Calcite's default is replace-all), but the analytics- engine route converts the call to substrait + DataFusion, and DataFusion's `regexp_replace` defaults to first-match-only without an explicit "g" flag. The dashboard test for `source = bank | patterns email mode=label` returned `<*>@pyrami.com` instead of `<*>@<*>.<*>` because only the first `[a-zA-Z0-9]+` run was replaced. Bypass the REPLACE handler for the PATTERNS branch and emit `REGEXP_REPLACE_PG_4` directly with a constant "g" flag. Same semantics on V2 / Calcite (Calcite's REGEXP_REPLACE_PG_4 with "g" = replace-all); fixes the analytics-engine path. CalcitePPLPatternsTest plan-string expectations updated to match the 4-arg form. 17/17 unit tests pass. IT result on analytics-engine route: testSimplePatternLabelMode_NotShowNumberedToken now passes. Signed-off-by: Kai Huang <ahkcs@amazon.com> * test(integ-test): add CalcitePPLDashboardPatternsIT pinning BRAIN-label dashboard query OpenSearch Dashboards renders BRAIN-pattern panels with the shape: patterns ... method=BRAIN mode=label | stats count() as pattern_count, take(message, 1) as sample_logs by patterns_field | sort -pattern_count | fields patterns_field, pattern_count, sample_logs This integration test pins that shape on the analytics-engine route so regressions surface immediately. Schema-only assertions because BRAIN's clustering output is dataset-version-sensitive — the contract we care about is "the query plans, executes, and returns three columns in the right order". Currently red end-to-end pending the BRAIN label window-UDF type-cascade fix (see the OpenSearch-side WIP commit "BRAIN window UDF + dashboard query path scaffolding" — the {@code PplWindowCallRewriter} stub documents the remaining gap). Signed-off-by: Kai Huang <ahkcs@amazon.com> * style: apply spotless formatting Spotless drift from cherry-picking the analytics-engine patterns work across upstream's recent formatting touch-ups. No behavior change. Signed-off-by: Kai Huang <huangkaics@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * test(integ-test): update SIMPLE-patterns explain YAML for 4-arg regexp_replace CalciteExplainIT's `testPatternsSimplePatternMethodWith{out,AggPushDown}Explain` expected the old 3-arg `REGEXP_REPLACE(...)` form, but after the `feat(core)` commit emits 4-arg `REGEXP_REPLACE(..., 'g':VARCHAR)` the plan output now includes the extra operand both in the logical line and in the base64-encoded compounded script of the physical/pushdown plan. Regenerate both YAML expectations against the live planner. Signed-off-by: Kai Huang <ahkcs@amazon.com> * fix(opensearch): collapse 4-arg REGEXP_REPLACE_PG_4 'g' to 3-arg at script pushdown The `feat(core)` commit on this branch lowered PPL `patterns` to a 4-arg `REGEXP_REPLACE_PG_4(field, pattern, replacement, 'g')` so DataFusion (which defaults to first-match-only) does global replacement on the analytics-engine route. Calcite's enumerable runtime — which the V2 / Calcite-pushdown path uses to compile the serialized RexCall into Janino bytecode — has no matching `SqlFunctions.regexpReplace(String, String, String, String)` impl (only `(String, String, String, int[, ...])` variants where the 4th arg is start position, not a flags string). Janino codegen failed with `No applicable constructor/method found` for the 4-arg-with-flags call shape, breaking the patterns.md doctest (`source=apache | patterns message method=simple_pattern mode=aggregation`). Two complementary fixes: 1. `RexStandardizer.visitCall`: before serializing for pushdown, collapse `REGEXP_REPLACE_PG_4(field, pattern, replacement, 'g')` to the 3-arg `REGEXP_REPLACE_3` form. Safe because Calcite's 3-arg variant is already replace-all (same semantics as PG_4 with `g`). Only fires when the flags literal is exactly `"g"` so any future `i`/`m`/etc. use cases pass through untouched. 2. `ExtendedRelJson.toOp`: pass operand count when looking up an operator on the deserialization side so multi-arity SQL names (REGEXP_REPLACE_3 vs REGEXP_REPLACE_PG_4 vs REGEXP_REPLACE_5 all share `name="REGEXP_REPLACE"`) resolve to the right overload. Defensive — the standardizer fix above is what actually unblocks the doctest, but the resolver was picking by name alone and would have surfaced the same bug for any other overloaded builtin. Verified locally: - doctest queries (`patterns ... method=simple_pattern mode=aggregation [...]`) now return fully-tokenized output; - `CalcitePPLDashboardPatternsIT` still 1/1 PASS; - `CalcitePPLPatternsIT` still 10/15 with the same five known-pending failures (LogicalCorrelate + `_ShowNumberedToken` BRAIN cases). Signed-off-by: Kai Huang <ahkcs@amazon.com> * fix(opensearch): revert arity-aware toOp; restore spath/JSON_EXTRACT doctest The arity filter added to ExtendedRelJson.toOp in the previous commit broke SAFE_CAST → JSON_EXTRACT deserialization (used by `spath` lowering): the PPL JSON_EXTRACT UDF, registered as an anonymous UserDefinedFunctionBuilder subclass, doesn't expose a meaningful getOperandCountRange(), so my filter fell through to the firstKindMatch path and skipped the AvaticaUtils.instantiatePlugin "class" path that previously resolved the UDF. spath.md doctest started returning RuntimeException on `source=structured | spath input=doc_n n | eval n=cast(n as int) | stats sum(n)`. The RexStandardizer collapse (4-arg `REGEXP_REPLACE_PG_4(..., 'g')` → 3-arg `REGEXP_REPLACE_3`) already fixes the patterns.md doctest at the source side — by the time pushdown serialization runs, no 4-arg call exists for toOp to disambiguate. The arity filter was defensive only and no longer carries its weight; revert toOp to the original first-kind-match behavior, plus a spotless re-flow that came in with the same change. Verified locally on a fresh cluster: - spath.md doctest query → returns sum(n)=6 (was 500). - patterns.md doctest query → returns fully-tokenized aggregation rows. - CalcitePPLDashboardPatternsIT → 1/1 PASS. - CalcitePPLPatternsIT → 10/15 PASS (same baseline; same five known-pending BRAIN failures tracked separately). Signed-off-by: Kai Huang <ahkcs@amazon.com> * style: trim verbose comments per review Per @penghuo: drop the verbose multi-line explanatory comments and tighten the class/method javadoc on the new dashboard IT. Signed-off-by: Kai Huang <ahkcs@amazon.com> * test(integ-test): add verifyDataRows to dashboard patterns IT Per @dai-chen: schema-only verification doesn't catch "query succeeds but returns 0/wrong rows". Pin the 4 BRAIN clusters with their exact patterns, counts, and sample logs against the HDFS_LOGS fixture. Signed-off-by: Kai Huang <ahkcs@amazon.com> * refactor(core): fuse PATTERNS if-else in buildParseRelNode Per @dai-chen: the two consecutive `if (PATTERNS)` branches in buildParseRelNode share a condition; merge into a single if/else with each branch fully co-located. Pure refactor — CalcitePPLPatternsTest (logical-plan unit test) passes. Signed-off-by: Kai Huang <ahkcs@amazon.com> * test(integ-test): include CalcitePPLDashboardPatternsIT in CalciteNoPushdownIT Per CLAUDE.md guidance, new Calcite IT classes should be added to the no-pushdown suite. Verified locally that the dashboard query also passes with pushdown disabled (Dashboard 1/1, Patterns 10/15 — same baseline). Signed-off-by: Kai Huang <ahkcs@amazon.com> * test(integ-test): regenerate agg-push explain YAML for 3-arg REGEXP_REPLACE The previous YAML capture pre-dated the RexStandardizer 4-arg → 3-arg collapse landing. With the collapse, the pushed-down compounded script serializes the 3-arg form (SOURCES has 7 entries, no trailing 'g'). Signed-off-by: Kai Huang <ahkcs@amazon.com> * revert(core): drop SQL-side 'g' flag for patterns; move to DataFusion adapter Per @penghuo's review: DataFusion-specific concerns shouldn't live in SQL core. The 'g' flag is needed only because DataFusion's regexp_replace defaults to first-match-only — Calcite's 3-arg form is already replace-all on both pushdown and no-pushdown paths. Restores SQL core, RexStandardizer, the patterns unit test, and the SIMPLE- patterns explain YAMLs to their upstream/main shape. The 'g' flag is appended in opensearch-project/OpenSearch#21797's RegexpReplaceAdapter when converting 3-arg REGEXP_REPLACE to DataFusion. Same end-user behavior, smaller SQL diff, and the Calcite no-pushdown path no longer diverges from the pushdown YAML. Signed-off-by: Kai Huang <ahkcs@amazon.com> * test(api): pin UnifiedQueryContext PATTERN_* defaults via planner test Per @dai-chen: verify the RelNode produced when `patterns <field>` is run without explicit method=/mode= args — exercises that the PATTERN_METHOD and PATTERN_MODE defaults flow through to AstBuilder.visitPatternsCommand and produce a valid SIMPLE/LABEL lowering with a `patterns_field` projection. Signed-off-by: Kai Huang <ahkcs@amazon.com> * style: spotlessApply Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com> Signed-off-by: Kai Huang <huangkaics@gmail.com>
1 parent 37090ba commit 1183e73

4 files changed

Lines changed: 106 additions & 6 deletions

File tree

api/src/main/java/org/opensearch/sql/api/UnifiedQueryContext.java

Lines changed: 17 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,11 @@
66
package org.opensearch.sql.api;
77

88
import static org.opensearch.sql.common.setting.Settings.Key.CALCITE_ENGINE_ENABLED;
9+
import static org.opensearch.sql.common.setting.Settings.Key.PATTERN_BUFFER_LIMIT;
10+
import static org.opensearch.sql.common.setting.Settings.Key.PATTERN_MAX_SAMPLE_COUNT;
11+
import static org.opensearch.sql.common.setting.Settings.Key.PATTERN_METHOD;
12+
import static org.opensearch.sql.common.setting.Settings.Key.PATTERN_MODE;
13+
import static org.opensearch.sql.common.setting.Settings.Key.PATTERN_SHOW_NUMBERED_TOKEN;
914
import static org.opensearch.sql.common.setting.Settings.Key.PPL_JOIN_SUBSEARCH_MAXOUT;
1015
import static org.opensearch.sql.common.setting.Settings.Key.PPL_REX_MAX_MATCH_LIMIT;
1116
import static org.opensearch.sql.common.setting.Settings.Key.PPL_SUBSEARCH_MAXOUT;
@@ -145,12 +150,18 @@ public static class Builder {
145150
*/
146151
private final Map<Settings.Key, Object> settings =
147152
new HashMap<Settings.Key, Object>(
148-
Map.of(
149-
QUERY_SIZE_LIMIT, SysLimit.DEFAULT.querySizeLimit(),
150-
PPL_SUBSEARCH_MAXOUT, SysLimit.UNLIMITED_SUBSEARCH.subsearchLimit(),
151-
PPL_JOIN_SUBSEARCH_MAXOUT, SysLimit.UNLIMITED_SUBSEARCH.joinSubsearchLimit(),
152-
CALCITE_ENGINE_ENABLED, true,
153-
PPL_REX_MAX_MATCH_LIMIT, 10));
153+
Map.ofEntries(
154+
Map.entry(QUERY_SIZE_LIMIT, SysLimit.DEFAULT.querySizeLimit()),
155+
Map.entry(PPL_SUBSEARCH_MAXOUT, SysLimit.UNLIMITED_SUBSEARCH.subsearchLimit()),
156+
Map.entry(
157+
PPL_JOIN_SUBSEARCH_MAXOUT, SysLimit.UNLIMITED_SUBSEARCH.joinSubsearchLimit()),
158+
Map.entry(CALCITE_ENGINE_ENABLED, true),
159+
Map.entry(PPL_REX_MAX_MATCH_LIMIT, 10),
160+
Map.entry(PATTERN_METHOD, "SIMPLE_PATTERN"),
161+
Map.entry(PATTERN_MODE, "LABEL"),
162+
Map.entry(PATTERN_MAX_SAMPLE_COUNT, 10),
163+
Map.entry(PATTERN_BUFFER_LIMIT, 100000),
164+
Map.entry(PATTERN_SHOW_NUMBERED_TOKEN, false)));
154165

155166
/**
156167
* Sets the query language frontend to be used.

api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -144,4 +144,18 @@ public void assertionErrorIsWrappedAsSemanticCheckException() {
144144
.assertErrorMessageEquals("Failed to plan query: invalid plan structure")
145145
.assertCauseType(AssertionError.class);
146146
}
147+
148+
/**
149+
* Without the {@code PATTERN_*} defaults in {@link UnifiedQueryContext}, a bare {@code patterns
150+
* <field>} (no explicit {@code method=}/{@code mode=}) dies at parse time with {@code
151+
* PatternMethod.valueOf("NULL")} because {@code AstBuilder.visitPatternsCommand} reads a null
152+
* from {@code settings.getSettingValue(Key.PATTERN_METHOD)}. With the defaults present, the
153+
* planner lowers patterns to SIMPLE / LABEL mode and adds {@code patterns_field}.
154+
*/
155+
@Test
156+
public void testPPLPatternsPicksUpDefaults() {
157+
givenQuery("source = catalog.employees | patterns name")
158+
.assertPlanContains("REGEXP_REPLACE")
159+
.assertFields("id", "name", "age", "department", "patterns_field");
160+
}
147161
}

integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -68,6 +68,7 @@
6868
CalcitePPLCastFunctionIT.class,
6969
CalcitePPLConditionBuiltinFunctionIT.class,
7070
CalcitePPLCryptographicFunctionIT.class,
71+
CalcitePPLDashboardPatternsIT.class,
7172
CalcitePPLDedupIT.class,
7273
CalcitePPLEventstatsIT.class,
7374
CalciteStreamstatsCommandIT.class,
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
/*
2+
* Copyright OpenSearch Contributors
3+
* SPDX-License-Identifier: Apache-2.0
4+
*/
5+
6+
package org.opensearch.sql.calcite.remote;
7+
8+
import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_HDFS_LOGS;
9+
import static org.opensearch.sql.util.MatcherUtils.rows;
10+
import static org.opensearch.sql.util.MatcherUtils.schema;
11+
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;
12+
import static org.opensearch.sql.util.MatcherUtils.verifySchemaInOrder;
13+
14+
import com.google.common.collect.ImmutableList;
15+
import java.io.IOException;
16+
import org.json.JSONObject;
17+
import org.junit.Test;
18+
import org.opensearch.sql.ppl.PPLIntegTestCase;
19+
20+
/** Pins the BRAIN-label pattern panel query shape used by OpenSearch Dashboards. */
21+
public class CalcitePPLDashboardPatternsIT extends PPLIntegTestCase {
22+
@Override
23+
public void init() throws Exception {
24+
super.init();
25+
enableCalcite();
26+
loadIndex(Index.HDFS_LOGS);
27+
}
28+
29+
@Test
30+
public void testDashboardBrainLabelStatsByPatternsField() throws IOException {
31+
JSONObject result =
32+
executeQuery(
33+
String.format(
34+
"source=%s"
35+
+ " | patterns content method=BRAIN mode=label"
36+
+ " max_sample_count=5 variable_count_threshold=5"
37+
+ " frequency_threshold_percentage=0.2"
38+
+ " | stats count() as pattern_count, take(content, 1) as sample_logs"
39+
+ " by patterns_field"
40+
+ " | sort - pattern_count"
41+
+ " | fields patterns_field, pattern_count, sample_logs",
42+
TEST_INDEX_HDFS_LOGS));
43+
verifySchemaInOrder(
44+
result,
45+
schema("patterns_field", "string"),
46+
schema("pattern_count", "bigint"),
47+
schema("sample_logs", "array"));
48+
verifyDataRows(
49+
result,
50+
rows(
51+
"BLOCK* NameSystem.addStoredBlock: blockMap updated: <*IP*> is added to blk_<*> size"
52+
+ " <*>",
53+
2,
54+
ImmutableList.of(
55+
"BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.31.85:50010 is added"
56+
+ " to blk_-7017553867379051457 size 67108864")),
57+
rows(
58+
"PacketResponder failed <*> blk_<*>",
59+
2,
60+
ImmutableList.of("PacketResponder failed for blk_6996194389878584395")),
61+
rows(
62+
"Verification succeeded <*> blk_<*>",
63+
2,
64+
ImmutableList.of("Verification succeeded for blk_-1547954353065580372")),
65+
rows(
66+
"<*> NameSystem.allocateBlock:"
67+
+ " /user/root/sortrand/_temporary/_task_<*>_<*>_r_<*>_<*>/part<*> blk_<*>",
68+
2,
69+
ImmutableList.of(
70+
"BLOCK* NameSystem.allocateBlock:"
71+
+ " /user/root/sortrand/_temporary/_task_200811092030_0002_r_000296_0/part-00296."
72+
+ " blk_-6620182933895093708")));
73+
}
74+
}

0 commit comments

Comments
 (0)