Commit 1183e73
feat(ppl): wire patterns command for analytics-engine dashboard route (#5467)
* feat(api): add PATTERN_* settings defaults to UnifiedQueryContext
PPL `patterns` command's AstBuilder reads cluster settings for method/mode/
max_sample_count/buffer_limit/show_numbered_token defaults when the query
omits them. Without these in the analytics-engine path's settings map, the
parser reads null, falls into `PatternMethod.valueOf("NULL")`, and every
`patterns` query without an explicit `method=` or `mode=` argument fails at
parse time with `No enum constant PatternMethod.NULL`.
Mirrors the OpenSearchSettings defaults (SIMPLE_PATTERN / LABEL / 10 /
100000 / false). Part of the analytics-engine route support for the
`patterns` command.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
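The parse-time failure described above can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: the `PatternMethod` enum is a stand-in, the settings are modeled as a plain `Map`, and the class name, the `resolveMethod` helper, and the `pattern.method` key are all hypothetical. It assumes the parser stringifies the setting before the enum lookup, which is what the `valueOf("NULL")` symptom suggests.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class PatternDefaultsSketch {
  // Stand-in enum; only the constant names are taken from the commit message.
  enum PatternMethod { SIMPLE_PATTERN, BRAIN }

  // Hypothetical setting key, used for illustration only.
  static final String METHOD_KEY = "pattern.method";

  // Mimics a parser that stringifies the setting before the enum lookup.
  // String.valueOf on a null Object yields "null", so a missing entry
  // turns into PatternMethod.valueOf("NULL").
  static PatternMethod resolveMethod(Map<String, Object> settings) {
    Object raw = settings.get(METHOD_KEY);
    return PatternMethod.valueOf(String.valueOf(raw).toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    Map<String, Object> settings = new HashMap<>();
    try {
      resolveMethod(settings); // no default present in the map
    } catch (IllegalArgumentException e) {
      System.out.println("without default: " + e.getMessage());
    }
    settings.put(METHOD_KEY, "SIMPLE_PATTERN"); // default supplied
    System.out.println("with default: " + resolveMethod(settings));
  }
}
```

Seeding the map with a default before the parser runs is enough to avoid the exception, which is the approach this change takes for the analytics-engine settings map.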
* feat(core): emit 4-arg regexp_replace with 'g' flag for SIMPLE patterns
`buildParseRelNode` for `ParseMethod.PATTERNS` lowered through PPL's REPLACE
handler, which always emits Calcite's 3-arg `REGEXP_REPLACE_3`. That works on
the V2 / Calcite path (Calcite's default is replace-all), but the analytics-
engine route converts the call to substrait + DataFusion, and DataFusion's
`regexp_replace` defaults to first-match-only without an explicit "g" flag.
The dashboard test for `source = bank | patterns email mode=label` returned
`<*>@pyrami.com` instead of `<*>@<*>.<*>` because only the first
`[a-zA-Z0-9]+` run was replaced.
Bypass the REPLACE handler for the PATTERNS branch and emit
`REGEXP_REPLACE_PG_4` directly with a constant "g" flag. Same semantics on V2 /
Calcite (Calcite's REGEXP_REPLACE_PG_4 with "g" = replace-all); fixes the
analytics-engine path.
CalcitePPLPatternsTest plan-string expectations updated to match the 4-arg
form. 17/17 unit tests pass. IT result on analytics-engine route:
testSimplePatternLabelMode_NotShowNumberedToken now passes.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
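The difference between first-match-only and replace-all can be seen with plain `java.util.regex`. This is only an analogy for the two `regexp_replace` behaviors, not the Calcite or DataFusion code path; the class name and the sample address are illustrative, while the `[a-zA-Z0-9]+` character class and the `<*>` mask come from the message above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceScopeSketch {
  // Character class quoted in the commit message for SIMPLE patterns.
  static final Pattern TOKEN = Pattern.compile("[a-zA-Z0-9]+");
  // Quote the mask so it is always treated as a literal replacement.
  static final String MASK = Matcher.quoteReplacement("<*>");

  // First-match-only, analogous to regexp_replace without the "g" flag.
  static String replaceFirstOnly(String input) {
    return TOKEN.matcher(input).replaceFirst(MASK);
  }

  // Replace-all, analogous to passing "g" or to Calcite's 3-arg default.
  static String replaceGlobally(String input) {
    return TOKEN.matcher(input).replaceAll(MASK);
  }

  public static void main(String[] args) {
    String email = "amberduke@pyrami.com"; // illustrative sample value
    System.out.println(replaceFirstOnly(email)); // <*>@pyrami.com
    System.out.println(replaceGlobally(email));  // <*>@<*>.<*>
  }
}
```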
* test(integ-test): add CalcitePPLDashboardPatternsIT pinning BRAIN-label dashboard query
OpenSearch Dashboards renders BRAIN-pattern panels with the shape:
patterns ... method=BRAIN mode=label
| stats count() as pattern_count, take(message, 1) as sample_logs
by patterns_field
| sort -pattern_count
| fields patterns_field, pattern_count, sample_logs
This integration test pins that shape on the analytics-engine route so
regressions surface immediately. Schema-only assertions because BRAIN's
clustering output is dataset-version-sensitive — the contract we care about
is "the query plans, executes, and returns three columns in the right order".
Currently red end-to-end pending the BRAIN label window-UDF type-cascade
fix (see the OpenSearch-side WIP commit "BRAIN window UDF + dashboard
query path scaffolding"; the `PplWindowCallRewriter` stub
documents the remaining gap).
Signed-off-by: Kai Huang <ahkcs@amazon.com>
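To make the expected result shape concrete, the `stats ... by patterns_field | sort -pattern_count` stage can be imitated in plain Java. This is a rough sketch under assumptions: the input is a list of already-labelled `{patterns_field, message}` pairs, `take(message, 1)` is modeled as "keep one sample per group" (here the first one seen, which the real aggregation may not guarantee), and every class and method name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DashboardShapeSketch {
  // One output row, in the column order the dashboard query projects.
  static final class Row {
    final String patternsField;
    final long patternCount;
    final List<String> sampleLogs;

    Row(String patternsField, long patternCount, List<String> sampleLogs) {
      this.patternsField = patternsField;
      this.patternCount = patternCount;
      this.sampleLogs = sampleLogs;
    }
  }

  // Each input element is a {patterns_field, message} pair.
  static List<Row> aggregate(List<String[]> labelled) {
    Map<String, Long> counts = new LinkedHashMap<>();
    Map<String, List<String>> samples = new LinkedHashMap<>();
    for (String[] pair : labelled) {
      counts.merge(pair[0], 1L, Long::sum); // count() as pattern_count
      List<String> kept = samples.computeIfAbsent(pair[0], k -> new ArrayList<>());
      if (kept.isEmpty()) {
        kept.add(pair[1]); // take(message, 1): keep a single sample
      }
    }
    List<Row> rows = new ArrayList<>();
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      rows.add(new Row(e.getKey(), e.getValue(), samples.get(e.getKey())));
    }
    // sort -pattern_count: descending by count
    rows.sort(Comparator.comparingLong((Row r) -> r.patternCount).reversed());
    return rows;
  }
}
```

The integration test asserts on exactly these three columns in this order; the sketch says nothing about how BRAIN derives the pattern labels themselves.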
* style: apply spotless formatting
Spotless drift from cherry-picking the analytics-engine patterns work
across upstream's recent formatting touch-ups. No behavior change.
Signed-off-by: Kai Huang <huangkaics@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* test(integ-test): update SIMPLE-patterns explain YAML for 4-arg regexp_replace
CalciteExplainIT's `testPatternsSimplePatternMethodWith{out,AggPushDown}Explain`
expected the old 3-arg `REGEXP_REPLACE(...)` form. Now that the `feat(core)`
commit emits the 4-arg `REGEXP_REPLACE(..., 'g':VARCHAR)` form, the plan output
includes the extra operand both in the logical line and in the base64-encoded
compounded script of the physical/pushdown plan.
Regenerate both YAML expectations against the live planner.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* fix(opensearch): collapse 4-arg REGEXP_REPLACE_PG_4 'g' to 3-arg at script pushdown
The `feat(core)` commit on this branch lowered PPL `patterns` to a 4-arg
`REGEXP_REPLACE_PG_4(field, pattern, replacement, 'g')` so DataFusion (which
defaults to first-match-only) does global replacement on the analytics-engine
route. Calcite's enumerable runtime — which the V2 / Calcite-pushdown path uses
to compile the serialized RexCall into Janino bytecode — has no matching
`SqlFunctions.regexpReplace(String, String, String, String)` impl (only
`(String, String, String, int[, ...])` variants where the 4th arg is start
position, not a flags string). Janino codegen failed with
`No applicable constructor/method found` for the 4-arg-with-flags call shape,
breaking the patterns.md doctest (`source=apache | patterns message
method=simple_pattern mode=aggregation`).
Two complementary fixes:
1. `RexStandardizer.visitCall`: before serializing for pushdown, collapse
`REGEXP_REPLACE_PG_4(field, pattern, replacement, 'g')` to the 3-arg
`REGEXP_REPLACE_3` form. Safe because Calcite's 3-arg variant is already
replace-all (same semantics as PG_4 with `g`). Only fires when the flags
literal is exactly `"g"` so any future `i`/`m`/etc. use cases pass through
untouched.
2. `ExtendedRelJson.toOp`: pass operand count when looking up an operator on
the deserialization side so multi-arity SQL names (REGEXP_REPLACE_3 vs
REGEXP_REPLACE_PG_4 vs REGEXP_REPLACE_5 all share `name="REGEXP_REPLACE"`)
resolve to the right overload. Defensive — the standardizer fix above is
what actually unblocks the doctest, but the resolver was picking by name
alone and would have surfaced the same bug for any other overloaded
builtin.
Verified locally:
- doctest queries (`patterns ... method=simple_pattern mode=aggregation [...]`)
now return fully-tokenized output;
- `CalcitePPLDashboardPatternsIT` still 1/1 PASS;
- `CalcitePPLPatternsIT` still 10/15 with the same five known-pending failures
(LogicalCorrelate + `_ShowNumberedToken` BRAIN cases).
Signed-off-by: Kai Huang <ahkcs@amazon.com>
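The collapse rule in fix 1 can be sketched without Calcite. The `Call` class below is a simplified stand-in for a `RexCall` with string operands, and the class and method names are hypothetical; the real `RexStandardizer.visitCall` operates on Calcite operators and literals. What the sketch does show is the guard: only an exact `"g"` flag is dropped, and every other call passes through unchanged.

```java
import java.util.Arrays;
import java.util.List;

final class RegexpReplaceCollapseSketch {
  // Simplified stand-in for a call node: SQL function name plus operands.
  static final class Call {
    final String name;
    final List<String> operands;

    Call(String name, String... operands) {
      this.name = name;
      this.operands = Arrays.asList(operands);
    }
  }

  // Drop the 4th operand only when it is exactly "g"; the 3-arg form is
  // already replace-all, so the semantics are preserved. Other flags
  // (for example "i") are left alone.
  static Call collapse(Call call) {
    boolean isGlobalFourArg =
        "REGEXP_REPLACE".equals(call.name)
            && call.operands.size() == 4
            && "g".equals(call.operands.get(3));
    if (!isGlobalFourArg) {
      return call;
    }
    return new Call(call.name, call.operands.subList(0, 3).toArray(new String[0]));
  }
}
```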
* fix(opensearch): revert arity-aware toOp; restore spath/JSON_EXTRACT doctest
The arity filter added to ExtendedRelJson.toOp in the previous commit broke
SAFE_CAST → JSON_EXTRACT deserialization (used by `spath` lowering): the
PPL JSON_EXTRACT UDF, registered as an anonymous UserDefinedFunctionBuilder
subclass, doesn't expose a meaningful getOperandCountRange(), so my filter
fell through to the firstKindMatch path and skipped the
AvaticaUtils.instantiatePlugin "class" path that previously resolved the
UDF. spath.md doctest started returning RuntimeException on
`source=structured | spath input=doc_n n | eval n=cast(n as int) | stats sum(n)`.
The RexStandardizer collapse (4-arg `REGEXP_REPLACE_PG_4(..., 'g')` → 3-arg
`REGEXP_REPLACE_3`) already fixes the patterns.md doctest at the source side
— by the time pushdown serialization runs, no 4-arg call exists for toOp to
disambiguate. The arity filter was defensive only and no longer carries its
weight; revert toOp to the original first-kind-match behavior, plus a spotless
re-flow that came in with the same change.
Verified locally on a fresh cluster:
- spath.md doctest query → returns sum(n)=6 (previously an HTTP 500).
- patterns.md doctest query → returns fully-tokenized aggregation rows.
- CalcitePPLDashboardPatternsIT → 1/1 PASS.
- CalcitePPLPatternsIT → 10/15 PASS (same baseline; same five known-pending
BRAIN failures tracked separately).
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* style: trim verbose comments per review
Per @penghuo: drop the verbose multi-line explanatory comments and tighten
the class/method javadoc on the new dashboard IT.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* test(integ-test): add verifyDataRows to dashboard patterns IT
Per @dai-chen: schema-only verification doesn't catch "query succeeds but
returns 0/wrong rows". Pin the 4 BRAIN clusters with their exact patterns,
counts, and sample logs against the HDFS_LOGS fixture.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* refactor(core): fuse PATTERNS if-else in buildParseRelNode
Per @dai-chen: the two consecutive `if (PATTERNS)` branches in
buildParseRelNode share a condition; merge into a single if/else with
each branch fully co-located. Pure refactor — CalcitePPLPatternsTest
(logical-plan unit test) passes.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* test(integ-test): include CalcitePPLDashboardPatternsIT in CalciteNoPushdownIT
Per CLAUDE.md guidance, new Calcite IT classes should be added to the
no-pushdown suite. Verified locally that the dashboard query also passes
with pushdown disabled (Dashboard 1/1, Patterns 10/15 — same baseline).
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* test(integ-test): regenerate agg-push explain YAML for 3-arg REGEXP_REPLACE
The previous YAML capture pre-dated the RexStandardizer 4-arg → 3-arg
collapse landing. With the collapse, the pushed-down compounded script
serializes the 3-arg form (SOURCES has 7 entries, no trailing 'g').
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* revert(core): drop SQL-side 'g' flag for patterns; move to DataFusion adapter
Per @penghuo's review: DataFusion-specific concerns shouldn't live in SQL core.
The 'g' flag is needed only because DataFusion's regexp_replace defaults to
first-match-only — Calcite's 3-arg form is already replace-all on both pushdown
and no-pushdown paths.
Restores SQL core, RexStandardizer, the patterns unit test, and the SIMPLE-
patterns explain YAMLs to their upstream/main shape. The 'g' flag is appended
in opensearch-project/OpenSearch#21797's RegexpReplaceAdapter when converting
3-arg REGEXP_REPLACE to DataFusion. Same end-user behavior, smaller SQL diff,
and the Calcite no-pushdown path no longer diverges from the pushdown YAML.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
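The adapter-side approach amounts to appending the flag at the engine boundary instead of in SQL core. The actual `RegexpReplaceAdapter` lives in the referenced OpenSearch PR and is not shown here; the sketch below is a hypothetical illustration that models operands as strings and uses invented names.

```java
import java.util.ArrayList;
import java.util.List;

final class GlobalFlagAdapterSketch {
  // Given the operands of a Calcite-style 3-arg REGEXP_REPLACE, produce the
  // operand list for an engine whose default is first-match-only by
  // appending an explicit 'g' flag. Other arities are passed through as-is.
  static List<String> toEngineArgs(List<String> calciteArgs) {
    List<String> out = new ArrayList<>(calciteArgs);
    if (calciteArgs.size() == 3) {
      out.add("'g'");
    }
    return out;
  }
}
```

Keeping the rewrite in the adapter means the logical plan and the explain YAMLs stay identical on the pushdown and no-pushdown Calcite paths.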
* test(api): pin UnifiedQueryContext PATTERN_* defaults via planner test
Per @dai-chen: verify the RelNode produced when `patterns <field>` is run
without explicit method=/mode= args — exercises that the PATTERN_METHOD and
PATTERN_MODE defaults flow through to AstBuilder.visitPatternsCommand and
produce a valid SIMPLE/LABEL lowering with a `patterns_field` projection.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* style: spotlessApply
Signed-off-by: Kai Huang <ahkcs@amazon.com>
---------
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <huangkaics@gmail.com>
1 parent 37090ba commit 1183e73
4 files changed
Lines changed: 106 additions & 6 deletions
File tree
- api/src
- main/java/org/opensearch/sql/api
- test/java/org/opensearch/sql/api
- integ-test/src/test/java/org/opensearch/sql/calcite
- remote
Lines changed: 17 additions & 6 deletions
Lines changed: 14 additions & 0 deletions
Lines changed: 1 addition & 0 deletions
Lines changed: 74 additions & 0 deletions