Skip to content

[Bug Fix] Fix PPL syntax highlighting accepting invalid field=/comparison source fields#5559

Closed
Hanyu-W wants to merge 1 commit into
opensearch-project:mainfrom
Hanyu-W:fix/grok-parse-spath-source-field
Closed

[Bug Fix] Fix PPL syntax highlighting accepting invalid field=/comparison source fields#5559
Hanyu-W wants to merge 1 commit into
opensearch-project:mainfrom
Hanyu-W:fix/grok-parse-spath-source-field

Conversation

@Hanyu-W

@Hanyu-W Hanyu-W commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Description

The grammar typed the source field of grok/parse/patterns (and spath input=/output=) as the broad expression rule. Any <name>=<name> form (e.g. grok field=body "...", grok foo=bar "...") matched the compareExpr alternative and was silently accepted — parsed as a comparison instead of a syntax error. On clusters >= 3.6 where the Explore editor uses the runtime grammar bundle (via /_plugins/_ppl/_grammar), there was no error highlighting at all for these forms. The documented form is positional (grok <field> <pattern>).

This narrows each source-field slot to the tightest rule matching the command's contract, tightening both the server parser (rejects at parse time instead of misparsing) and the grammar bundle the Explore editor consumes:

  • grok / parse / patterns: expressionvalueExpression (drops comparisons, keeps function-call/arithmetic sources like parse DATE_FORMAT(...) '...')
  • spath input/output: expressionfieldExpression (source is stringified into a field name, never evaluated)

Testing

  • 8 negative tests asserting SyntaxCheckException for invalid forms
  • 3 positive regression guards (dotted/keyword fields, function-call source, patterns with options)
  • Verified server-side: POST /_plugins/_ppl with invalid forms returns 400; valid forms execute correctly

Related Issues

Resolves #5558

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…arison source fields

The Explore query editor validates PPL in the browser using the runtime
grammar bundle it fetches from the cluster (for data sources >= 3.6). That
bundle is the serialized form of this `.g4`. Because the grammar typed the
source field of `grok`/`parse`/`patterns` (and `spath` `input=`/`output=`) as
the broad `expression` rule, the editor accepted syntax that is not valid PPL
and drew no error squiggle on it.

Concretely, `FIELD` (and `SED`, `MAX_MATCH`, `OFFSET_FIELD`) are valid
keyword-as-identifiers, so `grok field=body "..."` matched the `compareExpr`
alternative of `expression` (parsed as `Compare(field, =, body)`) and was
accepted as well-formed. The documented form is positional:
`grok <field> <pattern>`.

This bug fix narrows each source-field slot to the tightest grammar rule that
matches the command's real contract, so the editor (and the server parser that
shares this grammar) flags the invalid form.

- grok / parse / patterns: `expression` -> `valueExpression`.
  `valueExpression` excludes comparison/IN/BETWEEN/IS-NULL (which live only at
  the `expression` tier) but still includes function-call and arithmetic
  sources the engine genuinely evaluates — e.g.
  `parse DATE_FORMAT(HIREDATE, '%Y-%m-%d') '...'` stays valid (this is why
  narrowing all the way to `fieldExpression` would have been wrong).
- spath: `input`/`output` `expression` -> `fieldExpression`.
  spath stringifies its input into a field name and never evaluates it, so the
  source must be a bare field path; function calls/comparisons there are now
  flagged instead of accepted.

No AstBuilder change is needed: the labeled `ctx.source_field` accessor and
`param.input.getText()` work regardless of the rule's static type, and nothing
casts these contexts to a specific subtype.

Tests: 8 negative cases asserting the parser rejects the invalid forms
(`SyntaxCheckException`: named-param, comparison, keyword-named-param, spath
function-call/comparison) plus 3 positive regression guards (dotted/keyword
field names; function-call source for parse; dotted field with options for
patterns).
Verified in the Explore editor: the invalid forms now render a red squiggle
and the valid forms do not.

Note: this also tightens the server parser (same grammar), so `grok field=body`
is now rejected before semantic analysis instead of being silently misparsed.
That is a user-visible behavior change on the server, secondary to the
highlighting fix.

Signed-off-by: Hanyu Wei <weihanyu@amazon.com>

@RyanL1997 RyanL1997 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change @Hanyu-W. Can we also add some ITs?

@Hanyu-W

Hanyu-W commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Closing the PR-I have a better fix upcoming in the linting project.

@Hanyu-W Hanyu-W closed this Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugFix PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] PPL syntax highlighting does not flag invalid source fields in grok, parse, patterns, spath

3 participants