feat(gfql/cypher): standard =~ regex + numeric fns + ^ power + toLower/toUpper (#1673) #1675

Open
lmeyerov wants to merge 9 commits into dev/gfql-polars-engine from dev/gfql-cypher-viz-ops

@lmeyerov lmeyerov commented Jul 2, 2026

Implements the streamgl-viz filter-language conformance gaps (#1673) by adopting standard openCypher/neo4j Cypher syntax — not inventing dialect. Stacked on #1674 (polars contains regex fix).

Research first

Verified each construct against the neo4j Cypher manual, openCypher 9, and ISO GQL (sources in the issue). Finding: most "gaps" are standard Cypher, so we adopt the standard forms. LIKE/ILIKE/BETWEEN are SQL-only (not in Cypher or GQL) → intentionally not implemented; chained comparison (1 < n.age < 65) already worked.

What's added (all standard openCypher/neo4j)

  • =~ regex-match operator: Java-regex syntax with full (anchored) match semantics, so n.name =~ 'AB' matches only 'AB'; supports inline (?i)/(?m)/(?s) flags; lowers to fullmatch. A simple WHERE …=~… works on all engines (via filter_by_dict) and composes through AND/OR/NOT/RETURN on pandas/cuDF. polars declines the complex OR/NOT row-filter form with an explicit NotImplementedError (a pre-existing where_rows gap). Also adds native polars Match/Fullmatch lowering.
  • Numeric functions floor, ceil (≡ceiling), round(x)/round(x, precision) (complementing existing abs/sqrt/sign).
  • ^ power operator — right-associative, binds tighter than * / %, returns float; new power grammar tier in the cypher parser + shared expr engine.
  • toLower/toUpper — the idiomatic case-insensitive compare.

Wired end-to-end: cypher grammar + shared expr_parser (REGEX_MATCH terminal, pow_op) + allow-lists + evaluators (pandas/cuDF row/pipeline.py+row/dispatch.py; polars predicates.py+row_pipeline.py).
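
To make the full-match semantics of `=~` concrete, here is a minimal sketch that uses Python's standard `re` module as a stand-in. It is not GFQL's evaluator: the helper name `cypher_regex_match` is hypothetical, and Python's regex dialect differs from Java's in places, so this only illustrates anchoring, the inline `(?i)` flag, and the `toLower` style of case-insensitive compare.

```python
import re

def cypher_regex_match(value: str, pattern: str) -> bool:
    """Hypothetical stand-in for `value =~ pattern`: the WHOLE string must match."""
    return re.fullmatch(pattern, value) is not None

names = ["AB", "ABC", "ab"]

# Anchored: 'AB' matches only the exact string 'AB', not 'ABC'.
print([cypher_regex_match(n, "AB") for n in names])      # [True, False, False]
# Inline (?i) flag makes the match case-insensitive.
print([cypher_regex_match(n, "(?i)ab") for n in names])  # [True, False, True]
# A prefix match has to be spelled out with an explicit wildcard.
print([cypher_regex_match(n, "AB.*") for n in names])    # [True, True, False]
# The toLower-style alternative for a case-insensitive equality check.
print([n.lower() == "ab" for n in names])                # [True, False, True]
```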

Tests / quality

Differential-parity tested pandas↔polars for =~, Match/Fullmatch, numeric fns, ^ (right-assoc + precedence), sign, toLower/toUpper. 1575+ tests pass across expr_parser/row-pipeline/polars/cypher-lowering; ruff+mypy clean. A focused adversarial review found no BLOCKERs; its two findings are addressed (polars parity now guarded by parametrized tests; polars sign() cast to Int64 to match pandas/neo4j Integer spec).

Docs

docs/source/gfql/cypher.rst: =~ in WHERE Forms + a new "Scalar Functions and Operators" section (fns/^/toLower + why LIKE/BETWEEN aren't provided). CHANGELOG entries for both increments.

Deferred (documented)

🤖 Generated with Claude Code

lmeyerov added a commit that referenced this pull request Jul 2, 2026
…s (no crash)

GPU-parity pass (viz-filter #1673 item 2) on dgx found `MATCH (n) WHERE n.name =~
'(?i)…'` CRASHES on engine='cudf' with libcudf "invalid regex pattern: nothing to
repeat". Root cause: libcudf's regex engine rejects inline flag groups ((?i)/(?m)/
(?s)) at ANY position (verified: '(?i)abc', '^(?i)abc$', '(?i)^abc$' all raise;
only flag-free '^abc$' works) — not merely a position issue.

Fix: the Match/Fullmatch cuDF branches now translate a leading (?i) to the existing
lowercase case-folding workaround (parity with pandas' (?i)), and honestly decline
any other inline flag with NotImplementedError instead of crashing. Shared helper
_cudf_regex_prep. pandas/polars paths untouched.
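
The behaviour described above can be sketched as a pure string transform. This is an illustration only, not the actual `_cudf_regex_prep`: the function name `regex_prep_sketch`, the two regexes, and the meaning of the returned boolean ("caller should case-fold") are assumptions made for the example.

```python
import re
from typing import Tuple

# Assumed detection rules, for illustration only.
_LEADING_I = re.compile(r"^\(\?i+\)")                # (?i), (?ii), ... at position 0
_ANY_INLINE_FLAG = re.compile(r"\(\?[a-zA-Z-]+[):]")  # (?m), (?s), (?i:...), ...

def regex_prep_sketch(pattern: object) -> Tuple[object, bool]:
    """Return (pattern_without_leading_flag, case_fold_requested)."""
    if not isinstance(pattern, str):
        return pattern, False  # non-string input passes through untouched
    leading = _LEADING_I.match(pattern)
    rest = pattern[leading.end():] if leading else pattern
    if _ANY_INLINE_FLAG.search(rest):
        # Decline explicitly rather than hand the engine a pattern it rejects.
        raise NotImplementedError(f"unsupported inline regex flag in {pattern!r}")
    return rest, leading is not None

print(regex_prep_sketch("(?i)a.c"))  # ('a.c', True)
print(regex_prep_sketch("^abc$"))    # ('^abc$', False)
print(regex_prep_sketch("(?:ab)+"))  # ('(?:ab)+', False): a plain group is not a flag
```

Because the transform needs no GPU library, it can be exercised on CPU-only machines; a pattern such as `'^(?i)abc$'` or `'(?m)^a'` raises `NotImplementedError` in this sketch.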

Validated on dgx (RAPIDS 26.02): cudf =~ '(?i)a.c' == pandas [2,3] (was RuntimeError);
446 regex/match/fullmatch/contains/numeric tests pass across pandas/polars/cudf; +1
cudf-gated regression test (test_regex_cudf_inline_flag_parity). ruff+mypy clean.

Also confirms viz-filter #1673 item 2: cuDF numeric (floor/ceil/round) + toLower/
toUpper parity OK; polars-gpu is N/A on this branch (#1675 is off #1660, below #1655
which introduces polars-gpu).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov force-pushed the dev/gfql-polars-contains-regex-fix branch from b8f7d2c to 6c68f1c on July 2, 2026 16:19
lmeyerov force-pushed the dev/gfql-cypher-viz-ops branch from 8b94e1a to d90d7ea on July 2, 2026 16:19
lmeyerov added a commit that referenced this pull request Jul 2, 2026
…in _cudf_regex_prep

The cuDF =~ flags commit annotated _cudf_regex_prep with "tuple[object, bool]",
which mypy rejects on py3.8/3.9 (builtin generics need 3.9+/3.10+); broke the
python-lint-types (3.8, 3.9) lanes on #1675/#1677. Use typing.Tuple.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov added a commit that referenced this pull request Jul 2, 2026
… transform)

_cudf_regex_prep is a pure string transform (no cuDF required), but its only
tests were cuDF-gated, so CPU CI left the new lines uncovered and #1675's
changed-line-coverage gate fell to 65.9%. Add direct unit tests for every
branch: non-str/no-flag passthrough, leading (?i)/(?ii) -> case-fold, other
inline flags -> honest NotImplementedError. Remaining uncovered lines are the
is_cudf execution branches (GPU-only; covered by the dgx parity pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov and others added 9 commits July 2, 2026 16:38
…lars Match/Fullmatch (I1+I2, #1673)

Adopts the STANDARD openCypher/neo4j `=~` regex-match operator rather than
inventing dialect (research: `=~` is standard Cypher; Java-regex, full/anchored
match, inline (?i)/(?m)/(?s) flags; LIKE/ILIKE are SQL-only and stay unimplemented).

Wired end-to-end:
- Cypher WHERE-predicate grammar + expression grammar (`=~`), lowered to the
  existing `fullmatch` predicate (full/anchored match, not partial).
- Shared expr engine: REGEX_MATCH terminal + regex_op -> BinaryOp("regex")
  (high-priority terminal so `=~` beats `=`), allow-listed, evaluated via a
  fullmatch dispatch (pandas/cuDF series + scalar).
- polars `Match`/`Fullmatch` native lowering added (were NotImplementedError),
  so `=~` and match()/fullmatch() predicates run on polars.

Coverage: simple `WHERE prop =~ '...'` on all engines (filter_by_dict);
composes through AND/OR/NOT/RETURN on pandas/cuDF; polars declines the complex
OR/NOT row-filter form with an honest NotImplementedError (pre-existing polars
`where_rows` gap, not =~-specific). Differential-parity tested vs pandas oracle;
1306 tests pass across expr_parser + cypher lowering + polars chain; ruff+mypy
clean. CHANGELOG + docs (docs/source/gfql/cypher.rst).
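
Why `=~` needs to outrank `=` in the lexer can be shown with a toy tokenizer. This is a sketch using Python's `re` alternation order as the priority mechanism; the token names other than `REGEX_MATCH` and the `tokenize` helper are hypothetical, and the real grammar's priority mechanism is not shown in this PR text.

```python
import re

# Alternation order acts as priority: the two-character `=~` is tried
# before the one-character `=`, so `=~` is never split into `=` and `~`.
TOKEN_RE = re.compile(r"""
    (?P<REGEX_MATCH>=~)
  | (?P<EQ>=)
  | (?P<STRING>'[^']*')
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_.]*)
  | (?P<WS>\s+)
""", re.VERBOSE)

def tokenize(text: str):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":  # drop whitespace tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("n.name =~ 'AB'"))
# [('NAME', 'n.name'), ('REGEX_MATCH', '=~'), ('STRING', "'AB'")]
print(tokenize("n.name = 'AB'"))
# [('NAME', 'n.name'), ('EQ', '='), ('STRING', "'AB'")]
```

If `EQ` were listed first, the first input would lex `=` and then fail on the stray `~`.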

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r (I3, #1673)

Adopts standard openCypher/neo4j scalar functions + operator (research-verified;
these are standard Cypher, not gaps to invent):
- floor, ceil (alias ceiling), round(x) / round(x, precision) — numeric, return float
- toLower / toUpper — the idiomatic case-insensitive compare
- `^` exponentiation operator — right-associative, binds tighter than * / %,
  returns float (new power grammar tier in both the cypher parser and shared
  expr_parser; allow-listed; evaluated pandas/cuDF + polars)

(abs/sqrt/sign and chained comparison `1 < x < 9` were already supported.)
Differential-parity tested vs pandas oracle; 1575 tests pass across expr_parser,
row pipeline, polars, and cypher lowering; ruff+mypy clean. CHANGELOG + docs.

Two documented minor divergences: round uses numpy default (half-to-even) and
the 3-arg round(x,prec,mode) form is deferred; `-2 ^ 2` folds to (-2)^2 due to
the pre-existing negative-literal lexer (column base `-n.x ^ 2` is correct).
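
Both divergences are easy to see with plain Python arithmetic. The sketch below uses Python's built-in `round`, which rounds ties to even like numpy's default, and contrasts it with a half-up mode from the standard `decimal` module; the helper names are hypothetical and nothing here calls GFQL.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_even(x: float) -> int:
    return round(x)  # ties go to the nearest even integer

def round_half_up(x: float) -> int:
    # Shown only for contrast with the half-to-even behaviour.
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

for x in (0.5, 1.5, 2.5):
    print(x, round_half_even(x), round_half_up(x))
# 0.5 0 1
# 1.5 2 2
# 2.5 2 3

# Negative-literal folding: if the lexer emits -2 as a single literal,
# the base of the power is -2 and the result is positive.
print((-2) ** 2)   # 4   (literal -2 raised to 2)
print(-(2 ** 2))   # -4  (negation applied after the power)
```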

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… sign Int64 (review)

Review findings on the =~/numeric-fns branch:
- IMPORTANT: the numeric/^/toLower tests ran only on pandas; parametrize them
  engine=[pandas, polars] so the differential-parity claim is guarded (adds a
  sign() case too). Regression guard for the polars row_pipeline lowerings.
- SUGGESTION: polars sign() returned float vs pandas int; cast to Int64 to match
  the pandas engine and neo4j's Integer-returning spec.

24 parity cases pass (pandas+polars); ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…th importorskip

The engine=[pandas, polars] parametrization added in the review-fix commit ran
engine='polars' unconditionally in test_lowering.py (a pandas-oriented file with
no module-level polars guard), so it ERRORED (not skipped) in polars-less CI
lanes (test-gfql-core, test-pandas-compat-gfql). Add pytest.importorskip('polars')
on the polars branch of both tests, matching the repo convention (polars-specific
suites already importorskip). Pandas variants unaffected; polars variants run
where polars is installed, skip where it isn't.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cover numeric scalar/null

CI surfaced two real issues on the numeric/^ increment:
1. tck-gfql (correct corpus): the openCypher TCK marks exponentiation precedence
   (Precedence2) as reject-expected ("expression evaluation not supported") AND
   its expected `4 ^ (3*2) ^ 3 = 4^18` shows the TCK treats `^` as LEFT-
   associative (my impl was right-assoc per neo4j docs). Implementing `^` both
   broke the corpus xfail-contract (execute vs expected-reject) and had ambiguous
   associativity. Reverted `^` — it needs a coordinated corpus xfail update + a
   settled associativity decision. Kept the clean numeric fns + toLower/toUpper.
2. changed-line-coverage: added scalar + null-scalar test coverage for
   floor/ceil/round/toLower/toUpper (86% of changed lines; remaining uncovered
   are cuDF-native branches + pre-existing shifted lines).
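
The associativity question in item 1 comes down to simple arithmetic, sketched here with Python integers. The helpers `fold_left` and `fold_right` are hypothetical names for the two ways of grouping a chain of `^` operands.

```python
from functools import reduce

def fold_left(operands):
    """Left-associative grouping: ((a ^ b) ^ c)."""
    return reduce(lambda acc, x: acc ** x, operands)

def fold_right(operands):
    """Right-associative grouping: (a ^ (b ^ c))."""
    return reduce(lambda acc, x: x ** acc, reversed(operands))

operands = [4, 3 * 2, 3]          # 4 ^ (3*2) ^ 3

left = fold_left(operands)        # (4 ** 6) ** 3
right = fold_right(operands)      # 4 ** (6 ** 3)

print(left == 4 ** 18)            # True: matches the TCK's expected 4^18
print(right == 4 ** 216)          # True: the right-associative reading
print(left == right)              # False: the two groupings disagree
```

Since the two groupings give different values for the same expression, an implementation has to commit to one of them before the corpus expectation can be updated.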

Also guards the polars-parametrized tests with importorskip (fixed separately in
f17bdf8). floor/ceil/ceiling/round/toLower/toUpper + =~ unchanged and green.
1607 tests pass; ruff+mypy clean. CHANGELOG + docs updated to drop `^`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lane

The polars-parametrized cypher tests (numeric fns, toLower, =~ parity in
graphistry/tests/compute/gfql/cypher/test_lowering.py) never executed in CI:
the core lane has no polars installed (importorskip -> skip) and the polars
lane's fixed file list excluded the file. So the polars numeric lowerings this
branch adds were CI-untested and their changed lines uncovered (coverage gate
76.9% < 80). Add the file (-k polars) to bin/test-polars.sh and the ci.yml
polars coverage step (--cov-append). Moved from the stacked #1677 to here — it
belongs with the lowerings it tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov changed the base branch from dev/gfql-polars-contains-regex-fix to dev/gfql-polars-engine on July 2, 2026 23:40
lmeyerov force-pushed the dev/gfql-cypher-viz-ops branch from f997217 to 31054a8 on July 2, 2026 23:40

Development

Successfully merging this pull request may close these issues.

GFQL Cypher dialect: host the streamgl-viz filter language (regex/case, floor/ceil/round/pow, prune-isolated, two-mask via GRAPH{})
