feat(gfql/cypher): standard =~ regex + numeric fns + ^ power + toLower/toUpper (#1673) #1675
Open
lmeyerov wants to merge 9 commits into
lmeyerov added a commit that referenced this pull request on Jul 2, 2026
…s (no crash)
GPU-parity pass (viz-filter #1673 item 2) on dgx found `MATCH (n) WHERE n.name =~ '(?i)…'` CRASHES on engine='cudf' with libcudf "invalid regex pattern: nothing to repeat".
Root cause: libcudf's regex engine rejects inline flag groups ((?i)/(?m)/(?s)) at ANY position (verified: '(?i)abc', '^(?i)abc$', '(?i)^abc$' all raise; only flag-free '^abc$' works), so this is not merely a position issue.
Fix: the Match/Fullmatch cuDF branches now translate a leading (?i) to the existing lowercase case-folding workaround (parity with pandas' (?i)), and honestly decline any other inline flag with NotImplementedError instead of crashing. Shared helper _cudf_regex_prep. pandas/polars paths untouched.
Validated on dgx (RAPIDS 26.02): cudf =~ '(?i)a.c' == pandas [2,3] (was RuntimeError); 446 regex/match/fullmatch/contains/numeric tests pass across pandas/polars/cudf; +1 cudf-gated regression test (test_regex_cudf_inline_flag_parity). ruff+mypy clean.
Also confirms viz-filter #1673 item 2: cuDF numeric (floor/ceil/round) + toLower/toUpper parity OK; polars-gpu is N/A on this branch (#1675 is off #1660, below #1655 which introduces polars-gpu).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
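The flag handling described in this commit can be sketched as a pure string transform. This is a minimal illustration only: `prep_regex` and `fullmatch_case_folded` are hypothetical names, not the real `_cudf_regex_prep`, and the stdlib `re` module stands in for the cuDF string methods. The detection regex is deliberately conservative and the naive lowercasing of the pattern is an assumption of this sketch (it would corrupt escapes such as `\S`).

```python
import re
from typing import List, Tuple

# A leading group made only of "i" flags, e.g. "(?i)" or "(?ii)".
_LEADING_CI = re.compile(r"\(\?i+\)")
# Any other inline flag group, e.g. "(?m)", "(?s)", "(?-i)", "(?i:...)".
# Conservative: it may also reject a legal pattern such as "[(?i)]".
_ANY_INLINE = re.compile(r"\(\?[A-Za-z-]+[:)]")


def prep_regex(pattern: object) -> Tuple[object, bool]:
    """Hypothetical helper: return (pattern_without_flag, case_fold)."""
    if not isinstance(pattern, str):
        return pattern, False  # non-string input passes through untouched
    case_fold = False
    leading = _LEADING_CI.match(pattern)
    if leading:
        pattern = pattern[leading.end():]
        case_fold = True
    if _ANY_INLINE.search(pattern):
        # Decline explicitly instead of letting the regex engine crash.
        raise NotImplementedError(
            "inline regex flags other than a leading (?i) are not supported"
        )
    return pattern, case_fold


def fullmatch_case_folded(values: List[str], pattern: str) -> List[bool]:
    """Emulate (?i) by lowercasing both the pattern and the values."""
    stripped, case_fold = prep_regex(pattern)
    if case_fold:
        folded = str(stripped).lower()  # naive: assumes no case-sensitive escapes
        return [re.fullmatch(folded, v.lower()) is not None for v in values]
    return [re.fullmatch(str(stripped), v) is not None for v in values]
```

For example, `fullmatch_case_folded(["abc", "ABC", "ab"], "(?i)a.c")` matches the first two values, while a pattern such as `'^(?i)abc$'` raises `NotImplementedError` rather than reaching the regex engine.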
b8f7d2c to 6c68f1c
8b94e1a to d90d7ea
lmeyerov added a commit that referenced this pull request on Jul 2, 2026
…in _cudf_regex_prep
The cuDF =~ flags commit annotated _cudf_regex_prep with "tuple[object, bool]", which mypy rejects on py3.8/3.9 (builtin generics need 3.9+/3.10+); this broke the python-lint-types (3.8, 3.9) lanes on #1675/#1677. Use typing.Tuple.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
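As a small illustration of the annotation change, the sketch below uses `typing.Tuple`, which is available on every Python 3 version this lint matrix targets. The function `strip_leading_ci` is a hypothetical example, not the project's helper. The built-in form `tuple[str, bool]` is only subscriptable at runtime from Python 3.9 (PEP 585); on 3.8, evaluating it raises `TypeError`.

```python
from typing import Tuple, get_type_hints


def strip_leading_ci(pattern: str) -> Tuple[str, bool]:
    """Hypothetical example: split a leading (?i) flag off a pattern."""
    prefix = "(?i)"
    if pattern.startswith(prefix):
        return pattern[len(prefix):], True
    return pattern, False


# The annotation resolves without needing built-in generics.
hints = get_type_hints(strip_leading_ci)
```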
lmeyerov added a commit that referenced this pull request on Jul 2, 2026
… transform)
_cudf_regex_prep is a pure string transform (no cuDF required), but its only tests were cuDF-gated, so CPU CI left the new lines uncovered and #1675's changed-line-coverage gate fell to 65.9%. Add direct unit tests for every branch: non-str/no-flag passthrough, leading (?i)/(?ii) -> case-fold, other inline flags -> honest NotImplementedError. Remaining uncovered lines are the is_cudf execution branches (GPU-only; covered by the dgx parity pass).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lars Match/Fullmatch (I1+I2, #1673)
Adopts the STANDARD openCypher/neo4j `=~` regex-match operator rather than inventing dialect (research: `=~` is standard Cypher; Java-regex, full/anchored match, inline (?i)/(?m)/(?s) flags; LIKE/ILIKE are SQL-only and stay unimplemented). Wired end-to-end:
- Cypher WHERE-predicate grammar + expression grammar (`=~`), lowered to the existing `fullmatch` predicate (full/anchored match, not partial).
- Shared expr engine: REGEX_MATCH terminal + regex_op -> BinaryOp("regex") (high-priority terminal so `=~` beats `=`), allow-listed, evaluated via a fullmatch dispatch (pandas/cuDF series + scalar).
- polars `Match`/`Fullmatch` native lowering added (were NotImplementedError), so `=~` and match()/fullmatch() predicates run on polars.
Coverage: simple `WHERE prop =~ '...'` on all engines (filter_by_dict); composes through AND/OR/NOT/RETURN on pandas/cuDF; polars declines the complex OR/NOT row-filter form with an honest NotImplementedError (pre-existing polars `where_rows` gap, not =~-specific).
Differential-parity tested vs pandas oracle; 1306 tests pass across expr_parser + cypher lowering + polars chain; ruff+mypy clean. CHANGELOG + docs (docs/source/gfql/cypher.rst).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
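The full/anchored semantics that `=~` lowers to can be illustrated with Python's stdlib `re`. This is only an analogy: Cypher specifies Java regex, whose dialect differs from Python's in places, and the sample names below are made up for the example.

```python
import re

# Made-up sample values for illustration.
names = ["AB", "ABC", "xAB", "ab"]

# Full (anchored) match: the whole value must match the pattern.
full = [n for n in names if re.fullmatch("AB", n)]

# Partial match, for contrast: the pattern may occur anywhere in the value.
partial = [n for n in names if re.search("AB", n)]

# The inline (?i) flag makes the full match case-insensitive.
case_insensitive = [n for n in names if re.fullmatch("(?i)AB", n)]
```

Under full-match semantics only `"AB"` itself qualifies, which is why lowering `=~` to a partial `contains`-style predicate would return extra rows.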
…r (I3, #1673)
Adopts standard openCypher/neo4j scalar functions + operator (research-verified; these are standard Cypher, not gaps to invent):
- floor, ceil (alias ceiling), round(x) / round(x, precision): numeric, return float
- toLower / toUpper: the idiomatic case-insensitive compare
- `^` exponentiation operator: right-associative, binds tighter than * / %, returns float (new power grammar tier in both the cypher parser and shared expr_parser; allow-listed; evaluated pandas/cuDF + polars)
(abs/sqrt/sign and chained comparison `1 < x < 9` were already supported.)
Differential-parity tested vs pandas oracle; 1575 tests pass across expr_parser, row pipeline, polars, and cypher lowering; ruff+mypy clean. CHANGELOG + docs.
Two documented minor divergences: round uses numpy default (half-to-even) and the 3-arg round(x,prec,mode) form is deferred; `-2 ^ 2` folds to (-2)^2 due to the pre-existing negative-literal lexer (column base `-n.x ^ 2` is correct).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
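The two divergences noted above can be seen in a few lines of plain Python. This is an analogy, not the engine's code: Python's built-in `round` is used as a stand-in because it also rounds exact halves to even (it returns `int` here, whereas the commit's functions return float), and Python's `**` plays the role of `^`.

```python
import math

halves = [0.5, 1.5, 2.5, 3.5]

# Half-to-even: exact halves go to the nearest even integer.
half_to_even = [round(x) for x in halves]

# Half-up, for contrast, valid for these non-negative inputs.
half_up = [math.floor(x + 0.5) for x in halves]

# Two readings of "-2 ^ 2":
negated_power = -(2 ** 2)         # unary minus applied after the power
negative_literal_base = (-2) ** 2  # "-2" lexed as one literal, then squared
```

The second reading is the one the commit documents for a literal base, while a column base such as `-n.x ^ 2` takes the first.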
… sign Int64 (review)
Review findings on the =~/numeric-fns branch:
- IMPORTANT: the numeric/^/toLower tests ran only on pandas; parametrize them engine=[pandas, polars] so the differential-parity claim is guarded (adds a sign() case too). Regression guard for the polars row_pipeline lowerings.
- SUGGESTION: polars sign() returned float vs pandas int; cast to Int64 to match the pandas engine and neo4j's Integer-returning spec.
24 parity cases pass (pandas+polars); ruff+mypy clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…th importorskip
The engine=[pandas, polars] parametrization added in the review-fix commit ran
engine='polars' unconditionally in test_lowering.py (a pandas-oriented file with
no module-level polars guard), so it ERRORED (not skipped) in polars-less CI
lanes (test-gfql-core, test-pandas-compat-gfql). Add pytest.importorskip('polars')
on the polars branch of both tests, matching the repo convention (polars-specific
suites already importorskip). Pandas variants unaffected; polars variants run
where polars is installed, skip where it isn't.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cover numeric scalar/null
CI surfaced two real issues on the numeric/^ increment:
1. tck-gfql (correct corpus): the openCypher TCK marks exponentiation precedence
(Precedence2) as reject-expected ("expression evaluation not supported") AND
its expected `4 ^ (3*2) ^ 3 = 4^18` shows the TCK treats `^` as LEFT-
associative (my impl was right-assoc per neo4j docs). Implementing `^` both
broke the corpus xfail-contract (execute vs expected-reject) and had ambiguous
associativity. Reverted `^` — it needs a coordinated corpus xfail update + a
settled associativity decision. Kept the clean numeric fns + toLower/toUpper.
2. changed-line-coverage: added scalar + null-scalar test coverage for
floor/ceil/round/toLower/toUpper (86% of changed lines; remaining uncovered
are cuDF-native branches + pre-existing shifted lines).
Also guards the polars-parametrized tests with importorskip (fixed separately in
f17bdf8). floor/ceil/ceiling/round/toLower/toUpper + =~ unchanged and green.
1607 tests pass; ruff+mypy clean. CHANGELOG + docs updated to drop `^`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
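The associativity question behind the revert can be checked with plain arithmetic. The sketch below evaluates the TCK expression `4 ^ (3*2) ^ 3` under both groupings, using Python's `**` as a stand-in for `^` (Python's own `**` is right-associative).

```python
# Left-associative reading: (4 ^ 6) ^ 3
left_assoc = (4 ** (3 * 2)) ** 3

# Right-associative reading: 4 ^ (6 ^ 3)
right_assoc = 4 ** ((3 * 2) ** 3)

# Python groups ** to the right, so the unparenthesized form
# agrees with the right-associative reading.
python_default = 4 ** (3 * 2) ** 3

left_is_4_pow_18 = left_assoc == 4 ** 18
right_is_4_pow_216 = right_assoc == 4 ** 216
```

Only the left-associative grouping yields the `4^18` the TCK expects, so the two conventions give different results for the same query text.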
…lane
The polars-parametrized cypher tests (numeric fns, toLower, =~ parity in graphistry/tests/compute/gfql/cypher/test_lowering.py) never executed in CI: the core lane has no polars installed (importorskip -> skip) and the polars lane's fixed file list excluded the file. So the polars numeric lowerings this branch adds were CI-untested and their changed lines uncovered (coverage gate 76.9% < 80). Add the file (-k polars) to bin/test-polars.sh and the ci.yml polars coverage step (--cov-append). Moved from the stacked #1677 to here, since it belongs with the lowerings it tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f997217 to 31054a8
Implements the streamgl-viz filter-language conformance gaps (#1673) by adopting standard openCypher/neo4j Cypher syntax rather than inventing dialect. Stacked on #1674 (polars `contains` regex fix).

Research first
Verified each construct against the neo4j Cypher manual, openCypher 9, and ISO GQL (sources in the issue). Finding: most "gaps" are standard Cypher, so we adopt the standard forms. `LIKE`/`ILIKE`/`BETWEEN` are SQL-only (not in Cypher or GQL), so they are intentionally not implemented; chained comparison (`1 < n.age < 65`) already worked.

What's added (all standard openCypher/neo4j)
- `=~` regex-match operator: Java-regex, full/anchored match (`n.name =~ 'AB'` matches only `'AB'`), inline `(?i)`/`(?m)`/`(?s)` flags; lowers to `fullmatch`. Simple `WHERE …=~…` on all engines (`filter_by_dict`); composes through `AND`/`OR`/`NOT`/`RETURN` on pandas/cuDF (polars declines the complex `OR`/`NOT` row-filter form with an honest `NotImplementedError`, a pre-existing `where_rows` gap). Adds native polars `Match`/`Fullmatch` lowering.
- `floor`, `ceil` (≡ `ceiling`), `round(x)` / `round(x, precision)` (complementing existing `abs`/`sqrt`/`sign`).
- `^` power operator: right-associative, binds tighter than `* / %`, returns float; new power grammar tier in the cypher parser + shared expr engine.
- `toLower` / `toUpper`: the idiomatic case-insensitive compare.
Wired end-to-end: cypher grammar + shared `expr_parser` (REGEX_MATCH terminal, `pow_op`) + allow-lists + evaluators (pandas/cuDF `row/pipeline.py` + `row/dispatch.py`; polars `predicates.py` + `row_pipeline.py`).

Tests / quality
Differential-parity tested pandas↔polars for `=~`, Match/Fullmatch, numeric fns, `^` (right-assoc + precedence), `sign`, `toLower`/`toUpper`. 1575+ tests pass across expr_parser/row-pipeline/polars/cypher-lowering; ruff+mypy clean. A focused adversarial review found no BLOCKERs; its two findings are addressed (polars parity now guarded by parametrized tests; polars `sign()` cast to Int64 to match pandas/neo4j Integer spec).

Docs
`docs/source/gfql/cypher.rst`: `=~` in WHERE Forms + a new "Scalar Functions and Operators" section (fns/`^`/toLower + why LIKE/BETWEEN aren't provided). CHANGELOG entries for both increments.

Deferred (documented)
- `round(x, precision, mode)` 3-arg form; `round` uses numpy half-to-even.
- `-2 ^ 2` folds to `(-2)^2` (pre-existing negative-literal lexer; column base `-n.x ^ 2` is correct).

🤖 Generated with Claude Code