Commit e1707bf

lmeyerov and claude authored
feat(gfql/cypher): NOT-pattern AST plumbing (#1031 slice 2 phase 2a) (#1233)
Top-level `WHERE NOT (pattern)` shapes (e.g. `WHERE NOT (n)-[:R]->()`) now parse cleanly and lift into `WhereClause.predicates` as `WherePatternPredicate(negated=True)` entries instead of tripping the legacy "cannot yet be mixed with generic row expressions" E108. This is the AST half of slice 2; the runtime half (anti-semi-join lowering) ships in a follow-up sub-PR.

Changes:
- `ast.py`: `WherePatternPredicate.negated: bool = False` field. Default keeps existing single-positive / multi-positive callers unchanged.
- `parser.py::_split_top_level_and_pattern_leaves`: add top-level `not(pattern_atom)` case that strips the NOT and emits the inner pattern as a negated leaf. Returns a 4-tuple `(positive, negated, others, has_nested)`. Patterns nested deeper (under OR/XOR or double-NOT) still trip the legacy E108 reject so slice 4 / De-Morgan-NOT compositions stay deferred.
- `parser.py::_build_where_with_pattern_lift`: accept both positive and negated leaves; emit one `WherePatternPredicate` per leaf with the matching `negated` flag.
- `ast_normalizer::_rewrite_where_pattern_predicates_to_matches`: partition into positive (appended MatchClause as before) vs negated (passes through to lowering).
- `lowering.py`: `WherePatternPredicate` checks now distinguish the two cases. Negated raises a scoped "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported" pointing at the engine-half follow-up. Positive case unchanged.

Tests:
- `test_parser.py`: replaces `test_parse_rejects_mixed_where_pattern_predicates_as_unsupported` parametrize with two parametrize blocks: OR cases (still rejected, slice 4) and NOT cases (now lift, new test `test_parse_lifts_top_level_not_pattern_to_negated_predicate`).
- `test_lowering.py`: splits the legacy `_failfast_rejects_unsupported_mixed_variable_length_where_pattern_predicates` test: drops NOT cases, adds `_failfast_rejects_negated_pattern_until_slice2_lowering` locking the precise new error message.

Verified:
- 1585/1585 GFQL tests pass.
- mypy clean on all 4 touched cypher modules.

Out of scope (engine half / phase 2b+):
- Anti-semi-join runtime lowering itself. Path C (row-pipeline anti-join via Let + NotIn) recommended in `plans/1031-slices-2-3-4/findings/slice-2-scope.md`. Will land as a follow-up PR.
- Bound-aliases NOT-pattern (`MATCH (a)-[:R]->(b) WHERE NOT (b)-[:R]-(a)`). AST plumbing already handles it via the negated flag; runtime needs the same engine work.
- IC10 benchmark unblock. Composes the engine work above with row-NOT (already supported via expr_tree).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4253ef6 commit e1707bf

7 files changed

Lines changed: 137 additions & 38 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 - **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).

 ### Internal
+- **GFQL / Cypher parser + ast_normalizer — NOT-pattern AST plumbing (#1031 slice 2 phase 2a)**: Top-level `WHERE NOT (pattern)` shapes (e.g. `WHERE NOT (n)-[:R]->()`) now parse cleanly and lift into `WhereClause.predicates` as `WherePatternPredicate(negated=True)` entries instead of tripping the legacy "cannot yet be mixed with generic row expressions" E108. `_split_top_level_and_pattern_leaves` adds a top-level `not(pattern_atom)` case that strips the NOT and emits the inner pattern as a negated leaf; `_build_where_with_pattern_lift` accepts both positive and negated leaves and emits one `WherePatternPredicate` per leaf with the matching `negated` flag. ast_normalizer's `_rewrite_where_pattern_predicates_to_matches` partitions into positive (rewrites to appended MatchClause as before) vs negated (passes through to lowering). Lowering now distinguishes the two cases: positive `WherePatternPredicate` still raises "must be rewritten before lowering" (defensive — slice 3 already rewrites all positives in ast_normalizer); negated raises a scoped "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported" pointing the way for the engine half (path-C row-pipeline anti-join, see `plans/1031-slices-2-3-4/findings/slice-2-scope.md`). Adds `test_parse_lifts_top_level_not_pattern_to_negated_predicate` and `test_string_cypher_failfast_rejects_negated_pattern_until_slice2_lowering`. De Morgan compositions, OR-around-pattern, and double-NOT remain rejected at the lift step (slice 4 / future). Phase 2a only — runtime (anti-semi-join lowering) ships in a follow-up sub-PR (#1031).
 - **GFQL / Cypher row-boolean residual matrix + guardrails (#1219 hardening)**: Locks compositional row-boolean WHERE shapes that #1217's Earley swap admitted but its initial test surface didn't cover. Adds 11 native tests: nullable NOT/OR over a 4-row 3VL fixture (`NULL OR T = T`); N-ary OR (3 branches) + duplicate-branch companion isolating rightmost-drop associativity bugs; De Morgan equivalences (`NOT (A OR B)` ≡ `NOT A AND NOT B`; `NOT (A AND B)` ≡ `NOT A OR NOT B`) parametrized to assert both per-form expected rows AND the form-equivalence; double negation; XOR symmetric difference + XOR with NULL preserving 3VL; mixed-string-numeric AND inside OR exercising `_StringAllowingComparisonMixin` GT path; unit test locking `boolean_expr_to_text(BooleanExpr(op="pattern", ...))` round-trip for the (currently unreachable) defensive branch. Three docstring guardrails: `expr_split.split_top_level_and` documents AND-only intent + the `_combine_conjuncts` AND-recombine mechanism that makes a hypothetical `split_top_level_or` silently incorrect; `predicate_pushdown._split_conjuncts` mirrored guard naming the failure mode; `_boolean_expr_text.boolean_expr_to_text` explicit `op == "pattern"` branch with both unreachability paths documented. No production-code behavior change. Closes the residual-frontier portion of #1219; deeper compositional shapes beyond current fixtures remain tracked under that issue (#1219, #1227).
 - **GFQL / Cypher parser + ast_normalizer — multi-positive WHERE pattern predicates (#1031 slice 3)**: AND-joined positive WHERE pattern predicates (`WHERE (n)-[:R]->() AND (n)-[:T]->()`) now lift into structured `WhereClause.predicates` as N `WherePatternPredicate` entries. The ast_normalizer packs them into a single appended `MatchClause` whose `patterns: Tuple[Tuple[PatternElement, ...], ...]` carries one tuple per predicate (multi-pattern cartesian within MATCH), preserving the lowering invariant that only the FINAL match is connected — pre-binding seeds remain node-only. Per-predicate validation (must include a relationship; cannot introduce new aliases) runs independently before the lift. Removes the legacy `len(pattern_leaves) > 1` gate in `parser.py::_build_where_with_pattern_lift` and the corresponding gate in `ast_normalizer._rewrite_where_pattern_predicates_to_matches`. Refactors `pattern_atom` to split the greedy `WHERE_PATTERN` lexer token (which gobbles `pattern AND pattern AND ...` chains as a single match) back into individual pattern-item texts via `_WHERE_PATTERN_ITEM_RE.finditer` and emit one `BooleanExpr(op="pattern")` per item, joined by an AND-tree via `_rebuild_and_tree`. Adds `test_gfql_executes_multi_positive_where_pattern_predicates_as_intersected_seed` and updates the legacy rejection test to assert the new lift + compile shape. Closes #1031 slice 3 (#1031).
 - **GFQL / Cypher lowering**: Connected `MATCH + OPTIONAL MATCH` compilation now supports row-boolean `WHERE` expressions (`OR`/`NOT`/`XOR` and mixed row predicates) by carrying non-lowerable expressions into post-binding `where_rows(...)` filters for base and optional arms, preserving null-extension behavior while expanding supported disjunction shapes (#1219, #1224).

graphistry/compute/gfql/cypher/ast.py

Lines changed: 4 additions & 0 deletions
@@ -140,6 +140,10 @@ class WherePredicate:
 class WherePatternPredicate:
     pattern: Tuple[PatternElement, ...]
     span: SourceSpan
+    # #1031 slice 2: True when lifted from a `WHERE NOT (...)` shape, signaling
+    # anti-semi-join lowering instead of intersect-MATCH. Default False keeps
+    # all existing single-positive / multi-positive callers unchanged.
+    negated: bool = False


 WhereTerm = Union[WherePredicate, WherePatternPredicate]
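
The backward-compatibility claim in the comment above can be illustrated with a minimal sketch. It assumes a dataclass-style definition; `SketchPatternPredicate` and its simplified field types are hypothetical stand-ins, not the project's `WherePatternPredicate`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SketchPatternPredicate:
    # Hypothetical stand-in: pattern elements are plain strings here, and
    # span is a (line, column) pair instead of a SourceSpan.
    pattern: Tuple[str, ...]
    span: Tuple[int, int]
    # Trailing field with a default, so older two-argument call sites still work.
    negated: bool = False


# A pre-existing call site that never mentions `negated` keeps working.
legacy = SketchPatternPredicate(pattern=("(n)", "-[:R]->", "()"), span=(1, 17))

# A new call site for a lifted `WHERE NOT (pattern)` opts in explicitly.
anti = SketchPatternPredicate(pattern=("(n)", "-[:R]->", "()"), span=(1, 21), negated=True)
```

Placing the new field last with a default is what lets existing positional and keyword callers stay unchanged.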

graphistry/compute/gfql/cypher/ast_normalizer.py

Lines changed: 24 additions & 11 deletions
@@ -476,6 +476,13 @@ def _rewrite_where_pattern_predicates_to_matches(query: CypherQuery) -> CypherQu
     pattern_preds = [predicate for predicate in query.where.predicates if isinstance(predicate, WherePatternPredicate)]
     if not pattern_preds:
         return query
+    # Slice 2 of #1031: WherePatternPredicate.negated stays in place — the
+    # ast_normalizer rewriter only handles positive predicates (which compile
+    # to MatchClause append). Negated predicates are passed through to
+    # lowering, which emits an anti-semi-join row-pipeline step.
+    positive_preds = [p for p in pattern_preds if not p.negated]
+    if not positive_preds:
+        return query
     # Slice 3 of #1031: support N positive pattern predicates by emitting one
     # appended ``MatchClause`` per predicate. Each predicate is independently
     # validated (must include a relationship; cannot introduce new aliases).
@@ -486,12 +493,13 @@ def _rewrite_where_pattern_predicates_to_matches(query: CypherQuery) -> CypherQu
         for element in pattern
         if getattr(element, "variable", None) is not None
     }
-    # Validate every pattern; collect into a single appended MatchClause whose
-    # ``patterns`` tuple holds N patterns (multi-positive WHERE pattern via
-    # cartesian-style multi-pattern MATCH; #1031 slice 3). Packing into one
-    # MatchClause (rather than N appended MatchClauses) preserves the lowering
-    # invariant that only the FINAL match is connected — seeds remain node-only.
-    for pred in pattern_preds:
+    # Validate every positive pattern; collect into a single appended
+    # MatchClause whose ``patterns`` tuple holds N patterns (multi-positive
+    # WHERE pattern via cartesian-style multi-pattern MATCH; #1031 slice 3).
+    # Packing into one MatchClause (rather than N appended MatchClauses)
+    # preserves the lowering invariant that only the FINAL match is
+    # connected — seeds remain node-only.
+    for pred in positive_preds:
         if len(pred.pattern) < 3:
             raise _unsupported(
                 "Cypher WHERE pattern predicates must include a relationship",
@@ -514,16 +522,21 @@ def _rewrite_where_pattern_predicates_to_matches(query: CypherQuery) -> CypherQu
                 column=pred.span.column,
             )

-    first = pattern_preds[0]
+    first = positive_preds[0]
     extra_match = MatchClause(
-        patterns=tuple(pred.pattern for pred in pattern_preds),
+        patterns=tuple(pred.pattern for pred in positive_preds),
         span=first.span,
         optional=False,
-        pattern_aliases=tuple(None for _ in pattern_preds),
-        pattern_alias_kinds=tuple("pattern" for _ in pattern_preds),
+        pattern_aliases=tuple(None for _ in positive_preds),
+        pattern_alias_kinds=tuple("pattern" for _ in positive_preds),
     )

-    remaining = tuple(predicate for predicate in query.where.predicates if not isinstance(predicate, WherePatternPredicate))
+    # Keep negated WherePatternPredicates in `remaining` so lowering sees them
+    # for anti-semi-join emission (#1031 slice 2).
+    remaining = tuple(
+        predicate for predicate in query.where.predicates
+        if (not isinstance(predicate, WherePatternPredicate)) or predicate.negated
+    )
     remaining_where = None
     if remaining or query.where.expr_tree is not None:
         remaining_where = WhereClause(
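
The positive/negated partition in this file can be exercised in isolation. The following is a minimal sketch, not the normalizer itself: `RowPredicate`, `PatternPredicate`, and `partition_where_terms` are hypothetical names that mirror only the two comprehensions in the diff.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class RowPredicate:
    """Hypothetical stand-in for a non-pattern WherePredicate."""
    text: str


@dataclass(frozen=True)
class PatternPredicate:
    """Hypothetical stand-in for WherePatternPredicate."""
    pattern: str
    negated: bool = False


Term = Union[RowPredicate, PatternPredicate]


def partition_where_terms(
    terms: Sequence[Term],
) -> Tuple[List[PatternPredicate], Tuple[Term, ...]]:
    # Positive patterns are the ones the normalizer rewrites into an appended MATCH.
    positive = [t for t in terms if isinstance(t, PatternPredicate) and not t.negated]
    # Everything else, including negated patterns, stays in WHERE for lowering.
    remaining = tuple(
        t for t in terms if not isinstance(t, PatternPredicate) or t.negated
    )
    return positive, remaining


terms = [
    PatternPredicate("(n)-[:R]->()"),
    PatternPredicate("(n)-[:T]->()", negated=True),
    RowPredicate("n.id = 'z'"),
]
positive, remaining = partition_where_terms(terms)
```

Here `positive` holds only the `:R` pattern, while `remaining` keeps the negated `:T` pattern and the row predicate in their original order.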

graphistry/compute/gfql/cypher/lowering.py

Lines changed: 20 additions & 0 deletions
@@ -6389,6 +6389,18 @@ def lower_match_query(
         )
     for predicate in query.where.predicates:
         if isinstance(predicate, WherePatternPredicate):
+            if predicate.negated:
+                # #1031 slice 2: NOT-pattern parses + survives ast_normalizer
+                # but lowering for anti-semi-join is not yet implemented.
+                # See plans/1031-slices-2-3-4/findings/slice-2-scope.md
+                # for the path-C (row-pipeline anti-join) approach.
+                raise _unsupported(
+                    "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported",
+                    field="where",
+                    value=None,
+                    line=predicate.span.line,
+                    column=predicate.span.column,
+                )
             raise _unsupported(
                 "Cypher WHERE pattern predicates must be rewritten before lowering",
                 field="where",
@@ -8198,6 +8210,14 @@ def _apply_where_to_ops(
         row_expr_filters.append(rewritten)
     for predicate in where.predicates:
         if isinstance(predicate, WherePatternPredicate):
+            if predicate.negated:
+                raise _unsupported(
+                    "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported",
+                    field="where",
+                    value=None,
+                    line=predicate.span.line,
+                    column=predicate.span.column,
+                )
             raise _unsupported(
                 "Cypher WHERE pattern predicates must be rewritten before lowering",
                 field="where",
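
For readers unfamiliar with the term, the anti-semi-join that the error message defers can be sketched over plain Python collections. This only illustrates the intended semantics of `MATCH (n) WHERE NOT (n)-[:R]->() RETURN n`; it is not the planned Path C lowering, and `anti_semi_join_no_outgoing` and the `(src, rel_type, dst)` edge layout are assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical edge layout for this sketch: (source id, relationship type, destination id).
Edge = Tuple[str, str, str]


def anti_semi_join_no_outgoing(
    node_ids: Iterable[str], edges: Iterable[Edge], rel_type: str
) -> List[str]:
    # Semi-join side: nodes that have at least one outgoing edge of rel_type.
    has_match: Set[str] = {src for src, rel, _dst in edges if rel == rel_type}
    # Anti side: keep each node once, and only if it has no such edge.
    return [n for n in node_ids if n not in has_match]


nodes = ["a", "b", "c"]
edges = [("a", "R", "b"), ("b", "T", "c")]

# "a" has an outgoing :R edge, so only "b" and "c" survive.
result = anti_semi_join_no_outgoing(nodes, edges, "R")
```

Unlike an inner join, the result never duplicates a node when several matching edges exist, which is why the negated case cannot reuse the appended-MATCH rewrite used for positive patterns.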

graphistry/compute/gfql/cypher/parser.py

Lines changed: 46 additions & 19 deletions
@@ -347,17 +347,37 @@ def _lift_label_only_and_spine(

 def _split_top_level_and_pattern_leaves(
     expr: BooleanExpr,
-) -> Tuple[List[BooleanExpr], List[BooleanExpr], bool]:
-    """Split *expr* at top-level AND boundaries into (patterns, others, has_nested_pattern)."""
+) -> Tuple[List[BooleanExpr], List[BooleanExpr], List[BooleanExpr], bool]:
+    """Split *expr* at top-level AND boundaries.
+
+    Returns ``(positive_patterns, negated_patterns, others, has_nested_pattern)``.
+
+    - ``positive_patterns``: leaf ``BooleanExpr(op="pattern")`` nodes at top-level
+      AND positions. Lifted to ``WherePatternPredicate(negated=False)``.
+    - ``negated_patterns``: leaf ``BooleanExpr(op="pattern")`` nodes wrapped in a
+      single top-level ``not`` (i.e. ``NOT (n)-[:R]->()``). Lifted to
+      ``WherePatternPredicate(negated=True)`` for #1031 slice 2 anti-semi-join
+      lowering. Returns the inner ``pattern`` leaf (the ``not`` is consumed).
+    - ``others``: non-pattern conjuncts that should remain in ``expr_tree``.
+    - ``has_nested_pattern``: True when a pattern atom appears in a deeper
+      non-AND/non-direct-NOT context (e.g. ``OR`` with a pattern leaf, or
+      ``NOT (and-tree-of-patterns)``). Triggers the legacy E108 reject so
+      slice 4 / De-Morgan-NOT compositions stay deferred.
+    """
     if expr.op == "and":
         if expr.left is None or expr.right is None:
-            return [], [expr], False
-        left_pat, left_other, left_bad = _split_top_level_and_pattern_leaves(expr.left)
-        right_pat, right_other, right_bad = _split_top_level_and_pattern_leaves(expr.right)
-        return left_pat + right_pat, left_other + right_other, left_bad or right_bad
+            return [], [], [expr], False
+        l_pos, l_neg, l_oth, l_bad = _split_top_level_and_pattern_leaves(expr.left)
+        r_pos, r_neg, r_oth, r_bad = _split_top_level_and_pattern_leaves(expr.right)
+        return l_pos + r_pos, l_neg + r_neg, l_oth + r_oth, l_bad or r_bad
     if expr.op == "pattern":
-        return [expr], [], False
-    return [], [expr], _has_pattern_descendant(expr)
+        return [expr], [], [], False
+    if expr.op == "not" and expr.left is not None and expr.left.op == "pattern":
+        # `WHERE NOT (pattern)` — slice 2 anti-semi-join target. Strip the NOT
+        # and emit the inner pattern leaf as a negated pattern. No nested
+        # pattern; rest of the tree continues structural traversal as usual.
+        return [], [expr.left], [], False
+    return [], [], [expr], _has_pattern_descendant(expr)


 def _has_pattern_descendant(expr: BooleanExpr) -> bool:
@@ -394,6 +414,7 @@ def _rebuild_and_tree(conjuncts: List[BooleanExpr]) -> Optional[BooleanExpr]:
 def _build_where_with_pattern_lift(
     *,
     pattern_leaves: List[BooleanExpr],
+    negated_pattern_leaves: List[BooleanExpr],
     other_conjuncts: List[BooleanExpr],
     nested_pattern: bool,
     expr_text: str,
@@ -410,13 +431,18 @@ def _build_where_with_pattern_lift(
             column=span.column,
             language="cypher",
         )
-    # Multi-positive (slice 3 of #1031): emit one ``WherePatternPredicate`` per
-    # leaf; downstream ``_rewrite_where_pattern_predicates_to_matches`` lifts
-    # each into its own appended ``MatchClause``.
+    # Slice 3 (#1031): N positive patterns each become a WherePatternPredicate
+    # (negated=False). Slice 2 (#1031): N NOT-patterns each become a
+    # WherePatternPredicate (negated=True) for downstream anti-semi-join
+    # lowering. Both groups travel together in WhereClause.predicates;
+    # ast_normalizer dispatches by the negated flag.
     pattern_preds: List[WherePatternPredicate] = []
     for leaf in pattern_leaves:
         assert leaf.pattern is not None, "pattern_atom invariant: pattern payload always set"
-        pattern_preds.append(WherePatternPredicate(pattern=leaf.pattern, span=leaf.span))
+        pattern_preds.append(WherePatternPredicate(pattern=leaf.pattern, span=leaf.span, negated=False))
+    for leaf in negated_pattern_leaves:
+        assert leaf.pattern is not None, "pattern_atom invariant: pattern payload always set"
+        pattern_preds.append(WherePatternPredicate(pattern=leaf.pattern, span=leaf.span, negated=True))
     new_expr_tree = _rebuild_and_tree(other_conjuncts)
     if new_expr_tree is None:
         return WhereClause(predicates=tuple(pattern_preds), expr_tree=None, span=span)
@@ -1199,14 +1225,15 @@ def generic_where_clause(self, meta: Any, items: Sequence[Any]) -> WhereClause:
             # top-level AND positions; extract them as
             # ``WherePatternPredicate`` entries so the existing AST-
             # normalizer step (``_rewrite_where_pattern_predicates_to_matches``)
-            # can lift them to a separate ``MatchClause`` later. Pattern
-            # leaves nested under non-AND ops (NOT/OR/XOR) and multiple
-            # positive patterns are rejected here with shape-specific
-            # E108 errors — slice 2/3/4 territory.
-            pattern_leaves, other_conjuncts, nested_pattern = _split_top_level_and_pattern_leaves(expr_tree)
-            if pattern_leaves or nested_pattern:
+            # can lift them. Slice 2 (#1031): top-level ``NOT (pattern)``
+            # leaves are also lifted, marked ``negated=True`` for anti-semi-
+            # join lowering. Patterns nested deeper (under OR/XOR or
+            # double-NOT) trip the legacy E108 reject.
+            pos_leaves, neg_leaves, other_conjuncts, nested_pattern = _split_top_level_and_pattern_leaves(expr_tree)
+            if pos_leaves or neg_leaves or nested_pattern:
                 return _build_where_with_pattern_lift(
-                    pattern_leaves=pattern_leaves,
+                    pattern_leaves=pos_leaves,
+                    negated_pattern_leaves=neg_leaves,
                     other_conjuncts=other_conjuncts,
                     nested_pattern=nested_pattern,
                     expr_text=expr_text,
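
The 4-tuple contract of the splitter in this file can be checked on a toy tree. This is a self-contained sketch that copies the branch structure shown in the diff; `Expr`, `has_pattern`, and `split` are hypothetical simplifications of `BooleanExpr`, `_has_pattern_descendant`, and `_split_top_level_and_pattern_leaves`, and the real node type is assumed to carry more fields.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Expr:
    """Hypothetical, reduced stand-in for BooleanExpr."""
    op: str
    left: Optional["Expr"] = None
    right: Optional["Expr"] = None
    text: str = ""


def has_pattern(e: Optional[Expr]) -> bool:
    # True when any node in the subtree is a pattern leaf.
    if e is None:
        return False
    return e.op == "pattern" or has_pattern(e.left) or has_pattern(e.right)


def split(e: Expr) -> Tuple[List[Expr], List[Expr], List[Expr], bool]:
    # Returns (positive, negated, others, has_nested_pattern).
    if e.op == "and":
        if e.left is None or e.right is None:
            return [], [], [e], False
        lp, ln, lo, lb = split(e.left)
        rp, rn, ro, rb = split(e.right)
        return lp + rp, ln + rn, lo + ro, lb or rb
    if e.op == "pattern":
        return [e], [], [], False
    if e.op == "not" and e.left is not None and e.left.op == "pattern":
        # Direct NOT over a pattern leaf: strip the NOT, report the leaf as negated.
        return [], [e.left], [], False
    return [], [], [e], has_pattern(e)


pat = Expr("pattern", text="(n)-[:R]->()")
row = Expr("cmp", text="n.id = 'z'")

lifted = split(Expr("and", Expr("not", pat), row))    # NOT pattern AND row predicate
rejected = split(Expr("or", pat, row))                # pattern under OR
double_not = split(Expr("not", Expr("not", pat)))     # NOT NOT pattern
```

Only `lifted` produces a negated leaf; the OR and double-NOT shapes fall through to the last branch with `has_nested_pattern` set, which matches the commit's statement that those shapes still hit the E108 reject.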

graphistry/tests/compute/gfql/cypher/test_lowering.py

Lines changed: 21 additions & 1 deletion
@@ -5145,7 +5145,6 @@ def test_string_cypher_executes_where_pattern_predicate_and_expr_mix(
     "query",
     [
         "MATCH (n) WHERE (n)-[:R*]->() OR n.id = 'z' RETURN n",
-        "MATCH (n) WHERE NOT (n)-[:R*]->() RETURN n",
     ],
 )
 def test_string_cypher_failfast_rejects_unsupported_mixed_variable_length_where_pattern_predicates(query: str) -> None:
@@ -5158,6 +5157,27 @@ def test_string_cypher_failfast_rejects_unsupported_mixed_variable_length_where_
     assert "mixed with generic row expressions" in exc_info.value.message


+@pytest.mark.parametrize(
+    "query",
+    [
+        "MATCH (n) WHERE NOT (n)-[:R*]->() RETURN n",
+        "MATCH (n) WHERE NOT (n)-[:R]->() RETURN n",
+    ],
+)
+def test_string_cypher_failfast_rejects_negated_pattern_until_slice2_lowering(query: str) -> None:
+    # #1031 slice 2 plumbing: parser lifts NOT-pattern to
+    # ``WherePatternPredicate(negated=True)``; lowering raises a scoped error
+    # until the anti-semi-join lowering lands. Locks the precise message
+    # so future engine work knows where to plug in.
+    graph = _mk_empty_graph()
+
+    with pytest.raises(GFQLValidationError) as exc_info:
+        graph.gfql(query)
+
+    assert exc_info.value.code == ErrorCode.E108
+    assert "anti-semi-join" in exc_info.value.message
+
+
 def test_string_cypher_failfast_rejects_multi_alias_return_star_projection() -> None:
     graph = _mk_empty_graph()
