Commit bd420d6

feat(gfql/cypher): execute WHERE NOT (pattern) via anti-semi row filtering (#1031 slice 2 phase 2b) (#1238)
* feat(gfql/cypher): execute WHERE NOT pattern via anti-semi row filtering (#1031 #1235)
* test(gfql/cypher): amplify OPTIONAL MATCH NOT-pattern coverage (#1235)
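
For readers unfamiliar with the term in the commit title, the following is a minimal pure-Python sketch of anti-semi row filtering. It is not the graphistry runtime: the function name `anti_semi_filter`, the dict-per-row layout, and the sample data are assumptions made for illustration. The idea is that `WHERE NOT (n)-[:R]->()` keeps a row only when no row produced by the correlated pattern shares its values on the join aliases.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def anti_semi_filter(
    active_rows: List[Row],
    pattern_rows: List[Row],
    join_aliases: Sequence[str],
) -> List[Row]:
    """Keep active rows that have no match in pattern_rows on join_aliases."""
    # Key tuples produced by evaluating the correlated pattern.
    matched_keys = {tuple(row[a] for a in join_aliases) for row in pattern_rows}
    # "Anti": a row survives only if its key is absent from the pattern side.
    # "Semi": surviving rows are returned unchanged, never widened or duplicated.
    return [
        row
        for row in active_rows
        if tuple(row[a] for a in join_aliases) not in matched_keys
    ]


# MATCH (n) WHERE NOT (n)-[:R]->()  on nodes a, b, c with edges a->b and b->c
active = [{"n": "a"}, {"n": "b"}, {"n": "c"}]
pattern = [{"n": "a"}, {"n": "b"}]  # nodes that have an outgoing :R edge
survivors = anti_semi_filter(active, pattern, ["n"])
print(survivors)  # [{'n': 'c'}]
```

Only `c` has no outgoing edge, so it is the only surviving row.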
1 parent e1707bf commit bd420d6

6 files changed

Lines changed: 308 additions & 39 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).

### Internal
+- **GFQL / Cypher lowering — WHERE NOT (pattern) anti-semi execution (#1031 slice 2 phase 2b)**: `WherePatternPredicate(negated=True)` no longer hard-fails at lowering. The compiler now emits a row pre-filter call (`anti_semi_apply`) for negated pattern predicates in both general MATCH lowering and connected OPTIONAL MATCH clause lowering, with per-predicate validation (must include relationship, must not introduce new aliases, must share bound aliases). Added `anti_semi_apply` row-pipeline runtime operation + call-safelist entry, plus row-table base-graph context preservation needed for correlated bindings execution. New tests cover compile shape (`row_pre_filters` emission) and runtime behavior for `MATCH (n) WHERE NOT (n)-[:R]->()` (including mixed row-expression + NOT-pattern filtering), plus connected `OPTIONAL MATCH ... WHERE NOT (pattern)` filtering/null-fill semantics. Full `cypher/test_lowering.py` suite passes (756 passed / 66 skipped) and touched-module mypy is clean.
- **GFQL / Cypher parser + ast_normalizer — NOT-pattern AST plumbing (#1031 slice 2 phase 2a)**: Top-level `WHERE NOT (pattern)` shapes (e.g. `WHERE NOT (n)-[:R]->()`) now parse cleanly and lift into `WhereClause.predicates` as `WherePatternPredicate(negated=True)` entries instead of tripping the legacy "cannot yet be mixed with generic row expressions" E108. `_split_top_level_and_pattern_leaves` adds a top-level `not(pattern_atom)` case that strips the NOT and emits the inner pattern as a negated leaf; `_build_where_with_pattern_lift` accepts both positive and negated leaves and emits one `WherePatternPredicate` per leaf with the matching `negated` flag. ast_normalizer's `_rewrite_where_pattern_predicates_to_matches` partitions into positive (rewrites to appended MatchClause as before) vs negated (passes through to lowering). Lowering now distinguishes the two cases: positive `WherePatternPredicate` still raises "must be rewritten before lowering" (defensive — slice 3 already rewrites all positives in ast_normalizer); negated raises a scoped "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported" pointing the way for the engine half (path-C row-pipeline anti-join, see `plans/1031-slices-2-3-4/findings/slice-2-scope.md`). Adds `test_parse_lifts_top_level_not_pattern_to_negated_predicate` and `test_string_cypher_failfast_rejects_negated_pattern_until_slice2_lowering`. De Morgan compositions, OR-around-pattern, and double-NOT remain rejected at the lift step (slice 4 / future). Phase 2a only — runtime (anti-semi-join lowering) ships in a follow-up sub-PR (#1031).
- **GFQL / Cypher row-boolean residual matrix + guardrails (#1219 hardening)**: Locks compositional row-boolean WHERE shapes that #1217's Earley swap admitted but its initial test surface didn't cover. Adds 11 native tests: nullable NOT/OR over a 4-row 3VL fixture (`NULL OR T = T`); N-ary OR (3 branches) + duplicate-branch companion isolating rightmost-drop associativity bugs; De Morgan equivalences (`NOT (A OR B)` ≡ `NOT A AND NOT B`; `NOT (A AND B)` ≡ `NOT A OR NOT B`) parametrized to assert both per-form expected rows AND the form-equivalence; double negation; XOR symmetric difference + XOR with NULL preserving 3VL; mixed-string-numeric AND inside OR exercising `_StringAllowingComparisonMixin` GT path; unit test locking `boolean_expr_to_text(BooleanExpr(op="pattern", ...))` round-trip for the (currently unreachable) defensive branch. Three docstring guardrails: `expr_split.split_top_level_and` documents AND-only intent + the `_combine_conjuncts` AND-recombine mechanism that makes a hypothetical `split_top_level_or` silently incorrect; `predicate_pushdown._split_conjuncts` mirrored guard naming the failure mode; `_boolean_expr_text.boolean_expr_to_text` explicit `op == "pattern"` branch with both unreachability paths documented. No production-code behavior change. Closes the residual-frontier portion of #1219; deeper compositional shapes beyond current fixtures remain tracked under that issue (#1219, #1227).
- **GFQL / Cypher parser + ast_normalizer — multi-positive WHERE pattern predicates (#1031 slice 3)**: AND-joined positive WHERE pattern predicates (`WHERE (n)-[:R]->() AND (n)-[:T]->()`) now lift into structured `WhereClause.predicates` as N `WherePatternPredicate` entries. The ast_normalizer packs them into a single appended `MatchClause` whose `patterns: Tuple[Tuple[PatternElement, ...], ...]` carries one tuple per predicate (multi-pattern cartesian within MATCH), preserving the lowering invariant that only the FINAL match is connected — pre-binding seeds remain node-only. Per-predicate validation (must include a relationship; cannot introduce new aliases) runs independently before the lift. Removes the legacy `len(pattern_leaves) > 1` gate in `parser.py::_build_where_with_pattern_lift` and the corresponding gate in `ast_normalizer._rewrite_where_pattern_predicates_to_matches`. Refactors `pattern_atom` to split the greedy `WHERE_PATTERN` lexer token (which gobbles `pattern AND pattern AND ...` chains as a single match) back into individual pattern-item texts via `_WHERE_PATTERN_ITEM_RE.finditer` and emit one `BooleanExpr(op="pattern")` per item, joined by an AND-tree via `_rebuild_and_tree`. Adds `test_gfql_executes_multi_positive_where_pattern_predicates_as_intersected_seed` and updates the legacy rejection test to assert the new lift + compile shape. Closes #1031 slice 3 (#1031).

graphistry/compute/ast.py

Lines changed: 19 additions & 0 deletions
@@ -1745,6 +1745,25 @@ def where_rows(
     return ASTCall("where_rows", params)


+def anti_semi_apply(
+    *,
+    binding_ops: List[Dict[str, Any]],
+    join_aliases: Sequence[str],
+) -> ASTCall:
+    """Filter active rows by removing rows matching a correlated pattern.
+
+    ``binding_ops`` encodes the pattern to evaluate as bindings rows.
+    ``join_aliases`` names shared aliases used as anti-join keys.
+    """
+    return ASTCall(
+        "anti_semi_apply",
+        {
+            "binding_ops": binding_ops,
+            "join_aliases": list(join_aliases),
+        },
+    )
+
+
 def order_by(keys: Iterable[Tuple[Any, str]]) -> ASTCall:
     """Create an ORDER BY operation for GFQL row pipelines."""
     return ASTCall("order_by", {"keys": list(keys)})
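
As a rough illustration of the payload the new builder produces, here is a self-contained sketch. The `ASTCall` dataclass below is a hypothetical stand-in (the real class lives in `graphistry/compute/ast.py` and is not shown in this diff), and the op dictionaries passed as `binding_ops` are made-up placeholders, not the real serialized format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass(frozen=True)
class ASTCall:
    """Hypothetical stand-in for graphistry's ASTCall: a name plus params."""

    function: str
    params: Dict[str, Any]


def anti_semi_apply(
    *,
    binding_ops: List[Dict[str, Any]],
    join_aliases: Sequence[str],
) -> ASTCall:
    # Mirrors the builder in the diff: keyword-only arguments, and
    # join_aliases is normalized to a plain list before being stored.
    return ASTCall(
        "anti_semi_apply",
        {"binding_ops": binding_ops, "join_aliases": list(join_aliases)},
    )


call = anti_semi_apply(
    # Placeholder op dicts; the real serialized shape is an assumption here.
    binding_ops=[{"type": "Node", "name": "n"}, {"type": "Edge"}, {"type": "Node"}],
    join_aliases=("n",),
)
print(call.function, call.params["join_aliases"])
```

Passing a tuple for `join_aliases` still yields a list in the stored params, which keeps the payload JSON-friendly.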

graphistry/compute/gfql/call/validation.py

Lines changed: 11 additions & 0 deletions
@@ -260,6 +260,17 @@ def _group_by_requires_node_cols(params: Dict[str, object]) -> List[str]:
         schema_effects=_schema_effects(requires_node_cols=_where_rows_requires_node_cols),
     ),

+    'anti_semi_apply': _method_entry(
+        allowed_params={'binding_ops', 'join_aliases'},
+        required_params={'binding_ops', 'join_aliases'},
+        param_validators={
+            'binding_ops': is_list_of_dicts,
+            'join_aliases': is_non_empty_list_of_strings,
+        },
+        description='Filter active rows by anti-semi joining against correlated binding rows',
+        schema_effects=NO_SCHEMA_EFFECTS,
+    ),
+
     'order_by': _method_entry(
         allowed_params={'keys'},
         required_params={'keys'},
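
The safelist entry above names two validators whose bodies are not part of this diff. The sketch below guesses at their behavior from their names alone, and `check_call_params` is a hypothetical helper invented here to show how allowed, required, and validated parameters could interact. Treat all three function bodies as assumptions, not as the library's code.

```python
from typing import Any, Callable, Dict, List, Set


def is_list_of_dicts(value: Any) -> bool:
    # Assumed behavior: a list whose items are all dicts (empty list passes).
    return isinstance(value, list) and all(isinstance(v, dict) for v in value)


def is_non_empty_list_of_strings(value: Any) -> bool:
    # Assumed behavior: at least one item, and every item is a str.
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(v, str) for v in value)
    )


ALLOWED: Set[str] = {"binding_ops", "join_aliases"}
REQUIRED: Set[str] = {"binding_ops", "join_aliases"}
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "binding_ops": is_list_of_dicts,
    "join_aliases": is_non_empty_list_of_strings,
}


def check_call_params(params: Dict[str, Any]) -> List[str]:
    """Hypothetical checker: return a list of problems, empty when valid."""
    problems: List[str] = []
    for name in sorted(set(params) - ALLOWED):
        problems.append(f"unexpected param: {name}")
    for name in sorted(REQUIRED - set(params)):
        problems.append(f"missing param: {name}")
    for name, validator in VALIDATORS.items():
        if name in params and not validator(params[name]):
            problems.append(f"invalid value for: {name}")
    return problems


print(check_call_params({"binding_ops": [{"type": "Node"}], "join_aliases": ["n"]}))
print(check_call_params({"binding_ops": [], "join_aliases": []}))
```

Under these assumptions an empty `join_aliases` list is rejected, which matches the lowering rule that a negated pattern must share at least one bound alias.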

graphistry/compute/gfql/cypher/lowering.py

Lines changed: 117 additions & 27 deletions
@@ -12,6 +12,7 @@
     ASTEdge,
     ASTObject,
     ASTNode,
+    anti_semi_apply,
     distinct,
     drop_cols,
     e_forward,
@@ -125,6 +126,7 @@ class LoweredCypherMatch:
     query: List[ASTObject]
     where: List[WhereComparison]
     row_where: Optional[ExpressionText] = None
+    row_pre_filters: Tuple[ASTCall, ...] = ()


 _CYPHER_INT64_MIN = -(2**63)
@@ -4466,6 +4468,9 @@ def _append_match_row_where(
     allowed_match_aliases: Optional[AbstractSet[str]],
     params: Optional[Mapping[str, Any]],
 ) -> None:
+    if lowered.row_pre_filters:
+        row_steps.extend(lowered.row_pre_filters)
+
     expr = lowered.row_where
     if expr is None:
         return
@@ -4544,7 +4549,7 @@ def _lower_projection_chain(
         if plan.all_source_aliases is not None
         else binding_row_aliases
     )
-    if plan.all_source_aliases is not None or binding_row_aliases:
+    if plan.all_source_aliases is not None or binding_row_aliases or lowered.row_pre_filters:
         row_steps: List[ASTObject] = [rows(binding_ops=serialize_binding_ops(lowered.query))]
     else:
         row_steps = [rows(table=plan.table, source=plan.source_alias)]
@@ -4573,7 +4578,7 @@ def _lower_projection_chain(
             )
         )
     _append_page_ops(row_steps, query=query, params=params)
-    if binding_row_aliases:
+    if binding_row_aliases or lowered.row_pre_filters:
         return row_steps
     return lowered.query + row_steps

@@ -4635,7 +4640,7 @@ def _build_initial_row_scope(
     if active_match_alias is None:
         row_steps: List[ASTObject] = [rows(table="nodes")]
         scope_mode: Literal["match_alias", "row_columns"] = "row_columns"
-    elif binding_row_aliases:
+    elif binding_row_aliases or lowered.row_pre_filters:
         row_steps = [rows(binding_ops=serialize_binding_ops(lowered.query))]
         scope_mode = "match_alias"
     else:
@@ -6332,8 +6337,82 @@ def _check_projection_clause(clause: ReturnClause) -> None:
63326337
item.expression.text,
63336338
field="order_by",
63346339
line=item.span.line,
6335-
column=item.span.column,
6336-
)
6340+
column=item.span.column,
6341+
)
6342+
6343+
6344+
def _predicate_pattern_aliases(predicate: WherePatternPredicate) -> List[str]:
6345+
aliases: List[str] = []
6346+
seen: Set[str] = set()
6347+
for element in predicate.pattern:
6348+
alias = getattr(element, "variable", None)
6349+
if alias is None:
6350+
continue
6351+
alias_name = cast(str, alias)
6352+
if alias_name in seen:
6353+
continue
6354+
seen.add(alias_name)
6355+
aliases.append(alias_name)
6356+
return aliases
6357+
6358+
6359+
def _lower_negated_pattern_predicate_to_row_filter(
6360+
predicate: WherePatternPredicate,
6361+
*,
6362+
alias_targets: Mapping[str, ASTObject],
6363+
params: Optional[Mapping[str, Any]],
6364+
) -> ASTCall:
6365+
if len(predicate.pattern) < 3:
6366+
raise _unsupported(
6367+
"Cypher WHERE pattern predicates must include a relationship",
6368+
field="where",
6369+
value=None,
6370+
line=predicate.span.line,
6371+
column=predicate.span.column,
6372+
)
6373+
6374+
predicate_aliases = _predicate_pattern_aliases(predicate)
6375+
if not predicate_aliases:
6376+
raise _unsupported(
6377+
"Cypher WHERE NOT (pattern) currently requires at least one shared bound alias",
6378+
field="where",
6379+
value=None,
6380+
line=predicate.span.line,
6381+
column=predicate.span.column,
6382+
)
6383+
6384+
introduced_aliases = sorted(alias for alias in predicate_aliases if alias not in alias_targets)
6385+
if introduced_aliases:
6386+
raise _unsupported(
6387+
"Cypher WHERE pattern predicates cannot introduce new aliases in this phase",
6388+
field="where",
6389+
value=introduced_aliases,
6390+
line=predicate.span.line,
6391+
column=predicate.span.column,
6392+
)
6393+
6394+
shared_aliases = [alias for alias in predicate_aliases if alias in alias_targets]
6395+
if not shared_aliases:
6396+
raise _unsupported(
6397+
"Cypher WHERE NOT (pattern) currently requires at least one shared bound alias",
6398+
field="where",
6399+
value=predicate_aliases,
6400+
line=predicate.span.line,
6401+
column=predicate.span.column,
6402+
)
6403+
6404+
pattern_clause = MatchClause(
6405+
patterns=(predicate.pattern,),
6406+
span=predicate.span,
6407+
optional=False,
6408+
pattern_aliases=(None,),
6409+
pattern_alias_kinds=("pattern",),
6410+
)
6411+
pattern_ops = lower_match_clause(pattern_clause, params=params)
6412+
return anti_semi_apply(
6413+
binding_ops=serialize_binding_ops(pattern_ops),
6414+
join_aliases=shared_aliases,
6415+
)
63376416

63386417

63396418
def lower_match_query(
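
The per-predicate checks in the hunk above can be exercised in isolation with a small stand-alone sketch. Everything here is illustrative: pattern elements are modeled as `SimpleNamespace` objects with an optional `variable` attribute, errors are plain `ValueError`s instead of `_unsupported(...)`, and the helper names `pattern_aliases`, `join_aliases_for_not_pattern`, and `el` are invented for this example.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Sequence, Set


def pattern_aliases(pattern: Sequence[Any]) -> List[str]:
    """Collect named aliases in first-seen order, without duplicates."""
    aliases: List[str] = []
    seen: Set[str] = set()
    for element in pattern:
        alias = getattr(element, "variable", None)
        if alias is None or alias in seen:
            continue
        seen.add(alias)
        aliases.append(alias)
    return aliases


def join_aliases_for_not_pattern(pattern: Sequence[Any], bound: Set[str]) -> List[str]:
    """Validate a negated pattern and return the aliases to anti-join on."""
    if len(pattern) < 3:  # node, relationship, node is the minimum shape
        raise ValueError("pattern must include a relationship")
    aliases = pattern_aliases(pattern)
    if not aliases:
        raise ValueError("pattern needs at least one shared bound alias")
    introduced = sorted(a for a in aliases if a not in bound)
    if introduced:
        raise ValueError(f"pattern cannot introduce new aliases: {introduced}")
    # Every alias is bound at this point, so all of them are join keys.
    return aliases


def el(variable: Optional[str] = None) -> SimpleNamespace:
    """Hypothetical pattern element carrying only an optional alias."""
    return SimpleNamespace(variable=variable)


# WHERE NOT (n)-[:R]->()  with n already bound by the enclosing MATCH
keys = join_aliases_for_not_pattern([el("n"), el(), el()], {"n"})
print(keys)  # ['n']
```

In this sketch, once the introduced-alias check passes, the alias list and the shared-alias list are the same, so a separate emptiness check on the shared list is not needed.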
@@ -6365,6 +6444,7 @@ def lower_match_query(
         params=params,
     )
     where_out.extend(dynamic_where_out)
+    row_pre_filters: List[ASTCall] = []

     row_where: Optional[ExpressionText] = None
     row_where_predicates: List[str] = list(dynamic_row_where_predicates)
@@ -6390,17 +6470,14 @@ def lower_match_query(
         for predicate in query.where.predicates:
             if isinstance(predicate, WherePatternPredicate):
                 if predicate.negated:
-                    # #1031 slice 2: NOT-pattern parses + survives ast_normalizer
-                    # but lowering for anti-semi-join is not yet implemented.
-                    # See plans/1031-slices-2-3-4/findings/slice-2-scope.md
-                    # for the path-C (row-pipeline anti-join) approach.
-                    raise _unsupported(
-                        "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported",
-                        field="where",
-                        value=None,
-                        line=predicate.span.line,
-                        column=predicate.span.column,
+                    row_pre_filters.append(
+                        _lower_negated_pattern_predicate_to_row_filter(
+                            predicate,
+                            alias_targets=alias_targets,
+                            params=params,
+                        )
                     )
+                    continue
                 raise _unsupported(
                     "Cypher WHERE pattern predicates must be rewritten before lowering",
                     field="where",
@@ -6439,7 +6516,12 @@ def lower_match_query(
             span=query.where.span if query.where is not None else merged_match.span,
         )

-    return LoweredCypherMatch(query=ops, where=where_out, row_where=row_where)
+    return LoweredCypherMatch(
+        query=ops,
+        where=where_out,
+        row_where=row_where,
+        row_pre_filters=tuple(row_pre_filters),
+    )


 def _fresh_temp_name(existing: Set[str], prefix: str) -> str:
@@ -8174,7 +8256,7 @@ def _apply_where_to_ops(
     alias_targets: Dict[str, ASTObject],
     *,
     params: Optional[Mapping[str, Any]],
-) -> Tuple[List[WhereComparison], List[ExpressionText]]:
+) -> Tuple[List[WhereComparison], List[ExpressionText], List[ASTCall]]:
     """Apply a WHERE clause's predicates to already-lowered ops.

     Label predicates mutate the ASTNode filter in *alias_targets* (in-place).
@@ -8185,8 +8267,9 @@ def _apply_where_to_ops(
     """
     where_out: List[WhereComparison] = []
     row_expr_filters: List[ExpressionText] = []
+    row_pre_filters: List[ASTCall] = []
     if where is None:
-        return where_out, row_expr_filters
+        return where_out, row_expr_filters, row_pre_filters
     where_expr = _where_clause_expr_text(where)
     if where_expr is not None:
         type_where = _extract_relationship_type_where(
@@ -8211,13 +8294,14 @@ def _apply_where_to_ops(
     for predicate in where.predicates:
         if isinstance(predicate, WherePatternPredicate):
             if predicate.negated:
-                raise _unsupported(
-                    "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported",
-                    field="where",
-                    value=None,
-                    line=predicate.span.line,
-                    column=predicate.span.column,
+                row_pre_filters.append(
+                    _lower_negated_pattern_predicate_to_row_filter(
+                        predicate,
+                        alias_targets=alias_targets,
+                        params=params,
+                    )
                 )
+                continue
             raise _unsupported(
                 "Cypher WHERE pattern predicates must be rewritten before lowering",
                 field="where",
@@ -8256,7 +8340,7 @@ def _apply_where_to_ops(
                 right=cast(Optional[CypherLiteral], predicate.right),
                 params=params,
             )
-    return where_out, row_expr_filters
+    return where_out, row_expr_filters, row_pre_filters


 def _compile_connected_optional_match(
@@ -8275,8 +8359,13 @@ def _compile_connected_optional_match(
     base_ops = lower_match_clause(base_clause, params=params)
     base_alias_targets = _alias_target(base_ops)
     base_aliases = _match_clause_aliases(base_clause)
-    base_where, base_row_expr_filters = _apply_where_to_ops(base_clause.where, base_alias_targets, params=params)
+    base_where, base_row_expr_filters, base_row_pre_filters = _apply_where_to_ops(
+        base_clause.where,
+        base_alias_targets,
+        params=params,
+    )
     base_chain_ops: List[ASTObject] = list(base_ops)
+    base_chain_ops.extend(base_row_pre_filters)
     for expr in base_row_expr_filters:
         base_chain_ops.append(
             where_rows(
@@ -8326,12 +8415,13 @@ def _compile_connected_optional_match(
             column=opt_clause.span.column,
         )

-    opt_where, opt_row_expr_filters = _apply_where_to_ops(
+    opt_where, opt_row_expr_filters, opt_row_pre_filters = _apply_where_to_ops(
         opt_clause.where,
         opt_alias_targets,
         params=params,
     )
     opt_chain_ops: List[ASTObject] = list(opt_ops)
+    opt_chain_ops.extend(opt_row_pre_filters)
     for expr in opt_row_expr_filters:
         opt_chain_ops.append(
             where_rows(
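
The OPTIONAL MATCH hunks above apply the pre-filters to the optional chain before it is combined with the base chain. The sketch below shows the row-level effect this is expected to have, assuming standard Cypher semantics in which a WHERE attached to OPTIONAL MATCH filters the optional bindings first and unmatched base rows are then null-filled. The helper names, the dict-per-row layout, and the sample data are all assumptions for illustration, not the graphistry runtime.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def anti_semi(rows: List[Row], pattern_rows: List[Row], keys: Sequence[str]) -> List[Row]:
    """Drop rows whose key values appear in pattern_rows."""
    matched = {tuple(p[k] for k in keys) for p in pattern_rows}
    return [r for r in rows if tuple(r[k] for k in keys) not in matched]


def optional_left_join(
    base_rows: List[Row],
    optional_rows: List[Row],
    keys: Sequence[str],
    optional_cols: Sequence[str],
) -> List[Row]:
    """Left-join optional rows onto base rows, filling misses with None."""
    out: List[Row] = []
    for base in base_rows:
        matches = [r for r in optional_rows if all(r[k] == base[k] for k in keys)]
        if matches:
            out.extend({**base, **m} for m in matches)
        else:
            out.append({**base, **{c: None for c in optional_cols}})
    return out


# MATCH (a) OPTIONAL MATCH (a)-[:KNOWS]->(b) WHERE NOT (b)-[:R]->() RETURN a, b
base = [{"a": 1}, {"a": 2}]
optional = [{"a": 1, "b": 10}, {"a": 1, "b": 11}, {"a": 2, "b": 20}]
has_r_edge = [{"b": 10}, {"b": 20}]  # b values with an outgoing :R edge

filtered = anti_semi(optional, has_r_edge, ["b"])
result = optional_left_join(base, filtered, ["a"], ["b"])
print(result)  # [{'a': 1, 'b': 11}, {'a': 2, 'b': None}]
```

Base row `a = 2` loses its only optional binding to the NOT-pattern filter, so it is kept with `b` set to `None` instead of being dropped.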
