Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).

### Internal
- **GFQL / Cypher residual cleanup (#1226, #1219)**: Retired `graphistry/compute/gfql/expr_split.py` by migrating top-level-AND splitting to `predicate_pushdown.py` (`split_top_level_and_conjuncts`) and updating conformance coverage to target the migrated splitter path. Fixed primitive literal atom fallback text in parser boolean-tree construction so `atom_text` now preserves Cypher keyword casing (`true`/`false`/`null`) instead of Python casing. Tightened optional-arm pushdown null-safety guardrails by treating OR-compound predicates as conservatively null-rejecting (disjunct-level alias analysis remains out of scope), with new regression coverage for mixed-alias `IS NOT NULL OR ...` forms.
- **GFQL / Cypher lowering — WHERE NOT (pattern) anti-semi execution (#1031 slice 2 phase 2b)**: `WherePatternPredicate(negated=True)` no longer hard-fails at lowering. The compiler now emits a row pre-filter call (`anti_semi_apply`) for negated pattern predicates in both general MATCH lowering and connected OPTIONAL MATCH clause lowering, with per-predicate validation (must include relationship, must not introduce new aliases, must share bound aliases). Added `anti_semi_apply` row-pipeline runtime operation + call-safelist entry, plus row-table base-graph context preservation needed for correlated bindings execution. New tests cover compile shape (`row_pre_filters` emission) and runtime behavior for `MATCH (n) WHERE NOT (n)-[:R]->()` (including mixed row-expression + NOT-pattern filtering), plus connected `OPTIONAL MATCH ... WHERE NOT (pattern)` filtering/null-fill semantics. Full `cypher/test_lowering.py` suite passes (756 passed / 66 skipped) and touched-module mypy is clean.
- **GFQL / Cypher parser + ast_normalizer — NOT-pattern AST plumbing (#1031 slice 2 phase 2a)**: Top-level `WHERE NOT (pattern)` shapes (e.g. `WHERE NOT (n)-[:R]->()`) now parse cleanly and lift into `WhereClause.predicates` as `WherePatternPredicate(negated=True)` entries instead of tripping the legacy "cannot yet be mixed with generic row expressions" E108. `_split_top_level_and_pattern_leaves` adds a top-level `not(pattern_atom)` case that strips the NOT and emits the inner pattern as a negated leaf; `_build_where_with_pattern_lift` accepts both positive and negated leaves and emits one `WherePatternPredicate` per leaf with the matching `negated` flag. ast_normalizer's `_rewrite_where_pattern_predicates_to_matches` partitions into positive (rewrites to appended MatchClause as before) vs negated (passes through to lowering). Lowering now distinguishes the two cases: positive `WherePatternPredicate` still raises "must be rewritten before lowering" (defensive — slice 3 already rewrites all positives in ast_normalizer); negated raises a scoped "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported" pointing the way for the engine half (path-C row-pipeline anti-join, see `plans/1031-slices-2-3-4/findings/slice-2-scope.md`). Adds `test_parse_lifts_top_level_not_pattern_to_negated_predicate` and `test_string_cypher_failfast_rejects_negated_pattern_until_slice2_lowering`. De Morgan compositions, OR-around-pattern, and double-NOT remain rejected at the lift step (slice 4 / future). Phase 2a only — runtime (anti-semi-join lowering) ships in a follow-up sub-PR (#1031).
- **GFQL / Cypher row-boolean residual matrix + guardrails (#1219 hardening)**: Locks compositional row-boolean WHERE shapes that #1217's Earley swap admitted but its initial test surface didn't cover. Adds 11 native tests: nullable NOT/OR over a 4-row 3VL fixture (`NULL OR T = T`); N-ary OR (3 branches) + duplicate-branch companion isolating rightmost-drop associativity bugs; De Morgan equivalences (`NOT (A OR B)` ≡ `NOT A AND NOT B`; `NOT (A AND B)` ≡ `NOT A OR NOT B`) parametrized to assert both per-form expected rows AND the form-equivalence; double negation; XOR symmetric difference + XOR with NULL preserving 3VL; mixed-string-numeric AND inside OR exercising `_StringAllowingComparisonMixin` GT path; unit test locking `boolean_expr_to_text(BooleanExpr(op="pattern", ...))` round-trip for the (currently unreachable) defensive branch. Three docstring guardrails: `expr_split.split_top_level_and` documents AND-only intent + the `_combine_conjuncts` AND-recombine mechanism that makes a hypothetical `split_top_level_or` silently incorrect; `predicate_pushdown._split_conjuncts` mirrored guard naming the failure mode; `_boolean_expr_text.boolean_expr_to_text` explicit `op == "pattern"` branch with both unreachability paths documented. No production-code behavior change. Closes the residual-frontier portion of #1219; deeper compositional shapes beyond current fixtures remain tracked under that issue (#1219, #1227).
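The 3VL shapes locked by the matrix tests above (`NULL OR T = T`, De Morgan under NULL, NOT NULL staying NULL) follow Kleene/Cypher three-valued logic. A minimal standalone sketch, using Python `None` to model Cypher NULL (an illustrative convention, not the engine's representation):

```python
# Kleene three-valued logic (3VL), with None modeling Cypher NULL.

def or3(a, b):
    """Kleene OR: NULL OR true is true; NULL OR false is NULL."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def and3(a, b):
    """Kleene AND: NULL AND false is false; NULL AND true is NULL."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def not3(a):
    """Kleene NOT: NOT NULL stays NULL."""
    return None if a is None else not a

# De Morgan holds under 3VL: NOT (A OR B) == NOT A AND NOT B
for a in (True, False, None):
    for b in (True, False, None):
        assert not3(or3(a, b)) == and3(not3(a), not3(b))
```

This is why the parametrized De Morgan tests can assert both per-form expected rows and form-equivalence: the equivalence holds for all nine truth-value pairs, NULL included.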
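The `WHERE NOT (pattern)` entries above describe an anti-semi filter: keep a row for `n` only when no matching `:R` edge leaves `n`. A minimal sketch of the idea, with plain lists standing in for the row tables (names `nodes` / `r_edges` are illustrative, not the engine's actual row-table schema):

```python
# Anti-semi filter behind MATCH (n) WHERE NOT (n)-[:R]->():
# a node survives only if it is NOT the source of any :R edge.
nodes = [1, 2, 3]
r_edges = [(1, 2)]  # (src, dst): only node 1 has an outgoing :R

r_sources = {src for src, _dst in r_edges}
survivors = [n for n in nodes if n not in r_sources]  # anti-semi
assert survivors == [2, 3]
```

The compiled `anti_semi_apply` pre-filter generalizes this to correlated bindings over the base graph, which is why the lowering path needs the row-table base-graph context preserved.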
6 changes: 2 additions & 4 deletions graphistry/compute/gfql/cypher/_boolean_expr_text.py
@@ -51,10 +51,8 @@ def boolean_expr_to_text(expr: BooleanExpr) -> str:
single conjunct. ``NOT`` prefixes its operand; binary ops produce
``"L OP R"``.

Inherits the slice-1 known limitation for primitive literal atoms
(``str(True) == "True"`` rather than Cypher ``"true"``); that is a
follow-up under #1200 to be addressed when literal transformers
gain span-carrying wrappers.
Primitive literal atoms produced via parser fallback are normalized
to Cypher keyword casing (``true`` / ``false`` / ``null``).
"""
if expr.op == "atom":
return expr.atom_text or ""
24 changes: 15 additions & 9 deletions graphistry/compute/gfql/cypher/parser.py
@@ -514,6 +514,15 @@ def _parse_number_token(token: str) -> Union[int, float]:
return int(token)


def _cypher_literal_fallback_text(value: object) -> str:
"""Render primitive Python literals in Cypher surface form."""
if value is None:
return "null"
if isinstance(value, bool):
return "true" if value else "false"
return str(value)


@dataclass(frozen=True)
class _ExpressionSlice:
text: str
@@ -1081,19 +1090,16 @@ def _wrap_as_boolean_atom(self, operand: Any, enclosing_meta: Any) -> BooleanExp
``_ExpressionSlice`` operands carry their own span, so we use
it to extract the source slice precisely.

**Known limitation — primitive literal atoms.** Literal
**Primitive literal fallback path.** Literal
transformers (``true_lit`` / ``false_lit`` / ``null_lit`` /
``number_lit``) return raw Python values without span info.
When such a value reaches us as a boolean-operator operand
(``WHERE true AND false``), we cannot recover the original
source text for that specific operand; we approximate with
the enclosing operator's span and ``str(operand)`` (which
produces Python-style text like ``"True"`` not Cypher-style
``"true"``). No current consumer reads ``atom_text`` on
literal atoms — the binder is not wired to ``expr_tree`` in
this slice. Accuracy for this path is a follow-up concern
tracked in issue #1200; if/when literal transformers gain
span-carrying wrappers, this fallback can be removed.
the enclosing operator's span and a Cypher-literal render
(``true`` / ``false`` / ``null`` for primitive values).
If/when literal transformers gain span-carrying wrappers,
this fallback can be removed.
"""
if isinstance(operand, BooleanExpr):
return operand
@@ -1105,7 +1111,7 @@ def _wrap_as_boolean_atom(self, operand: Any, enclosing_meta: Any) -> BooleanExp
if operand_meta is None:
# Primitive literal — see docstring caveat.
span = _span_from_meta(enclosing_meta)
text = str(operand)
text = _cypher_literal_fallback_text(operand)
else:
span = _span_from_meta(operand_meta)
text = self._slice(span)
131 changes: 0 additions & 131 deletions graphistry/compute/gfql/expr_split.py

This file was deleted.

11 changes: 8 additions & 3 deletions graphistry/compute/gfql/ir/pushdown_safety.py
@@ -55,9 +55,10 @@ def is_null_rejecting(
regardless of the left side. Example: ``n.name IS NULL AND n.type = 'x'``
contains IS NULL but is null-rejecting overall.

Compound OR is not analyzed — a null-safe form anywhere in an OR chain
correctly triggers the null-safe classification because ``True OR <anything>``
is True when the null-safe conjunct evaluates to True for NULL inputs.
Compound OR is not analyzed and is treated as null-rejecting when any
null-extended alias is referenced. This avoids false negatives on mixed
alias forms such as ``n.x IS NOT NULL OR m.y = 1`` where substring checks
alone cannot prove optional-arm safety.

:param predicate: The bound predicate to classify.
:param null_extended_aliases: Aliases that may be NULL from OPTIONAL MATCH.
@@ -72,6 +73,10 @@
# the other may not be, and True AND NULL = NULL (row filtered).
if " and " in expr_lower:
return True
# OR compounds are conservatively treated as null-rejecting; we do not
# perform disjunct-level alias analysis in this helper.
if " or " in expr_lower:
return True
for form in _NULL_SAFE_FORMS:
if form in expr_lower:
return False
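The classifier's conservative shape can be sketched standalone. The real `_NULL_SAFE_FORMS` tuple is not shown in this hunk, so the single-entry list below is a hypothetical subset for illustration only:

```python
# Simplified sketch of the substring-based classifier above.
NULL_SAFE_FORMS = (" is null",)  # hypothetical subset, for illustration

def is_null_rejecting_sketch(expression: str) -> bool:
    expr_lower = expression.lower()
    # AND compounds: one null-safe conjunct cannot rescue the row,
    # since True AND NULL = NULL (row filtered) -> null-rejecting.
    if " and " in expr_lower:
        return True
    # OR compounds: conservatively null-rejecting; no disjunct-level
    # alias analysis is attempted.
    if " or " in expr_lower:
        return True
    for form in NULL_SAFE_FORMS:
        if form in expr_lower:
            return False
    return True

assert is_null_rejecting_sketch("n.name IS NULL AND n.type = 'x'")  # AND dominates
assert is_null_rejecting_sketch("n.x IS NOT NULL OR m.y = 1")       # OR is conservative
assert not is_null_rejecting_sketch("n.name IS NULL")               # lone null-safe form
```

The conservative OR branch trades precision for safety: an OR disjunct over a different alias could in principle be null-safe, but proving that needs per-disjunct alias analysis, which this helper deliberately does not attempt.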
100 changes: 96 additions & 4 deletions graphistry/compute/gfql/passes/predicate_pushdown.py
@@ -8,9 +8,8 @@

import re
from dataclasses import replace
from typing import Any, FrozenSet, List, Sequence, Tuple, cast
from typing import Any, FrozenSet, List, Optional, Sequence, Tuple, cast

from graphistry.compute.gfql.expr_split import split_top_level_and
from graphistry.compute.gfql.ir.compilation import PlanContext
from graphistry.compute.gfql.ir.logical_plan import CHILD_SLOTS, Filter, LogicalPlan, PatternMatch
from graphistry.compute.gfql.ir.pushdown_safety import is_null_rejecting, with_barrier_blocks_pushdown
@@ -39,6 +38,99 @@ def run(self, plan: LogicalPlan, ctx: PlanContext) -> PassResult:
)


def split_top_level_and_conjuncts(expr: str) -> Tuple[str, ...]:
"""Split *expr* on whitespace-bounded top-level ``AND``.

This routine is quote-aware (single, double, backtick), tracks
bracket depth for ``()`` / ``[]`` / ``{}``, and preserves nested
boolean groups as opaque text. Returns ``()`` for malformed chains
(leading/trailing/consecutive AND) so callers can conservatively
skip splitting.

#1226 retirement note: this is the in-module replacement for the
removed ``graphistry.compute.gfql.expr_split`` helper.
"""
terms: list[str] = []
term_start = 0
paren_depth = 0
bracket_depth = 0
brace_depth = 0
string_quote: Optional[str] = None
in_backtick = False
i = 0
n = len(expr)
while i < n:
ch = expr[i]
if string_quote is not None:
if ch == "\\":
i += 2
continue
if ch == string_quote:
string_quote = None
i += 1
continue
if in_backtick:
if ch == "`":
in_backtick = False
i += 1
continue
if ch in {"'", '"'}:
string_quote = ch
i += 1
continue
if ch == "`":
in_backtick = True
i += 1
continue
if ch == "(":
paren_depth += 1
i += 1
continue
if ch == ")":
paren_depth = max(0, paren_depth - 1)
i += 1
continue
if ch == "[":
bracket_depth += 1
i += 1
continue
if ch == "]":
bracket_depth = max(0, bracket_depth - 1)
i += 1
continue
if ch == "{":
brace_depth += 1
i += 1
continue
if ch == "}":
brace_depth = max(0, brace_depth - 1)
i += 1
continue
if (
paren_depth == 0
and bracket_depth == 0
and brace_depth == 0
and expr[i:i + 3].upper() == "AND"
and (i == 0 or expr[i - 1].isspace())
and (i + 3 == n or expr[i + 3].isspace())
):
term = expr[term_start:i].strip()
if term == "":
return ()
terms.append(term)
i += 3
while i < n and expr[i].isspace():
i += 1
term_start = i
continue
i += 1
tail = expr[term_start:].strip()
if tail == "":
return ()
terms.append(tail)
return tuple(terms)
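The splitter's behavior can be exercised with a reduced sketch. This version keeps the quote-awareness, paren-depth tracking, keyword boundary checks, and malformed-chain handling of the full implementation above, but omits backtick and `[]` / `{}` tracking for brevity (so it is not a drop-in replacement):

```python
# Reduced sketch of top-level-AND splitting: quote-aware, paren-depth-
# aware, case-insensitive AND, malformed chains collapse to ().
from typing import Optional, Tuple

def split_and_sketch(expr: str) -> Tuple[str, ...]:
    terms, start, depth = [], 0, 0
    quote: Optional[str] = None
    i, n = 0, len(expr)
    while i < n:
        ch = expr[i]
        if quote:
            if ch == "\\":       # skip escaped char inside string
                i += 2
                continue
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif (depth == 0 and expr[i:i + 3].upper() == "AND"
              and (i == 0 or expr[i - 1].isspace())
              and (i + 3 == n or expr[i + 3].isspace())):
            term = expr[start:i].strip()
            if not term:
                return ()  # malformed chain: caller skips splitting
            terms.append(term)
            i += 3
            start = i
            continue
        i += 1
    tail = expr[start:].strip()
    if not tail:
        return ()
    terms.append(tail)
    return tuple(terms)

assert split_and_sketch("a = 1 AND b = 2") == ("a = 1", "b = 2")
assert split_and_sketch("(a = 1 AND b = 2) AND c = 3") == ("(a = 1 AND b = 2)", "c = 3")
assert split_and_sketch("name = 'x AND y' AND z = 1") == ("name = 'x AND y'", "z = 1")
assert split_and_sketch("a = 1 AND") == ()  # trailing AND -> malformed
```

The quoted-string and parenthesized cases show why a naive `expr.split(" AND ")` would be silently wrong here, and the `()` return on malformed chains is what lets `_split_conjuncts` fall back to treating the predicate as a single unit.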


def _rewrite_tree(
plan: LogicalPlan,
*,
@@ -133,12 +225,12 @@ def _split_conjuncts(predicate: BoundPredicate) -> List[BoundPredicate]:

AND-only — splitting an OR here would be silently wrong because
``_combine_conjuncts`` (below) AND-joins residuals. See
``expr_split.split_top_level_and`` docstring for the full rationale.
``split_top_level_and_conjuncts`` docstring for the full rationale.
"""
expression = predicate.expression.strip()
if not expression:
return []
parts = split_top_level_and(expression)
parts = split_top_level_and_conjuncts(expression)
if len(parts) <= 1:
return [predicate]
return [