Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).

### Internal
- **GFQL / Cypher residual cleanup (#1226, #1219)**: Retired `graphistry/compute/gfql/expr_split.py` by migrating top-level-AND splitting to `predicate_pushdown.py` (`split_top_level_and_conjuncts`) and updating conformance coverage to target the migrated splitter path. Fixed primitive literal atom fallback text in parser boolean-tree construction so `atom_text` now preserves Cypher keyword casing (`true`/`false`/`null`) instead of Python casing. Tightened optional-arm pushdown null-safety guardrails by treating OR-compound predicates as conservatively null-rejecting (disjunct-level alias analysis remains out of scope), with new regression coverage for mixed-alias `IS NOT NULL OR ...` forms.
- **GFQL / Cypher lowering — WHERE NOT (pattern) anti-semi execution (#1031 slice 2 phase 2b)**: `WherePatternPredicate(negated=True)` no longer hard-fails at lowering. The compiler now emits a row pre-filter call (`anti_semi_apply`) for negated pattern predicates in both general MATCH lowering and connected OPTIONAL MATCH clause lowering, with per-predicate validation (must include relationship, must not introduce new aliases, must share bound aliases). Added `anti_semi_apply` row-pipeline runtime operation + call-safelist entry, plus row-table base-graph context preservation needed for correlated bindings execution. New tests cover compile shape (`row_pre_filters` emission) and runtime behavior for `MATCH (n) WHERE NOT (n)-[:R]->()` (including mixed row-expression + NOT-pattern filtering), plus connected `OPTIONAL MATCH ... WHERE NOT (pattern)` filtering/null-fill semantics. Full `cypher/test_lowering.py` suite passes (756 passed / 66 skipped) and touched-module mypy is clean.
- **GFQL / Cypher parser + ast_normalizer — NOT-pattern AST plumbing (#1031 slice 2 phase 2a)**: Top-level `WHERE NOT (pattern)` shapes (e.g. `WHERE NOT (n)-[:R]->()`) now parse cleanly and lift into `WhereClause.predicates` as `WherePatternPredicate(negated=True)` entries instead of tripping the legacy "cannot yet be mixed with generic row expressions" E108. `_split_top_level_and_pattern_leaves` adds a top-level `not(pattern_atom)` case that strips the NOT and emits the inner pattern as a negated leaf; `_build_where_with_pattern_lift` accepts both positive and negated leaves and emits one `WherePatternPredicate` per leaf with the matching `negated` flag. ast_normalizer's `_rewrite_where_pattern_predicates_to_matches` partitions into positive (rewrites to appended MatchClause as before) vs negated (passes through to lowering). Lowering now distinguishes the two cases: positive `WherePatternPredicate` still raises "must be rewritten before lowering" (defensive — slice 3 already rewrites all positives in ast_normalizer); negated raises a scoped "Cypher WHERE NOT (pattern) anti-semi-join lowering is not yet supported" pointing the way for the engine half (path-C row-pipeline anti-join, see `plans/1031-slices-2-3-4/findings/slice-2-scope.md`). Adds `test_parse_lifts_top_level_not_pattern_to_negated_predicate` and `test_string_cypher_failfast_rejects_negated_pattern_until_slice2_lowering`. De Morgan compositions, OR-around-pattern, and double-NOT remain rejected at the lift step (slice 4 / future). Phase 2a only — runtime (anti-semi-join lowering) ships in a follow-up sub-PR (#1031).
- **GFQL / Cypher row-boolean residual matrix + guardrails (#1219 hardening)**: Locks compositional row-boolean WHERE shapes that #1217's Earley swap admitted but its initial test surface didn't cover. Adds 11 native tests: nullable NOT/OR over a 4-row 3VL fixture (`NULL OR T = T`); N-ary OR (3 branches) + duplicate-branch companion isolating rightmost-drop associativity bugs; De Morgan equivalences (`NOT (A OR B)` ≡ `NOT A AND NOT B`; `NOT (A AND B)` ≡ `NOT A OR NOT B`) parametrized to assert both per-form expected rows AND the form-equivalence; double negation; XOR symmetric difference + XOR with NULL preserving 3VL; mixed-string-numeric AND inside OR exercising `_StringAllowingComparisonMixin` GT path; unit test locking `boolean_expr_to_text(BooleanExpr(op="pattern", ...))` round-trip for the (currently unreachable) defensive branch. Three docstring guardrails: `expr_split.split_top_level_and` documents AND-only intent + the `_combine_conjuncts` AND-recombine mechanism that makes a hypothetical `split_top_level_or` silently incorrect; `predicate_pushdown._split_conjuncts` mirrored guard naming the failure mode; `_boolean_expr_text.boolean_expr_to_text` explicit `op == "pattern"` branch with both unreachability paths documented. No production-code behavior change. Closes the residual-frontier portion of #1219; deeper compositional shapes beyond current fixtures remain tracked under that issue (#1219, #1227).
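The 3VL shapes locked by the matrix tests above (`NULL OR T = T`, De Morgan under NULL, NOT NULL staying NULL) follow Kleene/Cypher three-valued logic. A minimal standalone sketch, using Python `None` to model Cypher NULL (an illustrative convention, not the engine's representation):

```python
# Kleene three-valued logic (3VL), with None modeling Cypher NULL.

def or3(a, b):
    """Kleene OR: NULL OR true is true; NULL OR false is NULL."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def and3(a, b):
    """Kleene AND: NULL AND false is false; NULL AND true is NULL."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def not3(a):
    """Kleene NOT: NOT NULL stays NULL."""
    return None if a is None else not a

# De Morgan holds under 3VL: NOT (A OR B) == NOT A AND NOT B
for a in (True, False, None):
    for b in (True, False, None):
        assert not3(or3(a, b)) == and3(not3(a), not3(b))
```

This is why the parametrized De Morgan tests can assert both per-form expected rows and form-equivalence: the equivalence holds for all nine truth-value pairs, NULL included.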
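The `WHERE NOT (pattern)` entries above describe an anti-semi filter: keep a row for `n` only when no matching `:R` edge leaves `n`. A minimal sketch of the idea, with plain lists standing in for the row tables (names `nodes` / `r_edges` are illustrative, not the engine's actual row-table schema):

```python
# Anti-semi filter behind MATCH (n) WHERE NOT (n)-[:R]->():
# a node survives only if it is NOT the source of any :R edge.
nodes = [1, 2, 3]
r_edges = [(1, 2)]  # (src, dst): only node 1 has an outgoing :R

r_sources = {src for src, _dst in r_edges}
survivors = [n for n in nodes if n not in r_sources]  # anti-semi
assert survivors == [2, 3]
```

The compiled `anti_semi_apply` pre-filter generalizes this to correlated bindings over the base graph, which is why the lowering path needs the row-table base-graph context preserved.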
6 changes: 2 additions & 4 deletions graphistry/compute/gfql/cypher/_boolean_expr_text.py
@@ -51,10 +51,8 @@ def boolean_expr_to_text(expr: BooleanExpr) -> str:
single conjunct. ``NOT`` prefixes its operand; binary ops produce
``"L OP R"``.

Inherits the slice-1 known limitation for primitive literal atoms
(``str(True) == "True"`` rather than Cypher ``"true"``); that is a
follow-up under #1200 to be addressed when literal transformers
gain span-carrying wrappers.
Primitive literal atoms produced via parser fallback are normalized
to Cypher keyword casing (``true`` / ``false`` / ``null``).
"""
if expr.op == "atom":
return expr.atom_text or ""
24 changes: 15 additions & 9 deletions graphistry/compute/gfql/cypher/parser.py
@@ -514,6 +514,15 @@ def _parse_number_token(token: str) -> Union[int, float]:
return int(token)


def _cypher_literal_fallback_text(value: object) -> str:
"""Render primitive Python literals in Cypher surface form."""
if value is None:
return "null"
if isinstance(value, bool):
return "true" if value else "false"
return str(value)


@dataclass(frozen=True)
class _ExpressionSlice:
text: str
@@ -1081,19 +1090,16 @@ def _wrap_as_boolean_atom(self, operand: Any, enclosing_meta: Any) -> BooleanExp
``_ExpressionSlice`` operands carry their own span, so we use
it to extract the source slice precisely.

**Known limitation — primitive literal atoms.** Literal
**Primitive literal fallback path.** Literal
transformers (``true_lit`` / ``false_lit`` / ``null_lit`` /
``number_lit``) return raw Python values without span info.
When such a value reaches us as a boolean-operator operand
(``WHERE true AND false``), we cannot recover the original
source text for that specific operand; we approximate with
the enclosing operator's span and ``str(operand)`` (which
produces Python-style text like ``"True"`` not Cypher-style
``"true"``). No current consumer reads ``atom_text`` on
literal atoms — the binder is not wired to ``expr_tree`` in
this slice. Accuracy for this path is a follow-up concern
tracked in issue #1200; if/when literal transformers gain
span-carrying wrappers, this fallback can be removed.
the enclosing operator's span and a Cypher-literal render
(``true`` / ``false`` / ``null`` for primitive values).
If/when literal transformers gain span-carrying wrappers,
this fallback can be removed.
"""
if isinstance(operand, BooleanExpr):
return operand
@@ -1105,7 +1111,7 @@ def _wrap_as_boolean_atom(self, operand: Any, enclosing_meta: Any) -> BooleanExp
if operand_meta is None:
# Primitive literal — see docstring caveat.
span = _span_from_meta(enclosing_meta)
text = str(operand)
text = _cypher_literal_fallback_text(operand)
else:
span = _span_from_meta(operand_meta)
text = self._slice(span)
131 changes: 0 additions & 131 deletions graphistry/compute/gfql/expr_split.py

This file was deleted.

11 changes: 8 additions & 3 deletions graphistry/compute/gfql/ir/pushdown_safety.py
@@ -55,9 +55,10 @@ def is_null_rejecting(
regardless of the left side. Example: ``n.name IS NULL AND n.type = 'x'``
contains IS NULL but is null-rejecting overall.

Compound OR is not analyzed — a null-safe form anywhere in an OR chain
correctly triggers the null-safe classification because ``True OR <anything>``
is True when the null-safe conjunct evaluates to True for NULL inputs.
Compound OR is not analyzed and is treated as null-rejecting when any
null-extended alias is referenced. This avoids false negatives on mixed
alias forms such as ``n.x IS NOT NULL OR m.y = 1`` where substring checks
alone cannot prove optional-arm safety.

:param predicate: The bound predicate to classify.
:param null_extended_aliases: Aliases that may be NULL from OPTIONAL MATCH.
@@ -72,6 +73,10 @@
# the other may not be, and True AND NULL = NULL (row filtered).
if " and " in expr_lower:
return True
# OR compounds are conservatively treated as null-rejecting; we do not
# perform disjunct-level alias analysis in this helper.
if " or " in expr_lower:
return True
for form in _NULL_SAFE_FORMS:
if form in expr_lower:
return False
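The classifier's conservative shape can be sketched standalone. The real `_NULL_SAFE_FORMS` tuple is not shown in this hunk, so the single-entry list below is a hypothetical subset for illustration only:

```python
# Simplified sketch of the substring-based classifier above.
NULL_SAFE_FORMS = (" is null",)  # hypothetical subset, for illustration

def is_null_rejecting_sketch(expression: str) -> bool:
    expr_lower = expression.lower()
    # AND compounds: one null-safe conjunct cannot rescue the row,
    # since True AND NULL = NULL (row filtered) -> null-rejecting.
    if " and " in expr_lower:
        return True
    # OR compounds: conservatively null-rejecting; no disjunct-level
    # alias analysis is attempted.
    if " or " in expr_lower:
        return True
    for form in NULL_SAFE_FORMS:
        if form in expr_lower:
            return False
    return True

assert is_null_rejecting_sketch("n.name IS NULL AND n.type = 'x'")  # AND dominates
assert is_null_rejecting_sketch("n.x IS NOT NULL OR m.y = 1")       # OR is conservative
assert not is_null_rejecting_sketch("n.name IS NULL")               # lone null-safe form
```

The conservative OR branch trades precision for safety: an OR disjunct over a different alias could in principle be null-safe, but proving that needs per-disjunct alias analysis, which this helper deliberately does not attempt.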
100 changes: 96 additions & 4 deletions graphistry/compute/gfql/passes/predicate_pushdown.py
@@ -8,9 +8,8 @@

import re
from dataclasses import replace
from typing import Any, FrozenSet, List, Sequence, Tuple, cast
from typing import Any, FrozenSet, List, Optional, Sequence, Tuple, cast

from graphistry.compute.gfql.expr_split import split_top_level_and
from graphistry.compute.gfql.ir.compilation import PlanContext
from graphistry.compute.gfql.ir.logical_plan import CHILD_SLOTS, Filter, LogicalPlan, PatternMatch
from graphistry.compute.gfql.ir.pushdown_safety import is_null_rejecting, with_barrier_blocks_pushdown
@@ -39,6 +38,99 @@ def run(self, plan: LogicalPlan, ctx: PlanContext) -> PassResult:
)


def split_top_level_and_conjuncts(expr: str) -> Tuple[str, ...]:
"""Split *expr* on whitespace-bounded top-level ``AND``.

This routine is quote-aware (single, double, backtick), tracks
bracket depth for ``()`` / ``[]`` / ``{}``, and preserves nested
boolean groups as opaque text. Returns ``()`` for malformed chains
(leading/trailing/consecutive AND) so callers can conservatively
skip splitting.

#1226 retirement note: this is the in-module replacement for the
removed ``graphistry.compute.gfql.expr_split`` helper.
"""
terms: list[str] = []
term_start = 0
paren_depth = 0
bracket_depth = 0
brace_depth = 0
string_quote: Optional[str] = None
in_backtick = False
i = 0
n = len(expr)
while i < n:
ch = expr[i]
if string_quote is not None:
if ch == "\\":
i += 2
continue
if ch == string_quote:
string_quote = None
i += 1
continue
if in_backtick:
if ch == "`":
in_backtick = False
i += 1
continue
if ch in {"'", '"'}:
string_quote = ch
i += 1
continue
if ch == "`":
in_backtick = True
i += 1
continue
if ch == "(":
paren_depth += 1
i += 1
continue
if ch == ")":
paren_depth = max(0, paren_depth - 1)
i += 1
continue
if ch == "[":
bracket_depth += 1
i += 1
continue
if ch == "]":
bracket_depth = max(0, bracket_depth - 1)
i += 1
continue
if ch == "{":
brace_depth += 1
i += 1
continue
if ch == "}":
brace_depth = max(0, brace_depth - 1)
i += 1
continue
if (
paren_depth == 0
and bracket_depth == 0
and brace_depth == 0
and expr[i:i + 3].upper() == "AND"
and (i == 0 or expr[i - 1].isspace())
and (i + 3 == n or expr[i + 3].isspace())
):
term = expr[term_start:i].strip()
if term == "":
return ()
terms.append(term)
i += 3
while i < n and expr[i].isspace():
i += 1
term_start = i
continue
i += 1
tail = expr[term_start:].strip()
if tail == "":
return ()
terms.append(tail)
return tuple(terms)
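The splitter's behavior can be exercised with a reduced sketch. This version keeps the quote-awareness, paren-depth tracking, keyword boundary checks, and malformed-chain handling of the full implementation above, but omits backtick and `[]` / `{}` tracking for brevity (so it is not a drop-in replacement):

```python
# Reduced sketch of top-level-AND splitting: quote-aware, paren-depth-
# aware, case-insensitive AND, malformed chains collapse to ().
from typing import Optional, Tuple

def split_and_sketch(expr: str) -> Tuple[str, ...]:
    terms, start, depth = [], 0, 0
    quote: Optional[str] = None
    i, n = 0, len(expr)
    while i < n:
        ch = expr[i]
        if quote:
            if ch == "\\":       # skip escaped char inside string
                i += 2
                continue
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif (depth == 0 and expr[i:i + 3].upper() == "AND"
              and (i == 0 or expr[i - 1].isspace())
              and (i + 3 == n or expr[i + 3].isspace())):
            term = expr[start:i].strip()
            if not term:
                return ()  # malformed chain: caller skips splitting
            terms.append(term)
            i += 3
            start = i
            continue
        i += 1
    tail = expr[start:].strip()
    if not tail:
        return ()
    terms.append(tail)
    return tuple(terms)

assert split_and_sketch("a = 1 AND b = 2") == ("a = 1", "b = 2")
assert split_and_sketch("(a = 1 AND b = 2) AND c = 3") == ("(a = 1 AND b = 2)", "c = 3")
assert split_and_sketch("name = 'x AND y' AND z = 1") == ("name = 'x AND y'", "z = 1")
assert split_and_sketch("a = 1 AND") == ()  # trailing AND -> malformed
```

The quoted-string and parenthesized cases show why a naive `expr.split(" AND ")` would be silently wrong here, and the `()` return on malformed chains is what lets `_split_conjuncts` fall back to treating the predicate as a single unit.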


def _rewrite_tree(
plan: LogicalPlan,
*,
@@ -133,12 +225,12 @@ def _split_conjuncts(predicate: BoundPredicate) -> List[BoundPredicate]:

AND-only — splitting an OR here would be silently wrong because
``_combine_conjuncts`` (below) AND-joins residuals. See
``expr_split.split_top_level_and`` docstring for the full rationale.
``split_top_level_and_conjuncts`` docstring for the full rationale.
"""
expression = predicate.expression.strip()
if not expression:
return []
parts = split_top_level_and(expression)
parts = split_top_level_and_conjuncts(expression)
if len(parts) <= 1:
return [predicate]
return [