
Commit 4253ef6

lmeyerov and claude authored
test+docs(#1219): residual row-boolean compositional matrix + guardrails (#1227)
* test+docs(#1219): residual row-boolean compositional matrix + guardrails (#1227)

  Worker B / independent-hardening stream of #1219. After #1217's Earley swap surfaced row-boolean shapes (OR/NOT/XOR among row predicates) that LALR rejected, four compositional shapes remained unverified beyond the fixtures #1217 covered. This PR locks empirical correctness across the residual matrix + adds two lightweight guardrail comments.

  ## Compositional matrix tests (test_lowering.py)

  All four shapes verified correct empirically; locked with sorted-id assertions against discriminating fixtures:

  1. **Nullable NOT/OR** — `WHERE NOT n.x = 1 OR n.y IS NULL` against a 4-row fixture mixing real and projected nulls. Locks that the pandas-backed row-evaluator preserves the Cypher 3VL truth table (NULL OR T = T): `{n2, n4}`.
  2. **N-ary OR (3 branches)** — `WHERE n.x = 1 OR n.x = 2 OR n.x = 3`. Locks that the left-associative parse `or(or(=1, =2), =3)` doesn't degenerate under associativity bugs. `{n1, n2, n3}`.
  3. **De Morgan compositions** (parametrized × 4) — both `NOT(A OR B)` ≡ `NOT-A AND NOT-B` and `NOT(A AND B)` ≡ `NOT-A OR NOT-B` against a 4-row fixture covering all (x∈{1,2}, y∈{2,3}) combos. Each form and its De-Morganed equivalent return the same row set.
  4. **Mixed-string-numeric AND inside OR** — `WHERE (n.s = 'a' AND n.x > 0) OR n.x < -1`, exercising `_StringAllowingComparisonMixin` (#1217) paired with OR composition. `{n1, n2, n4}`.

  ## Guardrails

  - `expr_split.py::split_top_level_and` — added load-bearing AND-only docstring with #1219 cross-ref explaining why a sibling `split_top_level_or` would break OR-distributivity-over-join correctness. No future maintainer should accidentally add it without first redesigning topology-aware pushdown safety.
  - `_boolean_expr_text.py::boolean_expr_to_text` — added explicit `if expr.op == "pattern"` branch with docstring. Currently unreachable in production (lift step extracts pattern leaves before the binder walks the tree) but documents the contract: emit the raw pattern source rather than silently falling through to empty string.

  ## Test impact

  Validated on dgx-spark: `graphistry/tests/compute/` → 2524 passed (7 new tests; remaining delta from baseline absorbed by #1224's unrelated additions).

  Closes the residual frontier portion of #1219.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-1 review fixes — discriminating fixtures, equivalence assertions, pattern-op unit test, mirrored guard

  Wave-1 review on c196393 surfaced 3 IMPORTANTs + 5 SUGGESTIONs. Addressed:

  1. N-ary OR test now has a companion (duplicate-leftmost-branch) that isolates the rightmost-drop associativity bug from the any-branch-drop case. Comment rephrased to be honest about what the original test covers (any branch dropped, not specifically rightmost).
  2. De Morgan parametrize restructured: paired (compound, distributed, expected) tuples instead of independent rows. Now asserts:
     - compound matches expected
     - distributed matches expected
     - compound == distributed (the actual De Morgan equivalence)
     Added separate double-negation test (NOT NOT A ≡ A).
  3. New test_boolean_expr_to_text_emits_atom_text_for_pattern_op in test_boolean_expr.py exercises the explicit pattern branch added in c196393. Locks the contract even though the branch is currently unreachable in production (lift step extracts pattern leaves before the binder walks the tree).
  4. Mirrored AND-only guard comment near _split_conjuncts in predicate_pushdown.py — that's where future maintainers actually look when adding pushdown features; the load-bearing rationale stays in expr_split.py's docstring.

  Test counts on dgx-spark: 2525 passed (was 2524 + 1 pattern_op unit test; net 8 new tests in PR diff vs master).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-2 review fixes — accurate docstring, inline pattern, terser comments

  Wave-2 (2a targeted + 2b broad) found 1 IMPORTANT + several SUGGESTIONs.

  IMPORTANT (Wave-2b):
  - expr_split.split_top_level_and docstring overstated the topology argument. Original draft claimed distribute-OR breaks on fan-out topologies; the cross-alias-OR test in test_lowering.py:3206-3211 (added in #1217) explicitly confirms distribute-OR converges to correct row-set semantics on a 2:2 fan-out fixture. The actual reason pushdown leaves OR opaque is that the multi-alias references on the conjunct cause it to be retained post-join. Rewrote docstring to describe the real mechanism + cite the test.

  SUGGESTIONs (Wave-2a + 2b):
  - _ids_for(graph, query) helper inlined to match the established inline-sorted-comprehension style used elsewhere in test_lowering.py (12+ existing call sites). Removes inconsistency within this PR.
  - Pattern-op branch comment in _boolean_expr_text.py compressed from 9 lines to 3 — the verbose explanation duplicated the test docstring.
  - Local-variable assignment for graph.gfql() result kept (matches existing test patterns + works around a Plottable.gfql pyright attribute warning that fires on fluent chains).

  Test counts unchanged: 8 new tests; full gfql suite 1581 passed.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-3b fixes — accurate split-OR rationale + XOR runtime sibling

  Wave-3b from-scratch fresh-eyes review found 1 IMPORTANT + 3 SUGGESTIONs. Addressed:

  - IMPORTANT: split_top_level_and docstring's stated mechanism ('multi-alias refs cause retention post-join') was misleading. Real mechanism: pushdown silently AND-recombines the pushed conjuncts inside PatternMatch.predicates. AND distributes (split + AND-recombine ≡ original); OR does not (split + AND-recombine ≠ original). An OR-aware split would need a UNION-of-pushed-branches recombine path the current pipeline doesn't implement. Rewrote the docstring with the accurate split + AND-recombine rationale.
  - SUGGESTION 3: XOR runtime row-set test (sibling to OR/AND/NOT tests this PR adds). Locks symmetric-difference semantics.

  Skipped:
  - SUGGESTION 4 (NOT IS NULL standalone): nullable_not_or already exercises IS NULL composed with NOT/OR; standalone marginal.
  - SUGGESTION 5 (cuDF parametrize): real scope creep, defer.

  Test counts: 9 new tests; full gfql suite green.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-4 polish — accurate mixin claim, XOR with NULL, mirrored-guard clarification

  Wave-4 (4a targeted + 4b from-scratch) — 0 BLOCKER + 0 IMPORTANT, 9 SUGGESTIONs. Picked up the cheapest valuable ones:

  - Wave-4b S1: mixed-string-numeric AND-inside-OR test claimed to exercise _StringAllowingComparisonMixin but used `n.s = 'a'` (plain EQ, supported pre-#1217). Swapped to `n.s > 'a'` (string GT, the mixin-specific path); flipped fixture s-values to keep the same truth-table outcomes.
  - Wave-4b S3: added test_string_cypher_executes_xor_with_null_uses_three_valued_logic — sibling to the OR/NOT 3VL test, locks NULL XOR T = NULL. Reuses the 3VL fixture from the De Morgan tests.
  - Wave-4a S1: `_split_conjuncts` mirrored guard now names the actual failure mode (`_combine_conjuncts` AND-joins residuals) before pointing to the fuller rationale in expr_split.

  Skipped (out of scope or duplicate):
  - Wave-4b S2: rightmost-only discriminator comment is already honest; test verified correct.
  - Wave-4b S4: cross-alias OR / cross-product fixture — already covered by test_string_cypher_executes_cross_alias_or_returns_correct_union (#1217).
  - Wave-4b S6: optional CHANGELOG bullet for test+docs PR.
  - Wave-4a S2 (OR-analyzer in pushdown_safety.py:58-60): pre-existing unrelated, separate followup.

  Test counts: 10 new tests; full gfql suite green (1583 passed).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-5 polish — extract 3VL fixture helper, clarify pattern-op unreachability

  Wave-5 formal review (multi-dim per .agents/skills/review/SKILL.md + adversarial pressure-test) found 0 BLOCKER + 0 IMPORTANT, 3 SUGGESTIONs of which 2 were actionable + cheap:

  - Extract `_three_valued_logic_fixture_graph()` helper paralleling `_de_morgan_fixture_graph()` — eliminates byte-identical 4-row NaN-mixed fixture duplication between nullable_not_or and xor_with_null.
  - Clarify pattern-op branch comment in `_boolean_expr_text.py` to name BOTH unreachability paths (top-level AND lift + nested-NOT/OR/XOR E108 rejection) instead of just the first.
  - Adversarial-rejected: malformed-NOT-chain test would be a tautology (right-recursive grammar makes depth-N equivalent to depth-2; existing double-negation test already exercises the path).

  Test counts unchanged: 11 new tests; full gfql suite 1583 passed.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): add #1227 row-boolean residual matrix entry

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
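The 3VL behaviors the commits lock (NULL OR T = T, NULL XOR T = NULL, NOT NULL = NULL) are Kleene three-valued logic. As a standalone illustration — using pandas's nullable `boolean` dtype, not the evaluator's actual NaN-bearing float columns the fixtures use — the relevant truth-table corners are:

```python
import pandas as pd

# pd.NA plays the role of Cypher NULL; pandas's BooleanArray implements
# Kleene logic for &, |, ^ on the nullable "boolean" dtype.
a = pd.array([True, False, pd.NA], dtype="boolean")

print((a | True).tolist())   # NULL OR T = T   -> [True, True, True]
print((a | False).tolist())  # NULL OR F = NULL -> [True, False, <NA>]
print((a ^ True).tolist())   # NULL XOR T = NULL -> [False, True, <NA>]
print((~a).tolist())         # NOT NULL = NULL -> [False, True, <NA>]

# WHERE semantics: only rows whose predicate is definitely True survive,
# so NULL results are dropped the same as False.
mask = (a | False).fillna(False)
print(mask.tolist())         # [True, False, False]
```

The `fillna(False)` step mirrors why n3 (`NULL OR F = NULL`) is dropped in the nullable NOT/OR test while n4 (`NULL OR T = T`) is kept.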
1 parent 202abd3 commit 4253ef6

6 files changed

Lines changed: 253 additions & 1 deletion


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 - **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).
 
 ### Internal
+- **GFQL / Cypher row-boolean residual matrix + guardrails (#1219 hardening)**: Locks compositional row-boolean WHERE shapes that #1217's Earley swap admitted but its initial test surface didn't cover. Adds 11 native tests: nullable NOT/OR over a 4-row 3VL fixture (`NULL OR T = T`); N-ary OR (3 branches) + duplicate-branch companion isolating rightmost-drop associativity bugs; De Morgan equivalences (`NOT (A OR B)` ≡ `NOT A AND NOT B`; `NOT (A AND B)` ≡ `NOT A OR NOT B`) parametrized to assert both per-form expected rows AND the form-equivalence; double negation; XOR symmetric difference + XOR with NULL preserving 3VL; mixed-string-numeric AND inside OR exercising `_StringAllowingComparisonMixin` GT path; unit test locking `boolean_expr_to_text(BooleanExpr(op="pattern", ...))` round-trip for the (currently unreachable) defensive branch. Three docstring guardrails: `expr_split.split_top_level_and` documents AND-only intent + the `_combine_conjuncts` AND-recombine mechanism that makes a hypothetical `split_top_level_or` silently incorrect; `predicate_pushdown._split_conjuncts` mirrored guard naming the failure mode; `_boolean_expr_text.boolean_expr_to_text` explicit `op == "pattern"` branch with both unreachability paths documented. No production-code behavior change. Closes the residual-frontier portion of #1219; deeper compositional shapes beyond current fixtures remain tracked under that issue (#1219, #1227).
 - **GFQL / Cypher parser + ast_normalizer — multi-positive WHERE pattern predicates (#1031 slice 3)**: AND-joined positive WHERE pattern predicates (`WHERE (n)-[:R]->() AND (n)-[:T]->()`) now lift into structured `WhereClause.predicates` as N `WherePatternPredicate` entries. The ast_normalizer packs them into a single appended `MatchClause` whose `patterns: Tuple[Tuple[PatternElement, ...], ...]` carries one tuple per predicate (multi-pattern cartesian within MATCH), preserving the lowering invariant that only the FINAL match is connected — pre-binding seeds remain node-only. Per-predicate validation (must include a relationship; cannot introduce new aliases) runs independently before the lift. Removes the legacy `len(pattern_leaves) > 1` gate in `parser.py::_build_where_with_pattern_lift` and the corresponding gate in `ast_normalizer._rewrite_where_pattern_predicates_to_matches`. Refactors `pattern_atom` to split the greedy `WHERE_PATTERN` lexer token (which gobbles `pattern AND pattern AND ...` chains as a single match) back into individual pattern-item texts via `_WHERE_PATTERN_ITEM_RE.finditer` and emit one `BooleanExpr(op="pattern")` per item, joined by an AND-tree via `_rebuild_and_tree`. Adds `test_gfql_executes_multi_positive_where_pattern_predicates_as_intersected_seed` and updates the legacy rejection test to assert the new lift + compile shape. Closes #1031 slice 3 (#1031).
 - **GFQL / Cypher lowering**: Connected `MATCH + OPTIONAL MATCH` compilation now supports row-boolean `WHERE` expressions (`OR`/`NOT`/`XOR` and mixed row predicates) by carrying non-lowerable expressions into post-binding `where_rows(...)` filters for base and optional arms, preserving null-extension behavior while expanding supported disjunction shapes (#1219, #1224).
 - **GFQL / Cypher parser**: switched the Cypher parser from Lark's LALR(1) backend to Earley. *Earley's broader unification incidentally lifts the implicit LALR rejection on row-side OR/NOT/XOR among row predicates. Coverage validated empirically across the risky shapes available to current fixtures: simple homogeneous AND/OR/NOT (correct rows); cross-alias OR with predicate-pushdown candidates (correct union — pushdown leaves the OR intact past the join); OPTIONAL MATCH + WHERE OR (the pre-existing OPTIONAL-MATCH-projection validator gates the projection shape regardless of WHERE — including the OR variant); type-coerced OR against a mixed-type Series (the call executor wraps pandas's `TypeError` as `GFQLTypeError(E303)` via the generic unsupported-row-expression path). No silent wrong-rows surfaced in the shapes exercised; deeper compositional shapes (NOT inside OR with nullable arms, N-ary OR associativity, mixed-string-numeric AND inside OR, De Morgan compositions) are tracked under #1219.* This eliminates four LALR-induced workarounds: the 3 dedicated pattern-shape `where_clause` grammar alternatives (now collapsed into a single `WHERE_PATTERN -> pattern_atom` leaf in `?primary`), `_canonicalize_where_single_pattern_and_expr` (regex source-rewrite that reordered `expr AND pattern AND expr` to `pattern AND <rest>` so LALR could match), `_mixed_where_pattern_expr_error` (pre-flight rejector replaced with a structural lift in `generic_where_clause`), and the `parse_cypher` `except LarkError` retry block. `BooleanExpr.op` literal extended with `"pattern"` plus a new `BooleanExpr.pattern` payload field; the `pattern_atom` transformer wraps `WHERE_PATTERN` tokens as boolean-tree leaves; `_split_top_level_and_pattern_leaves` + `_rebuild_and_tree` + `_build_where_with_pattern_lift` extract pattern leaves from `expr_tree` into `WhereClause.predicates` as `WherePatternPredicate` entries before lowering. Strict-improvement consequences (Earley accepts what LALR rejected): `WHERE expr OR expr` now parses as a structured `or` tree; `WHERE expr AND (expr OR expr)` parses as `and(left, or(...))`; `WHERE n:Label AND n.prop = X` routes through structured `where_predicates`; mixed label+property+string-comparison shapes work via the paired `_StringAllowingComparisonMixin` fix in `comparison.py`. Slice 2/3/4 of #1200 territory (NOT-pattern, multi-positive-pattern, OR/XOR-around-pattern) emit explicit `unsupported` errors at the lift step. 1551 GFQL tests pass; the matching `tck-gfql` branch (`issue-1031-grammar-mixed-where-pattern-expr`) carries 8 paired contract refinements: `match-where1-10` lifted to UNEXPECTED_SUCCESS; `match-where5-{1,2,3}` + `expr-comparison2-1` + `with-where5-3` migrated to a new TYPE_ERROR_KEYS bucket (string-comparison-on-mixed-Series wraps as `GFQLTypeError(E303)`); `match-where5-4` upgraded to MATCHES_EXPECTED (Earley + 3-valued OR makes `WHERE i.var > 'te' OR i.var IS NOT NULL` semantically correct); `with2-1` filed under WRONG_ROW_KEYS (WITH-pipelined join now parses + executes but rows differ from oracle — separate gap); `with-where5-3` demoted from PROMOTION_ROW_KEYS (#1031, #1217).

graphistry/compute/gfql/cypher/_boolean_expr_text.py

Lines changed: 7 additions & 0 deletions
@@ -58,6 +58,13 @@ def boolean_expr_to_text(expr: BooleanExpr) -> str:
     """
     if expr.op == "atom":
         return expr.atom_text or ""
+    if expr.op == "pattern":
+        # Unreachable today: top-level AND leaves are lifted out by
+        # ``_split_top_level_and_pattern_leaves`` before the binder walks
+        # the tree, and patterns nested under NOT/OR/XOR are rejected
+        # earlier with E108 errors. Contract for the defensive branch:
+        # emit raw pattern source for round-trippability.
+        return expr.atom_text or ""
     if expr.op == "not":
         operand = boolean_expr_to_text(expr.left) if expr.left is not None else ""
         if expr.left is not None and expr.left.op != "atom":

graphistry/compute/gfql/expr_split.py

Lines changed: 15 additions & 0 deletions
@@ -25,6 +25,21 @@ def split_top_level_and(expr: str) -> Tuple[str, ...]:
     them do not split. Leading and trailing whitespace on each term is
     stripped.
 
+    **AND-only by design.** Do NOT add a sibling ``split_top_level_or``.
+    The pushdown pipeline (``predicate_pushdown._push_filter_into_pattern``)
+    splits a filter into conjuncts, decides per-conjunct whether to
+    push, and silently AND-combines the pushed conjuncts inside
+    ``PatternMatch.predicates``. That contract is correct for AND
+    (split + AND-recombine is the identity on the original AND-tree)
+    but wrong for OR: splitting ``a.x = 1 OR b.y = 2`` and AND-recombining
+    yields ``a.x = 1 AND b.y = 2``, which is a strict subset of the
+    correct answer. An OR-aware split would need a UNION-of-pushed-
+    branches recombine path (with row-multiplicity / dedup logic) that
+    the current pipeline does not implement. See #1219 for the design
+    space. Cross-alias OR conjuncts route through the per-pattern
+    pushdown intact today (verified by
+    ``test_string_cypher_executes_cross_alias_or_returns_correct_union``).
+
     :param expr: The expression text to split (typically a WHERE body).
     :returns: A tuple of non-empty terms. ``()`` when *expr* is empty,
         whitespace-only, has a leading/trailing top-level ``AND``, or

graphistry/compute/gfql/passes/predicate_pushdown.py

Lines changed: 6 additions & 1 deletion
@@ -129,7 +129,12 @@ def _optional_arm_aliases(pattern: PatternMatch) -> FrozenSet[str]:
 
 
 def _split_conjuncts(predicate: BoundPredicate) -> List[BoundPredicate]:
-    """Split ``A AND B`` into top-level conjunct predicates."""
+    """Split ``A AND B`` into top-level conjunct predicates.
+
+    AND-only — splitting an OR here would be silently wrong because
+    ``_combine_conjuncts`` (below) AND-joins residuals. See
+    ``expr_split.split_top_level_and`` docstring for the full rationale.
+    """
     expression = predicate.expression.strip()
     if not expression:
         return []

graphistry/tests/compute/gfql/cypher/test_boolean_expr.py

Lines changed: 25 additions & 0 deletions
@@ -219,3 +219,28 @@ def test_literal_boolean_atoms_known_limitation_python_style_text() -> None:
     assert tree.left.atom_text == "True"  # known limitation — see docstring
     # Right operand is a comparable with a Lark Tree span — accurate slice.
     assert tree.right is not None and tree.right.atom_text == "n.x > 1"
+
+
+# ---------------------------------------------------------------------------
+# boolean_expr_to_text contract for the op == "pattern" branch
+# ---------------------------------------------------------------------------
+
+
+def test_boolean_expr_to_text_emits_atom_text_for_pattern_op() -> None:
+    # Pattern leaves are normally lifted out of expr_tree by
+    # _split_top_level_and_pattern_leaves before the binder walks the
+    # tree, so this branch is unreachable in production. The unit test
+    # locks the contract explicitly so a future code path that DOES
+    # reach boolean_expr_to_text with a pattern leaf gets the raw
+    # pattern source rather than the empty-string fallthrough.
+    from graphistry.compute.gfql.cypher.ast import SourceSpan
+    from graphistry.compute.gfql.cypher._boolean_expr_text import boolean_expr_to_text
+
+    span = SourceSpan(line=1, column=1, end_line=1, end_column=10, start_pos=0, end_pos=10)
+    pattern_leaf = BooleanExpr(
+        op="pattern",
+        span=span,
+        atom_text="(a)-->(b)",
+        atom_span=span,
+    )
+    assert boolean_expr_to_text(pattern_leaf) == "(a)-->(b)"

graphistry/tests/compute/gfql/cypher/test_lowering.py

Lines changed: 199 additions & 0 deletions
@@ -3324,6 +3324,205 @@ def test_string_cypher_executes_homogeneous_or_returns_correct_union() -> None:
     assert ids == ["n1", "n3"]
 
 
+# Compositional row-boolean shapes (#1219 residual matrix). Each shape locks
+# Cypher 3VL semantics + boolean-tree composition correctness against
+# fixtures designed to discriminate against subtle bugs.
+
+
+def _three_valued_logic_fixture_graph() -> _CypherTestGraph:
+    # 4-row fixture mixing actual and projected NaN over (x, y) for 3VL tests.
+    return _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3", "n4"],
+            "label__N": [True, True, True, True],
+            "x": [1.0, 2.0, float("nan"), float("nan")],
+            "y": [10.0, float("nan"), 20.0, float("nan")],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+
+def test_string_cypher_executes_nullable_not_or_uses_three_valued_logic() -> None:
+    # `WHERE NOT n.x = 1 OR n.y IS NULL` against a fixture mixing actual
+    # and projected nulls. Cypher 3VL truth table:
+    # n1{x=1, y=10}: NOT(1=1)=F, y IS NULL=F → F OR F = F → drop
+    # n2{x=2, y=NaN}: NOT(2=1)=T, y IS NULL=T → T OR T = T → keep
+    # n3{x=NaN,y=20}: NOT(NaN=1)=NULL, y IS NULL=F → NULL OR F = NULL → drop
+    # n4{x=NaN,y=NaN}: NOT(NaN=1)=NULL, y IS NULL=T → NULL OR T = T → keep
+    # Locks that the pandas-backed row-evaluator preserves NULL OR T = T.
+    graph = _three_valued_logic_fixture_graph()
+
+    result = graph.gfql("MATCH (n:N) WHERE NOT n.x = 1 OR n.y IS NULL RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n2", "n4"]
+
+
+def test_string_cypher_executes_nary_or_returns_full_union() -> None:
+    # `WHERE n.x = 1 OR n.x = 2 OR n.x = 3` — three OR branches against a
+    # 5-row fixture where each value matches a unique row. Locks that the
+    # binder's parse evaluates ALL branches; silently dropping any one
+    # branch yields a 2-row result and fails the assertion.
+    graph = _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3", "n4", "n5"],
+            "label__N": [True, True, True, True, True],
+            "x": [1, 2, 3, 4, 5],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+    result = graph.gfql("MATCH (n:N) WHERE n.x = 1 OR n.x = 2 OR n.x = 3 RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n1", "n2", "n3"]
+
+
+def test_string_cypher_executes_nary_or_with_duplicate_branch_locks_specific_associativity() -> None:
+    # Companion to test_string_cypher_executes_nary_or_returns_full_union:
+    # `WHERE n.x = 1 OR n.x = 1 OR n.x = 3` has a duplicated leftmost
+    # branch. If the binder silently dropped the rightmost branch under
+    # an associativity bug, the result would be `[n1]` only. If it
+    # silently dropped one of the duplicates, the result is still `[n1, n3]`
+    # (correct) — so this isolates the rightmost-drop case from the
+    # any-branch-drop case the previous test covers.
+    graph = _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3"],
+            "label__N": [True, True, True],
+            "x": [1, 2, 3],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+    result = graph.gfql("MATCH (n:N) WHERE n.x = 1 OR n.x = 1 OR n.x = 3 RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n1", "n3"]
+
+
+def _de_morgan_fixture_graph() -> _CypherTestGraph:
+    # 4-row fixture covering all (x∈{1,2}, y∈{2,3}) combos.
+    return _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3", "n4"],
+            "label__N": [True, True, True, True],
+            "x": [1, 1, 2, 2],
+            "y": [2, 3, 2, 3],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+
+@pytest.mark.parametrize("compound,distributed,expected", [
+    # NOT(A OR B) ≡ NOT(A) AND NOT(B) — both forms must return {n4}
+    (
+        "MATCH (n:N) WHERE NOT (n.x = 1 OR n.y = 2) RETURN n.id AS id",
+        "MATCH (n:N) WHERE NOT n.x = 1 AND NOT n.y = 2 RETURN n.id AS id",
+        ["n4"],
+    ),
+    # NOT(A AND B) ≡ NOT(A) OR NOT(B) — both forms must return {n2,n3,n4}
+    (
+        "MATCH (n:N) WHERE NOT (n.x = 1 AND n.y = 2) RETURN n.id AS id",
+        "MATCH (n:N) WHERE NOT n.x = 1 OR NOT n.y = 2 RETURN n.id AS id",
+        ["n2", "n3", "n4"],
+    ),
+])
+def test_string_cypher_executes_de_morgan_compositions(
+    compound: str, distributed: str, expected: List[str],
+) -> None:
+    # Each NOT-of-compound and its De-Morganed equivalent must return the
+    # same row set AND that row set must equal the hardcoded expected.
+    graph = _de_morgan_fixture_graph()
+
+    compound_result = graph.gfql(compound)
+    distributed_result = graph.gfql(distributed)
+    compound_ids = sorted(row["id"] for row in compound_result._nodes.to_dict(orient="records"))
+    distributed_ids = sorted(row["id"] for row in distributed_result._nodes.to_dict(orient="records"))
+
+    assert compound_ids == expected
+    assert distributed_ids == expected
+    assert compound_ids == distributed_ids  # De Morgan equivalence
+
+
+def test_string_cypher_executes_xor_with_null_uses_three_valued_logic() -> None:
+    # XOR + IS NULL on the 3VL fixture. IS NULL is deterministic
+    # (NaN → TRUE, non-null → FALSE; no NULL output), so XOR's NULL
+    # comes only from the comparison branch.
+    #
+    # n1{x=1, y=10}: x=1=T, y IS NULL=F → T XOR F = T → keep
+    # n2{x=2, y=NaN}: x=1=F, y IS NULL=T → F XOR T = T → keep
+    # n3{x=NaN,y=20}: x=1=NULL, y IS NULL=F → NULL XOR F = NULL → drop
+    # n4{x=NaN,y=NaN}: x=1=NULL, y IS NULL=T → NULL XOR T = NULL → drop
+    graph = _three_valued_logic_fixture_graph()
+
+    result = graph.gfql("MATCH (n:N) WHERE n.x = 1 XOR n.y IS NULL RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n1", "n2"]
+
+
+def test_string_cypher_executes_xor_returns_symmetric_difference() -> None:
+    # Sibling to the OR/AND/NOT runtime locks: XOR(A, B) ≡ (A AND NOT B) OR (NOT A AND B).
+    # Locks pandas-backed evaluator returns the symmetric-difference row set
+    # rather than treating XOR as OR (the boolean_expr_to_text and parse-tree
+    # tests already cover structure; this is the runtime sibling).
+    #
+    # n1{x=1, y=2}: x=1=T, y=2=T → T XOR T = F → drop
+    # n2{x=1, y=3}: x=1=T, y=2=F → T XOR F = T → keep
+    # n3{x=2, y=2}: x=1=F, y=2=T → F XOR T = T → keep
+    # n4{x=2, y=3}: x=1=F, y=2=F → F XOR F = F → drop
+    graph = _de_morgan_fixture_graph()
+
+    result = graph.gfql("MATCH (n:N) WHERE n.x = 1 XOR n.y = 2 RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n2", "n3"]
+
+
+def test_string_cypher_executes_double_negation_returns_original() -> None:
+    # NOT(NOT A) ≡ A. Locks compound-NOT lowering doesn't drop one negation.
+    graph = _de_morgan_fixture_graph()
+
+    plain_result = graph.gfql("MATCH (n:N) WHERE n.x = 1 RETURN n.id AS id")
+    double_neg_result = graph.gfql("MATCH (n:N) WHERE NOT NOT n.x = 1 RETURN n.id AS id")
+    plain_ids = sorted(row["id"] for row in plain_result._nodes.to_dict(orient="records"))
+    double_neg_ids = sorted(row["id"] for row in double_neg_result._nodes.to_dict(orient="records"))
+
+    assert plain_ids == ["n1", "n2"]
+    assert double_neg_ids == plain_ids
+
+
+def test_string_cypher_executes_mixed_string_numeric_and_inside_or() -> None:
+    # `WHERE (n.s > 'a' AND n.x > 0) OR n.x < -1` — exercises the
+    # `_StringAllowingComparisonMixin` (#1217: extended GT/LT/GE/LE/NE
+    # to strings) paired with OR composition. The string GT branch
+    # `n.s > 'a'` is the mixin-specific path; plain EQ on strings was
+    # already supported pre-#1217 and would not exercise the mixin.
+    # Truth table over the 5-row fixture:
+    # n1{s='b', x=5}: ('b'>'a' AND 5>0)=T; T OR (5<-1)=T → keep
+    # n2{s='b', x=-5}: ('b'>'a' AND -5>0)=F; F OR (-5<-1)=T → keep
+    # n3{s='a', x=5}: ('a'>'a' AND 5>0)=F; F OR (5<-1)=F → drop
+    # n4{s='a', x=-5}: ('a'>'a' AND -5>0)=F; F OR (-5<-1)=T → keep
+    # n5{s='a', x=0}: ('a'>'a' AND 0>0)=F; F OR (0<-1)=F → drop
+    graph = _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3", "n4", "n5"],
+            "label__N": [True, True, True, True, True],
+            "s": ["b", "b", "a", "a", "a"],
+            "x": [5, -5, 5, -5, 0],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+    result = graph.gfql(
+        "MATCH (n:N) WHERE (n.s > 'a' AND n.x > 0) OR n.x < -1 RETURN n.id AS id"
+    )
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n1", "n2", "n4"]
+
+
 @pytest.mark.parametrize(
     "query,expected_rows",
     [
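As a quick sanity check outside the test suite, the De Morgan equivalences the parametrization locks can be replayed over the same 4-row (x, y) grid in plain Python. This is a sketch of the fixture logic only, not the GFQL evaluator (no NULLs here, so plain two-valued `not`/`and`/`or` suffices):

```python
from itertools import product

# Same grid as _de_morgan_fixture_graph: all (x in {1,2}, y in {2,3}) combos.
fixture = [{"id": f"n{i + 1}", "x": x, "y": y}
           for i, (x, y) in enumerate(product([1, 2], [2, 3]))]


def matching_ids(pred):
    # Mirror the tests' sorted-id assertion style.
    return sorted(r["id"] for r in fixture if pred(r))


# NOT (A OR B) == NOT A AND NOT B — only n4 {x=2, y=3} survives.
compound = matching_ids(lambda r: not (r["x"] == 1 or r["y"] == 2))
distributed = matching_ids(lambda r: (not r["x"] == 1) and (not r["y"] == 2))
assert compound == distributed == ["n4"]

# NOT (A AND B) == NOT A OR NOT B — everything but n1 {x=1, y=2} survives.
compound2 = matching_ids(lambda r: not (r["x"] == 1 and r["y"] == 2))
distributed2 = matching_ids(lambda r: (not r["x"] == 1) or (not r["y"] == 2))
assert compound2 == distributed2 == ["n2", "n3", "n4"]
```

The expected row sets match the hardcoded `["n4"]` and `["n2", "n3", "n4"]` tuples in the parametrization above.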
