
Commit 4253ef6

lmeyerov and claude authored
test+docs(#1219): residual row-boolean compositional matrix + guardrails (#1227)
* test+docs(#1219): residual row-boolean compositional matrix + guardrails (#1227)

  Worker B / independent-hardening stream of #1219. After #1217's Earley swap surfaced row-boolean shapes (OR/NOT/XOR among row predicates) that LALR rejected, four compositional shapes remained unverified beyond the fixtures #1217 covered. This PR locks empirical correctness across the residual matrix + adds two lightweight guardrail comments.

  ## Compositional matrix tests (test_lowering.py)

  All four shapes verified correct empirically; locked with sorted-id assertions against discriminating fixtures:

  1. **Nullable NOT/OR** — `WHERE NOT n.x = 1 OR n.y IS NULL` against a 4-row fixture mixing real and projected nulls. Locks that the pandas-backed row-evaluator preserves the Cypher 3VL truth table (NULL OR T = T): `{n2, n4}`.
  2. **N-ary OR (3 branches)** — `WHERE n.x = 1 OR n.x = 2 OR n.x = 3`. Locks that the left-associative parse `or(or(=1, =2), =3)` doesn't degenerate under associativity bugs. `{n1, n2, n3}`.
  3. **De Morgan compositions** (parametrized × 4) — both `NOT(A OR B)` ≡ `NOT-A AND NOT-B` and `NOT(A AND B)` ≡ `NOT-A OR NOT-B` against a 4-row fixture covering all (x∈{1,2}, y∈{2,3}) combos. Each form and its De-Morganed equivalent return the same row set.
  4. **Mixed-string-numeric AND inside OR** — `WHERE (n.s = 'a' AND n.x > 0) OR n.x < -1`, exercising `_StringAllowingComparisonMixin` (#1217) paired with OR composition. `{n1, n2, n4}`.

  ## Guardrails

  - `expr_split.py::split_top_level_and` — added load-bearing AND-only docstring with #1219 cross-ref explaining why a sibling `split_top_level_or` would break OR-distributivity-over-join correctness. No future maintainer should accidentally add it without first redesigning topology-aware pushdown safety.
  - `_boolean_expr_text.py::boolean_expr_to_text` — added explicit `if expr.op == "pattern"` branch with docstring. Currently unreachable in production (lift step extracts pattern leaves before the binder walks the tree) but documents the contract: emit the raw pattern source rather than silently falling through to empty string.

  ## Test impact

  Validated on dgx-spark: `graphistry/tests/compute/` → 2524 passed (7 new tests; remaining delta from baseline absorbed by #1224's unrelated additions).

  Closes the residual frontier portion of #1219.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-1 review fixes — discriminating fixtures, equivalence assertions, pattern-op unit test, mirrored guard

  Wave-1 review on c196393 surfaced 3 IMPORTANTs + 5 SUGGESTIONs. Addressed:

  1. N-ary OR test now has a companion (duplicate-leftmost-branch) that isolates the rightmost-drop associativity bug from the any-branch-drop case. Comment rephrased to be honest about what the original test covers (any branch dropped, not specifically rightmost).
  2. De Morgan parametrize restructured: paired (compound, distributed, expected) tuples instead of independent rows. Now asserts:
     - compound matches expected
     - distributed matches expected
     - compound == distributed (the actual De Morgan equivalence)
     Added separate double-negation test (NOT NOT A ≡ A).
  3. New test_boolean_expr_to_text_emits_atom_text_for_pattern_op in test_boolean_expr.py exercises the explicit pattern branch added in c196393. Locks the contract even though the branch is currently unreachable in production (lift step extracts pattern leaves before the binder walks the tree).
  4. Mirrored AND-only guard comment near _split_conjuncts in predicate_pushdown.py — that's where future maintainers actually look when adding pushdown features; the load-bearing rationale stays in expr_split.py's docstring.

  Test counts on dgx-spark: 2525 passed (was 2524 + 1 pattern_op unit test; net 8 new tests in PR diff vs master).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-2 review fixes — accurate docstring, inline pattern, terser comments

  Wave-2 (2a targeted + 2b broad) found 1 IMPORTANT + several SUGGESTIONs.

  IMPORTANT (Wave-2b):
  - expr_split.split_top_level_and docstring overstated the topology argument. Original draft claimed distribute-OR breaks on fan-out topologies; the cross-alias-OR test in test_lowering.py:3206-3211 (added in #1217) explicitly confirms distribute-OR converges to correct row-set semantics on a 2:2 fan-out fixture. The actual reason pushdown leaves OR opaque is that the multi-alias references on the conjunct cause it to be retained post-join. Rewrote docstring to describe the real mechanism + cite the test.

  SUGGESTIONs (Wave-2a + 2b):
  - _ids_for(graph, query) helper inlined to match the established inline-sorted-comprehension style used elsewhere in test_lowering.py (12+ existing call sites). Removes inconsistency within this PR.
  - Pattern-op branch comment in _boolean_expr_text.py compressed from 9 lines to 3 — the verbose explanation duplicated the test docstring.
  - Local-variable assignment for graph.gfql() result kept (matches existing test patterns + works around a Plottable.gfql pyright attribute warning that fires on fluent chains).

  Test counts unchanged: 8 new tests; full gfql suite 1581 passed.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-3b fixes — accurate split-OR rationale + XOR runtime sibling

  Wave-3b from-scratch fresh-eyes review found 1 IMPORTANT + 3 SUGGESTIONs. Addressed:

  - IMPORTANT: split_top_level_and docstring's stated mechanism ('multi-alias refs cause retention post-join') was misleading. Real mechanism: pushdown silently AND-recombines the pushed conjuncts inside PatternMatch.predicates. AND distributes (split + AND-recombine ≡ original); OR does not (split + AND-recombine ≠ original). An OR-aware split would need a UNION-of-pushed-branches recombine path the current pipeline doesn't implement. Rewrote the docstring with the accurate split + AND-recombine rationale.
  - SUGGESTION 3: XOR runtime row-set test (sibling to OR/AND/NOT tests this PR adds). Locks symmetric-difference semantics.

  Skipped:
  - SUGGESTION 4 (NOT IS NULL standalone): nullable_not_or already exercises IS NULL composed with NOT/OR; standalone marginal.
  - SUGGESTION 5 (cuDF parametrize): real scope creep, defer.

  Test counts: 9 new tests; full gfql suite green.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-4 polish — accurate mixin claim, XOR with NULL, mirrored-guard clarification

  Wave-4 (4a targeted + 4b from-scratch) — 0 BLOCKER + 0 IMPORTANT, 9 SUGGESTIONs. Picked up the cheapest valuable ones:

  - Wave-4b S1: mixed-string-numeric AND-inside-OR test claimed to exercise _StringAllowingComparisonMixin but used `n.s = 'a'` (plain EQ, supported pre-#1217). Swapped to `n.s > 'a'` (string GT, the mixin-specific path); flipped fixture s-values to keep the same truth-table outcomes.
  - Wave-4b S3: added test_string_cypher_executes_xor_with_null_uses_three_valued_logic — sibling to the OR/NOT 3VL test, locks NULL XOR T = NULL. Reuses the 3VL fixture from the De Morgan tests.
  - Wave-4a S1: `_split_conjuncts` mirrored guard now names the actual failure mode (`_combine_conjuncts` AND-joins residuals) before pointing to the fuller rationale in expr_split.

  Skipped (out of scope or duplicate):
  - Wave-4b S2: rightmost-only discriminator comment is already honest; test verified correct.
  - Wave-4b S4: cross-alias OR / cross-product fixture — already covered by test_string_cypher_executes_cross_alias_or_returns_correct_union (#1217).
  - Wave-4b S6: optional CHANGELOG bullet for test+docs PR.
  - Wave-4a S2 (OR-analyzer in pushdown_safety.py:58-60): pre-existing unrelated, separate followup.

  Test counts: 10 new tests; full gfql suite green (1583 passed).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+docs(#1219): wave-5 polish — extract 3VL fixture helper, clarify pattern-op unreachability

  Wave-5 formal review (multi-dim per .agents/skills/review/SKILL.md + adversarial pressure-test) found 0 BLOCKER + 0 IMPORTANT, 3 SUGGESTIONs of which 2 were actionable + cheap:

  - Extract `_three_valued_logic_fixture_graph()` helper paralleling `_de_morgan_fixture_graph()` — eliminates byte-identical 4-row NaN-mixed fixture duplication between nullable_not_or and xor_with_null.
  - Clarify pattern-op branch comment in `_boolean_expr_text.py` to name BOTH unreachability paths (top-level AND lift + nested-NOT/OR/XOR E108 rejection) instead of just the first.
  - Adversarial-rejected: malformed-NOT-chain test would be a tautology (right-recursive grammar makes depth-N equivalent to depth-2; existing double-negation test already exercises the path).

  Test counts unchanged: 11 new tests; full gfql suite 1583 passed.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): add #1227 row-boolean residual matrix entry

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
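The 3VL behaviors the commits lock (NULL OR T = T, NULL XOR T = NULL, NOT NULL = NULL) are Kleene three-valued logic. As a standalone illustration — using pandas's nullable `boolean` dtype, not the evaluator's actual NaN-bearing float columns the fixtures use — the relevant truth-table corners are:

```python
import pandas as pd

# pd.NA plays the role of Cypher NULL; pandas's BooleanArray implements
# Kleene logic for &, |, ^ on the nullable "boolean" dtype.
a = pd.array([True, False, pd.NA], dtype="boolean")

print((a | True).tolist())   # NULL OR T = T   -> [True, True, True]
print((a | False).tolist())  # NULL OR F = NULL -> [True, False, <NA>]
print((a ^ True).tolist())   # NULL XOR T = NULL -> [False, True, <NA>]
print((~a).tolist())         # NOT NULL = NULL -> [False, True, <NA>]

# WHERE semantics: only rows whose predicate is definitely True survive,
# so NULL results are dropped the same as False.
mask = (a | False).fillna(False)
print(mask.tolist())         # [True, False, False]
```

The `fillna(False)` step mirrors why n3 (`NULL OR F = NULL`) is dropped in the nullable NOT/OR test while n4 (`NULL OR T = T`) is kept.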
1 parent 202abd3 commit 4253ef6

6 files changed

Lines changed: 253 additions & 1 deletion


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 - **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).
 
 ### Internal
+- **GFQL / Cypher row-boolean residual matrix + guardrails (#1219 hardening)**: Locks compositional row-boolean WHERE shapes that #1217's Earley swap admitted but its initial test surface didn't cover. Adds 11 native tests: nullable NOT/OR over a 4-row 3VL fixture (`NULL OR T = T`); N-ary OR (3 branches) + duplicate-branch companion isolating rightmost-drop associativity bugs; De Morgan equivalences (`NOT (A OR B)` ≡ `NOT A AND NOT B`; `NOT (A AND B)` ≡ `NOT A OR NOT B`) parametrized to assert both per-form expected rows AND the form-equivalence; double negation; XOR symmetric difference + XOR with NULL preserving 3VL; mixed-string-numeric AND inside OR exercising `_StringAllowingComparisonMixin` GT path; unit test locking `boolean_expr_to_text(BooleanExpr(op="pattern", ...))` round-trip for the (currently unreachable) defensive branch. Three docstring guardrails: `expr_split.split_top_level_and` documents AND-only intent + the `_combine_conjuncts` AND-recombine mechanism that makes a hypothetical `split_top_level_or` silently incorrect; `predicate_pushdown._split_conjuncts` mirrored guard naming the failure mode; `_boolean_expr_text.boolean_expr_to_text` explicit `op == "pattern"` branch with both unreachability paths documented. No production-code behavior change. Closes the residual-frontier portion of #1219; deeper compositional shapes beyond current fixtures remain tracked under that issue (#1219, #1227).
 - **GFQL / Cypher parser + ast_normalizer — multi-positive WHERE pattern predicates (#1031 slice 3)**: AND-joined positive WHERE pattern predicates (`WHERE (n)-[:R]->() AND (n)-[:T]->()`) now lift into structured `WhereClause.predicates` as N `WherePatternPredicate` entries. The ast_normalizer packs them into a single appended `MatchClause` whose `patterns: Tuple[Tuple[PatternElement, ...], ...]` carries one tuple per predicate (multi-pattern cartesian within MATCH), preserving the lowering invariant that only the FINAL match is connected — pre-binding seeds remain node-only. Per-predicate validation (must include a relationship; cannot introduce new aliases) runs independently before the lift. Removes the legacy `len(pattern_leaves) > 1` gate in `parser.py::_build_where_with_pattern_lift` and the corresponding gate in `ast_normalizer._rewrite_where_pattern_predicates_to_matches`. Refactors `pattern_atom` to split the greedy `WHERE_PATTERN` lexer token (which gobbles `pattern AND pattern AND ...` chains as a single match) back into individual pattern-item texts via `_WHERE_PATTERN_ITEM_RE.finditer` and emit one `BooleanExpr(op="pattern")` per item, joined by an AND-tree via `_rebuild_and_tree`. Adds `test_gfql_executes_multi_positive_where_pattern_predicates_as_intersected_seed` and updates the legacy rejection test to assert the new lift + compile shape. Closes #1031 slice 3 (#1031).
 - **GFQL / Cypher lowering**: Connected `MATCH + OPTIONAL MATCH` compilation now supports row-boolean `WHERE` expressions (`OR`/`NOT`/`XOR` and mixed row predicates) by carrying non-lowerable expressions into post-binding `where_rows(...)` filters for base and optional arms, preserving null-extension behavior while expanding supported disjunction shapes (#1219, #1224).
 - **GFQL / Cypher parser**: switched the Cypher parser from Lark's LALR(1) backend to Earley. *Earley's broader unification incidentally lifts the implicit LALR rejection on row-side OR/NOT/XOR among row predicates. Coverage validated empirically across the risky shapes available to current fixtures: simple homogeneous AND/OR/NOT (correct rows); cross-alias OR with predicate-pushdown candidates (correct union — pushdown leaves the OR intact past the join); OPTIONAL MATCH + WHERE OR (the pre-existing OPTIONAL-MATCH-projection validator gates the projection shape regardless of WHERE — including the OR variant); type-coerced OR against a mixed-type Series (the call executor wraps pandas's `TypeError` as `GFQLTypeError(E303)` via the generic unsupported-row-expression path). No silent wrong-rows surfaced in the shapes exercised; deeper compositional shapes (NOT inside OR with nullable arms, N-ary OR associativity, mixed-string-numeric AND inside OR, De Morgan compositions) are tracked under #1219.* This eliminates four LALR-induced workarounds: the 3 dedicated pattern-shape `where_clause` grammar alternatives (now collapsed into a single `WHERE_PATTERN -> pattern_atom` leaf in `?primary`), `_canonicalize_where_single_pattern_and_expr` (regex source-rewrite that reordered `expr AND pattern AND expr` to `pattern AND <rest>` so LALR could match), `_mixed_where_pattern_expr_error` (pre-flight rejector replaced with a structural lift in `generic_where_clause`), and the `parse_cypher` `except LarkError` retry block. `BooleanExpr.op` literal extended with `"pattern"` plus a new `BooleanExpr.pattern` payload field; the `pattern_atom` transformer wraps `WHERE_PATTERN` tokens as boolean-tree leaves; `_split_top_level_and_pattern_leaves` + `_rebuild_and_tree` + `_build_where_with_pattern_lift` extract pattern leaves from `expr_tree` into `WhereClause.predicates` as `WherePatternPredicate` entries before lowering. Strict-improvement consequences (Earley accepts what LALR rejected): `WHERE expr OR expr` now parses as a structured `or` tree; `WHERE expr AND (expr OR expr)` parses as `and(left, or(...))`; `WHERE n:Label AND n.prop = X` routes through structured `where_predicates`; mixed label+property+string-comparison shapes work via the paired `_StringAllowingComparisonMixin` fix in `comparison.py`. Slice 2/3/4 of #1200 territory (NOT-pattern, multi-positive-pattern, OR/XOR-around-pattern) emit explicit `unsupported` errors at the lift step. 1551 GFQL tests pass; the matching `tck-gfql` branch (`issue-1031-grammar-mixed-where-pattern-expr`) carries 8 paired contract refinements: `match-where1-10` lifted to UNEXPECTED_SUCCESS; `match-where5-{1,2,3}` + `expr-comparison2-1` + `with-where5-3` migrated to a new TYPE_ERROR_KEYS bucket (string-comparison-on-mixed-Series wraps as `GFQLTypeError(E303)`); `match-where5-4` upgraded to MATCHES_EXPECTED (Earley + 3-valued OR makes `WHERE i.var > 'te' OR i.var IS NOT NULL` semantically correct); `with2-1` filed under WRONG_ROW_KEYS (WITH-pipelined join now parses + executes but rows differ from oracle — separate gap); `with-where5-3` demoted from PROMOTION_ROW_KEYS (#1031, #1217).

graphistry/compute/gfql/cypher/_boolean_expr_text.py

Lines changed: 7 additions & 0 deletions
@@ -58,6 +58,13 @@ def boolean_expr_to_text(expr: BooleanExpr) -> str:
     """
     if expr.op == "atom":
         return expr.atom_text or ""
+    if expr.op == "pattern":
+        # Unreachable today: top-level AND leaves are lifted out by
+        # ``_split_top_level_and_pattern_leaves`` before the binder walks
+        # the tree, and patterns nested under NOT/OR/XOR are rejected
+        # earlier with E108 errors. Contract for the defensive branch:
+        # emit raw pattern source for round-trippability.
+        return expr.atom_text or ""
     if expr.op == "not":
         operand = boolean_expr_to_text(expr.left) if expr.left is not None else ""
         if expr.left is not None and expr.left.op != "atom":

graphistry/compute/gfql/expr_split.py

Lines changed: 15 additions & 0 deletions
@@ -25,6 +25,21 @@ def split_top_level_and(expr: str) -> Tuple[str, ...]:
     them do not split. Leading and trailing whitespace on each term is
     stripped.
 
+    **AND-only by design.** Do NOT add a sibling ``split_top_level_or``.
+    The pushdown pipeline (``predicate_pushdown._push_filter_into_pattern``)
+    splits a filter into conjuncts, decides per-conjunct whether to
+    push, and silently AND-combines the pushed conjuncts inside
+    ``PatternMatch.predicates``. That contract is correct for AND
+    (split + AND-recombine is the identity on the original AND-tree)
+    but wrong for OR: splitting ``a.x = 1 OR b.y = 2`` and AND-recombining
+    yields ``a.x = 1 AND b.y = 2``, which is a strict subset of the
+    correct answer. An OR-aware split would need a UNION-of-pushed-
+    branches recombine path (with row-multiplicity / dedup logic) that
+    the current pipeline does not implement. See #1219 for the design
+    space. Cross-alias OR conjuncts route through the per-pattern
+    pushdown intact today (verified by
+    ``test_string_cypher_executes_cross_alias_or_returns_correct_union``).
+
     :param expr: The expression text to split (typically a WHERE body).
     :returns: A tuple of non-empty terms. ``()`` when *expr* is empty,
         whitespace-only, has a leading/trailing top-level ``AND``, or

graphistry/compute/gfql/passes/predicate_pushdown.py

Lines changed: 6 additions & 1 deletion
@@ -129,7 +129,12 @@ def _optional_arm_aliases(pattern: PatternMatch) -> FrozenSet[str]:
 
 
 def _split_conjuncts(predicate: BoundPredicate) -> List[BoundPredicate]:
-    """Split ``A AND B`` into top-level conjunct predicates."""
+    """Split ``A AND B`` into top-level conjunct predicates.
+
+    AND-only — splitting an OR here would be silently wrong because
+    ``_combine_conjuncts`` (below) AND-joins residuals. See
+    ``expr_split.split_top_level_and`` docstring for the full rationale.
+    """
     expression = predicate.expression.strip()
     if not expression:
         return []

graphistry/tests/compute/gfql/cypher/test_boolean_expr.py

Lines changed: 25 additions & 0 deletions
@@ -219,3 +219,28 @@ def test_literal_boolean_atoms_known_limitation_python_style_text() -> None:
     assert tree.left.atom_text == "True"  # known limitation — see docstring
     # Right operand is a comparable with a Lark Tree span — accurate slice.
     assert tree.right is not None and tree.right.atom_text == "n.x > 1"
+
+
+# ---------------------------------------------------------------------------
+# boolean_expr_to_text contract for the op == "pattern" branch
+# ---------------------------------------------------------------------------
+
+
+def test_boolean_expr_to_text_emits_atom_text_for_pattern_op() -> None:
+    # Pattern leaves are normally lifted out of expr_tree by
+    # _split_top_level_and_pattern_leaves before the binder walks the
+    # tree, so this branch is unreachable in production. The unit test
+    # locks the contract explicitly so a future code path that DOES
+    # reach boolean_expr_to_text with a pattern leaf gets the raw
+    # pattern source rather than the empty-string fallthrough.
+    from graphistry.compute.gfql.cypher.ast import SourceSpan
+    from graphistry.compute.gfql.cypher._boolean_expr_text import boolean_expr_to_text
+
+    span = SourceSpan(line=1, column=1, end_line=1, end_column=10, start_pos=0, end_pos=10)
+    pattern_leaf = BooleanExpr(
+        op="pattern",
+        span=span,
+        atom_text="(a)-->(b)",
+        atom_span=span,
+    )
+    assert boolean_expr_to_text(pattern_leaf) == "(a)-->(b)"

graphistry/tests/compute/gfql/cypher/test_lowering.py

Lines changed: 199 additions & 0 deletions
@@ -3324,6 +3324,205 @@ def test_string_cypher_executes_homogeneous_or_returns_correct_union() -> None:
     assert ids == ["n1", "n3"]
 
 
+# Compositional row-boolean shapes (#1219 residual matrix). Each shape locks
+# Cypher 3VL semantics + boolean-tree composition correctness against
+# fixtures designed to discriminate against subtle bugs.
+
+
+def _three_valued_logic_fixture_graph() -> _CypherTestGraph:
+    # 4-row fixture mixing actual and projected NaN over (x, y) for 3VL tests.
+    return _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3", "n4"],
+            "label__N": [True, True, True, True],
+            "x": [1.0, 2.0, float("nan"), float("nan")],
+            "y": [10.0, float("nan"), 20.0, float("nan")],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+
+def test_string_cypher_executes_nullable_not_or_uses_three_valued_logic() -> None:
+    # `WHERE NOT n.x = 1 OR n.y IS NULL` against a fixture mixing actual
+    # and projected nulls. Cypher 3VL truth table:
+    # n1{x=1, y=10}: NOT(1=1)=F, y IS NULL=F → F OR F = F → drop
+    # n2{x=2, y=NaN}: NOT(2=1)=T, y IS NULL=T → T OR T = T → keep
+    # n3{x=NaN,y=20}: NOT(NaN=1)=NULL, y IS NULL=F → NULL OR F = NULL → drop
+    # n4{x=NaN,y=NaN}: NOT(NaN=1)=NULL, y IS NULL=T → NULL OR T = T → keep
+    # Locks that the pandas-backed row-evaluator preserves NULL OR T = T.
+    graph = _three_valued_logic_fixture_graph()
+
+    result = graph.gfql("MATCH (n:N) WHERE NOT n.x = 1 OR n.y IS NULL RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n2", "n4"]
+
+
+def test_string_cypher_executes_nary_or_returns_full_union() -> None:
+    # `WHERE n.x = 1 OR n.x = 2 OR n.x = 3` — three OR branches against a
+    # 5-row fixture where each value matches a unique row. Locks that the
+    # binder's parse evaluates ALL branches; silently dropping any one
+    # branch yields a 2-row result and fails the assertion.
+    graph = _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3", "n4", "n5"],
+            "label__N": [True, True, True, True, True],
+            "x": [1, 2, 3, 4, 5],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+    result = graph.gfql("MATCH (n:N) WHERE n.x = 1 OR n.x = 2 OR n.x = 3 RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n1", "n2", "n3"]
+
+
+def test_string_cypher_executes_nary_or_with_duplicate_branch_locks_specific_associativity() -> None:
+    # Companion to test_string_cypher_executes_nary_or_returns_full_union:
+    # `WHERE n.x = 1 OR n.x = 1 OR n.x = 3` has a duplicated leftmost
+    # branch. If the binder silently dropped the rightmost branch under
+    # an associativity bug, the result would be `[n1]` only. If it
+    # silently dropped one of the duplicates, the result is still `[n1, n3]`
+    # (correct) — so this isolates the rightmost-drop case from the
+    # any-branch-drop case the previous test covers.
+    graph = _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3"],
+            "label__N": [True, True, True],
+            "x": [1, 2, 3],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+    result = graph.gfql("MATCH (n:N) WHERE n.x = 1 OR n.x = 1 OR n.x = 3 RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n1", "n3"]
+
+
+def _de_morgan_fixture_graph() -> _CypherTestGraph:
+    # 4-row fixture covering all (x∈{1,2}, y∈{2,3}) combos.
+    return _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3", "n4"],
+            "label__N": [True, True, True, True],
+            "x": [1, 1, 2, 2],
+            "y": [2, 3, 2, 3],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+
+@pytest.mark.parametrize("compound,distributed,expected", [
+    # NOT(A OR B) ≡ NOT(A) AND NOT(B) — both forms must return {n4}
+    (
+        "MATCH (n:N) WHERE NOT (n.x = 1 OR n.y = 2) RETURN n.id AS id",
+        "MATCH (n:N) WHERE NOT n.x = 1 AND NOT n.y = 2 RETURN n.id AS id",
+        ["n4"],
+    ),
+    # NOT(A AND B) ≡ NOT(A) OR NOT(B) — both forms must return {n2,n3,n4}
+    (
+        "MATCH (n:N) WHERE NOT (n.x = 1 AND n.y = 2) RETURN n.id AS id",
+        "MATCH (n:N) WHERE NOT n.x = 1 OR NOT n.y = 2 RETURN n.id AS id",
+        ["n2", "n3", "n4"],
+    ),
+])
+def test_string_cypher_executes_de_morgan_compositions(
+    compound: str, distributed: str, expected: List[str],
+) -> None:
+    # Each NOT-of-compound and its De-Morganed equivalent must return the
+    # same row set AND that row set must equal the hardcoded expected.
+    graph = _de_morgan_fixture_graph()
+
+    compound_result = graph.gfql(compound)
+    distributed_result = graph.gfql(distributed)
+    compound_ids = sorted(row["id"] for row in compound_result._nodes.to_dict(orient="records"))
+    distributed_ids = sorted(row["id"] for row in distributed_result._nodes.to_dict(orient="records"))
+
+    assert compound_ids == expected
+    assert distributed_ids == expected
+    assert compound_ids == distributed_ids  # De Morgan equivalence
+
+
+def test_string_cypher_executes_xor_with_null_uses_three_valued_logic() -> None:
+    # XOR + IS NULL on the 3VL fixture. IS NULL is deterministic
+    # (NaN → TRUE, non-null → FALSE; no NULL output), so XOR's NULL
+    # comes only from the comparison branch.
+    #
+    # n1{x=1, y=10}: x=1=T, y IS NULL=F → T XOR F = T → keep
+    # n2{x=2, y=NaN}: x=1=F, y IS NULL=T → F XOR T = T → keep
+    # n3{x=NaN,y=20}: x=1=NULL, y IS NULL=F → NULL XOR F = NULL → drop
+    # n4{x=NaN,y=NaN}: x=1=NULL, y IS NULL=T → NULL XOR T = NULL → drop
+    graph = _three_valued_logic_fixture_graph()
+
+    result = graph.gfql("MATCH (n:N) WHERE n.x = 1 XOR n.y IS NULL RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n1", "n2"]
+
+
+def test_string_cypher_executes_xor_returns_symmetric_difference() -> None:
+    # Sibling to the OR/AND/NOT runtime locks: XOR(A, B) ≡ (A AND NOT B) OR (NOT A AND B).
+    # Locks pandas-backed evaluator returns the symmetric-difference row set
+    # rather than treating XOR as OR (the boolean_expr_to_text and parse-tree
+    # tests already cover structure; this is the runtime sibling).
+    #
+    # n1{x=1, y=2}: x=1=T, y=2=T → T XOR T = F → drop
+    # n2{x=1, y=3}: x=1=T, y=2=F → T XOR F = T → keep
+    # n3{x=2, y=2}: x=1=F, y=2=T → F XOR T = T → keep
+    # n4{x=2, y=3}: x=1=F, y=2=F → F XOR F = F → drop
+    graph = _de_morgan_fixture_graph()
+
+    result = graph.gfql("MATCH (n:N) WHERE n.x = 1 XOR n.y = 2 RETURN n.id AS id")
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n2", "n3"]
+
+
+def test_string_cypher_executes_double_negation_returns_original() -> None:
+    # NOT(NOT A) ≡ A. Locks compound-NOT lowering doesn't drop one negation.
+    graph = _de_morgan_fixture_graph()
+
+    plain_result = graph.gfql("MATCH (n:N) WHERE n.x = 1 RETURN n.id AS id")
+    double_neg_result = graph.gfql("MATCH (n:N) WHERE NOT NOT n.x = 1 RETURN n.id AS id")
+    plain_ids = sorted(row["id"] for row in plain_result._nodes.to_dict(orient="records"))
+    double_neg_ids = sorted(row["id"] for row in double_neg_result._nodes.to_dict(orient="records"))
+
+    assert plain_ids == ["n1", "n2"]
+    assert double_neg_ids == plain_ids
+
+
+def test_string_cypher_executes_mixed_string_numeric_and_inside_or() -> None:
+    # `WHERE (n.s > 'a' AND n.x > 0) OR n.x < -1` — exercises the
+    # `_StringAllowingComparisonMixin` (#1217: extended GT/LT/GE/LE/NE
+    # to strings) paired with OR composition. The string GT branch
+    # `n.s > 'a'` is the mixin-specific path; plain EQ on strings was
+    # already supported pre-#1217 and would not exercise the mixin.
+    # Truth table over the 5-row fixture:
+    # n1{s='b', x=5}: ('b'>'a' AND 5>0)=T; T OR (5<-1)=T → keep
+    # n2{s='b', x=-5}: ('b'>'a' AND -5>0)=F; F OR (-5<-1)=T → keep
+    # n3{s='a', x=5}: ('a'>'a' AND 5>0)=F; F OR (5<-1)=F → drop
+    # n4{s='a', x=-5}: ('a'>'a' AND -5>0)=F; F OR (-5<-1)=T → keep
+    # n5{s='a', x=0}: ('a'>'a' AND 0>0)=F; F OR (0<-1)=F → drop
+    graph = _mk_graph(
+        pd.DataFrame({
+            "id": ["n1", "n2", "n3", "n4", "n5"],
+            "label__N": [True, True, True, True, True],
+            "s": ["b", "b", "a", "a", "a"],
+            "x": [5, -5, 5, -5, 0],
+        }),
+        pd.DataFrame({"s": [], "d": []}),
+    )
+
+    result = graph.gfql(
+        "MATCH (n:N) WHERE (n.s > 'a' AND n.x > 0) OR n.x < -1 RETURN n.id AS id"
+    )
+
+    ids = sorted(row["id"] for row in result._nodes.to_dict(orient="records"))
+    assert ids == ["n1", "n2", "n4"]
+
+
 @pytest.mark.parametrize(
     "query,expected_rows",
     [
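As a quick sanity check outside the test suite, the De Morgan equivalences the parametrization locks can be replayed over the same 4-row (x, y) grid in plain Python. This is a sketch of the fixture logic only, not the GFQL evaluator (no NULLs here, so plain two-valued `not`/`and`/`or` suffices):

```python
from itertools import product

# Same grid as _de_morgan_fixture_graph: all (x in {1,2}, y in {2,3}) combos.
fixture = [{"id": f"n{i + 1}", "x": x, "y": y}
           for i, (x, y) in enumerate(product([1, 2], [2, 3]))]


def matching_ids(pred):
    # Mirror the tests' sorted-id assertion style.
    return sorted(r["id"] for r in fixture if pred(r))


# NOT (A OR B) == NOT A AND NOT B — only n4 {x=2, y=3} survives.
compound = matching_ids(lambda r: not (r["x"] == 1 or r["y"] == 2))
distributed = matching_ids(lambda r: (not r["x"] == 1) and (not r["y"] == 2))
assert compound == distributed == ["n4"]

# NOT (A AND B) == NOT A OR NOT B — everything but n1 {x=1, y=2} survives.
compound2 = matching_ids(lambda r: not (r["x"] == 1 and r["y"] == 2))
distributed2 = matching_ids(lambda r: (not r["x"] == 1) or (not r["y"] == 2))
assert compound2 == distributed2 == ["n2", "n3", "n4"]
```

The expected row sets match the hardcoded `["n4"]` and `["n2", "n3", "n4"]` tuples in the parametrization above.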
