Skip to content

Commit c069e8f

Browse files
authored
fix(gfql): harden optional-arm OR null-safety token detection (#1219) (#1286)
1 parent bf9755c commit c069e8f

4 files changed

Lines changed: 49 additions & 3 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
3333
- **GFQL / Cypher lowering — #1273 acceptance slice admits scalar multi-alias WITH shape A**: First-stage bounded reentry now admits `MATCH ... WITH a.id AS a_id, b.id AS b_id ... MATCH ...` multi-alias scalar/property projections (shape A) while keeping whole-row multi-alias carries (for example `WITH a, b`) outside the admitted lane. The admission gate is explicitly narrowed to scalar/property projections only, so known TCK xfail contract shape `WITH n, x` (`with-where3-3`) remains fail-fast and unchanged pending broader #1273 closure. Tests were converted to positive execution assertions for the admitted shape, and remaining rejected shapes continue to point explicitly to #1273.
3434
- **GFQL / Cypher row-boolean residual topology amplification (#1219)**: Added two cross-alias OR stress tests in `test_lowering.py` for previously under-covered topology classes: (1) disconnected cartesian-product `MATCH (a:A), (b:B)` with `WHERE a.x = 1 OR b.y = 2`, and (2) broader two-hop fanout `MATCH (a:A)-->(m:M)-->(b:B)` with the same disjunction. These lock row-wise disjunction semantics over wider join/fanout shapes on top of the existing 2x2 cross-alias union test.
3535
- **GFQL / predicate pushdown safety — quote-aware null-safety classification hardening (#1219)**: `is_null_rejecting()` now masks quoted/backticked segments before scanning for boolean operators and null-safe forms (`IS NULL`, `IS NOT NULL`, `COALESCE`, `NULLIF`). This closes false-safe substring matches like `n.note = 'x is not null'` / `'coalesce('` that could previously be misclassified as null-safe for OPTIONAL-arm pushdown. Added regression coverage in `test_pushdown_safety.py` and pass-level lock in `test_predicate_pushdown_pass.py` to ensure quoted literal text does not trigger null-safe admission.
36+
- **GFQL / predicate pushdown safety — whitespace-delimited OR detection hardening (#1219)**: `is_null_rejecting()` now normalizes unquoted whitespace and detects boolean operators via word-boundary token checks instead of exact `" and "` / `" or "` substrings. This closes a false-safe gap where newline/tab-delimited OR compounds (for example `n IS NULL\\nOR n.age > 5`) could bypass conservative OR handling and be incorrectly pushed into OPTIONAL arms. Added regression coverage in `test_pushdown_safety.py` plus a pass-level lock in `test_predicate_pushdown_pass.py`.
3637
- **GFQL / Cypher lowering — runtime carry forwarding through trailing WITH narrowing (#1272, #999 partial)**: Fixed the IC5 sub-A collect/unwind reentry residual where carried scalars could degrade to null/NaN across a trailing `WITH d, bid` stage. Hidden reentry-property references are now excluded from one-source alias guard counting while still forcing bindings-row lowering for stages that reference hidden carry columns, so carry fields survive narrowing and remain readable in subsequent RETURN/ORDER BY. Updated regression coverage by converting the two #1272 failfast locks into positive execution assertions for both plain trailing-WITH and ORDER BY/LIMIT variants.
3738
- **GFQL / Cypher lowering — residual multi-source row-projection failfast now points to issue #1273**: Remaining unsupported one-MATCH-source row-lowering residuals now raise the same `E108` boundary with an explicit `#1273` pointer so rejected shapes route directly to the tracked closure lane. Updated rejection-lock tests assert the issue pointer is present in emitted validation messages.
3839
- **GFQL / Cypher lowering — admit simple free-form intermediate MATCH after WITH (#1263, #999 partial)**: Lifts the failfast at the `first_alias != reentry_alias` site of `_compile_bounded_reentry_query` for the conservative case: trailing MATCH whose first alias is not in the prefix WITH's carried whole-row set, with no carried-alias property reference in the trailing scope. Three-commit shape: (1) IR adds `ReentryPlan.free_form: bool = False` to mark the new shape (`graphistry/compute/gfql/cypher/reentry_plan.py`); (2) compile sets `reentry_alias = first_alias` and `non_source_alias_names = whole_row_carried` so the trailing MATCH compiles as a regular fresh-MATCH and the existing property-carry rewriter (#1248) treats every carried alias as non-source (`graphistry/compute/gfql/cypher/lowering.py`); (3) runtime adds `_compiled_query_freeform_reentry_state` in `gfql_unified.py` — for a single-prefix-row, broadcasts every `__cypher_reentry_*` hidden column from the prefix row onto every base node and returns `start_nodes=None` so the suffix runs as a global MATCH and the row pipeline carries the broadcast values through whichever alias the trailing MATCH binds. Compile-time failfast added for the carried-property variant (`free_form` AND any `<carried>.<prop>` ref in trailing scope) — the existing demote (#1071) + property-rewriter (#1248) chain double-wraps the synthesized hidden column name, and closing that case requires a rewrite-order refactor that lands as its own focused follow-up. Runtime-side multi-prefix-row free-form raises a clear failfast pointing at the multi-row follow-up. Tests retargeted: `..._failfast_rejects_intermediate_reentry_match_with_no_carried_source` → `..._failfast_rejects_intermediate_reentry_match_with_carried_property_in_trailing_where`; `..._failfast_rejects_simple_freeform_intermediate_reentry_match` → `..._executes_simple_freeform_intermediate_reentry_match`. Adds cuDF parity test and a multi-prefix-row failfast regression. Three TCK xfail-contract scenarios admitted to `MATCHES_EXPECTED` (`expr-typeconversion2-7`, `expr-typeconversion3-5`, `expr-typeconversion4-7` — all `MATCH (single-node) WITH * MATCH (n) RETURN <typecast>(n.<prop>)` shape) (#1263, #999, #989).

graphistry/compute/gfql/ir/pushdown_safety.py

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
"""
1515
from __future__ import annotations
1616

17+
import re
1718
from typing import FrozenSet, Sequence
1819

1920
from graphistry.compute.gfql.ir.bound_ir import ScopeFrame
@@ -62,6 +63,11 @@ def _mask_quoted_and_backticked(expr: str) -> str:
6263
return "".join(chars)
6364

6465

66+
def _normalize_unquoted_whitespace(expr: str) -> str:
67+
"""Collapse unquoted whitespace runs for stable token/substr checks."""
68+
return re.sub(r"\s+", " ", expr).strip()
69+
70+
6571
def is_null_rejecting(
6672
predicate: BoundPredicate,
6773
null_extended_aliases: FrozenSet[str],
@@ -96,14 +102,16 @@ def is_null_rejecting(
96102
return False
97103
if not predicate.expression:
98104
return True
99-
expr_lower = _mask_quoted_and_backticked(predicate.expression.lower())
105+
expr_lower = _normalize_unquoted_whitespace(
106+
_mask_quoted_and_backticked(predicate.expression.lower())
107+
)
100108
# AND compounds are always null-rejecting: even if one conjunct is null-safe,
101109
# the other may not be, and True AND NULL = NULL (row filtered).
102-
if " and " in expr_lower:
110+
if re.search(r"\band\b", expr_lower):
103111
return True
104112
# OR compounds are conservatively treated as null-rejecting; we do not
105113
# perform disjunct-level alias analysis in this helper.
106-
if " or " in expr_lower:
114+
if re.search(r"\bor\b", expr_lower):
107115
return True
108116
for form in _NULL_SAFE_FORMS:
109117
if form in expr_lower:

graphistry/tests/compute/gfql/test_predicate_pushdown_pass.py

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -152,6 +152,31 @@ def test_predicate_pushdown_keeps_or_with_null_safe_subexpression_on_optional_ar
152152
assert result.predicate.expression == "n IS NOT NULL OR m.flag = 1"
153153

154154

155+
def test_predicate_pushdown_keeps_newline_delimited_or_on_optional_arm() -> None:
156+
input_scan = NodeScan(op_id=1, label="Person", output_schema=_schema("m"))
157+
optional_match = PatternMatch(
158+
op_id=2,
159+
input=input_scan,
160+
pattern={"aliases": ("n",)},
161+
optional=True,
162+
arm_id="arm1",
163+
output_schema=_schema("m", "n"),
164+
)
165+
plan = Filter(
166+
op_id=3,
167+
input=optional_match,
168+
predicate=_pred("n IS NULL\nOR n.age > 5", frozenset({"n"})),
169+
output_schema=_schema("m", "n"),
170+
)
171+
172+
result = PredicatePushdownPass().run(plan, PlanContext()).plan
173+
174+
assert isinstance(result, Filter)
175+
assert isinstance(result.input, PatternMatch)
176+
assert result.input.predicates == []
177+
assert result.predicate.expression == "n IS NULL\nOR n.age > 5"
178+
179+
155180
def test_predicate_pushdown_does_not_treat_null_keywords_inside_string_literals_as_safe() -> None:
156181
input_scan = NodeScan(op_id=1, label="Person", output_schema=_schema("m"))
157182
optional_match = PatternMatch(

graphistry/tests/compute/gfql/test_pushdown_safety.py

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -127,6 +127,18 @@ def test_or_without_null_safe_form_is_rejecting(self):
127127
frozenset({"n"}),
128128
)
129129

130+
def test_or_with_newline_delimiter_is_rejecting(self):
131+
assert is_null_rejecting(
132+
_pred("n IS NULL\nOR n.age > 5", frozenset({"n"})),
133+
frozenset({"n"}),
134+
)
135+
136+
def test_or_with_tab_delimiter_is_rejecting(self):
137+
assert is_null_rejecting(
138+
_pred("n IS NULL\tOR\tn.age > 5", frozenset({"n"})),
139+
frozenset({"n"}),
140+
)
141+
130142
# --- property-level IS NULL --- #
131143

132144
def test_property_is_null_is_not_rejecting(self):

0 commit comments

Comments
 (0)