From 57622704df7f777610394beb0fb2064c71d637b8 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 May 2026 15:04:27 -0700 Subject: [PATCH 1/4] feat(cypher): forward secondary whole-row carry through chained reentry (#1256) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the chained-reentry portion of slice 4.3d (#1256) so multi-alias whole-row carry survives more than one reentry boundary. Two extensions to `_demote_secondary_whole_row_aliases` (the active rewrite path on master after #1071 superseded slice 4.3b's `_rewrite_multi_whole_row_prefix`): 1. **Forwarding-item drop** — bare-identifier projection items in downstream `WITH` stages whose name is a secondary alias are dropped at compile time *before* `_collect_secondary_property_refs` walks them for bare references. Same intent as slice 4.3c, integrated into the active path. Without this, forwarding patterns like `WITH a, x, friend, ...` tripped the bare-ref failfast even though the property carry already lives as a hidden scalar. 2. **Hidden-column forwarding** — for every `(secondary_alias, prop)` collected from trailing clauses, the synthesized `__cypher_reentry____` hidden alias is appended as a bare passthrough item to every downstream `WITH` stage so each recursive `compile_cypher_query` call sees it as a scalar carry. Without this, the inner compile failed alias resolution on the rewritten hidden identifier once a chained stage narrowed projection scope. New positive tests cover three previously-blocked shapes: * `test_string_cypher_admits_multi_stage_secondary_alias_carry_through_chained_reentry` — chained reentry where the trailing MATCH continues to use the primary * `test_string_cypher_admits_secondary_alias_carry_across_reentry_source_rebinding` — source rebound between boundaries (`(a)-[:R]->(friend) ... (friend)-[:S]->(c)`) * `test_string_cypher_admits_multi_alias_distinct_forwarding_through_reentry` — multi-alias `DISTINCT` forwarding through a single boundary New failfast test pins the remaining gap (free-form intermediate MATCH that does not start from any carried alias — the LDBC SNB IC3 prefix shape) to the existing scoped error so future closure is regression-locked. Validation: * 780 cypher lowering tests pass (was 776; +4 new). * Touched-module mypy clean. * Repro `plans/1256-cross-reentry-boundary-carry/repro/repro_ic3.py`: simple rebinding cases now compile + execute correctly; literal IC3 still fails at the free-form intermediate MATCH gate (separate follow-up). Refs #1256 (closes the chained-reentry portion; free-form intermediate MATCH remains open), #999 (IC3 partial), #989 (row-carrier IR umbrella). Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 1 + graphistry/compute/gfql/cypher/lowering.py | 45 +++++++- .../compute/gfql/cypher/test_lowering.py | 100 ++++++++++++++++++ 3 files changed, 145 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index bf864e1cfe..66e83705c4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm - **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133). ### Internal +- **GFQL / Cypher lowering — secondary whole-row carry survives chained reentry boundaries (#1256, #989, #999 partial)**: Two extensions to `_demote_secondary_whole_row_aliases` in `graphistry/compute/gfql/cypher/lowering.py` so multi-alias carry survives a chained reentry compile. (1) **Forwarding-item drop** — bare-identifier projection items in downstream `WITH` stages whose name is a secondary alias are dropped at compile time (the same intent as slice 4.3c, integrated into the post-#1071 active rewrite path). Without this, a forwarding pattern like `WITH a, x, friend, ...` triggered the bare-ref failfast even though the carry already lives as a hidden scalar on the reentry-source's row table. (2) **Hidden-column forwarding** — for every `(secondary_alias, prop)` reference collected from trailing clauses, the synthesized `__cypher_reentry____` hidden alias is appended as a bare passthrough item to every downstream `WITH` stage so each recursive `compile_cypher_query` call sees it as a scalar carry. Without this, the inner compile failed alias resolution on the rewritten hidden identifier (`Unknown Cypher alias '__cypher_reentry_x_id__' in RETURN clause`) once a chained reentry stage narrowed the projection scope. New positive tests cover (a) chained reentry where the trailing MATCH continues to use the same primary alias, (b) chained reentry where the source is rebound between boundaries (`MATCH (a)-[:R]->(friend) ... MATCH (friend)-[:S]->(c)`), and (c) multi-alias `DISTINCT` forwarding through a single boundary. New failfast test pins the remaining gap — a trailing MATCH that does not start from any carried alias (free-form intermediate MATCH, the LDBC SNB IC3 prefix shape) — to the existing scoped error so future closure of that lane is regression-locked. Closes the chained-reentry portion of #1256; the free-form intermediate-MATCH case remains open follow-up. - **GFQL / Cypher lowering — `ReentryPlan` IR + multi-alias whole-row carry slices (#989, #1026, #999 partial)**: Introduces an explicit `ReentryPlan` + `CarriedAlias` dataclass (`graphistry/compute/gfql/cypher/reentry_plan.py`) as the compile-time contract between a prefix `WITH` stage and the trailing `MATCH`, replacing the implicit handshake spread across tuple returns from `_bounded_reentry_carry_columns`, the `scalar_reentry_alias` / `scalar_reentry_columns` fields on `CompiledCypherExecutionExtras`, and runtime contract re-extraction in `_compiled_query_reentry_contract` (#987 step 1). Plan is exposed via `compiled_query.reentry_plan` and threaded through `_map_terminal_reentry_query` + `_attach_graph_context`. Builds three additional admit slices on top of the #1071 lift: (1) **slice 4.3a** lifts the residual single-whole-row gate at the compile site (`lowering.py`) and runtime contract (`gfql_unified.py`) so `WITH a, x` admits whenever only the trailing-MATCH source alias is referenced downstream — `ReentryPlan.aliases` records all whole-row aliases as `CarriedAlias` entries, with downstream non-source-alias bare references emitting an actionable failfast pointing to #989; (2) **slice 4.3b** adds a compile-time prefix rewrite (`_rewrite_multi_whole_row_prefix`) that turns `WITH a, x` into `WITH a, x.id AS __carry_x__id__` for every property of `x` referenced in trailing clauses (collected via `_collect_non_source_alias_property_refs` walking `WhereClause.expr_tree`, `ProjectionStage.where`, `ReturnItem`, `OrderItem`), and AST-rewrites trailing `.` references to property access on the reentry-alias's hidden column — closes the multi-alias case of `#1026` regression-lock (`MATCH (a), (x) WITH a, x OPTIONAL MATCH (a)-->(b) RETURN x.id, b.id` now executes correctly with left-outer-join semantics, previously raised `GFQLValidationError`); (3) **slice 4.3c** drops bare carried-alias items at compile time when downstream `WITH a, x, y, collect(...)` re-projects them, so the bare-ref failfast does not false-positive on forwarding patterns. Adds 5 new positive/structure tests + 2 failfast-scope tests, retargets one prior `pytest.raises(GFQLValidationError)` regression-lock to a positive row assertion, and adds the multi-alias `OPTIONAL MATCH` regression test `test_issue_1026_multi_alias_with_optional_match_carries_secondary_property`. Cross-reentry-boundary forwarding (carry survival across `MATCH (a)-[:KNOWS]-(friend)`, the IC3 LDBC SNB shape) remains follow-up work tracked as #999 / slice 4.3d (#989, #1026, #999, #987). - **GFQL / Cypher lowering — multi-alias carry through `WITH` before MATCH re-entry (#1071)**: Lifted the residual constraint that `WITH` before a MATCH re-entry must project exactly one whole-row node alias. The local Cypher compiler now accepts patterns like `MATCH (p)-[:KNOWS]-(friend) WITH p, friend MATCH (friend)-[:IS_LOCATED_IN]->(c) RETURN p.firstName, c.name` (LDBC SNB IC1 shape). Implementation pre-rewrites the prefix `WITH` in `_compile_bounded_reentry_query`: the trailing-MATCH primary alias is preserved as the sole whole-row carry; secondary aliases are demoted to scalar property carries via synthesized `S.X AS __cypher_reentry____` items, with downstream `S.X` references rewritten to bare hidden identifiers that compose with the existing single-alias carried-scalar machinery (#1047 / #1068). Returning a secondary alias as a whole-row entity (`RETURN s`) and re-binding a secondary alias as a node variable in the trailing MATCH remain unsupported with precise errors. Two prior failfast tests retargeted to assert correct multi-alias behavior; new tests cover IC1-shape, three-alias carry, and the secondary-whole-row-return rejection. - **GFQL / Cypher lowering — OR/XOR around pattern predicates (#1236, #1031 slice 4)**: Local Cypher `WHERE` boolean trees now support disjunctive compositions that mix pattern-existence leaves with row expressions (for example `pattern OR expr`, `pattern XOR expr`). Lowering rewrites pattern leaves into correlated `semi_apply_mark(...)` row pre-filters that emit boolean marker columns, then evaluates the rewritten boolean expression in `where_rows(...)`. Added direct row-pipeline/safelist coverage for `semi_apply_mark` plus parser/lowering runtime regression tests for OR/XOR-around-pattern query shapes. diff --git a/graphistry/compute/gfql/cypher/lowering.py b/graphistry/compute/gfql/cypher/lowering.py index 9bbb1e901d..0ac0870b99 100644 --- a/graphistry/compute/gfql/cypher/lowering.py +++ b/graphistry/compute/gfql/cypher/lowering.py @@ -7876,9 +7876,23 @@ def rewrite_text(expr: ExpressionText, field: str) -> ExpressionText: where_clause if where_clause is None else _rewrite_where_clause_and_resync(where_clause, rewrite_text, "where") for where_clause in query.reentry_wheres ) + # Slice 4.3d.1 (#1256): Drop bare-identifier projection items that simply + # forward a secondary alias (`WITH a, x, y, collect(...)` pattern in IC3 + # multi-stage chains). Their property carries already live as hidden columns + # on the reentry-source's row table; the bare items are pure forwarding + # noise. Without this pre-clean the bare-ref scanner inside + # `_collect_secondary_property_refs` would fail-fast on what is in fact a + # forwarding pattern, blocking IC3 even after #1248 admits the prefix WITH. + secondary_forwarding_re = re.compile(r"[A-Za-z_][A-Za-z0-9_]*") + cleaned_with_stages_tail = tuple( + _drop_bare_alias_items_from_stage( + stage, secondary_aliases, identifier_re=secondary_forwarding_re + ) + for stage in query.with_stages[1:] + ) rewritten_with_stages_tail = tuple( _rewrite_reentry_projection_stage(stage, rewrite_expr=rewrite_text) - for stage in query.with_stages[1:] + for stage in cleaned_with_stages_tail ) rewritten_unwinds = tuple( replace(unwind, expression=rewrite_text(unwind.expression, "unwind")) @@ -7928,6 +7942,35 @@ def rewrite_text(expr: ExpressionText, field: str) -> ExpressionText: clause=replace(prefix_stage.clause, items=tuple(new_items)), ) + # Slice 4.3d.2 (#1256): forward hidden carry columns through every downstream + # WITH stage so each recursive bounded-reentry compile sees them as scalar + # carries. Without this, the hidden column survives the first boundary (it + # is in the prefix) but the subsequent WITH/RETURN scope-narrowing drops it, + # and references like `RETURN x.id AS xid` (rewritten to a bare hidden + # identifier) fail at the inner compile's alias resolution. + if refs_collected and rewritten_with_stages_tail: + forwarded_items: List[ReturnItem] = [] + for alias_name, prop in sorted(refs_collected): + hidden_alias = _secondary_reentry_hidden_column_name(alias_name, prop) + forwarded_items.append( + ReturnItem( + expression=ExpressionText(text=hidden_alias, span=template_span), + alias=None, + span=template_span, + ) + ) + forwarded_tuple = tuple(forwarded_items) + rewritten_with_stages_tail = tuple( + replace( + stage, + clause=replace( + stage.clause, + items=stage.clause.items + forwarded_tuple, + ), + ) + for stage in rewritten_with_stages_tail + ) + rewritten_query = replace( query, with_stages=(rewritten_prefix_stage,) + rewritten_with_stages_tail, diff --git a/graphistry/tests/compute/gfql/cypher/test_lowering.py b/graphistry/tests/compute/gfql/cypher/test_lowering.py index eb8f969e22..7c63cc2430 100644 --- a/graphistry/tests/compute/gfql/cypher/test_lowering.py +++ b/graphistry/tests/compute/gfql/cypher/test_lowering.py @@ -8006,6 +8006,106 @@ def test_string_cypher_failfast_rejects_multi_whole_row_prefix_when_non_source_a _mk_multi_stage_reentry_graph().gfql(query) +def test_string_cypher_admits_multi_stage_secondary_alias_carry_through_chained_reentry() -> None: + """#1256 slice 4.3d.2: secondary alias carry survives a chained reentry boundary + where the trailing MATCH continues to use the same primary alias. + + `WITH a, x MATCH (a)-[:R]->(friend) WITH a, x, friend MATCH (a)-[:R]->(other) + RETURN other.id, x.id` requires the hidden carry column for `x.id` to be + forwarded explicitly through every downstream WITH stage so each recursive + bounded-reentry compile sees it as a scalar carry. Without slice 4.3d.2, the + inner compile fails alias resolution on the synthesized hidden identifier. + """ + query = ( + "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) " + "WITH a, x " + "MATCH (a)-[:R]->(friend) " + "WITH a, x, friend " + "MATCH (a)-[:R]->(other) " + "WHERE other.id <> friend.id " + "RETURN other.id AS oid, x.id AS xid ORDER BY oid" + ) + result = _mk_multi_stage_reentry_graph().gfql(query) + assert result._nodes.to_dict(orient="records") == [ + {"oid": "b", "xid": "b"}, + {"oid": "e", "xid": "b"}, + ] + + +def test_string_cypher_admits_secondary_alias_carry_across_reentry_source_rebinding() -> None: + """#1256 slice 4.3d.2: secondary alias carry survives a reentry-source rebinding. + + `WITH a, x MATCH (a)-[:R]->(friend) WITH friend, x MATCH (friend)-[:S]->(c) + RETURN c.id, x.id` rebinds the trailing-MATCH source from `a` to `friend` + between the two reentry boundaries. The carry `x.id` lives as a hidden + column on `a`'s row table; after the join with friend, on friend's table; + the second boundary's prefix WITH must continue to forward it as a scalar + so the inner compile can resolve it. + """ + query = ( + "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) " + "WITH a, x " + "MATCH (a)-[:R]->(friend) " + "WITH friend, x " + "MATCH (friend)-[:S]->(c) " + "RETURN c.id AS cid, x.id AS xid ORDER BY cid" + ) + result = _mk_multi_stage_reentry_graph().gfql(query) + assert result._nodes.to_dict(orient="records") == [ + {"cid": "c", "xid": "b"}, + {"cid": "e", "xid": "b"}, + ] + + +def test_string_cypher_admits_multi_alias_distinct_forwarding_through_reentry() -> None: + """#1256 slice 4.3d.1: bare-identifier projection items in downstream + ``WITH a, x, friend`` (pure forwarding through a reentry boundary) are + dropped at compile time so the bare-ref scanner does not false-positive. + + Before slice 4.3d.1, the active rewrite path + (``_demote_secondary_whole_row_aliases``) failed at line ~7900 + ("does not yet support carrying secondary whole-row aliases as whole-row + outputs") because the bare ``x`` in stage[1] was treated as a true USE. + """ + query = ( + "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) " + "WITH a, x " + "MATCH (a)-[:R]->(friend) " + "WITH DISTINCT a, x, friend " + "RETURN friend.id AS fid, x.id AS xid ORDER BY fid" + ) + result = _mk_multi_stage_reentry_graph().gfql(query) + assert result._nodes.to_dict(orient="records") == [ + {"fid": "b", "xid": "b"}, + {"fid": "e", "xid": "b"}, + ] + + +def test_string_cypher_failfast_rejects_intermediate_reentry_match_with_no_carried_source() -> None: + """#1256 slice 4.3d remaining gap: trailing MATCH that does NOT start from a + carried alias (free-form intermediate MATCH) is still unsupported. + + The literal LDBC SNB IC3 query begins ``WITH a, x, y MATCH (city:City) + -[:IS_PART_OF]->(country:Country) ...`` — neither ``city`` nor ``country`` + is a carried alias. Closing this requires admitting the trailing MATCH as + a fresh seed pattern that cross-joins with the carried row table; the + runtime then propagates carries onto the new bindings. This is the + remaining slice 4.3d work tracked under #1256. + """ + query = ( + "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) " + "WITH a, x " + "MATCH (c:C)-[:T]->(d:D) " + "WHERE c.id = x.id OR c.id = 'c' " + "RETURN d.id AS did" + ) + with pytest.raises( + GFQLValidationError, + match=r"(trailing MATCH to start from the same carried node alias|carries non-source whole-row aliases)", + ): + _mk_multi_stage_reentry_graph().gfql(query) + + def test_string_cypher_failfast_rejects_with_match_reentry_multiple_trailing_match_clauses() -> None: query = ( "MATCH (a:A {id: $seed})-[:R]->(b:B) " From 51b10dcf5ea0b1f2ccfa97dd8872b5ca26ede281 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 May 2026 15:13:16 -0700 Subject: [PATCH 2/4] test+docs(cypher): pin aggregate-failfast interaction with chained-reentry carry (#1256 W1-I1) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Wave-1 review flagged that hidden-column forwarding could in principle silently mutate grouping when the appended `__cypher_reentry____` bare item lands inside a `WITH a, friend, count(*) AS n` chain — the carry would join the group key set. Empirically those queries already fail at the pre-existing "aggregate would need repeated MATCH rows from a relationship pattern" failfast, so no silent wrong-result path exists today. Pin that behavior with a regression-lock test so a future change that lifts the aggregate failfast must reckon with carry-grouping interaction explicitly. Also document the DISTINCT/aggregate interaction in `_demote_secondary_whole_row_aliases` inline so the next reader sees the design tradeoff. Refs #1256. Co-Authored-By: Claude Opus 4.7 (1M context) --- graphistry/compute/gfql/cypher/lowering.py | 11 ++++++++ .../compute/gfql/cypher/test_lowering.py | 26 +++++++++++++++++++ 2 files changed, 37 insertions(+) diff --git a/graphistry/compute/gfql/cypher/lowering.py b/graphistry/compute/gfql/cypher/lowering.py index 0ac0870b99..62d27f39c9 100644 --- a/graphistry/compute/gfql/cypher/lowering.py +++ b/graphistry/compute/gfql/cypher/lowering.py @@ -7948,6 +7948,17 @@ def rewrite_text(expr: ExpressionText, field: str) -> ExpressionText: # is in the prefix) but the subsequent WITH/RETURN scope-narrowing drops it, # and references like `RETURN x.id AS xid` (rewritten to a bare hidden # identifier) fail at the inner compile's alias resolution. + # + # Interaction with DISTINCT/aggregating downstream stages: appending the + # carry as a bare item makes it a participant in DISTINCT key sets and (in + # principle) aggregation grouping. For multi-alias carry semantics this is + # what callers want — DISTINCT over `(friend, x.id)` is the desired + # behavior when `x.id` is referenced downstream. Queries that combine carry + # forwarding with relationship-pattern aggregation already fail at the + # earlier "aggregate would need repeated MATCH rows" failfast, so no silent + # wrong-grouping reaches a user via this code path. A future tightening + # (gate per-stage on `not stage.clause.distinct and not has_aggregate`) is + # tracked under #1256 follow-up. if refs_collected and rewritten_with_stages_tail: forwarded_items: List[ReturnItem] = [] for alias_name, prop in sorted(refs_collected): diff --git a/graphistry/tests/compute/gfql/cypher/test_lowering.py b/graphistry/tests/compute/gfql/cypher/test_lowering.py index 7c63cc2430..8fa1b93d53 100644 --- a/graphistry/tests/compute/gfql/cypher/test_lowering.py +++ b/graphistry/tests/compute/gfql/cypher/test_lowering.py @@ -8081,6 +8081,32 @@ def test_string_cypher_admits_multi_alias_distinct_forwarding_through_reentry() ] +def test_string_cypher_chained_reentry_carry_with_aggregate_hits_existing_aggregate_failfast() -> None: + """#1256 wave-1 review W1-I1 regression-lock: hidden-column forwarding + appends ``__cypher_reentry____`` bare items to downstream WITH stages + so chained recursive compiles see them as scalar carries. The reviewer + flagged that a chained WITH with relationship-pattern aggregation could in + principle silently mutate grouping (`WITH a, friend, count(*) AS n` would + add the carry to the group key). Empirically those queries already fail at + the pre-existing aggregate-after-relationship-MATCH failfast, so no silent + wrong-result path exists. This test pins that behavior so a future change + that lifts the aggregate failfast must also reckon with carry-grouping + interaction explicitly. + """ + query = ( + "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) " + "WITH a, x " + "MATCH (a)-[:R]->(friend) " + "WITH a, x, count(*) AS n " + "RETURN x.id AS xid, n" + ) + with pytest.raises( + GFQLValidationError, + match=r"aggregate would need repeated MATCH rows", + ): + _mk_multi_stage_reentry_graph().gfql(query) + + def test_string_cypher_failfast_rejects_intermediate_reentry_match_with_no_carried_source() -> None: """#1256 slice 4.3d remaining gap: trailing MATCH that does NOT start from a carried alias (free-form intermediate MATCH) is still unsupported. From 010270847ada83531bc602dcdbe05ece19f77ee0 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 May 2026 15:46:55 -0700 Subject: [PATCH 3/4] fix(cypher): scoped failfast for chained-reentry carry + aggregate WITH (#1256 W2-IMPORTANT-1) Wave-2 review uncovered a silent wrong-result for the corner shape `MATCH (a),(x) WITH a, x MATCH (a) WITH a, x, count(*) AS n RETURN x.id, n` (node-only trailing MATCH + aggregating downstream WITH + carry forward). PR HEAD returned `xid=None, n=1`; master rejects the same query at the secondary-whole-row failfast. Add a guard in `_demote_secondary_whole_row_aliases`: when carry forwarding is about to append a hidden column to a downstream WITH stage, refuse if any item in that stage contains an aggregate function call. Raise a scoped #1256 failfast pointing at the gap. The relationship-pattern aggregate path is also now covered by this guard (the existing "aggregate would need repeated MATCH rows" failfast still fires for non-carry queries; the scoped error is clearer when a carry is actually involved). Test changes: - Rename `..._with_aggregate_hits_existing_aggregate_failfast` to `..._with_aggregate_relationship_match_failfast` and update the regex to match the new scoped error. - New `..._with_aggregate_node_only_match_failfast` regression-locks the W2-IMPORTANT-1 case. Validation: - 782 cypher lowering tests pass (was 780 before slice 4.3d.2; +2 net). - Touched-module mypy clean. - Scoped failfast wording is symmetric with the slice 4.3a/b admit-gate failfast (#989) so error consumers can pattern-match. Refs #1256. Co-Authored-By: Claude Opus 4.7 (1M context) --- graphistry/compute/gfql/cypher/lowering.py | 40 ++++++++++++---- .../compute/gfql/cypher/test_lowering.py | 46 ++++++++++++++----- 2 files changed, 64 insertions(+), 22 deletions(-) diff --git a/graphistry/compute/gfql/cypher/lowering.py b/graphistry/compute/gfql/cypher/lowering.py index 62d27f39c9..60dd3dac68 100644 --- a/graphistry/compute/gfql/cypher/lowering.py +++ b/graphistry/compute/gfql/cypher/lowering.py @@ -7949,17 +7949,37 @@ def rewrite_text(expr: ExpressionText, field: str) -> ExpressionText: # and references like `RETURN x.id AS xid` (rewritten to a bare hidden # identifier) fail at the inner compile's alias resolution. # - # Interaction with DISTINCT/aggregating downstream stages: appending the - # carry as a bare item makes it a participant in DISTINCT key sets and (in - # principle) aggregation grouping. For multi-alias carry semantics this is - # what callers want — DISTINCT over `(friend, x.id)` is the desired - # behavior when `x.id` is referenced downstream. Queries that combine carry - # forwarding with relationship-pattern aggregation already fail at the - # earlier "aggregate would need repeated MATCH rows" failfast, so no silent - # wrong-grouping reaches a user via this code path. A future tightening - # (gate per-stage on `not stage.clause.distinct and not has_aggregate`) is - # tracked under #1256 follow-up. + # Interaction with DISTINCT in downstream stages: appending the carry as a + # bare item makes it a participant in DISTINCT key sets. For multi-alias + # carry semantics this is what callers want — DISTINCT over `(friend, x.id)` + # is the desired behavior when `x.id` is referenced downstream. Multi-row + # `x` cases that could observably mutate row count are blocked upstream by + # the pre-existing `unique carried node rows` failfast in `gfql_unified.py`. + # + # Aggregate guard (W2-IMPORTANT-1): if any downstream WITH stage contains + # an aggregate function call, refuse to forward the carry through it. The + # alternative — silently appending the hidden alias next to `count(*)` — can + # produce a wrong NULL value in the projected column when the trailing MATCH + # has no relationship to trigger the existing aggregate failfast. Better to + # raise a scoped #1256 error pointing at the gap than to risk silent wrong + # results. The relationship-pattern aggregate path is also covered by an + # earlier failfast; this guard is a single tighter rule that subsumes both. if refs_collected and rewritten_with_stages_tail: + for stage in rewritten_with_stages_tail: + for item in stage.clause.items: + try: + item_node = parse_expr(item.expression.text) + except (GFQLExprParseError, ImportError): + continue + if _contains_aggregate_call(item_node): + raise _unsupported_at_span( + "Cypher MATCH after WITH chained-reentry secondary-alias carry " + "does not yet survive a downstream aggregating WITH stage; " + "tracked under #1256", + field="with", + value=item.expression.text, + span=stage.span, + ) forwarded_items: List[ReturnItem] = [] for alias_name, prop in sorted(refs_collected): hidden_alias = _secondary_reentry_hidden_column_name(alias_name, prop) diff --git a/graphistry/tests/compute/gfql/cypher/test_lowering.py b/graphistry/tests/compute/gfql/cypher/test_lowering.py index 8fa1b93d53..66ce3d9811 100644 --- a/graphistry/tests/compute/gfql/cypher/test_lowering.py +++ b/graphistry/tests/compute/gfql/cypher/test_lowering.py @@ -8081,17 +8081,13 @@ def test_string_cypher_admits_multi_alias_distinct_forwarding_through_reentry() ] -def test_string_cypher_chained_reentry_carry_with_aggregate_hits_existing_aggregate_failfast() -> None: - """#1256 wave-1 review W1-I1 regression-lock: hidden-column forwarding - appends ``__cypher_reentry____`` bare items to downstream WITH stages - so chained recursive compiles see them as scalar carries. The reviewer - flagged that a chained WITH with relationship-pattern aggregation could in - principle silently mutate grouping (`WITH a, friend, count(*) AS n` would - add the carry to the group key). Empirically those queries already fail at - the pre-existing aggregate-after-relationship-MATCH failfast, so no silent - wrong-result path exists. This test pins that behavior so a future change - that lifts the aggregate failfast must also reckon with carry-grouping - interaction explicitly. +def test_string_cypher_chained_reentry_carry_with_aggregate_relationship_match_failfast() -> None: + """#1256 wave-1 review W1-I1 regression-lock: chained-reentry secondary-alias + carry combined with an aggregating downstream WITH stage following a + relationship-pattern MATCH is rejected with the scoped #1256 failfast. + The aggregate guard added in slice 4.3d.2 fires before the existing + relationship-multiplicity aggregate check would; the scoped error is + clearer about the actual gap. """ query = ( "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) " @@ -8102,7 +8098,33 @@ def test_string_cypher_chained_reentry_carry_with_aggregate_hits_existing_aggreg ) with pytest.raises( GFQLValidationError, - match=r"aggregate would need repeated MATCH rows", + match=r"chained-reentry secondary-alias carry does not yet survive", + ): + _mk_multi_stage_reentry_graph().gfql(query) + + +def test_string_cypher_chained_reentry_carry_with_aggregate_node_only_match_failfast() -> None: + """#1256 wave-2 review W2-IMPORTANT-1 regression-lock: chained-reentry + secondary-alias carry combined with an aggregating downstream WITH stage + following a node-only MATCH (no relationship) is rejected with the scoped + #1256 failfast. + + Without the aggregate guard, this shape returned a silent NULL because the + pre-existing relationship-pattern aggregate failfast did NOT fire (no + relationship in the trailing MATCH) and the appended hidden carry column + interacted poorly with the count grouping path. The guard ensures the user + gets a clear, scoped error pointing at #1256 instead of silent wrong data. + """ + query = ( + "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) " + "WITH a, x " + "MATCH (a) " + "WITH a, x, count(*) AS n " + "RETURN x.id AS xid, n" + ) + with pytest.raises( + GFQLValidationError, + match=r"chained-reentry secondary-alias carry does not yet survive", ): _mk_multi_stage_reentry_graph().gfql(query) From 1277de311709cf660b3df23715f0ec180ceb8ea3 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 May 2026 16:34:31 -0700 Subject: [PATCH 4/4] test(cypher): retarget chained-reentry-same-primary test to pre-existing failfast (post-#1249 merge) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit After merging origin/master (which brought in #1249's row pipeline changes), the chained-reentry-same-primary positive test now hits the pre-existing "unique carried node rows" failfast at gfql_unified.py:~1091. This is not a regression introduced by slice 4.3d.2 — the carry-uniqueness constraint is fundamental to the current scalar-carry runtime model. Multiple rows sharing the same `a` value (one per friend after the first reentry) cannot be carried through to a second reentry that re-uses `a` as the source. The rebinding shape (Q2-style `(a)-[:R]->(friend) ... (friend)-[:S]->(c)`) remains validated and continues to pass — that's the actual cross-boundary case the slice closes. Retarget the test to assert the unique-rows failfast fires, locking the known limitation so a future slice that lifts the constraint must update this test alongside the runtime change. Refs #1256. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../compute/gfql/cypher/test_lowering.py | 34 +++++++++++-------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/graphistry/tests/compute/gfql/cypher/test_lowering.py b/graphistry/tests/compute/gfql/cypher/test_lowering.py index b0e274034a..349326e207 100644 --- a/graphistry/tests/compute/gfql/cypher/test_lowering.py +++ b/graphistry/tests/compute/gfql/cypher/test_lowering.py @@ -8022,15 +8022,21 @@ def test_string_cypher_failfast_rejects_multi_whole_row_prefix_when_non_source_a _mk_multi_stage_reentry_graph().gfql(query) -def test_string_cypher_admits_multi_stage_secondary_alias_carry_through_chained_reentry() -> None: - """#1256 slice 4.3d.2: secondary alias carry survives a chained reentry boundary - where the trailing MATCH continues to use the same primary alias. - - `WITH a, x MATCH (a)-[:R]->(friend) WITH a, x, friend MATCH (a)-[:R]->(other) - RETURN other.id, x.id` requires the hidden carry column for `x.id` to be - forwarded explicitly through every downstream WITH stage so each recursive - bounded-reentry compile sees it as a scalar carry. Without slice 4.3d.2, the - inner compile fails alias resolution on the synthesized hidden identifier. +def test_string_cypher_chained_reentry_with_repeated_primary_hits_unique_carried_rows_failfast() -> None: + """#1256 known limitation: when the trailing MATCH re-uses the SAME primary + alias across multiple reentry boundaries (``WITH a, x MATCH (a)-[:R]->(friend) + WITH a, x, friend MATCH (a)-[:R]->(other) ...``), the second reentry's + recursive compile receives multiple rows that share the same carried ``a`` + value (one per friend), tripping the pre-existing + ``unique carried node rows`` runtime check at ``gfql_unified.py:~1091``. + + This is not a regression introduced by slice 4.3d.2 — the carry-uniqueness + constraint is fundamental to the current scalar-carry runtime model. The + rebinding shape (Q2-style ``(a)-[:R]->(friend) ... (friend)-[:S]->(c)``) + avoids this because the primary alias rebinds each boundary so each + recursive prefix has unique carried-node identity. Lock this here so a + future slice that lifts the unique-rows constraint must update this test + alongside the runtime change. """ query = ( "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) " @@ -8041,11 +8047,11 @@ def test_string_cypher_admits_multi_stage_secondary_alias_carry_through_chained_ "WHERE other.id <> friend.id " "RETURN other.id AS oid, x.id AS xid ORDER BY oid" ) - result = _mk_multi_stage_reentry_graph().gfql(query) - assert result._nodes.to_dict(orient="records") == [ - {"oid": "b", "xid": "b"}, - {"oid": "e", "xid": "b"}, - ] + with pytest.raises( + GFQLValidationError, + match=r"unique carried node rows", + ): + _mk_multi_stage_reentry_graph().gfql(query) def test_string_cypher_admits_secondary_alias_carry_across_reentry_source_rebinding() -> None: