
Commit 5595051

lmeyerov and claude committed
feat(cypher): scope free-form intermediate MATCH failfast to #1263
Replace the generic "trailing MATCH must start from the same carried node alias" error in `_compile_bounded_reentry_query` with a #1263-scoped error that names the LDBC SNB IC3 endpoint and identifies the gap shape. Users hitting this gate can now distinguish the free-form intermediate-MATCH lane from the other open IC3 sub-cases (chained reentry, aggregate downstream, etc).

Tests:
- Add `test_string_cypher_failfast_rejects_simple_freeform_intermediate_reentry_match` — minimal single-alias-prefix shape that isolates the free-form gate (Site A from `plans/1263-freeform-intermediate-match/repro/findings.md`) without the slice 4.3a/b bare-ref interaction (Site B).
- Retarget `test_string_cypher_failfast_rejects_intermediate_reentry_match_with_no_carried_source` from the #1256-era wording to the new #1263 wording. The IC3-shaped query still hits the slice 4.3a/b admit gate first (because `_demote_secondary_whole_row_aliases` early-bails when the trailing MATCH isn't carried), so the matcher accepts either failfast site.

Closure of the gap (admitting the trailing MATCH as a fresh seed pattern that cross-joins with the carried row table at runtime, plus extending `ReentryPlan` with a per-stage mode marker) remains follow-up under #1263. Design at `plans/1263-freeform-intermediate-match/design/freeform-admit-design.md`.

Refs #1263, #999 (IC3 partial), #989 (row-carrier IR umbrella).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 935c709 commit 5595051
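
The closure that the commit message defers (treating the trailing MATCH as a fresh seed pattern that cross-joins with the carried row table) can be pictured with a minimal pure-Python sketch. This is only an illustration of cross-join semantics under assumed data shapes; the function name, the dict-per-row representation, and the sample rows are hypothetical and are not taken from the graphistry runtime.

```python
from itertools import product
from typing import Dict, List

Row = Dict[str, str]


def cross_join_carried_rows(carried_rows: List[Row], seed_rows: List[Row]) -> List[Row]:
    """Pair every carried row with every fresh seed binding (Cartesian product).

    Hypothetical helper: rows are plain dicts keyed by column name.
    """
    return [{**carried, **seed} for carried, seed in product(carried_rows, seed_rows)]


# Hypothetical data: one row carried by `WITH a`, two bindings produced by a
# fresh `MATCH (c:C)-[:T]->(d:D)` that does not start from `a`.
carried_rows = [{"a.id": "a"}]
seed_rows = [
    {"c.id": "c1", "d.id": "d1"},
    {"c.id": "c2", "d.id": "d2"},
]

joined = cross_join_carried_rows(carried_rows, seed_rows)
for row in joined:
    print(row)
```

Each output row keeps the carried columns alongside the new bindings, which is the property the follow-up work would need so that downstream clauses can still reference the carried alias.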

3 files changed

Lines changed: 46 additions & 6 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 - **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).
 
 ### Internal
+- **GFQL / Cypher lowering — scoped failfast for free-form intermediate MATCH after WITH (#1263, #999 partial)**: Replaced the generic `Cypher MATCH after WITH currently requires the trailing MATCH to start from the same carried node alias` error at `_compile_bounded_reentry_query` with a `#1263`-scoped error that names the LDBC SNB IC3 endpoint and points at the design doc for closure. The scoped wording calls out the gap shape — trailing MATCH whose first alias is not in the prefix `WITH`'s carried set — so users can identify whether their query falls into this lane or one of the other open IC3 sub-cases. Two regression-lock tests added: `test_string_cypher_failfast_rejects_simple_freeform_intermediate_reentry_match` (single-alias prefix; isolates the free-form gate without the slice 4.3a/b bare-ref interaction) and `test_string_cypher_failfast_rejects_intermediate_reentry_match_with_no_carried_source` retargeted from the prior #1256 wording. Closure of the gap (admitting the trailing MATCH as a fresh seed pattern that cross-joins with the carried row table at runtime) remains follow-up under #1263.
 - **GFQL / Cypher lowering — secondary whole-row carry survives chained reentry boundaries (#1256, #989, #999 partial)**: Two extensions to `_demote_secondary_whole_row_aliases` in `graphistry/compute/gfql/cypher/lowering.py` so multi-alias carry survives a chained reentry compile. (1) **Forwarding-item drop** — bare-identifier projection items in downstream `WITH` stages whose name is a secondary alias are dropped at compile time (the same intent as slice 4.3c, integrated into the post-#1071 active rewrite path). Without this, a forwarding pattern like `WITH a, x, friend, ...` triggered the bare-ref failfast even though the carry already lives as a hidden scalar on the reentry-source's row table. (2) **Hidden-column forwarding** — for every `(secondary_alias, prop)` reference collected from trailing clauses, the synthesized `__cypher_reentry_<S>_<X>__` hidden alias is appended as a bare passthrough item to every downstream `WITH` stage so each recursive `compile_cypher_query` call sees it as a scalar carry. Without this, the inner compile failed alias resolution on the rewritten hidden identifier (`Unknown Cypher alias '__cypher_reentry_x_id__' in RETURN clause`) once a chained reentry stage narrowed the projection scope. New positive tests cover (a) chained reentry where the trailing MATCH continues to use the same primary alias, (b) chained reentry where the source is rebound between boundaries (`MATCH (a)-[:R]->(friend) ... MATCH (friend)-[:S]->(c)`), and (c) multi-alias `DISTINCT` forwarding through a single boundary. New failfast test pins the remaining gap — a trailing MATCH that does not start from any carried alias (free-form intermediate MATCH, the LDBC SNB IC3 prefix shape) — to the existing scoped error so future closure of that lane is regression-locked. Closes the chained-reentry portion of #1256; the free-form intermediate-MATCH case remains open follow-up.
 - **GFQL / Cypher lowering — `ReentryPlan` IR + multi-alias whole-row carry slices (#989, #1026, #999 partial)**: Introduces an explicit `ReentryPlan` + `CarriedAlias` dataclass (`graphistry/compute/gfql/cypher/reentry_plan.py`) as the compile-time contract between a prefix `WITH` stage and the trailing `MATCH`, replacing the implicit handshake spread across tuple returns from `_bounded_reentry_carry_columns`, the `scalar_reentry_alias` / `scalar_reentry_columns` fields on `CompiledCypherExecutionExtras`, and runtime contract re-extraction in `_compiled_query_reentry_contract` (#987 step 1). Plan is exposed via `compiled_query.reentry_plan` and threaded through `_map_terminal_reentry_query` + `_attach_graph_context`. Builds three additional admit slices on top of the #1071 lift: (1) **slice 4.3a** lifts the residual single-whole-row gate at the compile site (`lowering.py`) and runtime contract (`gfql_unified.py`) so `WITH a, x` admits whenever only the trailing-MATCH source alias is referenced downstream — `ReentryPlan.aliases` records all whole-row aliases as `CarriedAlias` entries, with downstream non-source-alias bare references emitting an actionable failfast pointing to #989; (2) **slice 4.3b** adds a compile-time prefix rewrite (`_rewrite_multi_whole_row_prefix`) that turns `WITH a, x` into `WITH a, x.id AS __carry_x__id__` for every property of `x` referenced in trailing clauses (collected via `_collect_non_source_alias_property_refs` walking `WhereClause.expr_tree`, `ProjectionStage.where`, `ReturnItem`, `OrderItem`), and AST-rewrites trailing `<non_source>.<prop>` references to property access on the reentry-alias's hidden column — closes the multi-alias case of `#1026` regression-lock (`MATCH (a), (x) WITH a, x OPTIONAL MATCH (a)-->(b) RETURN x.id, b.id` now executes correctly with left-outer-join semantics, previously raised `GFQLValidationError`); (3) **slice 4.3c** drops bare carried-alias items at compile time when downstream `WITH a, x, y, collect(...)` re-projects them, so the bare-ref failfast does not false-positive on forwarding patterns. Adds 5 new positive/structure tests + 2 failfast-scope tests, retargets one prior `pytest.raises(GFQLValidationError)` regression-lock to a positive row assertion, and adds the multi-alias `OPTIONAL MATCH` regression test `test_issue_1026_multi_alias_with_optional_match_carries_secondary_property`. Cross-reentry-boundary forwarding (carry survival across `MATCH (a)-[:KNOWS]-(friend)`, the IC3 LDBC SNB shape) remains follow-up work tracked as #999 / slice 4.3d (#989, #1026, #999, #987).
 - **GFQL / Cypher lowering — multi-alias carry through `WITH` before MATCH re-entry (#1071)**: Lifted the residual constraint that `WITH` before a MATCH re-entry must project exactly one whole-row node alias. The local Cypher compiler now accepts patterns like `MATCH (p)-[:KNOWS]-(friend) WITH p, friend MATCH (friend)-[:IS_LOCATED_IN]->(c) RETURN p.firstName, c.name` (LDBC SNB IC1 shape). Implementation pre-rewrites the prefix `WITH` in `_compile_bounded_reentry_query`: the trailing-MATCH primary alias is preserved as the sole whole-row carry; secondary aliases are demoted to scalar property carries via synthesized `S.X AS __cypher_reentry_<S>_<X>__` items, with downstream `S.X` references rewritten to bare hidden identifiers that compose with the existing single-alias carried-scalar machinery (#1047 / #1068). Returning a secondary alias as a whole-row entity (`RETURN s`) and re-binding a secondary alias as a node variable in the trailing MATCH remain unsupported with precise errors. Two prior failfast tests retargeted to assert correct multi-alias behavior; new tests cover IC1-shape, three-alias carry, and the secondary-whole-row-return rejection.

graphistry/compute/gfql/cypher/lowering.py

Lines changed: 11 additions & 1 deletion
@@ -8492,8 +8492,18 @@ def _compile_bounded_reentry_query(
             span=query.return_.span,
         )
     if first_alias is None or first_alias != reentry_alias:
+        # #1263 (LDBC SNB IC3 endpoint): trailing MATCH whose first alias is
+        # not in the carried set is the free-form intermediate MATCH case.
+        # Closing this requires admitting the trailing MATCH as a fresh seed
+        # pattern that cross-joins with the carried row table at runtime, plus
+        # extending `ReentryPlan` with a per-stage mode marker so the runtime
+        # branches between the existing carried-alias path and a new free-form
+        # cross-join path. See `plans/1263-freeform-intermediate-match/design/freeform-admit-design.md`.
         raise _unsupported_at_span(
-            "Cypher MATCH after WITH currently requires the trailing MATCH to start from the same carried node alias",
+            "Cypher MATCH after WITH does not yet admit a trailing MATCH whose first alias is "
+            "not in the carried set (free-form intermediate MATCH; LDBC SNB IC3 endpoint, tracked under #1263). "
+            "The carried-alias path requires the trailing MATCH source to be one of the prefix WITH's "
+            "whole-row aliases.",
             field="match",
             value=first_alias,
             span=reentry_match.span,
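
The gate in this hunk can be read in isolation with a small standalone sketch. It mirrors only the condition and the message fragment visible in the diff; the exception class and function name below are hypothetical stand-ins, and the real code raises through `_unsupported_at_span` with `field`, `value`, and `span` arguments that are not modeled here.

```python
from typing import Optional


class UnsupportedCypherError(ValueError):
    """Hypothetical stand-in for the library's validation error type."""


def check_trailing_match_source(first_alias: Optional[str], reentry_alias: str) -> None:
    """Reject a trailing MATCH that does not start from the carried reentry alias.

    Same condition as the diff: a missing first alias, or one that differs
    from the carried alias, is the free-form intermediate MATCH case.
    """
    if first_alias is None or first_alias != reentry_alias:
        raise UnsupportedCypherError(
            "Cypher MATCH after WITH does not yet admit a trailing MATCH whose first alias is "
            "not in the carried set (free-form intermediate MATCH; LDBC SNB IC3 endpoint, "
            f"tracked under #1263). Got first alias: {first_alias!r}"
        )


# `WITH a MATCH (a)-[:R]->(b)`: the trailing MATCH starts from the carried alias.
check_trailing_match_source("a", "a")

# `WITH a MATCH (c:C)-[:T]->(d:D)`: `c` is not carried, so the gate fires.
try:
    check_trailing_match_source("c", "a")
except UnsupportedCypherError as exc:
    print(exc)
```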

graphistry/tests/compute/gfql/cypher/test_lowering.py

Lines changed: 34 additions & 5 deletions
@@ -8152,15 +8152,22 @@ def test_string_cypher_chained_reentry_carry_with_aggregate_node_only_match_fail
 
 
 def test_string_cypher_failfast_rejects_intermediate_reentry_match_with_no_carried_source() -> None:
-    """#1256 slice 4.3d remaining gap: trailing MATCH that does NOT start from a
-    carried alias (free-form intermediate MATCH) is still unsupported.
+    """#1263 (LDBC SNB IC3 endpoint): trailing MATCH that does NOT start from
+    a carried alias (free-form intermediate MATCH) raises a scoped failfast
+    pointing at #1263.
 
     The literal LDBC SNB IC3 query begins ``WITH a, x, y MATCH (city:City)
     -[:IS_PART_OF]->(country:Country) ...`` — neither ``city`` nor ``country``
     is a carried alias. Closing this requires admitting the trailing MATCH as
     a fresh seed pattern that cross-joins with the carried row table; the
-    runtime then propagates carries onto the new bindings. This is the
-    remaining slice 4.3d work tracked under #1256.
+    runtime then propagates carries onto the new bindings.
+
+    Originally introduced under #1256 with a generic carried-alias gate
+    message; tightened in #1263 to call out the IC3 endpoint by name. The IC3
+    literal additionally hits the slice 4.3a/b admit gate (bare `x`/`y`
+    references in downstream WITH stages) which fires earlier — the test
+    matcher accepts either failfast site so the regression-lock holds at
+    whichever gate the compile reaches first.
     """
     query = (
         "MATCH (a:A {id: 'a'}), (x:B {id: 'b'}) "
@@ -8171,7 +8178,29 @@ def test_string_cypher_failfast_rejects_intermediate_reentry_match_with_no_carri
     )
     with pytest.raises(
         GFQLValidationError,
-        match=r"(trailing MATCH to start from the same carried node alias|carries non-source whole-row aliases)",
+        match=r"(free-form intermediate MATCH|carries non-source whole-row aliases)",
+    ):
+        _mk_multi_stage_reentry_graph().gfql(query)
+
+
+def test_string_cypher_failfast_rejects_simple_freeform_intermediate_reentry_match() -> None:
+    """#1263 minimal free-form regression-lock: even with single-alias prefix
+    and no bare references downstream, a trailing MATCH that does not start
+    from the carried alias is rejected with the scoped #1263 failfast.
+
+    This is the minimal shape that *only* trips the free-form gate (Site A),
+    distinguished from the IC3-shaped query above which trips the bare-ref
+    gate (Site B) first.
+    """
+    query = (
+        "MATCH (a:A {id: 'a'}) "
+        "WITH a "
+        "MATCH (c:C)-[:T]->(d:D) "
+        "RETURN d.id AS did, c.id AS cid"
+    )
+    with pytest.raises(
+        GFQLValidationError,
+        match=r"free-form intermediate MATCH",
     ):
         _mk_multi_stage_reentry_graph().gfql(query)
 
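
The two `match=` patterns in these tests differ in which failfast site they accept. Since `pytest.raises(match=...)` checks the pattern with `re.search` against the string form of the exception, the difference can be shown with the standard `re` module alone. In the sketch below, the Site A message is the fragment from the `lowering.py` hunk; the Site B message is a placeholder, because only the substring `carries non-source whole-row aliases` is known from the test regex.

```python
import re

# Patterns copied from the two tests in this diff.
EITHER_SITE = r"(free-form intermediate MATCH|carries non-source whole-row aliases)"
SITE_A_ONLY = r"free-form intermediate MATCH"

# Site A: wording introduced by this commit (free-form gate).
site_a_message = (
    "Cypher MATCH after WITH does not yet admit a trailing MATCH whose first alias is "
    "not in the carried set (free-form intermediate MATCH; LDBC SNB IC3 endpoint, "
    "tracked under #1263)."
)

# Site B: placeholder text; only the quoted substring is taken from the test regex.
site_b_message = "<prefix WITH> carries non-source whole-row aliases <rest of message>"


def matches(pattern: str, message: str) -> bool:
    """Mimic the search-style matching used by pytest.raises(match=...)."""
    return re.search(pattern, message) is not None


print(matches(EITHER_SITE, site_a_message))  # retargeted test accepts Site A
print(matches(EITHER_SITE, site_b_message))  # and also accepts Site B
print(matches(SITE_A_ONLY, site_b_message))  # minimal test accepts Site A only
```

The alternation keeps the IC3-shaped regression-lock stable whichever gate fires first, while the single-alias test pins the free-form gate specifically.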