Skip to content

Commit 80d8084

Browse files
lmeyerovclaude
andauthored
feat(cypher): ReentryPlan + multi-whole-row prefix WITH (#989 slices 4.1+4.3a) (#1248)
* refactor(cypher): introduce ReentryPlan dataclass for bounded reentry (#989 slice 1) Slice 4.1 of the #989 row-carrier cleanup: add an explicit ReentryPlan + CarriedAlias contract for the bounded MATCH-after-WITH path, populated alongside the existing scalar_reentry_alias / scalar_reentry_columns fields on CompiledCypherExecutionExtras. This commit is additive and a no-behavior-change refactor: * New module graphistry/compute/gfql/cypher/reentry_plan.py * New field reentry_plan: Optional[ReentryPlan] on CompiledCypherExecutionExtras * New property accessor on CompiledCypherQuery * Population at the bounded-reentry construction site; propagation through _map_terminal_reentry_query and _attach_graph_context Read sites still consume scalar_reentry_alias / scalar_reentry_columns and _compiled_query_reentry_contract; later slices switch readers to the plan and lift the single-whole-row constraint. Tests: graphistry/tests/compute/gfql/cypher/ — 1091 passed, 66 skipped, 15 xfailed (no regressions). Refs #987 #989 #999. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cypher): admit multi-whole-row prefix WITH when non-source aliases unused (#989 slice 4.3a) Lift the single-whole-row gate at lowering.py:7207 (compile-time) and gfql_unified.py:1220 (runtime contract). Multi-whole-row prefix WITH (e.g. WITH a, x, y) now compiles when only the trailing-MATCH source alias is referenced downstream. For the case IC3 actually needs — non-source whole-row aliases referenced in trailing WHERE / RETURN / WITH — emit an actionable failfast pointing to #989. The full row-carrier rewrite (slice 4.3b) lands the carry of non-source alias properties as hidden columns; that is tracked under #989 as the architectural exit gate. Implementation: * `_bounded_reentry_carry_columns`: returns `(reentry_alias, carried_scalars, non_source_alias_names)` instead of raising on N>1 whole-rows. Picks `reentry_alias` from the trailing-MATCH first node alias when it is a carried whole-row, else falls back to the first. * `_collect_non_source_alias_references`: scans trailing WHERE (predicates + tree path), trailing WITH stages, RETURN items, ORDER BY items, and reentry UNWINDs for references to non-source aliases. * `_iter_property_refs`: walks structured `WhereClause.predicates` for `PropertyRef` nodes (the alias-keyed reference shape). * `_compiled_query_reentry_contract` (gfql_unified): consults the ReentryPlan field added in the prior slice and picks the recorded reentry_alias_name when present. * `ReentryPlan.aliases` now includes a `CarriedAlias` entry per non-source whole-row alias for future slices to populate `carried_properties`. Tests: graphistry/tests/compute/gfql/cypher/ — 1093 passed (+2 new focused cases for multi-whole-row admit + non-source-alias failfast). Refs #987 #989 #999. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cypher): handle ProjectionStage.where as ExpressionText in non-source-alias scanner (#989 slice 4.3a follow-up) Review of PR #1248 surfaced a latent AttributeError in _collect_non_source_alias_references: stage.where on ProjectionStage is Optional[ExpressionText] (raw text node), not Optional[WhereClause]. The prior code passed it to _scan_where(...) which crashes on .expr_tree on any multi-whole-row prefix followed by a downstream WITH ... WHERE stage. Fix: scan stage.where.text directly (symmetric to the unwind / return / order_by branches in the same loop). Add regression test test_string_cypher_admits_multi_whole_row_prefix_with_downstream_stage_where exercising the previously-broken code path on a green-execution shape. Tests: 1094 passed (+1 new regression), no regressions. Refs #989 #999. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(cypher): wave-2 review fixes — drop premature CarriedAlias fields, add plan structure test, comment cleanup (#989) Wave 2 review of PR #1248 (parallel per-dimension subagents) surfaced two IMPORTANT findings that survived the adversarial pass: * Spec + Quality consensus: `CarriedAlias.id_column` (Optional[str] = None) and `carried_properties` (Tuple[str, ...] = ()) were declared but never set, violating CLAUDE.md "Don't design for hypothetical future requirements." Removed; will land in slice 4.3b at first set. * Tests: no test asserted `compiled_query.reentry_plan` structure, leaving silent-regression risk for the plan field in slice 4.3b. Added `test_compile_cypher_records_reentry_plan_for_multi_whole_row_prefix` covering reentry_alias_name, scalar_only, alias output_names, and is_reentry_alias flag distribution. Stacking related Quality SUGGESTIONs: * Replaced "Slice 4.1+4.3" inline planning context with #989-anchored comment per the skill's "ephemeral reference leakage" rule. * Hoisted `dataclasses.fields` to module-top import, dropped the misleading "to avoid top-level shadowing" comment. * Narrowed `_iter_property_refs(node: Any)` to `node: object` and rewrote the docstring to enumerate the actual leaf types it cannot recurse into. Adversarial pass REJECTED a wave-2 BLOCKER claim of infinite recursion in `_iter_property_refs` — frozen dataclasses with PropertyRef leaves are acyclic by construction; the trailing fall-through is correct. Tests: 1095 passed (+1 new) on graphistry/tests/compute/gfql/cypher/. Refs #987 #989 #999. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cypher): property carry for non-source whole-row aliases (#989 slice 4.3b) When a prefix `WITH a, x, y` projects multiple whole-row aliases, rewrite the prefix at compile time to project the non-source aliases as scalar property carries: `WITH a, x.id AS x__id, ...`. Trailing `<non_source>.<prop>` references rewrite to a property access on the reentry-alias's hidden column (`<reentry_alias>.<__cypher_reentry_<alias>__<prop>__>`). The existing scalar carry plumbing forwards the values onto the reentry-alias's row table; no runtime changes needed. Closes the previously-locked #1026 regression (multi-alias WITH + OPTIONAL MATCH) which now executes correctly instead of raising. Slice 4.3a's failfast tightens to bare-reference cases only — bare `<non_source>` carries (and IC3's multi-stage `WITH a, x, y` chain that re-projects through stages) remain out of scope for this slice and now point users at the still-pending follow-up work under #989. Implementation: * `_reentry_property_carry_name(alias, prop)` helper for the unwrapped carry alias used in the prefix WITH; the existing `_reentry_hidden_column_name` wraps it at runtime to the final hidden column name. * `_rewrite_multi_whole_row_prefix(prefix_stage, ...)` rewrites the prefix items: drops bare non-source aliases that have property refs downstream, inserts scalar projection per (alias, property) pair. * `_collect_non_source_alias_property_refs` returns (props_by_alias, bare_referenced) — the dict drives the prefix rewrite, bare_referenced gates the failfast. * `_rewrite_reentry_expr_to_hidden_properties` extended with optional `non_source_carried_props` arg; AST-level rewrite of `PropertyAccessExpr(Identifier(non_source), prop)` runs BEFORE bare-identifier substitution to avoid double-mangling. * `ReentryPlan.aliases` now records `CarriedAlias.carried_properties` for the rewritten non-source aliases (field re-introduced from slice 4.1, now used). Tests: 1096 passed (+1 net new positive — `_admits_non_source_alias_property_carry_through_reentry`, `_failfast_..._bare_referenced` adapted from prior failfast). The `test_issue_1026_multi_alias_with_optional_match_rejects_cleanly` flipped to `_carries_non_source_property` asserting correct execution. Refs #987 #989 #999 #1026. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(cypher): wave-3 review fixes — namespace carry name, sort-determinism comment, ORDER BY failfast test (#989) Wave 3 review of slice 4.3b (commit c6a9852) flagged two SUGGESTION-level items resolvable in a small follow-up: * Carry name `<alias>__<prop>` could collide with a user-projected scalar in the same prefix WITH (hypothetical, low-likelihood). Tightened to `__carry_<alias>__<prop>__` so the namespace can't overlap with any unquoted user identifier. * Sort-determinism comment on the property-tuple ordering inside `_rewrite_multi_whole_row_prefix` so future readers know the order is cosmetic and that downstream AST rewrites match by name. Plus a missing failfast-scope test: * `test_string_cypher_failfast_rejects_multi_whole_row_prefix_when_non_source_alias_is_bare_referenced_in_order_by` exercises the `query.order_by` branch of `_collect_non_source_alias_property_refs`. Adversarial pass on wave 3 confirmed the remaining flagged items (double-rewrite safety, bare-vs-property filter logic, end-to-end carry correctness, flipped #1026 test) are all CORRECT. Tests: 1097 passed (+1 new failfast scope test), no regressions. Refs #989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cypher): drop bare non-source-alias forwarding items in downstream WITH stages (#989 slice 4.3c) Slice 4.3c fold-up: when a downstream WITH stage forwards carried whole-row aliases as bare items (`WITH a, x, y, collect(...)`), drop them at compile time. Their properties are already carried onto the reentry-alias's row table as hidden columns; the bare items are pure forwarding noise. Without this, the slice 4.3a/4.3b bare-reference failfast fires on IC3-shaped multi-stage chains where each `WITH` re-projects the carried aliases for downstream use. With this drop, those queries pass the failfast and proceed to the property-access rewrite layer. Note: this slice does NOT close the full IC3 lane. Multi-stage chains that span more than one reentry MATCH (`MATCH (a)-[:KNOWS]-(friend) ... MATCH (friend)<-[:HAS_CREATOR]-(message)`) still fail at a deeper runtime boundary — the hidden columns carrying `x.id`/`y.id` live on `a`'s row table and don't follow forward to `friend`'s row table after the first reentry's join. That cross-reentry carry-forward is the remaining slice 4.3d work tracked under #989. Implementation: * `_drop_bare_alias_items_from_stage(stage, aliases, *, identifier_re)`: pure helper that drops bare-identifier projection items whose name is in the carried set. Items with explicit aliases (`x AS xref`) are preserved — those are uses, not forwards. * `_rewrite_multi_whole_row_prefix(...)` extended to also return the rewritten downstream WITH stages tail. The bare-ref scanner now runs on the cleaned tail so forwarding patterns don't false-positive. * Caller threads the rewritten tail back into `query.with_stages` before the prefix compile + suffix construction. Tests: 1097 passed. The bare-ref failfast test updated to use `WHERE x = b` (a true USE) since `WITH b, x` is now correctly admitted as a forwarding pattern. Refs #989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix reentry carry mypy strict errors * Stabilize tck-gfql CI contract for reentry branch * Apply tck override for push and PR events * Advance tck override to include pattern1-21 contract * Use event-specific tck contract override * Fix reentry compile after master merge * Use unified tck contract override for #989 branch * Relax bare-alias failfast message assertions * ci: remove temporary #989 tck-gfql sha override * docs(changelog): record #989 ReentryPlan IR + multi-alias whole-row carry slices Adds an Internal entry under [Development] covering PR #1248's contributions beyond the #1071 lift already on master: - ReentryPlan + CarriedAlias dataclass IR (#987 step 1 / #989 slice 4.1) - Multi-whole-row admit when non-source aliases unused (slice 4.3a) - Compile-time property carry rewrite via __carry_<alias>__<prop>__ hidden columns, closing the multi-alias case of the #1026 regression-lock (slice 4.3b) - Bare-item drop in downstream forwarding WITH stages (slice 4.3c) Cross-reentry-boundary carry forwarding (IC3 / #999 slice 4.3d) is called out as remaining follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d31e21a commit 80d8084

5 files changed

Lines changed: 612 additions & 19 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
3030
- **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).
3131

3232
### Internal
33+
- **GFQL / Cypher lowering — `ReentryPlan` IR + multi-alias whole-row carry slices (#989, #1026, #999 partial)**: Introduces an explicit `ReentryPlan` + `CarriedAlias` dataclass (`graphistry/compute/gfql/cypher/reentry_plan.py`) as the compile-time contract between a prefix `WITH` stage and the trailing `MATCH`, replacing the implicit handshake spread across tuple returns from `_bounded_reentry_carry_columns`, the `scalar_reentry_alias` / `scalar_reentry_columns` fields on `CompiledCypherExecutionExtras`, and runtime contract re-extraction in `_compiled_query_reentry_contract` (#987 step 1). Plan is exposed via `compiled_query.reentry_plan` and threaded through `_map_terminal_reentry_query` + `_attach_graph_context`. Builds three additional admit slices on top of the #1071 lift: (1) **slice 4.3a** lifts the residual single-whole-row gate at the compile site (`lowering.py`) and runtime contract (`gfql_unified.py`) so `WITH a, x` admits whenever only the trailing-MATCH source alias is referenced downstream — `ReentryPlan.aliases` records all whole-row aliases as `CarriedAlias` entries, with downstream non-source-alias bare references emitting an actionable failfast pointing to #989; (2) **slice 4.3b** adds a compile-time prefix rewrite (`_rewrite_multi_whole_row_prefix`) that turns `WITH a, x` into `WITH a, x.id AS __carry_x__id__` for every property of `x` referenced in trailing clauses (collected via `_collect_non_source_alias_property_refs` walking `WhereClause.expr_tree`, `ProjectionStage.where`, `ReturnItem`, `OrderItem`), and AST-rewrites trailing `<non_source>.<prop>` references to property access on the reentry-alias's hidden column — closes the multi-alias case of `#1026` regression-lock (`MATCH (a), (x) WITH a, x OPTIONAL MATCH (a)-->(b) RETURN x.id, b.id` now executes correctly with left-outer-join semantics, previously raised `GFQLValidationError`); (3) **slice 4.3c** drops bare carried-alias items at compile time when downstream `WITH a, x, y, collect(...)` re-projects them, so the bare-ref failfast does not false-positive on forwarding patterns. Adds 5 new positive/structure tests + 2 failfast-scope tests, retargets one prior `pytest.raises(GFQLValidationError)` regression-lock to a positive row assertion, and adds the multi-alias `OPTIONAL MATCH` regression test `test_issue_1026_multi_alias_with_optional_match_carries_secondary_property`. Cross-reentry-boundary forwarding (carry survival across `MATCH (a)-[:KNOWS]-(friend)`, the IC3 LDBC SNB shape) remains follow-up work tracked as #999 / slice 4.3d (#989, #1026, #999, #987).
3334
- **GFQL / Cypher lowering — multi-alias carry through `WITH` before MATCH re-entry (#1071)**: Lifted the residual constraint that `WITH` before a MATCH re-entry must project exactly one whole-row node alias. The local Cypher compiler now accepts patterns like `MATCH (p)-[:KNOWS]-(friend) WITH p, friend MATCH (friend)-[:IS_LOCATED_IN]->(c) RETURN p.firstName, c.name` (LDBC SNB IC1 shape). Implementation pre-rewrites the prefix `WITH` in `_compile_bounded_reentry_query`: the trailing-MATCH primary alias is preserved as the sole whole-row carry; secondary aliases are demoted to scalar property carries via synthesized `S.X AS __cypher_reentry_<S>_<X>__` items, with downstream `S.X` references rewritten to bare hidden identifiers that compose with the existing single-alias carried-scalar machinery (#1047 / #1068). Returning a secondary alias as a whole-row entity (`RETURN s`) and re-binding a secondary alias as a node variable in the trailing MATCH remain unsupported with precise errors. Two prior failfast tests retargeted to assert correct multi-alias behavior; new tests cover IC1-shape, three-alias carry, and the secondary-whole-row-return rejection.
3435
- **GFQL / Cypher lowering — OR/XOR around pattern predicates (#1236, #1031 slice 4)**: Local Cypher `WHERE` boolean trees now support disjunctive compositions that mix pattern-existence leaves with row expressions (for example `pattern OR expr`, `pattern XOR expr`). Lowering rewrites pattern leaves into correlated `semi_apply_mark(...)` row pre-filters that emit boolean marker columns, then evaluates the rewritten boolean expression in `where_rows(...)`. Added direct row-pipeline/safelist coverage for `semi_apply_mark` plus parser/lowering runtime regression tests for OR/XOR-around-pattern query shapes.
3536
- **GFQL / Cypher residual cleanup (#1226, #1219)**: Retired `graphistry/compute/gfql/expr_split.py` by migrating top-level-AND splitting to `predicate_pushdown.py` (`split_top_level_and_conjuncts`) and updating conformance coverage to target the migrated splitter path. Fixed primitive literal atom fallback text in parser boolean-tree construction so `atom_text` now preserves Cypher keyword casing (`true`/`false`/`null`) instead of Python casing. Tightened optional-arm pushdown null-safety guardrails by treating OR-compound predicates as conservatively null-rejecting (disjunct-level alias analysis remains out of scope), with new regression coverage for mixed-alias `IS NOT NULL OR ...` forms.

0 commit comments

Comments
 (0)