Skip to content

feat(cypher): ReentryPlan + multi-whole-row prefix WITH (#989 slices 4.1+4.3a)#1248

Merged
lmeyerov merged 18 commits intomasterfrom
feat/989-reentryplan-multi-alias-carry
May 3, 2026
Merged

feat(cypher): ReentryPlan + multi-whole-row prefix WITH (#989 slices 4.1+4.3a)#1248
lmeyerov merged 18 commits intomasterfrom
feat/989-reentryplan-multi-alias-carry

Conversation

@lmeyerov
Copy link
Copy Markdown
Contributor

@lmeyerov lmeyerov commented May 2, 2026

Summary

Foundation work toward #989 row-carrier IR. Two slices on top of master@6891d39e4:

  • Slice 4.1 — Introduce ReentryPlan + CarriedAlias dataclasses (graphistry/compute/gfql/cypher/reentry_plan.py). Add reentry_plan: Optional[ReentryPlan] to CompiledCypherExecutionExtras and propagate through _execution_extras_with / _map_terminal_reentry_query / _attach_graph_context. No behavior change — read sites still consume scalar_reentry_alias / scalar_reentry_columns.

  • Slice 4.3a — Lift the len(whole_row_columns) != 1 gate at lowering.py:7207 and the matching runtime gate at gfql_unified.py:1220. Multi-whole-row prefix WITH (WITH a, x, y) now compiles when only the trailing-MATCH source alias is referenced downstream. For the IC3 case (non-source whole-row alias referenced in trailing WHERE/RETURN), emit an actionable failfast pointing to #989 for the slice 4.3b follow-up that lands the actual property carry.

What this PR does and does not close

  • #987 ReentryPlan handshake cleanup — partially addressed (Step 1 dataclass + scalar/whole-row migration to plan field). Steps 2–5 (replace hidden-column protocol, move runtime stitching, split lowering.py) remain follow-up.
  • #989 row-carrier IR — partial. The plan dataclass and the multi-whole-row compile path are in place; the property carry for non-source aliases (slice 4.3b) is the remaining architectural exit gate.
  • #999 multi-stage MATCH/WITH/MATCH — not yet closed. IC3 still hits the new failfast (correctly: it references x.id / y.id in trailing). IC5 still hits the multi-MATCH-source row projection gate at lowering.py:1914 (Slice 2 follow-up).

The win in this PR: actionable diagnostics and a reusable plan contract for the rest of the row-carrier work.

Scope rationale

See plans/989-reentryplan-multi-alias-carry/{plan.md,design/current-shape.md,design/reentry-plan.md} (gitignored) for the full design. Short version: the parser-surface "Path A" referenced in #999 was empty — both literal IC3 and IC5 fall through to row-lowering invariants. Slice 4.3a is the smallest meaningful step that establishes the plan contract without writing throwaway hidden-column code.

Test plan

  • pytest -q graphistry/tests/compute/gfql/cypher/ — 1093 passed (+2 new), 66 skipped, 15 xfailed
  • All existing reentry tests stay green
  • New positive test: test_string_cypher_admits_multi_whole_row_prefix_when_non_source_aliases_are_unused
  • New failfast test: test_string_cypher_failfast_rejects_multi_whole_row_prefix_when_non_source_alias_is_referenced
  • cuDF lane validation — deferred until slice 4.3b adds the IC3 fixture

Follow-up plan

  1. Slice 4.3b — implement property carry: _reentry_hidden_property_column(alias, prop), scanner that returns Dict[alias, Set[prop]], AST-level rewriter for PropertyAccessExpr(<non_source>, prop)PropertyAccessExpr(<reentry_alias>, hidden_name), runtime extension of _compiled_query_reentry_state to merge hidden columns from each non-source alias's meta entry. Closes IC3 lane of #999.
  2. Slice 2 (IC5) — multi-MATCH-source row projection (lowering.py:1914). Larger scope; separate PR under #989.
  3. Cleanup PRs#987 Steps 2–5 (extract reentry runtime to reentry_runtime.py, remove the legacy scalar_reentry_* fields after consumers migrate to compiled_query.reentry_plan).

Refs #987 #989 #999.

lmeyerov and others added 16 commits May 1, 2026 18:03
…#989 slice 1)

Slice 4.1 of the #989 row-carrier cleanup: add an explicit ReentryPlan +
CarriedAlias contract for the bounded MATCH-after-WITH path, populated
alongside the existing scalar_reentry_alias / scalar_reentry_columns fields
on CompiledCypherExecutionExtras.

This commit is additive and a no-behavior-change refactor:
* New module graphistry/compute/gfql/cypher/reentry_plan.py
* New field reentry_plan: Optional[ReentryPlan] on CompiledCypherExecutionExtras
* New property accessor on CompiledCypherQuery
* Population at the bounded-reentry construction site; propagation through
  _map_terminal_reentry_query and _attach_graph_context

Read sites still consume scalar_reentry_alias / scalar_reentry_columns and
_compiled_query_reentry_contract; later slices switch readers to the plan
and lift the single-whole-row constraint.

Tests: graphistry/tests/compute/gfql/cypher/ — 1091 passed, 66 skipped,
15 xfailed (no regressions).

Refs #987 #989 #999.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es unused (#989 slice 4.3a)

Lift the single-whole-row gate at lowering.py:7207 (compile-time) and
gfql_unified.py:1220 (runtime contract). Multi-whole-row prefix WITH
(e.g. WITH a, x, y) now compiles when only the trailing-MATCH source
alias is referenced downstream.

For the case IC3 actually needs — non-source whole-row aliases referenced
in trailing WHERE / RETURN / WITH — emit an actionable failfast pointing
to #989. The full row-carrier rewrite (slice 4.3b) lands the carry of
non-source alias properties as hidden columns; that is tracked under
#989 as the architectural exit gate.

Implementation:
* `_bounded_reentry_carry_columns`: returns `(reentry_alias,
  carried_scalars, non_source_alias_names)` instead of raising on N>1
  whole-rows. Picks `reentry_alias` from the trailing-MATCH first node
  alias when it is a carried whole-row, else falls back to the first.
* `_collect_non_source_alias_references`: scans trailing WHERE
  (predicates + tree path), trailing WITH stages, RETURN items,
  ORDER BY items, and reentry UNWINDs for references to non-source
  aliases.
* `_iter_property_refs`: walks structured `WhereClause.predicates` for
  `PropertyRef` nodes (the alias-keyed reference shape).
* `_compiled_query_reentry_contract` (gfql_unified): consults the
  ReentryPlan field added in the prior slice and picks the recorded
  reentry_alias_name when present.
* `ReentryPlan.aliases` now includes a `CarriedAlias` entry per
  non-source whole-row alias for future slices to populate
  `carried_properties`.

Tests: graphistry/tests/compute/gfql/cypher/ — 1093 passed (+2 new
focused cases for multi-whole-row admit + non-source-alias failfast).

Refs #987 #989 #999.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…urce-alias scanner (#989 slice 4.3a follow-up)

Review of PR #1248 surfaced a latent AttributeError in
_collect_non_source_alias_references: stage.where on ProjectionStage is
Optional[ExpressionText] (raw text node), not Optional[WhereClause]. The
prior code passed it to _scan_where(...) which crashes on .expr_tree on
any multi-whole-row prefix followed by a downstream WITH ... WHERE stage.

Fix: scan stage.where.text directly (symmetric to the unwind / return /
order_by branches in the same loop).

Add regression test
test_string_cypher_admits_multi_whole_row_prefix_with_downstream_stage_where
exercising the previously-broken code path on a green-execution shape.

Tests: 1094 passed (+1 new regression), no regressions.

Refs #989 #999.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ields, add plan structure test, comment cleanup (#989)

Wave 2 review of PR #1248 (parallel per-dimension subagents) surfaced two
IMPORTANT findings that survived the adversarial pass:

* Spec + Quality consensus: `CarriedAlias.id_column` (Optional[str] = None)
  and `carried_properties` (Tuple[str, ...] = ()) were declared but never
  set, violating CLAUDE.md "Don't design for hypothetical future
  requirements." Removed; will land in slice 4.3b at first set.

* Tests: no test asserted `compiled_query.reentry_plan` structure, leaving
  silent-regression risk for the plan field in slice 4.3b. Added
  `test_compile_cypher_records_reentry_plan_for_multi_whole_row_prefix`
  covering reentry_alias_name, scalar_only, alias output_names, and
  is_reentry_alias flag distribution.

Stacking related Quality SUGGESTIONs:
* Replaced "Slice 4.1+4.3" inline planning context with #989-anchored
  comment per the skill's "ephemeral reference leakage" rule.
* Hoisted `dataclasses.fields` to module-top import, dropped the
  misleading "to avoid top-level shadowing" comment.
* Narrowed `_iter_property_refs(node: Any)` to `node: object` and
  rewrote the docstring to enumerate the actual leaf types it cannot
  recurse into.

Adversarial pass REJECTED a wave-2 BLOCKER claim of infinite recursion in
`_iter_property_refs` — frozen dataclasses with PropertyRef leaves are
acyclic by construction; the trailing fall-through is correct.

Tests: 1095 passed (+1 new) on graphistry/tests/compute/gfql/cypher/.

Refs #987 #989 #999.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lice 4.3b)

When a prefix `WITH a, x, y` projects multiple whole-row aliases, rewrite the
prefix at compile time to project the non-source aliases as scalar property
carries: `WITH a, x.id AS x__id, ...`. Trailing `<non_source>.<prop>`
references rewrite to a property access on the reentry-alias's hidden column
(`<reentry_alias>.<__cypher_reentry_<alias>__<prop>__>`). The existing scalar
carry plumbing forwards the values onto the reentry-alias's row table; no
runtime changes needed.

Closes the previously-locked #1026 regression (multi-alias WITH +
OPTIONAL MATCH) which now executes correctly instead of raising. Slice 4.3a's
failfast tightens to bare-reference cases only — bare `<non_source>` carries
(and IC3's multi-stage `WITH a, x, y` chain that re-projects through stages)
remain out of scope for this slice and now point users at the still-pending
follow-up work under #989.

Implementation:
* `_reentry_property_carry_name(alias, prop)` helper for the unwrapped carry
  alias used in the prefix WITH; the existing `_reentry_hidden_column_name`
  wraps it at runtime to the final hidden column name.
* `_rewrite_multi_whole_row_prefix(prefix_stage, ...)` rewrites the prefix
  items: drops bare non-source aliases that have property refs downstream,
  inserts scalar projection per (alias, property) pair.
* `_collect_non_source_alias_property_refs` returns
  (props_by_alias, bare_referenced) — the dict drives the prefix rewrite,
  bare_referenced gates the failfast.
* `_rewrite_reentry_expr_to_hidden_properties` extended with optional
  `non_source_carried_props` arg; AST-level rewrite of
  `PropertyAccessExpr(Identifier(non_source), prop)` runs BEFORE bare-identifier
  substitution to avoid double-mangling.
* `ReentryPlan.aliases` now records `CarriedAlias.carried_properties` for the
  rewritten non-source aliases (field re-introduced from slice 4.1, now used).

Tests: 1096 passed (+1 net new positive — `_admits_non_source_alias_property_carry_through_reentry`,
`_failfast_..._bare_referenced` adapted from prior failfast). The
`test_issue_1026_multi_alias_with_optional_match_rejects_cleanly` flipped to
`_carries_non_source_property` asserting correct execution.

Refs #987 #989 #999 #1026.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…terminism comment, ORDER BY failfast test (#989)

Wave 3 review of slice 4.3b (commit c6a9852) flagged two SUGGESTION-level
items resolvable in a small follow-up:
* Carry name `<alias>__<prop>` could collide with a user-projected scalar
  in the same prefix WITH (hypothetical, low-likelihood). Tightened to
  `__carry_<alias>__<prop>__` so the namespace can't overlap with any
  unquoted user identifier.
* Sort-determinism comment on the property-tuple ordering inside
  `_rewrite_multi_whole_row_prefix` so future readers know the order is
  cosmetic and that downstream AST rewrites match by name.

Plus a missing failfast-scope test:
* `test_string_cypher_failfast_rejects_multi_whole_row_prefix_when_non_source_alias_is_bare_referenced_in_order_by`
  exercises the `query.order_by` branch of `_collect_non_source_alias_property_refs`.

Adversarial pass on wave 3 confirmed the remaining flagged items
(double-rewrite safety, bare-vs-property filter logic, end-to-end carry
correctness, flipped #1026 test) are all CORRECT.

Tests: 1097 passed (+1 new failfast scope test), no regressions.

Refs #989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…am WITH stages (#989 slice 4.3c)

Slice 4.3c fold-up: when a downstream WITH stage forwards carried whole-row
aliases as bare items (`WITH a, x, y, collect(...)`), drop them at compile
time. Their properties are already carried onto the reentry-alias's row
table as hidden columns; the bare items are pure forwarding noise.

Without this, the slice 4.3a/4.3b bare-reference failfast fires on
IC3-shaped multi-stage chains where each `WITH` re-projects the carried
aliases for downstream use. With this drop, those queries pass the
failfast and proceed to the property-access rewrite layer.

Note: this slice does NOT close the full IC3 lane. Multi-stage chains
that span more than one reentry MATCH (`MATCH (a)-[:KNOWS]-(friend)
... MATCH (friend)<-[:HAS_CREATOR]-(message)`) still fail at a deeper
runtime boundary — the hidden columns carrying `x.id`/`y.id` live on
`a`'s row table and don't follow forward to `friend`'s row table after
the first reentry's join. That cross-reentry carry-forward is the
remaining slice 4.3d work tracked under #989.

Implementation:
* `_drop_bare_alias_items_from_stage(stage, aliases, *, identifier_re)`:
  pure helper that drops bare-identifier projection items whose name is
  in the carried set. Items with explicit aliases (`x AS xref`) are
  preserved — those are uses, not forwards.
* `_rewrite_multi_whole_row_prefix(...)` extended to also return the
  rewritten downstream WITH stages tail. The bare-ref scanner now runs
  on the cleaned tail so forwarding patterns don't false-positive.
* Caller threads the rewritten tail back into `query.with_stages` before
  the prefix compile + suffix construction.

Tests: 1097 passed. The bare-ref failfast test updated to use
`WHERE x = b` (a true USE) since `WITH b, x` is now correctly admitted
as a forwarding pattern.

Refs #989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmeyerov lmeyerov marked this pull request as ready for review May 3, 2026 05:19
Comment thread .github/workflows/ci.yml Outdated
if [[ -n "${tck_sha_override}" ]]; then
tck_ref="${tck_ref}+sha-override"
tck_sha="${tck_sha_override}"
fi
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop most?

lmeyerov and others added 2 commits May 2, 2026 22:21
…arry slices

Adds an Internal entry under [Development] covering PR #1248's contributions
beyond the #1071 lift already on master:

- ReentryPlan + CarriedAlias dataclass IR (#987 step 1 / #989 slice 4.1)
- Multi-whole-row admit when non-source aliases unused (slice 4.3a)
- Compile-time property carry rewrite via __carry_<alias>__<prop>__
  hidden columns, closing the multi-alias case of the #1026 regression-lock
  (slice 4.3b)
- Bare-item drop in downstream forwarding WITH stages (slice 4.3c)

Cross-reentry-boundary carry forwarding (IC3 / #999 slice 4.3d) is called
out as remaining follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant