Commit e1b3617

lmeyerov and claude authored
feat(gfql): two-tier pass execution — Tier 1 structural + Tier 2 fixed-point loop (#1189) (#1191)
* feat(gfql): two-tier pass execution — Tier 1 structural + Tier 2 fixed-point loop (#1189)

  Extend PassManager to support two explicit pass tiers:
  - Tier 1 (structural): each pass runs exactly once in configured order
  - Tier 2 (rewrite): all passes repeat in a fixed-point loop until a full sweep produces no changes, bounded by max_iterations (default 100)

  PassResult gains `changed: bool = True` for Tier 2 convergence signalling. Default True preserves backward compatibility for passes that predate two-tier semantics.

  Add UnnestApply as Tier 1 structural pass: rewrites non-correlated Apply nodes (correlation_vars == frozenset()) to Join(join_type="cross"), exposing them to downstream join-ordering passes. Correlated Apply nodes are left untouched.

  Wire PredicatePushdownPass as the first Tier 2 rewrite rule; set changed=pushed > 0 for correct convergence detection. Update DEFAULT_LOGICAL_PASSES and DEFAULT_TIER2_PASSES accordingly and wire both into _run_logical_pass_pipeline in gfql_unified.py.

  19 new tests cover tier-1 ordering, tier-2 fixed-point convergence, verifier failure propagation in both tiers, max_iterations breach, UnnestApply rewrites, and backward-compat single-arg PassManager call.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gfql): address wave-1 review findings for two-tier pass manager (#1189)

  - Remove duplicate DEFAULT_LOGICAL_PASSES/DEFAULT_TIER2_PASSES sentinel names from manager.py (inline as ()) to prevent silent empty-pass imports from the manager module; populated defaults remain in passes/__init__.py
  - Add comment to lowering.py explaining compilation-time vs runtime pass pipeline split and why double-application of PredicatePushdownPass is safe
  - Add output_schema preservation test for UnnestApply rewrite
  - Add combined smoke test: default PassManager config (UnnestApply T1 + PredicatePushdown T2) runs without error on a plain plan

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gfql): wave-2 polish — Tier 1 API in lowering.py, real integration test (#1189)

  - lowering.py: use tier1_passes= (single-execution) instead of tier2_passes= to match the "single pass at compilation time" intent; update comment
  - test_unnest_apply.py: replace smoke test with real integration test that builds Apply(Filter(PatternMatch)) and asserts UnnestApply rewrites the outer Apply to Join and PredicatePushdownPass pushes the predicate into PatternMatch

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(gfql/passes): remove redundant compilation-time predicate pushdown from lowering.py

  The PassManager call that applied PredicatePushdownPass at compile time was added before the runtime pass pipeline in gfql_unified.py was fully wired. Now that DEFAULT_TIER2_PASSES includes PredicatePushdownPass and _run_logical_pass_pipeline runs on every query execution, the compilation-time application is redundant and can be removed.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gfql/passes): accumulate Tier 2 integer metadata across iterations + document slot enumeration

  Tier 2 metadata previously overwrote per iteration, so final metadata showed pushed_predicates=0 after convergence (last iteration always pushes nothing). Now integer fields are summed across iterations so the final metadata reflects cumulative work. Also documents that the _unnest_tree child-slot tuple is exhaustive over all current LogicalPlan subclasses.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gfql/tests): update test_lowering to reflect runtime-only predicate pushdown

  The compilation step (lowering.py) no longer applies PredicatePushdownPass; it now happens in the runtime pass pipeline via DEFAULT_TIER2_PASSES. Update the test to assert the correct post-compilation state: the plan contains a Filter node for the WHERE predicate (not yet pushed into PatternMatch.predicates).

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
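The two-tier control flow described in the commit message can be sketched with a toy model. This is a simplified illustration, not the library's code: "plans" are plain ints, passes are functions, and `Result` stands in for `PassResult` (including its backward-compatible `changed=True` default).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    plan: int
    changed: bool = True  # default True mirrors PassResult's backward-compat choice


Pass = Callable[[int], Result]


def run_two_tier(plan: int, tier1: List[Pass], tier2: List[Pass], max_iterations: int = 100) -> int:
    # Tier 1: each structural pass runs exactly once, in configured order.
    for p in tier1:
        plan = p(plan).plan
    # Tier 2: sweep all rewrite rules until a full sweep changes nothing.
    for _ in range(max_iterations):
        any_changed = False
        for p in tier2:
            r = p(plan)
            plan = r.plan
            any_changed = any_changed or r.changed
        if not any_changed:
            return plan  # fixed point reached
    raise RuntimeError(f"no convergence after {max_iterations} iterations")


def add_one_once(n: int) -> Result:  # Tier 1 structural pass
    return Result(n + 1)


def halve(n: int) -> Result:  # Tier 2 rewrite: converges once the value is odd
    return Result(n // 2, changed=True) if n % 2 == 0 else Result(n, changed=False)


print(run_two_tier(39, tier1=[add_one_once], tier2=[halve]))  # 39 -> 40 -> 20 -> 10 -> 5
```

The final sweep on 5 reports `changed=False`, which is exactly the convergence signal the real `PassManager` relies on.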
1 parent 23fb3a9, commit e1b3617

10 files changed

Lines changed: 424 additions & 16 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 - **GFQL / Cypher public API**: `compile_cypher()`, `compile_cypher_query()`, `CompiledCypherQuery`, `CompiledCypherUnionQuery`, and `CompiledCypherProcedureCall` are deprecated and **scheduled for removal in a future release**. All emit `DeprecationWarning` at use. Migrate to `g.gfql(..., language="cypher")` for execution or `cypher_to_gfql()` / `gfql_from_cypher()` for chain translation. Tracked in #1169.
 
 ### Added
+- **GFQL / two-tier pass execution**: Extended `PassManager` to support two explicit pass tiers: Tier 1 structural passes run once in configured order; Tier 2 rewrite rules run in a fixed-point loop until a full sweep makes no changes or `max_iterations` is exceeded. `PassResult` gains a `changed: bool` field (default `True` for backward compatibility) used by the convergence check. Added `UnnestApply` as the first Tier 1 structural pass — rewrites non-correlated `Apply` nodes (empty `correlation_vars`) to `Join(join_type="cross")`, exposing them to downstream join-ordering passes; correlated Apply nodes are preserved. `PredicatePushdownPass` is wired as the first Tier 2 rewrite rule and now sets `changed=pushed > 0` for correct convergence. `DEFAULT_LOGICAL_PASSES` and `DEFAULT_TIER2_PASSES` are populated accordingly and wired into `gfql()` execution. 19 new unit tests across `test_pass_manager.py` and `test_unnest_apply.py` (#1189).
 - **GFQL / pass framework skeleton**: Added `graphistry/compute/gfql/passes/` with `LogicalPass`, `PassResult`, and deterministic `PassManager.run()` sequencing. The pass manager now invokes IR `verify()` after each pass and fails fast on invalid pass output. Wired a new logical-pass pipeline hook into `gfql()` execution between logical-plan and physical-planner stages using a default no-op pass configuration to preserve runtime behavior. Added focused tests for pass ordering, verifier-failure propagation, and runtime pipeline hook invocation (`test_pass_manager.py`, `test_runtime_physical_cutover.py`) (#1180).
 - **GFQL / predicate pushdown safety**: Added `graphistry/compute/gfql/ir/pushdown_safety.py` with three reusable utilities for `PredicatePushdownPass`: `is_null_rejecting(pred, null_extended_aliases)` — conservative syntactic heuristic returning True when a predicate references a null-extended alias (OPTIONAL MATCH) and does not use a null-safe form (IS NULL, IS NOT NULL, COALESCE, NULLIF); `is_null_safe` — inverse; `with_barrier_blocks_pushdown(scope_stack, pred_refs)` — returns True when a WITH-clause `ScopeFrame` prevents backward predicate movement for the given reference set. All three exported from `ir/__init__.py`. 41 unit tests (#1181).
 - **GFQL / predicate pushdown rewrite**: Added `PredicatePushdownPass` implementation in `graphistry/compute/gfql/passes/predicate_pushdown.py` and wired it into logical planning route execution. The pass rewrites `Filter(input=PatternMatch(...))` by pushing safe predicates into `PatternMatch.predicates`, keeps residual filters for partial-push cases, and blocks null-rejecting pushdown into optional arms using existing safety helpers. Added focused pass tests and a lowering-route integration assertion (`test_predicate_pushdown_pass.py`, `cypher/test_lowering.py`) (#1187).
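The `is_null_rejecting` heuristic in the safety entry above can be illustrated with a toy string-level version. This is only a sketch of the rule as described ("references a null-extended alias and uses no null-safe form") — the real utility operates on IR predicate objects, so the signature and token matching here are illustrative assumptions.

```python
from typing import Set

# Null-safe forms named in the changelog entry; matched textually in this toy.
NULL_SAFE_TOKENS = ("IS NULL", "IS NOT NULL", "COALESCE", "NULLIF")


def is_null_rejecting(pred: str, null_extended_aliases: Set[str]) -> bool:
    """Toy sketch: True when the predicate mentions a null-extended alias
    (e.g. from OPTIONAL MATCH) and uses none of the null-safe forms.
    Conservative: may flag predicates that are actually null-safe."""
    refs_optional = any(f"{alias}." in pred for alias in null_extended_aliases)
    null_safe = any(token in pred.upper() for token in NULL_SAFE_TOKENS)
    return refs_optional and not null_safe


print(is_null_rejecting("b.age > 5", {"b"}))      # plain comparison on optional alias
print(is_null_rejecting("b.age IS NULL", {"b"}))  # null-safe form
print(is_null_rejecting("a.age > 5", {"b"}))      # no optional alias referenced
```

A null-rejecting predicate must not be pushed below the null-extending operator, since it would filter out the null-extended rows the outer join is supposed to preserve.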

graphistry/compute/gfql/cypher/lowering.py

Lines changed: 0 additions & 2 deletions
@@ -47,7 +47,6 @@
 from graphistry.compute.gfql.ir.types import ScalarType
 from graphistry.compute.gfql.ir.verifier import verify as verify_logical_plan
 from graphistry.compute.gfql.logical_planner import LogicalPlanner
-from graphistry.compute.gfql.passes import PassManager, PredicatePushdownPass
 from graphistry.compute.predicates.ASTPredicate import ASTPredicate
 from graphistry.compute.predicates.comparison import eq, ge, gt, isna, le, lt, ne, notna
 from graphistry.compute.predicates.is_in import is_in
@@ -8335,7 +8334,6 @@ def _logical_plan_route_for_query(
         logical_plan = LogicalPlanner(
             allow_unknown_match_aliases=allow_unknown_match_aliases
         ).plan(bound_ir, ctx)
-        logical_plan = PassManager((PredicatePushdownPass(),)).run(logical_plan, ctx).plan
     except GFQLValidationError as exc:
         return None, str(exc.message)
     _verify_selected_logical_plan(logical_plan)
graphistry/compute/gfql/passes/__init__.py

Lines changed: 10 additions & 1 deletion

@@ -1,12 +1,21 @@
 """Logical plan pass framework."""
 
-from .manager import DEFAULT_LOGICAL_PASSES, LogicalPass, PassManager, PassResult
+from .manager import LogicalPass, PassManager, PassResult
 from .predicate_pushdown import PredicatePushdownPass
+from .unnest_apply import UnnestApply
+
+# Tier 1: structural passes that run once in order.
+DEFAULT_LOGICAL_PASSES = (UnnestApply(),)
+
+# Tier 2: rewrite rules that run in a fixed-point loop until convergence.
+DEFAULT_TIER2_PASSES = (PredicatePushdownPass(),)
 
 __all__ = [
     "DEFAULT_LOGICAL_PASSES",
+    "DEFAULT_TIER2_PASSES",
     "LogicalPass",
     "PassManager",
     "PassResult",
     "PredicatePushdownPass",
+    "UnnestApply",
 ]

graphistry/compute/gfql/passes/manager.py

Lines changed: 74 additions & 7 deletions
@@ -2,7 +2,7 @@
 from __future__ import annotations
 
 from dataclasses import dataclass, field
-from typing import Dict, Optional, Protocol, Sequence, Tuple
+from typing import Dict, Protocol, Sequence, Tuple
 
 from graphistry.compute.exceptions import ErrorCode, GFQLValidationError
 from graphistry.compute.gfql.ir.compilation import CompilerError, PlanContext
@@ -16,6 +16,9 @@ class PassResult:
 
     plan: LogicalPlan
     metadata: Dict[str, object] = field(default_factory=dict)
+    # Tier 2 convergence signal: set False when the pass made no changes.
+    # Defaults to True so passes that predate two-tier semantics are conservative.
+    changed: bool = True
 
 
 class LogicalPass(Protocol):
@@ -25,22 +28,42 @@ class LogicalPass(Protocol):
 
     def run(self, plan: LogicalPlan, ctx: PlanContext) -> PassResult:
         """Transform a logical plan and return a PassResult."""
+        ...
 
 
-DEFAULT_LOGICAL_PASSES: Tuple[LogicalPass, ...] = ()
+_DEFAULT_MAX_ITERATIONS = 100
 
 
 class PassManager:
-    """Sequential pass runner with verifier guards after each pass."""
-
-    def __init__(self, passes: Sequence[LogicalPass] = DEFAULT_LOGICAL_PASSES) -> None:
-        self._passes: Tuple[LogicalPass, ...] = tuple(passes)
+    """Two-tier pass runner with verifier guards after each pass.
+
+    Tier 1 (*tier1_passes*): each pass runs exactly once in configured order.
+    Tier 2 (*tier2_passes*): all passes run repeatedly in a fixed-point loop
+    until a full sweep produces no changes (every pass returns
+    ``PassResult.changed=False``). Bounded by *max_iterations* to guarantee
+    termination.
+
+    Populated defaults (``DEFAULT_LOGICAL_PASSES``, ``DEFAULT_TIER2_PASSES``)
+    are defined in the package ``__init__`` to avoid circular imports.
+    """
+
+    def __init__(
+        self,
+        tier1_passes: Sequence[LogicalPass] = (),
+        tier2_passes: Sequence[LogicalPass] = (),
+        *,
+        max_iterations: int = _DEFAULT_MAX_ITERATIONS,
+    ) -> None:
+        self._tier1: Tuple[LogicalPass, ...] = tuple(tier1_passes)
+        self._tier2: Tuple[LogicalPass, ...] = tuple(tier2_passes)
+        self._max_iterations = max_iterations
 
     def run(self, logical_plan: LogicalPlan, ctx: PlanContext) -> PassResult:
         current = logical_plan
         merged_metadata: Dict[str, object] = {}
 
-        for logical_pass in self._passes:
+        # --- Tier 1: structural passes, each runs exactly once ---
+        for logical_pass in self._tier1:
             result = logical_pass.run(current, ctx)
             current = result.plan
             if result.metadata:
@@ -49,9 +72,42 @@ def run(self, logical_plan: LogicalPlan, ctx: PlanContext) -> PassResult:
             if diagnostics:
                 raise _verification_error(logical_pass.name, diagnostics)
 
+        # --- Tier 2: rewrite rules, fixed-point loop ---
+        if self._tier2:
+            for _ in range(self._max_iterations):
+                any_changed = False
+                for logical_pass in self._tier2:
+                    result = logical_pass.run(current, ctx)
+                    current = result.plan
+                    if result.metadata:
+                        _accumulate_metadata(merged_metadata, logical_pass.name, result.metadata)
+                    if result.changed:
+                        any_changed = True
+                    diagnostics = verify(current)
+                    if diagnostics:
+                        raise _verification_error(logical_pass.name, diagnostics)
+                if not any_changed:
+                    break
+            else:
+                raise _convergence_error(self._max_iterations)
+
         return PassResult(plan=current, metadata=merged_metadata)
 
 
+def _accumulate_metadata(
+    merged: Dict[str, object], pass_name: str, new: Dict[str, object]
+) -> None:
+    existing = merged.get(pass_name)
+    if isinstance(existing, dict):
+        acc: Dict[str, object] = dict(existing)
+        for k, v in new.items():
+            prev = acc.get(k)
+            acc[k] = prev + v if isinstance(prev, int) and isinstance(v, int) else v  # type: ignore[operator]
+        merged[pass_name] = acc
+    else:
+        merged[pass_name] = dict(new)
+
+
 def _verification_error(pass_name: str, diagnostics: Sequence[CompilerError]) -> GFQLValidationError:
     message = "; ".join(error.message for error in diagnostics[:3])
     if len(diagnostics) > 3:
@@ -64,3 +120,14 @@ def _verification_error(pass_name: str, diagnostics: Sequence[CompilerError]) ->
         suggestion=message or "Ensure pass output satisfies LogicalPlan verifier invariants.",
         language="cypher",
     )
+
+
+def _convergence_error(max_iterations: int) -> GFQLValidationError:
+    return GFQLValidationError(
+        ErrorCode.E108,
+        f"Tier 2 pass loop did not converge after {max_iterations} iterations",
+        field="pass",
+        value="tier2",
+        suggestion="Check that Tier 2 passes converge and set PassResult.changed=False when unchanged.",
+        language="cypher",
+    )
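The integer-summing merge in `_accumulate_metadata` above can be exercised standalone. This is the same logic lifted out of the diff (renamed without the leading underscore for the example), showing why cumulative metadata survives the final no-op Tier 2 iteration:

```python
from typing import Dict


def accumulate_metadata(merged: Dict[str, object], pass_name: str, new: Dict[str, object]) -> None:
    """Merge per-iteration pass metadata: integer fields sum across
    iterations; non-integer fields overwrite."""
    existing = merged.get(pass_name)
    if isinstance(existing, dict):
        acc: Dict[str, object] = dict(existing)
        for k, v in new.items():
            prev = acc.get(k)
            acc[k] = prev + v if isinstance(prev, int) and isinstance(v, int) else v
        merged[pass_name] = acc
    else:
        merged[pass_name] = dict(new)


merged: Dict[str, object] = {}
# Three Tier 2 iterations: two that push predicates, then the converging no-op sweep.
accumulate_metadata(merged, "predicate_pushdown", {"pushed_predicates": 2})
accumulate_metadata(merged, "predicate_pushdown", {"pushed_predicates": 1})
accumulate_metadata(merged, "predicate_pushdown", {"pushed_predicates": 0})
print(merged["predicate_pushdown"])  # {'pushed_predicates': 3}
```

Without the summing behavior, the last call would overwrite the total with 0 — exactly the bug the commit fixes.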

graphistry/compute/gfql/passes/predicate_pushdown.py

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ def run(self, plan: LogicalPlan, ctx) -> PassResult:  # noqa: ANN001
                 "pushed_predicates": pushed,
                 "residual_predicates": residual,
             },
+            changed=pushed > 0,
         )
 
 
graphistry/compute/gfql/passes/unnest_apply.py

Lines changed: 63 additions & 0 deletions

@@ -0,0 +1,63 @@
+"""UnnestApply: Tier 1 structural pass eliminating non-correlated Apply operators.
+
+A non-correlated Apply (``correlation_vars == frozenset()``) is semantically
+equivalent to a cross join because the subquery does not reference any variable
+from the outer input. Rewriting to ``Join(join_type="cross")`` exposes the
+shape to downstream join-ordering and predicate-pushdown passes.
+
+Correlated Apply operators are left untouched.
+"""
+from __future__ import annotations
+
+from dataclasses import replace
+from typing import Any, Tuple, cast
+
+from graphistry.compute.gfql.ir.logical_plan import Apply, Join, LogicalPlan
+from graphistry.compute.gfql.passes.manager import LogicalPass, PassResult
+
+
+class UnnestApply:
+    """Rewrite non-correlated Apply nodes to cross Join nodes."""
+
+    name = "unnest_apply"
+
+    def run(self, plan: LogicalPlan, ctx: Any) -> PassResult:  # noqa: ANN401
+        _ = ctx
+        rewritten, count = _unnest_tree(plan)
+        return PassResult(
+            plan=rewritten,
+            metadata={"unnested": count},
+            changed=count > 0,
+        )
+
+
+def _unnest_tree(plan: LogicalPlan) -> Tuple[LogicalPlan, int]:
+    count = 0
+    children_updates = {}
+    # Exhaustive list of plan-child slot names across all LogicalPlan subclasses (logical_plan.py).
+    for slot in ("input", "left", "right", "subquery"):
+        child = getattr(plan, slot, None)
+        if isinstance(child, LogicalPlan):
+            rewritten_child, child_count = _unnest_tree(child)
+            count += child_count
+            if rewritten_child is not child:
+                children_updates[slot] = rewritten_child
+
+    current: LogicalPlan = (
+        cast(LogicalPlan, replace(cast(Any, plan), **children_updates))
+        if children_updates
+        else plan
+    )
+
+    if isinstance(current, Apply) and not current.correlation_vars:
+        joined = Join(
+            op_id=current.op_id,
+            output_schema=current.output_schema,
+            left=current.input,
+            right=current.subquery,
+            condition=None,
+            join_type="cross",
+        )
+        return joined, count + 1
+
+    return current, count
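The rewrite above can be demonstrated on a toy IR. The node classes here are simplified stand-ins for the real `LogicalPlan` subclasses (no `op_id`/`output_schema`, and the toy only recurses through `Apply` children), so treat the shapes as illustrative:

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class Node:
    pass


@dataclass(frozen=True)
class Scan(Node):
    name: str


@dataclass(frozen=True)
class Apply(Node):
    input: Node
    subquery: Node
    correlation_vars: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Join(Node):
    left: Node
    right: Node
    join_type: str = "cross"


def unnest(node: Node) -> Node:
    """Rewrite non-correlated Apply to a cross Join; keep correlated Apply."""
    if isinstance(node, Apply):
        # Rewrite children bottom-up first, then check this node.
        node = replace(node, input=unnest(node.input), subquery=unnest(node.subquery))
        if not node.correlation_vars:  # non-correlated -> cross join
            return Join(left=node.input, right=node.subquery)
    return node


print(type(unnest(Apply(Scan("a"), Scan("b")))).__name__)  # Join
print(type(unnest(Apply(Scan("a"), Scan("b"), frozenset({"x"})))).__name__)  # Apply
```

As in the real pass, the eligibility test is purely structural (`correlation_vars` empty), so correlated subqueries pass through unchanged.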

graphistry/compute/gfql_unified.py

Lines changed: 3 additions & 3 deletions
@@ -49,7 +49,7 @@
     SamePathExecutorWrapper,
     WavefrontExecutorWrapper,
 )
-from graphistry.compute.gfql.passes import DEFAULT_LOGICAL_PASSES, PassManager
+from graphistry.compute.gfql.passes import DEFAULT_LOGICAL_PASSES, DEFAULT_TIER2_PASSES, PassManager
 from graphistry.compute.gfql.row.pipeline import is_row_pipeline_call
 from graphistry.compute.typing import DataFrameT, SeriesT
 from graphistry.compute.util.generate_safe_column_name import generate_safe_column_name
@@ -682,8 +682,8 @@ def _execute_compiled_query_non_union(
 
 
 def _run_logical_pass_pipeline(logical_plan: LogicalPlan, ctx: PlanContext) -> LogicalPlan:
-    """Run logical pass pipeline with default no-op pass configuration."""
-    return PassManager(DEFAULT_LOGICAL_PASSES).run(logical_plan, ctx).plan
+    """Run logical pass pipeline: Tier 1 structural passes then Tier 2 fixed-point rewrite loop."""
+    return PassManager(DEFAULT_LOGICAL_PASSES, DEFAULT_TIER2_PASSES).run(logical_plan, ctx).plan
 
 
 def _execute_compiled_query_via_physical_plan(

graphistry/tests/compute/gfql/cypher/test_lowering.py

Lines changed: 6 additions & 3 deletions
@@ -786,7 +786,9 @@ def test_logical_plan_route_for_query_allows_unknown_alias_match_shape_when_opte
     assert defer_reason is None
 
 
-def test_logical_plan_route_for_query_pushes_where_predicate_into_pattern_match() -> None:
+def test_logical_plan_route_for_query_emits_filter_for_where_predicate() -> None:
+    # Compilation emits a Filter node for the WHERE clause; predicate pushdown into
+    # PatternMatch.predicates happens later in the runtime pass pipeline (gfql_unified.py).
     query = _parse_query("MATCH (a)-[r]->(b) WHERE r.weight > 5 RETURN b")
     bound_ir = FrontendBinder().bind(query, PlanContext())
 
@@ -803,10 +805,11 @@ def _walk(node):  # noqa: ANN001, ANN202
             yield from _walk(child)
 
     nodes = list(_walk(logical_plan))
+    # Predicate is in a Filter node — not yet pushed into PatternMatch
+    assert any(isinstance(node, Filter) and "alias='r'" in node.predicate.expression for node in nodes)
     pattern_nodes = [node for node in nodes if isinstance(node, PatternMatch)]
     assert pattern_nodes
-    assert any("alias='r'" in pred.expression for pred in pattern_nodes[0].predicates)
-    assert not any(isinstance(node, Filter) and "alias='r'" in node.predicate.expression for node in nodes)
+    assert not any("alias='r'" in pred.expression for pred in pattern_nodes[0].predicates)
 
 
 def test_compiled_query_sets_logical_plan_route_for_call_shape() -> None:
