feat(gfql/polars): engine followups — native multi-hop, to_fixed_point, undirected, min_hops>1, more predicates + NIE→native#1667

Open
lmeyerov wants to merge 39 commits into dev/gfql-polars-engine from feat/gfql-polars-engine-followups

Conversation

@lmeyerov lmeyerov commented Jul 1, 2026

Contributor

Polars-engine followups stacked directly on the engine PR (#1660). Restacked 2026-07-02: an audit proved this PR has zero dependency on the index layer, so the experimental CSR-index PR (#1658) now stacks on top of this instead of underneath (each layer reviewable against what it actually uses).

Extends the native polars GFQL engine to parity-or-NIE across many surfaces:
- native fixed multi-hop (hops=N)
- forward/reverse to_fixed_point
- fixed undirected multi-hop
- forward/reverse min_hops>1 in chain() (500/500 fuzz)
- more predicates (date-part boundaries, temporal Between)
- toInteger/toBoolean/toString/toFloat
- collect/collect_distinct aggs
- WHERE … IN membership
- 3 cuDF cross-engine fixes (#1663)

Test-amplification hardened the chain fuzz to compare null-aware node attributes + edge multiplicity. DRY: shared eager+lazy hop setup. All dgx-validated (parity-or-honest-NIE, NO-CHEATING).

Also: count(*) short-circuit. A new count_table row op reads the scanned table's height (or alias-mask sum) in one reduction instead of materializing the full frame + constant-key group_by; the Cypher lowering short-circuits the pure MATCH (n)/()-[r]->() RETURN count(*) shape only (DISTINCT / WHERE / group keys / UNWIND / bindings all fall through unchanged; post-aggregate exprs like count(*)+1 compose via a temp column). Node-count MATCH (n) RETURN count(*): polars ~1 ms at 5M nodes (was ~2.5 s via materialize+group_by).

openCypher-null conformance tracked in #1664; open NIE surfaces + per-engine docs in #1665.

Review notes: 38 conventional commits (merge-commit repo; adjacent fixups folded). Team-polish pass (9c0ec314): conformance-matrix dead guards removed, stale NIE wording fixed (undirected/to_fixed_point/min_hops are native now), count_table engine-consistent empty fallback. Validation at tip: dgx 4-engine conformance matrix + ledger + lowering = 1359 pass; pandas-lane count(*) tests added (coverage-audit gate).

@lmeyerov lmeyerov force-pushed the feat/gfql-polars-engine-followups branch 2 times, most recently from 1e0d542 to bfdfc65 Compare July 2, 2026 16:18
@lmeyerov lmeyerov changed the base branch from dev/gfql-seeded-traversal-index to dev/gfql-polars-engine July 2, 2026 16:18
lmeyerov added a commit that referenced this pull request Jul 2, 2026
… (3 files)

The test-gfql-core per-file coverage audit fails on this stack because #1667
adds engine-specific (polars/cuDF-only) branches to files whose floors were
generated pre-#1667 with razor-thin margins — those branches are unreachable
in the pandas-only audit lane BY DESIGN (they are exercised in the polars CI
lane and the dgx 4-engine conformance runs).

Clean-room reproduction (fresh worktree at this tip + import-blocked
polars/cudf to simulate the lane): 3312 tests pass, exactly 3 files below
floor: row/frame_ops.py 60.22<65.0 (count_table polars branch + error paths),
chain_let.py 70.07<70.21 and cypher/reentry/execution.py 85.33<85.66
(hair-thin engine-gate shifts).

Two-part fix:
- Direct unit tests for count_table's frame-op-level paths the GFQL call
  path can't reach (param validation intercepts first): bad-table + missing-
  source ValueErrors, null-in-mask counting, sibling-frame-templated and
  bare 0-count fallbacks.
- Regenerate the 3 floors to the clean-room actuals (minus small local-vs-CI
  variance headroom): frame_ops 65.0->59.5, chain_let 70.21->69.5,
  reentry/execution 85.66->84.75. Intentional engine-specific growth, per
  the baseline-update-in-PR flow; all other floors untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov and others added 22 commits July 2, 2026 16:35
…lars' / 'polars-gpu'

Two robustness gaps in engine selection:
- engine='polars' without polars installed -> raw cryptic ImportError from deep in frame
  coercion / the lazy engine.
- engine='polars-gpu' without the RAPIDS cudf_polars stack -> the missing-lib failure was
  caught by raise_on_fail=True and MISLABELED by _gpu_raise as "plan not GPU-executable, use
  engine='polars'" — pointing at the wrong fix.

Add guards at the chain dispatch (compute/chain.py), pre-coercion, so the user always sees an
actionable install message regardless of which query path runs: engine='polars'/'polars-gpu'
both require polars (`pip install polars`); 'polars-gpu' additionally requires cudf_polars
(checked via find_spec so it's consistent even for eager fast-path queries that never reach a
GPU collect). lazy._engine_for keeps reporting genuine not-GPU-capable plans via _gpu_raise
(unchanged; +clarifying comment). +1 regression test (polars-missing + cudf_polars-missing).
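
A minimal sketch of the pre-coercion guard described above, assuming a standalone helper. The name `check_engine_installed`, the `_REQUIRED` table, and the message wording are hypothetical; only the use of `find_spec` and the polars / cudf_polars requirements come from the commit text.

```python
import importlib.util

# Hypothetical requirement table: which importable modules each engine needs.
_REQUIRED = {
    "polars": ["polars"],
    "polars-gpu": ["polars", "cudf_polars"],
}


def check_engine_installed(engine, find_spec=importlib.util.find_spec):
    """Raise an actionable ImportError before any frame coercion runs.

    `find_spec` is injectable so the guard can be tested without
    installing or uninstalling anything.
    """
    for module in _REQUIRED.get(engine, []):
        if find_spec(module) is None:
            raise ImportError(
                f"engine={engine!r} requires the '{module}' package, "
                "which is not installed"
            )
```

Because the check only asks whether a module spec exists, it behaves the same for queries that never reach a GPU collect.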

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lt | streaming)

cudf-polars has multiple executor modes; expose a switch mirroring GFQL_POLARS_CPU_STREAMING.
Default 'in-memory' (fast + stable for results that fit device memory) is unchanged; opt-in
'streaming' is the escape hatch for larger-than-device-memory results (the in-memory executor
would OOM — the F3 85M-row case). Invalid values fall back to in-memory; raise_on_fail stays
True (NO-CHEATING) either way. ('auto' size-aware switch is a planned enhancement.) +1 test
(mocks GPUEngine, no GPU needed). Full streaming-vs-in-memory crossover benchmark is dgx-gated.
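
The fallback rule above can be sketched as follows. The environment variable name `GFQL_POLARS_GPU_EXECUTOR` is an assumption made for illustration (the actual switch name is not shown here), as is the helper name.

```python
import os

_VALID_MODES = ("in-memory", "streaming")


def gpu_executor_mode(env=None, var="GFQL_POLARS_GPU_EXECUTOR"):
    """Return the executor mode, defaulting to 'in-memory'.

    `var` is a hypothetical variable name. Unknown values fall back
    to 'in-memory' instead of raising.
    """
    env = os.environ if env is None else env
    value = str(env.get(var, "in-memory")).strip().lower()
    return value if value in _VALID_MODES else "in-memory"
```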

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-bridge + 2 predicate bugs)

A parallel audit of the full polars NIE surface (predicates / row-pipeline / non-cypher modes)
found three real defects:

1. NO-CHEATING violation (the important one): a DAG `let({'d': call('get_degrees')})` binding
   under engine='polars' SILENTLY ran the call on pandas and coerced the result back to polars
   (call/executor.py ensure_engine_match), while the identical chain op `[call('get_degrees')]`
   honestly raises NotImplementedError. Fix: under a polars engine, if a Plottable-method call's
   result frames are not already polars, decline (NIE) instead of bridging; pass NotImplementedError
   through execute_call + the DAG wrapper so it stays catchable (matching the chain surface).
   pandas/cuDF engines unaffected. (158 call/DAG tests still green.)

2. Contains predicate ignored its regex flag (hardcoded literal=False) — a LITERAL pattern with
   regex metacharacters (e.g. "a.b") was matched as a regex => wrong answer. Now honors regex=False
   (literal substring; case-insensitive literal lowercases both sides). Differential parity verified.

3. Temporal comparison leak: _cmp_expr built `col > TemporalValue` (a non-None broken expr that
   errors at df.filter / misorders) instead of declining. Now returns None for temporal-typed vals
   -> honest NIE; numeric/string comparisons unaffected.

+2 regression tests. Full inventory + the tractable feature-win batch (CaseWhen, count_distinct,
str/match predicates, get_degrees-native, …) recorded in plans/gfql-engine-followups/plan.md.
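
To make defect 2 concrete, here is a pure-Python sketch of the intended Contains semantics (not the polars lowering itself). The function name and signature are illustrative assumptions; the literal branch follows the "lowercases both sides" rule stated above.

```python
import re


def contains(value, pattern, regex=True, case=True):
    """Substring test with an explicit regex flag.

    With regex=False, metacharacters such as '.' are matched literally.
    """
    if value is None:
        return None
    if regex:
        flags = 0 if case else re.IGNORECASE
        return re.search(pattern, value, flags) is not None
    if not case:
        value, pattern = value.lower(), pattern.lower()
    return pattern in value


# "a.b" as a regex matches "axb"; as a literal it must not.
regex_hit = contains("axb", "a.b", regex=True)
literal_hit = contains("axb", "a.b", regex=False)
```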

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 0 strategy)

The bug pattern (DAG silent-bridge, Contains-regex, temporal leak, multi-hop/undirected combine)
showed the prior gates only covered the CHAIN surface with hand-picked cases. This adds the core
conformance invariant as automation: for any query, on a non-pandas engine the result is EITHER
parity-equal to the pandas oracle OR an honest NotImplementedError — never silently different,
never a silent bridge, never a non-NIE crash.

Covers the cross-product the chain-only gates missed:
- predicates (GT/LT/Between/IsIn/Contains{regex,case}/Startswith/Endswith/IsNA) x {chain, let-DAG};
- traversals (single-hop parity; multi-hop / undirected-multi-edge must NIE);
- cross-SURFACE call() consistency (chain vs DAG must agree — this permanently guards the
  silent-bridge class we just fixed);
- a seeded predicate fuzz asserting the invariant.
37 cases green (validates the 3 correctness fixes). Wired into bin/test-polars.sh + ci.yml's
coverage list. CPU lane here; the cudf/polars-gpu lane + carve-out sweep + coverage ledger are the
next Phase-0 prongs (plan.md).
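
The invariant can be sketched as a small harness. This is an illustrative stand-in, not the repo's `_assert_invariant`; the callables and return labels are hypothetical.

```python
def check_parity_or_nie(run_oracle, run_engine):
    """Return 'parity' or 'nie'; anything else fails.

    - NotImplementedError from the engine is an honest decline.
    - Any other exception propagates (a non-NIE crash is a failure).
    - A result that differs from the oracle raises AssertionError.
    """
    expected = run_oracle()
    try:
        actual = run_engine()
    except NotImplementedError:
        return "nie"
    assert actual == expected, f"silent divergence: {actual!r} != {expected!r}"
    return "parity"
```

In a real matrix the two callables would run the same query on the pandas oracle and on a non-pandas engine, with results normalized to a comparable signature first.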

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…arve-out cases

- _assert_invariant now checks EVERY available non-pandas engine (polars always; cudf +
  polars-gpu auto-detected) against the pandas oracle — so the dgx run covers the full
  cross-product. Fixed _sig to normalize cudf frames to pandas too (was polars-only — a harness
  completeness bug that only running on the GPU box exposed; CPU-local never exercised the cudf lane).
- Added hot-path carve-out cases (node-only MATCH, unconstrained/filtered single-hop, filters on
  BOTH endpoints, empty results, self-loop, isolated-node seed, reverse/undirected filtered) — the
  fast paths bypass the general engine, so they're the highest wrong-answer risk.
- 48 cases green on dgx across pandas / polars / cudf / polars-gpu.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gregation

Three feature-gap lowerings from the NIE audit (were honest NIE -> now native):
- sqrt(x) -> args.sqrt(); sign(x) -> args.sign() (_lower_function).
- count(DISTINCT x) -> col.drop_nulls().n_unique() (_agg_expr) — drop_nulls matches cypher/pandas
  nunique(dropna=True) semantics (polars n_unique() counts null as a value).

Conformance-gated: the value-level matrix (cypher RETURN/aggregation cases) verifies parity across
pandas/polars/cudf/polars-gpu on dgx; confirmed each runs NATIVELY (not NIE) with parity=True.
339 polars tests green on dgx.
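
The null-handling point behind `drop_nulls()` can be shown in plain Python, with `None` standing in for null. Both function names are illustrative only.

```python
def count_distinct(values):
    """count(DISTINCT x): nulls are dropped before counting."""
    return len({v for v in values if v is not None})


def n_unique_with_null(values):
    """Counting unique values without dropping nulls first."""
    return len(set(values))


values = [1, 1, 2, None, None]
dropped = count_distinct(values)
kept = n_unique_with_null(values)
```

Here `dropped` is 2 while `kept` is 3, which is the gap the `drop_nulls()` call closes.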

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- CASE WHEN cond THEN a ELSE b END -> pl.when(cond.cast(Boolean)).then(a).otherwise(b)
  (cond cast to Boolean for Cypher 3-valued: a null WHEN takes ELSE, matching pandas).
- Match -> regex anchored at START; Fullmatch -> anchored BOTH ends (case flag honored; declines
  on custom regex flags to avoid a flag-semantics gap).
Conformance-gated: value-level matrix (63 cases) green across pandas/polars/cudf/polars-gpu on dgx;
confirmed each runs NATIVELY (not NIE). 347 polars tests green on dgx.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pandas boundary predicates accept a tuple of prefixes/suffixes (match if ANY); the polars lowering now OR-folds
starts_with/ends_with over each element (case flag honored). Conformance-gated (67-case value-level
matrix on dgx, all 4 engines); confirmed native, parity verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e temporal compare

Three Phase-2d feature-gap wins, each native-or-honest and conformance-gated
(parity vs the pandas oracle OR honest NIE — no silent pandas bridge):

- get_degrees: pure group_by/count over edge endpoints, left-joined onto nodes
  (Int32, isolated/src-only/dst-only -> 0, self-loop double-counted). Wired on
  BOTH the chain surface (engine_polars.chain) and the let()/ref() DAG + JSON
  surface (call/executor) — turns a prior NIE/silent-bridge into a real win.
- with_(extend=True): native polars with_columns, reusing the shared
  lower_select_items lowering (DRY with select_polars); unlowerable item -> NIE.
- DateValue temporal comparison: chain p.gt(date(...)) on a NAIVE Datetime column
  lowers to col.dt.date() <op> pl.lit(date) (the exact pandas-oracle truncation);
  tz-aware DateTimeValue / TimeValue / tz columns stay honest NIE.
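
For the get_degrees item, a pure-Python sketch of the counting rule (endpoint counts joined onto the node list, isolated nodes at 0, self-loops counted twice). It models the semantics only; the function name and data shapes are assumptions, not the polars implementation.

```python
from collections import Counter


def get_degrees(node_ids, edges):
    """Degree per node from (src, dst) pairs.

    Each edge contributes one count to its source and one to its
    destination, so a self-loop contributes two to the same node.
    Nodes with no incident edge get 0 (the left-join default).
    """
    counts = Counter()
    for src, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    return {node: counts.get(node, 0) for node in node_ids}


degrees = get_degrees(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "b"), ("a", "c")],
)
```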

Conformance fixes the matrix caught (verify-not-trust):
- Cypher WHERE n.ts > date('...') lowered date() to an ISO STRING, so a Datetime
  column vs that string crashed polars (InvalidOperationError, a non-NIE crash).
  Now decline (honest NIE) only when the column is schema-typed temporal; a String
  column holding ISO text still computes lexicographically.
- no_silent_call_bridge test now uses get_indegrees (get_degrees is native).

+20 conformance cases (get_degrees / temporal / with_extend across chain·DAG·
cypher, incl. explicit "runs natively" + "honest NIE" assertions). dgx: matrix
87 passed across pandas/polars/cudf/polars-gpu; full polars suite 530 passed.
ruff + mypy clean. (Also splits 3 pre-existing E702/F541/F841 lint nits in the
touched test files.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r predicate

Phase-2d batch 2, each native-or-honest and conformance-gated (parity vs the
pandas oracle OR honest NIE — no silent pandas bridge):

- list literal `[e0, e1, ...]`: native `pl.concat_list` (order preserved, matching
  the pandas oracle) for a homogeneous-category list (all int / float / str / bool);
  mixed/nested/empty/null-dtype -> honest NIE. cudf is known to REORDER list elements
  (orthogonal cudf bug), so construction conformance is scoped pandas-vs-polars.
- `x IN [literals]` as a row expression (distinct from the WHERE/IsIn predicate path):
  native 3-valued membership (null lhs masked to None to match cypher), Boolean output
  -> cudf-safe, full parity-or-NIE across all engines.
- IsLeapYear predicate -> native `expr.dt.is_leap_year()` on a naive Datetime/Date
  column (Gregorian parity incl. 1900-non-leap / 2000-leap); tz-aware / non-temporal
  -> honest NIE. The 6 month/quarter/year boundary predicates KEEP declining — polars
  has no faithful boolean accessor (only rolled-datetime month_start/end), so
  re-deriving them would risk a subtle wrong answer (NO-CHEATING).

+~15 conformance cases (list construction / IN / IsLeapYear across chain·cypher, incl.
explicit native + honest-NIE assertions). Updated the row-expr unit test ([1,2,3] now
lowers; [1,2.5] mixed still NIEs). dgx: matrix + full polars suite green across
pandas/polars/cudf/polars-gpu; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s on feat)

get_degrees DAG dispatch (call/executor.py) and the _cmp_expr predicate unit-test import
were ADDED on the followups branch after the base, so the lazy/engine/polars home-move's
import rewrite didn't cover them. Repoint both to lazy.engine.polars.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ng + conformance ledger

Phase-2d batch 3, each native-or-honest and conformance-gated (parity vs the pandas oracle
OR honest NIE — no silent bridge); + the Phase-0 coverage-ledger automation:

- get_indegrees / get_outdegrees: native single-direction group_by/count (shared helper with
  get_degrees), wired on chain + DAG/JSON surfaces. Matches the pandas empty-edges quirk
  (keep pre-existing col unchanged) exactly. Repointed the no-bridge test to `hypergraph`
  (architecturally pandas-only) since the degree calls are now all native.
- size(x): native str.len_chars (String) / list.len (List); numeric/other -> honest NIE
  (pandas' row-count quirk is not replicated). substring(s,start[,len]): native str.slice for
  NON-NEGATIVE int-literal bounds over a String col; negative/non-literal/non-string -> NIE
  (negative start diverges: pandas python-slice vs polars offset/length).
- test_conformance_ledger.py (Phase 0 prong #4): pure-introspection CI gate — derives the
  predicate universe from the live type_to_predicate registry + the exercised set from the
  matrix's _predicate_queries labels, and FAILS when a new predicate lands without either a
  conformance case or a reasoned KNOWN_UNCOVERED waiver. Dual-wired into bin/test-polars.sh
  AND ci.yml. CPU-only (no engine exec).

dgx (--gpus all): 414 passed across pandas/polars/cudf/polars-gpu; ruff + mypy clean.
All on the post-rename lazy/engine/polars/ paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dtypes; rest honest-NIE)

- toInteger: Int/Bool identity cast; Float truncate-toward-zero with explicit NaN/null->null
  mask (matches pandas astype(float).astype(int64)); String -> honest NIE (pandas raises on
  non-numeric, not null-on-failure — polars strict=False would fabricate nulls).
- toBoolean: Boolean identity only; String/numeric token-set parsing not provable -> NIE.
- toString: Bool (lowercase true/false) / Int / String; Float -> NIE (cross-engine repr
  diverges) and temporal/Categorical -> NIE.
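
The toInteger rules above, sketched per value in plain Python. This is a scalar model of the described semantics, not the vectorized lowering; infinities are outside the sketch.

```python
import math


def to_integer(x):
    """Scalar model of the toInteger rules described above."""
    if x is None:
        return None
    if isinstance(x, bool):      # check bool first: bool is a subclass of int
        return int(x)
    if isinstance(x, int):
        return x
    if isinstance(x, float):
        # NaN maps to null; other floats truncate toward zero.
        return None if math.isnan(x) else math.trunc(x)
    # Strings (and anything else) are declined rather than guessed at.
    raise NotImplementedError(f"toInteger not supported for {type(x).__name__}")
```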

+ conformance cases (native + honest-NIE) per dtype. toString(float) is covered by a dedicated
pandas-vs-polars NIE test, not _assert_invariant (cudf formats floats differently than pandas —
an orthogonal cudf repr divergence). dgx (--gpus all): 221 passed across all engines; ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-until-impl)

Multi-hop native chain prep. The repo fuzz did NOT exercise multi-hop
(_rand_edge was single-hop only), so the multi-hop NIE guard was never
tested. Amplify (TEST-ONLY; product still NIEs -> chains skip cleanly):
- _rand_edge: directed edges now emit hops in {2,3} / max_hops 1..N
  (e_undirected stays single-hop; undirected-multi is a separate defect)
- _rand_graph: densify toward cycles/self-loops/parallel-edges, |E|>>|V|
- fuzz parametrize 60 -> 500; add edge-COUNT multiplicity guard
- adversarial multi-hop parity cases (cycle/self-loop/dup/empty/reverse/
  both-filtered/hops>diameter/sandwiched) — NIE-skip until Stage 1
- conformance matrix: fwd-hops2/rev-hops2/maxhops3/midfilter/named/
  sandwiched (parity-native after Stage 1) + tofixed/und-hops2 stay-NIE
- polars-only test_polars_chain_multihop_deferred_raises (to_fixed_point,
  min_hops>1, undirected multi-hop)

Harness hardening: _frame_repr astype(object) before where(notna,None) so
cudf->pandas nullable-extension NA becomes real None (pd.NA in a signature
made res==base raise 'bool of NA ambiguous'). This surfaced a real cudf
min_hops divergence (seed node_hop None vs max_hops) — orthogonal cudf bug,
scoped out of the 4-engine matrix, polars NIE asserted directly.

dgx --gpus all: 348 passed, 352 skipped. ruff+mypy clean (test files).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hops, fwd+rev)

THE MARQUEE FEATURE. Native multi-hop polars chain, parity-equal to pandas.
Two prior shortcut attempts (flat forward∩backward intersection) shipped
wrong answers and were reverted; this ports the PATH-AWARE pandas algorithm.

Root cause (was NIE): _combine_edges single-hop-semijoins each edge step by
prev(src)/next(dst) wavefront, dropping intermediate-hop edges (e=3 vs
pandas e=10 at hops=2). The forward/reverse hops were already correct.

Fix (port of pandas combine_steps has_multihop, compute/chain.py:201-209):
for each fixed-length multi-hop edge step, RE-EXECUTE the forward hop over
the backward-pruned edge subgraph seeded from the forward-pass entry
wavefront. Backward prune = 'reaches a valid endpoint'; forward re-exec =
'reachable from seed'; their composition along the BFS frontier = exactly
the valid-path edges (NOT a coarse set intersection). _combine_edges then
appends all recomputed edge ids as-is (bypass the single-hop semijoin).
Recomputed frames feed the EDGE combine only; node combine + name tagging
keep the original backward-pruned steps (matches pandas kind=='edges' gate).
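
A toy model of why a flat forward∩backward intersection is not the same as path-aware edge selection. It assumes exact-length paths of `hops` steps from seeds to targets, which is a simplification and may not match GFQL's hop semantics; it is not the ported pandas algorithm, and all names are hypothetical.

```python
def valid_path_edges(edges, seeds, targets, hops):
    """Edges lying on some seed->target path of exactly `hops` steps."""
    fwd = [set(seeds)]                       # fwd[i]: reachable in i steps
    for _ in range(hops):
        fwd.append({v for (u, v) in edges if u in fwd[-1]})
    bwd = [set(targets)]
    for _ in range(hops):
        bwd.append({u for (u, v) in edges if v in bwd[-1]})
    bwd.reverse()                            # bwd[i]: reaches a target in hops - i steps
    keep = set()
    for i in range(hops):
        src_ok = fwd[i] & bwd[i]
        dst_ok = fwd[i + 1] & bwd[i + 1]
        keep |= {(u, v) for (u, v) in edges if u in src_ok and v in dst_ok}
    return keep


def flat_intersection(edges, seeds, targets, hops):
    """Coarse shortcut: forward-visited edges AND backward-visited edges."""
    fwd_edges, frontier = set(), set(seeds)
    for _ in range(hops):
        new = {(u, v) for (u, v) in edges if u in frontier}
        fwd_edges |= new
        frontier = {v for (_, v) in new}
    bwd_edges, frontier = set(), set(targets)
    for _ in range(hops):
        new = {(u, v) for (u, v) in edges if v in frontier}
        bwd_edges |= new
        frontier = {u for (u, _) in new}
    return fwd_edges & bwd_edges


edges = {("a", "b"), ("b", "d"), ("b", "c"), ("c", "d")}
exact = valid_path_edges(edges, {"a"}, {"d"}, hops=2)
coarse = flat_intersection(edges, {"a"}, {"d"}, hops=2)
```

In this graph the only 2-step path is a→b→d, yet the flat intersection also keeps b→c because that edge is visited by both sweeps at different depths.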

Guard NARROWED (not lifted): _is_native_multihop allows only plain hops=N /
max_hops, forward/reverse. Still honest-NIE for to_fixed_point, min_hops>1,
output slicing, hop labels, *_query, include_zero_hop_seed,
prune_to_endpoints, and undirected multi-hop (Stage 3). NO-CHEATING intact.

Validation (dgx --gpus all):
- amplified fuzz (500 seeds, cycles/self-loops/parallel-edges) + 8
  adversarial cases (cycle/self-loop/dup-multiplicity/empty/reverse/
  both-filtered/hops>diameter/sandwiched): ZERO mismatches. 497 passed,
  203 skipped (deferred surfaces) — up from 348/352 (+149 now native).
- conformance matrix multi-hop cases parity across all 4 engines.
- 387-test polars regression suite: zero regressions.
- benchmark: polars vs pandas 1.0-2.1x @10k/50k, 2.5-4.1x @100k/500k,
  6.0-8.2x @500k/2M; result sets identical across pandas/polars/cudf.
- ruff + mypy clean on chain.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ersal matrix

Extract traversal cases to _TRAVERSAL_CASES; add test_conformance_traversals_dag
running the SAME cases via let({'a': query}). Guards the silent-bridge bug class
(chain NIEs but DAG silently bridges to pandas) for multi-hop: native multi-hop
must reach the DAG/let surface too. dgx --gpus all: all 15 cases pass (parity
where native incl fwd/rev-hops2, maxhops3, midfilter, named, sandwiched; NIE
consistent for to_fixed_point / undirected-multi). ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend the Phase-0 coverage ledger from predicates to cypher scalar functions.
Universe = GFQL_SCALAR_FUNCTIONS (language_defs.py, importable like type_to_predicate);
exercised set DERIVED by parsing function calls out of the cypher strings of a new
importable _cypher_expression_queries() generator (refactored from the inline
parametrize, mirrors _predicate_queries). Same 4 drift tests: registry entry must be
exercised or waived; no bogus/stale/redundant waivers; reasons non-empty.
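
The ledger's set math can be sketched as one function. The argument shapes (a set universe, a set of exercised names, a dict of waiver reasons) and the problem labels are assumptions for illustration, not the test file's actual structure.

```python
def ledger_problems(universe, exercised, waivers):
    """Return a sorted list of drift problems; empty means the gate passes.

    universe:  names from the live registry
    exercised: names that have a conformance case
    waivers:   {name: reason} for names knowingly left uncovered
    """
    waived = set(waivers)
    problems = []
    problems += [f"uncovered: {n}" for n in universe - exercised - waived]
    problems += [f"stale waiver: {n}" for n in waived - universe]
    problems += [f"redundant waiver: {n}" for n in waived & exercised]
    problems += [f"empty reason: {n}" for n, r in waivers.items() if not r.strip()]
    return sorted(problems)
```

For example, a registry entry added without a case or waiver shows up as `uncovered`, and a waiver left behind after a case is added shows up as `redundant waiver`.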

9 functions exercised (sqrt/sign/abs/coalesce/size/substring/tointeger/toboolean/
tostring), 11 waived honest one-liners (tofloat NIE; keys/labels/type/properties/range
NIE; 5 internal __*__ helpers). Parser .lower()s for camelCase toInteger vs registry
tointeger; intersects registry to drop aggregations/identifiers.

dgx: ledger 14 passed (CPU introspection), cypher_expressions 19 passed (4-engine),
ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend the coverage ledger to the call() safelist. Universe = SAFELIST_V1
(call/validation.py, 59 entries); exercised DERIVED from a new importable
_call_exercised_functions() (driven by the shared _CALL_CONSISTENCY_FNS constant
the cross-surface consistency test runs on + the degree-trio dedicated tests).
Same 4 drift tests. 5 exercised (get_degrees/in/out, hypergraph, limit), 54 waived
honest one-liners (row-pipeline ops native-on-chain-but-NIE-via-call-executor;
Plottable-method layouts/encoders/igraph/cugraph/umap pandas-cuDF-only -> no-bridge
NIE; class covered by test_engine_polars_no_silent_call_bridge).

The ledger immediately caught fa2_layout — a safelist entry the manual audit missed
— proving the drift gate works (a new untracked safelist call now fails CI).

dgx: ledger 21 passed (CPU introspection), ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… complete)

Final coverage-ledger axis. Universe = ROW_PIPELINE_CALLS (row/pipeline.py, 15 ops);
exercised DERIVED from importable _rowop_exercised() (labeled subjects: with_). Same 4
drift tests. 1 exercised, 14 waived honest one-liners (native frame ops not yet a
labeled subject: skip/drop_cols/distinct/limit/rows; native-but-implicit-via-cypher:
select/return_/where_rows/order_by/group_by/unwind; honest-NIE correlated ops:
semi_apply_mark/anti_semi_apply/join_apply). Degree calls live on the call axis, not here.

Coverage ledger now spans all 4 conformance axes (predicate / scalar-function / call-
safelist / row-op), each deriving its universe from a live registry and failing CI when a
new op lands without an assertion or an honest waiver. dgx: 28 passed, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hains (Stage 3)

Fixes the seed-18 defect: a chain of >1 undirected edge with intermediate node
filters dropped a node vs pandas. Root cause = the polars backward pass: the generic
undirected hop returns a ONE-SIDED wavefront (TO-side endpoints only), while pandas'
fast backward branch (compute/chain.py:1090-1098) threads BOTH endpoints of the
surviving edges as the step's node frame. The one-sided frame dropped an intermediate
node only reachable as the frontier-side endpoint of a sibling undirected edge, which
then dropped that node's incident edge in the combine.

Fix (localized to the backward pass): for a single-hop undirected edge, override the
reverse hop's node frame with BOTH endpoints of its surviving edges (re-joined to
g._nodes for full columns). The edge set already matched pandas (the undirected hop
joins both orientations); only the node frame was one-sided. _combine_edges /
_combine_nodes / _apply_node_names already mirror the pandas undirected oracle.

Guard NARROWED: native for plain single-hop undirected in multi-edge chains; STILL NIE
for undirected mixed with fixed-length multi-hop, include_zero_hop_seed, or *_query.
Undirected MULTI-hop (e_undirected(hops>=2), to_fixed_point, min_hops>1) still NIE upstream.

Validation (dgx --gpus all): amplified 500-seed fuzz (now generates+runs undirected
multi-edge incl. seed-18 class) ZERO mismatch; 7 new adversarial undirected cases
(seed-18 midfilter, 3x undirected, alternating filters, mixed directed+undirected,
named-across-boundary, self-loop) parity; deferred-raises updated; full polars suite
1036 passed / 93 skipped (was 203 skipped — undirected fuzz cases now native). ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ulti-hop, Stage 4)

Final fixed-length-family multi-hop surface. to_fixed_point (traverse until no new
nodes) is now native for forward/reverse: _is_native_multihop no longer excludes it.
The existing path-aware recompute handles it unchanged — re-run the forward
to_fixed_point hop over the backward-pruned subgraph; the hop's own fixed-point
detection guarantees termination (no infinite loop on cycles/self-loops). Same combine
bypass as hops=N.
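
The termination argument can be illustrated with a plain visited-set traversal. This is a generic sketch of "traverse until no new nodes", not the engine's hop code.

```python
def to_fixed_point(edges, seeds):
    """Forward traversal that stops when a round adds no new nodes."""
    visited = set(seeds)
    frontier = set(seeds)
    while frontier:
        reached = {dst for (src, dst) in edges if src in frontier}
        frontier = reached - visited   # only genuinely new nodes continue
        visited |= frontier
    return visited


# A cycle (a -> b -> a) plus a self-loop (b -> b) still terminates.
reachable = to_fixed_point({("a", "b"), ("b", "a"), ("b", "b"), ("b", "c")}, {"a"})
```

Each round either adds at least one unvisited node or empties the frontier, so the loop runs at most once per node.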

STILL NIE: undirected to_fixed_point (direction guard), undirected multi-hop, min_hops>1.

Validation (dgx --gpus all): the 500-seed amplified fuzz now also emits to_fixed_point
edges over the cyclic fuzz graph (cycles/self-loops/parallel) — ZERO mismatch, proving
termination + parity; 7 adversarial to_fixed_point cases (fwd/rev, self-loop, midfilter,
isolated, sandwiched, named) parity; conformance fwd/rev-tofixed parity all 4 engines;
deferred-raises updated (forward to_fixed_point removed, undirected to_fixed_point added).
Full polars suite 1052 passed / 92 skipped. ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ti-hop surface)

Completes the multi-hop NIE surface. Undirected e_undirected(hops=N / max_hops) in
single- and multi-edge chains is now native + parity. _is_native_multihop admits
undirected (except to_fixed_point); the multi-edge guard no longer NIEs undirected+multihop.

NO new combine/override code: fixed undirected multi-hop uses the generic backward hop
(NOT the single-hop both-endpoint override, which stays is_simple_single_hop-gated — same
as pandas, which only applies its fast both-endpoint backward branch to single-hop) plus
the existing direction-agnostic path-aware recompute. A design pass proved P=Q: for fixed
undirected return_as_wave_front, the pandas backward frame (matches ∪ endpoints − unreached
seeds) collapses to matches == the polars visited-set BFS, so no seed-18-style node drop at
hops>=2. This commit RAN the gate the proof needed.

STILL NIE (principled): undirected to_fixed_point (pandas connected-components + 2-core seed
retention hop.py:817-887 — no vectorized polars analogue) and min_hops>1 (cudf-divergent).

Validation (dgx --gpus all): 500-seed fuzz now emits undirected multi-hop (zero skips) —
ZERO mismatch; +4 adversarial undirected multi-hop cases (hops2/maxhops3/midfilter/mixed);
conformance und-hops2/und-maxhops3 parity all 4 engines, und-tofixed stays NIE. Full polars
suite 1150 passed / 0 skipped (every multi-hop shape now native). ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov and others added 17 commits July 2, 2026 16:35
…tays honest-NIE

Temporal Between over a naive Datetime column with DateValue bounds now lowers natively.
NO new proof obligation: the GFQL temporal Between evaluates in pandas as GE/LE (inclusive)
or GT/LT (exclusive) on the same bound, and each endpoint is the exact date-truncated
compare _cmp_expr already lowers with proven parity (col.dt.date() <op> pl.lit(date)). The
patch composes the two proven endpoint compares, inheriting parity AND every NIE guard for
free — a tz-aware DateTimeValue / TimeValue / raw-datetime / mixed / non-Datetime bound makes
an endpoint decline, so the whole Between honest-NIEs (never a silent tz/instant mismatch).
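
The composition argument can be sketched with plain Python callables, where returning `None` plays the role of declining (NIE). The helper names are hypothetical, inputs are assumed non-null naive datetimes, and only plain `date` bounds are accepted.

```python
import operator
from datetime import date, datetime


def cmp_pred(op, bound):
    """Date-truncated compare, or None to decline.

    Only a plain date bound is accepted; a datetime bound declines.
    """
    if isinstance(bound, datetime) or not isinstance(bound, date):
        return None
    return lambda ts: op(ts.date(), bound)


def between_pred(lower, upper, inclusive=True):
    """Compose two endpoint compares; decline if either endpoint declines."""
    lo = cmp_pred(operator.ge if inclusive else operator.gt, lower)
    hi = cmp_pred(operator.le if inclusive else operator.lt, upper)
    if lo is None or hi is None:
        return None
    return lambda ts: lo(ts) and hi(ts)


january = between_pred(date(2024, 1, 1), date(2024, 1, 31))
```

Because the compare truncates to the date, a timestamp late on the upper bound's day still counts as inside an inclusive range.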

Temporal IsIn stays honest-NIE (design-verified keep): pandas IsIn does EXACT-instant
membership (not date-truncated), and the candidate native is_in cross-precision cast is not
provable from source — declining is correct, locked by a regression test.

dgx --gpus all: 8 Between conformance (chain+dag, incl/excl boundary, single-day, empty,
all 4 engines) parity + naive-native + tz-aware-NIE + temporal-IsIn-NIE = 11 passed;
315-test conformance+cypher regression clean. ruff+mypy clean (predicates.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…est-NIE forms; close ledger gap

unwind is native ONLY for scalar-literal lists (unwind_polars cross-join); list-column /
nested / scalar forms honestly NIE. It was untested in the matrix and waived in the row-op
ledger. Add labeled conformance: 4 native literal-list cases (ints/strings/with-null/
singleton, chain+dag all engines), empty-list-drops-all-rows, scalar honest-NIE, nested-list
honest-NIE, and chain-vs-cypher cross-surface consistency. Move unwind from
ROW_OP_KNOWN_UNCOVERED waiver into _rowop_exercised() (the ledger redundant-waiver gate
enforces the move). Omitted the list-column case (polars list-column ingestion unverified).
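
A pure-Python sketch of the native/declined split described above, using lists of dicts as rows. The function name and row shape are assumptions for illustration.

```python
def unwind_literal(rows, values, alias):
    """Cross-join each row with a literal list of scalars.

    Scalar and nested-list inputs are declined, mirroring the
    honest-NIE forms listed above.
    """
    if not isinstance(values, list):
        raise NotImplementedError("scalar unwind is declined in this sketch")
    if any(isinstance(v, (list, tuple, dict)) for v in values):
        raise NotImplementedError("nested lists are declined in this sketch")
    return [{**row, alias: v} for row in rows for v in values]


rows = [{"n": 1}, {"n": 2}]
expanded = unwind_literal(rows, [10, None], "x")
emptied = unwind_literal(rows, [], "x")
```

An empty literal list yields no rows at all, which is the "empty-list-drops-all-rows" case.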

dgx --gpus all: 8 unwind tests passed; ledger 28 passed (row-op axis rebalanced).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…NE/IsNull/NotNull

Close ledger gaps (test-only; no product change). Row-op axis: add _ROW_OP_CASES labeled
conformance subjects (rows/skip/limit/distinct/drop_cols frame ops + order_by/select/return_/
where_rows/group_by row_pipeline lowerings) on chain+dag, moving all 10 from
ROW_OP_KNOWN_UNCOVERED -> _rowop_exercised() (axis now 12 exercised + 3 NIE-waived =
ROW_PIPELINE_CALLS 15). Predicate axis: add scalar EQ/NE + str IsNull/NotNull to
_predicate_queries(), removing those 4 waivers.

VERIFICATION NOTE: local ruff clean + a from-source self-review confirmed param schemas
(SAFELIST_V1), native dispatch (_try_native_row_op / predicate_to_expr), predicate
crash-safety, and ledger set-math correct by construction; it caught + fixed a stale
IsNull/NotNull waiver (CPU drift-test). NOT yet run with --gpus all (dgx-spark in a ~20min
ssh outage). GPU-lane parity assertions use code paths already exercised+green elsewhere;
will run matrix+ledger on dgx when back and fix-forward if any parity assertion needs re-waive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nal cudf bug)

dgx-verified fix-forward on 73e051d. The 4-engine group_by row-op assertion failed on
cudf: GFQLTypeError 'truth value of a Series is ambiguous' in the GFQL group_by execution
path, triggered only when the row frame carries extra non-key/non-agg columns (the _graph
fixture's f/name). polars+pandas agree; it's an orthogonal cudf-path bug (Series-truthiness;
see cudf-cross-engine-findings.md #4). Move group_by out of the 4-engine _ROW_OP_CASES into
a dedicated polars-vs-pandas assertion (new _assert_polars_parity_or_nie helper, reused for
the min_hops work). group_by stays exercised on the ledger row-op axis. dgx --gpus all:
matrix+ledger 247 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uadrants

Already-native (confirmed by design review + the 500-seed fuzz), but the specific
undirected-single + directed-fixed-multihop quadrant wasn't a named regression. Add 3 to
UND_CHAINS (und-then-fwdhops2, fwdhops2-then-und, und-mid-fwdhops2) to lock that the
path-aware recompute doesn't collide with the undirected both-endpoint backward override on
the shared middle wavefront. dgx --gpus all: all undirected-multiedge parity cases pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/End/Quarter/Year)

The 6 date-part boundary predicates now lower natively on a naive Datetime/Date column.
Polars has no is_month_start/.../is_year_end BOOLEAN accessor (month_start()/month_end()
ROLL to a Datetime), BUT the pandas oracle with freq=None is a pure CALENDAR-FIELD test:
is_month_start = day==1; is_month_end = day==days_in_month; quarter/year add a month-set /
month==N. polars dt.day()/dt.month()/dt.days_in_month() extract the SAME fields (proleptic
Gregorian, leap-aware), so each compose is BIT-IDENTICAL to the oracle on non-null rows — a
PROVABLE derivation, not a guess. NaT->null->fill_null(False) matches pandas False. Dtype-
gated to naive Datetime/Date (tz shifts wall-clock fields -> honest NIE), exactly like
IsLeapYear. This admits proven DERIVATIONS, not just single-accessor lowerings — provable
parity is the NO-CHEATING bar.
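
The calendar-field derivation above can be illustrated without either dataframe library. The following is a minimal stdlib sketch, not the engine's lowering: `boundary_flags` and `check_against_next_day` are hypothetical names, and the cross-check uses an independent "the next day starts a new month/year" definition rather than the pandas oracle.

```python
import calendar
from datetime import date, timedelta


def days_in_month(d: date) -> int:
    # calendar.monthrange returns (weekday_of_first_day, number_of_days)
    return calendar.monthrange(d.year, d.month)[1]


def boundary_flags(d: date) -> dict:
    # Each flag is composed only from the day / month / days-in-month fields.
    month_start = d.day == 1
    month_end = d.day == days_in_month(d)
    return {
        "is_month_start": month_start,
        "is_month_end": month_end,
        "is_quarter_start": month_start and d.month in (1, 4, 7, 10),
        "is_quarter_end": month_end and d.month in (3, 6, 9, 12),
        "is_year_start": month_start and d.month == 1,
        "is_year_end": month_end and d.month == 12,
    }


def check_against_next_day(start: date, days: int) -> bool:
    # Independent definition: a month (year) ends when the next day
    # belongs to a different month (year).
    for offset in range(days):
        d = start + timedelta(days=offset)
        nxt = d + timedelta(days=1)
        flags = boundary_flags(d)
        if flags["is_month_end"] != (nxt.day == 1):
            return False
        if flags["is_year_end"] != (nxt.year != d.year):
            return False
    return True
```

Running `check_against_next_day(date(2023, 1, 1), 731)` sweeps a common year and a leap year, so the leap-day case (2024-02-29) is covered by the field test alone.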

Replaces the *_honest_nie_polars boundary test with native-parity (4-engine — cuDF agrees,
confirming the derivation is faithful) + a tz-aware NIE guard; ledger waivers updated to
native. dgx --gpus all: boundary native+parity all engines; full polars suite 1201 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…findings 1,2,4)

Core compute fixes (pandas+cuDF row pipeline), each making cuDF match the pandas oracle
without changing pandas/polars behavior. dgx --gpus all verified pandas==cudf==polars.

#4 group_by 'truth value of a Series is ambiguous': the handler grouped over the WHOLE active
table, carrying non-key/non-agg columns (object 'name', float 'f') into the cuDF GroupBy and
tripping its Series.__bool__ path. Fix: _make_grouped now projects to key+value cols before
grouping (selecting value cols before grouping cannot change group sizes/reductions -> identical
on every engine). group_by promoted back into the 4-engine _ROW_OP_CASES matrix.
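
As a rough illustration of why projecting to key+value columns before grouping is safe, here is a small stdlib sketch. `group_sum` and the sample rows are hypothetical and stand in for the real `_make_grouped` handler; the extra `name` and `f` columns mirror the fixture columns named above.

```python
from collections import defaultdict


def group_sum(rows, key, value, project_first):
    # Optionally drop every column except the group key and the aggregated value.
    if project_first:
        rows = [{key: r[key], value: r[value]} for r in rows]
    totals = defaultdict(int)
    sizes = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[value]
        sizes[r[key]] += 1
    return dict(totals), dict(sizes)


rows = [
    {"kind": "a", "v": 1, "name": "x", "f": 0.5},
    {"kind": "b", "v": 2, "name": "y", "f": 1.5},
    {"kind": "a", "v": 3, "name": "z", "f": 2.5},
]

whole_table = group_sum(rows, "kind", "v", project_first=False)
projected = group_sum(rows, "kind", "v", project_first=True)
```

Both calls see the same rows per key, so group sizes and reductions agree; only the unused columns are gone.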

#1 list-literal element reorder: cuDF groupby-collect (.agg(list)) gives no within-group
row-order guarantee, permuting elements vs construction order (pandas groupby(sort=False) is
stable). Fix: route cuDF to the order-deterministic column-wise host build (the same path
list-typed elements already use); pandas melt/groupby path untouched.

#2 toString(float) repr divergence: libcudf float->string != python/pandas repr (sci-notation
thresholds). Fix: engine-gated host round-trip (arrow -> pandas float -> str = the exact pandas
result), so cuDF is string-identical incl. 1e20 -> '1e+20'. Better than declining (which would
crash, not defer). pandas/int/bool/string casts unchanged.
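
For reference, the Python-side float formatting that the host round-trip is meant to reproduce can be seen with plain `str()`. This sketch assumes, as the note above suggests, that the pandas result follows Python's float repr; `to_string_like_python` is a hypothetical helper, not the pipeline code.

```python
def to_string_like_python(values):
    # Python switches to scientific notation once the decimal exponent reaches 16.
    return [str(float(v)) for v in values]


formatted = to_string_like_python([1e20, 1e15, 1e16, 0.5])
```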

(#3 min_hops seed hop-label stays scoped — a deeper cuDF NA-handling issue, not one-line-safe.)

dgx: row-pipeline (pandas+cudf) + conformance + ledger 516 passed; group_by 4-engine + 2 new
cuDF parity tests 22 passed. ruff+mypy clean (pipeline.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…arity

Variable-length lower-bound traversals e(min_hops=N, max_hops=M) now run
natively on the polars engine (CPU + polars-gpu/cudf_polars) instead of NIE.

The eager polars hop ports pandas' min_hops algorithm vectorized:
- NON-anti-joined revisit BFS + cumulative-closure termination (cycles bump
  reach depth to max_hops so the lower bound can be satisfied)
- 3-case gate: max_reached < min_hops -> empty; goal labeled at >= min_hops -> layered
  backward-tree walk; cyclic-revisit-only -> unpruned ball
- the exact min_hops NODE-output rule (the hard part, traced vs the pandas
  oracle through forward-stack/backward-pass/recompute/label-rebuild/
  output-slice/seed-strip): wavefront = endpoints of the RETAINED edges,
  MINUS seeds never re-reached at >=min_hops, with FULL attributes only for
  hop-labeled nodes and NULL attributes for source-side endpoint-only nodes
  (so a downstream node-attribute filter rejects them, matching pandas'
  track_node_hops labeled path). Fixes fuzz seeds 24 (fwd n0 over-include),
  404 (reverse n1 unreached-seed), 48 (reverse n5/n7 null-attr kind filter).
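
A heavily simplified sketch of the first two ideas in the list above (revisit-allowed BFS and the "max_reached < min -> empty" gate). It deliberately omits the cumulative-closure termination, the backward-tree walk, and the node-output rule; the function names are hypothetical and the edge list is plain Python, not a polars frame.

```python
def reach_by_depth(edges, seeds, max_hops):
    # Revisit-allowed BFS: the frontier at hop k holds every node reachable by
    # a walk of exactly k edges. There is no visited-set anti-join, so cycles
    # keep the frontier alive up to max_hops.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    frontiers = []
    frontier = set(seeds)
    for _ in range(max_hops):
        frontier = {dst for src in frontier for dst in adjacency.get(src, ())}
        if not frontier:
            break
        frontiers.append(frontier)
    return frontiers


def goals_at_or_beyond(edges, seeds, min_hops, max_hops):
    frontiers = reach_by_depth(edges, seeds, max_hops)
    if len(frontiers) < min_hops:
        # Gate case 1: the traversal never reached depth min_hops.
        return set()
    # Nodes reached at any depth in [min_hops, max_hops].
    return set().union(*frontiers[min_hops - 1:])


cycle_goals = goals_at_or_beyond([("a", "b"), ("b", "a")], {"a"}, 3, 4)
chain_goals = goals_at_or_beyond([("a", "b"), ("b", "c")], {"a"}, 3, 4)
```

On the two-node cycle the walk keeps revisiting `a` and `b`, so depth 3 is reachable; on the acyclic chain the frontier dies at depth 2 and the lower bound cannot be met.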

NO-CHEATING preserved: undirected min_hops>1 still raises NotImplementedError.

Validated: test_polars_chain_fuzz_parity 500/500, full polars hop+chain
suites green, whole gfql/ dir 3880 passed, polars-gpu min_hops parity 8/8,
mypy clean. Stale deferred/unsupported guards updated; direct-hop min_hops
parity test added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
n({col: ne(x)}) and cypher WHERE n.col <> x over a NULL cell used to KEEP the
null row on the pandas engine (NaN != x -> True), diverging from cuDF and the
polars engine (both drop it) and from pandas' own NOT n.col = x path. Per
openCypher/SQL 3VL, null <> x is null -> not a match -> row excluded, like
eq/gt/IN (which already drop nulls).

Fix: NE.__call__ masks out nulls (& s.notna()). A 4-engine audit pinned the
entire divergence to exactly 2 cells, pandas-only (filter_dict ne + cypher <>);
one predicate fix corrects both (single-entity <> routes through NE). cuDF/
polars/polars-gpu were already conformant; polars untouched (own lowering).
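
The three-valued rule can be sketched in plain Python, using `None` for a null cell. `ne_3vl` and `keep_row` are hypothetical names; the real fix is the `& s.notna()` mask described above.

```python
from typing import Optional


def ne_3vl(cell, x) -> Optional[bool]:
    # null <> x is null (unknown), not True
    if cell is None or x is None:
        return None
    return cell != x


def keep_row(cell, x) -> bool:
    # A filter keeps a row only when the predicate is definitely True.
    return ne_3vl(cell, x) is True


cells = [1, None, 2]
kept = [c for c in cells if keep_row(c, 1)]

# The old pandas-engine behaviour came from NaN comparing unequal to everything:
nan_is_unequal = float("nan") != 1
```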

Behavior change for ne() on nullable columns under the default pandas engine.
Test flipped from a strict-xfail divergence marker to a 3VL parity assertion.
Validated on dgx: 4522 predicates+gfql tests pass, all 4 engines now 3VL.
(Broader openCypher null-semantics sweep + docs tracked as a follow-up issue.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
n({col: [.., None]}) over a NULL cell used to KEEP the null row on cuDF
(cuDF isin matched a null cell against a None list element), diverging from
pandas + polars which exclude it. Per openCypher/SQL 3VL, null IN [...] is
null -> not a member -> excluded. Fixed in filter_by_dict (& notna() on the
membership branch): no-op for pandas/polars, fix for cuDF.

Found via the #1664 conformance sweep; validated 4-engine on dgx (all give
[n0,n3] now). +test_membership_on_null_is_three_valued_logic. 287 predicate
+filter tests pass.
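
The same masking idea for membership, as a minimal sketch: Python's own `in` happily matches `None` against a `None` list element, which mirrors the divergence described above, and and-ing with a not-null check removes it. The node ids and values here are made up for illustration and are not the test fixture.

```python
def naive_in_mask(cells, members):
    # Matches a null cell against a None list element.
    return [c in members for c in cells]


def in_3vl_mask(cells, members):
    # Membership and-ed with "cell is not null": null IN [...] is never a match.
    return [c is not None and c in members for c in cells]


ids = ["a", "b", "c", "d"]
cells = [1, None, 2, 3]
members = [1, 3, None]

naive_kept = [i for i, m in zip(ids, naive_in_mask(cells, members)) if m]
fixed_kept = [i for i, m in zip(ids, in_3vl_mask(cells, members)) if m]
```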

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The eager hop (hop_eager.py) and the lazy single-hop (hop.py) carried verbatim
copies of the column-name + edge-id + node-dtype setup and the directed-pairs
builder (hop.py even commented 'stay textually identical ... don't drift').
Extracted _hop_setup_columns() + _build_hop_pairs() — identical on pl.DataFrame
(eager) and pl.LazyFrame (lazy), so both call sites share one implementation.

Pure refactor, behavior-preserving. Validated on dgx: polars chain fuzz +
hop suites 1133 pass, mypy clean. (The bigger pandas/polars node-output policy
unification stays an issue — risky production-hop.py surgery + label-dtype
contract from #1664/#1663 is a prerequisite.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… — NIE->native

Three previously-NIE cypher row surfaces now run natively on engine='polars',
parity-validated vs the pandas oracle across pandas/cudf/polars/polars-gpu:

- toFloat(x): int/uint/bool/float -> Float64 (NaN preserved; no fillna step,
  unlike toInteger — float64 has no null sentinel). Non-numeric String declines
  (NIE) because pandas astype(float) RAISES, not null-on-failure.
- collect(x) / collect(DISTINCT x) aggregations complete the native group_by
  surface: drop nulls, preserve within-group first-occurrence order (collect
  keeps dups, DISTINCT dedups keep-first), all-null group -> []. drop_nulls()
  /unique(maintain_order=True), no .implode().
- where_rows / WHERE ... IN [list] membership -> is_in (null cell excluded, 3VL).
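
The collect semantics listed above (drop nulls, keep first-occurrence order, DISTINCT dedups keep-first, all-null group -> []) can be sketched with plain dicts and lists. `collect_by_group` is a hypothetical helper, not the polars lowering.

```python
def collect_by_group(rows, key, value, distinct=False):
    out = {}
    for r in rows:
        # setdefault creates the bucket even if every value in the group is
        # null, so an all-null group ends up as an empty list.
        bucket = out.setdefault(r[key], [])
        v = r[value]
        if v is None:
            continue  # nulls are dropped
        if distinct and v in bucket:
            continue  # keep only the first occurrence
        bucket.append(v)
    return out


rows = [
    {"g": "a", "x": 3},
    {"g": "a", "x": None},
    {"g": "b", "x": None},
    {"g": "a", "x": 1},
    {"g": "a", "x": 3},
]

collected = collect_by_group(rows, "g", "x")
collected_distinct = collect_by_group(rows, "g", "x", distinct=True)
```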

Removed the stale tofloat conformance-ledger waiver; +tofloat matrix cases,
tofloat-string-NIE test, collect parity test. Validated on dgx: 4242 gfql tests
pass, conformance matrix+ledger+row-pipeline green, 4-engine parity, mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ce without polars

Two CI failures on this layer, both diagnosed from the failing 3.8/3.14 jobs:

(A, real regression) execute_call() added an unconditional
'if isinstance(error, NotImplementedError): raise error' to propagate the polars
no-silent-bridge decline on the DAG surface — but it ALSO intercepted a
pandas/cudf NIE like fa2_layout's 'requires a GPU', so it stopped falling through
to the GFQLTypeError(E303) wrapper and test_fa2_layout_cpu_requires_gpu failed.
Gate the pass-through to engine in (POLARS, POLARS_GPU); pandas/cudf NIEs wrap to
E303 as before. Verified: fa2 test passes, full test_call_operations green (24/2).

(B, py3.14 env) test_engine_polars_conformance_matrix hardcodes the polars lane
without probing importability; on Python 3.14 (no cp314 polars wheel) polars is
absent from the lockfile, so every case reported a non-NIE ImportError as a
conformance failure (~60). Add module-level pytest.importorskip('polars') so the
matrix skips cleanly when polars is unavailable.
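
The gate described in (A) amounts to re-raising a NotImplementedError only for the polars engines and wrapping everything else. A minimal sketch follows; the `GFQLTypeError` class here is a local stand-in, the engine strings are assumed spellings of the POLARS / POLARS_GPU enum members, and `handle_call_error` is a hypothetical name rather than the real `execute_call()` body.

```python
class GFQLTypeError(Exception):
    """Local stand-in for the library's wrapped error type."""


POLARS_ENGINES = {"polars", "polars-gpu"}  # assumed string names


def handle_call_error(error: Exception, engine: str) -> None:
    # Polars declines propagate unchanged (no silent bridge to pandas).
    if isinstance(error, NotImplementedError) and engine in POLARS_ENGINES:
        raise error
    # Everything else, including pandas/cudf NIEs, is wrapped as E303.
    raise GFQLTypeError("E303", str(error)) from error


def raised_type(error: Exception, engine: str) -> str:
    try:
        handle_call_error(error, engine)
    except Exception as exc:
        return type(exc).__name__
    return "none"
```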

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…me materialize

A lone `RETURN count(*)` over a single node/edge pattern used to materialize
the whole matched frame and run a constant-key group_by just to count rows
(the ~770x-vs-Ladybug loss in the fair Cypher benchmark: count = 2543-8206ms
while the df-path count is ~0.01ms). The Cypher lowering now emits a new
`count_table` row op that reads the scanned table's height directly (or sums
the boolean alias-mask column when the pattern filters) in a single reduction
— no frame copy, no group_by.
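
To make the contrast concrete, here is a stdlib sketch of the two strategies: materialize then group on a constant key, versus a single reduction over the height or the boolean mask. Both functions are hypothetical illustrations, and treating a null mask cell as "not counted" is an assumption of this sketch.

```python
def count_by_materializing(rows, mask=None):
    # Old shape: copy the matched rows, then group on a constant key and count.
    matched = [dict(r) for i, r in enumerate(rows) if mask is None or mask[i] is True]
    groups = {}
    for r in matched:
        groups.setdefault(0, []).append(r)
    return len(groups.get(0, []))


def count_table(height, mask=None):
    # New shape: read the table height, or sum the boolean alias mask.
    if mask is None:
        return height
    return sum(1 for m in mask if m is True)  # assumption: null cells do not count


rows = [{"id": i} for i in range(5)]
mask = [True, False, True, True, False]
```

The fast path never touches row contents; it needs only the row count or the mask column.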

Guarded to the provably-equivalent shapes only: exactly one non-DISTINCT
count(*), no group keys / post-agg exprs / row-level WHERE / UNWIND / paging /
multi-relationship binding, and either a pure node scan (relationship_count==0)
or a single relationship counted on its edge alias (== the cases
_reject_unsound_relationship_multiplicity_aggregates permits). Every other
shape falls through to the general aggregate path unchanged. The op replaces
the rows()+with_+group_by prefix and keeps the identical trailing projection,
so downstream/result is byte-identical.
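
The shape guard can be pictured as a single predicate over a summary of the query. This is a sketch under assumptions: `CountQueryShape` and all of its field names are invented for illustration, and the post-aggregate-expression condition is left out because a later commit in this PR documents those expressions as supported.

```python
from dataclasses import dataclass


@dataclass
class CountQueryShape:
    # Hypothetical summary of a lowered query; not the real lowering types.
    count_star_aggregates: int = 1
    distinct: bool = False
    group_keys: int = 0
    has_row_where: bool = False
    has_unwind: bool = False
    has_paging: bool = False
    relationship_count: int = 0
    counted_on_edge_alias: bool = False


def can_short_circuit(q: CountQueryShape) -> bool:
    if q.count_star_aggregates != 1 or q.distinct:
        return False
    if q.group_keys or q.has_row_where or q.has_unwind or q.has_paging:
        return False
    if q.relationship_count == 0:
        return True  # pure node scan
    # Exactly one relationship, counted on its edge alias.
    return q.relationship_count == 1 and q.counted_on_edge_alias
```

Anything that returns False here would simply take the general aggregate path.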

count_table is a native frame op (like rows/limit) routed via
_POLARS_NATIVE_ROW_PIPELINE_CALLS -> op.execute -> frame_ops.count_table, so
one implementation covers pandas / cuDF / polars / polars-gpu.

Validation: CPU parity (pandas+polars node/edge count == oracle, fast path
confirmed taken); full test_lowering suite 1394 passed; coverage ledger green
(count_table accounted in _rowop_exercised + _ROW_OP_CASES + CALL_KNOWN_UNCOVERED);
dgx 4-engine conformance (count_all_nodes/edges + count_table rowop, chain+dag)
55 passed on pandas/cuDF/polars/polars-gpu; ruff + mypy clean. The 5M/20M
Ladybug head-to-head benchmark is the next step (needs the ladybug bench harness
rebased onto this).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ble)

The count_table row op (count(*) short-circuit) was only exercised via the
polars-gated conformance matrix, so the polars-less test-gfql-core lane never
executed the new frame_ops/pipeline code and the per-file coverage audit fell
below baseline (CI: 'per-file coverage baseline failed', pre-existing since
the count_table commit). Add always-on pandas tests: node-count + edge-count
execution (incl. the dangling-edge endpoint-validation semantic), a structural
lowering assert (pure count(*) -> count_table; grouped count does NOT
short-circuit), the direct row-op path, and the unknown-table rejection.

Validated: 11 pass locally (-k 'count_star or count_table'); ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stale wording

Review-wave findings (no behavior change except count_table engine fix):
- _is_pure_count_star_shortcircuit docstring claimed post-aggregate exprs are
  rejected; they are in fact SUPPORTED by design (count lands in a temp column,
  trailing select applies the expr — verified count(*)+1 == 4, count(*)*2+1 == 7
  on 3 nodes). Doc now matches code; pandas-lane tests already lock the
  shortcircuit + grouped-count-does-not cases.
- Remove 25 dead 'if polars not in _NONPANDAS_ENGINES: skip' guards in the
  conformance matrix (module already importorskips polars; condition was
  always False).
- lazy polars chain: _is_native_multihop docstring + chain NIE message listed
  to_fixed_point / min_hops>1 / undirected as deferred — all three are native
  for fwd/rev at this tip; wording now names only the genuinely deferred
  combos (undirected to_fixed_point / undirected min_hops>1, slicing, labels,
  *_query, zero-hop-seed, prune_to_endpoints).
- count_table: table-is-None fallback returned a pandas frame even in a
  polars pipeline; now templates the 0-count row from the sibling table's
  engine (mirrors empty_frame discovery).
- test_engine_polars_chain.py stale test-name ref; executor.py duplicate
  'Engine as _Eng' import.
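
The first bullet's "count lands in a temp column, trailing select applies the expr" can be sketched in a few lines. The temp column name and `run_count_then_select` are hypothetical; the expected values are the ones quoted in the bullet for 3 nodes.

```python
def run_count_then_select(node_count, post_expr):
    # Step 1: the short-circuit writes the count into a temp column.
    row = {"__count_tmp__": node_count}  # hypothetical column name
    # Step 2: the trailing select evaluates the post-aggregate expression on it.
    return post_expr(row["__count_tmp__"])


plus_one = run_count_then_select(3, lambda c: c + 1)
times_two_plus_one = run_count_then_select(3, lambda c: c * 2 + 1)
```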

Validated: count tests 11 pass; matrix 244 pass local-CPU (15 cudf-lane
errs = local no-GPU env artifact, dgx re-validation follows); ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (3 files)

The test-gfql-core per-file coverage audit fails on this stack because #1667
adds engine-specific (polars/cuDF-only) branches to files whose floors were
generated pre-#1667 with razor-thin margins — those branches are unreachable
in the pandas-only audit lane BY DESIGN (they are exercised in the polars CI
lane and the dgx 4-engine conformance runs).

Clean-room reproduction (fresh worktree at this tip + import-blocked
polars/cudf to simulate the lane): 3312 tests pass, exactly 3 files below
floor: row/frame_ops.py 60.22<65.0 (count_table polars branch + error paths),
chain_let.py 70.07<70.21 and cypher/reentry/execution.py 85.33<85.66
(hair-thin engine-gate shifts).

Two-part fix:
- Direct unit tests for count_table's frame-op-level paths the GFQL call
  path can't reach (param validation intercepts first): bad-table + missing-
  source ValueErrors, null-in-mask counting, sibling-frame-templated and
  bare 0-count fallbacks.
- Regenerate the 3 floors to the clean-room actuals (minus small local-vs-CI
  variance headroom): frame_ops 65.0->59.5, chain_let 70.21->69.5,
  reentry/execution 85.66->84.75. Intentional engine-specific growth, per
  the baseline-update-in-PR flow; all other floors untouched.
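
The floor arithmetic above can be checked mechanically. A small sketch using only the numbers quoted in this message; `below_floor` is a hypothetical helper, not the CI audit script.

```python
actuals = {
    "row/frame_ops.py": 60.22,
    "chain_let.py": 70.07,
    "cypher/reentry/execution.py": 85.33,
}
old_floors = {
    "row/frame_ops.py": 65.0,
    "chain_let.py": 70.21,
    "cypher/reentry/execution.py": 85.66,
}
new_floors = {
    "row/frame_ops.py": 59.5,
    "chain_let.py": 69.5,
    "cypher/reentry/execution.py": 84.75,
}


def below_floor(actual, floors):
    # Files whose measured coverage is under the configured floor.
    return sorted(name for name, pct in actual.items() if pct < floors[name])


failing_before = below_floor(actuals, old_floors)
failing_after = below_floor(actuals, new_floors)
headroom = {name: round(actuals[name] - new_floors[name], 2) for name in actuals}
```

With the regenerated floors every file keeps a headroom of under one percentage point, which matches the "small local-vs-CI variance" intent.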

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lmeyerov lmeyerov force-pushed the feat/gfql-polars-engine-followups branch from 626c7fd to ba8d360 Compare July 2, 2026 23:40