docs(gfql): engine-selection guide (pandas/polars/cuDF/polars-gpu) + motivating comparison by lmeyerov · Pull Request #1661 · graphistry/pygraphistry

lmeyerov · 2026-06-29T04:36:03Z

Stacks on #1658 (spine: #1666 → #1660 [engine+GPU target] → #1667 [followups] → #1658 [index] → this PR). Docs-only — no code change.

What

A persona-tested Choosing a GFQL Engine page documenting the four interchangeable engines (pandas / polars / cudf / polars-gpu), which until now were undocumented (grep confirmed zero doc mentions of polars).

- docs/source/gfql/engines.rst (new), numbers-first:
  - the one-keyword engine='polars' speedup (up to ~38× over pandas on real graphs, no GPU)
  - a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M)
  - a decision matrix (workload shape × size × hardware → engine) with footnotes: ~10K-edge CPU crossover, GPU work-bound rule, polars-gpu 85M-row memory pressure, GPU-or-error contract
  - cuDF vs polars-gpu disambiguation (eager-op vs fused-lazy; cuDF is not deprecated)
  - honest "when NOT to use Polars", the differential-parity guarantee, and methodology + reproducer scripts
- performance.rst: rewrote the top to lead with the engine comparison; de-marketed the prose flagged by the skeptic persona ("Unleashing", "Graph 500 levels", NVIDIA name-drop)
- nav: wired the page into the GFQL toctree + recommended paths (added a CPU performance path)
- quick.rst / about.rst: added polars/polars-gpu to the engine examples (previously pandas/cuDF only)

How it was scoped

Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Each persona read the current docs cold; the union of their must-haves is the acceptance bar. A round-2 user-test against the rendered docs follows.

Numbers trace to guarded benchmark runs (benchmarks/gfql/index_bulk_olap_bench.py); no figures invented.
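
For readers unfamiliar with the term, a warm-median measurement can be sketched as below. This is a minimal standard-library illustration, not the code in benchmarks/gfql/index_bulk_olap_bench.py; the names `warm_median_ms` and `speedup` and the default warm-up and run counts are assumptions chosen for the example.

```python
import statistics
import time
from typing import Callable, List


def warm_median_ms(fn: Callable[[], object], warmup: int = 2, runs: int = 5) -> float:
    """Median wall-clock time of `fn` in milliseconds over `runs` timed calls,
    taken after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: lets caches and lazy imports settle before measuring
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms
```

The median is used rather than the mean so that a single slow outlier run does not move the reported figure.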

🤖 Generated with Claude Code

Review notes

Team-polish pass: the illustrative opener snippet is doc-test: skip (was executing in polars-less CI lanes — this branch's pre-existing red); crossover guidance aligned to the measured ~10K-edge figure (was a stale ~1M claim contradicting engines.rst); public phrasing de-jargoned ('no silent fallback — parity-verified'); CHANGELOG entry aligned to the page's numbers.

The engine-selection guide (#1661) documented all four engines + a decision matrix but the CSR adjacency index — the strongest competitive claim and the exact answer to 'Neo4j has an index, does GFQL?' — was only a footnote. Adds a full guide: create_index/gfql_index_all/show_indexes/drop_index, index_policy (use/auto/force/off), gfql_explain, Cypher DDL + wire protocol, and the sourced numbers (flat-in-N 0.12ms @8M-117M edges; 9-28x vs Kuzu/Neo4j on selective lookups; CPU-wins-seeded vs GPU floor). Honest build-cost + parity-or-fallback section. Wires into the toctree + a seeded-lookup recommended path; shrinks the engines.rst F5 footnote to a cross-link. Persona-driven (round-1 user-testing: Priya/Neo4j-migrant + Maya's slow seeded lookup). Numbers already measured (benchmarks/gfql/index_*bench.py, dgx-spark). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…motivating comparison

New persona-tested "Choosing a GFQL Engine" page (gfql/engines.rst): the four interchangeable engines, the one-keyword engine='polars' speedup (11-47x over pandas on real graphs, no GPU), a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M), a decision matrix (shape x size x hardware -> engine) with crossover/work-bound/memory-pressure/GPU-or-error footnotes, cuDF-vs-polars-gpu disambiguation (eager vs fused-lazy; cuDF not deprecated), an honest "when NOT to use Polars", the differential-parity guarantee, and methodology + reproducer scripts.

Also: rewrote the top of gfql/performance.rst to lead with the engine comparison (de-marketed the prose flagged by the skeptic persona), wired the page into the GFQL toctree + recommended paths, and added polars/polars-gpu to the engine examples in quick.rst and about.rst (docs previously mentioned only pandas/cuDF).

Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…numbers, output-frame note

Applies fixes from a second persona user-testing pass on the rendered docs:

- performance.rst: removed the surviving marketing tail (the skeptic persona's #1 residual): "A New Era", "Empower Your Data Journey", "Join the Community", and the NVIDIA-investment-implies-performance line. Replaced it with a tight, de-superlatived "How GFQL is fast" (the real mechanisms) + a focused Next Steps.
- engines.rst:
  - added the cuDF-WINS row to the comparison table (2-hop/100K seeds, ~85M output rows: cuDF 6.0s) so cuDF winning is visible without reading footnotes (RAPIDS persona)
  - added a prominent note that result frames match the engine (polars-gpu/polars return polars.DataFrame; .to_pandas() to convert), the pandas+RAPIDS personas' top practical gotcha
  - fixed the LDBC sf1 figure attribution (it is from a separate benchmark, not the cited Orkut/LiveJournal source-of-truth) to keep every on-page number traceable
  - added run counts + unified-memory note to Methodology (perf-engineer persona)

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…) + fixes

Ran the repo's documented user-testing protocol (test-amplification SKILL §0 "User-Workflow Exploration") clean-room, in two passes (need-finding vs original docs, then QA on the produced docs), and applied the deltas it surfaced that the earlier ad-hoc persona pass missed.

Completeness (Pass A): finished the engine enumerations the ad-hoc pass deferred. overview.rst now names all four engines + auto's resolution rule + the opt-in/no-silent-fallback contract (was "GFQL automatically executes on GPU", which implied silent selection); notebooks/gpu.rst now points GPU readers to the engines page.

Accuracy/QA (Pass B):

- reconciled the recurring "11-47x" headline to what the on-page table supports (-> "up to ~38x", Orkut 1-hop, traceable) across 9 sites
- fixed cuDF "6-18x" -> "~15x (Orkut 1-hop)"
- corrected a wrong "polars (CPU) is GPU-or-error" claim (only polars-gpu is; CPU polars raises NotImplementedError)
- dropped the deprecated `chain` from the engines.rst entrypoint line (gfql/hop only)
- scoped the ~87x kuzu claim to LiveJournal + named its reproducer
- stopped the CSR-index footnote from over-promising an API page that doesn't document it yet
- cited the orphaned [F4] footnote

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… silent-coercion warning A §0 user-testing pass on polars-centric personas found a real P0 gap: nothing in the docs spoke to a user who is ALREADY on Polars, and the silent default-path downgrade was never warned. A graph built from polars.DataFrame run with the default engine='auto' is coerced to pandas (auto -> cudf for cuDF input, pandas for everything else incl. Polars; it never selects the Polars engine), so result._nodes comes back pandas and downstream pl.* breaks at runtime. Fixes: - engines.rst: a `.. warning::` "Already a Polars user? pass engine='polars' — the default does not" with a pl.DataFrame in -> engine='polars' -> pl.DataFrame out worked example; co-located the "catch" (crossover + NotImplementedError) under the one-liner. - overview.rst: spelled out that auto coerces a Polars-frame graph to pandas unless you pass engine='polars'. - Added Polars to the accepted-input lists in engines.rst / overview.rst / about.rst (was "pandas, cuDF" only). Artifact: plans/gfql-engine-docs/rounds/round-003/user_testing_playbook.md. Docs-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
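
The resolution rule described in this commit message can be modelled in a few lines. The sketch below is a hypothetical illustration, not PyGraphistry's implementation: the function `resolve_engine` and the `frame_kind` strings are invented for the example, and only the rule itself (auto picks cudf for cuDF input and pandas for everything else, including Polars) comes from the text above.

```python
EXPLICIT_ENGINES = {"pandas", "polars", "cudf", "polars-gpu"}


def resolve_engine(requested: str, frame_kind: str) -> str:
    """Hypothetical model of engine resolution.

    `requested` is the engine keyword the caller passed.
    `frame_kind` names the dataframe library the graph was built from
    (illustrative values: "pandas", "polars", "cudf").
    """
    if requested in EXPLICIT_ENGINES:
        return requested  # an explicit choice is always honoured
    if requested != "auto":
        raise ValueError(f"unknown engine: {requested!r}")
    # 'auto' never selects the Polars engine: cuDF input stays on cuDF,
    # every other input (including Polars frames) resolves to pandas.
    return "cudf" if frame_kind == "cudf" else "pandas"


# A Polars-built graph on the default path resolves to pandas ...
default_choice = resolve_engine("auto", "polars")
# ... and only an explicit keyword keeps it on Polars.
explicit_choice = resolve_engine("polars", "polars")
```

This is why a Polars user has to pass engine='polars' explicitly to get Polars frames back.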

… + reproducer

Leo pushed: polars beats pandas even below 1M? Correct. The "pandas wins below ~1M" claim was stale (a coarse early finding) and contradicted the fast-path work. Fresh CPU bench (benchmarks/gfql/index_crossover_bench.py, LiveJournal subsampled, warm-median, current stack):

shape            10K           100K          1M
1-hop hop        polars 2.7x   polars 4.5x   polars 7.6x
WHERE+ORDER      polars 3.0x   polars 3.0x   polars 18x
trivial filter   polars 1.5x   pandas 2.0x   pandas 1.6x   (sub-ms; immaterial)

So CPU polars wins the common graph-query shapes (traversal / WHERE / aggregation) from ~10K edges up; the only pandas win is a trivial sub-millisecond equality mask where the absolute difference is immaterial. The real small-size floor is GPU-only (cuDF/polars-gpu kernel launch, work-bound); this result is NOT extended to GPU here (this bench is CPU-only; polars-gpu stays the rougher, conditional case via F2/F3/F4).

Corrected:

- F1 (crossover ~10K not ~1M)
- the decision matrix (size col >~1M -> >~10K; the "<1M -> pandas" row -> "trivial sub-ms op -> pandas, immaterial")
- the "When not to use Polars" first bullet
- the motivating-table note

Also reframed "Why opt-in?" so the rationale rests on the NIE-surface robustness (auto-polars could error where pandas works), not a perf regression, consistent with keeping auto on pandas.

Docs-only + one CPU bench reproducer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ll 4 engines) The CSR index works on all four engines; benchmarked seeded 1-hop on LiveJournal 35M (guarded, index==scan): pandas ~0.13ms / polars ~0.16ms (numpy searchsorted) vs cuDF ~3ms (GPU kernel-launch floor) — the clean inverse of bulk. Pick the index for selective traversal + a CPU engine to drive it. Reproducer benchmarks/gfql/index_largegraph_bench.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
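
To make the seeded-lookup idea concrete, here is a minimal pure-Python sketch of a CSR-style adjacency index with binary-search lookup. It is an illustration under stated assumptions, not the GFQL index: the class `CsrIndex` and the helper `scan_hop` are invented names, and the standard-library `bisect_left` stands in for the numpy searchsorted mentioned above.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


class CsrIndex:
    """Edges grouped by source id, so a seed's out-neighbours form one slice."""

    def __init__(self, edges: Iterable[Edge]) -> None:
        self.keys: List[int] = []     # sorted unique source ids
        self.offsets: List[int] = []  # offsets[i] = start of keys[i]'s slice in dst
        self.dst: List[int] = []      # destination ids, grouped by source
        for src, dst in sorted(edges):
            if not self.keys or self.keys[-1] != src:
                self.keys.append(src)
                self.offsets.append(len(self.dst))
            self.dst.append(dst)
        self.offsets.append(len(self.dst))  # sentinel: end of the last slice

    def neighbors(self, seed: int) -> List[int]:
        i = bisect_left(self.keys, seed)  # binary search over sorted keys
        if i < len(self.keys) and self.keys[i] == seed:
            return self.dst[self.offsets[i]:self.offsets[i + 1]]
        return []

    def hop(self, seeds: Iterable[int]) -> List[Edge]:
        """Seeded 1-hop: touches only the slices belonging to the seeds."""
        return [(s, d) for s in seeds for d in self.neighbors(s)]


def scan_hop(edges: Iterable[Edge], seeds: Iterable[int]) -> List[Edge]:
    """Baseline without an index: filter every edge."""
    wanted = set(seeds)
    return [(s, d) for s, d in edges if s in wanted]


edges = [(3, 1), (1, 2), (1, 3), (2, 3), (5, 1)]
index = CsrIndex(edges)
```

The lookup cost depends on the number of seeds and their degree rather than on the total edge count, which is the property the seeded benchmark exercises; comparing `index.hop` against `scan_hop` mirrors the index==scan guard.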

…ome from) §0 newcomer user-test: a first-timer landing on engines.rst cold hit the headline `g.gfql(query, engine='polars')` with no `g`/`query` defined (NameError on copy-paste); construction was buried (inside the coercion warning + the bottom install block) and there was no early pointer to getting-started. Add (1) an early "New to GFQL? build a graph first -> :doc:`about`" note, and (2) a 2-line self-contained preamble (graphistry.edges + a query) so the first example runs as-is. Reuses content already on the page; no restructure. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s-links The end-to-end benchmark showed CPU-vs-GPU speedups with no statement that they return the SAME answer (skeptic persona P0-4) and no path to the broader engine story. Adds: 'same answer on every engine' parity note (release-gate: parity or NotImplementedError), a 'this is one workload vs one baseline' framing pointing to the 4-engine guide (engines) + the seeded-index guide (index_adjacency), and those two in the see-also list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
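
The "parity or NotImplementedError" gate can be pictured as a differential check like the one below. This is a hypothetical sketch, not the project's release-gate code: `check_parity` and its callable arguments are invented for illustration, and comparing rows order-insensitively is an assumption that would not suit queries with a defined ordering.

```python
from typing import Callable, Hashable, List

Engine = Callable[[str], List[Hashable]]


def check_parity(query: str, reference: Engine, candidate: Engine) -> str:
    """Run `query` on both engines and classify the outcome.

    Returns "parity" when the results match and "declined" when the
    candidate raises NotImplementedError. Any other difference is a
    failure, because a silently different answer is never acceptable.
    """
    expected = sorted(reference(query))
    try:
        actual = sorted(candidate(query))
    except NotImplementedError:
        return "declined"  # an honest refusal is allowed
    if actual != expected:
        raise AssertionError(f"parity violation for {query!r}")
    return "parity"
```

A gate of this shape permits exactly two outcomes per query, the same answer or an explicit refusal, which is the guarantee the parity note states.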

…ok (persona P1)

Two persona-driven additions to the engine guide:

- 'GFQL vs external graph tools': honest positioning table (Neo4j/Kuzu/igraph/networkx) with every number conditioned + '>'/did-not-finish/not-benchmarked markers kept, and the cyclic-join caveat we do NOT claim. Serves the skeptic (Sam) + Neo4j-migrant (Priya) personas.
- 'Switching engines' cookbook: the one-keyword switch, .to_pandas() round-trip for pandas-only downstream code, mixing build-frame vs run-engine, and the auto-never-picks-polars note. Consolidates scattered one-liners (Maya/Tom).

Uses only already-measured numbers. RST validated (docutils clean bar Sphinx :doc: roles).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ifecycle (round-2 personas E/F)

Round-2 user-testing closed all round-1 P0s and fully served personas A-D, but E (Databricks GraphFrames/Spark) and F (Snowflake/Databricks + PuppyGraph) FAILED: their tools + decision axes were absent, and GraphFrames' motif queries mapped onto the one case we disclaim with no 'it runs' reassurance.

- Add GraphFrames + PuppyGraph rows to the vs-external-tools table (qualitative, 'not benchmarked yet' markers): single-node-vs-cluster (100M+ on one machine; cluster only above the single-node ceiling) and warehouse-in-place-vs-pull-subgraph (GFQL adds PageRank/centrality PuppyGraph lacks; complement). Note motif/multi-way-join queries RUN but aren't yet perf-tuned.
- Benchmark page: label the headline table as PIPELINE time and note the per-graph sections are full-lifecycle (incl ETL); this kills the 3.33s-vs-7.1s apparent contradiction a skeptic hits.

Numbers unchanged (positioning is qualitative; head-to-heads are the later stacked benchmark PR). RST validated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

….rst doctests

New section documents the two opt-in streaming modes with honest scope:

- GFQL_POLARS_CPU_STREAMING=1 -> Polars streaming engine, disk-spill (CPU)
- GFQL_POLARS_GPU_EXECUTOR=streaming -> cudf-polars streaming executor (GPU)

Covers when to use (oversized intermediates), the opt-in trade-off (~0.86x small-size regress; parity-identical), a set-before-import example, and an explicit limits note: streaming covers the QUERY collect, but input still materializes at ingestion (a passed LazyFrame is collected), so out-of-core INPUT (lazy scan_parquet end-to-end) is work-in-progress; cross-links the Friendster discussion.

Also marks the two illustrative one-keyword snippets (placeholder df/query) '.. doc-test: skip', clearing engines.rst's test-docs failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
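
A set-before-import arrangement for the two flags might look like the sketch below. The environment variable names and values are the ones given in this commit message; the helper `enable_streaming` is a hypothetical convenience wrapper, and the commented-out import only marks where the library import would go.

```python
import os
from typing import Dict, Optional

CPU_STREAMING_VAR = "GFQL_POLARS_CPU_STREAMING"
GPU_EXECUTOR_VAR = "GFQL_POLARS_GPU_EXECUTOR"


def enable_streaming(cpu: bool = True, gpu: bool = False) -> Dict[str, Optional[str]]:
    """Set the opt-in streaming flags and return their current values."""
    if cpu:
        os.environ[CPU_STREAMING_VAR] = "1"          # Polars streaming engine (CPU)
    if gpu:
        os.environ[GPU_EXECUTOR_VAR] = "streaming"   # cudf-polars streaming executor (GPU)
    return {name: os.environ.get(name) for name in (CPU_STREAMING_VAR, GPU_EXECUTOR_VAR)}


# Set the flags first, then import the library that reads them.
flags = enable_streaming(cpu=True, gpu=False)
# import graphistry
```

Setting the flags before the import is the conservative ordering: it works whether the flags are read once at import time or on each call.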

… unsupported Cypher The quick-start ran a Cypher string 'MATCH (a)-[e]->(b) WHERE a.id IN $seeds RETURN a, e, b' that hits a known limitation (#1273: row lowering supports one MATCH source alias at a time), so it raised in test-docs and would mislead any reader who copy-pasted it. Replace with the canonical native seeded-traversal chain — [n({id: is_in(seeds)}), e_forward(), n()] — which is what this index page is actually about, uses the index automatically, and runs green. Also defines the previously-undefined my_seed_ids. Full doc-examples suite now passes locally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The performance.rst opener references an undefined g/query and engine='polars', so the doc-example runner executed it and failed in every polars-less lane (test-docs + test-minimal-python, pre-existing red on this branch). It is shown for reading, not execution — mark '.. doc-test: skip', same treatment as the benchmark_graphframes snippets (911f4e3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… phrasing - performance.rst said 'below ~1M edges pandas often wins', contradicting engines.rst's measured ~10K polars crossover one click away — aligned to the measured guidance. - engines.rst referenced :doc:`benchmark_graphframes`, a page that only lands in the stacked benchmarks PR (#1668) — Sphinx unknown-doc warning if this PR ships alone. Reworded; #1668 restores the live link. - 'NO-CHEATING' is internal methodology jargon — public page now says 'No silent fallback — parity-verified' (same guarantee, reader-facing). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The entry said '11-47x' and 'the ~1M crossover' while the page it describes says 'up to ~38x' and a measured ~10K-edge CPU crossover — stale from an earlier draft of the docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test-docs' pdflatex pass rejects Unicode ≈ (U+2248) in the PDF build (./PyGraphistry.tex:6457: LaTeX Error) — a failure previously masked by the doc-example failure ahead of it in the same job. Identical change applied on both docs-tail branches so each CI tree builds (identical both-side changes merge cleanly). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The engine guide said the streaming flags are 'read at import time' — no longer true: they're read live and settable from Python (set_cpu_streaming / set_gpu_executor + the public GPU_EXECUTORS options, added on the polars-engine PR). Document the Python API alongside the existing env vars. doc-test:skip (the API lands with the polars PR this docs PR stacks on). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… reconcile Document PHASE 12 call_mode (auto/strict) in engines.rst: a new 'Analytics under Polars' subsection (umap/hypergraph/compute_cugraph run off-engine by default, coerce back to polars, warn once; polars-gpu bridges to cuDF GPU-or-error; strict declines), and reconcile the 'Parity and honesty' section — traversal/row ops stay parity-or-NIE (never bridge) while whole-graph analytics are the one mode-gated, warned exception. (P13.6 executor-mode knobs were already documented in the streaming section.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Document why there is no per-call size cap on the off-engine bridge: the transient copy is the same allocation as running the analytic on engine='cudf' directly, a row count is a poor memory proxy, and the real cap belongs at the RMM/container/deployment layer. Point memory-conscious users at call_mode='strict' or RMM/container limits. (G5 decision: reject a row-cap knob as the wrong mechanism — see plan PHASE 13.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from ca5dfab to b65ca7f Compare June 29, 2026 23:08

lmeyerov force-pushed the docs/gfql-engine-docs branch from 02e5834 to 0683690 Compare June 29, 2026 23:08

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from b65ca7f to 1257dac Compare July 1, 2026 01:20

lmeyerov force-pushed the docs/gfql-engine-docs branch from cea4664 to c7f3af4 Compare July 1, 2026 01:20

lmeyerov changed the base branch from dev/gfql-seeded-traversal-index to feat/gfql-polars-engine-followups July 1, 2026 01:20

lmeyerov mentioned this pull request Jul 1, 2026

docs(gfql): benchmarks vs Spark GraphFrames + LadybugDB (single-node, honest) #1668

Open

lmeyerov force-pushed the docs/gfql-engine-docs branch from 422731f to 1e31fc5 Compare July 1, 2026 09:56

lmeyerov force-pushed the docs/gfql-engine-docs branch from 1e31fc5 to 13085fb Compare July 2, 2026 05:29

lmeyerov force-pushed the feat/gfql-polars-engine-followups branch from 73fa242 to 1e0d542 Compare July 2, 2026 06:50

lmeyerov force-pushed the docs/gfql-engine-docs branch from 13085fb to 12476c6 Compare July 2, 2026 06:50

lmeyerov force-pushed the feat/gfql-polars-engine-followups branch from 1e0d542 to bfdfc65 Compare July 2, 2026 16:18

lmeyerov changed the base branch from feat/gfql-polars-engine-followups to dev/gfql-seeded-traversal-index July 2, 2026 16:18

lmeyerov force-pushed the docs/gfql-engine-docs branch from 12476c6 to 0aece20 Compare July 2, 2026 16:19

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from 399a5a6 to 6a01e2a Compare July 2, 2026 16:34

lmeyerov force-pushed the docs/gfql-engine-docs branch from 0aece20 to 373463e Compare July 2, 2026 16:34

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from dc4f9da to f9733bf Compare July 2, 2026 17:48

lmeyerov force-pushed the docs/gfql-engine-docs branch from 6128a8b to 1075938 Compare July 2, 2026 17:48

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from f9733bf to a71cb31 Compare July 2, 2026 23:40

lmeyerov force-pushed the docs/gfql-engine-docs branch from 1075938 to 8cb04d7 Compare July 2, 2026 23:40

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from a71cb31 to f19041c Compare July 4, 2026 17:16

lmeyerov force-pushed the docs/gfql-engine-docs branch from c6a034d to e96e27d Compare July 4, 2026 17:16

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from f19041c to 1797940 Compare July 4, 2026 20:24

lmeyerov force-pushed the docs/gfql-engine-docs branch from 96d64be to 784a5e4 Compare July 4, 2026 20:24

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from 1797940 to 747383c Compare July 4, 2026 22:23

lmeyerov force-pushed the docs/gfql-engine-docs branch from 784a5e4 to b9b5d6e Compare July 4, 2026 22:23

lmeyerov and others added 20 commits July 4, 2026 15:57

lmeyerov force-pushed the dev/gfql-seeded-traversal-index branch from 747383c to d45d782 Compare July 4, 2026 22:57

lmeyerov force-pushed the docs/gfql-engine-docs branch from b9b5d6e to 588853f Compare July 4, 2026 22:57

lmeyerov wants to merge 20 commits into dev/gfql-seeded-traversal-index from docs/gfql-engine-docs