docs(gfql): engine-selection guide (pandas/polars/cuDF/polars-gpu) + motivating comparison #1661
Open
lmeyerov wants to merge 20 commits into
lmeyerov added a commit that referenced this pull request on Jul 1, 2026:
The engine-selection guide (#1661) documented all four engines + a decision matrix but the CSR adjacency index — the strongest competitive claim and the exact answer to 'Neo4j has an index, does GFQL?' — was only a footnote. Adds a full guide: create_index/gfql_index_all/show_indexes/drop_index, index_policy (use/auto/force/off), gfql_explain, Cypher DDL + wire protocol, and the sourced numbers (flat-in-N 0.12ms @8M-117M edges; 9-28x vs Kuzu/Neo4j on selective lookups; CPU-wins-seeded vs GPU floor). Honest build-cost + parity-or-fallback section. Wires into the toctree + a seeded-lookup recommended path; shrinks the engines.rst F5 footnote to a cross-link. Persona-driven (round-1 user-testing: Priya/Neo4j-migrant + Maya's slow seeded lookup). Numbers already measured (benchmarks/gfql/index_*bench.py, dgx-spark). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…motivating comparison New persona-tested "Choosing a GFQL Engine" page (gfql/engines.rst): the four interchangeable engines, the one-keyword engine='polars' speedup (11-47x over pandas on real graphs, no GPU), a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M), a decision matrix (shape x size x hardware -> engine) with crossover/work-bound/memory-pressure/GPU-or-error footnotes, cuDF-vs-polars-gpu disambiguation (eager vs fused-lazy; cuDF not deprecated), an honest "when NOT to use Polars", the differential-parity guarantee, and methodology + reproducer scripts. Also: rewrote the top of gfql/performance.rst to lead with the engine comparison (de-marketed the prose flagged by the skeptic persona), wired the page into the GFQL toctree + recommended paths, and added polars/polars-gpu to the engine examples in quick.rst and about.rst (docs previously mentioned only pandas/cuDF). Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Docs-only; no code change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…numbers, output-frame note

Applies fixes from a second persona user-testing pass on the rendered docs:
- performance.rst: removed the surviving marketing tail (the skeptic persona's #1 residual), namely "A New Era", "Empower Your Data Journey", "Join the Community", and the NVIDIA-investment-implies-performance line; replaced with a tight, de-superlatived "How GFQL is fast" (the real mechanisms) + a focused Next Steps.
- engines.rst: added the cuDF-WINS row to the comparison table (2-hop/100K seeds, ~85M output rows: cuDF 6.0s) so cuDF winning is visible without reading footnotes (RAPIDS persona); added a prominent note that result frames match the engine (polars-gpu/polars return polars.DataFrame; .to_pandas() to convert), the pandas+RAPIDS personas' top practical gotcha; fixed the LDBC sf1 figure attribution (it is from a separate benchmark, not the cited Orkut/LiveJournal source-of-truth) to keep every on-page number traceable; added run counts + unified-memory note to Methodology (perf-engineer persona).

Docs-only; no code change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) + fixes

Ran the repo's documented user-testing protocol (test-amplification SKILL §0 "User-Workflow Exploration") clean-room, in two passes (need-finding vs original docs, then QA on the produced docs), and applied the deltas it surfaced that the earlier ad-hoc persona pass missed.

Completeness (Pass A): finished the engine enumerations the ad-hoc pass deferred:
- overview.rst now names all four engines + auto's resolution rule + the opt-in/no-silent-fallback contract (was "GFQL automatically executes on GPU", which implied silent selection);
- notebooks/gpu.rst now points GPU readers to the engines page.

Accuracy/QA (Pass B):
- reconciled the recurring "11-47x" headline to what the on-page table supports (-> "up to ~38x", Orkut 1-hop, traceable) across 9 sites;
- fixed cuDF "6-18x" -> "~15x (Orkut 1-hop)";
- corrected a wrong "polars (CPU) is GPU-or-error" claim (only polars-gpu is; CPU polars raises NotImplementedError);
- dropped the deprecated `chain` from the engines.rst entrypoint line (gfql/hop only);
- scoped the ~87x kuzu claim to LiveJournal + named its reproducer;
- stopped the CSR-index footnote from over-promising an API page that doesn't document it yet;
- cited the orphaned [F4] footnote.

Docs-only; no code change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… silent-coercion warning

A §0 user-testing pass on polars-centric personas found a real P0 gap: nothing in the docs spoke to a user who is ALREADY on Polars, and the silent default-path downgrade was never warned. A graph built from polars.DataFrame run with the default engine='auto' is coerced to pandas (auto -> cudf for cuDF input, pandas for everything else incl. Polars; it never selects the Polars engine), so result._nodes comes back pandas and downstream pl.* breaks at runtime.

Fixes:
- engines.rst: a `.. warning::` "Already a Polars user? pass engine='polars'; the default does not" with a pl.DataFrame in -> engine='polars' -> pl.DataFrame out worked example; co-located the "catch" (crossover + NotImplementedError) under the one-liner.
- overview.rst: spelled out that auto coerces a Polars-frame graph to pandas unless you pass engine='polars'.
- Added Polars to the accepted-input lists in engines.rst / overview.rst / about.rst (was "pandas, cuDF" only).

Artifact: plans/gfql-engine-docs/rounds/round-003/user_testing_playbook.md.
Docs-only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
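The resolution rule described in this commit message can be modelled in a few lines. This is only a toy sketch: `resolve_engine` and the `input_kind` labels are hypothetical names invented for illustration, not the library's API, and the error raised for an unknown keyword is an assumption.

```python
EXPLICIT_ENGINES = {"pandas", "polars", "cudf", "polars-gpu"}


def resolve_engine(requested: str, input_kind: str) -> str:
    """Toy model of the documented rule (hypothetical helper, not library code).

    `requested` is the engine keyword; `input_kind` names the frame type the
    graph was built from ("pandas", "polars", or "cudf").
    """
    if requested in EXPLICIT_ENGINES:
        return requested  # an explicit choice is honored as-is
    if requested != "auto":
        raise ValueError(f"unknown engine: {requested!r}")  # assumed behavior
    # 'auto' picks cudf only for cuDF input; everything else, Polars included,
    # resolves to pandas. It never selects a Polars engine.
    return "cudf" if input_kind == "cudf" else "pandas"


print(resolve_engine("auto", "polars"))    # pandas
print(resolve_engine("polars", "polars"))  # polars
```

The first call is the gotcha the warning targets: a Polars-built graph run with the default keyword lands on pandas, so downstream `pl.*` code sees a pandas frame.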
… + reproducer

Leo pushed: polars beats pandas even below 1M? Correct. The "pandas wins below ~1M" claim was stale (a coarse early finding) and contradicted the fast-path work. Fresh CPU bench (benchmarks/gfql/index_crossover_bench.py, LiveJournal subsampled, warm-median, current stack):

    shape            10K           100K          1M
    1-hop hop        polars 2.7x   polars 4.5x   polars 7.6x
    WHERE+ORDER      polars 3.0x   polars 3.0x   polars 18x
    trivial filter   polars 1.5x   pandas 2.0x   pandas 1.6x   (sub-ms; immaterial)

So CPU polars wins the common graph-query shapes (traversal / WHERE / aggregation) from ~10K edges up; the only pandas win is a trivial sub-millisecond equality mask where the absolute difference is immaterial. The real small-size floor is GPU-only (cuDF/polars-gpu kernel launch, work-bound) and is NOT extended to GPU here (this bench is CPU-only; polars-gpu stays the rougher, conditional case via F2/F3/F4).

Corrected: F1 (crossover ~10K not ~1M), the decision matrix (size col >~1M -> >~10K; the "<1M -> pandas" row -> "trivial sub-ms op -> pandas, immaterial"), the "When not to use Polars" first bullet, and the motivating-table note. Also reframed "Why opt-in?" so the rationale rests on the NIE-surface robustness (auto-polars could error where pandas works), not a perf regression, consistent with keeping auto on pandas.

Docs-only + one CPU bench reproducer.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
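For readers unfamiliar with the "warm-median" figures quoted above, the measurement style can be sketched as follows. This is a minimal stand-in using only the standard library, not the reproducer script itself; the helper names and the `warmup`/`runs` defaults are assumptions chosen for illustration.

```python
import statistics
import time
from typing import Callable


def warm_median_ms(fn: Callable[[], object], warmup: int = 2, runs: int = 5) -> float:
    """Median wall-clock time of `fn` in milliseconds, after untimed warm-up calls."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization; not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


data = list(range(100_000))
baseline = warm_median_ms(lambda: sorted(data, reverse=True))
candidate = warm_median_ms(lambda: max(data))
print(f"candidate is {speedup(baseline, candidate):.1f}x the baseline speed")
```

Taking the median of warm runs rather than the mean keeps a single slow outlier (a GC pause, a cold cache) from skewing the ratio, which matters most for the sub-millisecond rows in the table.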
…ll 4 engines) The CSR index works on all four engines; benchmarked seeded 1-hop on LiveJournal 35M (guarded, index==scan): pandas ~0.13ms / polars ~0.16ms (numpy searchsorted) vs cuDF ~3ms (GPU kernel-launch floor) — the clean inverse of bulk. Pick the index for selective traversal + a CPU engine to drive it. Reproducer benchmarks/gfql/index_largegraph_bench.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
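The reason a seeded lookup stays cheap as the graph grows can be shown with a small sorted-adjacency sketch. This is not the project's index implementation: it uses the standard-library `bisect` module in place of numpy `searchsorted`, keeps a sorted edge list rather than a true CSR offsets array, and every function name here is hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_index(edges):
    """Sort (src, dst) pairs by source and split them into two parallel lists."""
    ordered = sorted(edges)
    srcs = [s for s, _ in ordered]
    dsts = [d for _, d in ordered]
    return srcs, dsts


def neighbors(index, seed):
    """Out-neighbors of `seed` via two binary searches over the sorted sources."""
    srcs, dsts = index
    lo = bisect_left(srcs, seed)
    hi = bisect_right(srcs, seed)
    return dsts[lo:hi]


def neighbors_scan(edges, seed):
    """Reference full scan, used to check that index == scan."""
    return sorted(d for s, d in edges if s == seed)


edges = [(0, 1), (2, 3), (0, 2), (1, 2), (2, 0)]
index = build_index(edges)
print(neighbors(index, 0))  # [1, 2]
print(neighbors(index, 2))  # [0, 3]
```

The lookup cost is two binary searches plus a slice, so it depends on the logarithm of the edge count and the seed's degree rather than on a scan of every edge. The `neighbors_scan` helper mirrors the "guarded, index==scan" check mentioned in the commit message.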
…ome from) §0 newcomer user-test: a first-timer landing on engines.rst cold hit the headline `g.gfql(query, engine='polars')` with no `g`/`query` defined (NameError on copy-paste); construction was buried (inside the coercion warning + the bottom install block) and there was no early pointer to getting-started. Add (1) an early "New to GFQL? build a graph first -> :doc:`about`" note, and (2) a 2-line self-contained preamble (graphistry.edges + a query) so the first example runs as-is. Reuses content already on the page; no restructure. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The engine-selection guide (#1661) documented all four engines + a decision matrix but the CSR adjacency index — the strongest competitive claim and the exact answer to 'Neo4j has an index, does GFQL?' — was only a footnote. Adds a full guide: create_index/gfql_index_all/show_indexes/drop_index, index_policy (use/auto/force/off), gfql_explain, Cypher DDL + wire protocol, and the sourced numbers (flat-in-N 0.12ms @8M-117M edges; 9-28x vs Kuzu/Neo4j on selective lookups; CPU-wins-seeded vs GPU floor). Honest build-cost + parity-or-fallback section. Wires into the toctree + a seeded-lookup recommended path; shrinks the engines.rst F5 footnote to a cross-link. Persona-driven (round-1 user-testing: Priya/Neo4j-migrant + Maya's slow seeded lookup). Numbers already measured (benchmarks/gfql/index_*bench.py, dgx-spark). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s-links The end-to-end benchmark showed CPU-vs-GPU speedups with no statement that they return the SAME answer (skeptic persona P0-4) and no path to the broader engine story. Adds: 'same answer on every engine' parity note (release-gate: parity or NotImplementedError), a 'this is one workload vs one baseline' framing pointing to the 4-engine guide (engines) + the seeded-index guide (index_adjacency), and those two in the see-also list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
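The "parity or NotImplementedError" release gate mentioned here amounts to a differential check. A minimal sketch under stated assumptions: the two "engines" below are toy callables, and `check_parity` is a hypothetical helper, not part of the test suite.

```python
def check_parity(reference, candidate, query):
    """Return 'parity' if both engines agree, 'declined' if the candidate
    raises NotImplementedError, and fail loudly on a silent mismatch."""
    expected = reference(query)
    try:
        actual = candidate(query)
    except NotImplementedError:
        return "declined"  # an explicit refusal is allowed
    if actual != expected:
        raise AssertionError(f"engines disagree on {query!r}")
    return "parity"


def reference_engine(query):
    """Toy baseline: supports every query."""
    return sorted(query)


def fast_engine(query):
    """Toy candidate: refuses shapes it does not support instead of guessing."""
    if len(query) > 3:
        raise NotImplementedError("shape not supported")
    return sorted(query)


print(check_parity(reference_engine, fast_engine, [3, 1, 2]))     # parity
print(check_parity(reference_engine, fast_engine, [4, 3, 2, 1]))  # declined
```

The only outcome the gate forbids is the third one: a candidate that returns a different answer without raising.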
…ok (persona P1)

Two persona-driven additions to the engine guide:
- 'GFQL vs external graph tools': honest positioning table (Neo4j/Kuzu/igraph/networkx) with every number conditioned + '>'/did-not-finish/not-benchmarked markers kept, and the cyclic-join caveat we do NOT claim. Serves the skeptic (Sam) + Neo4j-migrant (Priya) personas.
- 'Switching engines' cookbook: the one-keyword switch, .to_pandas() round-trip for pandas-only downstream code, mixing build-frame vs run-engine, and the auto-never-picks-polars note. Consolidates scattered one-liners (Maya/Tom).

Uses only already-measured numbers. RST validated (docutils clean bar Sphinx :doc: roles).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ifecycle (round-2 personas E/F)

Round-2 user-testing closed all round-1 P0s and fully served personas A-D, but E (Databricks GraphFrames/Spark) and F (Snowflake/Databricks + PuppyGraph) FAILED: their tools + decision axes were absent, and GraphFrames' motif queries mapped onto the one case we disclaim with no 'it runs' reassurance.
- Add GraphFrames + PuppyGraph rows to the vs-external-tools table (qualitative, 'not benchmarked yet' markers): single-node-vs-cluster (100M+ on one machine; cluster only above the single-node ceiling) and warehouse-in-place-vs-pull-subgraph (GFQL adds PageRank/centrality PuppyGraph lacks; complement). Note motif/multi-way-join queries RUN but aren't yet perf-tuned.
- Benchmark page: label the headline table as PIPELINE time and note the per-graph sections are full-lifecycle (incl ETL); this kills the 3.33s-vs-7.1s apparent contradiction a skeptic hits.

Numbers unchanged (positioning is qualitative; head-to-heads are the later stacked benchmark PR). RST validated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
….rst doctests

New section documents the two opt-in streaming modes with honest scope:
- GFQL_POLARS_CPU_STREAMING=1 -> Polars streaming engine, disk-spill (CPU)
- GFQL_POLARS_GPU_EXECUTOR=streaming -> cudf-polars streaming executor (GPU)

Covers when to use (oversized intermediates), the opt-in trade-off (~0.86x small-size regress; parity-identical), a set-before-import example, and an explicit limits note: streaming covers the QUERY collect, but input still materializes at ingestion (a passed LazyFrame is collected), so out-of-core INPUT (lazy scan_parquet end-to-end) is work-in-progress; cross-links the Friendster discussion.

Also marks the two illustrative one-keyword snippets (placeholder df/query) '.. doc-test: skip', clearing engines.rst's test-docs failures.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
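A set-before-import pattern for the two flags named above might look like the following. The variable names and values are taken from the commit message; whether both flags are meant to be combined in one process is not stated, so the GPU line is left commented out as an assumption, and the library import is only indicated in a comment.

```python
import os

# Opt in to the Polars CPU streaming engine (flag name from the commit message).
os.environ["GFQL_POLARS_CPU_STREAMING"] = "1"

# GPU counterpart; enable it instead on a GPU host (assumed to be an either/or choice).
# os.environ["GFQL_POLARS_GPU_EXECUTOR"] = "streaming"

# Import the library only after the environment is configured, so the setting
# is visible no matter when it is read.
cpu_streaming_enabled = os.environ.get("GFQL_POLARS_CPU_STREAMING") == "1"
print(cpu_streaming_enabled)  # True
```

Setting the variable before the import is the conservative ordering: it works whether the flag is read once at import time or each time a query runs.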
… unsupported Cypher The quick-start ran a Cypher string 'MATCH (a)-[e]->(b) WHERE a.id IN $seeds RETURN a, e, b' that hits a known limitation (#1273: row lowering supports one MATCH source alias at a time), so it raised in test-docs and would mislead any reader who copy-pasted it. Replace with the canonical native seeded-traversal chain — [n({id: is_in(seeds)}), e_forward(), n()] — which is what this index page is actually about, uses the index automatically, and runs green. Also defines the previously-undefined my_seed_ids. Full doc-examples suite now passes locally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The performance.rst opener references an undefined g/query and engine='polars', so the doc-example runner executed it and failed in every polars-less lane (test-docs + test-minimal-python, pre-existing red on this branch). It is shown for reading, not execution — mark '.. doc-test: skip', same treatment as the benchmark_graphframes snippets (911f4e3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… phrasing

- performance.rst said 'below ~1M edges pandas often wins', contradicting engines.rst's measured ~10K polars crossover one click away; aligned to the measured guidance.
- engines.rst referenced :doc:`benchmark_graphframes`, a page that only lands in the stacked benchmarks PR (#1668), which would raise a Sphinx unknown-doc warning if this PR ships alone. Reworded; #1668 restores the live link.
- 'NO-CHEATING' is internal methodology jargon; the public page now says 'No silent fallback, parity-verified' (same guarantee, reader-facing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The entry said '11-47x' and 'the ~1M crossover' while the page it describes says 'up to ~38x' and a measured ~10K-edge CPU crossover — stale from an earlier draft of the docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test-docs' pdflatex pass rejects Unicode ≈ (U+2248) in the PDF build (./PyGraphistry.tex:6457: LaTeX Error) — a failure previously masked by the doc-example failure ahead of it in the same job. Identical change applied on both docs-tail branches so each CI tree builds (identical both-side changes merge cleanly). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The engine guide said the streaming flags are 'read at import time' — no longer true: they're read live and settable from Python (set_cpu_streaming / set_gpu_executor + the public GPU_EXECUTORS options, added on the polars-engine PR). Document the Python API alongside the existing env vars. doc-test:skip (the API lands with the polars PR this docs PR stacks on). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… reconcile Document PHASE 12 call_mode (auto/strict) in engines.rst: a new 'Analytics under Polars' subsection (umap/hypergraph/compute_cugraph run off-engine by default, coerce back to polars, warn once; polars-gpu bridges to cuDF GPU-or-error; strict declines), and reconcile the 'Parity and honesty' section — traversal/row ops stay parity-or-NIE (never bridge) while whole-graph analytics are the one mode-gated, warned exception. (P13.6 executor-mode knobs were already documented in the streaming section.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document why there is no per-call size cap on the off-engine bridge: the transient copy is the same allocation as running the analytic on engine='cudf' directly, a row count is a poor memory proxy, and the real cap belongs at the RMM/container/deployment layer. Point memory-conscious users at call_mode='strict' or RMM/container limits. (G5 decision: reject a row-cap knob as the wrong mechanism — see plan PHASE 13.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stacks on #1658 (spine: #1666 → #1660 [engine+GPU target] → #1667 [followups] → #1658 [index] → this PR). Docs-only — no code change.
What
A persona-tested "Choosing a GFQL Engine" page documenting the four interchangeable engines (`pandas`/`polars`/`cudf`/`polars-gpu`), which until now were undocumented (grep confirmed zero doc mentions of `polars`).
- `docs/source/gfql/engines.rst` (new): numbers-first: `engine='polars'` speedup (up to ~38× over pandas on real graphs, no GPU)
- `performance.rst`: rewrote the top to lead with the engine comparison; de-marketed the prose flagged by the skeptic persona ("Unleashing", "Graph 500 levels", NVIDIA name-drop)
- `quick.rst` / `about.rst`: added polars/polars-gpu to the engine examples (previously pandas/cuDF only)
How it was scoped
Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Each persona read the current docs cold; the union of their must-haves is the acceptance bar. A round-2 user-test against the rendered docs follows.
Numbers trace to guarded benchmark runs (`benchmarks/gfql/index_bulk_olap_bench.py`); no figures invented.
🤖 Generated with Claude Code
Review notes
Team-polish pass: the illustrative opener snippet is `doc-test: skip` (it was executing in polars-less CI lanes, this branch's pre-existing red); crossover guidance aligned to the measured ~10K-edge figure (was a stale ~1M claim contradicting engines.rst); public phrasing de-jargoned ('no silent fallback, parity-verified'); CHANGELOG entry aligned to the page's numbers.