docs(gfql): benchmarks vs Spark GraphFrames + LadybugDB (single-node, honest) #1668
Open
lmeyerov wants to merge 10 commits into
lmeyerov added a commit that referenced this pull request on Jul 1, 2026
…:skip
The two GFQL/GraphFrames code blocks are illustrative (they reference an undefined g/gf/seeds and engine="polars"), so the doc-example runner was executing them and failing test-docs. Mark both with '.. doc-test: skip': they are shown for reading, not execution. Removes this page's contribution to the test-docs failure on #1668.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov added a commit that referenced this pull request on Jul 2, 2026
… phrasing
- performance.rst said 'below ~1M edges pandas often wins', contradicting engines.rst's measured ~10K polars crossover one click away; aligned to the measured guidance.
- engines.rst referenced :doc:`benchmark_graphframes`, a page that only lands in the stacked benchmarks PR (#1668), which produces a Sphinx unknown-doc warning if this PR ships alone. Reworded; #1668 restores the live link.
- 'NO-CHEATING' is internal methodology jargon; the public page now says 'No silent fallback — parity-verified' (same guarantee, reader-facing).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…est)
New page docs/source/gfql/benchmark_graphframes.rst comparing GFQL polars (CPU) and polars-gpu (GPU) against Spark GraphFrames (local[*], single node) on filter / 1-2 hop / PageRank over SNAP LiveJournal (35M) and Orkut (117M), with a committed reproducible harness (benchmarks/gfql/bench_graphframes.py).
Findings, stated honestly (numbers = median of 5 after 2 warmups; result-size parity enforced per task):
- filter/traversal: GFQL wins 2-43x even on CPU (no JVM/scheduler/shuffle overhead; single-node columnar is the right tool for sub-second graph queries).
- PageRank: mixed and disclosed. GFQL's CPU/igraph path is SLOWER than GraphFrames (0.23-0.33x); only the GPU/cugraph path wins (~10-15x). Guidance: reach for the GPU engine for whole-graph analytics.
- PageRank cross-engine parity verified: Spearman rho = 1.00, top-100 overlap 100/100 across igraph/cugraph/GraphFrames (saved artifact).
- Friendster (1.8B edges): documented single-node memory ceiling. Every engine (GFQL pandas-load OOM, GFQL cudf-lean swap, GraphFrames thrash) exceeds one 119GB node; reported as a wall, not dropped.
Harness: shared --filter-threshold for bit-identical filter parity; node-count parity for hops; guarded per-cell (OOM/skip continues); warm-median + cold-load; GraphFrames 0.8.4-spark3.5-s_2.12 / PySpark 3.5.1.
Results rendered from saved JSON (_static/graphframes/). engines.rst head-to-head row updated to link this page; toctree entry added.
Persona-tested (Raj/Sam/Lena): maxIter/tolerance disclosed, single-node ceiling stated, blocked-not-interleaved noted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
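The PageRank parity check named in this commit (Spearman rho plus top-100 overlap) can be illustrated with a small stdlib-only sketch. This is not the harness's code: the function names and the toy score dictionaries are hypothetical, and a real run would compare the full per-node score vectors from igraph, cugraph, and GraphFrames.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5  # undefined if either side is constant


def spearman_rho(scores_a, scores_b):
    """Spearman rho = Pearson correlation of the ranks over shared nodes."""
    nodes = sorted(scores_a.keys() & scores_b.keys())
    ranks_a = average_ranks([scores_a[n] for n in nodes])
    ranks_b = average_ranks([scores_b[n] for n in nodes])
    return pearson(ranks_a, ranks_b)


def top_k_overlap(scores_a, scores_b, k):
    """Number of nodes that appear in both engines' top-k by score."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(scores_a) & top(scores_b))


# Toy scores (hypothetical): engine B is a rescaled copy of engine A,
# so the ordering is identical even though the raw values differ.
engine_a = {"n1": 0.4, "n2": 0.3, "n3": 0.2, "n4": 0.1}
engine_b = {node: 2 * score for node, score in engine_a.items()}
rho = spearman_rho(engine_a, engine_b)
overlap = top_k_overlap(engine_a, engine_b, k=2)
```

Rank-based measures are a reasonable fit here because the engines stop on different criteria (fixed maxIter versus a tolerance), so raw scores can differ slightly while the ordering stays the same.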
…:skip
The two GFQL/GraphFrames code blocks are illustrative (they reference an undefined g/gf/seeds and engine="polars"), so the doc-example runner was executing them and failing test-docs. Mark both with '.. doc-test: skip': they are shown for reading, not execution. Removes this page's contribution to the test-docs failure on #1668.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t an engine ceiling
The Friendster OOM was our harness's EAGER in-memory load (pandas.read_parquet -> ~29GB frame + degree build), before the query engine ran; it was not a hard limit of Polars/cudf-polars. Both ship larger-than-memory streaming paths this harness did not exercise:
- CPU: GFQL_POLARS_CPU_STREAMING=1 -> collect(engine='streaming'), disk-spill
- GPU: GFQL_POLARS_GPU_EXECUTOR=streaming -> cudf-polars streaming executor
Reframe the Friendster section as 'where the eager-load harness stops; streaming is the untested next step', fix the now-inaccurate 'GFQL does not spill to disk' caveat, and soften the 'single-node ceiling' language throughout. The proper larger-than-memory test (lazy scan_parquet + streaming collect at 1.8B) is follow-up work, not a conceded limitation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ny-validated)
Adds a GFQL-native port of the LadybugDB vs Kuzu benchmark
(github.com/LadybugDB/kuzu-ladybug-benchmark) so GFQL polars/cudf can be
compared head-to-head against Ladybug 0.18.0 / Kuzu 0.11.3 on the SAME synthetic
Item/Owns suite. Maps their ops onto GFQL primitives, several of which are
direct analogues of our work:
- op9 out-degree for seeded nodes == our CSR edge_out_adj seeded index
- op11 scan-rel rowid == columnar edge scan / Arrow return
- op13 Arrow CSR export == create_index('edge_out_adj')
- op5 id range query == the "do we need another index?" question
Validated on tiny CPU data (1K nodes / 5K edges, engine=polars): all 8 ops
correct vs a pandas oracle (range=501, point=1). Full 5M/20M-scale runs + the
lbug/kuzu head-to-head + timings are reserved for the GPU bench box. Harness has
a size guard (refuses >2M nodes / 8M edges on non-cudf) and a sys.path guard so
it prefers the working-tree graphistry over a stale pip install.
engines.rst: adds a LadybugDB row (actively-maintained Kuzu fork; ART/hash index
choice; out-of-core billion-scale) — honestly noting Ladybug's out-of-core
billion-scale as a genuine complement/gap (GFQL is in-memory; single-node ceiling
at ~1.8B), with the head-to-head marked in progress.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
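The size guard described in this commit message might look roughly like the sketch below. The function name, constant names, and the choice to raise ValueError are assumptions made for illustration; only the thresholds (2M nodes / 8M edges) and the cudf exemption come from the message, and treating either limit as sufficient to refuse is also an assumption.

```python
# Thresholds taken from the commit message; names are hypothetical.
MAX_NODES_NON_CUDF = 2_000_000
MAX_EDGES_NON_CUDF = 8_000_000


def check_size_guard(engine, n_nodes, n_edges):
    """Refuse large runs on non-cudf engines (assumed: either limit trips it)."""
    if engine == "cudf":
        return  # full-scale runs are reserved for the GPU bench box
    if n_nodes > MAX_NODES_NON_CUDF or n_edges > MAX_EDGES_NON_CUDF:
        raise ValueError(
            f"refusing {n_nodes} nodes / {n_edges} edges on engine={engine!r}; "
            "use engine='cudf' for runs of this size"
        )


# The tiny validation size from the commit message passes on polars.
check_size_guard("polars", n_nodes=1_000, n_edges=5_000)
```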
Correct the LadybugDB row: GFQL is in-memory by DEFAULT but not limited to it. Polars streaming (GFQL_POLARS_CPU_STREAMING=1) and the cudf-polars streaming executor (GFQL_POLARS_GPU_EXECUTOR=streaming) are larger-than-memory paths. Frame out-of-core as a not-yet-benchmarked head-to-head, not a flat GFQL gap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…peline path)
Runs Ladybug's benchmark ops as GFQL Cypher MATCH...RETURN (the honest row pipeline, not df shortcuts) at their 5M/20M size. Scorecard vs Ladybug's published numbers: GFQL WINS full_scan (14.9x) + scan_rel (cudf 3.3-3.5x); LOSES count (~770x), point (~675x), range (~27x). All three losses are Cypher-lowering/row-pipeline overhead (count materializes instead of len(edges); point/range full-scan the pipeline). Roadmap in plans/.../ladybug-receipts/. Feeds the row-pipeline optimization (shared with #1670).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…call conversion)
The harness built the graph in pandas for ALL engines, so every engine='polars'/'cudf' call re-converted the 5M-row string-column frame (~200ms pandas->polars), swamping sub-10ms queries and making polars/cudf look 27-675x slower than reality. Proven: same point filter = 209ms (pandas-built) vs 1.96ms (polars-built), 106x.
Now each engine is benchmarked on a graph built in its OWN native frame type (pandas/polars/cuDF), constructed once outside the timing loop. This is the honest comparison (a polars user keeps polars data; Ladybug queries its own native store).
Corrected native-per-engine result (5M/20M, polars): full_scan 55.6ms (WIN 68x), range 5.55ms (WIN 1.4x, was falsely 27x LOSS), point 2.66ms (vs LB 0.3ms, close).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ine, honest)
Replaces the "in progress" placeholder with the measured 5M/20M scorecard (GFQL running Ladybug's ops as the identical Cypher MATCH...RETURN row pipeline, each engine on its native frames). GFQL wins the scan-shaped ops: full_scan ~65x, range ~1.2x, scan_rel ~3.5-3.7x (cuDF). Point lookup ~4ms vs LB's ~0.3ms index seek (close; a resident adjacency index closes it), and rel COUNT(*) is LB's O(1) cached count vs GFQL's O(E) endpoint-validated scan (dataframe = no referential integrity). Honest: name the two ops a persistent-index store still wins.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
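The O(1) versus O(E) relationship-count contrast can be made concrete with a toy sketch. Neither half is the real implementation: CachedCountStore is a hypothetical stand-in for a store that enforces referential integrity at write time (it is not Ladybug's design), and count_rel_validated only illustrates why a plain edge table has to check endpoints at query time.

```python
def count_rel_validated(node_ids, edges):
    """O(E): count only edges whose endpoints both exist in the node table."""
    nodes = set(node_ids)
    return sum(1 for src, dst in edges if src in nodes and dst in nodes)


class CachedCountStore:
    """Hypothetical store: dangling edges are rejected on insert,
    so a running counter always matches the valid relationship count."""

    def __init__(self, node_ids):
        self._nodes = set(node_ids)
        self._rel_count = 0

    def add_edge(self, src, dst):
        if src not in self._nodes or dst not in self._nodes:
            raise KeyError(f"dangling edge ({src}, {dst})")
        self._rel_count += 1

    def count_rel(self):
        return self._rel_count  # O(1): no scan needed


node_ids = [1, 2, 3]
edge_table = [(1, 2), (2, 3), (3, 9)]  # (3, 9) points at a missing node

raw_rows = len(edge_table)                                # row count only
validated = count_rel_validated(node_ids, edge_table)     # endpoint-checked

store = CachedCountStore(node_ids)
for src, dst in edge_table:
    try:
        store.add_edge(src, dst)
    except KeyError:
        pass  # the store refuses the dangling edge at write time
```

In this toy, len(edge_table) and the validated count disagree as soon as one edge dangles, which is the reason a dataframe-backed count cannot simply return the row count.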
…els, ranges
Review-wave findings on the benchmarks tail:
- engines.rst LadybugDB row presented a cross-machine comparison as 'head-to-head': the Ladybug figures are THEIR published results on THEIR hardware, GFQL ran on our DGX Spark. Now disclosed in the row + a Methodology bullet naming the harness; 'index closes it' softened to 'should close it' (unmeasured, tracked in #1676).
- GraphFrames headline '2-43x' excluded the 1.3x Orkut 2-hop shown in the same page's table; now '1.3-43x (most cells 2x+)' here and in engines.rst.
- Fix nested inline markup in the Orkut caption (bold inside an unclosed italic span mis-renders); de-jargon the persona note; 'dgx-spark' hostname -> 'NVIDIA DGX Spark' product name.
- bench_ladybug.py: --validate printed the oracle but never compared; it now asserts result sizes (timing void on mismatch). The raw-dataframe ops (count_rel/out_degree_seeded/scan_rel*) always ran on the pandas ingest frames yet were labeled system='gfql-{engine}', so an --engine cudf run reported pandas timings under a GPU label; now labeled 'gfql-pandas-df' (engine-native Cypher timings live in bench_ladybug_cypher.py).
- bench_ladybug_cypher.py: cross-machine method disclosed in the docstring (gitignored plans/ receipt citation dropped); med() now returns+checks result size per rep and main() enforces cross-engine size parity (VOIDs a mismatched cell); it was timing-only before.
- engines.rst streaming note: align wording with the base PR (link to the GraphFrames page rides this PR's own comparison rows).
Validated: both harnesses smoke-run tiny (pandas+polars; oracle asserts pass, NIE reported honestly); ranges/captions proofread against results.json.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
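The cross-engine size-parity rule described for bench_ladybug_cypher.py (a mismatched cell is VOIDed rather than reported) could be sketched as follows. The function name, the VOID sentinel, the choice of a reference engine, and the sample timings are all assumptions for illustration; the actual main() may decide parity differently.

```python
VOID = "VOID"


def void_mismatched_cells(cells, reference_engine):
    """cells maps engine -> (median_seconds, result_size).

    A timing is reported only if the engine returned the same number of
    rows as the reference engine; otherwise the cell is marked VOID.
    """
    expected_size = cells[reference_engine][1]
    report = {}
    for engine, (median_s, size) in cells.items():
        report[engine] = median_s if size == expected_size else VOID
    return report


# Made-up numbers: cudf returned 499 rows where the others returned 501,
# so its timing is discarded instead of being shown as a win.
cells = {
    "pandas": (0.210, 501),
    "polars": (0.002, 501),
    "cudf": (0.001, 499),
}
report = void_mismatched_cells(cells, reference_engine="pandas")
```

Voiding rather than dropping the cell keeps the gap visible in the results, which matches the "timing void on mismatch" wording above.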
Same class as the ≈ fix: test-docs' PDF pass errors on the Spearman ρ in the GraphFrames page (./PyGraphistry.tex:8184). Swept the whole docs-tail diff for remaining non-ASCII: em/en-dashes, ×, →, … all already pass on the green base runs; ρ was the only newcomer.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stacks on #1661 (docs/gfql-engine-docs). Two competitor benchmarks in one PR (same genre: external-competitor harness + an engines.rst comparison row each).
1) GraphFrames: a new benchmark page comparing GFQL (polars CPU + polars-gpu GPU) against Apache Spark GraphFrames (local[*], single node) on filter / 1-2 hop / PageRank over SNAP LiveJournal (35M) and Orkut (117M), plus a committed reproducible harness.
Findings (honest: we volunteer where GFQL loses)
Median of 5 after 2 warmups; result-size parity enforced per task (identical answers on every engine).
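As a rough illustration of the "median of 5 after 2 warmups" protocol with a per-rep result-size check, a minimal stdlib sketch follows. It is not the committed harness: timed_median and the toy filter task are hypothetical names, and the real tasks run GFQL or Spark queries rather than list comprehensions.

```python
import statistics
import time


def timed_median(task, warmups=2, reps=5):
    """Return (median_seconds, result_size) for a zero-argument task.

    The task must return something sized (supports len()). Warmup runs
    are executed but not timed; every timed rep must return the same
    number of rows, otherwise the measurement is rejected.
    """
    for _ in range(warmups):
        task()
    times, sizes = [], set()
    for _ in range(reps):
        start = time.perf_counter()
        result = task()
        times.append(time.perf_counter() - start)
        sizes.add(len(result))
    if len(sizes) != 1:
        raise ValueError(f"result size changed between reps: {sorted(sizes)}")
    return statistics.median(times), sizes.pop()


# Toy stand-in for a filter task over an edge list.
edges = [(i, i + 1) for i in range(1000)]
median_s, size = timed_median(lambda: [e for e in edges if e[0] >= 500])
```

Returning the size alongside the timing lets a caller compare sizes across engines before trusting any speedup ratio.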
What's included
- docs/source/gfql/benchmark_graphframes.rst (+ toctree entry; engines.rst head-to-head row updated to link it)
- benchmarks/gfql/bench_graphframes.py (+ DESIGN): guarded per-cell, shared --filter-threshold for bit-identical filter parity, node-count parity for hops, warm-median + cold-load
- _static/graphframes/ (page renders from saved JSON; does not rerun)
Method / fairness (disclosed in-page)
local[*] is Spark's weakest config (single-node regime, where GFQL lives); every Spark task ends in a materializing .count(); the pandas→polars conversion is charged to GFQL inside the timed region; PageRank maxIter=20 (GraphFrames) vs library tolerance (igraph/cugraph) disclosed; runs blocked not interleaved on a shared box. Persona-tested (Databricks GraphFrames / perf-skeptic / warehouse-PuppyGraph).
Environment: DGX dgx-spark, GB10 GPU, single node; GraphFrames 0.8.4-spark3.5-s_2.12, PySpark 3.5.1.
🤖 Generated with Claude Code
2) LadybugDB (folded from the former feat/gfql-ladybug-bench branch)
GFQL vs LadybugDB (the actively-maintained Kuzu fork) on Ladybug's own benchmark ops, run as real Cypher MATCH … RETURN through the row pipeline (no dataframe shortcuts): count(*) (~1 ms via #1667's count_table). Ladybug wins the two ops backed by persistent structure: point lookup (~0.3 ms index-seek vs our ~4 ms columnar scan; node-id index tracked in #1676: route Cypher chain node-id filter through resident node_id index, O(1) point / O(log) range seek) and relationship COUNT(*) (O(1) cached count vs our O(E) endpoint-validated scan; a dataframe has no referential integrity).
benchmarks/gfql/bench_ladybug_cypher.py (Cypher row-pipeline path; per-rep + cross-engine result-size parity enforced; a mismatched cell is VOIDed) and bench_ladybug.py (dataframe-level baselines; --validate asserts against a pandas oracle; raw-frame ops labeled gfql-pandas-df, never under an engine label they didn't run on).
Review notes
Team-polish pass (03c6e5a7): headline range widened to 1.3–43× (the Orkut 2-hop 1.3× was outside the old "2–43×"); cross-machine disclosure added to the LadybugDB row + Methodology; caption markup fix; persona jargon removed; both Ladybug harnesses gained real oracle/parity assertions.
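The point-lookup gap discussed in the LadybugDB section (columnar scan versus index seek) comes down to the access pattern, which a toy sketch can show. This is not GFQL's or Ladybug's code, and it does not represent the design tracked in #1676: the dict and sorted-list structures below are generic stand-ins for an O(1) point index and an O(log n) range seek.

```python
import bisect

# A sorted id column (even ids only), standing in for a node table.
ids = list(range(0, 1000, 2))


def point_scan(column, target):
    """O(n): test every row, as a columnar filter without an index does."""
    return [row for row, value in enumerate(column) if value == target]


# Built once and kept resident; lookups afterwards do not touch the column.
hash_index = {value: row for row, value in enumerate(ids)}


def point_seek(index, target):
    """O(1) expected: a single hash probe."""
    row = index.get(target)
    return [] if row is None else [row]


def range_seek(sorted_column, lo, hi):
    """O(log n): binary-search both bounds of an inclusive id range."""
    left = bisect.bisect_left(sorted_column, lo)
    right = bisect.bisect_right(sorted_column, hi)
    return range(left, right)


scan_rows = point_scan(ids, 10)
seek_rows = point_seek(hash_index, 10)
range_rows = range_seek(ids, 100, 200)
```

Both lookups return the same rows; the index only changes how much of the column is read, which is why the scan-shaped ops and the point lookup behave so differently in the scorecard.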