docs(gfql): benchmarks vs Spark GraphFrames + LadybugDB (single-node, honest) by lmeyerov · Pull Request #1668 · graphistry/pygraphistry

lmeyerov · 2026-07-01T04:16:16Z

Stacks on #1661 (`docs/gfql-engine-docs`). This PR bundles two competitor benchmarks of the same kind: each adds an external-competitor harness plus a comparison row in engines.rst.

1) GraphFrames: a new benchmark page comparing GFQL (polars on CPU, polars-gpu on GPU) against Apache Spark GraphFrames (`local[*]`, single node) on filter, 1-hop, 2-hop, and PageRank over SNAP LiveJournal (35M edges) and Orkut (117M edges), plus a committed, reproducible harness.

Findings (honest — we volunteer where GFQL loses)

Median of 5 after 2 warmups; result-size parity enforced per task (identical answers on every engine).
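The timing protocol above (2 untimed warmups, median of 5 timed reps, result size checked on every rep and then compared across engines) can be sketched with the standard library. `time_cell`, `parity_check`, and the `"ok"`/`"VOID"` labels are illustrative names, not the harness's API; the commit log mentions a `med()` helper in `bench_ladybug_cypher.py` that may be structured differently.

```python
import statistics
import time
from typing import Callable, Dict, Sized, Tuple


def time_cell(run: Callable[[], Sized], warmups: int = 2, reps: int = 5) -> Tuple[float, int]:
    """Return (median seconds, result size) for one engine/task cell.

    `run` must return a fully materialized result; len() is its answer size.
    """
    for _ in range(warmups):
        run()  # warm caches / GPU context; deliberately not timed
    times = []
    size = None
    for _ in range(reps):
        t0 = time.perf_counter()
        out = run()
        times.append(time.perf_counter() - t0)
        n = len(out)
        # Per-rep parity: every timed rep must return the same number of rows.
        if size is None:
            size = n
        elif n != size:
            raise AssertionError(f"result size changed between reps: {size} -> {n}")
    return statistics.median(times), size


def parity_check(sizes: Dict[str, int]) -> Dict[str, str]:
    """Cross-engine parity: every engine's timing is VOID unless all sizes agree."""
    ok = len(set(sizes.values())) == 1
    return {engine: ("ok" if ok else "VOID") for engine in sizes}
```

Checking size on every rep, not just once, catches an engine that silently returns a different answer on a later run (for example after a cache is populated).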

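| | filter | 1-hop | 2-hop | PageRank |
|---|---|---|---|---|
| LiveJournal: best GFQL vs GraphFrames | 43× | 7.4× | 2.3× | GPU 14.7× / CPU 0.33× (loses) |
| Orkut: best GFQL vs GraphFrames | 42× | 8.7× | 1.3× | GPU 10.5× / CPU 0.23× (loses) |

Speedup = GraphFrames time ÷ GFQL time; values below 1× mean GFQL is slower.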

- Filter/traversal: GFQL wins 1.3–43× (most cells 2× or more), even on CPU. There is no JVM startup, task serialization, or shuffle; a single-node columnar engine is the right tool for sub-second graph queries.
- PageRank is mixed, and we disclose it: GFQL's CPU path (igraph) is slower than GraphFrames (0.23–0.33×); only the GPU path (cugraph) wins (~10–15×). Guidance: use the GPU engine for whole-graph analytics.
- PageRank cross-engine parity verified: Spearman ρ = 1.00 and top-100 overlap 100/100 across igraph, cugraph, and GraphFrames (saved artifact).
- Friendster (1.8B edges): where the eager-load harness stops. GFQL's pandas load OOMs, GFQL's cudf-lean load swaps, and GraphFrames thrashes; all exceed one 119 GB node. The GFQL failures happen in the harness's eager in-memory load, before the query engine runs; the Polars and cudf-polars streaming paths were not exercised and are the untested next step. Reported as a wall rather than dropped: it is where the page's "when to go back to a cluster or a bigger node" guidance starts to apply, at least until streaming is benchmarked.

What's included

- `docs/source/gfql/benchmark_graphframes.rst` (plus a toctree entry; the engines.rst head-to-head row now links to it)
- `benchmarks/gfql/bench_graphframes.py` (plus DESIGN): guarded per cell (a failing cell is recorded and the run continues), a shared `--filter-threshold` so every engine's filter returns bit-identical results, node-count parity for hops, and warm-median plus cold-load timings
- Saved results and parity JSON under `_static/graphframes/` (the page renders from the saved JSON and does not rerun the benchmark)
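"Guarded per cell" means one engine/task combination failing (out of memory, unsupported operation) is recorded and the sweep moves on. A minimal sketch; `run_grid` and its status labels are hypothetical, not the harness's actual code.

```python
from typing import Callable, Dict, List


def run_grid(cells: Dict[str, Callable[[], float]]) -> List[dict]:
    """Run every (engine, task) cell; a failing cell is recorded, never fatal."""
    rows = []
    for name, cell in cells.items():
        try:
            rows.append({"cell": name, "status": "ok", "seconds": cell()})
        except MemoryError:
            # Allocation failure Python itself observed: report OOM, keep going.
            rows.append({"cell": name, "status": "OOM", "seconds": None})
        except NotImplementedError as e:
            # Engine does not support this op: report it instead of faking a number.
            rows.append({"cell": name, "status": "skip", "seconds": None, "reason": str(e)})
    return rows
```

`MemoryError` only covers allocations Python sees; an OS-level OOM kill ends the whole process, so a sturdier variant would run each cell in its own subprocess.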

Method / fairness (disclosed in-page)

`local[*]` is Spark's weakest configuration, but it is the single-node regime where GFQL lives. Every Spark task ends in a materializing `.count()`. The pandas→polars conversion is charged to GFQL inside the timed region. PageRank stopping rules differ and are disclosed: GraphFrames runs `maxIter=20`, while igraph/cugraph use their library tolerance. Runs are blocked, not interleaved, on a shared box. The write-up was reviewed from three reader perspectives: a Databricks GraphFrames user, a performance skeptic, and a warehouse/PuppyGraph user.
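The blocked-vs-interleaved disclosure is about run order. In a blocked schedule every rep of one engine finishes before the next engine starts, so background load on a shared box that drifts over minutes lands on one engine's block; interleaving alternates engines rep by rep and spreads that drift across all of them. Two hypothetical schedule helpers make the difference concrete:

```python
from typing import List, Sequence, Tuple

Run = Tuple[str, str, int]  # (engine, task, rep index)


def blocked(engines: Sequence[str], tasks: Sequence[str], reps: int) -> List[Run]:
    """All reps of one engine complete before the next engine begins."""
    return [(e, t, r) for e in engines for t in tasks for r in range(reps)]


def interleaved(engines: Sequence[str], tasks: Sequence[str], reps: int) -> List[Run]:
    """Engines alternate on every rep, so slow drift in machine load hits each one."""
    return [(e, t, r) for t in tasks for r in range(reps) for e in engines]
```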

Environment: NVIDIA DGX Spark (GB10 GPU), single node; GraphFrames 0.8.4-spark3.5-s_2.12, PySpark 3.5.1.

🤖 Generated with Claude Code

2) LadybugDB (folded from the former `feat/gfql-ladybug-bench` branch)

GFQL vs LadybugDB (the actively maintained Kuzu fork) on Ladybug's own benchmark ops, run as real Cypher `MATCH … RETURN` queries through the row pipeline (no dataframe shortcuts):

- Disclosed cross-machine comparison: the Ladybug numbers are their published 5M/20M results on their hardware; the GFQL side ran on an NVIDIA DGX Spark GB10, with each engine on its own native frames, built once. An earlier harness version paid a ~200 ms pandas→polars conversion on every call, which made polars/cudf look 27–675× slower than they are; this is fixed and called out in the docs Methodology.
- Results (indicative, given the cross-machine setup): GFQL wins the scan-shaped ops: full node scan ~65× (polars), id range ~1.2×, relationship scans ~3.5–3.7× (cuDF), plus node `count(*)` (~1 ms via #1667's `count_table`). Ladybug wins the two ops backed by persistent structure: point lookup (~0.3 ms index seek vs our ~4 ms columnar scan; a resident node-id index is tracked in #1676) and relationship `COUNT(*)` (an O(1) cached count vs our O(E) endpoint-validated scan, because a dataframe has no referential integrity).
- Harnesses: `benchmarks/gfql/bench_ladybug_cypher.py` (Cypher row-pipeline path; result-size parity enforced per rep and across engines, and a mismatched cell is VOIDed) and `bench_ladybug.py` (dataframe-level baselines; `--validate` asserts against a pandas oracle; raw-frame ops are labeled `gfql-pandas-df`, never under an engine label they did not run on).
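The two ops Ladybug wins come down to data structure, which a small sketch can show. Below, a columnar-style point lookup scans every row (O(N)); a hypothetical resident index answers points with a hash map (O(1)) and ranges with binary search over sorted ids (O(log N)), in the spirit of #1676 but not its design. The last function shows why relationship `COUNT(*)` costs O(E) on dataframes: without referential integrity, an honest count must check that both endpoints of every edge exist.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence, Set, Tuple


def point_scan(ids: Sequence[int], target: int) -> List[int]:
    """Columnar-scan style: compare every row, O(N)."""
    return [i for i, v in enumerate(ids) if v == target]


class NodeIdIndex:
    """Hypothetical resident index over unique node ids."""

    def __init__(self, ids: Sequence[int]):
        self.pos: Dict[int, int] = {v: i for i, v in enumerate(ids)}  # id -> row
        self.sorted_ids: List[int] = sorted(ids)

    def point(self, target: int) -> List[int]:
        """O(1) average-case hash lookup."""
        return [self.pos[target]] if target in self.pos else []

    def range_count(self, lo: int, hi: int) -> int:
        """Count ids with lo <= id <= hi using two binary searches, O(log N)."""
        return bisect_right(self.sorted_ids, hi) - bisect_left(self.sorted_ids, lo)


def count_valid_edges(edges: Sequence[Tuple[int, int]], node_ids: Set[int]) -> int:
    """Edge COUNT(*) without referential integrity: validate both endpoints, O(E)."""
    return sum(1 for s, d in edges if s in node_ids and d in node_ids)
```

A graph store that rejects dangling edges at write time can instead keep a counter and answer the edge count in O(1), which is the cached count the Ladybug numbers reflect.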

Review notes

Team-polish pass (03c6e5a7): headline range widened to 1.3–43× (the Orkut 2-hop 1.3× was outside the old "2–43×"); cross-machine disclosure added to the LadybugDB row + Methodology; caption markup fix; persona jargon removed; both Ladybug harnesses gained real oracle/parity assertions.

…:skip The two GFQL/GraphFrames code blocks are illustrative (reference an undefined g/gf/seeds and engine="polars"), so the doc-example runner was executing them and failing test-docs. Mark both with '.. doc-test: skip' — they are shown for reading, not execution. Removes this page's contribution to the test-docs failure on #1668. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… phrasing
- performance.rst said 'below ~1M edges pandas often wins', contradicting engines.rst's measured ~10K polars crossover one click away; aligned to the measured guidance.
- engines.rst referenced :doc:`benchmark_graphframes`, a page that only lands in the stacked benchmarks PR (#1668), so Sphinx would emit an unknown-doc warning if this PR shipped alone. Reworded; #1668 restores the live link.
- 'NO-CHEATING' is internal methodology jargon; the public page now says 'No silent fallback — parity-verified' (same guarantee, reader-facing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


…est) New page docs/source/gfql/benchmark_graphframes.rst comparing GFQL polars (CPU) and polars-gpu (GPU) against Spark GraphFrames (local[*], single node) on filter / 1-2 hop / PageRank over SNAP LiveJournal (35M) and Orkut (117M), with a committed reproducible harness (benchmarks/gfql/bench_graphframes.py).

Findings, stated honestly (numbers = median of 5 after 2 warmups; result-size parity enforced per task):
- filter/traversal: GFQL wins 2-43x even on CPU (no JVM/scheduler/shuffle overhead; single-node columnar is the right tool for sub-second graph queries).
- PageRank: mixed and disclosed; GFQL's CPU/igraph path is SLOWER than GraphFrames (0.23-0.33x); only the GPU/cugraph path wins (~10-15x). Guidance: reach for the GPU engine for whole-graph analytics.
- PageRank cross-engine parity verified: Spearman rho = 1.00, top-100 overlap 100/100 across igraph/cugraph/GraphFrames (saved artifact).
- Friendster (1.8B edges): documented single-node memory ceiling; every engine (GFQL pandas-load OOM, GFQL cudf-lean swap, GraphFrames thrash) exceeds one 119GB node; reported as a wall, not dropped.

Harness: shared --filter-threshold for bit-identical filter parity; node-count parity for hops; guarded per-cell (OOM/skip continues); warm-median + cold-load; GraphFrames 0.8.4-spark3.5-s_2.12 / PySpark 3.5.1. Results rendered from saved JSON (_static/graphframes/). engines.rst head-to-head row updated to link this page; toctree entry added. Persona-tested (Raj/Sam/Lena): maxIter/tolerance disclosed, single-node ceiling stated, blocked-not-interleaved noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


…t an engine ceiling The Friendster OOM was our harness's EAGER in-memory load (pandas.read_parquet -> ~29GB frame + degree build), before the query engine ran, not a hard limit of Polars/cudf-polars. Both ship larger-than-memory streaming paths this harness did not exercise:
- CPU: GFQL_POLARS_CPU_STREAMING=1 -> collect(engine='streaming'), disk-spill
- GPU: GFQL_POLARS_GPU_EXECUTOR=streaming -> cudf-polars streaming executor

Reframe the Friendster section as 'where the eager-load harness stops; streaming is the untested next step', fix the now-inaccurate 'GFQL does not spill to disk' caveat, and soften the 'single-node ceiling' language throughout. The proper larger-than-memory test (lazy scan_parquet + streaming collect at 1.8B) is follow-up work, not a conceded limitation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
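The distinction this reframe rests on, eager load versus streaming, can be shown without Polars. An eager harness materializes every edge before aggregating, so peak memory grows with E; a streaming pass folds each batch into running per-node counters, so peak memory is roughly V counters plus one batch. This is a plain-Python illustration of the idea, not Polars' streaming engine, and the `(src, dst)` batch format is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]  # (src, dst)


def out_degree_eager(batches: Iterable[List[Edge]]) -> Counter:
    """Eager: materialize all edges first (peak memory ~ O(E)), then aggregate."""
    all_edges = [e for batch in batches for e in batch]
    return Counter(src for src, _ in all_edges)


def out_degree_streaming(batches: Iterable[List[Edge]]) -> Counter:
    """Streaming: update running counts batch by batch (peak ~ O(V) + one batch)."""
    deg: Counter = Counter()
    for batch in batches:
        deg.update(src for src, _ in batch)
    return deg
```

Both return the same answer; only the peak working set differs, which is why the Friendster result measures the loader rather than the query engines.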

…ny-validated) Adds a GFQL-native port of the LadybugDB vs Kuzu benchmark (github.com/LadybugDB/kuzu-ladybug-benchmark) so GFQL polars/cudf can be compared head-to-head against Ladybug 0.18.0 / Kuzu 0.11.3 on the SAME synthetic Item/Owns suite. Maps their ops onto GFQL primitives, several of which are direct analogues of our work:
- op9 out-degree for seeded nodes == our CSR edge_out_adj seeded index
- op11 scan-rel rowid == columnar edge scan / Arrow return
- op13 Arrow CSR export == create_index('edge_out_adj')
- op5 id range query == the "do we need another index?" question

Validated on tiny CPU data (1K nodes / 5K edges, engine=polars): all 8 ops correct vs a pandas oracle (range=501, point=1). Full 5M/20M-scale runs + the lbug/kuzu head-to-head + timings are reserved for the GPU bench box. Harness has a size guard (refuses >2M nodes / 8M edges on non-cudf) and a sys.path guard so it prefers the working-tree graphistry over a stale pip install.

engines.rst: adds a LadybugDB row (actively-maintained Kuzu fork; ART/hash index choice; out-of-core billion-scale), honestly noting Ladybug's out-of-core billion-scale as a genuine complement/gap (GFQL is in-memory; single-node ceiling at ~1.8B), with the head-to-head marked in progress.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
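The two guards described in this commit can be sketched as follows. The limits (2M nodes / 8M edges on non-cudf engines) come from the commit message; the constant names, function names, and error text are hypothetical.

```python
import sys
from pathlib import Path

# Limits quoted in the commit message; names are illustrative.
MAX_NODES_NON_CUDF = 2_000_000
MAX_EDGES_NON_CUDF = 8_000_000


def check_size(engine: str, n_nodes: int, n_edges: int) -> None:
    """Refuse full-scale runs on non-cudf engines; cudf is exempt."""
    if engine != "cudf" and (n_nodes > MAX_NODES_NON_CUDF or n_edges > MAX_EDGES_NON_CUDF):
        raise SystemExit(
            f"{n_nodes:,} nodes / {n_edges:,} edges exceeds the non-cudf size guard"
        )


def prefer_working_tree(repo_root: Path) -> None:
    """Put the checkout first on sys.path so `import graphistry` resolves to it,
    not to a possibly stale pip-installed copy. Call before importing graphistry."""
    root = str(repo_root.resolve())
    if root in sys.path:
        sys.path.remove(root)
    sys.path.insert(0, root)
```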

Correct the LadybugDB row: GFQL is in-memory by DEFAULT but not limited to it — Polars streaming (GFQL_POLARS_CPU_STREAMING=1) and the cudf-polars streaming executor (GFQL_POLARS_GPU_EXECUTOR=streaming) are larger-than-memory paths. Frame out-of-core as a not-yet-benchmarked head-to-head, not a flat GFQL gap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…peline path) Runs Ladybug's benchmark ops as GFQL Cypher MATCH...RETURN (the honest row pipeline, not df shortcuts) at their 5M/20M size. Scorecard vs Ladybug's published numbers: GFQL WINS full_scan (14.9x) + scan_rel (cudf 3.3-3.5x); LOSES count (~770x), point (~675x), range (~27x) — all Cypher-lowering/row-pipeline overhead (count materializes instead of len(edges); point/range full-scan the pipeline). Roadmap in plans/.../ladybug-receipts/. Feeds the row-pipeline optimization (shared with #1670). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…call conversion) The harness built the graph in pandas for ALL engines, so every engine='polars'/ 'cudf' call re-converted the 5M-row string-column frame (~200ms pandas->polars) — swamping sub-10ms queries and making polars/cudf look 27-675x slower than reality. Proven: same point filter = 209ms (pandas-built) vs 1.96ms (polars-built), 106x. Now each engine is benchmarked on a graph built in its OWN native frame type (pandas/polars/cuDF), constructed once outside the timing loop — the honest comparison (a polars user keeps polars data; Ladybug queries its own native store). Corrected native-per-engine result (5M/20M, polars): full_scan 55.6ms (WIN 68x), range 5.55ms (WIN 1.4x, was falsely 27x LOSS), point 2.66ms (vs LB 0.3ms, close). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ine, honest) Replaces the "in progress" placeholder with the measured 5M/20M scorecard (GFQL running Ladybug's ops as the identical Cypher MATCH...RETURN row pipeline, each engine on its native frames). GFQL wins the scan-shaped ops: full_scan ~65x, range ~1.2x, scan_rel ~3.5-3.7x (cuDF). Point lookup ~4ms vs LB's ~0.3ms index seek (close; a resident adjacency index closes it), and rel COUNT(*) is LB's O(1) cached count vs GFQL's O(E) endpoint-validated scan (dataframe = no referential integrity). Honest: name the two ops a persistent-index store still wins. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…els, ranges Review-wave findings on the benchmarks tail:
- engines.rst LadybugDB row presented a cross-machine comparison as 'head-to-head': the Ladybug figures are THEIR published results on THEIR hardware, GFQL ran on our DGX Spark. Now disclosed in the row + a Methodology bullet naming the harness; 'index closes it' softened to 'should close it' (unmeasured, tracked in #1676).
- GraphFrames headline '2-43x' excluded the 1.3x Orkut 2-hop shown in the same page's table; now '1.3-43x (most cells 2x+)' here and in engines.rst.
- Fix nested inline markup in the Orkut caption (bold inside an unclosed italic span mis-renders); de-jargon the persona note; 'dgx-spark' hostname -> 'NVIDIA DGX Spark' product name.
- bench_ladybug.py: --validate printed the oracle but never compared; it now asserts result sizes (timing void on mismatch). The raw-dataframe ops (count_rel/out_degree_seeded/scan_rel*) always ran on the pandas ingest frames yet were labeled system='gfql-{engine}', so an --engine cudf run reported pandas timings under a GPU label; now labeled 'gfql-pandas-df' (engine-native Cypher timings live in bench_ladybug_cypher.py).
- bench_ladybug_cypher.py: cross-machine method disclosed in the docstring (gitignored plans/ receipt citation dropped); med() now returns+checks result size per rep and main() enforces cross-engine size parity (VOIDs a mismatched cell); previously it was timing-only.
- engines.rst streaming note: align wording with the base PR (link to the GraphFrames page rides this PR's own comparison rows).

Validated: both harnesses smoke-run tiny (pandas+polars; oracle asserts pass, NIE reported honestly); ranges/captions proofread against results.json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Same class as the ≈ fix: test-docs' PDF pass errors on the Spearman ρ in the GraphFrames page (./PyGraphistry.tex:8184). Swept the whole docs-tail diff for remaining non-ASCII: em/en-dashes, ×, →, … all already pass on the green base runs; ρ was the only newcomer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
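A sweep like the one described can be scripted: flag every non-ASCII character that is not on a known-good list. A minimal sketch; the allowlist is just the characters this commit reports as already passing the PDF build, and a real check would feed in the docs lines from the branch diff.

```python
import unicodedata
from typing import Iterable, List, Tuple

# Characters the commit reports as already passing: em dash, en dash, ×, →, …
ALLOWED = set("\u2014\u2013\u00d7\u2192\u2026")


def non_ascii_newcomers(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, char, Unicode name) for non-ASCII chars not in ALLOWED."""
    hits = []
    for no, line in enumerate(lines, start=1):
        for ch in line:
            if ord(ch) > 127 and ch not in ALLOWED:
                hits.append((no, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits
```

Run over the Spearman line, this reports `GREEK SMALL LETTER RHO` and nothing else, matching the single newcomer the commit found.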


lmeyerov force-pushed the feat/gfql-graphframes-bench branch from e149bce to 095f52c on July 1, 2026 07:16

lmeyerov force-pushed the feat/gfql-graphframes-bench branch from 095f52c to e80003e on July 1, 2026 07:44

lmeyerov force-pushed the docs/gfql-engine-docs branch from 422731f to 1e31fc5 on July 1, 2026 09:56

lmeyerov force-pushed the feat/gfql-graphframes-bench branch from e80003e to d6e4cbf on July 1, 2026 09:56

lmeyerov force-pushed the docs/gfql-engine-docs branch from 1e31fc5 to 13085fb on July 2, 2026 05:29

lmeyerov force-pushed the feat/gfql-graphframes-bench branch from d6e4cbf to f5fadc9 on July 2, 2026 05:29

lmeyerov force-pushed the docs/gfql-engine-docs branch from 13085fb to 12476c6 on July 2, 2026 06:50

lmeyerov force-pushed the feat/gfql-graphframes-bench branch from f5fadc9 to 3279664 on July 2, 2026 06:51

lmeyerov force-pushed the docs/gfql-engine-docs branch from 12476c6 to 0aece20 on July 2, 2026 16:19

lmeyerov force-pushed the feat/gfql-graphframes-bench branch from 3279664 to b970c4a on July 2, 2026 16:19

lmeyerov force-pushed the docs/gfql-engine-docs branch from 0aece20 to 373463e on July 2, 2026 16:34

lmeyerov force-pushed the feat/gfql-graphframes-bench branch from b970c4a to c04e583 on July 2, 2026 16:34

lmeyerov changed the title ~~docs(gfql): benchmark GFQL vs Spark GraphFrames (single-node, honest)~~ docs(gfql): benchmarks vs Spark GraphFrames + LadybugDB (single-node, honest) Jul 2, 2026

lmeyerov force-pushed the docs/gfql-engine-docs branch from 6128a8b to 1075938 on July 2, 2026 17:48

lmeyerov force-pushed the feat/gfql-graphframes-bench branch from ec4ed70 to c96944a on July 2, 2026 17:48

lmeyerov and others added 4 commits July 2, 2026 16:38

lmeyerov and others added 6 commits July 2, 2026 16:38

lmeyerov force-pushed the docs/gfql-engine-docs branch from 1075938 to 8cb04d7 on July 2, 2026 23:40

lmeyerov force-pushed the feat/gfql-graphframes-bench branch from 359d3ee to eadc84b on July 2, 2026 23:41

lmeyerov wants to merge 10 commits into `docs/gfql-engine-docs` from `feat/gfql-graphframes-bench`

lmeyerov commented Jul 1, 2026 (edited)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant
