docs(gfql): benchmarks vs Spark GraphFrames + LadybugDB (single-node, honest)#1668

Open
lmeyerov wants to merge 10 commits into docs/gfql-engine-docs from feat/gfql-graphframes-bench

@lmeyerov lmeyerov commented Jul 1, 2026

Contributor

Stacks on #1661 (docs/gfql-engine-docs). Two competitor benchmarks in one PR (same genre: each adds an external-competitor harness and an engines.rst comparison row).

1) GraphFrames

A new benchmark page comparing GFQL (polars on CPU, polars-gpu on GPU) against Apache Spark GraphFrames (local[*], single node) on filter / 1-2 hop / PageRank over SNAP LiveJournal (35M edges) and Orkut (117M edges), plus a committed reproducible harness.

Findings (honest — we volunteer where GFQL loses)

Median of 5 after 2 warmups; result-size parity enforced per task (identical answers on every engine).

| Dataset (best GFQL vs GraphFrames) | filter | 1-hop | 2-hop | PageRank |
|---|---|---|---|---|
| LiveJournal | 43× | 7.4× | 2.3× | GPU 14.7× / CPU 0.33× (loses) |
| Orkut | 42× | 8.7× | 1.3× | GPU 10.5× / CPU 0.23× (loses) |

  1. Filter/traversal: GFQL wins 1.3–43× (most cells 2×+) even on CPU, with no JVM startup, task serialization, or shuffle; a single-node columnar engine is the right tool for sub-second graph queries.
  2. PageRank is mixed and disclosed: GFQL's CPU/igraph path is slower than GraphFrames; only the GPU/cugraph path wins (~10–15×). Guidance: use the GPU engine for whole-graph analytics.
  3. PageRank cross-engine parity verified: Spearman ρ = 1.00, top-100 overlap 100/100 across igraph/cugraph/GraphFrames (saved artifact).
  4. Friendster (1.8B edges): documented single-node memory ceiling — GFQL pandas-load OOMs, GFQL cudf-lean swaps, GraphFrames thrashes; all exceed one 119 GB node. Reported as a wall, not dropped — it's exactly the "when to go back to a cluster / bigger node" boundary the page describes.
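
The parity check in item 3 (Spearman rank correlation plus top-k overlap) can be sketched as follows. This is a minimal stdlib-only illustration, not the harness's code: `average_ranks`, `spearman_rho`, `top_k_overlap`, and the two toy score maps are hypothetical names and data, and the real check compares igraph, cugraph, and GraphFrames output at k=100.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(scores_a, scores_b):
    """Pearson correlation of the ranks, over node ids present in both maps.

    Assumes the scores are not all tied (otherwise the variance is zero)."""
    nodes = sorted(set(scores_a) & set(scores_b))
    ra = average_ranks([scores_a[n] for n in nodes])
    rb = average_ranks([scores_b[n] for n in nodes])
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def top_k_overlap(scores_a, scores_b, k):
    """Number of node ids shared by the two top-k sets."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(scores_a) & top(scores_b))


# Hypothetical scores: the engines disagree on values but agree on ordering.
engine_a = {"a": 0.40, "b": 0.30, "c": 0.20, "d": 0.10}
engine_b = {"a": 0.41, "b": 0.29, "c": 0.19, "d": 0.11}

rho = spearman_rho(engine_a, engine_b)
overlap = top_k_overlap(engine_a, engine_b, k=2)
```

Rank-based parity is the natural choice here because the engines stop PageRank under different rules, so raw scores can differ slightly while the ordering still agrees.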

What's included

  • docs/source/gfql/benchmark_graphframes.rst (+ toctree entry; engines.rst head-to-head row updated to link it)
  • benchmarks/gfql/bench_graphframes.py (+ DESIGN) — guarded per-cell, shared --filter-threshold for bit-identical filter parity, node-count parity for hops, warm-median + cold-load
  • Saved results + parity JSON under _static/graphframes/ (page renders from saved JSON; does not rerun)
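
A guarded, warm-median cell with result-size parity, as listed above, might look roughly like this. It is a stdlib-only sketch under stated assumptions, not the code in bench_graphframes.py: `time_cell`, `enforce_parity`, the record layout, and the two toy "engines" are all hypothetical.

```python
import statistics
import time


def time_cell(fn, warmups=2, reps=5):
    """Time fn: run warmups, then return the median of reps timed runs.

    fn must return a sized result. A MemoryError skips this cell only,
    so the rest of the benchmark grid keeps running."""
    try:
        for _ in range(warmups):
            fn()
        samples, sizes = [], set()
        for _ in range(reps):
            start = time.perf_counter()
            result = fn()
            samples.append(time.perf_counter() - start)
            sizes.add(len(result))
    except MemoryError as exc:
        return {"status": "skipped", "reason": repr(exc)}
    if len(sizes) != 1:
        return {"status": "void", "reason": f"size varied across reps: {sorted(sizes)}"}
    return {"status": "ok", "median_s": statistics.median(samples), "size": sizes.pop()}


def enforce_parity(cells):
    """Void every timed cell of a task when engines disagree on result size."""
    sizes = {c["size"] for c in cells.values() if c["status"] == "ok"}
    if len(sizes) > 1:
        for c in cells.values():
            if c["status"] == "ok":
                c["status"] = "void"
    return cells


# Two toy "engines" answering the same filter with one shared threshold.
data = list(range(1000))
threshold = 500
cells = enforce_parity({
    "engine_a": time_cell(lambda: [x for x in data if x > threshold]),
    "engine_b": time_cell(lambda: list(filter(lambda x: x > threshold, data))),
})
```

Sharing one threshold across engines is what makes the size check meaningful: a timing is only kept when every engine returned the same number of rows.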

Method / fairness (disclosed in-page)

  • local[*] is Spark's weakest config (the single-node regime, where GFQL lives).
  • Every Spark task ends in a materializing .count().
  • The pandas→polars conversion is charged to GFQL inside the timed region.
  • PageRank stopping rules differ and are disclosed: maxIter=20 (GraphFrames) vs library tolerance (igraph/cugraph).
  • Runs are blocked, not interleaved, on a shared box.

Persona-tested (Databricks GraphFrames / perf-skeptic / warehouse-PuppyGraph).
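
To make the PageRank stopping-rule disclosure concrete, here is a small pure-Python power iteration that can stop either after a fixed iteration count or at a tolerance. It is only an illustration of the two rules: the `pagerank` function, the damping factor of 0.85, and the toy graph are assumptions, and none of this is how GraphFrames, igraph, or cugraph implement it.

```python
def pagerank(edges, n, damping=0.85, max_iter=None, tol=None):
    """Power-iteration PageRank over nodes 0..n-1.

    Stops after max_iter iterations, or once the L1 change between
    iterations drops below tol. Returns (ranks, iterations_run)."""
    if max_iter is None and tol is None:
        raise ValueError("provide max_iter, tol, or both")
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    ranks = [1.0 / n] * n
    iterations = 0
    while max_iter is None or iterations < max_iter:
        # Rank held by dangling nodes (no out-edges) is spread uniformly.
        dangling = sum(r for r, d in zip(ranks, out_degree) if d == 0)
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for src, dst in edges:
            new[dst] += damping * ranks[src] / out_degree[src]
        delta = sum(abs(a - b) for a, b in zip(new, ranks))
        ranks = new
        iterations += 1
        if tol is not None and delta < tol:
            break
    return ranks, iterations


# Hypothetical 3-node graph.
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
fixed_ranks, fixed_iters = pagerank(toy_edges, 3, max_iter=20)
tol_ranks, tol_iters = pagerank(toy_edges, 3, tol=1e-10)
```

The two runs do different amounts of work, which is why the timing comparison needs the disclosure, while the resulting ordering can still agree, which is what the rank-based parity check verifies.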

Environment: NVIDIA DGX Spark (GB10 GPU), single node; GraphFrames 0.8.4-spark3.5-s_2.12, PySpark 3.5.1.

🤖 Generated with Claude Code

2) LadybugDB (folded from the former feat/gfql-ladybug-bench branch)

GFQL vs LadybugDB (the actively-maintained Kuzu fork) on Ladybug's own benchmark ops, run as real Cypher MATCH … RETURN through the row pipeline (no dataframe shortcuts):

  • Disclosed cross-machine comparison: the Ladybug side is their published 5M/20M numbers on their hardware; the GFQL side ran on an NVIDIA DGX Spark GB10 (native frames per engine, built once — an earlier harness version paid a ~200 ms pandas→polars conversion per call, which mis-measured polars/cudf by 27–675×; fixed and called out in the docs Methodology).
  • Results (indicative given cross-machine): GFQL wins the scan-shaped ops: full node scan ~65× (polars), id range ~1.2×, relationship scans ~3.5–3.7× (cuDF), and node count(*) (~1 ms via the count_table from #1667, "feat(gfql/polars): engine followups"). Ladybug wins the two ops backed by persistent structure: point lookup (~0.3 ms index seek vs our ~4 ms columnar scan; a resident node_id index with O(1) point / O(log) range seek is tracked in #1676) and relationship COUNT(*) (O(1) cached count vs our O(E) endpoint-validated scan, since a dataframe has no referential integrity).
  • Harnesses: benchmarks/gfql/bench_ladybug_cypher.py (Cypher row-pipeline path; per-rep + cross-engine result-size parity enforced — a mismatched cell is VOIDed) and bench_ladybug.py (dataframe-level baselines; --validate asserts against a pandas oracle; raw-frame ops labeled gfql-pandas-df, never under an engine label they didn't run on).
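
The two ops that a persistent-structure store wins can be illustrated with plain Python containers standing in for columns. This is a toy sketch, not GFQL or LadybugDB code: every name below is hypothetical, and the dict merely plays the role of a node-id index.

```python
NUM_NODES = 100_000
node_ids = list(range(NUM_NODES))          # stand-in for a node-id column
edges = [(i, (i * 7 + 1) % NUM_NODES) for i in range(0, NUM_NODES, 2)]
edges.append((5, 999_999))                 # dangling edge: destination is not a node


def point_lookup_scan(target):
    """Columnar-style lookup: compare every row, O(N) per query."""
    return [row for row, node_id in enumerate(node_ids) if node_id == target]


# Index-style lookup: pay O(N) once to build, then O(1) per seek.
node_index = {node_id: row for row, node_id in enumerate(node_ids)}


def point_lookup_index(target):
    row = node_index.get(target)
    return [] if row is None else [row]


def count_relationships_validated():
    """O(E) count that only admits edges whose endpoints both exist.

    Plain frames cannot promise that every edge references a real node,
    so len(edges) alone may overcount."""
    return sum(1 for src, dst in edges if src in node_index and dst in node_index)


raw_count = len(edges)
validated_count = count_relationships_validated()
```

A store that enforces referential integrity on write can keep a running relationship count and answer COUNT(*) without touching the edges, which is the O(1) versus O(E) gap described above.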

Review notes

Team-polish pass (03c6e5a7): headline range widened to 1.3–43× (the Orkut 2-hop 1.3× was outside the old "2–43×"); cross-machine disclosure added to the LadybugDB row + Methodology; caption markup fix; persona jargon removed; both Ladybug harnesses gained real oracle/parity assertions.
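
The widened headline range can be rechecked mechanically from the filter and hop cells of the Findings table. The snippet below is only a sanity-check sketch: the dictionary is hand-copied from that table, and its name and layout are hypothetical rather than the format of the saved results JSON.

```python
# Speedup cells (best GFQL vs GraphFrames) copied from the Findings table.
traversal_speedups = {
    ("LiveJournal", "filter"): 43.0,
    ("LiveJournal", "1-hop"): 7.4,
    ("LiveJournal", "2-hop"): 2.3,
    ("Orkut", "filter"): 42.0,
    ("Orkut", "1-hop"): 8.7,
    ("Orkut", "2-hop"): 1.3,
}

lowest = min(traversal_speedups.values())
highest = max(traversal_speedups.values())
at_least_2x = sum(1 for s in traversal_speedups.values() if s >= 2.0)

print(f"headline range: {lowest}-{highest}x; "
      f"{at_least_2x} of {len(traversal_speedups)} cells at 2x or better")
```

Deriving the headline from the same cells the page displays avoids the earlier mismatch, where a summary range was written by hand and left out one cell.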

lmeyerov added a commit that referenced this pull request Jul 1, 2026
…:skip

The two GFQL/GraphFrames code blocks are illustrative (reference an undefined
g/gf/seeds and engine="polars"), so the doc-example runner was executing them
and failing test-docs. Mark both with '.. doc-test: skip' — they are shown for
reading, not execution. Removes this page's contribution to the test-docs
failure on #1668.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from e149bce to 095f52c Compare July 1, 2026 07:16
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from 095f52c to e80003e Compare July 1, 2026 07:44
@lmeyerov lmeyerov force-pushed the docs/gfql-engine-docs branch from 422731f to 1e31fc5 Compare July 1, 2026 09:56
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from e80003e to d6e4cbf Compare July 1, 2026 09:56
@lmeyerov lmeyerov force-pushed the docs/gfql-engine-docs branch from 1e31fc5 to 13085fb Compare July 2, 2026 05:29
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from d6e4cbf to f5fadc9 Compare July 2, 2026 05:29
@lmeyerov lmeyerov force-pushed the docs/gfql-engine-docs branch from 13085fb to 12476c6 Compare July 2, 2026 06:50
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from f5fadc9 to 3279664 Compare July 2, 2026 06:51
@lmeyerov lmeyerov force-pushed the docs/gfql-engine-docs branch from 12476c6 to 0aece20 Compare July 2, 2026 16:19
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from 3279664 to b970c4a Compare July 2, 2026 16:19
@lmeyerov lmeyerov force-pushed the docs/gfql-engine-docs branch from 0aece20 to 373463e Compare July 2, 2026 16:34
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from b970c4a to c04e583 Compare July 2, 2026 16:34
lmeyerov added a commit that referenced this pull request Jul 2, 2026
… phrasing

- performance.rst said 'below ~1M edges pandas often wins', contradicting
  engines.rst's measured ~10K polars crossover one click away — aligned to
  the measured guidance.
- engines.rst referenced :doc:`benchmark_graphframes`, a page that only
  lands in the stacked benchmarks PR (#1668) — Sphinx unknown-doc warning if
  this PR ships alone. Reworded; #1668 restores the live link.
- 'NO-CHEATING' is internal methodology jargon — public page now says
  'No silent fallback — parity-verified' (same guarantee, reader-facing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lmeyerov lmeyerov changed the title from "docs(gfql): benchmark GFQL vs Spark GraphFrames (single-node, honest)" to "docs(gfql): benchmarks vs Spark GraphFrames + LadybugDB (single-node, honest)" Jul 2, 2026
@lmeyerov lmeyerov force-pushed the docs/gfql-engine-docs branch from 6128a8b to 1075938 Compare July 2, 2026 17:48
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from ec4ed70 to c96944a Compare July 2, 2026 17:48
lmeyerov and others added 4 commits July 2, 2026 16:38
…est)

New page docs/source/gfql/benchmark_graphframes.rst comparing GFQL polars
(CPU) and polars-gpu (GPU) against Spark GraphFrames (local[*], single node)
on filter / 1-2 hop / PageRank over SNAP LiveJournal (35M) and Orkut (117M),
with a committed reproducible harness (benchmarks/gfql/bench_graphframes.py).

Findings, stated honestly (numbers = median of 5 after 2 warmups; result-size
parity enforced per task):
- filter/traversal: GFQL wins 2-43x even on CPU (no JVM/scheduler/shuffle
  overhead; single-node columnar is the right tool for sub-second graph queries).
- PageRank: mixed and disclosed — GFQL's CPU/igraph path is SLOWER than
  GraphFrames (0.23-0.33x); only the GPU/cugraph path wins (~10-15x). Guidance:
  reach for the GPU engine for whole-graph analytics.
- PageRank cross-engine parity verified: Spearman rho = 1.00, top-100 overlap
  100/100 across igraph/cugraph/GraphFrames (saved artifact).
- Friendster (1.8B edges): documented single-node memory ceiling — every engine
  (GFQL pandas-load OOM, GFQL cudf-lean swap, GraphFrames thrash) exceeds one
  119GB node; reported as a wall, not dropped.

Harness: shared --filter-threshold for bit-identical filter parity; node-count
parity for hops; guarded per-cell (OOM/skip continues); warm-median + cold-load;
GraphFrames 0.8.4-spark3.5-s_2.12 / PySpark 3.5.1. Results rendered from saved
JSON (_static/graphframes/). engines.rst head-to-head row updated to link this
page; toctree entry added. Persona-tested (Raj/Sam/Lena): maxIter/tolerance
disclosed, single-node ceiling stated, blocked-not-interleaved noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…:skip

The two GFQL/GraphFrames code blocks are illustrative (reference an undefined
g/gf/seeds and engine="polars"), so the doc-example runner was executing them
and failing test-docs. Mark both with '.. doc-test: skip' — they are shown for
reading, not execution. Removes this page's contribution to the test-docs
failure on #1668.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t an engine ceiling

The Friendster OOM was our harness's EAGER in-memory load (pandas.read_parquet ->
~29GB frame + degree build), before the query engine ran — not a hard limit of
Polars/cudf-polars. Both ship larger-than-memory streaming paths this harness did
not exercise:
  - CPU: GFQL_POLARS_CPU_STREAMING=1 -> collect(engine='streaming'), disk-spill
  - GPU: GFQL_POLARS_GPU_EXECUTOR=streaming -> cudf-polars streaming executor
Reframe the Friendster section as 'where the eager-load harness stops; streaming
is the untested next step', fix the now-inaccurate 'GFQL does not spill to disk'
caveat, and soften the 'single-node ceiling' language throughout. The proper
larger-than-memory test (lazy scan_parquet + streaming collect at 1.8B) is
follow-up work, not a conceded limitation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ny-validated)

Adds a GFQL-native port of the LadybugDB vs Kuzu benchmark
(github.com/LadybugDB/kuzu-ladybug-benchmark) so GFQL polars/cudf can be
compared head-to-head against Ladybug 0.18.0 / Kuzu 0.11.3 on the SAME synthetic
Item/Owns suite. Maps their ops onto GFQL primitives, several of which are
direct analogues of our work:
  - op9  out-degree for seeded nodes == our CSR edge_out_adj seeded index
  - op11 scan-rel rowid              == columnar edge scan / Arrow return
  - op13 Arrow CSR export            == create_index('edge_out_adj')
  - op5  id range query              == the "do we need another index?" question

Validated on tiny CPU data (1K nodes / 5K edges, engine=polars): all 8 ops
correct vs a pandas oracle (range=501, point=1). Full 5M/20M-scale runs + the
lbug/kuzu head-to-head + timings are reserved for the GPU bench box. Harness has
a size guard (refuses >2M nodes / 8M edges on non-cudf) and a sys.path guard so
it prefers the working-tree graphistry over a stale pip install.

engines.rst: adds a LadybugDB row (actively-maintained Kuzu fork; ART/hash index
choice; out-of-core billion-scale) — honestly noting Ladybug's out-of-core
billion-scale as a genuine complement/gap (GFQL is in-memory; single-node ceiling
at ~1.8B), with the head-to-head marked in progress.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov and others added 6 commits July 2, 2026 16:38
Correct the LadybugDB row: GFQL is in-memory by DEFAULT but not limited to it —
Polars streaming (GFQL_POLARS_CPU_STREAMING=1) and the cudf-polars streaming
executor (GFQL_POLARS_GPU_EXECUTOR=streaming) are larger-than-memory paths. Frame
out-of-core as a not-yet-benchmarked head-to-head, not a flat GFQL gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…peline path)

Runs Ladybug's benchmark ops as GFQL Cypher MATCH...RETURN (the honest row
pipeline, not df shortcuts) at their 5M/20M size. Scorecard vs Ladybug's
published numbers: GFQL WINS full_scan (14.9x) + scan_rel (cudf 3.3-3.5x); LOSES
count (~770x), point (~675x), range (~27x) — all Cypher-lowering/row-pipeline
overhead (count materializes instead of len(edges); point/range full-scan the
pipeline). Roadmap in plans/.../ladybug-receipts/. Feeds the row-pipeline
optimization (shared with #1670).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…call conversion)

The harness built the graph in pandas for ALL engines, so every engine='polars'/
'cudf' call re-converted the 5M-row string-column frame (~200ms pandas->polars) —
swamping sub-10ms queries and making polars/cudf look 27-675x slower than reality.
Proven: same point filter = 209ms (pandas-built) vs 1.96ms (polars-built), 106x.

Now each engine is benchmarked on a graph built in its OWN native frame type
(pandas/polars/cuDF), constructed once outside the timing loop — the honest
comparison (a polars user keeps polars data; Ladybug queries its own native store).
Corrected native-per-engine result (5M/20M, polars): full_scan 55.6ms (WIN 68x),
range 5.55ms (WIN 1.4x, was falsely 27x LOSS), point 2.66ms (vs LB 0.3ms, close).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ine, honest)

Replaces the "in progress" placeholder with the measured 5M/20M scorecard (GFQL
running Ladybug's ops as the identical Cypher MATCH...RETURN row pipeline, each
engine on its native frames). GFQL wins the scan-shaped ops: full_scan ~65x,
range ~1.2x, scan_rel ~3.5-3.7x (cuDF). Point lookup ~4ms vs LB's ~0.3ms index
seek (close; a resident adjacency index closes it), and rel COUNT(*) is LB's O(1)
cached count vs GFQL's O(E) endpoint-validated scan (dataframe = no referential
integrity). Honest: name the two ops a persistent-index store still wins.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…els, ranges

Review-wave findings on the benchmarks tail:
- engines.rst LadybugDB row presented a cross-machine comparison as
  'head-to-head': the Ladybug figures are THEIR published results on THEIR
  hardware, GFQL ran on our DGX Spark. Now disclosed in the row + a
  Methodology bullet naming the harness; 'index closes it' softened to
  'should close it' (unmeasured, tracked in #1676).
- GraphFrames headline '2-43x' excluded the 1.3x Orkut 2-hop shown in the
  same page's table — now '1.3-43x (most cells 2x+)' here and in engines.rst.
- Fix nested inline markup in the Orkut caption (bold inside an unclosed
  italic span mis-renders); de-jargon the persona note; 'dgx-spark' hostname
  -> 'NVIDIA DGX Spark' product name.
- bench_ladybug.py: --validate printed the oracle but never compared —
  now asserts result sizes (timing void on mismatch). The raw-dataframe ops
  (count_rel/out_degree_seeded/scan_rel*) always ran on the pandas ingest
  frames yet were labeled system='gfql-{engine}' — an --engine cudf run
  reported pandas timings under a GPU label; now labeled 'gfql-pandas-df'
  (engine-native Cypher timings live in bench_ladybug_cypher.py).
- bench_ladybug_cypher.py: cross-machine method disclosed in the docstring
  (gitignored plans/ receipt citation dropped); med() now returns+checks
  result size per rep and main() enforces cross-engine size parity (VOIDs a
  mismatched cell) — was timing-only.
- engines.rst streaming note: align wording with the base PR (link to the
  GraphFrames page rides this PR's own comparison rows).

Validated: both harnesses smoke-run tiny (pandas+polars; oracle asserts pass,
NIE reported honestly); ranges/captions proofread against results.json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same class as the ≈ fix: test-docs' PDF pass errors on the Spearman ρ in the
GraphFrames page (./PyGraphistry.tex:8184). Swept the whole docs-tail diff
for remaining non-ASCII: em/en-dashes, ×, →, … all already pass on the green
base runs; ρ was the only newcomer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lmeyerov lmeyerov force-pushed the docs/gfql-engine-docs branch from 1075938 to 8cb04d7 Compare July 2, 2026 23:40
@lmeyerov lmeyerov force-pushed the feat/gfql-graphframes-bench branch from 359d3ee to eadc84b Compare July 2, 2026 23:41