20 commits
a3463b9
docs(gfql): engine-selection guide (pandas/polars/cuDF/polars-gpu) + …
lmeyerov Jun 29, 2026
38ff068
docs(gfql): round-2 user-testing fixes — de-market perf tail, honest …
lmeyerov Jun 29, 2026
3d6fd93
docs(gfql): apply test-amplification §0 protocol findings (clean-room…
lmeyerov Jun 29, 2026
e1494ee
docs(gfql): address the existing-Polars-user persona (§0 round-003) —…
lmeyerov Jun 29, 2026
467e63e
docs(gfql): correct the CPU crossover (~10K, not ~1M) with fresh data…
lmeyerov Jun 29, 2026
3396eb2
docs(gfql): F5 — per-engine CSR index numbers (seeded = CPU's game, a…
lmeyerov Jun 29, 2026
5c79475
docs(gfql): engines page — orient the cold newcomer (where does `g` c…
lmeyerov Jun 29, 2026
beab161
docs(gfql): seeded-traversal CSR adjacency index guide (persona P0-3)
lmeyerov Jul 1, 2026
5e9640a
docs(gfql): benchmark page — add parity statement + engine/index cros…
lmeyerov Jul 1, 2026
e19f7f1
docs(gfql): engines page — vs-external-tools table + switching cookbo…
lmeyerov Jul 1, 2026
9e875ce
docs(gfql): position vs GraphFrames + PuppyGraph; label pipeline-vs-l…
lmeyerov Jul 1, 2026
392a15f
docs(gfql): add 'Larger-than-memory: streaming' section + fix engines…
lmeyerov Jul 1, 2026
aa2ae0d
docs(gfql): fix index_adjacency quick-start — working chain form, not…
lmeyerov Jul 1, 2026
aca31e2
docs(gfql): mark illustrative engine-switch snippet doc-test:skip
lmeyerov Jul 2, 2026
afe99d6
docs(gfql): review fixes — crossover consistency, forward ref, public…
lmeyerov Jul 2, 2026
c522269
docs(changelog): align engine-docs entry with the measured numbers
lmeyerov Jul 2, 2026
627d183
docs(gfql): replace U+2248 with ASCII '~' — pdflatex chokes on ≈
lmeyerov Jul 2, 2026
9b7b334
docs(gfql): document Python-settable polars streaming/executor config
lmeyerov Jul 4, 2026
04101f5
docs(gfql): document off-engine call() modality (call_mode) + honesty…
lmeyerov Jul 4, 2026
588853f
docs(gfql): memory note for call_mode auto bridge (G5 decision)
lmeyerov Jul 4, 2026
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,9 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
## [Development]
<!-- Do Not Erase This Section - Used for tracking unreleased changes -->

### Documentation
- **GFQL engine-selection docs (pandas / polars / cuDF / polars-gpu)**: New :doc:`Choosing a GFQL Engine <gfql/engines>` page — a numbers-first, persona-tested guide to the four interchangeable engines. Adds the one-keyword `engine='polars'` speedup (up to ~38× over pandas on real graphs, no GPU), a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M), a decision matrix (workload shape × size × hardware → engine, with the measured ~10K-edge CPU crossover, GPU-work-bound rule, polars-gpu memory-pressure caveat, and GPU-or-error contract), a cuDF-vs-polars-gpu disambiguation (eager-op vs fused-lazy; cuDF is not deprecated), an honest "when *not* to use Polars" section, the differential-parity guarantee, and a methodology + reproducer-script disclosure. Rewrote the top of `gfql/performance.rst` to lead with the engine comparison (de-marketed the prose), wired the new page into the GFQL toctree + recommended paths, and added Polars/polars-gpu to the engine examples in `gfql/quick.rst` and `gfql/about.rst` (previously only pandas/cuDF were documented). Driven by 4-persona doc user-testing (pandas DS, RAPIDS user, perf engineer, skeptical evaluator).

### Added
- **GFQL polars execution config is Python-settable and live**: `set_cpu_streaming(bool)` and `set_gpu_executor('in-memory'|'streaming')` in `graphistry.compute.gfql.lazy` (plus the public `GPU_EXECUTORS` options and `GpuExecutor` type) set the CPU-streaming / GPU-executor knobs from Python. They resolve **Python override > environment variable > default**, read **live** per collect — previously these were env-only (`GFQL_POLARS_CPU_STREAMING` / `GFQL_POLARS_GPU_EXECUTOR`) and frozen at import, so neither a Python setting nor a post-import env change took effect. `None` resets a setter to env/default.
- **GFQL engine conversion honors the `validate`/`warn` convention**: `Engine.df_to_engine(df, engine, *, validate=, warn=)` threads the repo-wide `validate` (`'strict'`/`'strict-fast'`/`'autofix'`; `True`→strict, `False`→autofix) + `warn` protocol into the pandas→polars and pandas→cuDF converters. On a mixed-type object column that Arrow/polars/cuDF cannot represent, `strict` raises (`NotImplementedError` for polars, `ArrowConversionError` for cuDF) and `autofix` coerces the column to string and warns — the same convention as `plot()`/`upload()`. Each engine keeps its established default (polars `strict` = parity-or-raise; cuDF `autofix` = its shipped best-effort coercion, now `warn`-suppressible).
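
The **Python override > environment variable > default** order described in the execution-config entry can be pictured with a minimal, self-contained sketch. Everything here is hypothetical and for illustration only: `set_cpu_streaming_sketch`, `resolve_cpu_streaming`, the `False` default, and the accepted truthy spellings are assumptions, not graphistry's implementation. Only the environment variable name is taken from the entry.

```python
import os
from typing import Optional

ENV_VAR = "GFQL_POLARS_CPU_STREAMING"  # name taken from the changelog entry

# Hypothetical process-wide override; None means "no Python-level setting".
_override: Optional[bool] = None


def set_cpu_streaming_sketch(value: Optional[bool]) -> None:
    """Store a Python-level override; passing None resets to env/default."""
    global _override
    _override = value


def resolve_cpu_streaming(default: bool = False) -> bool:
    """Resolve the knob at call time: override > environment > default.

    Reading os.environ on every call (instead of once at import) is what
    lets a post-import environment change take effect.
    """
    if _override is not None:
        return _override
    raw = os.environ.get(ENV_VAR)
    if raw is not None:
        # Assumed parsing rule: common truthy spellings enable the knob.
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return default
```

The design point is that nothing is cached at import: the resolver is consulted per use, so both a later setter call and a later environment change are observed.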
64 changes: 64 additions & 0 deletions benchmarks/gfql/index_crossover_bench.py
@@ -0,0 +1,64 @@
#!/usr/bin/env python3
"""Small-N pandas-vs-polars CROSSOVER bench (CPU). Answers "where does polars start
beating pandas?" per workload SHAPE, on a real graph truncated to its first N edges.

The crossover is shape-dependent: row-pipeline shapes (filter / WHERE+ORDER) cross over
much earlier than traversal (chain orchestration is the residual small-N fixed cost).
CPU only (the crossover question is pandas-CPU vs polars-CPU); no GPU needed.

Env: PARQUET=/data/edges.parquet EDGES=10000,100000,1000000 REPS=15 WARM=3 OUT=/tmp/x.jsonl
"""
from __future__ import annotations
import json, os, statistics, time
import numpy as np
import pandas as pd
import graphistry
from graphistry.compute.ast import n, e_forward


def med(fn, reps, warm):
    for _ in range(warm):
        fn()
    ts = []
    for _ in range(reps):
        t = time.perf_counter(); fn(); ts.append((time.perf_counter() - t) * 1e3)
    ts.sort()
    return statistics.median(ts)


def main():
    edf_full = pd.read_parquet(os.environ["PARQUET"]).astype({"src": np.int64, "dst": np.int64})
    sizes = [int(x) for x in os.environ.get("EDGES", "10000,100000,1000000").split(",")]
    reps = int(os.environ.get("REPS", "15")); warm = int(os.environ.get("WARM", "3"))
    outf = open(os.environ["OUT"], "a") if os.environ.get("OUT") else None
    print(f"{'shape':10} {'edges':>9} {'pandas_ms':>10} {'polars_ms':>10} {'polars_speedup':>15}")
    for E in sizes:
        edf = edf_full.head(E).reset_index(drop=True)
        nodes = np.unique(np.concatenate([edf["src"].values, edf["dst"].values]))
        ndf = pd.DataFrame({"id": nodes, "val": (nodes % 100).astype(np.int64)})
        g = graphistry.nodes(ndf, "id").edges(edf, "src", "dst")
        seeds = nodes[: max(1, len(nodes) // 100)].tolist()  # ~1% frontier
        shapes = {
            "filter": lambda eng: g.gfql([n({"val": 50})], engine=eng),
            "hop1": lambda eng: g.gfql([n({"id": seeds}), e_forward()], engine=eng),
            "where_ord": lambda eng: g.gfql(
                "MATCH (a) WHERE a.val > 50 RETURN a.id ORDER BY a.id LIMIT 100", engine=eng),
        }
        for name, fn in shapes.items():
            try:
                rp = fn("pandas"); rl = fn("polars")  # warm + sanity
                pm = med(lambda: fn("pandas"), reps, warm)
                lm = med(lambda: fn("polars"), reps, warm)
                sp = pm / lm if lm else float("nan")
                print(f"{name:10} {E:>9} {pm:>10.3f} {lm:>10.3f} {('polars '+format(sp,'.2f')+'x') if sp>=1 else ('PANDAS '+format(1/sp,'.2f')+'x'):>15}")
                if outf:
                    outf.write(json.dumps(dict(shape=name, edges=E, pandas_ms=pm, polars_ms=lm,
                                               polars_speedup=sp)) + "\n"); outf.flush()
            except Exception as ex:
                print(f"{name:10} {E:>9} FAILED {type(ex).__name__}: {ex}")
    if outf:
        outf.close()


if __name__ == "__main__":
    main()
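
The script writes one JSON record per (shape, edge count) when `OUT` is set, but it does not itself report where the crossover falls. A minimal post-processing sketch is shown below, assuming the record keys the script emits (`shape`, `edges`, `polars_speedup`). The helper name `crossover_by_shape` and the definition of "crossover" as the smallest measured size with `polars_speedup >= 1` are assumptions for illustration, not part of the PR.

```python
import json
from typing import Dict, Iterable, Optional


def crossover_by_shape(lines: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each shape to the smallest edge count where polars_speedup >= 1.

    A shape maps to None if polars never matched pandas at any measured size.
    """
    best: Dict[str, Optional[int]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in an appended-to file
        rec = json.loads(line)
        shape = rec["shape"]
        edges = int(rec["edges"])
        speedup = rec["polars_speedup"]
        best.setdefault(shape, None)
        # NaN compares False here, so undefined speedups are never counted.
        if speedup >= 1 and (best[shape] is None or edges < best[shape]):
            best[shape] = edges
    return best
```

Passing an open file handle for the `OUT` path as `lines` would yield one number per shape. Because the script opens `OUT` in append mode, records from repeated runs accumulate; this sketch counts any run in which polars reached parity, which is a simplification.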
25 changes: 17 additions & 8 deletions docs/source/gfql/about.rst
@@ -27,7 +27,7 @@ GFQL fills a critical gap in the data community by providing an in-process, high

**Key Benefits:**

- **Dataframe-Native:** Works directly with Pandas, cuDF, and other dataframe libraries.
- **Dataframe-Native:** Works directly with Pandas, Polars, cuDF, and other dataframe libraries.
- **High Performance:** Optimized for both CPU and GPU execution.
- **Ease of Use:** No need for external databases or new infrastructure.
- **Interoperability:** Integrates with the Python data science ecosystem, including PyGraphistry for visualization.
@@ -372,21 +372,30 @@ GFQL is optimized for GPU acceleration using ``cudf`` and ``rapids``. When using
- GFQL detects ``cudf`` dataframes and runs the query on the GPU.
- Achieves significant performance improvements on large datasets.

7. Forcing GPU Mode
~~~~~~~~~~~~~~~~~~~~
7. Selecting an Engine (CPU and GPU)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can explicitly set the engine to ensure GPU execution.
You can explicitly set the execution engine. The same query returns identical
results on every engine — see :doc:`Choosing an Engine <engines>`.

**Example: Force GFQL to use GPU engine**
**Example: CPU columnar speedup (no GPU)**

::

    g_result = g_gpu.gfql([ ... ], engine='cudf')
    g_result = g.gfql([ ... ], engine='polars')  # up to ~38x over pandas on real graphs

**Example: Force GFQL to use a GPU engine**

::

    g_result = g_gpu.gfql([ ... ], engine='cudf')        # NVIDIA GPU, eager
    g_result = g_gpu.gfql([ ... ], engine='polars-gpu')  # NVIDIA GPU, fused plan

**Explanation:**

- ``engine='cudf'`` forces the use of the GPU-accelerated engine.
- Useful when you want to ensure the query runs on the GPU.
- ``engine='polars'`` runs the columnar CPU engine — the biggest win without a GPU.
- ``engine='cudf'`` / ``'polars-gpu'`` force GPU-accelerated execution.
- Useful when you want to ensure the query runs on a specific engine.

Integration with PyData Ecosystem
---------------------------------
20 changes: 19 additions & 1 deletion docs/source/gfql/benchmark_filter_pagerank.rst
@@ -30,7 +30,10 @@ no database required. This benchmark compares **Graphistry's local Cypher**
- **3.33s**
- **>56x**

*Warm median of 5 runs, 2 warmup iterations. DGX dgx-spark, GB10 GPU.*
*Pipeline time (search + PageRank + search), warm median of 5 runs, 2 warmup iterations. DGX
dgx-spark, GB10 GPU. The per-graph sections below report full-lifecycle totals that also include
one-time ETL/load, so the numbers there are larger (e.g. GPlus GPU 3.33s pipeline vs
~7.1s lifecycle).*

The pipeline
------------
@@ -173,8 +176,23 @@ pandas / cuDF). That is what makes the CPU-to-GPU switch a configuration
flag (``engine="cudf"``) rather than a rewrite, and what keeps ETL, search,
and analytics in the same in-process pipeline.

**Same answer on every engine.** The CPU and GPU results above are not just
comparable — they are *identical*. Differential parity across ``pandas`` /
``polars`` / ``cudf`` / ``polars-gpu`` is a GFQL release gate: an engine either
returns the same result or raises ``NotImplementedError`` — never a silently
different answer. So the speedups here are a pure hardware/engine choice, not a
change in what the query means.

This page is one workload (a filter → PageRank → filter pipeline) against one
external baseline (Neo4j+GDS). For the full four-engine picture — when Polars
beats pandas on CPU, when the GPU pulls ahead, and how to choose — see
:doc:`engines`. For sub-millisecond *seeded* lookups that beat Kuzu and Neo4j
by 9–28×, see :doc:`index_adjacency`.

For more on the GFQL design and supported surface:

- :doc:`engines` — choosing pandas / Polars / cuDF / Polars-GPU
- :doc:`index_adjacency` — seeded-traversal CSR adjacency index
- :doc:`cypher` — Cypher syntax through ``g.gfql("MATCH ...")``
- :doc:`overview` — GFQL design, features, and GPU acceleration
- :doc:`about` — 10-minute introduction to GFQL
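
The "same result or ``NotImplementedError``" contract stated in this hunk can be expressed as a small differential check. The sketch below is self-contained and hypothetical: `check_parity` and its `run` callback are illustrative names, not graphistry APIs. In real use, `run(engine)` would presumably wrap a call such as `g.gfql(query, engine=engine)` and normalize the output into an order-independent, comparable value; that wrapping is assumed and left out here.

```python
from typing import Callable, Dict, Iterable


def check_parity(
    run: Callable[[str], object],
    engines: Iterable[str],
    baseline: str = "pandas",
) -> Dict[str, str]:
    """Compare every engine's result against the baseline engine.

    Returns {engine: "match" | "unsupported"} and raises AssertionError on
    the one outcome the contract forbids: a different answer with no error.
    """
    expected = run(baseline)
    report: Dict[str, str] = {}
    for engine in engines:
        try:
            actual = run(engine)
        except NotImplementedError:
            report[engine] = "unsupported"  # declining loudly is allowed
            continue
        if actual != expected:
            raise AssertionError(f"{engine} silently diverged from {baseline}")
        report[engine] = "match"
    return report
```

Treating `NotImplementedError` as an acceptable outcome while failing hard on any mismatch mirrors the wording of the parity paragraph: an engine may refuse a query, but it may not answer it differently.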