graphistry · lmeyerov · Jun 29, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,9 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 ## [Development]
 <!-- Do Not Erase This Section - Used for tracking unreleased changes -->
 
+### Documentation
+- **GFQL engine-selection docs (pandas / polars / cuDF / polars-gpu)**: New :doc:`Choosing a GFQL Engine <gfql/engines>` page — a numbers-first, persona-tested guide to the four interchangeable engines. Adds the one-keyword `engine='polars'` speedup (up to ~38× over pandas on real graphs, no GPU), a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M), a decision matrix (workload shape × size × hardware → engine, with the measured ~10K-edge CPU crossover, GPU-work-bound rule, polars-gpu memory-pressure caveat, and GPU-or-error contract), a cuDF-vs-polars-gpu disambiguation (eager-op vs fused-lazy; cuDF is not deprecated), an honest "when *not* to use Polars" section, the differential-parity guarantee, and a methodology + reproducer-script disclosure. Rewrote the top of `gfql/performance.rst` to lead with the engine comparison (de-marketed the prose), wired the new page into the GFQL toctree + recommended paths, and added Polars/polars-gpu to the engine examples in `gfql/quick.rst` and `gfql/about.rst` (previously only pandas/cuDF were documented). Driven by 4-persona doc user-testing (pandas DS, RAPIDS user, perf engineer, skeptical evaluator).
+
 ### Added
 - **GFQL polars execution config is Python-settable and live**: `set_cpu_streaming(bool)` and `set_gpu_executor('in-memory'|'streaming')` in `graphistry.compute.gfql.lazy` (plus the public `GPU_EXECUTORS` options and `GpuExecutor` type) set the CPU-streaming / GPU-executor knobs from Python. They resolve **Python override > environment variable > default**, read **live** per collect — previously these were env-only (`GFQL_POLARS_CPU_STREAMING` / `GFQL_POLARS_GPU_EXECUTOR`) and frozen at import, so neither a Python setting nor a post-import env change took effect. `None` resets a setter to env/default.
 - **GFQL engine conversion honors the `validate`/`warn` convention**: `Engine.df_to_engine(df, engine, *, validate=, warn=)` threads the repo-wide `validate` (`'strict'`/`'strict-fast'`/`'autofix'`; `True`→strict, `False`→autofix) + `warn` protocol into the pandas→polars and pandas→cuDF converters. On a mixed-type object column that Arrow/polars/cuDF cannot represent, `strict` raises (`NotImplementedError` for polars, `ArrowConversionError` for cuDF) and `autofix` coerces the column to string and warns — the same convention as `plot()`/`upload()`. Each engine keeps its established default (polars `strict` = parity-or-raise; cuDF `autofix` = its shipped best-effort coercion, now `warn`-suppressible).
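
Reviewer note on the execution-config entry above: the resolution order it describes (Python override > environment variable > default, re-read on every collect rather than frozen at import) can be illustrated with a minimal sketch. This is not the code in `graphistry.compute.gfql.lazy`; the function names, the assumed default of `False`, and the accepted truthy spellings are hypothetical. Only the environment variable name comes from the changelog.

```python
import os
from typing import Optional

# Hypothetical stand-ins for illustration; not graphistry's implementation.
_ENV_VAR = "GFQL_POLARS_CPU_STREAMING"  # name taken from the changelog entry
_DEFAULT = False                         # assumed default, for this sketch only
_override: Optional[bool] = None         # None means "no Python override set"


def set_cpu_streaming_sketch(value: Optional[bool]) -> None:
    """Store a Python-level override; None resets to env/default."""
    global _override
    _override = value


def resolve_cpu_streaming() -> bool:
    """Resolve at call time: Python override > environment variable > default."""
    if _override is not None:
        return _override
    raw = os.environ.get(_ENV_VAR)  # read live, so post-import env changes count
    if raw is not None:
        # Assumed truthy spellings; the real parser may accept other values.
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return _DEFAULT
```

Because nothing is cached at import time, a later `os.environ` change or a call to the setter is picked up by the next `resolve_cpu_streaming()` call, which is the behavior the entry contrasts with the previous env-only, frozen-at-import knobs.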

diff --git a/benchmarks/gfql/index_crossover_bench.py b/benchmarks/gfql/index_crossover_bench.py
@@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+"""Small-N pandas-vs-polars CROSSOVER bench (CPU). Answers "where does polars start
+beating pandas?" per workload SHAPE, on a real graph subsampled to N edges.
+
+The crossover is shape-dependent: row-pipeline shapes (filter / WHERE+ORDER) cross over
+much earlier than traversal (chain orchestration is the residual small-N fixed cost).
+CPU only (the crossover question is pandas-CPU vs polars-CPU); no GPU needed.
+
+Env: PARQUET=/data/edges.parquet  EDGES=10000,100000,1000000  REPS=15  WARM=3  OUT=/tmp/x.jsonl
+"""
+from __future__ import annotations
+import json, os, statistics, time
+import numpy as np
+import pandas as pd
+import graphistry
+from graphistry.compute.ast import n, e_forward
+
+
+def med(fn, reps, warm):
+    for _ in range(warm):
+        fn()
+    ts = []
+    for _ in range(reps):
+        t = time.perf_counter(); fn(); ts.append((time.perf_counter() - t) * 1e3)
+    ts.sort()
+    return statistics.median(ts)
+
+
+def main():
+    edf_full = pd.read_parquet(os.environ["PARQUET"]).astype({"src": np.int64, "dst": np.int64})
+    sizes = [int(x) for x in os.environ.get("EDGES", "10000,100000,1000000").split(",")]
+    reps = int(os.environ.get("REPS", "15")); warm = int(os.environ.get("WARM", "3"))
+    outf = open(os.environ["OUT"], "a") if os.environ.get("OUT") else None
+    print(f"{'shape':10} {'edges':>9} {'pandas_ms':>10} {'polars_ms':>10} {'polars_speedup':>15}")
+    for E in sizes:
+        edf = edf_full.head(E).reset_index(drop=True)
+        nodes = np.unique(np.concatenate([edf["src"].values, edf["dst"].values]))
+        ndf = pd.DataFrame({"id": nodes, "val": (nodes % 100).astype(np.int64)})
+        g = graphistry.nodes(ndf, "id").edges(edf, "src", "dst")
+        seeds = nodes[: max(1, len(nodes) // 100)].tolist()  # ~1% frontier
+        shapes = {
+            "filter": lambda eng: g.gfql([n({"val": 50})], engine=eng),
+            "hop1": lambda eng: g.gfql([n({"id": seeds}), e_forward()], engine=eng),
+            "where_ord": lambda eng: g.gfql(
+                "MATCH (a) WHERE a.val > 50 RETURN a.id ORDER BY a.id LIMIT 100", engine=eng),
+        }
+        for name, fn in shapes.items():
+            try:
+                fn("pandas"); fn("polars")  # untimed first run: warms caches, surfaces engine errors early
+                pm = med(lambda: fn("pandas"), reps, warm)
+                lm = med(lambda: fn("polars"), reps, warm)
+                sp = pm / lm if lm else float("nan")
+                print(f"{name:10} {E:>9} {pm:>10.3f} {lm:>10.3f} {('polars '+format(sp,'.2f')+'x') if sp>=1 else ('PANDAS '+format(1/sp,'.2f')+'x'):>15}")
+                if outf:
+                    outf.write(json.dumps(dict(shape=name, edges=E, pandas_ms=pm, polars_ms=lm,
+                                               polars_speedup=sp)) + "\n"); outf.flush()
+            except Exception as ex:
+                print(f"{name:10} {E:>9}  FAILED {type(ex).__name__}: {ex}")
+    if outf:
+        outf.close()
+
+
+if __name__ == "__main__":
+    main()
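
Reviewer note on the benchmark script above: when `OUT` is set, it appends one JSON object per (shape, edges) pair with the keys `shape`, `edges`, `pandas_ms`, `polars_ms`, and `polars_speedup`. A small post-processing sketch can turn that file into a per-shape crossover estimate. The helper name `crossover_by_shape` is hypothetical, and "crossover" here is read simply as the smallest measured edge count at which `polars_speedup >= 1`; it does not check that Polars stays ahead at larger sizes.

```python
import json
from typing import Dict, Iterable, Optional


def crossover_by_shape(lines: Iterable[str]) -> Dict[str, Optional[int]]:
    """Smallest edge count per shape where polars_speedup >= 1, else None."""
    best: Dict[str, Optional[int]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in an appended-to file
        row = json.loads(line)
        shape, edges, speedup = row["shape"], row["edges"], row["polars_speedup"]
        best.setdefault(shape, None)
        # NaN compares False, so a NaN speedup never counts as a crossover.
        if speedup >= 1 and (best[shape] is None or edges < best[shape]):
            best[shape] = edges
    return best
```

For example, `crossover_by_shape(open("/tmp/x.jsonl"))` would read the file named in the script's docstring. Since the script opens `OUT` in append mode, repeated runs accumulate rows, and this sketch keeps the smallest winning size seen across all of them.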
diff --git a/docs/source/gfql/about.rst b/docs/source/gfql/about.rst
@@ -27,7 +27,7 @@ GFQL fills a critical gap in the data community by providing an in-process, high
 
 **Key Benefits:**
 
-- **Dataframe-Native:** Works directly with Pandas, cuDF, and other dataframe libraries.
+- **Dataframe-Native:** Works directly with Pandas, Polars, cuDF, and other dataframe libraries.
 - **High Performance:** Optimized for both CPU and GPU execution.
 - **Ease of Use:** No need for external databases or new infrastructure.
 - **Interoperability:** Integrates with the Python data science ecosystem, including PyGraphistry for visualization.
@@ -372,21 +372,30 @@ GFQL is optimized for GPU acceleration using ``cudf`` and ``rapids``. When using
 - GFQL detects ``cudf`` dataframes and runs the query on the GPU.
 - Achieves significant performance improvements on large datasets.
 
-7. Forcing GPU Mode
-~~~~~~~~~~~~~~~~~~~~
+7. Selecting an Engine (CPU and GPU)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-You can explicitly set the engine to ensure GPU execution.
+You can explicitly set the execution engine. Each engine returns the same result as the
+others or raises ``NotImplementedError``; see :doc:`Choosing an Engine <engines>`.
 
-**Example: Force GFQL to use GPU engine**
+**Example: CPU columnar speedup (no GPU)**
 
 ::
 
-    g_result = g_gpu.gfql([ ... ], engine='cudf')
+    g_result = g.gfql([ ... ], engine='polars')   # up to ~38x over pandas on real graphs
+
+**Example: Force GFQL to use a GPU engine**
+
+::
+
+    g_result = g_gpu.gfql([ ... ], engine='cudf')        # NVIDIA GPU, eager
+    g_result = g_gpu.gfql([ ... ], engine='polars-gpu')  # NVIDIA GPU, fused plan
 
 **Explanation:**
 
-- ``engine='cudf'`` forces the use of the GPU-accelerated engine.
-- Useful when you want to ensure the query runs on the GPU.
+- ``engine='polars'`` runs the columnar CPU engine — the biggest win without a GPU.
+- ``engine='cudf'`` / ``'polars-gpu'`` force GPU-accelerated execution.
+- Useful when you want to ensure the query runs on a specific engine.
 
 Integration with PyData Ecosystem
 ---------------------------------

diff --git a/docs/source/gfql/benchmark_filter_pagerank.rst b/docs/source/gfql/benchmark_filter_pagerank.rst
@@ -30,7 +30,10 @@ no database required. This benchmark compares **Graphistry's local Cypher**
      - **3.33s**
      - **>56x**
 
-*Warm median of 5 runs, 2 warmup iterations. DGX dgx-spark, GB10 GPU.*
+*Pipeline time (search + PageRank + search), warm median of 5 runs, 2 warmup iterations. DGX
+dgx-spark, GB10 GPU. The per-graph sections below report full-lifecycle totals that also include
+one-time ETL/load, hence the larger numbers there (e.g. GPlus GPU 3.33s pipeline vs
+~7.1s lifecycle).*
 
 The pipeline
 ------------
@@ -173,8 +176,23 @@ pandas / cuDF). That is what makes the CPU-to-GPU switch a configuration
 flag (``engine="cudf"``) rather than a rewrite, and what keeps ETL, search,
 and analytics in the same in-process pipeline.
 
+**Same answer on every engine.** The CPU and GPU results above are not just
+comparable — they are *identical*. Differential parity across ``pandas`` /
+``polars`` / ``cudf`` / ``polars-gpu`` is a GFQL release gate: an engine either
+returns the same result or raises ``NotImplementedError`` — never a silently
+different answer. So the speedups here are a pure hardware/engine choice, not a
+change in what the query means.
+
+This page is one workload (a filter → PageRank → filter pipeline) against one
+external baseline (Neo4j+GDS). For the full four-engine picture — when Polars
+beats pandas on CPU, when the GPU pulls ahead, and how to choose — see
+:doc:`engines`. For sub-millisecond *seeded* lookups that beat Kuzu and Neo4j
+by 9–28×, see :doc:`index_adjacency`.
+
 For more on the GFQL design and supported surface:
 
+- :doc:`engines` — choosing pandas / Polars / cuDF / Polars-GPU
+- :doc:`index_adjacency` — seeded-traversal CSR adjacency index
 - :doc:`cypher` — Cypher syntax through ``g.gfql("MATCH ...")``
 - :doc:`overview` — GFQL design, features, and GPU acceleration
 - :doc:`about` — 10-minute introduction to GFQL
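
Reviewer note on the "same answer on every engine" paragraph above: the contract it states (an engine returns the same result or raises ``NotImplementedError``) can be checked from user code with a small harness. The sketch below is an assumption-laden illustration, not the release gate itself: `check_parity` is a hypothetical helper, the caller supplies both the query runner and the equality test, and only ``NotImplementedError`` is treated as an allowed outcome (a missing GPU or missing library would surface as a different exception).

```python
from typing import Any, Callable, Dict, Sequence


def check_parity(
    run: Callable[[str], Any],
    engines: Sequence[str] = ("pandas", "polars", "cudf", "polars-gpu"),
    same: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> Dict[str, str]:
    """Run one query per engine and compare each result to the first engine's.

    Reports 'reference', 'match', 'unsupported', or 'MISMATCH' per engine.
    """
    reference = run(engines[0])
    report = {engines[0]: "reference"}
    for eng in engines[1:]:
        try:
            result = run(eng)
        except NotImplementedError:
            report[eng] = "unsupported"  # permitted by the parity-or-raise contract
            continue
        report[eng] = "match" if same(reference, result) else "MISMATCH"
    return report
```

With GFQL, `run` could be something like `lambda eng: g.gfql([...], engine=eng)`, paired with a `same` function that compares the returned node and edge tables; under the documented contract, a `'MISMATCH'` entry should never appear.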