Problem
buckaroo-js-core is already transport-agnostic: IModel, BuckarooView, and WebSocketModel (packages/buckaroo-js-core/src/server/) let the viewer run over a raw WebSocket instead of a Jupyter comm, and buckaroo/server/ proves it end to end. But every backend that actually answers infinite_request or computes the summary-stat SDType is Python: pandas (buckaroo_widget.py), polars (polars_buckaroo.py), lazy polars, and ibis/xorq (xorq_stat_pipeline.py). There is no backend that runs in a pure Node/Electron host, so any non-Python consumer must ship and supervise a Python process.
There's a downstream proof point. An Electron app embeds buckaroo-js-core@^0.15.0 with @duckdb/node-api and no Python. To do it, it had to work around the missing backend: (a) recompute column stats with DuckDB SUMMARIZE and flatten them to plain string cells through a private adapter, bypassing the native summary-stats / pinned_rows / histogram contract; and (b) fall back to the static DFViewer over a row-capped array (SELECT * FROM (stmt) LIMIT 101), so it gets no server-side search, filter, sort, or paging — sort is ag-grid client-side over the fetched rows. The DuckDB/Node compute exists in the wild but reimplements a thin slice of buckaroo outside buckaroo and leaves DFViewerInfinite / IDatasource / SmartRowCache unused.
Suggested fix
Add a first-class DuckDB/Node backend satisfying buckaroo's two existing contracts, so the JS-core viewer renders the same as it does behind Python. No JS-core changes required — IModel is the seam.
Stats → produce SDType (Dict[col, Dict[stat, val]], col_analysis.py:7-13) from DuckDB SQL. The SQL stat set already exists in customizations/xorq_stats_v2.py (typing, null_count, min/max, distinct_count, histogram, quantiles); port those expressions to DuckDB (SUMMARIZE covers most; targeted queries for histogram bins and exact/approx distinct). Emit the transposed summary_stats_data the viewer expects: one row per stat, index=stat name, histogram_bins/histogram_log_bins as number[] per column (gridUtils.ts:387 extractSDFT, resolveDFData.ts).
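The transposition step described above can be sketched as a small pure function. This is only an illustration under assumptions: the type names (StatValue, ColumnStats, StatRow) and the function name transposeStats are hypothetical, and the exact row shape buckaroo's viewer accepts should be checked against extractSDFT and resolveDFData.ts. Stats missing for a column are filled with null here, which is also an assumption.

```typescript
// Hypothetical types for illustration; not buckaroo's actual exports.
type StatValue = string | number | boolean | null | number[];
// Column-major stats, mirroring the Dict[col, Dict[stat, val]] shape in the text.
type ColumnStats = Record<string, Record<string, StatValue>>;
// One row per stat, keyed by column name, with the stat name under "index".
type StatRow = { index: string } & Record<string, StatValue>;

function transposeStats(stats: ColumnStats): StatRow[] {
  // Collect stat names in first-seen order so the output is deterministic.
  const statNames: string[] = [];
  const seen = new Set<string>();
  for (const colStats of Object.values(stats)) {
    for (const stat of Object.keys(colStats)) {
      if (!seen.has(stat)) {
        seen.add(stat);
        statNames.push(stat);
      }
    }
  }
  // Emit one row per stat; a column lacking that stat gets null (assumption).
  // Note: a data column literally named "index" would collide with the stat key.
  return statNames.map((stat) => {
    const row: StatRow = { index: stat };
    for (const [col, colStats] of Object.entries(stats)) {
      row[col] = colStats[stat] ?? null;
    }
    return row;
  });
}
```

In a DuckDB-backed host, the ColumnStats input would presumably be assembled from SUMMARIZE output plus the targeted histogram and distinct-count queries, then passed through a function like this before being handed to the viewer.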
Rows / search / sort / paging → answer infinite_request/infinite_resp (the PayloadArgs/PayloadResponse protocol, SmartRowCache.ts:17-33) by translating each [start,end) window + sort/sort_direction into ... ORDER BY <col> LIMIT <n> OFFSET <start> and the search term into a WHERE … ILIKE predicate, returning rows as Parquet. This is the Node analog of server/data_loading.py's search_df_str → sort_values → slice → to_parquet. Because DuckDB filters, sorts, and pages in SQL, it sidesteps both the live path's full-df re-sort per request (what docs/smart-row-cache-redesign.md targets) and the per-window re-execution reported in #923 ("row paging on an aggregate-backed xorq session re-executes the full aggregation per 300-row window").
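A minimal sketch of the window-to-SQL translation, assuming a request shaped like the fields named above. The WindowRequest and BuiltQuery interfaces and the buildWindowQuery and quoteIdent helpers are hypothetical names, not buckaroo's PayloadArgs type or any DuckDB API. The sketch only builds the SQL text and bind parameters; executing it and encoding the result as Parquet are left to the host.

```typescript
// Hypothetical request shape, modeled on the fields named in the text.
interface WindowRequest {
  start: number; // inclusive row offset
  end: number; // exclusive row offset
  sort?: string;
  sort_direction?: "asc" | "desc";
  search?: string;
}

interface BuiltQuery {
  sql: string;
  params: string[]; // values for the `?` placeholders, in order
}

// Standard SQL identifier quoting: wrap in double quotes, double any embedded quote.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

function buildWindowQuery(baseSql: string, columns: string[], req: WindowRequest): BuiltQuery {
  const { start, end } = req;
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
    throw new RangeError(`invalid window [${start}, ${end})`);
  }
  const params: string[] = [];
  let sql = `SELECT * FROM (${baseSql}) AS src`;

  // Search: match the term against every column cast to text.
  // % and _ inside the term act as wildcards in this sketch (no escaping).
  if (req.search && columns.length > 0) {
    const pattern = `%${req.search}%`;
    const predicates = columns.map((col) => {
      params.push(pattern);
      return `CAST(${quoteIdent(col)} AS VARCHAR) ILIKE ?`;
    });
    sql += ` WHERE ${predicates.join(" OR ")}`;
  }

  // Sort: accept only known columns so the identifier is never attacker-controlled.
  if (req.sort) {
    if (!columns.includes(req.sort)) {
      throw new Error(`unknown sort column: ${req.sort}`);
    }
    const dir = req.sort_direction === "desc" ? "DESC" : "ASC";
    sql += ` ORDER BY ${quoteIdent(req.sort)} ${dir}`;
  }

  // start and end were validated as non-negative integers, so inlining is safe.
  sql += ` LIMIT ${end - start} OFFSET ${start}`;
  return { sql, params };
}
```

One design point worth checking in a spike: without an ORDER BY, or with a sort column that contains ties, LIMIT/OFFSET windows are not guaranteed to be stable across requests, so a real backend would probably append a tiebreaker column.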
Two altitudes, both already supported:
a Node host drives WebSocketModel — a TypeScript twin of the Tornado DataStreamHandler (separate companion process); or
implement IModel in-process and reuse getKeySmartRowCache + getDs unchanged (embedded Electron app).
On the Python side this parallels the DFStatsClass seam (dataflow.py:308) and the XorqDfStatsV2 precedent; a DuckDbDfStatsV2 could give Python users a DuckDB stats backend too, but the headline is the Node host that needs no Python.
Open questions
DuckDB stays in the host/companion package; JS-core remains compute-free behind IModel.
Likely a spike first to validate stat-parity + the paging/search round-trip end to end.
Context
Identified while integrating buckaroo-js-core@^0.15.0 into a downstream Electron app (@duckdb/node-api, no Python). Read against origin/main at the 0.15.1 release (fde5213b). The live row path is SmartRowCache/infinite_request — the rowid redesign in docs/smart-row-cache-redesign.md is built but unwired. Relates to #923, #911, #918.