feat: support ST_DWithin pushdown in vortex#8625
Conversation
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
| match format { | ||
| Format::VortexNative => strip_wkb_wrappers(query), | ||
| // Native geometry is `GEOMETRY`: drop `ST_GeomFromWKB(..)`, route pushable `ST_DWithin`. | ||
| Format::VortexNative => route_pushable_dwithin(&strip_wkb_wrappers(query)), |
There was a problem hiding this comment.
I think CREATE MACRO in init.sql is an easier approach than rewriting the query manually.
There was a problem hiding this comment.
Here the queries call ST_GeomFromWKB/ST_DWithin, which the spatial extension already defines, and you can't shadow those: CREATE MACRO ST_DWithin(...) -> Catalog Error: Macro Function with name "ST_DWithin" already exists (same for ST_GeomFromWKB, both tested).
There was a problem hiding this comment.
But this means we won't be able to use this function on real queries since you don't have access to the query string in Duckdb. Let's think of another way. Maybe we can delete the spatial function and then create ours? This won't survive extension re-LOAD, but is a temporary good solution.
Otherwise we can write a small pass replacing ST_DWithin with vortex_DWIthin in the graph, but doing nothing else, this should be a single Visitor override.
There was a problem hiding this comment.
It is actually messy than what I thought. Tried replacing ST_DWithin in the catalog with our own bind-cleared copy. Single-table filters push (Q1 ~8.6×, Q3 ~4.9×), but it regresses the join queries.
DuckDB's ST_DWithin's bind erases the constant radius into bind_data, leaving a 2-arg predicate, which is exactly what spatial's spatial-join optimizer requires (first gate: func.children.size() != 2). Clearing the bind to expose the radius gives 3 args, trips that gate, and Q8 drops from SPATIAL_JOIN to BLOCKWISE_NL_JOIN. Confirmed via EXPLAIN (override off → SPATIAL_JOIN, on → BLOCKWISE_NL_JOIN).
So the radius-erase is what we need gone to push but spatial needs present to optimize joins, a global replacement can't have both.
There was a problem hiding this comment.
Working on SpatialBench infra itself so we never need GeomFromWKB anymore, though selective of rewritten or overwritten or Macro of ST_DWithin is still needed. Or maybe we just rename it as ST_Distance <= xxx?
| } | ||
| const DatabaseWrapper &wrapper = *reinterpret_cast<DatabaseWrapper *>(ffi_db); | ||
| try { | ||
| Connection conn(*wrapper.database->instance); |
There was a problem hiding this comment.
Why do we need a transaction here? Can we register the function in the global catalog without it?
There was a problem hiding this comment.
It looks like we cannot not. DuckDB's catalog is MVCC, so CreateFunction needs an active transaction (it throws ActiveTransaction called without active transaction otherwise). A fresh Connection only begins a transaction per query, and we're calling the catalog API directly, so RunFunctionInTransaction is just the "begin → run → commit" wrapper to supply one.
I confirmed by removing it: registration aborts and queries then fail with vortex_dwithin does not exist.
- Replace non-ASCII characters in comments with ASCII. - Document why catalog registration needs RunFunctionInTransaction. - Reference the FFI function in register_geo_aliases doc. Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Merging this PR will improve performance by 12.06%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
304.7 ns | 275.6 ns | +10.58% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing nemo/geo-native-pushdown (94674e0) with nemo/duckdb-native-geometry (69a4f90)
Footnotes
-
4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
bc74778 to
8e0738a
Compare
Summary
Insert non-throwing geo predicate
vortex_dwithinin DuckDB, which later pushdown into vortex, calldistancescalar function on scanning, significantly improve Q1/Q3 performance in SpatialBench.What changes are included in this PR?
vortex_dwithinin DuckDB, which is non-throwing and can be pushdown.ST_dwithinis text rewritten intovortex_dwithin.Performance
Takeaways: Q1 and Q3 is significantly improved due to the single table geo predicate is pushdown into vortex scan. Q1 is more benefit from the manually fast path without going through
geocrate for calculation. Q3 is also potentially can be benefit from manually calculation.