Conversation
a445f7d to
1c99386
Compare
Merging this PR will degrade performance by 24.94%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | new_bp_prim_test_between[i64, 32768] |
177.3 µs | 236.2 µs | -24.94% |
Comparing aduffy/geo-v0 (134b470) with develop (5e5572b)
Footnotes
-
138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
62961e2 to
fb97700
Compare
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
| duckdb_logical_type duckdb_vx_create_geometry(const char *crs) { | ||
| D_ASSERT(crs); | ||
| auto geom = | ||
| (*crs == '\0') ? duckdb::LogicalType::GEOMETRY() : duckdb::LogicalType::GEOMETRY(std::string(crs)); | ||
| auto copy = duckdb::make_uniq<duckdb::LogicalType>(std::move(geom)); | ||
| return reinterpret_cast<duckdb_logical_type>(copy.release()); | ||
| } |
There was a problem hiding this comment.
These C++ changes are because DuckDB upstream doesn't expose the full geometry type stuff over the C API.
| auto copy = duckdb::make_uniq<duckdb::LogicalType>(std::move(geom)); | ||
| return reinterpret_cast<duckdb_logical_type>(copy.release()); |
There was a problem hiding this comment.
Claude generated this function and the header update, most of the rest of the PR was braincoded
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
| let blob = unsafe { cpp::duckdb_get_blob(self.as_ptr()) }; | ||
| let slice = | ||
| unsafe { std::slice::from_raw_parts(blob.data.cast::<u8>(), blob.size.as_()) }; |
There was a problem hiding this comment.
in certain situations this could lead to UB depending on the allocator, b/c if duckdb_malloc returns NULL then you'd call slice::from_raw_parts(NULL) which is UB.
see the new take_blob function below
i only fixed this b/c i was going to repeat this for the GEOMETRY branch and figured i'd fix it in both places instead
| } | ||
|
|
||
| pub unsafe fn set_vector_buffer(&self, buffer: &VectorBufferRef) { | ||
| pub unsafe fn set_vector_buffer(&mut self, buffer: &VectorBufferRef) { |
There was a problem hiding this comment.
these methods should've always been &mut for soundness reasons, they were just missing before
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
| } | ||
|
|
||
| #[test] | ||
| fn test_geometry() { |
There was a problem hiding this comment.
see this for (rather simple) example of querying geometry data from DuckDB
Summary
Part of the implementation of #7686
This PR adds a new crate,
vortex-geo, which will hold all extension types, custom layouts, and encodings necessary to store geospatial vector datasets in Vortex files.The goal of this crate is to enable integration with DuckDB Spatial, GeoDataFusion, SedonaDB, and Iceberg v3 geo types.
This initial PR implements an extension type for the Well-Known Binary encoding (WKB). This encoding is the most common format for geospatial data for analytics, it's what both GeoParquet and DuckDB use to represent geometry types.
We also wire this into vortex DuckDB extension to support converting Geometry columns between Vortex and DuckDB formats.
API Changes
Adds new crate
vortex-geowith extension typeWellKnownBinary, (geo.wkb)Adds support for geometry columns to DuckDB as well, e.g. you can now do something like
It won't be very performant yet, not until we support better layouts + stats for geometry.
Testing
We have unit tests for metadata serde, round trip conversion between Vortex <> DuckDB geometry format, and an additional E2E test that demonstrates reading a Vortex file with geometry data and providing it to the DuckDB Spatial extension.