
add vortex-geo crate and WKB extension type#7722

Open
a10y wants to merge 9 commits into develop from aduffy/geo-v0



@a10y a10y commented Apr 29, 2026

Summary

Part of the implementation of #7686

This PR adds a new crate, vortex-geo, which will hold all extension types, custom layouts, and encodings necessary to store geospatial vector datasets in Vortex files.

The goal of this crate is to enable integration with DuckDB Spatial, GeoDataFusion, SedonaDB, and Iceberg v3 geo types.

This initial PR implements an extension type for the Well-Known Binary (WKB) encoding. WKB is the most common format for geospatial data in analytics; it's what both GeoParquet and DuckDB use to represent geometry types.
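For readers unfamiliar with WKB, here is a minimal sketch of what the encoding looks like for the simplest case, a 2D point. It is illustrative only: `point_to_wkb` and `wkb_to_point` are hypothetical helpers written for this example, not functions from vortex-geo, and they handle only little-endian points without SRID or Z/M dimensions.

```rust
/// Encode a 2D point as little-endian WKB.
/// Hypothetical helper for illustration; not part of vortex-geo.
fn point_to_wkb(x: f64, y: f64) -> Vec<u8> {
    let mut out = Vec::with_capacity(21);
    out.push(1u8); // byte order marker: 1 = little-endian
    out.extend_from_slice(&1u32.to_le_bytes()); // geometry type: 1 = Point
    out.extend_from_slice(&x.to_le_bytes()); // X coordinate as IEEE 754 double
    out.extend_from_slice(&y.to_le_bytes()); // Y coordinate as IEEE 754 double
    out
}

/// Decode a little-endian WKB point. Returns `None` for any other input
/// (wrong length, big-endian byte order, or a non-Point geometry type).
fn wkb_to_point(bytes: &[u8]) -> Option<(f64, f64)> {
    if bytes.len() != 21 || bytes[0] != 1 {
        return None;
    }
    let mut ty = [0u8; 4];
    ty.copy_from_slice(&bytes[1..5]);
    if u32::from_le_bytes(ty) != 1 {
        return None;
    }
    let mut x = [0u8; 8];
    x.copy_from_slice(&bytes[5..13]);
    let mut y = [0u8; 8];
    y.copy_from_slice(&bytes[13..21]);
    Some((f64::from_le_bytes(x), f64::from_le_bytes(y)))
}
```

A point therefore occupies 21 bytes: one byte-order byte, a four-byte geometry type, and two eight-byte coordinates. Because the format is an opaque byte string, a column of WKB values can be stored like any other binary column, with the extension type carrying the geometry semantics.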

We also wire this into the Vortex DuckDB extension to support converting geometry columns between the Vortex and DuckDB formats.

API Changes

Adds a new crate, vortex-geo, with the extension type WellKnownBinary (geo.wkb).

Also adds support for geometry columns to the DuckDB integration, e.g. you can now run a query like:

SELECT building_id, building_name, ST_Area(geometry) AS area
FROM read_vortex('buildings.vortex')
ORDER BY area DESC
LIMIT 10;

It won't be very performant until we support better layouts and statistics for geometry.

Testing

We have unit tests for metadata serde and for round-trip conversion between the Vortex and DuckDB geometry formats, plus an additional E2E test that demonstrates reading a Vortex file with geometry data and passing it to the DuckDB Spatial extension.

@a10y force-pushed the aduffy/geo-v0 branch 2 times, most recently from a445f7d to 1c99386, on April 29, 2026 20:51

codspeed-hq Bot commented Apr 29, 2026

Merging this PR will degrade performance by 24.94%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.


❌ 1 regressed benchmark
✅ 1168 untouched benchmarks
⏩ 138 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

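| Mode       | Benchmark                            | BASE     | HEAD     | Efficiency |
|------------|--------------------------------------|----------|----------|------------|
| Simulation | new_bp_prim_test_between[i64, 32768] | 177.3 µs | 236.2 µs | -24.94%    |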

Comparing aduffy/geo-v0 (134b470) with develop (5e5572b)


Footnotes

  1. 138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@a10y a10y added the changelog/feature A new feature label Apr 29, 2026
@a10y force-pushed the aduffy/geo-v0 branch 2 times, most recently from 62961e2 to fb97700, on April 30, 2026 21:06
a10y added 5 commits May 1, 2026 12:00
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y force-pushed the aduffy/geo-v0 branch from 3382e4f to 05dcff9 on May 1, 2026 16:02
@a10y a10y marked this pull request as ready for review May 1, 2026 16:24
Comment on lines +28 to +34
duckdb_logical_type duckdb_vx_create_geometry(const char *crs) {
    D_ASSERT(crs);
    auto geom =
        (*crs == '\0') ? duckdb::LogicalType::GEOMETRY() : duckdb::LogicalType::GEOMETRY(std::string(crs));
    auto copy = duckdb::make_uniq<duckdb::LogicalType>(std::move(geom));
    return reinterpret_cast<duckdb_logical_type>(copy.release());
}
Contributor Author

These C++ changes are needed because upstream DuckDB doesn't fully expose the geometry type through its C API.

Comment on lines +32 to +33
auto copy = duckdb::make_uniq<duckdb::LogicalType>(std::move(geom));
return reinterpret_cast<duckdb_logical_type>(copy.release());
Contributor Author

Claude generated this function and the header update; most of the rest of the PR was braincoded.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Comment thread vortex-duckdb/src/datasource.rs Outdated
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Comment on lines -100 to -102
let blob = unsafe { cpp::duckdb_get_blob(self.as_ptr()) };
let slice =
unsafe { std::slice::from_raw_parts(blob.data.cast::<u8>(), blob.size.as_()) };
Contributor Author

@a10y a10y May 1, 2026

In certain situations this could lead to UB, depending on the allocator: if duckdb_malloc returns NULL, you'd call slice::from_raw_parts with a NULL pointer, which is UB because the pointer must be non-null even for a zero-length slice.

See the new take_blob function below.

I only fixed this because I was going to repeat the pattern for the GEOMETRY branch and figured I'd fix it in both places instead.
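As a rough illustration of the kind of guard described above, the sketch below checks for a null pointer before building the slice. `bytes_or_empty` is a hypothetical helper written for this example; it is not the PR's take_blob function, whose actual signature and behavior are not shown in this conversation.

```rust
use std::slice;

/// View `len` bytes starting at `ptr` as a slice, treating a null pointer
/// (or a zero length) as an empty slice instead of calling
/// `slice::from_raw_parts` with invalid arguments.
///
/// Hypothetical helper for illustration; not the PR's `take_blob`.
///
/// # Safety
/// If `ptr` is non-null, it must point to at least `len` initialized bytes
/// that stay valid and unmodified for the lifetime `'a`.
unsafe fn bytes_or_empty<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
    if ptr.is_null() || len == 0 {
        // `from_raw_parts` requires a non-null, aligned pointer even when
        // `len` is 0, so return a static empty slice instead.
        &[]
    } else {
        // SAFETY: the caller guarantees `ptr` points to `len` valid bytes.
        unsafe { slice::from_raw_parts(ptr, len) }
    }
}
```

Whether a null pointer should map to an empty slice or to an error is a design choice that depends on the caller; this sketch picks the empty slice only to keep the example short.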

}

- pub unsafe fn set_vector_buffer(&self, buffer: &VectorBufferRef) {
+ pub unsafe fn set_vector_buffer(&mut self, buffer: &VectorBufferRef) {
Contributor Author

These methods should always have taken &mut self for soundness reasons; that was just missing before.
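To illustrate the soundness point in general terms, here is a small sketch of a wrapper around a raw pointer. `Handle` is a hypothetical stand-in written for this example, not the PR's vector type. Because `push` takes `&mut self`, the borrow checker rejects code that keeps a slice from `as_slice` alive across a mutation; if `push` took `&self`, the same code would compile and could read freed memory after a reallocation.

```rust
/// Hypothetical stand-in for a wrapper around an FFI-owned object.
struct Handle {
    raw: *mut Vec<u8>,
}

impl Handle {
    fn new() -> Self {
        Handle { raw: Box::into_raw(Box::new(Vec::new())) }
    }

    /// Read-only access: shared borrows of the handle may coexist.
    fn as_slice(&self) -> &[u8] {
        // SAFETY: `raw` came from `Box::into_raw` and is freed only in `Drop`.
        let v: &Vec<u8> = unsafe { &*self.raw };
        v.as_slice()
    }

    /// Mutation requires `&mut self`, so no slice returned by `as_slice`
    /// can still be alive while the underlying buffer may reallocate.
    fn push(&mut self, byte: u8) {
        // SAFETY: `&mut self` guarantees exclusive access to the pointee.
        let v: &mut Vec<u8> = unsafe { &mut *self.raw };
        v.push(byte);
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // SAFETY: `raw` was created by `Box::into_raw` in `new` and is
        // dropped exactly once here.
        unsafe { drop(Box::from_raw(self.raw)) };
    }
}
```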

a10y added 2 commits May 1, 2026 14:45
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y requested review from 0ax1 and joseph-isaacs May 1, 2026 18:51
}

#[test]
fn test_geometry() {
Contributor Author

See this test for a (rather simple) example of querying geometry data from DuckDB.
