Skip to content

feat: add support for GeoArrow WKB and Points#713

Closed
dwilson1988 wants to merge 6 commits into
apache:mainfrom
dwilson1988:feat/geoarrow-wkb-points
Closed

feat: add support for GeoArrow WKB and Points#713
dwilson1988 wants to merge 6 commits into
apache:mainfrom
dwilson1988:feat/geoarrow-wkb-points

Conversation

@dwilson1988
Copy link
Copy Markdown

@dwilson1988 dwilson1988 commented Mar 16, 2026

#504

Rationale for this change

This PR adds GeoArrow types for one native type (Point) and one encoded type (WKB) with a pattern set for adding all of the others by following the implementation pattern. I've tested integration with a 3rd party geometry library and confirmed it can be done with zero-copy (mostly, depends on the external library) for the underlying data.

This is a pretty big PR, but instead of breaking up the native and encoded types, I wanted one of each, with a full set of tests.

This PR supercedes #545. I think I got all/most of the comments applied there rolled into this implementation. The biggest change was isolating the geoarrow types into a subpackage to keep things neat and tidy.

I have a branch that implements for polygons as well and a very rough go-geom integration to demonstate the pattern.

What changes are included in this PR?

  • Addition of various types to support GeoArrow:
    • geoarrow.Dimensions - the x/y/z/m layout of the geometry data
    • geoarrow.EdgeInterpolation - the intended interpolation algorithm for edges in geometries
    • geoarrow.CRSType - the various formats that GeoArrow permits for spatial references
    • geoarrow.Metadata - a type used to store common metadata for all GeoArrow Geometry Types.
    • geoarrow.GeoemetryValue - a basic interface that geometry values must implement.
    • geoarrow.GeometryType - an interface that ties a native/encoded GeoArrow type to a Value format
    • (unexported) geoarrow.valueBuilder[V, G] that implements generic boilerplate for geometry builders
    • (unexported) geoarrow.geometryArray that implements generic boilerplate for geometry arrays
    • geoarrow.WKBType - a type for WKB encoded geometry types
    • geoarrow.WKBValue - represents WKB encoded geometry values
    • geoarrow.WKBArray a type alias for wkb arrays
    • geoarrow.WKBBuilder - a type alias for wkb builders
    • geoarrow.PointType a type for native encoded point geometry types.
    • geoarrow.PointValue a type for native encoded point values.
    • geoarrow.PointArray a type alias for point arrays
    • geoarrow.PointBuilder - a type alias for point builders

Are these changes tested?

Yes - tests implemented for each type following the patterns set by the rest of the extensions.

Are there any user-facing changes?

Yes - the addition of new extension types.

Additional disclaimer: I wrote all of the WKB and most of the point implementation by hand, but did use Claude Code to fill out the tests and apply some light uniformity refactors to mirror patterns in this repo.

@zeroshade @paleolimbot @Mandukhai-Alimaa

@dwilson1988 dwilson1988 requested a review from zeroshade as a code owner March 16, 2026 16:34
@zeroshade
Copy link
Copy Markdown
Member

All new files need to have the apache license header added to them please. Also you can use pre-commit to run the linter locally

@dwilson1988 dwilson1988 force-pushed the feat/geoarrow-wkb-points branch from 9913151 to b77f9ce Compare March 16, 2026 20:02
@dwilson1988
Copy link
Copy Markdown
Author

sorry about the churn - had a chicken/egg problem on the license check.

Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool!

My main question is: why here? If the geoarrow implementation lives here you have to wait for Matt to review the changes when your team/the geo-go community is more likely to provide useful feedback at the speed you want to iterate (zeroshade to Matt, pun intended).

If there's something specific required for the Parquet reader to work, I would suggest a much more limited scope (i.e., just geoarrow.wkb). I'm actually not sure you need it at all: in Arrow C++ we were able to support GeoParquet and completely avoid implementing any GeoArrow extension types (we just query the global type registry and if somebody else had registered the extension type and deseriailzed some JSON, returning a geoarrow.wkb data type).

Happy to defer to Matt on the scope he's willing to maintain here! (Also equally happy to open up geoarrow/geoarrow-go for you, or link to seerai/geoarrow-go from our website).

Comment on lines +84 to +98
if e.meta.CRSType != other.meta.CRSType {
return false
}
if e.meta.Edges != other.meta.Edges {
return false
}

if len(e.meta.CRS) != len(other.meta.CRS) {
return false
}
for i, r := range e.meta.CRS {
if r != other.meta.CRS[i] {
return false
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if you want to handle CRS equality properly here (we handle it here: https://github.com/apache/sedona-db/blob/4fcf9edf81fe5010a3b51475c243f155942e5541/rust/sedona-schema/src/crs.rs#L290-L300 ).

Comment on lines +57 to +64
// As defined in https://github.com/geoarrow/geoarrow
const (
PointID GeometryTypeID = 1
LineStringID GeometryTypeID = 2
PolygonID GeometryTypeID = 3
MultiPointID GeometryTypeID = 4
MultiLineStringID GeometryTypeID = 5
MultiPolygonID GeometryTypeID = 6
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth repeating here that value % 10 and value / 10 have meaning here (can be used to separate geometry type and dimensions)

if v.dim != XYZ && v.dim != XYZM {
return 0
}
return v.coords[2]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally like returning NaN here but I know that the default value of 0 is more common (I just like having this be explicit).

@dwilson1988
Copy link
Copy Markdown
Author

Up to Matt, but my general thinking here is for the iceberg-go package since it already depends on it, GeoArrow would be a registered extension here. I'd like to get a geoarrow-go or geoarrow-go-* set of packages for different flavors of go geometry packages.

In general - geospatial types really shouldn't be considered "special". Keeping them here lowers friction in general.

I'll take a look at your comments - seem straightforward.

@paleolimbot
Copy link
Copy Markdown
Member

Up to Matt, but my general thinking here is for the iceberg-go package since it already depends on it, GeoArrow would be a registered extension here.

If you're willing to handle some basic CRS things here I'm happy to review them, and, geoarrow.wkb certainly has precedent and is the required zero copy to/from type for Parquet and Iceberg.

The other ones are considerably more complex, though, and aren't needed for Parquet or Iceberg. I think we would all benefit from seeing what it is like to have geoarrow.wkb here before embarking on that project. (I personally think those extension types are the perfect example of domain-specific types that are more efficiently maintained elsewhere).

@dwilson1988
Copy link
Copy Markdown
Author

I wasn't planning a full geometry implementation, but I did Point and Polygon to make sure the pattern works well - they are just basic containers for the arrow values and some generic utilities: dwilson1988#1 along with a extremely minimal test for zero-copy between arrow and go-geom here https://github.com/seerai/geoarrow-go

I don't think taking it any further than that really has any value, but I think stopping at WKB isn't far enough - want the arrow ecosystem in go complete for more than just iceberg/parquet.

@zeroshade
Copy link
Copy Markdown
Member

@paleolimbot would it make more sense possibly to create a repo in the geoarrow org which can register the extension type in the func init() and then have iceberg-go pull that in for the geoarrow support?

If you're willing to handle some basic CRS things here I'm happy to review them, and, geoarrow.wkb certainly has precedent and is the required zero copy to/from type for Parquet and Iceberg.

That said, I think I'm fine with the basic CRS things and minimum necessary to zero-copy to/from for Parquet and Iceberg to live here with additional packages in the geoarrow org to add more functionality for users that might want it. I'll give this a review and take a look at the level of complexity here. Depending on the amount of work/complexity it might make sense for the entire extension type to live in a geoarrow org repo instead of here even for this basic piece.

@dwilson1988
Copy link
Copy Markdown
Author

I'm fine with whatever approach you feel is best. I can start the conversation with @kylebarron if that's the direction you want to go. Happy to incubate/maintain them in the @seerai org until ready too.

The main thing to me that matters is the geospatial types are accessible to everyone. They've been specialized in software for far too long.

@zeroshade
Copy link
Copy Markdown
Member

I agree with you that we should make them accessible to everyone. It looks like you've pretty much isolated everything into a single geoarrow package which is perfect and i think makes it ripe to be pulled out as its own package that can live in the geoarrow org (@paleolimbot could you help with creating a repo there for this?) where it can grow and extend to cover all of the geoarrow types and include packages for connecting to the wider geospatial ecosystem in go with geom-go etc.

Then iceberg-go can simply import that geoarrow-go package and register the extension type to manipulate them.

Given the scope of this I think that's probably the best route, particularly given the TODOs you have here for the remaining types. I wouldn't want to push this out here and then move it later on, it would be better to have that separate module from the beginning so it has room to grow.

@kylebarron
Copy link
Copy Markdown
Member

No objections from me for a new repo in the geoarrow org!

@paleolimbot
Copy link
Copy Markdown
Member

paleolimbot could you help with creating a repo there for this?

Done! https://github.com/geoarrow/geoarrow-go (Daniel, you should have write access when you accept the invite...feel free to push willy nilly although I am happy to review anything if you'd like)

@dwilson1988
Copy link
Copy Markdown
Author

Sweet. Thanks!

We sticking with an MIT or should it be an Apache 2.0?

I'll migrate this PR there as the baseline and we can iterate.

Point of clarification: Do we want to geoarrow-go to be a dependency of arrow-go or make it an opt-in extension?

@zeroshade
Copy link
Copy Markdown
Member

Point of clarification: Do we want to geoarrow-go to be a dependency of arrow-go or make it an opt-in extension?

It should be an opt-in extension. Once you have the package ready in the geoarrow-go repo we can add that as a dependency to iceberg-go to handle the geospatial stuff

@dwilson1988
Copy link
Copy Markdown
Author

Closing this since we're moving in a different direction. Thanks for the input, all!

@paleolimbot
Copy link
Copy Markdown
Member

We sticking with an MIT or should it be an Apache 2.0?

Either! I think Kyle's repos are dual licensed Apache/MIT and I think mine are Apache but I didn't think too hard about it.

dwilson1988 added a commit to geoarrow/geoarrow-go that referenced this pull request Apr 3, 2026
This PR migrates the contents of
apache/arrow-go#713 into this repo

This does not yet address @paleolimbot's comments such as on CRS
equality and will be covered in a future PR. This just puts something
here for us to work on, but consider it rough clay.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants