[GH-2877] Add Box2D type and Box2DUDT #2878
Introduces a planar bounding-box value type backed by a struct UDT (struct<xmin,ymin,xmax,ymax>, all double, non-nullable) so values round-trip natively to Parquet and align with GeoParquet 1.1 bbox covering columns. Empty boxes are encoded as xmin > xmax (JTS Envelope convention), making union/expand a no-op against empty. This change adds only the type and its registration. Functions (ST_Box2D, ST_MakeBox2D, ST_Extent, accessor overloads, casts) follow in subsequent commits per the plan in apache#2877.
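As a rough illustration of the struct layout described above, the following sketch builds the four-field schema as a plain dictionary in the shape of Spark's StructType JSON representation. The helper name `box2d_struct_schema` and the constant `BOX2D_FIELDS` are hypothetical; only the field names, the double type, and non-nullability come from the description. It covers the inner struct only, not the UDT wrapper that Spark adds around it.

```python
import json

# Field names and order as listed in the description; everything else in
# this sketch (helper and constant names) is hypothetical.
BOX2D_FIELDS = ("xmin", "ymin", "xmax", "ymax")


def box2d_struct_schema() -> dict:
    """Build struct<xmin,ymin,xmax,ymax> (all double, non-nullable) as a dict."""
    return {
        "type": "struct",
        "fields": [
            {"name": name, "type": "double", "nullable": False, "metadata": {}}
            for name in BOX2D_FIELDS
        ],
    }


# Serialize to JSON text, e.g. for comparison in a schema round-trip check.
schema_json = json.dumps(box2d_struct_schema())
```

Because the four fields are ordinary doubles in a struct, a Parquet writer can store them as four plain columns, which is the shape the GeoParquet 1.1 bbox covering column uses.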
Pull request overview
Adds a new JVM/Spark-native planar bounding-box value type (Box2D) and a Spark UDT (Box2DUDT) as groundwork for bbox-related SQL functions and GeoParquet bbox covering-column interoperability (per GH-2877).
Changes:
- Introduce Box2D (Java) with empty-box semantics and basic conversions (Envelope/Polygon).
- Add struct-backed Box2DUDT (struct<xmin,ymin,xmax,ymax> doubles) for Spark SQL serialization/deserialization.
- Register Box2D ↔ Box2DUDT in UdtRegistratorWrapper.registerAll().
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| common/src/main/java/org/apache/sedona/common/geometryObjects/Box2D.java | New planar bbox value type with empty/union helpers and conversion utilities. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/UDT/Box2DUDT.scala | New struct-backed Spark UDT for Box2D, including JSON schema support. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/UDT/UdtRegistratorWrapper.scala | Registers the new Box2D UDT mapping alongside existing Sedona UDTs. |
Mirrors the JVM Box2DUDT so a Box2D column materialized in PySpark (e.g. via a JVM-created DataFrame) resolves to the matching Python type. Round-trips through the struct sqlType cleanly, including the empty-box encoding (xmin > xmax || ymin > ymax).
Test coverage: UDT registration, JSON schema round-trip, Box2D serde round-trip (including empty), case-object equality, Parquet write/read of a Box2D column. Javadoc on Box2D updated to match isEmpty() (xmin > xmax || ymin > ymax), not just xmin > xmax.
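The serde round-trip at this stage of the PR can be sketched in plain Python, without PySpark. This is a minimal, hypothetical stand-in and not the actual Box2DType code: the `Box2D` dataclass, `serialize`, and `deserialize` below are illustrative names, and the `is_empty` check follows the xmin > xmax || ymin > ymax rule stated in this commit.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box2D:
    """Hypothetical stand-in for the Python Box2D value class."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def is_empty(self) -> bool:
        # Empty-box encoding as described in this commit.
        return self.xmin > self.xmax or self.ymin > self.ymax


def serialize(box: Box2D) -> Tuple[float, float, float, float]:
    """Flatten a box into the struct field order (xmin, ymin, xmax, ymax)."""
    return (box.xmin, box.ymin, box.xmax, box.ymax)


def deserialize(row: Sequence[float]) -> Box2D:
    """Rebuild a box from a 4-element struct row."""
    xmin, ymin, xmax, ymax = row
    return Box2D(float(xmin), float(ymin), float(xmax), float(ymax))
```

A round-trip check then reduces to `deserialize(serialize(box)) == box`, which holds for empty-encoded boxes too because the encoding lives in the four values themselves.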
Drops the in-band 'xmin > xmax' empty marker. A Box2D is now always a valid finite bbox; absence (bbox of empty geometry, extent over zero rows) is represented by SQL NULL at the column level. This matches PostGIS behavior (where Box2D(EMPTY) returns NULL) and leaves xmin > xmax free for a future antimeridian-wraparound semantics on geography bboxes (cf. sedona-db's WraparoundInterval, S2's S2LatLngRect). Drops Box2D.empty() / isEmpty() and the Python equivalents. The expandToInclude(null) no-op is preserved so aggregation buffers can fold over a stream of geometries that may produce null bboxes.
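To illustrate why the null no-op matters for aggregation, here is a small Python sketch of a fold over bboxes that may be absent. It assumes an immutable box and uses hypothetical names (`expand_to_include`, `extent`); the real Java expandToInclude may mutate in place, so treat this as a model of the semantics only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Box2D:
    """Hypothetical immutable model of the always-valid Box2D."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def expand_to_include(self, other: Optional["Box2D"]) -> "Box2D":
        # A missing bbox (None, standing in for SQL NULL) leaves the box unchanged.
        if other is None:
            return self
        return Box2D(
            min(self.xmin, other.xmin),
            min(self.ymin, other.ymin),
            max(self.xmax, other.xmax),
            max(self.ymax, other.ymax),
        )


def extent(boxes: Iterable[Optional[Box2D]]) -> Optional[Box2D]:
    """Fold bboxes into one; returns None for zero rows or all-None input."""
    acc: Optional[Box2D] = None
    for box in boxes:
        if box is None:
            continue  # bbox of an empty geometry contributes nothing
        acc = box if acc is None else acc.expand_to_include(box)
    return acc
```

With this shape, `extent([])` and `extent([None])` both yield `None`, mirroring the SQL NULL result for an extent over zero rows or over only empty geometries.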
@zhangfengcdt @paleolimbot what do you think of this?
paleolimbot left a comment
I can't speak to the Spark details but the definition looks good to me!
It also matches the GeoArrow naming and definition of the box type, which is what we'll match this with in SedonaDB: https://geoarrow.org/format.html#box
Would it be clearer if we just named it BOXUDT, with the possibility of extending to z and m dimensions later on (if needed)? The Parquet bbox is not limited to the 2D case.
@zhangfengcdt Yes, there will be a BOX3D type. This is to maintain compatibility with PostGIS box2d and box3d.
Got it, makes sense to me.
fromEnvelope(Envelope) and toEnvelope() are not used by the Phase 1 SQL surface (ST_Box2D, ST_MakeBox2D, ST_Extent, accessors, CAST AS geometry, ST_AsText). Removing them in line with the PostGIS box function set we're targeting.
The polygon conversion is only needed by CAST(box2d AS geometry), which lands with the function PR. Dropping until then keeps Box2D as pure data plus the Geometry intake (fromGeometry) and the merge primitive (expandToInclude) that ST_Extent needs. Removes Polygon, Coordinate, GeometryFactory imports.
Mirrors the Phase 1 SQL surface added in apache#2890, apache#2895, apache#2897, apache#2898, apache#2899 in PySpark wrappers:
- ST_Box2D in st_functions
- ST_MakeBox2D and ST_GeomFromBox2D in st_constructors
- ST_Extent in st_aggregates

Accessor overloads (ST_XMin/XMax/YMin/YMax) and ST_AsText already worked with Box2D inputs through their existing wrappers; SQL overload resolution happens on the JVM side. The Python Box2DType UDT and Box2D value class were merged in apache#2878, so collected results materialize as Box2D Python objects with xmin/ymin/xmax/ymax attributes. Closes apache#2887.
Summary
Adds the Box2D value type and its UDT, the foundation for the bbox work tracked in #2877. Functions (ST_Box2D, ST_MakeBox2D, ST_Extent, accessor overloads, casts) follow in subsequent PRs.

- common/.../geometryObjects/Box2D.java: planar 2D bounding box. Always a valid finite bbox; absence of a bbox is represented by SQL NULL at the column level (PostGIS-compatible). xmin > xmax is intentionally not used as an in-band empty marker so it remains free for a future antimeridian-wraparound semantics on geography bboxes (cf. apache/sedona-db's WraparoundInterval).
- spark/common/.../UDT/Box2DUDT.scala: struct-backed UDT with sqlType = struct<xmin, ymin, xmax, ymax> (all double, non-nullable). Struct-backed (not binary-backed) so values round-trip natively to Parquet and align zero-copy with GeoParquet 1.1 bbox covering columns.
- spark/common/.../UDT/UdtRegistratorWrapper.scala: register Box2D ↔ Box2DUDT.
- python/sedona/spark/...: matching Box2DType UDT and Box2D value class so a Box2D column materialized in PySpark resolves cleanly.
- Box2DUDTSuite covers UDT registration, JSON schema round-trip, serde round-trip, case-object equality, and Parquet write/read of a Box2D column.

Field names (xmin/ymin/xmax/ymax) match the GeoParquet 1.1 spec and apache/sedona-db's GeoParquet writer for direct cross-engine interop.

Test plan

- […] common and spark/common modules cleanly.
- […] Box2DUDTSuite.
- […] Box2DUDTSuite.
- […] Box2DUDTSuite.
- Box2DType round-trips through serialize/deserialize (smoke-tested locally).
- UdtRegistratorWrapper.registerAll() registers Box2D (asserted in Box2DUDTSuite).