Add a native Box2D type for bounding boxes #2877

@jiayuasu

Phase 1 complete. All sub-issues are closed. The Box2D type, its UDT, the SQL surface, cross-language bindings (Python, Flink, Scala DataFrame API), and GeoParquet writer recognition all landed in master.

Summary of what shipped

  • Box2D value class + Box2DUDT (struct-backed xmin/ymin/xmax/ymax: double)
  • SQL: ST_Box2D, ST_MakeBox2D, ST_Extent (aggregate), ST_GeomFromBox2D, ST_AsText(box2d), accessor overloads ST_XMin/XMax/YMin/YMax(box2d)
  • Python bindings (PySpark Box2DType, all wrappers in st_functions / st_constructors / st_aggregates)
  • Scala DataFrame API wrappers in st_functions / st_constructors / st_aggregates
  • Flink bindings (scalar + aggregate + Box2DTypeSerializer)
  • GeoParquet 1.1: Box2D columns are recognized as bbox covering columns in metadata (both explicit geoparquet.covering[.geom]=<col> and auto-detected <geom>_bbox)
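For readers unfamiliar with the covering entry mentioned in the last bullet, here is a minimal sketch of the per-column `covering` object as the GeoParquet 1.1 spec defines it: each of xmin/ymin/xmax/ymax maps to a path `[<bbox column>, <field>]`. The class and method names are hypothetical and this is not Sedona's writer code; the exact JSON Sedona emits is not shown in this issue.

```java
// Hypothetical helper, for illustration only (not part of Sedona).
final class CoveringMetadataSketch {
    private CoveringMetadataSketch() {}

    /**
     * Renders the GeoParquet 1.1 "covering" object for a struct column that
     * holds xmin/ymin/xmax/ymax fields. Assumes the column name needs no
     * JSON escaping.
     */
    static String bboxCovering(String bboxColumn) {
        String[] fields = {"xmin", "ymin", "xmax", "ymax"};
        StringBuilder sb = new StringBuilder("{\"covering\":{\"bbox\":{");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Each entry is a path: [column name, struct field name].
            sb.append('"').append(fields[i]).append("\":[\"")
              .append(bboxColumn).append("\",\"")
              .append(fields[i]).append("\"]");
        }
        return sb.append("}}}").toString();
    }
}
```

Calling `CoveringMetadataSketch.bboxCovering("geom_bbox")` produces the entry for an auto-detected `<geom>_bbox` column where the geometry column is named `geom`.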

Sub-issues

Foundation

SQL surface

Storage

Cross-language bindings

Deferred follow-ups

These were explicitly scoped out of Phase 1 and will need their own issues if/when prioritized:

  • GeoParquet writer: Float32 + conservative outward rounding (Math.nextUp / Math.nextDown). Bit-compatible with apache/sedona-db's next_after. Deferred because writing Float32 today would create a write/read asymmetry — reads come back as struct<float> not Box2D until the reader auto-materialization is in place. Best to land Float32 + reader auto-materialization together.
  • GeoParquet reader: auto-materialize covering bbox columns as Box2D when GeoParquet 1.1 metadata points at them. This would give typed bbox columns from existing files with no migration. The reader path has more edge cases (legacy files, missing metadata, conflicting schemas), so it is worth its own change.
  • ST_Expand(box, dx, dy)
  • Box predicates (ST_BoxIntersects, ST_BoxContains)
  • Implicit geometry → box2d cast at the Catalyst level
  • Box3D, ST_3DExtent, ST_3DMakeBox, ST_ZMin/ZMax — when a concrete user (point clouds, BIM, voxel data) appears
  • ST_Box2dFromGeoHash, ST_EstimatedExtent — additional PostGIS box functions
  • Geography bboxes. Geography doesn't have a bbox type today; PostGIS doesn't expose one either. Likely path: reuse Box2D with antimeridian-wraparound semantics on the X axis (xmin > xmax), matching apache/sedona-db's WraparoundInterval. This is exactly why we did not burn xmin > xmax on an in-band empty marker.
  • R bindings
  • Documentation update consolidating the Phase 1 surface in one coherent docs change.
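Two of the deferred items above rest on small numeric rules that are easy to get wrong, so here is a rough sketch of both. The first pair of methods narrows a double bound to Float32 so the box can only grow, using Math.nextUp / Math.nextDown. The last method tests an X value against an interval where xmin > xmax means the interval wraps across the antimeridian. All names are hypothetical. The sketch nudges outward only when the narrowing cast is inexact, which is an assumption: this issue does not say whether the planned writer nudges conditionally or always, and nothing here has been checked for bit-compatibility with sedona-db's next_after or WraparoundInterval.

```java
// Hypothetical helpers, for illustration only (not part of Sedona).
final class BboxSketch {
    private BboxSketch() {}

    /** Largest float that is <= v: use for xmin / ymin. */
    static float roundDownToFloat(double v) {
        float f = (float) v; // narrowing cast rounds to nearest
        // If the cast overshot v, step one float toward negative infinity.
        return ((double) f > v) ? Math.nextDown(f) : f;
    }

    /** Smallest float that is >= v: use for xmax / ymax. */
    static float roundUpToFloat(double v) {
        float f = (float) v;
        // If the cast undershot v, step one float toward positive infinity.
        return ((double) f < v) ? Math.nextUp(f) : f;
    }

    /**
     * X-axis containment with wraparound semantics: xmin <= xmax is an
     * ordinary interval; xmin > xmax is read as crossing the antimeridian.
     */
    static boolean xContains(double xmin, double xmax, double x) {
        if (xmin <= xmax) {
            return xmin <= x && x <= xmax;
        }
        return x >= xmin || x <= xmax;
    }
}
```

With these rules, a bound such as 0.1 (not exactly representable as a float) is bracketed by two adjacent floats, while an exactly representable bound such as 0.5 is left unchanged. An interval of (170, -170) contains 180 but not 0.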

Coordination with sedona-db

sedona-db's GeoParquet writer uses xmin/ymin/xmax/ymax (Float32), but its st_analyze_agg returns minx/miny/maxx/maxy (Float64). Worth aligning on the Parquet-spec naming convention as part of the deferred Float32 writer work above.
