[DOCS] Add Box2D SQL documentation #2966
Document the planar Box2D UDT and its function surface as a sibling of
the Geometry and Geography type docs.
- New `docs/api/sql/box2d/` folder with:
  - `Box2D-Functions.md` master page (introduction, semantic notes,
    constructor/accessor/predicate/function tables, type-conversion
    via CAST and function forms, and a query-optimization summary
    linking to Optimizer.md).
  - Per-function pages for constructors (ST_Box2D, ST_MakeBox2D,
    ST_GeomFromBox2D), accessors (ST_XMin/YMin/XMax/YMax Box2D
    variants), predicates (ST_BoxIntersects, ST_BoxContains), and
    Box2D variants of ST_Expand and ST_AsText.
- Optimizer page additions:
  - `## Box2D filter pushdown` describing the row-group inequality
    translation for Box2D columns (apache#2946).
  - `## Box2D spatial join` describing the rectangle-polygon
    materialisation at the join boundary that lets ST_BoxIntersects /
    ST_BoxContains reuse the existing range and broadcast-index join
    operators (apache#2939).
- `mkdocs.yml`: added "Box2D Functions" nav entry between Geometry and
  Geography under "Vector data".
Local `mkdocs build --strict` produces all eleven Box2D pages cleanly;
the remaining warnings are pre-existing and unrelated. Chinese (.zh.md)
translations are intentionally out of scope and will be picked up by
the existing i18n epic.
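The rectangle-polygon materialisation mentioned above can be pictured with a minimal sketch. The helper names `box_to_ring` and `ring_to_wkt` are hypothetical, and the vertex order and number formatting are assumptions for illustration, not Sedona's actual implementation:

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def box_to_ring(xmin: float, ymin: float, xmax: float, ymax: float) -> List[Coord]:
    """Turn four bounds into a closed rectangle ring (first vertex repeated last)."""
    if xmin > xmax or ymin > ymax:
        # Mirrors the documented rule that inverted bounds are rejected.
        raise ValueError("inverted bounds")
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]


def ring_to_wkt(ring: List[Coord]) -> str:
    """Render the ring as a WKT polygon; %g-style formatting is an assumption."""
    coords = ", ".join(f"{x:g} {y:g}" for x, y in ring)
    return f"POLYGON(({coords}))"


ring = box_to_ring(0.0, 0.0, 2.0, 1.0)
wkt = ring_to_wkt(ring)
print(wkt)  # POLYGON((0 0, 2 0, 2 1, 0 1, 0 0))
```

Once a box is expressed as a polygon like this, a join operator that already handles polygon predicates can be reused without a Box2D-specific code path.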
Pull request overview
Adds SQL documentation for Sedona’s Box2D type, covering its constructors, accessors, predicates, scalar functions, optimizer behavior, and navigation entry.
Changes:
- Adds a new `docs/api/sql/box2d/` documentation section for Box2D APIs.
- Extends optimizer docs with Box2D filter pushdown and spatial join behavior.
- Adds Box2D to the SQL Vector data navigation.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `mkdocs.yml` | Adds Box2D Functions to SQL navigation. |
| `docs/api/sql/Optimizer.md` | Documents Box2D pushdown and join optimization. |
| `docs/api/sql/box2d/Box2D-Functions.md` | Adds Box2D overview and function tables. |
| `docs/api/sql/box2d/Box2D-Constructors/ST_Box2D.md` | Documents ST_Box2D. |
| `docs/api/sql/box2d/Box2D-Constructors/ST_MakeBox2D.md` | Documents ST_MakeBox2D. |
| `docs/api/sql/box2d/Box2D-Constructors/ST_GeomFromBox2D.md` | Documents ST_GeomFromBox2D. |
| `docs/api/sql/box2d/Box2D-Accessors/ST_XMin.md` | Documents Box2D ST_XMin. |
| `docs/api/sql/box2d/Box2D-Accessors/ST_XMax.md` | Documents Box2D ST_XMax. |
| `docs/api/sql/box2d/Box2D-Accessors/ST_YMin.md` | Documents Box2D ST_YMin. |
| `docs/api/sql/box2d/Box2D-Accessors/ST_YMax.md` | Documents Box2D ST_YMax. |
| `docs/api/sql/box2d/Box2D-Predicates/ST_BoxIntersects.md` | Documents ST_BoxIntersects. |
| `docs/api/sql/box2d/Box2D-Predicates/ST_BoxContains.md` | Documents ST_BoxContains. |
| `docs/api/sql/box2d/Box2D-Functions/ST_Expand.md` | Documents Box2D ST_Expand. |
| `docs/api/sql/box2d/Box2D-Functions/ST_AsText.md` | Documents Box2D ST_AsText. |
When a query filters on a `Box2D` column (see [Box2D Functions](box2d/Box2D-Functions.md)) using `ST_BoxIntersects` or `ST_BoxContains` against a literal `Box2D`, Sedona translates the predicate into Parquet row-group inequalities on the column's underlying `xmin` / `ymin` / `xmax` / `ymax` leaves and pushes them down via `ParquetInputFormat.setFilterPredicate`. Parquet's row-group statistics machinery then skips row groups whose recorded min/max disprove the predicate; no file metadata scan is required.

This works for any writer that produces a `Box2D` column (including the `<geom>_bbox` covering column auto-generated by Sedona when writing GeoParquet 1.1), because the pruning operates on the actual stored values' statistics rather than on a separate geometry-column bbox.
Fixed in 3ef5ef5: rewrote the paragraph to clarify that the pushdown applies to `Box2DUDT`-typed columns (obtained via `ST_Box2D(geom)` or the SQL cast). The auto-generated `<geom>_bbox` column is a plain `struct<xmin,ymin,xmax,ymax>`; it satisfies the GeoParquet covering contract but is not a `Box2D`, so these predicates don't target it directly. Users on that column rely on the existing file-metadata pushdown described in the previous section.
| `ST_BoxContains(box_col, lit)` | `box.xmin <= lit.xmin AND box.xmax >= lit.xmax AND box.ymin <= lit.ymin AND box.ymax >= lit.ymax` |
| `ST_BoxContains(lit, box_col)` | `box.xmin >= lit.xmin AND box.xmax <= lit.xmax AND box.ymin >= lit.ymin AND box.ymax <= lit.ymax` |

Pushdown is enabled by default; it is gated by the same Spark setting as ordinary Parquet predicate pushdown (`spark.sql.parquet.filterPushdown`). Inverted-bound literals (`xmin > xmax` / `ymin > ymax`) are not pushed down; the predicate falls back to per-row evaluation so callers see the expected `IllegalArgumentException` from the scalar contract.
Fixed in 3ef5ef5 — documented both flags: spark.sedona.geoparquet.spatialFilterPushDown gates rule attachment, spark.sql.parquet.filterPushdown gates Parquet honouring it. Disabling either disables Box2D pushdown.
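To make the row-group reasoning in this thread concrete, here is a minimal sketch of how min/max statistics can disprove the `ST_BoxContains(box_col, lit)` inequalities quoted above. The `Stats` dictionary is a hypothetical stand-in for Parquet column-chunk statistics, the function names are illustrative, and the `"true"` defaults for both flags are an assumption drawn from the statement that pushdown is enabled by default:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Box:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


# Per-leaf (min, max) statistics for one row group (hypothetical representation).
Stats = Dict[str, Tuple[float, float]]


def is_pushable(lit: Box) -> bool:
    """Inverted-bound literals are not pushed down; they fall back to per-row evaluation."""
    return lit.xmin <= lit.xmax and lit.ymin <= lit.ymax


def may_satisfy_contains(stats: Stats, lit: Box) -> bool:
    """False only when no row in the group can satisfy ST_BoxContains(box_col, lit)."""
    return (
        stats["xmin"][0] <= lit.xmin      # smallest stored xmin must not exceed lit.xmin
        and stats["xmax"][1] >= lit.xmax  # largest stored xmax must reach lit.xmax
        and stats["ymin"][0] <= lit.ymin
        and stats["ymax"][1] >= lit.ymax
    )


def pushdown_enabled(conf: Dict[str, str]) -> bool:
    """Both flags must be on; disabling either disables Box2D pushdown."""
    sedona = conf.get("spark.sedona.geoparquet.spatialFilterPushDown", "true")
    parquet = conf.get("spark.sql.parquet.filterPushdown", "true")
    return sedona.lower() == "true" and parquet.lower() == "true"


lit = Box(10.0, 10.0, 20.0, 20.0)
group_a = {"xmin": (0.0, 5.0), "ymin": (0.0, 5.0), "xmax": (25.0, 40.0), "ymax": (25.0, 40.0)}
group_b = {"xmin": (15.0, 30.0), "ymin": (0.0, 5.0), "xmax": (35.0, 50.0), "ymax": (25.0, 40.0)}

keep_a = may_satisfy_contains(group_a, lit)  # True: the group has to be read
keep_b = may_satisfy_contains(group_b, lit)  # False: every stored xmin exceeds lit.xmin
```

The check is conservative: a `True` result only means the group cannot be ruled out, so each surviving row still goes through the ordinary per-row predicate.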
- `Box2D` values use closed-interval semantics: edge-touching boxes are considered intersecting and (per [ST_BoxContains](Box2D-Predicates/ST_BoxContains.md)) contained.
- Absence is represented by SQL `NULL` rather than an in-band sentinel.
- Bounds are required to be ordered (`xmin <= xmax`, `ymin <= ymax`). Inverted-bound values are reserved for a future antimeridian-wraparound semantics on geography bboxes; predicates and join planning throw `IllegalArgumentException` on inverted input today.
- Unlike [ST_Envelope](../Bounding-Box-Functions/ST_Envelope.md), which returns a `Geometry` polygon, [ST_Box2D](Box2D-Constructors/ST_Box2D.md) returns a typed `Box2D` value. Prefer the typed form when downstream code only needs the four bounds, and prefer the polygon when downstream code expects a `Geometry`.
Fixed in 3ef5ef5 — softened the contrast: ST_Envelope returns the envelope as a Geometry (typically a polygon, but Point/LineString for degenerate inputs), while ST_Box2D always returns a typed Box2D.
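The closed-interval and ordered-bounds rules in the quoted notes can be sketched as follows. This is an illustrative model only: the function names are hypothetical, and `ValueError` stands in for the `IllegalArgumentException` the docs describe:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box2D:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _require_ordered(box: Box2D) -> None:
    # Inverted bounds are rejected rather than silently reinterpreted.
    if box.xmin > box.xmax or box.ymin > box.ymax:
        raise ValueError(f"inverted bounds: {box}")


def box_intersects(a: Box2D, b: Box2D) -> bool:
    """Closed intervals: boxes that merely touch on an edge or corner intersect."""
    _require_ordered(a)
    _require_ordered(b)
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def box_contains(outer: Box2D, inner: Box2D) -> bool:
    """Closed intervals: an inner box sharing an edge with the outer box is contained."""
    _require_ordered(outer)
    _require_ordered(inner)
    return (outer.xmin <= inner.xmin and outer.xmax >= inner.xmax
            and outer.ymin <= inner.ymin and outer.ymax >= inner.ymax)


touching = box_intersects(Box2D(0, 0, 1, 1), Box2D(1, 1, 2, 2))   # corner touch
edge_inside = box_contains(Box2D(0, 0, 2, 2), Box2D(0, 0, 2, 1))  # shared edges
```

Using `<=` and `>=` throughout is what makes the edge cases inclusive; swapping in strict comparisons would give open-interval behaviour instead.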
Introduction: Return the planar bounding box of a Geometry as a typed `Box2D` value (four doubles: `xmin`, `ymin`, `xmax`, `ymax`).

`ST_Box2D` is the typed counterpart to [ST_Envelope](../../Bounding-Box-Functions/ST_Envelope.md). `ST_Envelope` returns a `Geometry` polygon; `ST_Box2D` returns a `Box2D` value that serialises to a struct of four non-nullable doubles and round-trips through Parquet without WKB overhead.
Fixed in 3ef5ef5 — same softening on ST_Box2D.md.
| :--- | :--- | :--- | :--- |
| [ST_Expand](Box2D-Functions/ST_Expand.md) | Box2D | Expand a Box2D by a per-axis or uniform delta. | v1.9.1 |
| [ST_AsText](Box2D-Functions/ST_AsText.md) | String | Return the `BOX(xmin ymin, xmax ymax)` text representation of a Box2D. | v1.9.1 |
Added in 3ef5ef5: new docs/api/sql/Aggregate-Functions/ST_Extent.md (the function is already SQL-registered via Catalog.aggregateExpressions; only the docs page was missing). Cross-referenced from Geometry-Functions.md's Aggregate Functions table (Geometry input) and from a new "Box2D Aggregates" subsection on Box2D-Functions.md (Box2D output).
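As a rough illustration of the three operations touched on in this thread (expansion, the `BOX(...)` text form, and an extent-style aggregate), here is a small sketch. The function names are hypothetical, the number formatting is Python's default and may differ from Sedona's output, and returning `None` for empty input is an assumption modelled on the NULL-for-absent rule:

```python
from typing import Iterable, NamedTuple, Optional


class Box2D(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def expand(box: Box2D, dx: float, dy: Optional[float] = None) -> Box2D:
    """Grow the box by dx on the x axis and dy on the y axis (uniform when dy is omitted)."""
    if dy is None:
        dy = dx
    return Box2D(box.xmin - dx, box.ymin - dy, box.xmax + dx, box.ymax + dy)


def as_text(box: Box2D) -> str:
    """Render the BOX(xmin ymin, xmax ymax) form."""
    return f"BOX({box.xmin} {box.ymin}, {box.xmax} {box.ymax})"


def extent(boxes: Iterable[Box2D]) -> Optional[Box2D]:
    """Fold per-row bounding boxes into one overall box; None models SQL NULL."""
    result: Optional[Box2D] = None
    for b in boxes:
        if result is None:
            result = b
        else:
            result = Box2D(min(result.xmin, b.xmin), min(result.ymin, b.ymin),
                           max(result.xmax, b.xmax), max(result.ymax, b.ymax))
    return result


grown = expand(Box2D(0.0, 0.0, 2.0, 1.0), 1.0)
print(as_text(grown))  # BOX(-1.0 -1.0, 3.0 2.0)
```

In this sketch `extent` takes boxes as input for simplicity; the documented aggregate takes Geometry input and produces a Box2D.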
- Quick start: api/sql/Overview.md
- Vector data:
    - Geometry Functions: api/sql/Geometry-Functions.md
    - Box2D Functions: api/sql/box2d/Box2D-Functions.md
Fixed in 3ef5ef5 — added "Box2D Functions: Box2D 函数" under nav_translations alongside the Geometry / Geography entries.
- Correct the Box2D filter pushdown source-of-truth. The auto-generated
`<geom>_bbox` covering column is written as a plain
`struct<xmin,ymin,xmax,ymax>`, not as Box2DUDT, so ST_BoxIntersects /
ST_BoxContains do not target it directly. Rewrote the paragraph to
point users at `ST_Box2D(geom)` (or the SQL cast) for row-group-level
pushdown, and at the existing file-metadata pushdown for the
auto-generated column.
- Document the dual-flag gating: `spark.sedona.geoparquet.spatialFilterPushDown`
controls whether the rule injects the spatial predicate at all;
`spark.sql.parquet.filterPushdown` controls whether Parquet honours
it. Disabling either disables Box2D pushdown.
- Soften the ST_Envelope-vs-ST_Box2D contrast on both Box2D-Functions.md
and ST_Box2D.md — JTS `Geometry.getEnvelope()` returns Point or
LineString for degenerate inputs, not always a polygon.
- Add an ST_Extent aggregate page under `Aggregate-Functions/` (the
function is already SQL-registered via `Catalog.aggregateExpressions`;
the docs page was just missing) and reference it from:
* `Geometry-Functions.md` Aggregate Functions table (Geometry input)
* `Box2D-Functions.md` new "Box2D Aggregates" subsection (Box2D
output, so the page is discoverable from the Box2D reference too).
- Localise the new "Box2D Functions" nav label under
`nav_translations` so the Chinese build doesn't show an
English-only outlier between the localised Geometry / Geography
entries.
Verified locally: `mkdocs build --strict` produces the ST_Extent page
and the updated Box2D pages cleanly; remaining warnings are pre-existing
and unrelated.
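The degenerate-envelope point in the commit message above can be illustrated with a short sketch of how an envelope's shape depends on its extent. The function name `envelope_kind` is hypothetical; the classification follows the documented behaviour of JTS `Geometry.getEnvelope()` for non-empty input and is not taken from Sedona's code:

```python
def envelope_kind(xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Classify the geometry type an envelope with these bounds collapses to."""
    if xmin == xmax and ymin == ymax:
        return "Point"       # zero width and zero height
    if xmin == xmax or ymin == ymax:
        return "LineString"  # zero extent on exactly one axis
    return "Polygon"         # a proper rectangle


kinds = [
    envelope_kind(1.0, 1.0, 1.0, 1.0),
    envelope_kind(0.0, 1.0, 5.0, 1.0),
    envelope_kind(0.0, 0.0, 2.0, 1.0),
]
print(kinds)  # ['Point', 'LineString', 'Polygon']
```

A typed box avoids this branching: the same four doubles are stored whether or not the input is degenerate.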
Did you read the Contributor Guide?
Is this PR related to a ticket?
`[DOCS] my subject`. Documentation for the Box2D type and associated functions delivered across the Box2D epic (Add a native Box2D type for bounding boxes #2877, Implement ST_Expand for Box2D #2925, Add Box2D predicates: ST_BoxIntersects, ST_BoxContains #2926, Filter pushdown for ST_BoxIntersects / ST_BoxContains to GeoParquet bbox covering columns #2938, Implicit Catalyst cast: geometry -> Box2D #2927, Spatial join planner: recognize ST_BoxIntersects / ST_BoxContains as join predicates #2939).
What changes were proposed in this PR?
Documents the planar `Box2D` UDT and its full function surface, structured as a sibling of the existing Geometry and Geography type docs.
New `docs/api/sql/box2d/` folder
A master page plus per-function pages mirroring the layout of `docs/api/sql/geography/`:
- `Box2D-Functions.md`: type intro, semantic notes (closed-interval, NULL-for-absent, inverted-bounds reservation), constructor / accessor / predicate / function tables, type-conversion section covering both `CAST` and the function-form aliases, and a query-optimization summary linking out to `Optimizer.md`.
- `Box2D-Constructors/`: `ST_Box2D`, `ST_MakeBox2D`, `ST_GeomFromBox2D`.
- `Box2D-Accessors/`: `ST_XMin`, `ST_YMin`, `ST_XMax`, `ST_YMax` (Box2D-input variants of the same functions documented for `Geometry` under `Bounding-Box-Functions/`).
- `Box2D-Predicates/`: `ST_BoxIntersects`, `ST_BoxContains`.
- `Box2D-Functions/`: `ST_Expand` (Box2D overloads) and `ST_AsText` (PostGIS-compatible `BOX(...)` form).
Optimizer page additions
Two new sections in `docs/api/sql/Optimizer.md`:
- `## Box2D filter pushdown`: the row-group inequality translation for `ST_BoxIntersects` / `ST_BoxContains` on a Parquet-backed Box2D column (the path that landed in [GH-2938] Push down ST_BoxIntersects / ST_BoxContains via Parquet row-group statistics #2946). Includes the per-predicate inequality table.
- `## Box2D spatial join`: the rectangle-polygon materialisation at the join boundary that lets `ST_BoxIntersects` / `ST_BoxContains` reuse the existing range and broadcast-index join operators (Spatial join planner: recognize ST_BoxIntersects / ST_BoxContains as join predicates #2939). Notes the `COVERS`-not-`CONTAINS` mapping for closed-interval semantics.
Navigation
Added a single line in `mkdocs.yml` placing `Box2D Functions` between `Geometry Functions` and `Geography Functions` under "Vector data".
Out of scope
- Chinese (`.zh.md`) translations: these will be picked up by the existing i18n epic (Add Chinese version of the documentation #2867) on its normal cadence.
How was this patch tested?
- `mkdocs build --strict` locally produces all eleven Box2D pages and adds them to the SQL nav. The remaining strict-mode warnings are pre-existing and unrelated (`api/rdocs` / `api/pydocs` / `scaladoc` are generated build artifacts not present in a clean checkout; the `Optimizer.zh.md` anchor info messages come from the i18n fallback for sections that don't have Chinese translations yet).
- Cross-page links (`Box2D-Functions.md` ↔ per-function pages ↔ `Optimizer.md` ↔ existing Geometry pages) all resolve.
Did this PR include necessary documentation updates?