
[DOCS] Add Box2D SQL documentation #2966

Open
jiayuasu wants to merge 2 commits into apache:master from jiayuasu:feature/box2d-docs

@jiayuasu (Member)

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Documents the planar Box2D UDT and its full function surface, structured as a sibling of the existing Geometry and Geography type docs.

New docs/api/sql/box2d/ folder

A master page plus per-function pages mirroring the layout of docs/api/sql/geography/:

  • Box2D-Functions.md — type intro, semantic notes (closed-interval, NULL-for-absent, inverted-bounds reservation), constructor / accessor / predicate / function tables, type-conversion section covering both CAST and the function-form aliases, and a query-optimization summary linking out to Optimizer.md.
  • Box2D-Constructors/ST_Box2D, ST_MakeBox2D, ST_GeomFromBox2D.
  • Box2D-Accessors/ST_XMin, ST_YMin, ST_XMax, ST_YMax (Box2D-input variants of the same functions documented for Geometry under Bounding-Box-Functions/).
  • Box2D-Predicates/ST_BoxIntersects, ST_BoxContains.
  • Box2D-Functions/ST_Expand (Box2D overloads) and ST_AsText (PostGIS-compatible BOX(...) form).

Optimizer page additions

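Two new sections in docs/api/sql/Optimizer.md:

  • Box2D filter pushdown: the row-group inequality translation for Box2D columns (apache#2946).
  • Box2D spatial join: the rectangle-polygon materialisation at the join boundary that lets ST_BoxIntersects / ST_BoxContains reuse the existing range and broadcast-index join operators (apache#2939).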

Navigation

Added a single line in mkdocs.yml placing Box2D Functions between Geometry Functions and Geography Functions under "Vector data".

Out of scope

  • Chinese (.zh.md) translations — these will be picked up by the existing i18n epic (Add Chinese version of the documentation #2867) on its normal cadence.
  • GeoParquet tutorial mention of Box2D covering columns — small enough to slip into the tutorial separately if useful.

How was this patch tested?

  • mkdocs build --strict locally produces all eleven Box2D pages and adds them to the SQL nav. The remaining strict-mode warnings are pre-existing and unrelated (api/rdocs / api/pydocs / scaladoc are generated build artifacts not present in a clean checkout; the Optimizer.zh.md anchor info messages come from the i18n fallback for sections that don't have Chinese translations yet).
  • Cross-references between pages (Box2D-Functions.md ↔ per-function pages ↔ Optimizer.md ↔ existing Geometry pages) all resolve.

Did this PR include necessary documentation updates?

  • Yes, this PR itself is the documentation update.

Document the planar Box2D UDT and its function surface as a sibling of
the Geometry and Geography type docs.

- New `docs/api/sql/box2d/` folder with:
    - `Box2D-Functions.md` master page (introduction, semantic notes,
      constructor/accessor/predicate/function tables, type-conversion
      via CAST and function forms, and a query-optimization summary
      linking to Optimizer.md).
    - Per-function pages for constructors (ST_Box2D, ST_MakeBox2D,
      ST_GeomFromBox2D), accessors (ST_XMin/YMin/XMax/YMax Box2D
      variants), predicates (ST_BoxIntersects, ST_BoxContains), and
      Box2D variants of ST_Expand and ST_AsText.
- Optimizer page additions:
    - `## Box2D filter pushdown` describing the row-group inequality
      translation for Box2D columns (apache#2946).
    - `## Box2D spatial join` describing the rectangle-polygon
      materialisation at the join boundary that lets ST_BoxIntersects /
      ST_BoxContains reuse the existing range and broadcast-index join
      operators (apache#2939).
- `mkdocs.yml`: added "Box2D Functions" nav entry between Geometry and
  Geography under "Vector data".

Local `mkdocs build --strict` produces all eleven Box2D pages cleanly;
the remaining warnings are pre-existing and unrelated. Chinese (.zh.md)
translations are intentionally out of scope and will be picked up by
the existing i18n epic.
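
The "rectangle-polygon materialisation" in the spatial-join bullet of this commit message can be pictured with a small sketch: each `Box2D` becomes the closed five-vertex ring of an axis-aligned rectangle, so operators that expect polygon geometry can consume it. This is a pure-Python illustration, not Sedona code; the `Box2D` tuple and `box_to_ring` helper are hypothetical, and the vertex order (counter-clockwise from the lower-left corner) is an assumption.

```python
from typing import List, NamedTuple, Tuple


class Box2D(NamedTuple):
    """Hypothetical stand-in for a planar Box2D value."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def box_to_ring(box: Box2D) -> List[Tuple[float, float]]:
    """Materialise a box as a closed polygon ring (first vertex repeated last)."""
    if box.xmin > box.xmax or box.ymin > box.ymax:
        # Join planning rejects inverted bounds; Python's ValueError plays the
        # role of Java's IllegalArgumentException in this sketch.
        raise ValueError(f"inverted Box2D bounds: {box}")
    return [
        (box.xmin, box.ymin),  # lower-left
        (box.xmax, box.ymin),  # lower-right
        (box.xmax, box.ymax),  # upper-right
        (box.xmin, box.ymax),  # upper-left
        (box.xmin, box.ymin),  # close the ring
    ]


if __name__ == "__main__":
    print(box_to_ring(Box2D(0.0, 0.0, 2.0, 1.0)))
```

Once the box is a polygon, the existing closed-interval geometry predicates and join operators apply unchanged, which is the point of doing the conversion at the join boundary rather than writing new Box2D-specific join operators.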

Copilot AI (Contributor) left a comment

Pull request overview

Adds SQL documentation for Sedona’s Box2D type, covering its constructors, accessors, predicates, scalar functions, optimizer behavior, and navigation entry.

Changes:

  • Adds a new docs/api/sql/box2d/ documentation section for Box2D APIs.
  • Extends optimizer docs with Box2D filter pushdown and spatial join behavior.
  • Adds Box2D to the SQL Vector data navigation.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

| File | Description |
| :--- | :--- |
| mkdocs.yml | Adds Box2D Functions to SQL navigation. |
| docs/api/sql/Optimizer.md | Documents Box2D pushdown and join optimization. |
| docs/api/sql/box2d/Box2D-Functions.md | Adds Box2D overview and function tables. |
| docs/api/sql/box2d/Box2D-Constructors/ST_Box2D.md | Documents ST_Box2D. |
| docs/api/sql/box2d/Box2D-Constructors/ST_MakeBox2D.md | Documents ST_MakeBox2D. |
| docs/api/sql/box2d/Box2D-Constructors/ST_GeomFromBox2D.md | Documents ST_GeomFromBox2D. |
| docs/api/sql/box2d/Box2D-Accessors/ST_XMin.md | Documents Box2D ST_XMin. |
| docs/api/sql/box2d/Box2D-Accessors/ST_XMax.md | Documents Box2D ST_XMax. |
| docs/api/sql/box2d/Box2D-Accessors/ST_YMin.md | Documents Box2D ST_YMin. |
| docs/api/sql/box2d/Box2D-Accessors/ST_YMax.md | Documents Box2D ST_YMax. |
| docs/api/sql/box2d/Box2D-Predicates/ST_BoxIntersects.md | Documents ST_BoxIntersects. |
| docs/api/sql/box2d/Box2D-Predicates/ST_BoxContains.md | Documents ST_BoxContains. |
| docs/api/sql/box2d/Box2D-Functions/ST_Expand.md | Documents Box2D ST_Expand. |
| docs/api/sql/box2d/Box2D-Functions/ST_AsText.md | Documents Box2D ST_AsText. |


Comment thread docs/api/sql/Optimizer.md Outdated

When a query filters on a `Box2D` column (see [Box2D Functions](box2d/Box2D-Functions.md)) using `ST_BoxIntersects` or `ST_BoxContains` against a literal `Box2D`, Sedona translates the predicate into Parquet row-group inequalities on the column's underlying `xmin` / `ymin` / `xmax` / `ymax` leaves and pushes them down via `ParquetInputFormat.setFilterPredicate`. Parquet's row-group statistics machinery then skips row groups whose recorded min/max disprove the predicate — no file metadata scan is required.

This works for any writer that produces a `Box2D` column (including the `<geom>_bbox` covering column auto-generated by Sedona when writing GeoParquet 1.1), because the pruning operates on the actual stored values' statistics rather than on a separate geometry-column bbox.

@jiayuasu (Member, Author)

Fixed in 3ef5ef5 — rewrote the paragraph to clarify that the pushdown applies to Box2DUDT-typed columns (obtained via ST_Box2D(geom) or the SQL cast). The auto-generated _bbox column is a plain struct<xmin,ymin,xmax,ymax>; it satisfies the GeoParquet covering contract but is not a Box2D, so these predicates don't target it directly — users on that column rely on the existing file-metadata pushdown described in the previous section.
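
The row-group skipping described in this thread can be sketched in pure Python. The sketch assumes that `ST_BoxIntersects(box_col, lit)` translates to the usual closed-interval overlap test (`box.xmin <= lit.xmax AND box.xmax >= lit.xmin AND box.ymin <= lit.ymax AND box.ymax >= lit.ymin`), and models each row group's statistics as a hypothetical mapping from leaf name to `(min, max)`. A row group is skipped only when some conjunct fails even for the most favourable value its statistics allow.

```python
from typing import Dict, NamedTuple, Tuple


class Box2D(NamedTuple):
    """Hypothetical stand-in for a planar Box2D value."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


# Hypothetical row-group statistics: leaf name -> (min, max) over the group.
RowGroupStats = Dict[str, Tuple[float, float]]


def may_intersect(stats: RowGroupStats, lit: Box2D) -> bool:
    """Return False only if the statistics prove no row can intersect `lit`."""
    # box.xmin <= lit.xmax is impossible if even the smallest xmin is too large.
    if stats["xmin"][0] > lit.xmax:
        return False
    # box.xmax >= lit.xmin is impossible if even the largest xmax is too small.
    if stats["xmax"][1] < lit.xmin:
        return False
    if stats["ymin"][0] > lit.ymax:
        return False
    if stats["ymax"][1] < lit.ymin:
        return False
    return True  # cannot rule the group out; its rows are still checked exactly


if __name__ == "__main__":
    query = Box2D(10.0, 10.0, 20.0, 20.0)
    groups = {
        "rg0": {"xmin": (0.0, 5.0), "xmax": (1.0, 8.0),
                "ymin": (0.0, 5.0), "ymax": (1.0, 8.0)},
        "rg1": {"xmin": (9.0, 15.0), "xmax": (12.0, 30.0),
                "ymin": (9.0, 15.0), "ymax": (12.0, 30.0)},
    }
    print([name for name, s in groups.items() if may_intersect(s, query)])  # ['rg1']
```

The pruning is conservative by design: a kept row group may still contain no matching rows, but a skipped one never does, so correctness does not depend on how tight the statistics are.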

Comment thread docs/api/sql/Optimizer.md Outdated
| `ST_BoxContains(box_col, lit)` | `box.xmin <= lit.xmin AND box.xmax >= lit.xmax AND box.ymin <= lit.ymin AND box.ymax >= lit.ymax` |
| `ST_BoxContains(lit, box_col)` | `box.xmin >= lit.xmin AND box.xmax <= lit.xmax AND box.ymin >= lit.ymin AND box.ymax <= lit.ymax` |

Pushdown is enabled by default; it is gated by the same Spark setting as ordinary Parquet predicate pushdown (`spark.sql.parquet.filterPushdown`). Inverted-bound literals (`xmin > xmax` / `ymin > ymax`) are not pushed down — the predicate falls back to per-row evaluation so callers see the expected `IllegalArgumentException` from the scalar contract.

@jiayuasu (Member, Author)

Fixed in 3ef5ef5 — documented both flags: spark.sedona.geoparquet.spatialFilterPushDown gates rule attachment, spark.sql.parquet.filterPushdown gates Parquet honouring it. Disabling either disables Box2D pushdown.
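
Putting the `ST_BoxContains` rows of the table and the two flags together, the decision of whether and what to push down can be modelled as below. This is a hypothetical pure-Python model, not Sedona's rule: the configuration keys come from this thread, but treating an unset key as `true` is an assumption based on the statement that pushdown is enabled by default. Returning `None` stands for "no pushdown; evaluate per row", which is also what happens for inverted-bound literals.

```python
import operator
from typing import Callable, List, Mapping, NamedTuple, Optional, Tuple

SEDONA_FLAG = "spark.sedona.geoparquet.spatialFilterPushDown"
PARQUET_FLAG = "spark.sql.parquet.filterPushdown"


class Box2D(NamedTuple):
    """Hypothetical stand-in for a planar Box2D value."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


# (leaf column, comparison, literal bound): one conjunct of the pushed filter.
Conjunct = Tuple[str, Callable[[float, float], bool], float]


def _enabled(conf: Mapping[str, str], key: str) -> bool:
    # Assumption: an unset flag behaves as "true".
    return conf.get(key, "true").strip().lower() == "true"


def contains_pushdown(conf: Mapping[str, str], lit: Box2D,
                      column_is_container: bool) -> Optional[List[Conjunct]]:
    """Leaf inequalities for ST_BoxContains(box_col, lit) when column_is_container
    is True, or ST_BoxContains(lit, box_col) when False; None means no pushdown."""
    if not (_enabled(conf, SEDONA_FLAG) and _enabled(conf, PARQUET_FLAG)):
        return None  # disabling either flag disables Box2D pushdown
    if lit.xmin > lit.xmax or lit.ymin > lit.ymax:
        return None  # inverted literal: fall back so the scalar check can throw
    le, ge = operator.le, operator.ge
    if column_is_container:
        return [("xmin", le, lit.xmin), ("xmax", ge, lit.xmax),
                ("ymin", le, lit.ymin), ("ymax", ge, lit.ymax)]
    return [("xmin", ge, lit.xmin), ("xmax", le, lit.xmax),
            ("ymin", ge, lit.ymin), ("ymax", le, lit.ymax)]


def row_passes(box: Box2D, conjuncts: List[Conjunct]) -> bool:
    """Evaluate the pushed conjuncts against one stored Box2D value."""
    return all(cmp(getattr(box, leaf), bound) for leaf, cmp, bound in conjuncts)
```

Each returned list is exactly one row of the table: the column's leaves are compared with the literal's corresponding bounds, with the direction of every inequality flipped between the two argument orders.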

Comment thread docs/api/sql/box2d/Box2D-Functions.md Outdated
- `Box2D` values use closed-interval semantics: edge-touching boxes are considered intersecting and (per [ST_BoxContains](Box2D-Predicates/ST_BoxContains.md)) contained.
- Absence is represented by SQL `NULL` rather than an in-band sentinel.
- Bounds are required to be ordered (`xmin <= xmax`, `ymin <= ymax`). Inverted-bound values are reserved for a future antimeridian-wraparound semantics on geography bboxes; predicates and join planning throw `IllegalArgumentException` on inverted input today.
- Unlike [ST_Envelope](../Bounding-Box-Functions/ST_Envelope.md), which returns a `Geometry` polygon, [ST_Box2D](Box2D-Constructors/ST_Box2D.md) returns a typed `Box2D` value. Prefer the typed form when downstream code only needs the four bounds, and prefer the polygon when downstream code expects a `Geometry`.
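
The first three bullets can be made concrete with a small pure-Python model. Everything here is hypothetical (`Box2D`, `box_intersects` and `box_contains` are illustrative names, not Sedona APIs); `None` models SQL `NULL`, and Python's `ValueError` stands in for the `IllegalArgumentException` the docs describe. Returning NULL for a NULL argument is an assumption that follows ordinary SQL null propagation.

```python
from typing import NamedTuple, Optional


class Box2D(NamedTuple):
    """Hypothetical stand-in for a planar Box2D value."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _require_ordered(box: Box2D) -> None:
    # Inverted bounds are reserved for future semantics and rejected today.
    if box.xmin > box.xmax or box.ymin > box.ymax:
        raise ValueError(f"inverted Box2D bounds: {box}")


def box_intersects(a: Optional[Box2D], b: Optional[Box2D]) -> Optional[bool]:
    """Closed-interval overlap: boxes sharing only an edge or corner intersect."""
    if a is None or b is None:
        return None  # SQL NULL in, NULL out
    _require_ordered(a)
    _require_ordered(b)
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def box_contains(a: Optional[Box2D], b: Optional[Box2D]) -> Optional[bool]:
    """True if `a` contains `b`; `b` may touch `a`'s boundary (closed intervals)."""
    if a is None or b is None:
        return None
    _require_ordered(a)
    _require_ordered(b)
    return (a.xmin <= b.xmin and b.xmax <= a.xmax
            and a.ymin <= b.ymin and b.ymax <= a.ymax)
```

Using `<=` everywhere is what "closed interval" means in practice: `Box2D(0, 0, 1, 1)` and `Box2D(1, 0, 2, 1)` share the edge `x = 1` and therefore intersect.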

@jiayuasu (Member, Author)

Fixed in 3ef5ef5 — softened the contrast: ST_Envelope returns the envelope as a Geometry (typically a polygon, but Point/LineString for degenerate inputs), while ST_Box2D always returns a typed Box2D.
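
The softened wording reflects how JTS builds an envelope geometry: `Geometry.getEnvelope()` goes through `GeometryFactory.toGeometry(Envelope)`, which picks the output type from how many bounds coincide. The sketch mirrors that rule in Python for non-empty inputs; `envelope_kind` is a hypothetical helper. A `Box2D` carries the same four numbers in all three cases, so its type never changes.

```python
def envelope_kind(xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Geometry type JTS produces for a non-empty envelope with these bounds."""
    if xmin == xmax and ymin == ymax:
        return "Point"       # zero width and zero height
    if xmin == xmax or ymin == ymax:
        return "LineString"  # zero width or zero height: a two-point segment
    return "Polygon"         # positive area: the usual rectangle
```

A single point, or a horizontal or vertical line, is exactly the kind of degenerate input where code that assumes `ST_Envelope` always yields a polygon would break.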


Comment thread docs/api/sql/box2d/Box2D-Constructors/ST_Box2D.md

Introduction: Return the planar bounding box of a Geometry as a typed `Box2D` value (four doubles: `xmin`, `ymin`, `xmax`, `ymax`).

`ST_Box2D` is the typed counterpart to [ST_Envelope](../../Bounding-Box-Functions/ST_Envelope.md). `ST_Envelope` returns a `Geometry` polygon; `ST_Box2D` returns a `Box2D` value that serialises to a struct of four non-nullable doubles and round-trips through Parquet without WKB overhead.
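
As a rough illustration of "four doubles" versus a polygon, the sketch computes a bounding box from a vertex list and compares raw byte counts: a `Box2D` needs 32 bytes of doubles, while the same rectangle encoded as a 2D WKB polygon (one ring of five points) needs 93 bytes. The helpers are hypothetical, treating an empty input as `NULL` is an assumption, and actual on-disk sizes depend on Parquet encoding and compression.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple


class Box2D(NamedTuple):
    """Hypothetical stand-in for a planar Box2D value."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def box2d_of(coords: List[Tuple[float, float]]) -> Optional[Box2D]:
    """Planar bounding box of a geometry's vertices."""
    if not coords:
        return None  # assumption: an empty geometry has no box, i.e. SQL NULL
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return Box2D(min(xs), min(ys), max(xs), max(ys))


# Raw sizes before any Parquet encoding or compression.
BOX_BYTES = struct.calcsize("<4d")  # four little-endian doubles


def wkb_polygon_bytes(ring_points: int) -> int:
    """Size of a single-ring 2D WKB polygon: byte order (1), geometry type (4),
    ring count (4), point count (4), then 16 bytes per (x, y) point."""
    return 1 + 4 + 4 + 4 + 16 * ring_points
```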

@jiayuasu (Member, Author)

Fixed in 3ef5ef5 — same softening on ST_Box2D.md.

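Comment thread docs/api/sql/box2d/Box2D-Functions.md

| Function | Return type | Description | Since |
| :--- | :--- | :--- | :--- |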
| [ST_Expand](Box2D-Functions/ST_Expand.md) | Box2D | Expand a Box2D by a per-axis or uniform delta. | v1.9.1 |
| [ST_AsText](Box2D-Functions/ST_AsText.md) | String | Return the `BOX(xmin ymin, xmax ymax)` text representation of a Box2D. | v1.9.1 |
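
The two functions in this table can be sketched in a few lines. This is a hypothetical Python model: `st_expand` applies `dx` to the x bounds and `dy` to the y bounds, with the uniform form reusing one delta for both axes; `st_as_text` follows the `BOX(xmin ymin, xmax ymax)` template from the table, but the exact number formatting (Python's `repr` here) is an assumption, and the behaviour for negative deltas that would invert a box is not modelled.

```python
from typing import NamedTuple, Optional


class Box2D(NamedTuple):
    """Hypothetical stand-in for a planar Box2D value."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def st_expand(box: Box2D, dx: float, dy: Optional[float] = None) -> Box2D:
    """Grow `box` by dx on each x side and dy on each y side (dy defaults to dx)."""
    if dy is None:
        dy = dx  # uniform form: one delta for both axes
    return Box2D(box.xmin - dx, box.ymin - dy, box.xmax + dx, box.ymax + dy)


def st_as_text(box: Box2D) -> str:
    """Render the box with the BOX(xmin ymin, xmax ymax) template."""
    return f"BOX({box.xmin!r} {box.ymin!r}, {box.xmax!r} {box.ymax!r})"


if __name__ == "__main__":
    grown = st_expand(Box2D(0.0, 0.0, 2.0, 1.0), 1.0)
    print(st_as_text(grown))  # BOX(-1.0 -1.0, 3.0 2.0)
```

Note that the delta is applied on both sides of each axis, so the box's width grows by `2 * dx` and its height by `2 * dy`.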

@jiayuasu (Member, Author)

Added in 3ef5ef5: new docs/api/sql/Aggregate-Functions/ST_Extent.md (the function is already SQL-registered via Catalog.aggregateExpressions; only the docs page was missing). Cross-referenced from Geometry-Functions.md's Aggregate Functions table (Geometry input) and from a new "Box2D Aggregates" subsection on Box2D-Functions.md (Box2D output).
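
`ST_Extent` takes Geometry input and produces a single `Box2D` covering every row. The sketch models that fold in Python over per-row boxes (as `ST_Box2D` would produce them); it is illustrative only, and skipping `NULL` rows and returning `NULL` when every row is `NULL` are assumptions based on the usual SQL aggregate convention.

```python
from functools import reduce
from typing import Iterable, NamedTuple, Optional


class Box2D(NamedTuple):
    """Hypothetical stand-in for a planar Box2D value."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def merge(a: Box2D, b: Box2D) -> Box2D:
    """Smallest box covering both inputs (associative and commutative)."""
    return Box2D(min(a.xmin, b.xmin), min(a.ymin, b.ymin),
                 max(a.xmax, b.xmax), max(a.ymax, b.ymax))


def st_extent(boxes: Iterable[Optional[Box2D]]) -> Optional[Box2D]:
    """Aggregate per-row boxes, ignoring NULL (None) rows."""
    present = [b for b in boxes if b is not None]
    if not present:
        return None
    return reduce(merge, present)
```

Because `merge` is associative and commutative, partial extents computed on different partitions can be combined in any order and still give the same result, which is what lets an aggregate like this run as a distributed partial-then-final aggregation.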

Comment thread mkdocs.yml
- Quick start: api/sql/Overview.md
- Vector data:
    - Geometry Functions: api/sql/Geometry-Functions.md
    - Box2D Functions: api/sql/box2d/Box2D-Functions.md

@jiayuasu (Member, Author)

Fixed in 3ef5ef5 — added "Box2D Functions: Box2D 函数" under nav_translations alongside the Geometry / Geography entries.

- Correct the Box2D filter pushdown source-of-truth. The auto-generated
  `<geom>_bbox` covering column is written as a plain
  `struct<xmin,ymin,xmax,ymax>`, not as Box2DUDT, so ST_BoxIntersects /
  ST_BoxContains do not target it directly. Rewrote the paragraph to
  point users at `ST_Box2D(geom)` (or the SQL cast) for row-group-level
  pushdown, and at the existing file-metadata pushdown for the
  auto-generated column.
- Document the dual-flag gating: `spark.sedona.geoparquet.spatialFilterPushDown`
  controls whether the rule injects the spatial predicate at all;
  `spark.sql.parquet.filterPushdown` controls whether Parquet honours
  it. Disabling either disables Box2D pushdown.
- Soften the ST_Envelope-vs-ST_Box2D contrast on both Box2D-Functions.md
  and ST_Box2D.md — JTS `Geometry.getEnvelope()` returns Point or
  LineString for degenerate inputs, not always a polygon.
- Add an ST_Extent aggregate page under `Aggregate-Functions/` (the
  function is already SQL-registered via `Catalog.aggregateExpressions`;
  the docs page was just missing) and reference it from:
    * `Geometry-Functions.md` Aggregate Functions table (Geometry input)
    * `Box2D-Functions.md` new "Box2D Aggregates" subsection (Box2D
      output, so the page is discoverable from the Box2D reference too).
- Localise the new "Box2D Functions" nav label under
  `nav_translations` so the Chinese build doesn't show an
  English-only outlier between the localised Geometry / Geography
  entries.

Verified locally: `mkdocs build --strict` produces the ST_Extent page
and the updated Box2D pages cleanly; remaining warnings are pre-existing
and unrelated.