Skip to content

[GH-2857] Fix ST_S2CellIDs under-covering planar polygons along non-meridional edges#2863

Merged
jiayuasu merged 9 commits into
apache:masterfrom
jiayuasu:fix/s2-cellids-coverage
Apr 27, 2026
Merged

[GH-2857] Fix ST_S2CellIDs under-covering planar polygons along non-meridional edges#2863
jiayuasu merged 9 commits into
apache:masterfrom
jiayuasu:fix/s2-cellids-coverage

Conversation

@jiayuasu

@jiayuasu jiayuasu commented Apr 26, 2026

Copy link
Copy Markdown
Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

ST_S2CellIDs returned a covering that under-covered the planar input polygon along long non-meridional edges (~0.92% of the reporter's polygon at level 12). Sedona geometry-type objects are planar — an edge between two vertices is a straight line in (longitude, latitude) space — but S2 connects the same vertices with great-circle arcs on the sphere. The two interpretations agree at the vertices but diverge along the edges (a great-circle arc between two points at the same northern latitude bulges poleward of the planar chord). Handing JTS vertices to S2 directly produces a spherical polygon whose interior is smaller than the planar polygon's, and the S2 covering of that smaller polygon misses thin slivers of the original input.

This PR JTS-buffers the input by an upper bound on the worst-case great-circle/chord deviation before converting to S2:

  • Polygon: per-edge midpoint deviation across exterior + interior rings. Buffering offsets edges in place, so per-edge analysis is tight.
  • LineString: bbox-corner pairs (SW–NE diagonal, NW–SE diagonal, worst-case east-west). Buffering turns the line into a polygon corridor whose long edges span the line's full envelope (especially for collinear segments JTS simplifies away), which per-segment analysis under-bounds. Diagonal great-circle arcs deviate more than east-west arcs at high latitudes, so we probe both.
  • A 1.5× safety multiplier absorbs numerical error and the small additional deviation the buffered polygon's own (slightly different) edges introduce.
  • Antimeridian-crossing inputs (envelope width > 180°) skip the buffer because JTS planar buffer doesn't understand antimeridian crossing and would produce a polygon going the wrong way around the globe; their edges are also short under the correct interpretation, so the original miscoverage doesn't apply.

Multi-geometries are decomposed by GeomUtils.extractGeometryCollection upstream and each leaf is buffered/covered independently. When a JTS buffer collapses a self-touching polygon to a MultiPolygon, every component's loops are kept in a single S2Polygon so the containment guarantee holds for every shell.

The conceptual diagram (also added to the function's docs) shows why a JTS polygon and the S2 covering of it can disagree, and how the buffer ensures the buffered polygon's spherical interpretation still contains the original planar one:

Planar polygon vs S2's great-circle interpretation

The empirical plot below renders the cell coverage of the reporter's polygon at level 12. Blue: union of returned cell geometries. Dark blue: input outline. Red: planar input area not covered.

Cell coverage before and after the fix

Status of the reported bug

Metric Before (master) After (this PR)
Original polygon area 22.1941 deg² 22.1941 deg²
Cells returned at level 12 49,086 53,192 (+8.4%)
Uncovered area 0.20349708 deg² 0.0
Uncovered fraction 0.916897% 0.000000%
coverage.covers(input) false true

Side effect for LineString

After the buffer, a line input becomes a polygon corridor before S2 sees it. Returned cells therefore cover a thin strip around the line rather than only cells the line geometrically traverses. This is documented in the new "Planar input, spherical cells" note in the SQL / Flink / Snowflake ST_S2CellIDs pages. Use a sufficiently fine level to keep the corridor narrow.

How was this patch tested?

mvn -pl common test -Dtest=FunctionsTest — 285 tests, all green.

New test cases in common/src/test/java/org/apache/sedona/common/FunctionsTest.java:

  • testS2CoverageContainsInput — the reporter's exact polygon at level 12; reproduces 0.20349708 deg² uncovered on master, passes after the fix. The expensive union/difference only runs on failure to surface a diagnostic.
  • testS2CoverageContainsLineString — long east-west line at mid-latitude.
  • testS2CoverageContainsMultiPolygon — disjoint polygons, decomposition + per-leaf buffer.
  • testS2CoverageContainsMultiLineString — north / south / diagonal multi-segment lines.
  • testS2CoverageContainsGeometryCollection — heterogeneous Polygon + LineString.
  • testS2CoverageContainsPolygonWithHole — exterior + interior ring deviations.
  • testS2CoverageContainsHighLatitudePolygon — 80°N polygon with 60° east-west edges.
  • testS2CoverageContainsAntimeridianLine — line crossing the antimeridian; locks in correct chord-midpoint handling and a sane cell count.

Existing tests (including S2CellIDs spatial-join correctness, MBR equivalence, single-cell polygons, lines, points) continue to pass.

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation. Added a "Planar input, spherical cells" note plus a conceptual SVG and an empirical PNG (docs/image/ST_S2CellIDs/planar_vs_spherical_edge.svg and docs/image/ST_S2CellIDs/coverage_before_after.png) to ST_S2CellIDs.md for SQL, Flink, and Snowflake explaining the planar-vs-spherical mismatch, when no extra cells are added (meridional/equator-aligned edges), and the LineString corridor consequence. No public API surface changed.

@jiayuasu jiayuasu added this to the sedona-1.9.1 milestone Apr 26, 2026
@jiayuasu jiayuasu requested a review from Copilot April 26, 2026 22:43
@jiayuasu jiayuasu force-pushed the fix/s2-cellids-coverage branch from b16d7f6 to dd4aa50 Compare April 26, 2026 22:44

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes ST_S2CellIDs under-covering planar polygons by compensating for the planar (JTS) vs spherical (S2 great-circle) edge interpretation mismatch, ensuring the returned S2 covering contains the original input geometry.

Changes:

  • Add a great-circle vs chord deviation bound and JTS buffer step before converting geometries to S2 regions.
  • Add unit tests asserting S2-cell-derived coverage contains various geometry types (polygon, line, multi-*, holes, high-latitude).
  • Update SQL/Flink/Snowflake docs with a “Planar input, spherical cells” note and add a before/after visualization image.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
common/src/main/java/org/apache/sedona/common/utils/S2Utils.java Buffers input geometry by a computed deviation bound before creating an S2Region.
common/src/test/java/org/apache/sedona/common/FunctionsTest.java Adds coverage-containment regression tests for GH-2857 across geometry types.
docs/api/sql/Spatial-Indexing/ST_S2CellIDs.md Documents planar-vs-spherical behavior and LineString corridor side effect.
docs/api/flink/Spatial-Indexing/ST_S2CellIDs.md Same documentation note for Flink API.
docs/api/snowflake/vector-data/Spatial-Indexing/ST_S2CellIDs.md Same documentation note for Snowflake API.
docs/image/GH-2857/coverage_before_after.png Adds visual illustration of before/after coverage for GH-2857.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread common/src/test/java/org/apache/sedona/common/FunctionsTest.java Outdated
Comment thread common/src/test/java/org/apache/sedona/common/FunctionsTest.java Outdated
Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java
Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java Outdated
… non-meridional edges

Sedona geometry-type objects are planar (edges are straight lines in
(lng, lat) space) but S2 connects the same vertices with great-circle
arcs. Along non-meridional edges the great-circle arc bulges poleward of
the planar chord, so handing JTS vertices to S2 directly produces a
spherical polygon whose interior is smaller than the planar polygon's.
The S2 covering of that smaller polygon misses thin slivers of the
original planar input — about 0.92% of the reporter's polygon at
level 12.

Compensate by JTS-buffering the input by an upper bound on the
great-circle/chord deviation before converting to S2:

- Polygon: per-edge midpoint deviation across exterior + interior rings
  (buffer offsets edges in place, so per-edge is tight).
- LineString: bbox-corner pairs (SW-NE diagonal, NW-SE diagonal,
  worst-case east-west). The buffer turns the line into a polygon
  corridor whose long edges span the line's full envelope, which
  per-segment analysis under-bounds.

Adds tests covering the reporter's polygon, lines, multi-geometries,
geometry collections, polygons with holes, and high-latitude polygons.
Updates SQL/Flink/Snowflake ST_S2CellIDs docs with a note explaining
the planar-vs-spherical mismatch and the LineString corridor side
effect.

Closes apache#2857
@jiayuasu jiayuasu force-pushed the fix/s2-cellids-coverage branch from dd4aa50 to 9850086 Compare April 26, 2026 22:48
- edgeDeviationDegrees: compute chord midpoint via wrapped longitude delta
  so antimeridian-spanning edges (e.g. 179° → -179°) don't yield a bogus
  ~180° deviation and the buffer doesn't explode.
- toS2Region: skip the JTS buffer entirely when the input's envelope spans
  >180° in longitude. JTS planar buffer doesn't understand antimeridian
  crossing and would produce a polygon that goes the wrong way around the
  globe; antimeridian-crossing inputs don't suffer the original
  miscoverage anyway since their edges are short.
- envelopeDeviationDegrees: defense-in-depth fallback to per-segment
  ringMaxDeviationDegrees when env.getWidth() > 180.
- toS2Region: when JTS buffer collapses to MultiPolygon (rare, only for
  self-touching inputs), build a single S2Polygon containing every
  component's loops instead of dropping smaller pieces — keeps the
  containment guarantee for every shell.
- testS2CoverageContainsInput: only compute the expensive ~50k-cell
  difference on failure to surface a useful diagnostic; happy path skips
  the union/difference work.
- testS2CoverageContainsHighLatitudePolygon: comment now matches the
  actual implementation (no rigid cap, no latitude clamp).
- New test testS2CoverageContainsAntimeridianLine locks in correct
  behavior and a sane cell count for antimeridian-crossing input.
Reference both the conceptual SVG (explains why) and the empirical PNG
(shows the actual cell coverage on a real polygon at level 12) from
the SQL/Flink/Snowflake doc pages.
Re-render the before/after PNG with panel titles "Without buffer" /
"With JTS buffer (current behavior)" instead of "Before fix" /
"After fix (this PR)", and rephrase the surrounding doc captions to
match. Image and text are now ongoing function documentation, not
PR-specific artifacts.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread common/src/test/java/org/apache/sedona/common/FunctionsTest.java Outdated
Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java Outdated
Comment thread common/src/test/java/org/apache/sedona/common/FunctionsTest.java
The 'Run manual hooks' CI job runs oxipng on PNGs and fails when files
aren't already optimized. Saves 20.9 KiB (24.3%).
…omment

- Replace envelope-width antimeridian heuristic with per-edge detection
  (any edge with |Δlng| > 180°). The envelope-width version also fired
  for wide non-straddling polygons (e.g. spanning -100° to +100°),
  which would have skipped the buffer and reintroduced the miscoverage
  on their long east-west edges. Add test
  testS2CoverageContainsWideNonAntimeridianPolygon to lock that in.
- Tighten the lazy-diff comment in testS2CoverageContainsInput: the
  union itself is unavoidable for exact containment, only the
  difference is gated on failure.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java
Drop the redundant 'LinearRing' from the rejected-type list since
LinearRing extends LineString, so 'instanceof LineString' already
matches. Add a parenthetical so callers passing LinearRing know it's
supported via that branch.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread common/src/test/java/org/apache/sedona/common/FunctionsTest.java Outdated
Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java Outdated
Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java Outdated
…omments

- Tessellate the wide non-antimeridian polygon test with intermediate
  vertices ((-100 30, -10 30, 100 30, 100 50, 10 50, -100 50, -100 30))
  so every edge has |Δlng| < 180°. The previous WKT had edges of
  Δlng=200° that the per-edge antimeridian check would (correctly)
  flag as crossing — so the test was passing for the wrong reason
  (buffer skipped) instead of exercising the intended wide-but-not-
  crossing path (buffer applied).
- Reword the per-edge-vs-envelope-width comment in toS2Region to drop
  the misleading example (a coarse -100°→+100° polygon is classified
  as crossing under both heuristics; only tessellated wide polygons
  differ).
- Update edgeDeviationDegrees Javadoc to describe the wrapped-delta
  longitude midpoint logic instead of the stale 'plain Euclidean mean'
  description.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java Outdated
- crossesAntimeridian Javadoc: drop misleading -100°→+100° example,
  describe the actual differentiating case (tessellated wide geometries
  whose envelope > 180° but every edge < 180°). Mirrors the comment
  cleanup already applied in toS2Region.
- edgeDeviationDegrees: throw IllegalArgumentException for antipodal
  (or near-antipodal) endpoints instead of silently returning 0. The
  great circle through antipodal points is not unique, so a 0
  deviation underestimates the buffer and could break the toS2Region
  containment guarantee. Functions.s2CellIDs already wraps toS2Region
  in try/catch (IllegalArgumentException) with a per-coordinate cell
  fallback, so callers degrade gracefully.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/api/snowflake/vector-data/Spatial-Indexing/ST_S2CellIDs.md
Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java
Comment thread common/src/main/java/org/apache/sedona/common/utils/S2Utils.java
Comment thread docs/api/sql/Spatial-Indexing/ST_S2CellIDs.md
Comment thread docs/api/flink/Spatial-Indexing/ST_S2CellIDs.md
@jiayuasu jiayuasu merged commit b1fd9fc into apache:master Apr 27, 2026
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: ST_S2CellIDs + ST_S2ToGeom does not cover the source polygon

2 participants