[GH-2857] Fix ST_S2CellIDs under-covering planar polygons along non-meridional edges#2863
Conversation
b16d7f6 to
dd4aa50
Compare
There was a problem hiding this comment.
Pull request overview
Fixes ST_S2CellIDs under-covering planar polygons by compensating for the planar (JTS) vs spherical (S2 great-circle) edge interpretation mismatch, ensuring the returned S2 covering contains the original input geometry.
Changes:
- Add a great-circle vs chord deviation bound and JTS buffer step before converting geometries to S2 regions.
- Add unit tests asserting S2-cell-derived coverage contains various geometry types (polygon, line, multi-*, holes, high-latitude).
- Update SQL/Flink/Snowflake docs with a “Planar input, spherical cells” note and add a before/after visualization image.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
common/src/main/java/org/apache/sedona/common/utils/S2Utils.java |
Buffers input geometry by a computed deviation bound before creating an S2Region. |
common/src/test/java/org/apache/sedona/common/FunctionsTest.java |
Adds coverage-containment regression tests for GH-2857 across geometry types. |
docs/api/sql/Spatial-Indexing/ST_S2CellIDs.md |
Documents planar-vs-spherical behavior and LineString corridor side effect. |
docs/api/flink/Spatial-Indexing/ST_S2CellIDs.md |
Same documentation note for Flink API. |
docs/api/snowflake/vector-data/Spatial-Indexing/ST_S2CellIDs.md |
Same documentation note for Snowflake API. |
docs/image/GH-2857/coverage_before_after.png |
Adds visual illustration of before/after coverage for GH-2857. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… non-meridional edges Sedona geometry-type objects are planar (edges are straight lines in (lng, lat) space) but S2 connects the same vertices with great-circle arcs. Along non-meridional edges the great-circle arc bulges poleward of the planar chord, so handing JTS vertices to S2 directly produces a spherical polygon whose interior is smaller than the planar polygon's. The S2 covering of that smaller polygon misses thin slivers of the original planar input — about 0.92% of the reporter's polygon at level 12. Compensate by JTS-buffering the input by an upper bound on the great-circle/chord deviation before converting to S2: - Polygon: per-edge midpoint deviation across exterior + interior rings (buffer offsets edges in place, so per-edge is tight). - LineString: bbox-corner pairs (SW-NE diagonal, NW-SE diagonal, worst-case east-west). The buffer turns the line into a polygon corridor whose long edges span the line's full envelope, which per-segment analysis under-bounds. Adds tests covering the reporter's polygon, lines, multi-geometries, geometry collections, polygons with holes, and high-latitude polygons. Updates SQL/Flink/Snowflake ST_S2CellIDs docs with a note explaining the planar-vs-spherical mismatch and the LineString corridor side effect. Closes apache#2857
dd4aa50 to
9850086
Compare
- edgeDeviationDegrees: compute chord midpoint via wrapped longitude delta so antimeridian-spanning edges (e.g. 179° → -179°) don't yield a bogus ~180° deviation and the buffer doesn't explode. - toS2Region: skip the JTS buffer entirely when the input's envelope spans >180° in longitude. JTS planar buffer doesn't understand antimeridian crossing and would produce a polygon that goes the wrong way around the globe; antimeridian-crossing inputs don't suffer the original miscoverage anyway since their edges are short. - envelopeDeviationDegrees: defense-in-depth fallback to per-segment ringMaxDeviationDegrees when env.getWidth() > 180. - toS2Region: when JTS buffer collapses to MultiPolygon (rare, only for self-touching inputs), build a single S2Polygon containing every component's loops instead of dropping smaller pieces — keeps the containment guarantee for every shell. - testS2CoverageContainsInput: only compute the expensive ~50k-cell difference on failure to surface a useful diagnostic; happy path skips the union/difference work. - testS2CoverageContainsHighLatitudePolygon: comment now matches the actual implementation (no rigid cap, no latitude clamp). - New test testS2CoverageContainsAntimeridianLine locks in correct behavior and a sane cell count for antimeridian-crossing input.
Reference both the conceptual SVG (explains why) and the empirical PNG (shows the actual cell coverage on a real polygon at level 12) from the SQL/Flink/Snowflake doc pages.
Re-render the before/after PNG with panel titles "Without buffer" / "With JTS buffer (current behavior)" instead of "Before fix" / "After fix (this PR)", and rephrase the surrounding doc captions to match. Image and text are now ongoing function documentation, not PR-specific artifacts.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The 'Run manual hooks' CI job runs oxipng on PNGs and fails when files aren't already optimized. Saves 20.9 KiB (24.3%).
…omment - Replace envelope-width antimeridian heuristic with per-edge detection (any edge with |Δlng| > 180°). The envelope-width version also fired for wide non-straddling polygons (e.g. spanning -100° to +100°), which would have skipped the buffer and reintroduced the miscoverage on their long east-west edges. Add test testS2CoverageContainsWideNonAntimeridianPolygon to lock that in. - Tighten the lazy-diff comment in testS2CoverageContainsInput: the union itself is unavoidable for exact containment, only the difference is gated on failure.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Drop the redundant 'LinearRing' from the rejected-type list since LinearRing extends LineString, so 'instanceof LineString' already matches. Add a parenthetical so callers passing LinearRing know it's supported via that branch.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…omments - Tessellate the wide non-antimeridian polygon test with intermediate vertices ((-100 30, -10 30, 100 30, 100 50, 10 50, -100 50, -100 30)) so every edge has |Δlng| < 180°. The previous WKT had edges of Δlng=200° that the per-edge antimeridian check would (correctly) flag as crossing — so the test was passing for the wrong reason (buffer skipped) instead of exercising the intended wide-but-not- crossing path (buffer applied). - Reword the per-edge-vs-envelope-width comment in toS2Region to drop the misleading example (a coarse -100°→+100° polygon is classified as crossing under both heuristics; only tessellated wide polygons differ). - Update edgeDeviationDegrees Javadoc to describe the wrapped-delta longitude midpoint logic instead of the stale 'plain Euclidean mean' description.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- crossesAntimeridian Javadoc: drop misleading -100°→+100° example, describe the actual differentiating case (tessellated wide geometries whose envelope > 180° but every edge < 180°). Mirrors the comment cleanup already applied in toS2Region. - edgeDeviationDegrees: throw IllegalArgumentException for antipodal (or near-antipodal) endpoints instead of silently returning 0. The great circle through antipodal points is not unique, so a 0 deviation underestimates the buffer and could break the toS2Region containment guarantee. Functions.s2CellIDs already wraps toS2Region in try/catch (IllegalArgumentException) with a per-coordinate cell fallback, so callers degrade gracefully.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 7 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes [BUG]: ST_S2CellIDs + ST_S2ToGeom does not cover the source polygon #2857What changes were proposed in this PR?
ST_S2CellIDsreturned a covering that under-covered the planar input polygon along long non-meridional edges (~0.92% of the reporter's polygon at level 12). Sedona geometry-type objects are planar — an edge between two vertices is a straight line in(longitude, latitude)space — but S2 connects the same vertices with great-circle arcs on the sphere. The two interpretations agree at the vertices but diverge along the edges (a great-circle arc between two points at the same northern latitude bulges poleward of the planar chord). Handing JTS vertices to S2 directly produces a spherical polygon whose interior is smaller than the planar polygon's, and the S2 covering of that smaller polygon misses thin slivers of the original input.This PR JTS-buffers the input by an upper bound on the worst-case great-circle/chord deviation before converting to S2:
Multi-geometries are decomposed by
GeomUtils.extractGeometryCollectionupstream and each leaf is buffered/covered independently. When a JTS buffer collapses a self-touching polygon to aMultiPolygon, every component's loops are kept in a singleS2Polygonso the containment guarantee holds for every shell.The conceptual diagram (also added to the function's docs) shows why a JTS polygon and the S2 covering of it can disagree, and how the buffer ensures the buffered polygon's spherical interpretation still contains the original planar one:
The empirical plot below renders the cell coverage of the reporter's polygon at level 12. Blue: union of returned cell geometries. Dark blue: input outline. Red: planar input area not covered.
Status of the reported bug
coverage.covers(input)Side effect for
LineStringAfter the buffer, a line input becomes a polygon corridor before S2 sees it. Returned cells therefore cover a thin strip around the line rather than only cells the line geometrically traverses. This is documented in the new "Planar input, spherical cells" note in the SQL / Flink / Snowflake
ST_S2CellIDspages. Use a sufficiently finelevelto keep the corridor narrow.How was this patch tested?
mvn -pl common test -Dtest=FunctionsTest— 285 tests, all green.New test cases in
common/src/test/java/org/apache/sedona/common/FunctionsTest.java:testS2CoverageContainsInput— the reporter's exact polygon at level 12; reproduces0.20349708 deg² uncoveredon master, passes after the fix. The expensiveunion/differenceonly runs on failure to surface a diagnostic.testS2CoverageContainsLineString— long east-west line at mid-latitude.testS2CoverageContainsMultiPolygon— disjoint polygons, decomposition + per-leaf buffer.testS2CoverageContainsMultiLineString— north / south / diagonal multi-segment lines.testS2CoverageContainsGeometryCollection— heterogeneous Polygon + LineString.testS2CoverageContainsPolygonWithHole— exterior + interior ring deviations.testS2CoverageContainsHighLatitudePolygon— 80°N polygon with 60° east-west edges.testS2CoverageContainsAntimeridianLine— line crossing the antimeridian; locks in correct chord-midpoint handling and a sane cell count.Existing tests (including
S2CellIDsspatial-join correctness, MBR equivalence, single-cell polygons, lines, points) continue to pass.Did this PR include necessary documentation updates?
docs/image/ST_S2CellIDs/planar_vs_spherical_edge.svganddocs/image/ST_S2CellIDs/coverage_before_after.png) toST_S2CellIDs.mdfor SQL, Flink, and Snowflake explaining the planar-vs-spherical mismatch, when no extra cells are added (meridional/equator-aligned edges), and the LineString corridor consequence. No public API surface changed.