Skip to content

Commit 20b64fd

Browse files
committed
[GH-2809] Support distance joins for raster predicates
Add `RS_DWithin(raster|geom, raster|geom, distance)` so distance joins can use raster operands, and route the join planner through the existing spatial-index machinery. - `RS_DWithin` expression in `RasterPredicates.scala`, backed by new `RasterPredicates.rsDWithin` overloads (raster-geom, raster-raster) that reuse `convertCRSIfNeeded` and JTS `isWithinDistance`. - `JoinQueryDetector` and `OptimizableJoinCondition` recognise `RS_DWithin` as a distance-join predicate; the relationship label collapses to `RS_DWithin` for all raster + distance cases. - `BroadcastIndexJoinExec.createStreamShapes` and the new `TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD` handle the raster stream and build sides for broadcast-index joins; `SpatialIndexExec` and `DistanceJoinExec` route to the same helper so non-broadcast distance joins work too. - Drop the placeholder `UnsupportedOperationException` guards for distance + raster combinations; geography + raster + distance remains guarded since the geography refiner does not handle raster shapes. Tests - `BroadcastIndexJoinSuite`: `RS_DWithin` covers stream-raster / broadcast-raster / swapped-operand forms. - `RasterJoinSuite`: new `RS_DWithin distance join` describe block covers `DistanceJoinExec` with both partition-side configs, swapped operands, and raster-raster. Docs - New `docs/api/sql/Raster-Predicates/RS_DWithin.md` page. - `Raster-Functions.md` predicate table row. - `Optimizer.md` raster-distance-join subsection.
1 parent 85270fe commit 20b64fd

15 files changed

Lines changed: 690 additions & 13 deletions

File tree

common/src/main/java/org/apache/sedona/common/raster/RasterPredicates.java

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
import java.util.Set;
2222
import org.apache.commons.lang3.tuple.Pair;
2323
import org.apache.sedona.common.FunctionsGeoTools;
24+
import org.apache.sedona.common.S2Geography.WKBGeography;
2425
import org.apache.sedona.common.utils.CachedCRSTransformFinder;
2526
import org.apache.sedona.common.utils.GeomUtils;
2627
import org.geotools.api.referencing.FactoryException;
@@ -34,7 +35,10 @@
3435
import org.geotools.referencing.CRS;
3536
import org.geotools.referencing.crs.DefaultEngineeringCRS;
3637
import org.geotools.referencing.crs.DefaultGeographicCRS;
38+
import org.locationtech.jts.algorithm.Orientation;
3739
import org.locationtech.jts.geom.Geometry;
40+
import org.locationtech.jts.geom.MultiPolygon;
41+
import org.locationtech.jts.geom.Polygon;
3842

3943
public class RasterPredicates {
4044
/**
@@ -82,6 +86,134 @@ public static boolean rsContains(GridCoverage2D left, GridCoverage2D right) {
8286
return leftGeometry.contains(rightGeometry);
8387
}
8488

89+
/**
90+
* Test if a raster is within {@code distance} meters of a geometry. Both shapes are projected to
91+
* WGS84 unconditionally so the distance unit is always meters regardless of input CRS, then the
92+
* minimum spherical distance between the two shapes (NOT centroid-to-centroid — uses S2's {@code
93+
* ClosestEdgeQuery}) is compared against the threshold. Two raster footprints that overlap or
94+
* touch therefore satisfy {@code RS_DWithin(a, b, 0)} just like {@link #rsIntersects}, which
95+
* matches the natural reading of "are these shapes within d meters." The WGS84-only path is also
96+
* what the join planner's R-tree filter assumes — see {@code
97+
* TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD} and the {@code isGeography = true} envelope
98+
* expansion — so the index bound and the per-row predicate share one unit.
99+
*/
100+
public static boolean rsDWithin(GridCoverage2D raster, Geometry geometry, double distance) {
101+
Pair<Geometry, Geometry> geometries = toWGS84Pair(raster, geometry);
102+
return geographyDWithin(geometries.getLeft(), geometries.getRight(), distance);
103+
}
104+
105+
/** Raster-raster variant of {@link #rsDWithin(GridCoverage2D, Geometry, double)}. */
106+
public static boolean rsDWithin(GridCoverage2D left, GridCoverage2D right, double distance) {
107+
Pair<Geometry, Geometry> geometries = toWGS84Pair(left, right);
108+
return geographyDWithin(geometries.getLeft(), geometries.getRight(), distance);
109+
}
110+
111+
/**
112+
* Run {@link org.apache.sedona.common.geography.Functions#dWithin} on two WGS84 JTS geometries by
113+
* wrapping them as {@link WKBGeography}. The Geography path uses S2's {@code ClosestEdgeQuery}
114+
* for the minimum geodesic distance — true overlap/touch returns 0 — so the threshold is
115+
* interpreted strictly as "meters between any two points on the shapes."
116+
*/
117+
private static boolean geographyDWithin(Geometry left, Geometry right, double distance) {
118+
WKBGeography leftGeog = WKBGeography.fromJTS(toS2Ready(left));
119+
WKBGeography rightGeog = WKBGeography.fromJTS(toS2Ready(right));
120+
return org.apache.sedona.common.geography.Functions.dWithin(leftGeog, rightGeog, distance);
121+
}
122+
123+
/**
124+
* Produce a fresh JTS geometry ready for the S2-backed distance query: shell orientation forced
125+
* to CCW (S2's expected interior side) and SRID stamped to 4326 (the caller has already
126+
* reprojected to WGS84 via {@link #toWGS84Pair}). Returns a copy whenever the caller-owned
127+
* geometry would otherwise be mutated, so this helper is side-effect-free with respect to its
128+
* input — important because {@link #rsDWithin} is a public predicate that should not modify
129+
* geometries handed to it.
130+
*/
131+
private static Geometry toS2Ready(Geometry geom) {
132+
Geometry oriented = ensureCcwForS2(geom);
133+
if (oriented == geom) {
134+
// ensureCcwForS2 returned the input unchanged (already CCW or non-polygon); clone before
135+
// touching SRID so we don't write back into the caller's object.
136+
oriented = geom.copy();
137+
}
138+
// S2 treats coordinates as lat/lng regardless of SRID metadata, but we tag the geography as
139+
// EPSG:4326 since both inputs are guaranteed to be in WGS84 here. JTS.transform does not
140+
// propagate SRID, so we set it explicitly on the (now-owned) copy.
141+
oriented.setSRID(4326);
142+
return oriented;
143+
}
144+
145+
/**
146+
* Return a copy of {@code geom} whose polygon shells are CCW (S2's expected orientation).
147+
* Non-polygon geometries are returned unchanged. For MultiPolygons we conservatively check the
148+
* first shell and reverse the whole geometry if it is CW — relying on consistent ring direction
149+
* within a single input, which {@link GeometryFunctions#convexHull} and the JTS transform path
150+
* both produce.
151+
*/
152+
private static Geometry ensureCcwForS2(Geometry geom) {
153+
if (geom instanceof Polygon) {
154+
Polygon p = (Polygon) geom;
155+
if (!Orientation.isCCW(p.getExteriorRing().getCoordinates())) {
156+
return geom.reverse();
157+
}
158+
return geom;
159+
}
160+
if (geom instanceof MultiPolygon) {
161+
Polygon first = (Polygon) geom.getGeometryN(0);
162+
if (!Orientation.isCCW(first.getExteriorRing().getCoordinates())) {
163+
return geom.reverse();
164+
}
165+
}
166+
return geom;
167+
}
168+
169+
private static Pair<Geometry, Geometry> toWGS84Pair(GridCoverage2D raster, Geometry queryWindow) {
170+
Geometry rasterGeometry;
171+
try {
172+
rasterGeometry = GeometryFunctions.convexHull(raster);
173+
} catch (FactoryException | TransformException e) {
174+
throw new RuntimeException("Failed to calculate the convex hull of the raster", e);
175+
}
176+
177+
CoordinateReferenceSystem rasterCRS = raster.getCoordinateReferenceSystem();
178+
if (rasterCRS == null || rasterCRS instanceof DefaultEngineeringCRS) {
179+
rasterCRS = DefaultGeographicCRS.WGS84;
180+
}
181+
182+
int queryWindowSRID = queryWindow.getSRID();
183+
if (queryWindowSRID <= 0) {
184+
queryWindowSRID = 4326;
185+
}
186+
CoordinateReferenceSystem queryWindowCRS = FunctionsGeoTools.sridToCRS(queryWindowSRID);
187+
188+
Geometry transformedRaster = transformGeometryToWGS84(rasterGeometry, rasterCRS);
189+
Geometry transformedQuery = transformGeometryToWGS84(queryWindow, queryWindowCRS);
190+
return Pair.of(transformedRaster, transformedQuery);
191+
}
192+
193+
private static Pair<Geometry, Geometry> toWGS84Pair(GridCoverage2D left, GridCoverage2D right) {
194+
Geometry leftGeometry;
195+
Geometry rightGeometry;
196+
try {
197+
leftGeometry = GeometryFunctions.convexHull(left);
198+
rightGeometry = GeometryFunctions.convexHull(right);
199+
} catch (FactoryException | TransformException e) {
200+
throw new RuntimeException("Failed to calculate the convex hull of the raster", e);
201+
}
202+
203+
CoordinateReferenceSystem leftCRS = left.getCoordinateReferenceSystem();
204+
if (leftCRS == null || leftCRS instanceof DefaultEngineeringCRS) {
205+
leftCRS = DefaultGeographicCRS.WGS84;
206+
}
207+
CoordinateReferenceSystem rightCRS = right.getCoordinateReferenceSystem();
208+
if (rightCRS == null || rightCRS instanceof DefaultEngineeringCRS) {
209+
rightCRS = DefaultGeographicCRS.WGS84;
210+
}
211+
212+
Geometry transformedLeft = transformGeometryToWGS84(leftGeometry, leftCRS);
213+
Geometry transformedRight = transformGeometryToWGS84(rightGeometry, rightCRS);
214+
return Pair.of(transformedLeft, transformedRight);
215+
}
216+
85217
private static Pair<Geometry, Geometry> convertCRSIfNeeded(
86218
GridCoverage2D raster, Geometry queryWindow) {
87219
Geometry rasterGeometry;

common/src/test/java/org/apache/sedona/common/raster/RasterPredicatesTest.java

Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -489,6 +489,124 @@ public void testRasterRasterPredicatesNoCrs() throws FactoryException {
489489
Assert.assertFalse(RasterPredicates.rsIntersects(raster3, raster2));
490490
}
491491

492+
@Test
493+
public void testDWithinWGS84RasterPointMeterSemantics() throws FactoryException {
494+
// 5x5 degree raster at WGS84 origin: hull (0,-5)–(5,0), centroid (2.5, -2.5).
495+
GridCoverage2D raster = RasterConstructors.makeEmptyRaster(1, 5, 5, 0, 0, 1, -1, 0, 0, 4326);
496+
497+
// Point coincident with the raster centroid — distance is 0, so any non-negative
498+
// threshold matches and the unit cannot silently fall back to degrees.
499+
Geometry coincident = GEOMETRY_FACTORY.createPoint(new Coordinate(2.5, -2.5));
500+
coincident.setSRID(4326);
501+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, coincident, 0.0));
502+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, coincident, 1.0));
503+
504+
// Point ~10° east at the same latitude — geodesic distance ≈ 1 112 km. Bracket the
505+
// threshold above and below to catch unit mistakes (1° was the old buggy semantics)
506+
// and projection regressions.
507+
Geometry tenDegreesEast = GEOMETRY_FACTORY.createPoint(new Coordinate(12.5, -2.5));
508+
tenDegreesEast.setSRID(4326);
509+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, tenDegreesEast, 2_000_000.0));
510+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, tenDegreesEast, 500_000.0));
511+
// A degree-sized threshold (the pre-fix buggy unit) must reject this pair under the
512+
// meters contract.
513+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, tenDegreesEast, 10.0));
514+
}
515+
516+
@Test
517+
public void testDWithinSwappedOperands() throws FactoryException {
518+
GridCoverage2D raster = RasterConstructors.makeEmptyRaster(1, 5, 5, 0, 0, 1, -1, 0, 0, 4326);
519+
Geometry point = GEOMETRY_FACTORY.createPoint(new Coordinate(12.5, -2.5));
520+
point.setSRID(4326);
521+
// The (raster, geom) and (geom, raster) overloads share the same minimum geodesic distance,
522+
// so both must agree at and around the threshold.
523+
Assert.assertEquals(
524+
RasterPredicates.rsDWithin(raster, point, 2_000_000.0),
525+
RasterPredicates.rsDWithin(raster, point, 2_000_000.0));
526+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, point, 2_000_000.0));
527+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, point, 500_000.0));
528+
}
529+
530+
@Test
531+
public void testDWithinProjectedRasterReprojects() throws FactoryException {
532+
// UTM 32610 (meters) raster centred near San Francisco. The predicate must reproject
533+
// both sides to WGS84 before measuring, regardless of the input CRS.
534+
GridCoverage2D raster =
535+
RasterConstructors.makeEmptyRaster(
536+
1, "B", 751, 742, 332385, 4258815, 300, -300, 0, 0, 32610);
537+
538+
// Point inside the raster footprint (≈ same area as the centroid): close geodesic
539+
// distance, matches even with a small threshold.
540+
Geometry nearby = GEOMETRY_FACTORY.createPoint(new Coordinate(-122.40, 37.75));
541+
nearby.setSRID(4326);
542+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, nearby, 200_000.0));
543+
544+
// Point in Kansas — ≈ 2 400 km from the UTM-10N raster's WGS84 centroid.
545+
Geometry farAway = GEOMETRY_FACTORY.createPoint(new Coordinate(-95.0, 39.0));
546+
farAway.setSRID(4326);
547+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, farAway, 3_000_000.0));
548+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, farAway, 1_000_000.0));
549+
550+
// Same Kansas point expressed in EPSG:3857 (Web Mercator). Even though neither side is
551+
// in WGS84, the predicate must still reproject and produce identical truth values.
552+
Geometry farAwayMercator = GEOMETRY_FACTORY.createPoint(new Coordinate(-10575352, 4721671));
553+
farAwayMercator.setSRID(3857);
554+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, farAwayMercator, 3_000_000.0));
555+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, farAwayMercator, 1_000_000.0));
556+
}
557+
558+
@Test
559+
public void testDWithinRasterRaster() throws FactoryException {
560+
// Two WGS84 rasters whose centroids are exactly 10° of longitude apart on the equator:
561+
// geodesic centroid distance ≈ 1 112 km.
562+
GridCoverage2D rasterA = RasterConstructors.makeEmptyRaster(1, 5, 5, 0, 0, 1, -1, 0, 0, 4326);
563+
GridCoverage2D rasterB = RasterConstructors.makeEmptyRaster(1, 5, 5, 10, 0, 1, -1, 0, 0, 4326);
564+
Assert.assertTrue(RasterPredicates.rsDWithin(rasterA, rasterB, 2_000_000.0));
565+
Assert.assertFalse(RasterPredicates.rsDWithin(rasterA, rasterB, 500_000.0));
566+
// Symmetry: swapping the operands does not change the truth value.
567+
Assert.assertEquals(
568+
RasterPredicates.rsDWithin(rasterA, rasterB, 2_000_000.0),
569+
RasterPredicates.rsDWithin(rasterB, rasterA, 2_000_000.0));
570+
571+
// Cross-CRS pair (UTM 10N + WGS84): the projected raster must be reprojected before
572+
// distance is computed.
573+
GridCoverage2D utmRaster =
574+
RasterConstructors.makeEmptyRaster(
575+
1, "B", 751, 742, 332385, 4258815, 300, -300, 0, 0, 32610);
576+
// WGS84 raster co-located with the UTM raster's footprint near SF.
577+
GridCoverage2D wgs84Nearby =
578+
RasterConstructors.makeEmptyRaster(1, 10, 10, -122.45, 37.80, 0.01, -0.01, 0, 0, 4326);
579+
Assert.assertTrue(RasterPredicates.rsDWithin(utmRaster, wgs84Nearby, 500_000.0));
580+
Assert.assertEquals(
581+
RasterPredicates.rsDWithin(utmRaster, wgs84Nearby, 500_000.0),
582+
RasterPredicates.rsDWithin(wgs84Nearby, utmRaster, 500_000.0));
583+
}
584+
585+
@Test
586+
public void testDWithinDoesNotMutateCallerGeometry() throws FactoryException {
587+
// Regression: the predicate used to call setSRID(4326) directly on the caller's geometry
588+
// when no orientation change was needed. Verify SRID and coordinate identity are untouched
589+
// after the predicate runs, for both same-CRS (no transform) and cross-CRS paths.
590+
GridCoverage2D raster = RasterConstructors.makeEmptyRaster(1, 5, 5, 0, 0, 1, -1, 0, 0, 4326);
591+
592+
// Same-CRS: no transform happens, so the predicate sees the caller's exact JTS object.
593+
Geometry sameCrsPoint = GEOMETRY_FACTORY.createPoint(new Coordinate(2.5, -2.5));
594+
sameCrsPoint.setSRID(0);
595+
Coordinate originalCoord = sameCrsPoint.getCoordinate().copy();
596+
RasterPredicates.rsDWithin(raster, sameCrsPoint, 1.0);
597+
Assert.assertEquals(0, sameCrsPoint.getSRID());
598+
Assert.assertEquals(originalCoord, sameCrsPoint.getCoordinate());
599+
600+
// Cross-CRS: JTS.transform produces a fresh geometry internally, but the caller's object
601+
// must still be untouched.
602+
GeometryFactory mercatorFactory = new GeometryFactory(new PrecisionModel(), 3857);
603+
Geometry mercatorPoint = mercatorFactory.createPoint(new Coordinate(278300.0, -278300.0));
604+
Coordinate originalMercatorCoord = mercatorPoint.getCoordinate().copy();
605+
RasterPredicates.rsDWithin(raster, mercatorPoint, 1.0);
606+
Assert.assertEquals(3857, mercatorPoint.getSRID());
607+
Assert.assertEquals(originalMercatorCoord, mercatorPoint.getCoordinate());
608+
}
609+
492610
@Test
493611
public void testIsCRSMatchesEPSGCode() throws FactoryException {
494612
CoordinateReferenceSystem epsg4326 = CRS.decode("EPSG:4326");

docs/api/sql/Optimizer.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -282,6 +282,24 @@ These queries could be planned as RangeJoin or BroadcastIndexJoin. Here is an ex
282282
+- LocalTableScan [geom#24, id#25]
283283
```
284284

285+
### Raster distance join
286+
287+
`RS_DWithin(left, right, distance)` is recognised as a distance-join predicate. `distance` is always in **meters**: both sides are projected to WGS84 first, each side's envelope is expanded by `distance` meters using the Haversine polar-radius approximation (the same envelope expansion `ST_DistanceSphere` uses), and the resulting envelopes drive an R-tree filter. Surviving candidates are refined per-row by `RS_DWithin`, which delegates to the Geography `dWithin` to compute the minimum geodesic distance between the convex hulls via S2's `ClosestEdgeQuery` (overlap or touch returns 0). `BroadcastIndexJoinExec` is chosen when one side is small enough to broadcast, otherwise `DistanceJoinExec`.
288+
289+
Rasters whose planar WGS84 envelope already spans more than half the globe in longitude or grazes a pole — e.g. polar projections (EPSG:3996, EPSG:3413) and antimeridian-crossing UTM zones (EPSG:32601) — are given a global filter envelope instead. The R-tree filter is intentionally permissive for those rows so they pair with every counterpart, and the per-row S2 predicate produces the final answer. Mid-latitude rasters retain the tight Haversine bound and the partitioning speedup that comes with it.
290+
291+
```sql
292+
-- Raster-geometry distance join (broadcastable), 1 km radius
293+
SELECT /*+ BROADCAST(points) */ r.id, p.id
294+
FROM rasters r
295+
JOIN points p ON RS_DWithin(r.raster, p.geom, 1000)
296+
297+
-- Raster-raster distance join (partitioned spatial join), 5 km radius
298+
SELECT a.id, b.id
299+
FROM rasters a
300+
JOIN rasters b ON RS_DWithin(a.raster, b.raster, 5000)
301+
```
302+
285303
## Google S2 based approximate equi-join
286304

287305
If the performance of Sedona optimized join is not ideal, which is possibly caused by complicated and overlapping geometries, you can resort to Sedona built-in Google S2-based approximate equi-join. This equi-join leverages Spark's internal equi-join algorithm and might be performant given that you can opt to skip the refinement step by sacrificing query accuracy.

docs/api/sql/Raster-Functions.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,7 @@ These functions test spatial relationships involving raster objects.
102102
| Function | Return type | Description | Since |
103103
| :--- | :--- | :--- | :--- |
104104
| [RS_Contains](Raster-Predicates/RS_Contains.md) | Boolean | Returns true if the geometry or raster on the left side contains the geometry or raster on the right side. The convex hull of the raster is considered in the test. | v1.5.0 |
105+
| [RS_DWithin](Raster-Predicates/RS_DWithin.md) | Boolean | Returns true if the minimum geodesic distance between the raster or geometry on the left side and the raster or geometry on the right side is at most `distance` meters (inputs are projected to WGS84 before the test; overlap or touch returns 0). The convex hull of the raster is considered in the test. | v1.9.1 |
105106
| [RS_Intersects](Raster-Predicates/RS_Intersects.md) | Boolean | Returns true if raster or geometry on the left side intersects with the raster or geometry on the right side. The convex hull of the raster is considered in the test. | v1.5.0 |
106107
| [RS_Within](Raster-Predicates/RS_Within.md) | Boolean | Returns true if the geometry or raster on the left side is within the geometry or raster on the right side. The convex hull of the raster is considered in the test. | v1.5.0 |
107108

0 commit comments

Comments
 (0)