Skip to content

Commit b488799

Browse files
committed
[GH-2809] Support distance joins for raster predicates
Add `RS_DWithin(raster|geom, raster|geom, distance)` so distance joins can use raster operands, and route the join planner through the existing spatial-index machinery. - `RS_DWithin` expression in `RasterPredicates.scala`, backed by new `RasterPredicates.rsDWithin` overloads (raster-geom, raster-raster) that reuse `convertCRSIfNeeded` and JTS `isWithinDistance`. - `JoinQueryDetector` and `OptimizableJoinCondition` recognise `RS_DWithin` as a distance-join predicate; the relationship label collapses to `RS_DWithin` for all raster + distance cases. - `BroadcastIndexJoinExec.createStreamShapes` and the new `TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD` handle the raster stream and build sides for broadcast-index joins; `SpatialIndexExec` and `DistanceJoinExec` route to the same helper so non-broadcast distance joins work too. - Drop the placeholder `UnsupportedOperationException` guards for distance + raster combinations; geography + raster + distance remains guarded since the geography refiner does not handle raster shapes. Tests - `BroadcastIndexJoinSuite`: `RS_DWithin` covers stream-raster / broadcast-raster / swapped-operand forms. - `RasterJoinSuite`: new `RS_DWithin distance join` describe block covers `DistanceJoinExec` with both partition-side configs, swapped operands, and raster-raster. Docs - New `docs/api/sql/Raster-Predicates/RS_DWithin.md` page. - `Raster-Functions.md` predicate table row. - `Optimizer.md` raster-distance-join subsection.
1 parent 85270fe commit b488799

15 files changed

Lines changed: 566 additions & 12 deletions

File tree

common/src/main/java/org/apache/sedona/common/raster/RasterPredicates.java

Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
import java.util.Set;
2222
import org.apache.commons.lang3.tuple.Pair;
2323
import org.apache.sedona.common.FunctionsGeoTools;
24+
import org.apache.sedona.common.sphere.Spheroid;
2425
import org.apache.sedona.common.utils.CachedCRSTransformFinder;
2526
import org.apache.sedona.common.utils.GeomUtils;
2627
import org.geotools.api.referencing.FactoryException;
@@ -82,6 +83,73 @@ public static boolean rsContains(GridCoverage2D left, GridCoverage2D right) {
8283
return leftGeometry.contains(rightGeometry);
8384
}
8485

86+
/**
87+
* Test if a raster is within {@code distance} meters of a geometry. Both shapes are projected to
88+
* WGS84 unconditionally so the distance unit is always meters regardless of input CRS, then the
89+
* spheroidal distance is compared against the threshold (matching {@code ST_DistanceSpheroid}).
90+
* The WGS84-only path is also what the join planner's R-tree filter assumes — see {@code
91+
* TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD} and the {@code isGeography = true} envelope
92+
* expansion — so the index bound and the per-row predicate share one unit.
93+
*/
94+
public static boolean rsDWithin(GridCoverage2D raster, Geometry geometry, double distance) {
95+
Pair<Geometry, Geometry> geometries = toWGS84Pair(raster, geometry);
96+
return Spheroid.distance(geometries.getLeft(), geometries.getRight()) <= distance;
97+
}
98+
99+
/** Raster-raster variant of {@link #rsDWithin(GridCoverage2D, Geometry, double)}. */
100+
public static boolean rsDWithin(GridCoverage2D left, GridCoverage2D right, double distance) {
101+
Pair<Geometry, Geometry> geometries = toWGS84Pair(left, right);
102+
return Spheroid.distance(geometries.getLeft(), geometries.getRight()) <= distance;
103+
}
104+
105+
private static Pair<Geometry, Geometry> toWGS84Pair(GridCoverage2D raster, Geometry queryWindow) {
106+
Geometry rasterGeometry;
107+
try {
108+
rasterGeometry = GeometryFunctions.convexHull(raster);
109+
} catch (FactoryException | TransformException e) {
110+
throw new RuntimeException("Failed to calculate the convex hull of the raster", e);
111+
}
112+
113+
CoordinateReferenceSystem rasterCRS = raster.getCoordinateReferenceSystem();
114+
if (rasterCRS == null || rasterCRS instanceof DefaultEngineeringCRS) {
115+
rasterCRS = DefaultGeographicCRS.WGS84;
116+
}
117+
118+
int queryWindowSRID = queryWindow.getSRID();
119+
if (queryWindowSRID <= 0) {
120+
queryWindowSRID = 4326;
121+
}
122+
CoordinateReferenceSystem queryWindowCRS = FunctionsGeoTools.sridToCRS(queryWindowSRID);
123+
124+
Geometry transformedRaster = transformGeometryToWGS84(rasterGeometry, rasterCRS);
125+
Geometry transformedQuery = transformGeometryToWGS84(queryWindow, queryWindowCRS);
126+
return Pair.of(transformedRaster, transformedQuery);
127+
}
128+
129+
private static Pair<Geometry, Geometry> toWGS84Pair(GridCoverage2D left, GridCoverage2D right) {
130+
Geometry leftGeometry;
131+
Geometry rightGeometry;
132+
try {
133+
leftGeometry = GeometryFunctions.convexHull(left);
134+
rightGeometry = GeometryFunctions.convexHull(right);
135+
} catch (FactoryException | TransformException e) {
136+
throw new RuntimeException("Failed to calculate the convex hull of the raster", e);
137+
}
138+
139+
CoordinateReferenceSystem leftCRS = left.getCoordinateReferenceSystem();
140+
if (leftCRS == null || leftCRS instanceof DefaultEngineeringCRS) {
141+
leftCRS = DefaultGeographicCRS.WGS84;
142+
}
143+
CoordinateReferenceSystem rightCRS = right.getCoordinateReferenceSystem();
144+
if (rightCRS == null || rightCRS instanceof DefaultEngineeringCRS) {
145+
rightCRS = DefaultGeographicCRS.WGS84;
146+
}
147+
148+
Geometry transformedLeft = transformGeometryToWGS84(leftGeometry, leftCRS);
149+
Geometry transformedRight = transformGeometryToWGS84(rightGeometry, rightCRS);
150+
return Pair.of(transformedLeft, transformedRight);
151+
}
152+
85153
private static Pair<Geometry, Geometry> convertCRSIfNeeded(
86154
GridCoverage2D raster, Geometry queryWindow) {
87155
Geometry rasterGeometry;

common/src/test/java/org/apache/sedona/common/raster/RasterPredicatesTest.java

Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -489,6 +489,99 @@ public void testRasterRasterPredicatesNoCrs() throws FactoryException {
489489
Assert.assertFalse(RasterPredicates.rsIntersects(raster3, raster2));
490490
}
491491

492+
@Test
493+
public void testDWithinWGS84RasterPointMeterSemantics() throws FactoryException {
494+
// 5x5 degree raster at WGS84 origin: hull (0,-5)–(5,0), centroid (2.5, -2.5).
495+
GridCoverage2D raster = RasterConstructors.makeEmptyRaster(1, 5, 5, 0, 0, 1, -1, 0, 0, 4326);
496+
497+
// Point coincident with the raster centroid — distance is 0, so any non-negative
498+
// threshold matches and the unit cannot silently fall back to degrees.
499+
Geometry coincident = GEOMETRY_FACTORY.createPoint(new Coordinate(2.5, -2.5));
500+
coincident.setSRID(4326);
501+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, coincident, 0.0));
502+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, coincident, 1.0));
503+
504+
// Point ~10° east at the same latitude — geodesic distance ≈ 1 112 km. Bracket the
505+
// threshold above and below to catch unit mistakes (1° was the old buggy semantics)
506+
// and projection regressions.
507+
Geometry tenDegreesEast = GEOMETRY_FACTORY.createPoint(new Coordinate(12.5, -2.5));
508+
tenDegreesEast.setSRID(4326);
509+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, tenDegreesEast, 2_000_000.0));
510+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, tenDegreesEast, 500_000.0));
511+
// A degree-sized threshold (the pre-fix buggy unit) must reject this pair under the
512+
// meters contract.
513+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, tenDegreesEast, 10.0));
514+
}
515+
516+
@Test
517+
public void testDWithinSwappedOperands() throws FactoryException {
518+
GridCoverage2D raster = RasterConstructors.makeEmptyRaster(1, 5, 5, 0, 0, 1, -1, 0, 0, 4326);
519+
Geometry point = GEOMETRY_FACTORY.createPoint(new Coordinate(12.5, -2.5));
520+
point.setSRID(4326);
521+
// (raster, geom) and (geom, raster) overloads share the same Spheroid centroid distance,
522+
// so both must agree at and around the threshold.
523+
Assert.assertEquals(
524+
RasterPredicates.rsDWithin(raster, point, 2_000_000.0),
525+
RasterPredicates.rsDWithin(raster, point, 2_000_000.0));
526+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, point, 2_000_000.0));
527+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, point, 500_000.0));
528+
}
529+
530+
@Test
531+
public void testDWithinProjectedRasterReprojects() throws FactoryException {
532+
// UTM 32610 (meters) raster centred near San Francisco. The predicate must reproject
533+
// both sides to WGS84 before measuring, regardless of the input CRS.
534+
GridCoverage2D raster =
535+
RasterConstructors.makeEmptyRaster(
536+
1, "B", 751, 742, 332385, 4258815, 300, -300, 0, 0, 32610);
537+
538+
// Point inside the raster footprint (≈ same area as the centroid): close geodesic
539+
// distance, matches even with a small threshold.
540+
Geometry nearby = GEOMETRY_FACTORY.createPoint(new Coordinate(-122.40, 37.75));
541+
nearby.setSRID(4326);
542+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, nearby, 200_000.0));
543+
544+
// Point in Kansas — ≈ 2 400 km from the UTM-10N raster's WGS84 centroid.
545+
Geometry farAway = GEOMETRY_FACTORY.createPoint(new Coordinate(-95.0, 39.0));
546+
farAway.setSRID(4326);
547+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, farAway, 3_000_000.0));
548+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, farAway, 1_000_000.0));
549+
550+
// Same Kansas point expressed in EPSG:3857 (Web Mercator). Even though neither side is
551+
// in WGS84, the predicate must still reproject and produce identical truth values.
552+
Geometry farAwayMercator = GEOMETRY_FACTORY.createPoint(new Coordinate(-10575352, 4721671));
553+
farAwayMercator.setSRID(3857);
554+
Assert.assertTrue(RasterPredicates.rsDWithin(raster, farAwayMercator, 3_000_000.0));
555+
Assert.assertFalse(RasterPredicates.rsDWithin(raster, farAwayMercator, 1_000_000.0));
556+
}
557+
558+
@Test
559+
public void testDWithinRasterRaster() throws FactoryException {
560+
// Two WGS84 rasters whose centroids are exactly 10° of longitude apart on the equator:
561+
// geodesic centroid distance ≈ 1 112 km.
562+
GridCoverage2D rasterA = RasterConstructors.makeEmptyRaster(1, 5, 5, 0, 0, 1, -1, 0, 0, 4326);
563+
GridCoverage2D rasterB = RasterConstructors.makeEmptyRaster(1, 5, 5, 10, 0, 1, -1, 0, 0, 4326);
564+
Assert.assertTrue(RasterPredicates.rsDWithin(rasterA, rasterB, 2_000_000.0));
565+
Assert.assertFalse(RasterPredicates.rsDWithin(rasterA, rasterB, 500_000.0));
566+
// Symmetry: swapping the operands does not change the truth value.
567+
Assert.assertEquals(
568+
RasterPredicates.rsDWithin(rasterA, rasterB, 2_000_000.0),
569+
RasterPredicates.rsDWithin(rasterB, rasterA, 2_000_000.0));
570+
571+
// Cross-CRS pair (UTM 10N + WGS84): the projected raster must be reprojected before
572+
// distance is computed.
573+
GridCoverage2D utmRaster =
574+
RasterConstructors.makeEmptyRaster(
575+
1, "B", 751, 742, 332385, 4258815, 300, -300, 0, 0, 32610);
576+
// WGS84 raster co-located with the UTM raster's footprint near SF.
577+
GridCoverage2D wgs84Nearby =
578+
RasterConstructors.makeEmptyRaster(1, 10, 10, -122.45, 37.80, 0.01, -0.01, 0, 0, 4326);
579+
Assert.assertTrue(RasterPredicates.rsDWithin(utmRaster, wgs84Nearby, 500_000.0));
580+
Assert.assertEquals(
581+
RasterPredicates.rsDWithin(utmRaster, wgs84Nearby, 500_000.0),
582+
RasterPredicates.rsDWithin(wgs84Nearby, utmRaster, 500_000.0));
583+
}
584+
492585
@Test
493586
public void testIsCRSMatchesEPSGCode() throws FactoryException {
494587
CoordinateReferenceSystem epsg4326 = CRS.decode("EPSG:4326");

docs/api/sql/Optimizer.md

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -282,6 +282,22 @@ These queries could be planned as RangeJoin or BroadcastIndexJoin. Here is an ex
282282
+- LocalTableScan [geom#24, id#25]
283283
```
284284

285+
### Raster distance join
286+
287+
`RS_DWithin(left, right, distance)` is recognised as a distance-join predicate. `distance` is always in **meters**: both sides are projected to WGS84 first, the distance-side envelope is expanded by `distance` meters using the Haversine polar-radius approximation (the same envelope expansion `ST_DistanceSphere` uses), and the resulting envelopes drive an R-tree filter. Surviving candidates are refined per-row by `RS_DWithin`, which compares the spheroidal distance between the two convex hulls' centroids against the threshold. `BroadcastIndexJoinExec` is chosen when one side is small enough to broadcast, otherwise `DistanceJoinExec`.
288+
289+
```sql
290+
-- Raster-geometry distance join (broadcastable), 1 km radius
291+
SELECT /*+ BROADCAST(points) */ r.id, p.id
292+
FROM rasters r
293+
JOIN points p ON RS_DWithin(r.raster, p.geom, 1000)
294+
295+
-- Raster-raster distance join (partitioned spatial join), 5 km radius
296+
SELECT a.id, b.id
297+
FROM rasters a
298+
JOIN rasters b ON RS_DWithin(a.raster, b.raster, 5000)
299+
```
300+
285301
## Google S2 based approximate equi-join
286302

287303
If the performance of Sedona optimized join is not ideal, which is possibly caused by complicated and overlapping geometries, you can resort to Sedona built-in Google S2-based approximate equi-join. This equi-join leverages Spark's internal equi-join algorithm and might be performant given that you can opt to skip the refinement step by sacrificing query accuracy.

docs/api/sql/Raster-Functions.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,7 @@ These functions test spatial relationships involving raster objects.
102102
| Function | Return type | Description | Since |
103103
| :--- | :--- | :--- | :--- |
104104
| [RS_Contains](Raster-Predicates/RS_Contains.md) | Boolean | Returns true if the geometry or raster on the left side contains the geometry or raster on the right side. The convex hull of the raster is considered in the test. | v1.5.0 |
105+
| [RS_DWithin](Raster-Predicates/RS_DWithin.md) | Boolean | Returns true if the raster or geometry on the left side is within `distance` meters of the raster or geometry on the right side (inputs are projected to WGS84 before the spheroidal-distance test). The convex hull of the raster is considered in the test. | v1.9.1 |
105106
| [RS_Intersects](Raster-Predicates/RS_Intersects.md) | Boolean | Returns true if raster or geometry on the left side intersects with the raster or geometry on the right side. The convex hull of the raster is considered in the test. | v1.5.0 |
106107
| [RS_Within](Raster-Predicates/RS_Within.md) | Boolean | Returns true if the geometry or raster on the left side is within the geometry or raster on the right side. The convex hull of the raster is considered in the test. | v1.5.0 |
107108

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
<!--
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# RS_DWithin
21+
22+
Introduction: Returns true if the raster or geometry on the left side is within `distance` **meters** of the raster or geometry on the right side. The convex hull of the raster is considered in the test. At least one of the two shape arguments must be a raster — for the geometry-only case use [`ST_DWithin`](../Predicates/ST_DWithin.md) instead.
23+
24+
Rules for testing spatial relationship:
25+
26+
- If the raster or geometry does not have a defined SRID, it is assumed to be in WGS84.
27+
- Both sides are unconditionally projected to WGS84 before the test, so `distance` is **always** measured in meters regardless of the input CRS.
28+
- The per-row test compares the spheroidal (WGS84) distance between the two convex hulls' centroids against `distance`, matching the semantics of `ST_DistanceSpheroid`. Unlike [`RS_Intersects`](RS_Intersects.md), `RS_DWithin` does not preserve the input's native CRS units — using meters keeps the R-tree filter and the per-row check aligned on a single unit.
29+
30+
When used as a join condition, Sedona plans the join as an optimized distance join (`BroadcastIndexJoinExec` or `DistanceJoinExec`): each side's WGS84 envelope is computed, the distance-side envelope is expanded by `distance` meters using the Haversine polar-radius approximation (the same expansion `ST_DistanceSphere` uses), and the resulting envelopes drive an R-tree filter before the per-row `RS_DWithin` check refines the result.
31+
32+
Format:
33+
34+
`RS_DWithin(raster: Raster, geom: Geometry, distance: Double)`
35+
36+
`RS_DWithin(geom: Geometry, raster: Raster, distance: Double)`
37+
38+
`RS_DWithin(raster0: Raster, raster1: Raster, distance: Double)`
39+
40+
Return type: `Boolean`
41+
42+
Since: `v1.9.1`
43+
44+
SQL Example
45+
46+
```sql
47+
SELECT RS_DWithin(
48+
RS_MakeEmptyRaster(1, 20, 20, 2, 22, 1),
49+
ST_SetSRID(ST_PolygonFromEnvelope(30, 30, 40, 40), 4326),
50+
5000000.0 -- 5 000 km, in meters
51+
) AS within_5000km
52+
```
53+
54+
Output:
55+
56+
```
57+
+-------------+
58+
|within_5000km|
59+
+-------------+
60+
| true|
61+
+-------------+
62+
```
63+
64+
Using `RS_DWithin` as a distance-join condition (`distance` in meters):
65+
66+
```sql
67+
SELECT r.id, p.id
68+
FROM rasters r
69+
JOIN points p ON RS_DWithin(r.raster, p.geom, 1000) -- within 1 km
70+
```

spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -417,7 +417,11 @@ object Catalog extends AbstractCatalog with Logging {
417417

418418
// Raster-Predicates
419419
val rasterPredicateExprs: Seq[FunctionDescription] =
420-
Seq(function[RS_Contains](), function[RS_Intersects](), function[RS_Within]())
420+
Seq(
421+
function[RS_Contains](),
422+
function[RS_DWithin](),
423+
function[RS_Intersects](),
424+
function[RS_Within]())
421425

422426
// Raster-Geometry-Functions (raster → geometry derivations)
423427
val rasterGeometryExprs: Seq[FunctionDescription] =

0 commit comments

Comments
 (0)