Skip to content

Commit cd95746

Browse files
authored
rasterize: slice line_props/point_props per tile in dask backends (#2020) (#2023)
* rasterize: slice line_props/point_props per tile in dask backends (#2020) The dask backends embedded the full per-type props tables into every delayed tile task, despite the polygon path already filtering via poly_props[pmask]. For workloads with many localized geometries this amplified scheduler overhead by O(N_geoms * N_tiles). Adds _slice_props_for_tile that returns the subset of props referenced by a tile and a remapped local-index array. Wired into _run_dask_numpy and _run_dask_cupy alongside the existing segment / point tile filters. Measured: 5000 points x 8 cols / 100 tiles graph shrinks ~30 MB -> 0.3 MB; localized lines ~32 MB -> 1.1 MB. Dask + cupy parity verified. * rasterize: address review suggestions on per-tile slicing Add an explicit straddling-line test that asserts a line crossing a tile seam renders on both sides. Fix the np.unique comment (inverse map is a new array, not in-place), normalise the empty-branch local_idx dtype to int32, and note in the cupy comment that for sprawling-line workloads the cost shifts from graph payload to per-tile PCIe uploads.
1 parent 4d24a56 commit cd95746

3 files changed

Lines changed: 327 additions & 7 deletions

File tree

.claude/sweep-performance-state.csv

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ perlin,2026-03-31T18:00:00Z,WILL OOM,memory-bound,0,,
3232
polygon_clip,2026-04-16T12:00:00Z,SAFE,compute-bound,0,1207,Re-audit 2026-04-16: fix verified SAFE. Mask stays lazy via rasterize chunks kwarg; per-chunk peak bounded.
3333
polygonize,2026-04-16T12:00:00Z,RISKY,compute-bound,0,,Re-audit 2026-04-16 after PR 1190 NaN fix + 1176 simplification. No HIGH. MEDIUM: sequential per-chunk dask.compute loop at L1528 serializes work; _polygonize_cupy full GPU->CPU transfer at L665; per-value bin_mask alloc in _calculate_regions_cupy.
3434
proximity,2026-03-31T18:00:00Z,WILL OOM,memory-bound,3,1111,Memory guard added to line-sweep path. KDTree path (EUCLIDEAN/MANHATTAN + scipy) already had guards. GREAT_CIRCLE unbounded path already guarded.
35-
rasterize,2026-04-15T12:00:00Z,SAFE,graph-bound,0,false-positive,Downgraded. Tile-by-tile graph construction with per-tile geometry filtering is the correct pattern. Pre-filtering ensures each delayed task gets only its relevant subset. O(n_tiles) bbox check at graph-build time is vectorized and fast.
35+
rasterize,2026-05-17,SAFE,graph-bound,0,2020,"Pass 2 (2026-05-17): re-audit identified MEDIUM Cat-2/Cat-3 graph-bound waste in _run_dask_numpy/_run_dask_cupy -- full line_props/point_props embedded in every delayed tile task (polygon path already filtered via poly_props[pmask]). Filed #2020 and fixed: added _slice_props_for_tile helper to remap geom_idx and slice props per tile (mirrors polygon path). Measured 5000 points x 8 cols / 100 tiles graph shrank from ~30 MB to <0.3 MB (37x); localized lines from ~32 MB to ~1.1 MB. 9 new tests in test_rasterize_tile_props_slice_2020.py (helper unit tests + graph-payload bound + numpy/dask output parity for lines/points/sum-merge). All 184 existing rasterize tests pass; dask+cupy parity verified. Dask graph probe: 2560x2560 chunks=256 yields 400 tasks (4 tasks/chunk constant); 25600x25600 chunks=1024 yields 2500 tasks. cupy 512x512 returns cupy.ndarray with no host round-trip. CUDA _scanline_fill_gpu: 39 regs/thread, 24576 B local_mem/thread (matches static cuda.local.array allocations 2048*8 + 2048*4 bytes). SAFE/graph-bound verdict holds; previous 2026-04-15 false-positive on polygon filtering still valid. | Original (2026-04-15): Tile-by-tile graph construction with per-tile geometry filtering is the correct pattern. Pre-filtering ensures each delayed task gets only its relevant subset."
3636
reproject,2026-05-10,SAFE,compute-bound,1,1571,"Pass 5 (2026-05-10): 1 HIGH filed and fixed in tree -- issue #1571 + fix _merge_block_adapter same-CRS dask path. _place_same_crs in the dask adapter previously called src_data.compute() on the full source per output chunk (68x amplification measured on 256x256x2 source split into 32x32 output chunks, 8.9M pixels materialized vs 131K total source). Fix: added _place_same_crs_lazy at __init__.py:1716 that slices the source window first then computes only that slice. Verified post-fix: 1.00x ratio, 131K pixels materialized for 131K source. New regression test test_merge_dask_same_crs_bounded_materialization codifies the bound. Other audits clean: CUDA resample kernels use 16x16 blocks (cubic=46 regs, bilinear=36, nearest=22 -- well under the 64K-per-block limit, 0 local mem). _reproject_chunk_numpy/cupy already slice source first before .compute(). Dask graph at 25600x25600 src with 1024 chunks yields 4752 tasks (no per-chunk source dependency). _apply_vertical_shift uses in-place += that may not work on dask arrays -- correctness concern, not perf, defer to accuracy sweep."
3737
resample,2026-04-15T12:00:00Z,SAFE,compute-bound,0,false-positive,Downgraded. GPU-CPU-GPU round-trip only in aggregate path for non-integer scale factors. Interpolation (nearest/bilinear/cubic) stays on GPU. No GPU kernel exists for irregular per-pixel binning.
3838
sieve,2026-04-14T12:00:00Z,WILL OOM,memory-bound,0,false-positive,False positive. Memory guards already in place on both dask paths. CCL is inherently global — documented limitation. CuPy CPU fallback is deliberate and documented.

xrspatial/rasterize.py

Lines changed: 57 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1578,6 +1578,33 @@ def _points_for_tile(rows, cols, geom_idx, r_start, r_end, c_start, c_end):
15781578
return (rows[mask] - r_start, cols[mask] - c_start, geom_idx[mask])
15791579

15801580

1581+
def _slice_props_for_tile(geom_idx, props):
1582+
"""Slice a per-type props table to only entries referenced by *geom_idx*.
1583+
1584+
Returns ``(local_idx, sliced_props)`` where ``local_idx`` is a remapped
1585+
int32 array of the same length as ``geom_idx`` but with values in
1586+
``range(len(sliced_props))``, and ``sliced_props`` is the
1587+
contiguous subset of ``props`` that those local indices index into.
1588+
1589+
This is the line / point counterpart to the polygon path's
1590+
``poly_wkb_arr[pmask]`` / ``poly_props[pmask]`` slicing. Without it,
1591+
every dask tile task would embed the full ``line_props`` /
1592+
``point_props`` table in its closure, even when the tile sees only
1593+
a handful of segments. At a few thousand geometries this is a
1594+
multi-MB scheduler overhead per task; at 1 M geometries x 100 tiles
1595+
the difference is ~800 MB of redundant graph state.
1596+
1597+
If ``geom_idx`` is empty, returns an empty ``(0,)`` int32 array and
1598+
a zero-row slice of ``props`` (preserving column count).
1599+
"""
1600+
if len(geom_idx) == 0:
1601+
return geom_idx.astype(np.int32, copy=False), props[:0]
1602+
# np.unique returns sorted unique values and an inverse map that
1603+
# points each entry in geom_idx to its position in unique_idx.
1604+
unique_idx, local_idx = np.unique(geom_idx, return_inverse=True)
1605+
return local_idx.astype(np.int32), props[unique_idx]
1606+
1607+
15811608
def _polys_to_wkb(geoms):
15821609
"""Pre-serialize polygon geometries to WKB for cheap pickling."""
15831610
return [g.wkb for g in geoms]
@@ -1698,10 +1725,22 @@ def _run_dask_numpy(geometries, props_array, bounds, height, width, fill,
16981725
tp = _points_for_tile(pt_rows, pt_cols, pt_geom_idx,
16991726
r_start, r_end, c_start, c_end)
17001727

1728+
# Slice line_props / point_props to only the geometries this
1729+
# tile actually references, remapping the geom_idx arrays to
1730+
# local indices. Mirrors the polygon path's poly_props[pmask]
1731+
# slicing. Skipped for empty filters (the helper returns the
1732+
# already-empty inputs unchanged).
1733+
ts_local_idx, tile_line_props = _slice_props_for_tile(
1734+
ts[4], line_props)
1735+
ts = (*ts[:4], ts_local_idx)
1736+
tp_local_idx, tile_point_props = _slice_props_for_tile(
1737+
tp[2], point_props)
1738+
tp = (*tp[:2], tp_local_idx)
1739+
17011740
delayed_tile = dask.delayed(_rasterize_tile_numpy)(
17021741
tile_wkb, tile_poly_props, tile_bounds,
17031742
tile_h, tile_w, fill, dtype, all_touched, merge_fn,
1704-
*ts, line_props, *tp, point_props)
1743+
*ts, tile_line_props, *tp, tile_point_props)
17051744
blocks[i][j] = da.from_delayed(
17061745
delayed_tile, shape=(tile_h, tile_w), dtype=dtype)
17071746
ri += 1
@@ -1818,10 +1857,15 @@ def _run_dask_cupy(geometries, props_array, bounds, height, width, fill,
18181857
# Pre-compute segment bboxes once (avoids redundant min/max per tile)
18191858
seg_bboxes = _segment_bboxes(seg_r0, seg_c0, seg_r1, seg_c1)
18201859

1821-
# Transfer shared props tables to GPU once (avoids repeated PCIe
1822-
# transfers in each tile worker).
1823-
d_line_props = cupy.asarray(line_props)
1824-
d_point_props = cupy.asarray(point_props)
1860+
# Per-tile props slicing keeps PCIe traffic and graph payload small:
1861+
# each tile only references a subset of geometries, so we slice
1862+
# line_props / point_props per tile (as np arrays in the graph) and
1863+
# transfer only that subset to the GPU inside the worker. Mirrors
1864+
# the polygon path's poly_props[pmask] pattern. Tradeoff: for
1865+
# sprawling-line workloads where every tile references most rows,
1866+
# the GPU now sees N_tiles small uploads instead of one shared
1867+
# driver-side upload, shifting cost from graph payload to PCIe
1868+
# traffic. Localized geometries (the realistic case) still win.
18251869

18261870
tiles = _tile_grid(bounds, height, width, row_chunks, col_chunks)
18271871
n_row_chunks = len(row_chunks)
@@ -1850,10 +1894,17 @@ def _run_dask_cupy(geometries, props_array, bounds, height, width, fill,
18501894
tp = _points_for_tile(pt_rows, pt_cols, pt_geom_idx,
18511895
r_start, r_end, c_start, c_end)
18521896

1897+
ts_local_idx, tile_line_props = _slice_props_for_tile(
1898+
ts[4], line_props)
1899+
ts = (*ts[:4], ts_local_idx)
1900+
tp_local_idx, tile_point_props = _slice_props_for_tile(
1901+
tp[2], point_props)
1902+
tp = (*tp[:2], tp_local_idx)
1903+
18531904
delayed_tile = dask.delayed(_rasterize_tile_cupy)(
18541905
tile_wkb, tile_poly_props, tile_bounds,
18551906
tile_h, tile_w, fill, dtype, all_touched, merge_fn,
1856-
*ts, d_line_props, *tp, d_point_props)
1907+
*ts, tile_line_props, *tp, tile_point_props)
18571908
blocks[i][j] = da.from_delayed(
18581909
delayed_tile, shape=(tile_h, tile_w), dtype=dtype,
18591910
meta=cupy.empty(()))
Lines changed: 269 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,269 @@
1+
"""Per-tile props slicing in dask backends (issue #2020).
2+
3+
Locks in that ``_run_dask_numpy`` and ``_run_dask_cupy`` no longer embed
4+
the full ``line_props`` / ``point_props`` table into every delayed tile
5+
task. Only the geometries referenced by each tile end up in that tile's
6+
closure, mirroring the polygon path's ``poly_props[pmask]`` slicing.
7+
"""
8+
from __future__ import annotations
9+
10+
import pickle
11+
12+
import numpy as np
13+
import pytest
14+
15+
try:
16+
from shapely.geometry import LineString, Point, box # noqa: F401
17+
has_shapely = True
18+
except ImportError:
19+
has_shapely = False
20+
21+
try:
22+
import geopandas as gpd
23+
has_geopandas = True
24+
except ImportError:
25+
has_geopandas = False
26+
27+
try:
28+
import dask.array as da # noqa: F401
29+
has_dask = True
30+
except ImportError:
31+
has_dask = False
32+
33+
if has_shapely:
34+
from xrspatial.rasterize import (
35+
_slice_props_for_tile,
36+
rasterize,
37+
)
38+
39+
pytestmark = [
40+
pytest.mark.skipif(not has_shapely, reason="shapely not installed"),
41+
pytest.mark.skipif(not has_dask, reason="dask not installed"),
42+
pytest.mark.skipif(not has_geopandas, reason="geopandas not installed"),
43+
]
44+
45+
46+
# ---------------------------------------------------------------------------
47+
# _slice_props_for_tile helper
48+
# ---------------------------------------------------------------------------
49+
50+
class TestSlicePropsForTile:
51+
def test_empty_geom_idx_returns_empty_props(self):
52+
props = np.arange(50, dtype=np.float64).reshape(10, 5)
53+
empty_idx = np.empty(0, dtype=np.int32)
54+
local_idx, sliced = _slice_props_for_tile(empty_idx, props)
55+
assert len(local_idx) == 0
56+
assert sliced.shape == (0, 5)
57+
assert sliced.dtype == props.dtype
58+
59+
def test_remap_round_trips_props(self):
60+
"""``props[geom_idx]`` and ``sliced[local_idx]`` agree row-wise."""
61+
props = np.arange(20, dtype=np.float64).reshape(10, 2)
62+
geom_idx = np.array([7, 2, 7, 9, 2, 3], dtype=np.int32)
63+
local_idx, sliced = _slice_props_for_tile(geom_idx, props)
64+
assert local_idx.dtype == np.int32
65+
# local_idx values must index into sliced
66+
assert local_idx.max() < len(sliced)
67+
np.testing.assert_array_equal(sliced[local_idx], props[geom_idx])
68+
69+
def test_single_geom_compacts_to_one_row(self):
70+
props = np.arange(50, dtype=np.float64).reshape(10, 5)
71+
geom_idx = np.array([4, 4, 4], dtype=np.int32)
72+
local_idx, sliced = _slice_props_for_tile(geom_idx, props)
73+
assert sliced.shape == (1, 5)
74+
assert np.all(local_idx == 0)
75+
np.testing.assert_array_equal(sliced[0], props[4])
76+
77+
def test_preserves_column_count_on_empty(self):
78+
"""Empty input must keep the column dimension so downstream
79+
indexing into ``[:, j]`` does not raise."""
80+
props = np.empty((0, 3), dtype=np.float64)
81+
empty_idx = np.empty(0, dtype=np.int32)
82+
_, sliced = _slice_props_for_tile(empty_idx, props)
83+
assert sliced.shape == (0, 3)
84+
85+
86+
# ---------------------------------------------------------------------------
87+
# Graph-size regression: localized points should not embed full point_props
88+
# ---------------------------------------------------------------------------
89+
90+
class TestDaskGraphPayloadBounded:
91+
"""Per-tile delayed args must not embed the full per-type props table."""
92+
93+
def test_point_graph_scales_with_per_tile_points_not_total(self):
94+
rng = np.random.default_rng(0)
95+
n_points = 5_000
96+
# Multi-column props amplify any per-task embedding waste
97+
n_cols = 8
98+
xs = rng.uniform(-180, 180, n_points)
99+
ys = rng.uniform(-90, 90, n_points)
100+
data = {f"c{j}": rng.random(n_points) for j in range(n_cols)}
101+
data["geometry"] = [Point(x, y) for x, y in zip(xs, ys)]
102+
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")
103+
104+
result = rasterize(
105+
gdf,
106+
width=2560,
107+
height=2560,
108+
bounds=(-180, -90, 180, 90),
109+
chunks=(256, 256),
110+
columns=[f"c{j}" for j in range(n_cols)],
111+
)
112+
graph = result.data.__dask_graph__()
113+
graph_bytes = len(pickle.dumps(dict(graph)))
114+
115+
# Without the fix: 100 tiles * 5000 points * 8 cols * 8 bytes
116+
# = 32 MB just for point_props duplicates, on top of segment data.
117+
# With the fix: each tile sees ~50 points so each task embeds
118+
# ~3 KB of point_props; total stays well under 4 MB end-to-end.
119+
assert graph_bytes < 4_000_000, (
120+
f"graph pickled to {graph_bytes:,} bytes -- the per-tile "
121+
f"point_props slice may have regressed and the full table is "
122+
f"being embedded into every delayed task."
123+
)
124+
125+
def test_localized_line_graph_scales_with_per_tile_lines(self):
126+
rng = np.random.default_rng(1)
127+
n_lines = 5_000
128+
n_cols = 8
129+
geoms = []
130+
for _ in range(n_lines):
131+
# Each line spans a single ~1-degree cell, so most tiles
132+
# see only a handful of segments.
133+
cx = rng.uniform(-170, 170)
134+
cy = rng.uniform(-80, 80)
135+
geoms.append(
136+
LineString([(cx, cy),
137+
(cx + rng.uniform(-1, 1),
138+
cy + rng.uniform(-1, 1))])
139+
)
140+
data = {f"c{j}": rng.random(n_lines) for j in range(n_cols)}
141+
data["geometry"] = geoms
142+
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")
143+
144+
result = rasterize(
145+
gdf,
146+
width=2560,
147+
height=2560,
148+
bounds=(-180, -90, 180, 90),
149+
chunks=(256, 256),
150+
columns=[f"c{j}" for j in range(n_cols)],
151+
)
152+
graph = result.data.__dask_graph__()
153+
graph_bytes = len(pickle.dumps(dict(graph)))
154+
155+
# Without the fix: ~32 MB. With the fix: ~1-2 MB for localized
156+
# lines. Bound the regression check at 5 MB to leave headroom
157+
# for pickle overhead.
158+
assert graph_bytes < 5_000_000, (
159+
f"graph pickled to {graph_bytes:,} bytes -- localized lines "
160+
f"should not embed the full line_props in every tile."
161+
)
162+
163+
164+
# ---------------------------------------------------------------------------
165+
# Correctness: filtered props slice produces identical pixels to baseline
166+
# ---------------------------------------------------------------------------
167+
168+
class TestDaskBackendOutputUnchanged:
169+
"""Slicing per tile must not change a single rasterized pixel."""
170+
171+
def _make_gdf(self, seed=42, n_lines=30, n_points=30):
172+
rng = np.random.default_rng(seed)
173+
geoms = []
174+
for _ in range(n_lines):
175+
cx = rng.uniform(-150, 150)
176+
cy = rng.uniform(-70, 70)
177+
geoms.append(
178+
LineString([(cx, cy),
179+
(cx + rng.uniform(-5, 5),
180+
cy + rng.uniform(-5, 5))])
181+
)
182+
for _ in range(n_points):
183+
geoms.append(
184+
Point(rng.uniform(-150, 150), rng.uniform(-70, 70))
185+
)
186+
vals = list(range(1, len(geoms) + 1))
187+
return gpd.GeoDataFrame(
188+
{"value": vals, "geometry": geoms}, crs="EPSG:4326"
189+
)
190+
191+
def test_numpy_vs_dask_lines_and_points(self):
192+
gdf = self._make_gdf()
193+
np_res = rasterize(
194+
gdf, column="value",
195+
width=512, height=512, bounds=(-180, -90, 180, 90),
196+
fill=0.0,
197+
)
198+
dk_res = rasterize(
199+
gdf, column="value",
200+
width=512, height=512, bounds=(-180, -90, 180, 90),
201+
fill=0.0, chunks=(64, 64),
202+
)
203+
np.testing.assert_array_equal(np_res.values, dk_res.values)
204+
205+
def test_numpy_vs_dask_sum_merge(self):
206+
"""Slicing must preserve overlap-sensitive merges (sum)."""
207+
gdf = self._make_gdf(seed=7, n_lines=80, n_points=80)
208+
np_res = rasterize(
209+
gdf, column="value",
210+
width=256, height=256, bounds=(-180, -90, 180, 90),
211+
fill=0.0, merge="sum",
212+
)
213+
dk_res = rasterize(
214+
gdf, column="value",
215+
width=256, height=256, bounds=(-180, -90, 180, 90),
216+
fill=0.0, merge="sum", chunks=(32, 32),
217+
)
218+
np.testing.assert_array_equal(np_res.values, dk_res.values)
219+
220+
def test_numpy_vs_dask_no_lines_no_points(self):
221+
"""Polygon-only input must still produce identical results."""
222+
gdf = gpd.GeoDataFrame(
223+
{"value": [1.0, 2.0]},
224+
geometry=[box(0, 0, 4, 4), box(6, 6, 10, 10)],
225+
crs="EPSG:4326",
226+
)
227+
np_res = rasterize(
228+
gdf, column="value",
229+
width=64, height=64, bounds=(0, 0, 10, 10), fill=0.0,
230+
)
231+
dk_res = rasterize(
232+
gdf, column="value",
233+
width=64, height=64, bounds=(0, 0, 10, 10),
234+
fill=0.0, chunks=(16, 16),
235+
)
236+
np.testing.assert_array_equal(np_res.values, dk_res.values)
237+
238+
def test_line_straddling_tile_boundary_renders_contiguously(self):
239+
"""A line whose pixel bbox overlaps two tiles must be present in
240+
both tile closures. If the props slice ever drops the geometry
241+
from one of the two tiles, the dask output will diverge from
242+
numpy along the seam (gap pixels) or carry the fill value where
243+
the line should be."""
244+
# Single horizontal line crossing the vertical seam at x=0 on a
245+
# bounds=(-10,-10,10,10) raster with width=80, chunks=(40,40).
246+
# Tile seam in pixel space sits at column 40; the line spans
247+
# x=-5..5 (pixel columns ~20..60), straddling the seam.
248+
gdf = gpd.GeoDataFrame(
249+
{"value": [7.0]},
250+
geometry=[LineString([(-5.0, 0.0), (5.0, 0.0)])],
251+
crs="EPSG:4326",
252+
)
253+
np_res = rasterize(
254+
gdf, column="value",
255+
width=80, height=80, bounds=(-10, -10, 10, 10), fill=0.0,
256+
)
257+
dk_res = rasterize(
258+
gdf, column="value",
259+
width=80, height=80, bounds=(-10, -10, 10, 10),
260+
fill=0.0, chunks=(40, 40),
261+
)
262+
# Whole-array equality: any dropped pixel along the seam fails.
263+
np.testing.assert_array_equal(np_res.values, dk_res.values)
264+
# Structural assertion: the value must appear on both sides of
265+
# the column-40 seam so a future bug that only renders one half
266+
# is caught even if the numpy reference also regressed.
267+
burned = (dk_res.values == 7.0)
268+
assert burned[:, :40].any(), "line missing from left tile"
269+
assert burned[:, 40:].any(), "line missing from right tile"

0 commit comments

Comments
 (0)