orthophoto merge: parallelize the block loop (in-order writer) #2035
Open
Chouffe wants to merge 2 commits into
4fdec37 to bbd0295
merge() reads each source window with rasterio boundless=True, which builds an in-memory VRT and serializes it via Python's ElementTree (_serialize_xml) on every read. That is a large per-read overhead on big merges (tens of thousands of blocks x 3 passes x N submodels).

_read_window_gated() keeps identical output but avoids the VRT for the common cases: a plain non-boundless read when the window is fully inside the source, zeros when fully outside (== the 0 nodata fill boundless produces there), and boundless only for the rare partial-edge windows.

Pixel-identical (verified: hundreds of fully-in-bounds windows across a real merge grid compared boundless vs plain read, 0 mismatches). Serial; no behavior change beyond the speedup. Also a prerequisite for parallelizing the merge: boundless's per-read VRT serialization is pathological under concurrency.
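The three-way gating described in this commit message can be sketched as a pure classification step. This is a minimal illustration, not the code in the commit: the names `ReadMode` and `classify_window` are hypothetical, and the window is assumed to be given as pixel offsets and sizes relative to the source raster.

```python
from enum import Enum


class ReadMode(Enum):
    PLAIN = "plain"          # window fully inside the source: ordinary read, no VRT
    ZEROS = "zeros"          # window fully outside: return a zero-filled block
    BOUNDLESS = "boundless"  # window straddles an edge: fall back to boundless=True


def classify_window(col_off, row_off, width, height, src_width, src_height):
    """Decide how a window should be read (hypothetical helper).

    Offsets and sizes are in source pixel coordinates; offsets may be
    negative or exceed the source size when the window hangs off the raster.
    """
    col_end = col_off + width
    row_end = row_off + height

    fully_inside = (
        col_off >= 0
        and row_off >= 0
        and col_end <= src_width
        and row_end <= src_height
    )
    if fully_inside:
        return ReadMode.PLAIN

    fully_outside = (
        col_end <= 0
        or row_end <= 0
        or col_off >= src_width
        or row_off >= src_height
    )
    if fully_outside:
        return ReadMode.ZEROS

    # Partial overlap: the only case that still needs the boundless path.
    return ReadMode.BOUNDLESS
```

Under this sketch, only the `BOUNDLESS` branch would pay the per-read VRT cost; the other two branches would map to a plain windowed read and a zero-filled array respectively.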
…orkers)

With boundless reads gated (previous commit), parallelize the per-block blend loop. Blocks are computed in a ThreadPoolExecutor and written from a single thread in strict block order with a bounded look-ahead (cap = 2 * max_workers), so writes to the compressed, tiled GeoTIFF stay sequential and incrementally flushable and memory stays small.

GDAL's block cache is bounded during the merge and restored on exit. Each thread uses its own source handles, since GDAL/rasterio datasets are not thread-safe. Preserves --merge-skip-blending. Wired from stages/splitmerge.py as max_workers=args.max_concurrency. max_workers<=1 is byte-for-byte identical to the original serial loop.
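The in-order writer with a bounded look-ahead can be sketched with the standard library alone. This is an illustration under assumptions, not the code in the commit: `run_in_order`, `compute` and `write` are hypothetical names, with `compute(block)` standing in for the per-block read and blend and `write(block, result)` for the GeoTIFF write.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def run_in_order(blocks, compute, write, max_workers=1):
    """Compute blocks in worker threads, write them in input order.

    All write() calls happen in the calling thread, one at a time and in
    the same order as `blocks`, regardless of which worker finishes first.
    """
    if max_workers <= 1:
        # Plain serial compute-then-write path.
        for block in blocks:
            write(block, compute(block))
        return

    cap = 2 * max_workers  # bounded look-ahead: at most `cap` results held
    pending = deque()      # (block, future) pairs in submission order

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for block in blocks:
            pending.append((block, pool.submit(compute, block)))
            if len(pending) >= cap:
                # Wait for the oldest block so the writer never skips ahead.
                head, future = pending.popleft()
                write(head, future.result())

        # Drain whatever is still in flight, again oldest first.
        while pending:
            head, future = pending.popleft()
            write(head, future.result())
```

Because the writer always blocks on the oldest outstanding future, a slow block delays the writes behind it but never reorders them, and no more than `cap` computed blocks are held in memory. In a real merge, `compute` would presumably obtain its own dataset handles per thread (for example via `threading.local`), since the commit notes that GDAL/rasterio datasets are not thread-safe.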
bbd0295 to fa47f06
Summary
Parallelizes the per-block blend loop in merge(), behind a max_workers parameter (default 1 = unchanged serial behavior), wired from stages/splitmerge.py as args.max_concurrency.

Why it needs the gated reads (#2036)
The block loop is embarrassingly parallel, but a naive thread pool stalls: rasterio's boundless=True reads serialize a VRT (_serialize_xml) on every read, which dominates under concurrency and effectively hangs the merge (workers park in _serialize_xml). #2036 replaces those with gated reads (no VRT for in-bounds windows), which makes a parallel loop viable.

Parallelization
- Blocks are computed in a ThreadPoolExecutor; one thread writes them in strict block order with a bounded look-ahead (cap = 2 * max_workers). A naive "write each block as it finishes" version can't flush incrementally: GDAL needs writes in row-major (block) order, so dirty blocks accumulate in RAM until the process runs out of memory. The in-order writer keeps writes sequential/flushable and memory small.
- Preserves --merge-skip-blending.
- max_workers <= 1 is a plain serial compute-then-write path, byte-for-byte identical to the original loop.

Correctness
max_workers=1 (serial, ~28 min) and max_workers=16 (~8 min): the in-order writer makes the result independent of worker count, and the serial/default path completes cleanly with no read↔write stall.

Notes
max_workers=1).