orthophoto merge: parallelize the block loop (in-order writer) #2035

Open
Chouffe wants to merge 2 commits into OpenDroneMap:master from Chouffe:parallel-orthophoto-merge-2034

@Chouffe Chouffe commented Jun 19, 2026

Summary

Parallelizes the per-block blend loop in merge(), behind a max_workers parameter (default 1 = unchanged serial behavior), wired from stages/splitmerge.py as args.max_concurrency.

Stacked on #2036 (gated reads). This PR currently shows both commits; once #2036 merges it reduces to just the parallelization. The gating is a prerequisite — see below.

Why it needs the gated reads (#2036)

The block loop is embarrassingly parallel, but a naive thread pool stalls: rasterio's boundless=True reads serialize a VRT (_serialize_xml) on every read. Under concurrency that serialization dominates the runtime and effectively hangs the merge, with workers parked in _serialize_xml. #2036 replaces those reads with gated reads (no VRT for in-bounds windows), which makes a parallel loop viable.
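The gating decision can be sketched without rasterio. This is a minimal illustration, not the PR's _read_window_gated(): classify_window is a hypothetical helper, and the (col_off, row_off, width, height) tuple layout is an assumption. It only shows how a window might be sorted into the three cases the first commit message describes (fully inside, fully outside, partial edge).

```python
def classify_window(window, src_width, src_height):
    """Classify a pixel window against a source raster's extent.

    Hypothetical helper for illustration; `window` is assumed to be
    (col_off, row_off, width, height) in source pixel coordinates.
    Returns "inside", "outside", or "partial".
    """
    col_off, row_off, width, height = window
    col_end = col_off + width
    row_end = row_off + height

    # Fully inside the source: a plain (non-boundless) read is enough.
    if col_off >= 0 and row_off >= 0 and col_end <= src_width and row_end <= src_height:
        return "inside"

    # No overlap with the source at all: the result is just the nodata fill.
    if col_end <= 0 or row_end <= 0 or col_off >= src_width or row_off >= src_height:
        return "outside"

    # Straddles an edge: only this case still needs a boundless read.
    return "partial"
```

With rasterio, the "inside" case would presumably map to a plain src.read(window=...), "outside" to an array of zeros, and "partial" to src.read(window=..., boundless=True), so the VRT cost is paid only for edge windows.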

Parallelization

  • Parallel compute + single in-order writer. Blocks are computed in a ThreadPoolExecutor; one thread writes them in strict block order with a bounded look-ahead (cap = 2 * max_workers). A naive "write each block as it finishes" version can't flush incrementally: GDAL needs writes in row-major (block) order, so dirty blocks accumulate in RAM until the process runs out of memory. The in-order writer keeps writes sequential and flushable, and memory use small.
  • Bounded GDAL block cache during the merge (restored on exit) so dirty-tile eviction/flush under GDAL's global lock stays prompt.
  • Per-thread source handles — GDAL/rasterio datasets are not thread-safe.
  • Preserves --merge-skip-blending.
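To illustrate the per-thread source handles point, here is a minimal sketch using threading.local. PerThreadHandles and its opener argument are hypothetical names, not taken from the PR; in the real code the opener would presumably be something like rasterio.open.

```python
import threading


class PerThreadHandles:
    """Give each thread its own handle per path (hypothetical helper)."""

    def __init__(self, opener):
        self._opener = opener            # callable: path -> handle with .close()
        self._local = threading.local()  # per-thread cache lives here
        self._lock = threading.Lock()    # guards the list of all handles
        self._all = []

    def get(self, path):
        cache = getattr(self._local, "cache", None)
        if cache is None:
            cache = self._local.cache = {}
        handle = cache.get(path)
        if handle is None:
            # First use of this path on this thread: open a private handle.
            handle = cache[path] = self._opener(path)
            with self._lock:
                self._all.append(handle)
        return handle

    def close_all(self):
        # Call once after all workers have finished.
        with self._lock:
            handles, self._all = self._all, []
        for handle in handles:
            handle.close()
```

Each worker calls get(path) and never shares the returned object with another thread; close_all() is meant to run after the pool has shut down.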

max_workers <= 1 is a plain serial compute-then-write path, byte-for-byte identical to the original loop.
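The compute-in-parallel, write-in-order pattern can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's implementation: process_in_order, compute, and write are hypothetical names, and here the calling thread acts as the single writer.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def process_in_order(blocks, compute, write, max_workers=1):
    """Compute blocks concurrently but call write() strictly in input order."""
    if max_workers <= 1:
        # Serial path: compute then write, one block at a time.
        for block in blocks:
            write(block, compute(block))
        return

    cap = 2 * max_workers  # bounded look-ahead: at most `cap` blocks in flight
    pending = deque()      # (block, future) pairs in submission order

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for block in blocks:
            pending.append((block, pool.submit(compute, block)))
            if len(pending) >= cap:
                # Wait on the oldest block so writes never go out of order
                # and finished results cannot pile up without bound.
                oldest, future = pending.popleft()
                write(oldest, future.result())

        # Drain the remaining blocks, still in order.
        while pending:
            oldest, future = pending.popleft()
            write(oldest, future.result())
```

Because the writer always waits on the oldest outstanding block, the output order does not depend on which worker finishes first, and at most 2 * max_workers results are held in memory at once.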

Correctness

  • Deterministic across worker counts. On a real 15.9 Gpx survey (3 submodels, ~61k blocks), the merged orthophoto is byte-for-byte identical at max_workers=1 (serial, ~28 min) and max_workers=16 (~8 min) — the in-order writer makes the result independent of worker count, and the serial/default path completes cleanly with no read↔write stall.
  • End-to-end. Verified on the same survey: the merge completes with zero stalled blocks and produces a valid orthophoto, pixel-identical to the serial baseline (confirmed for both LZW-compressed and uncompressed output).

Notes

  • No new dependencies.
  • Default behavior unchanged (max_workers=1).

@Chouffe Chouffe changed the title from "orthophoto merge: parallelize block processing with parallel_map" to "orthophoto merge: parallelize block processing (in-order writer + gated reads)" Jun 23, 2026
@Chouffe Chouffe force-pushed the parallel-orthophoto-merge-2034 branch from 4fdec37 to bbd0295 Compare June 23, 2026 21:34
Chouffe added 2 commits June 23, 2026 23:54
merge() reads each source window with rasterio boundless=True, which builds an
in-memory VRT and serializes it via Python's ElementTree (_serialize_xml) on
every read — a large per-read overhead on big merges (tens of thousands of
blocks x 3 passes x N submodels).

_read_window_gated() keeps identical output but avoids the VRT for the common
cases: a plain non-boundless read when the window is fully inside the source,
zeros when fully outside (== the 0 nodata fill boundless produces there), and
boundless only for the rare partial-edge windows.

Pixel-identical (verified: hundreds of fully-in-bounds windows across a real
merge grid compared boundless vs plain read, 0 mismatches). Serial; no behavior
change beyond the speedup. Also a prerequisite for parallelizing the merge:
boundless's per-read VRT serialization is pathological under concurrency.
…orkers)

With boundless reads gated (previous commit), parallelize the per-block blend
loop. Blocks are computed in a ThreadPoolExecutor and written from a single
thread in strict block order with a bounded look-ahead (cap = 2 * max_workers),
so writes to the compressed, tiled GeoTIFF stay sequential and incrementally
flushable and memory stays small. GDAL's block cache is bounded during the merge
(restored on exit). Per-thread source handles (GDAL/rasterio datasets are not
thread-safe). Preserves --merge-skip-blending.

Wired from stages/splitmerge.py as max_workers=args.max_concurrency.
max_workers<=1 is byte-for-byte identical to the original serial loop.
@Chouffe Chouffe force-pushed the parallel-orthophoto-merge-2034 branch from bbd0295 to fa47f06 Compare June 23, 2026 21:56
@Chouffe Chouffe changed the title from "orthophoto merge: parallelize block processing (in-order writer + gated reads)" to "orthophoto merge: parallelize the block loop (in-order writer)" Jun 23, 2026
@smathermather smathermather requested a review from spwoodcock June 25, 2026 02:16