
extract: write output asynchronously using a background thread per extract #315

Open
yuiseki wants to merge 1 commit into osmcode:master from yuiseki:feat/extract-complete-ways-async-writer

yuiseki commented Apr 30, 2026

See #312. Split from #313 as requested.

In --strategy=complete_ways, Extract::write() calls
osmium::io::Writer::operator()(Buffer&&) synchronously for each extract.
With N output files (e.g. 16 tiles for a z=2 planet split), the main thread
serialises the flush of each writer in sequence.

This patch adds a per-Extract background writer thread. The main thread
fills a front buffer; when full it swaps with the back buffer under a mutex
and returns immediately. The background thread drains the back buffer to
osmium::io::Writer. Exceptions are captured in std::exception_ptr and
re-thrown on the next foreground synchronisation point. The Extract
destructor joins the background thread before returning.
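The hand-off described above can be sketched roughly as follows. This is an illustration, not the code in this PR: `AsyncBatchWriter`, `Batch` and the `sink` callable are hypothetical stand-ins for the Extract-side wrapper, `osmium::memory::Buffer` and `osmium::io::Writer`. The sketch assumes that the foreground blocks briefly if the previous back buffer has not been picked up yet, and it leaves out exception forwarding to keep the swap logic readable.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer of OSM objects.
using Batch = std::vector<std::string>;

// Hypothetical double-buffered writer; sink stands in for the real writer.
class AsyncBatchWriter {
    std::function<void(Batch&&)> sink_;
    std::size_t capacity_;
    Batch front_;             // touched by the foreground thread only
    Batch back_;              // shared, protected by mutex_
    bool back_full_ = false;  // true while back_ waits to be drained
    bool done_ = false;       // set by the destructor
    std::mutex mutex_;
    std::condition_variable cv_;
    std::thread worker_;

    // Background loop: take the back buffer, then write outside the lock.
    void run() {
        for (;;) {
            Batch batch;
            {
                std::unique_lock<std::mutex> lock{mutex_};
                cv_.wait(lock, [this] { return back_full_ || done_; });
                if (!back_full_) {
                    return;  // shutting down and nothing left to drain
                }
                batch.swap(back_);  // back_ is now empty again
                back_full_ = false;
            }
            cv_.notify_all();         // foreground may swap in the next batch
            sink_(std::move(batch));  // slow I/O happens without the lock
        }
    }

    // Foreground side: swap the full front buffer into the back slot.
    void hand_over() {
        std::unique_lock<std::mutex> lock{mutex_};
        cv_.wait(lock, [this] { return !back_full_; });
        back_.swap(front_);  // front_ receives the empty buffer
        back_full_ = true;
        lock.unlock();
        cv_.notify_all();
    }

public:
    AsyncBatchWriter(std::function<void(Batch&&)> sink, std::size_t capacity)
        : sink_(std::move(sink)), capacity_(capacity) {
        assert(sink_ && capacity_ > 0);
        worker_ = std::thread{[this] { run(); }};
    }

    AsyncBatchWriter(const AsyncBatchWriter&) = delete;
    AsyncBatchWriter& operator=(const AsyncBatchWriter&) = delete;

    void write(std::string item) {
        front_.push_back(std::move(item));
        if (front_.size() >= capacity_) {
            hand_over();
        }
    }

    // Flush the remainder, signal shutdown, and join before returning.
    ~AsyncBatchWriter() {
        if (!front_.empty()) {
            hand_over();
        }
        {
            std::lock_guard<std::mutex> lock{mutex_};
            done_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }
};
```

With one such object per extract, a slow sink for one output file delays only that extract's hand-off, not the main loop's writes to the other extracts.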

Benchmark

All runs: --strategy=complete_ways, same output config, 32-core machine.

japan-260423.osm.pbf (~1 GB), 8-tile extraction:

| Version | Elapsed | vs upstream |
|---|---|---|
| upstream | 1m 29s | baseline |
| this PR | 1m 07s | -25% |

planet-260413.osm.pbf (~86 GB), 16-tile z=2 extraction:

| Version | Elapsed | vs upstream |
|---|---|---|
| upstream | 51m 36s | baseline |
| this PR | 42m 35s | -18% |

Output was verified against the upstream result by per-tile object-count comparison; the counts are identical for all tiles.

Note on the libosmium-side approach (re: #313)

In #313 the question was raised whether this belongs in libosmium so all
commands benefit. That approach was implemented and measured as a
per-Writer background encode thread on a libosmium branch:

japan-260423.osm.pbf, 8-tile:

| Version | Elapsed | vs upstream |
|---|---|---|
| upstream | 1m 29s | baseline |
| libosmium per-Writer async encode | 1m 13s | -18% |
| this PR (osmium-tool per-Extract) | 1m 07s | -25% |

planet-260413.osm.pbf, 16-tile:

| Version | Elapsed | vs upstream |
|---|---|---|
| upstream | 51m 36s | baseline |
| libosmium per-Writer async encode | 45m 41s | -11% |
| this PR (osmium-tool per-Extract) | 42m 35s | -18% |

The gap of about 7 percentage points is consistent across both scales. In PBF mode,
Writer::write_buffer() submits tasks to the libosmium thread pool and
returns quickly, so a per-Writer encode thread backgrounds only a small
slice of the call. Backgrounding the full (*m_writer)(buffer) call from
Extract::write() covers the broader call-site overhead across N concurrent
extracts.

Both approaches can coexist. Sharing the numbers so the placement decision
is informed by data.

Note on exception safety (re: #313)

Acknowledged. Cross-thread state is limited to the two buffers swapped
under a single mutex. The osmium::io::Writer instance is owned exclusively
by the background thread after construction. Exceptions in the background
thread propagate to the foreground via std::exception_ptr and are re-thrown
at the next write() or close() call. Output correctness was verified by
per-tile object-count comparison against upstream (identical).

Specific test cases can be added if there is a scenario to cover.
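For reference, the exception hand-off described above follows the usual `std::exception_ptr` pattern, sketched below. `ErrorSlot` and `run_in_background` are hypothetical names used only for this illustration; in the PR the equivalent check would sit at the start of `write()` and `close()`. The sketch assumes that only the first captured exception is kept and that it is cleared once re-thrown.

```cpp
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical holder for an exception raised on the background thread.
class ErrorSlot {
    std::mutex mutex_;
    std::exception_ptr error_;

public:
    // Call from inside a catch block on the background thread.
    void capture() {
        std::lock_guard<std::mutex> lock{mutex_};
        if (!error_) {
            error_ = std::current_exception();  // keep the first error only
        }
    }

    // Call on the foreground thread at each synchronisation point.
    void rethrow_if_set() {
        std::exception_ptr pending;
        {
            std::lock_guard<std::mutex> lock{mutex_};
            std::swap(pending, error_);  // clear the slot while locked
        }
        if (pending) {
            std::rethrow_exception(pending);
        }
    }
};

// Hypothetical helper: run job on another thread and record any failure.
void run_in_background(ErrorSlot& slot, const std::function<void()>& job) {
    std::thread worker{[&slot, &job] {
        try {
            job();
        } catch (...) {
            slot.capture();  // never let the exception escape the thread
        }
    }};
    worker.join();
}
```

An exception that escapes a `std::thread` function calls `std::terminate`, so the catch-all inside the thread body is what keeps a failed write reportable on the foreground thread.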
