extract: write output asynchronously using a background thread per extract #315
Open
yuiseki wants to merge 1 commit into osmcode:master from
See #312. Split from #313 as requested.
In `--strategy=complete_ways`, `Extract::write()` calls `osmium::io::Writer::operator()(Buffer&&)` synchronously for each extract. With N output files (e.g. 16 tiles for a z=2 planet split), the main thread
serialises the flush of each writer in sequence.
This patch adds a per-`Extract` background writer thread. The main thread
fills a front buffer; when full it swaps with the back buffer under a mutex
and returns immediately. The background thread drains the back buffer to
`osmium::io::Writer`. Exceptions are captured in `std::exception_ptr` and
re-thrown on the next foreground synchronisation point. The `Extract`
destructor joins the background thread before returning.
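
For readers who want the shape of the front/back buffer handoff without opening the diff, here is a minimal sketch. It is not the code in this patch: `AsyncSink`, `Batch`, and the `sink` callback are hypothetical names, `Batch` stands in for an osmium buffer, and `sink` stands in for the blocking writer call. The sketch makes the foreground wait if the previous back buffer has not been taken yet, and it leaves out error propagation to stay short, so a throwing `sink` would terminate the program here.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an osmium buffer: a batch of items.
using Batch = std::vector<int>;

class AsyncSink {
public:
    // `sink` stands in for the blocking writer call (assumption).
    AsyncSink(std::function<void(Batch&&)> sink, std::size_t capacity)
        : m_sink(std::move(sink)), m_capacity(capacity),
          m_thread([this] { run(); }) {}

    AsyncSink(const AsyncSink&) = delete;
    AsyncSink& operator=(const AsyncSink&) = delete;

    // Join the background thread before the object goes away.
    ~AsyncSink() { close(); }

    // Foreground: fill the front buffer, hand it over when full.
    void write(int item) {
        assert(!m_closed);
        m_front.push_back(item);
        if (m_front.size() >= m_capacity) {
            hand_over();
        }
    }

    // Foreground: flush what is left, stop the thread, join it.
    void close() {
        if (m_closed) return;
        hand_over();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_all();
        m_thread.join();
        m_closed = true;
    }

private:
    void hand_over() {
        if (m_front.empty()) return;
        std::unique_lock<std::mutex> lock(m_mutex);
        // Wait until the background thread has taken the previous batch.
        m_cv.wait(lock, [this] { return !m_back_full; });
        std::swap(m_front, m_back);  // m_front is now the drained, empty buffer
        m_back_full = true;
        lock.unlock();
        m_cv.notify_all();
    }

    void run() {
        for (;;) {
            Batch batch;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_back_full || m_done; });
                if (!m_back_full) return;  // done and nothing left to drain
                batch.swap(m_back);
                m_back_full = false;
            }
            m_cv.notify_all();
            m_sink(std::move(batch));  // slow write happens outside the lock
        }
    }

    std::function<void(Batch&&)> m_sink;
    std::size_t m_capacity;
    Batch m_front;             // touched only by the foreground thread
    Batch m_back;              // guarded by m_mutex
    bool m_back_full = false;  // guarded by m_mutex
    bool m_done = false;       // guarded by m_mutex
    bool m_closed = false;     // touched only by the foreground thread
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::thread m_thread;      // declared last so it starts after the other members exist
};
```

The point of the design is that the lock is held only for the swap, so the foreground is blocked by the writer only when it gets a full buffer ahead of it.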
Benchmark
All runs: `--strategy=complete_ways`, same output config, 32-core machine.

japan-260423.osm.pbf (~1 GB), 8-tile extraction:
planet-260413.osm.pbf (~86 GB), 16-tile z=2 extraction:
Output verified to be identical to the upstream result for all tiles.
Note on the libosmium-side approach (re: #313)
In #313 the question was raised whether this belongs in libosmium so all
commands benefit. That approach was implemented and measured as a
per-`Writer` background encode thread on a libosmium branch:

japan-260423.osm.pbf, 8-tile:
planet-260413.osm.pbf, 16-tile:
The 7% gap is consistent across both scales. In PBF mode,
`Writer::write_buffer()` submits tasks to the libosmium thread pool and
returns quickly, so a per-`Writer` encode thread backgrounds only a small
slice of the call. Backgrounding the full `(*m_writer)(buffer)` call from
`Extract::write()` covers the broader call-site overhead across N concurrent
extracts.
Both approaches can coexist. Sharing the numbers so the placement decision
is informed by data.
Note on exception safety (re: #313)
Acknowledged. Cross-thread state is limited to the two buffers swapped
under a single mutex. The `osmium::io::Writer` instance is owned exclusively
by the background thread after construction. Exceptions in the background
thread propagate to the foreground via `std::exception_ptr` and are re-thrown
at the next `write()` or `close()` call. Output correctness was verified by
per-tile object-count comparison against upstream (identical).
Specific test cases can be added if there is a scenario to cover.
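
To make the propagation pattern concrete for review, here is a minimal standalone sketch of capturing an exception on a background thread and re-throwing it at a foreground synchronisation point. It is an illustration only, not the code in this patch: `BackgroundJob`, `check()`, and the `job` callback are hypothetical names, and the job stands in for the background drain loop.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical helper: runs one job in the background and keeps any
// exception it throws until the foreground asks for it.
class BackgroundJob {
public:
    explicit BackgroundJob(std::function<void()> job)
        : m_thread([this, job = std::move(job)] {
              try {
                  job();
              } catch (...) {
                  // Never let the exception escape the thread function;
                  // store it for the foreground instead.
                  std::lock_guard<std::mutex> lock(m_mutex);
                  m_error = std::current_exception();
              }
          }) {}

    BackgroundJob(const BackgroundJob&) = delete;
    BackgroundJob& operator=(const BackgroundJob&) = delete;

    // The destructor joins but does not throw.
    ~BackgroundJob() {
        if (m_thread.joinable()) m_thread.join();
    }

    // Foreground synchronisation point: re-throw a stored exception once.
    void check() {
        std::exception_ptr error;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            std::swap(error, m_error);
        }
        if (error) std::rethrow_exception(error);
    }

    // Wait for the job to finish, then surface any error.
    void close() {
        if (m_thread.joinable()) m_thread.join();
        check();
    }

private:
    std::mutex m_mutex;
    std::exception_ptr m_error;  // guarded by m_mutex
    std::thread m_thread;        // declared last so it starts after the other members exist
};
```

A test case along these lines could inject a throwing job and assert that `close()` re-throws it, while the destructor alone stays silent.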