feat(client): stream ParallelGet in-order to an io.Writer #360

Draft
worstell wants to merge 1 commit into main from worstell/parallel-get-stream

@worstell commented Jun 26, 2026

Contributor

ParallelGet — and the DownloadGitSnapshot helper that wraps it — previously wrote chunks to an io.WriterAt, which requires a seekable destination (e.g. a temp file) and so prevents a consumer from overlapping the download with processing. They now fetch chunks in parallel but emit in-order bytes to a plain io.Writer via a bounded reorder buffer, letting a streaming consumer (e.g. a decompress/extract pipeline) run concurrently with the download.

A window of concurrency chunks caps how many can be fetched but not yet written, and the reorder buffer is a ring with one slot per window entry, so peak memory stays at O(concurrency * chunkSize) regardless of object size or consumer speed. A chunk whose body length differs from its requested range (short, or overlong because a backend ignored the range) is rejected rather than spliced or truncated. Revision safety (ETag pinning via If-Range), empty-object handling, the degraded path for backends that ignore range requests, and the concurrency == 1 shortcut are unchanged.

The io.WriterAt variant is removed: no consumer benefited from scatter-writes, and the only use — download-to-temp-file then extract — is slower than streaming because it gives up download/extract overlap. *os.File satisfies io.Writer, so the CLI caller is unaffected.

Tests cover in-order reassembly, out-of-order completion, the single-worker/empty-object/range-ignore fallbacks, ETag-mismatch and overlong-chunk rejection, and error propagation, all under -race.
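As a rough illustration of the ETag-mismatch and body-length cases, the sketch below pins a revision with If-Range against a local httptest server. `fetchRange` and `newTestServer` are hypothetical names, not functions from this PR, and for simplicity the sketch rejects any non-206 response, whereas the PR describes a separate degraded path for backends that ignore ranges.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchRange requests bytes [off, off+n) and pins the revision via If-Range.
// It rejects responses for a different revision or with the wrong length.
func fetchRange(client *http.Client, url, etag string, off, n int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+n-1))
	req.Header.Set("If-Range", etag)
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("unexpected status %d for ranged request", resp.StatusCode)
	}
	if got := resp.Header.Get("ETag"); got != etag {
		return nil, fmt.Errorf("etag changed: got %s, want %s", got, etag)
	}
	// Read at most n+1 bytes so an overlong body is detected without buffering it all.
	body, err := io.ReadAll(io.LimitReader(resp.Body, n+1))
	if err != nil {
		return nil, err
	}
	if int64(len(body)) != n {
		return nil, fmt.Errorf("got %d bytes, want %d", len(body), n)
	}
	return body, nil
}

// newTestServer serves content under a fixed ETag; http.ServeContent
// implements Range and If-Range handling.
func newTestServer(etag, content string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		http.ServeContent(w, r, "", time.Time{}, strings.NewReader(content))
	}))
}

func main() {
	srv := newTestServer(`"v1"`, "hello world")
	defer srv.Close()

	b, err := fetchRange(srv.Client(), srv.URL, `"v1"`, 6, 5)
	fmt.Printf("%q %v\n", b, err)

	// A stale ETag makes the server ignore the range, which is rejected here.
	_, err = fetchRange(srv.Client(), srv.URL, `"v0"`, 6, 5)
	fmt.Println(err != nil)
}
```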

@worstell force-pushed the worstell/parallel-get-stream branch from 4d97fb3 to 83e3e16 on June 26, 2026 00:30
@worstell changed the title from "feat(client): add ParallelGetStream for in-order streaming parallel downloads" to "feat(client): stream ParallelGet in-order to an io.Writer" on Jun 26, 2026
ParallelGet (and the DownloadGitSnapshot helper that wraps it) previously wrote
chunks to an io.WriterAt, which requires a seekable destination (e.g. a temp
file) and so prevents a consumer from overlapping the download with processing.
They now fetch chunks in parallel but emit in-order bytes to a plain io.Writer
via a bounded reorder buffer, letting a streaming consumer (e.g. a
decompress/extract pipeline) run concurrently with the download.

A concurrency-sized window caps fetched-but-unwritten chunks, and the reorder
buffer is a ring of that many slots, bounding peak memory to
O(concurrency * chunkSize) regardless of object size or consumer speed. A chunk
whose body length differs from its requested range (short or overlong, e.g. from
a backend that ignored the range) is rejected rather than spliced or truncated.
Revision safety (ETag pinning via If-Range), empty-object handling, range-ignore
degrade, and the concurrency==1 shortcut are unchanged.

The io.WriterAt variant is removed: no consumer benefited from scatter-writes,
and the only use (download-to-temp-file then extract) is slower than streaming
because it gives up download/extract overlap.

Amp-Thread-ID: https://ampcode.com/threads/T-019ef6a9-a407-7389-bc43-001405e3ae9e
Co-authored-by: Amp <amp@ampcode.com>
@worstell force-pushed the worstell/parallel-get-stream branch from 83e3e16 to 250ab5b on June 26, 2026 18:20