
uploaders/pulp: parallel chunks, direct artifacts, config concurrency #179

Open: andrewlukoshko wants to merge 1 commit into master from feat-optimize-files-upload

@andrewlukoshko (Member) commented Apr 6, 2026

  • Replace sequential, filesplit-based on-disk splitting with parallel chunk uploads on a ThreadPoolExecutor; each chunk is read by byte range and staged in a temp file
  • Add direct artifact creation via ArtifactsApi.create() for files smaller than chunk_size, bypassing the 3-step upload+commit flow
  • Add configurable upload_workers setting (default 4) to SignNodeConfig, wired through Signer to PulpBaseUploader
  • Add per-file upload timing logs at INFO level for all three upload paths (direct artifact, single chunk, multi-chunk)
  • Remove filesplit dependency from requirements.txt
  • Add 6 new tests: parallel chunk upload, chunk error propagation, direct artifact creation, configurable workers, default workers, and upload timing log verification
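For reviewers, here is a minimal sketch of the upload flow described above: direct artifact creation below `chunk_size`, otherwise byte-range chunks uploaded in parallel and then committed with the file's SHA-256, plus an INFO timing log. It is not the PR's code and does not call pulpcore-client. `upload_file`, `chunk_ranges`, and the injected callables `upload_chunk(content_range, tmp_path)`, `create_artifact(path)`, and `commit_upload(sha256_hex)` are hypothetical stand-ins for the real `ArtifactsApi`/uploads calls, and the `"bytes start-end/total"` string only assumes a Content-Range-style format.

```python
import hashlib
import logging
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

logger = logging.getLogger(__name__)

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, as in the benchmark setup
DEFAULT_WORKERS = 4           # matches the documented upload_workers default


def chunk_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) byte ranges that together cover `size` bytes."""
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]


def _upload_range(path: str, start: int, end: int, total: int,
                  upload_chunk: Callable[[str, str], None]) -> None:
    # Read only this chunk's bytes; nothing is pre-split on disk.
    with open(path, "rb") as src:
        src.seek(start)
        data = src.read(end - start + 1)
    # Stage the chunk in a temp file, assuming the uploader wants a file path.
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        upload_chunk(f"bytes {start}-{end}/{total}", tmp_path)
    finally:
        os.remove(tmp_path)


def _sha256(path: str) -> str:
    # Stream the file so large uploads are never fully loaded into memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def upload_file(path: str,
                upload_chunk: Callable[[str, str], None],
                create_artifact: Callable[[str], None],
                commit_upload: Callable[[str], None],
                chunk_size: int = CHUNK_SIZE,
                workers: int = DEFAULT_WORKERS) -> str:
    size = os.path.getsize(path)
    started = time.monotonic()
    if size < chunk_size:
        # Small file: one direct artifact call, no upload/commit round trips.
        create_artifact(path)
        method = "direct artifact"
    else:
        ranges = chunk_ranges(size, chunk_size)
        pool = ThreadPoolExecutor(max_workers=workers)
        try:
            futures = [pool.submit(_upload_range, path, s, e, size, upload_chunk)
                       for s, e in ranges]
            for future in futures:
                future.result()  # re-raises the first failure in submission order
        finally:
            # On failure, drop chunks that have not started yet (Python 3.9+).
            pool.shutdown(wait=True, cancel_futures=True)
        commit_upload(_sha256(path))
        method = f"{len(ranges)} chunk(s)"
    logger.info("Uploaded %s (%d bytes) via %s in %.3fs",
                path, size, method, time.monotonic() - started)
    return method
```

Injecting the API calls as callables keeps the concurrency and error-propagation logic testable without a Pulp server, which is roughly what the new parallel-upload and error-propagation tests need to exercise.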

Pulp Upload Benchmark Results

Server: pulpcore 3.68.1 (Docker pulp/pulp:3.68)
Client: pulpcore-client 3.68.0
Chunk size: 8 MB
Upload workers: 4 (ThreadPoolExecutor)
Iterations: 3 per scenario

Old vs New Upload Path

| File size | Old (sequential) | New (parallel/direct) | Speedup | Method |
|---|---|---|---|---|
| 4 MB | 0.724 s | 0.217 s | 3.3x | Direct artifact creation (1 API call vs. 3 + task poll) |
| 50 MB | 2.669 s | 1.922 s | 1.4x | Parallel chunk upload (7 chunks, 4 workers) |
| 500 MB | 22.004 s | 11.127 s | 2.0x | Parallel chunk upload (63 chunks, 4 workers) |
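The chunk counts and speedups in the table above follow from the 8 MB chunk size by simple arithmetic. This quick check assumes "MB" means MiB; with decimal megabytes the counts come out the same (6.25 and 62.5 chunks, rounded up). `chunk_count` and `speedup` are illustrative helpers, not functions from the PR.

```python
MiB = 1024 * 1024
CHUNK = 8 * MiB  # chunk size from the benchmark setup


def chunk_count(size_bytes: int, chunk_bytes: int = CHUNK) -> int:
    # Ceiling division: a partial final chunk still needs its own request.
    return (size_bytes + chunk_bytes - 1) // chunk_bytes


def speedup(old_s: float, new_s: float) -> float:
    return round(old_s / new_s, 1)


# (size in MB, old seconds, new seconds), copied from the benchmark table
rows = [(4, 0.724, 0.217), (50, 2.669, 1.922), (500, 22.004, 11.127)]
for size_mb, old_s, new_s in rows:
    size = size_mb * MiB
    path = "direct" if size < CHUNK else f"{chunk_count(size)} chunks"
    print(f"{size_mb} MB: {path}, {speedup(old_s, new_s)}x")
# prints:
# 4 MB: direct, 3.3x
# 50 MB: 7 chunks, 1.4x
# 500 MB: 63 chunks, 2.0x
```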

Concurrency Scaling (500 MB file, 63 chunks)

| Workers | Mean time | Speedup vs. sequential |
|---|---|---|
| Sequential | 22.494 s | 1.00x |
| 1 | 21.750 s | 1.03x |
| 2 | 15.358 s | 1.46x |
| 4 | 11.303 s | 1.99x |
| 8 | 11.371 s | 1.98x |
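The speedup column in the scaling table above is the sequential mean divided by each configuration's mean; this small check reproduces it from the reported times (the dictionary layout is only for illustration).

```python
# Mean times in seconds for the 500 MB file, copied from the scaling table
sequential = 22.494
means = {1: 21.750, 2: 15.358, 4: 11.303, 8: 11.371}

speedups = {workers: round(sequential / t, 2) for workers, t in means.items()}
for workers, s in speedups.items():
    print(f"{workers} worker(s): {s:.2f}x")
# prints:
# 1 worker(s): 1.03x
# 2 worker(s): 1.46x
# 4 worker(s): 1.99x
# 8 worker(s): 1.98x
```

In this run, going from 4 to 8 workers brings no further gain (1.99x vs. 1.98x), which is consistent with keeping 4 as the `upload_workers` default.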

@andrewlukoshko changed the title from "uploaders/pulp: parallel chunks, direct artifact creation, configurable concurrency" to "uploaders/pulp: parallel chunks, direct artifacts, config concurrency" on Apr 6, 2026
@andrewlukoshko force-pushed the feat-optimize-files-upload branch from 1e9da96 to 73dc58a on April 6, 2026 00:47