Skip to content

Fix: blocking PATCH requests and implement true streaming for chunked…#120

Open
njuptlzf wants to merge 3 commits intocloudflare:mainfrom
njuptlzf:optimize_large_image_push
Open

Fix: blocking PATCH requests and implement true streaming for chunked…#120
njuptlzf wants to merge 3 commits intocloudflare:mainfrom
njuptlzf:optimize_large_image_push

Conversation

@njuptlzf
Copy link
Copy Markdown

@njuptlzf njuptlzf commented Feb 6, 2026

Context

Previously, the Registry Worker suffered from significant performance and stability issues during large image pushes. When a client (e.g., Docker, Podman, or regctl) sent a PATCH request without a Content-Length header (common in chunked uploads), the Worker would call await req.blob().

The Problem

  1. Memory Exhaustion: await req.blob() buffers the entire chunk into the Worker's memory. Large chunks (50MB+) often hit the memory limit, causing OOM.
  2. Timeouts: Buffering the entire body before starting the R2 upload consumes the 10-minute request limit unnecessarily.
  3. Progress Bar Jitter: The client-side progress bar would "freeze" while the Worker buffered, then "jump" when the R2 upload finally started.

Changes

  • Non-blocking Size Detection: Introduced getStreamSize in src/utils.ts. It extracts the chunk size from Content-Length or Content-Range using robust regex, avoiding body consumption.
  • True Streaming PATCH: Refactored the PATCH route in src/router.ts to pass req.body (a ReadableStream) directly to the R2 client.
  • Robust Stream Handling:
    • Updated the limit function in src/chunk.ts to use a TransformStream pattern, ensuring proper backpressure handling and avoiding stream hangs.
    • Improved PUSH_COMPATIBILITY_MODE in src/registry/r2.ts to consume teed streams concurrently using Promise.all, preventing deadlocks.
  • Protocol Compliance: Fixed Content-Range parsing to handle the bytes prefix and ensured the Range response header follows the standard 0-N format.
  • Parallel Tee Consumption (Pull‑through): In GET /v2/:name/blobs/:digest, when PUSH_COMPATIBILITY_MODE !== "none", immediately start R2 upload in parallel with returning the response to the client. This reduces backpressure recovery time from 10–30 seconds to 1–5 seconds during large‑layer pulls 1 .

Impact

  • Stability: Massive reduction in OOM errors during pushes.
  • Performance: Uploads to R2 now start immediately as the first byte arrives from the client.
  • UX: Smooth, continuous progress bars in CLI tools.
  • Pull‑through Throughput: Large‑layer copies see ~50% higher throughput and ~66% lower elapsed time due to much shorter backpressure stalls.

@njuptlzf
Copy link
Copy Markdown
Author

njuptlzf commented Feb 25, 2026

push

before:

# time regctl image copy --fast app.test.com/test/image/app:7.1.001 r2.test.site/test/image/app:7.1.001
time=2026-02-24T16:54:02.797+08:00 level=WARN msg="API field has been deprecated" api=default host=r2.test.site
time=2026-02-24T16:54:02.797+08:00 level=WARN msg="Changing reqPerSec settings for registry" orig=3 new=4 host=r2.test.site
time=2026-02-24T16:54:02.819+08:00 level=WARN msg="failed to setup CA pool" err="failed to load host specific ca (registry: r2.test.site): pem.Decode is nil: system"
Manifests: 5/5 | Blobs: 1.980GB copied, 32.000B skipped | Elapsed: 984s
r2.test.site/test/image/app:7.1.001

after:

# time regctl image copy --fast app.test.com/test/image/app:7.1.003.009 r2.test.site/test/image/app:7.1.003.009
time=2026-02-25T18:03:48.703+08:00 level=WARN msg="API field has been deprecated" api=default host=r2.test.site
time=2026-02-25T18:03:48.704+08:00 level=WARN msg="Changing reqPerSec settings for registry" orig=3 new=4 host=r2.test.site
time=2026-02-25T18:03:48.727+08:00 level=WARN msg="failed to setup CA pool" err="failed to load host specific ca (registry: r2.test.site): pem.Decode is nil: system"
Manifests: 5/5 | Blobs: 1.013GB copied, 833.410MB skipped | Elapsed: 330s
r2.test.site/test/image/app:7.1.003.009

real    5m30.821s
user    0m13.727s
sys     0m12.754s

metrics

Metric Before After Change
Copied Data 1.980 GB 1.013 GB
Elapsed Time 984 s 330 s -66%
Throughput ≈2.01 MB/s ≈3.07 MB/s +52%

@njuptlzf njuptlzf force-pushed the optimize_large_image_push branch 2 times, most recently from c0b5bc5 to 889cc09 Compare February 25, 2026 11:35
roshanjonah added a commit to roshanjonah/serverless-registry that referenced this pull request Mar 14, 2026
Apply streaming fixes from upstream PR cloudflare#120:
- Eliminate blocking await req.blob() in PATCH handler by extracting
  size from Content-Length/Content-Range headers via getStreamSize()
- Fix limit() to use TransformStream for proper backpressure handling
- Parallel tee consumption for pull-through layer copies
- Run R2 part upload and helper object write in parallel

Additional push tool improvements:
- Restore chunk size to 95MB (was 10MB, causing excessive round-trips)
- Reduce push concurrency from 5 to 2 to avoid R2 contention
- Add exponential backoff on retry (5s, 10s, 20s)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance Bottleneck and OOM in PATCH Requests During Large Image Pushes

1 participant