
[BUG] R2 cache population fails with 180K+ objects — no resume after failure #1173

@beobungbu

Description

populateCache in v1.18.0 fails when uploading ~180,000 SSG cache entries to R2. The worker binding approach (introduced in v1.18.0 to bypass API rate limits) still hits frequent 502 Bad Gateway errors from the local dev worker. After 5 failed attempts on a single object, the entire process crashes and has to be restarted from object 0.

Environment

  • @opennextjs/cloudflare: 1.18.0
  • next: 16.1.5
  • wrangler: latest
  • Node.js: 22.22.1
  • macOS (Apple Silicon)
  • R2 bucket region: auto
  • Cloudflare Workers paid plan

Steps to Reproduce

  1. Next.js app with revalidate = false on all public routes
  2. generateStaticParams produces 180,406 cache entries
  3. Run opennextjs-cloudflare deploy
  4. populateCache starts uploading to R2 via worker binding

Observed Behavior

Run 1 (v1.18.0 worker binding):

  • Upload starts at ~30 it/s
  • Sporadic 502 Bad Gateway errors every ~1,000-2,000 objects (retry succeeds)
  • At 64% (116,690/180,406), a burst of "fetch failed" errors occurs
  • After 5 failed attempts on one object, the entire process crashes
  • Total time wasted: ~65 minutes
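The ~65 minutes matches the reported rate. Assuming throughput stays near 30 objects per second for the whole run, reaching the crash point and finishing a complete upload take roughly:

```latex
\[
\begin{aligned}
t_{\text{crash}} &\approx \frac{116{,}690}{30\ \text{s}^{-1}} \approx 3{,}890\ \text{s} \approx 65\ \text{min} \\
t_{\text{full}}  &\approx \frac{180{,}406}{30\ \text{s}^{-1}} \approx 6{,}014\ \text{s} \approx 100\ \text{min}
\end{aligned}
\]
```

So each crash without resume throws away more than an hour of uploads, and even a clean run needs about 100 minutes.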

Run 2 (restart):

  • Starts from 0/180,406 again — no resume capability
  • Same 502 pattern begins immediately
  • Expected to fail again around 60-70%

Error types observed:

ERROR Attempt 1 to write "...cache" failed with a retryable error: Worker returned a 502 Bad Gateway response. Retrying...
ERROR Attempt 1 to write "...cache" failed with a retryable error: Failed to send request to R2 worker: fetch failed. Retrying...
ERROR Attempt 1 to write "...cache" failed with a retryable error: put: Unspecified error (0). Retrying...

Previous behavior (v1.17.3):

  • Used wrangler r2 bulk put, which hit the Cloudflare API rate limit (1,200 requests per 5 minutes)
  • Failed at ~50/180,406 objects consistently
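For comparison, if every object costs at least one API request (an assumption; the exact number of requests per object is not stated here), the 1,200 req/5min limit caps throughput far below the worker-binding rate:

```latex
\[
\begin{aligned}
r_{\max} &= \frac{1{,}200\ \text{req}}{300\ \text{s}} = 4\ \text{req/s} \\
t_{\min} &\ge \frac{180{,}406\ \text{req}}{4\ \text{req/s}} \approx 45{,}100\ \text{s} \approx 12.5\ \text{h}
\end{aligned}
\]
```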

Expected Behavior

  1. Resume from last successful object instead of restarting from 0
  2. More aggressive retry/backoff for 502 errors (the worker binding should be more resilient than API calls)
  3. Batch tracking/checkpointing so progress isn't lost after hours of uploading
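Resuming only requires a durable record of which keys were already written. The following is a minimal sketch of such checkpointing. It assumes a newline-delimited list of completed cache keys. The `ProgressManifest` class and its file format are hypothetical and are not part of @opennextjs/cloudflare.

```typescript
// Hypothetical checkpoint manifest for resumable cache population.
// Assumed format: one completed cache key per line, each terminated by "\n".
// Keys must not contain newlines.

class ProgressManifest {
  private readonly done = new Set<string>();

  /** Rebuild from file contents. The segment after the last "\n" is ignored:
   *  it is either empty or a line that was only partially written before a crash. */
  static parse(text: string): ProgressManifest {
    const manifest = new ProgressManifest();
    const lines = text.split("\n");
    lines.pop(); // drop the unterminated (possibly truncated) tail
    for (const key of lines) {
      if (key.length > 0) manifest.done.add(key);
    }
    return manifest;
  }

  /** Record a successful write. */
  markDone(key: string): void {
    this.done.add(key);
  }

  has(key: string): boolean {
    return this.done.has(key);
  }

  get size(): number {
    return this.done.size;
  }

  /** Serialize in the same format that parse() reads. */
  serialize(): string {
    return Array.from(this.done, (key) => `${key}\n`).join("");
  }

  /** Keys still to upload, preserving the original order. */
  remaining(allKeys: readonly string[]): string[] {
    return allKeys.filter((key) => !this.done.has(key));
  }
}
```

A typical workflow has three steps. First, the uploader loads any existing `.progress` file. Next, it uploads only `remaining(keys)`. Finally, after each successful write, it appends `key + "\n"` to the file (for example with Node's `fs.appendFileSync`). With this approach, a crash loses only the objects that were in flight.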

Suggested Improvements

  1. Track uploaded objects (e.g., write a .progress manifest file) so populateCache can skip already-uploaded entries on retry
  2. Increase retry attempts for large deployments (5 attempts may not be enough when 502s are systemic)
  3. Reduce-concurrency option: allow users to set the write concurrency below the default to avoid overwhelming the worker
  4. Graceful degradation: on a fatal failure, report how many objects succeeded and provide a resume command
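Suggestions 2 to 4 could be combined in a single upload loop. The sketch below is hypothetical and does not describe how populateCache currently retries. `WriteFn` stands in for whatever performs a single R2 write, and `uploadAll`, `writeWithRetry`, `backoffDelayMs` and the option names are illustrative. The loop does three things:

- It uses capped exponential backoff with "full jitter", meaning a random delay between zero and the cap.
- It bounds in-flight writes with a fixed pool of workers.
- It records per-object failures instead of aborting the run.

```typescript
// Hypothetical resilient bulk upload. Not an @opennextjs/cloudflare API.
type WriteFn = (key: string) => Promise<void>;

interface UploadOptions {
  concurrency: number; // max writes in flight (suggestion 3)
  maxAttempts: number; // attempts per object (suggestion 2)
  baseDelayMs: number; // first backoff cap
  maxDelayMs: number; // upper bound on any single backoff
  random?: () => number; // injectable for tests; defaults to Math.random
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

interface UploadSummary {
  succeeded: string[];
  failed: { key: string; error: string }[];
}

/** Full-jitter backoff: uniform in [0, min(maxMs, baseMs * 2^attempt)). */
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number, random: () => number = Math.random): number {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap);
}

/** Retry one write until it succeeds or maxAttempts is reached, then rethrow. */
async function writeWithRetry(key: string, write: WriteFn, opts: UploadOptions): Promise<void> {
  const random = opts.random ?? Math.random;
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      await write(key);
      return;
    } catch (err) {
      if (attempt + 1 >= opts.maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, opts.baseDelayMs, opts.maxDelayMs, random));
    }
  }
}

/** Upload every key with at most `concurrency` writes in flight; one bad key never aborts the run. */
async function uploadAll(keys: readonly string[], write: WriteFn, opts: UploadOptions): Promise<UploadSummary> {
  const summary: UploadSummary = { succeeded: [], failed: [] };
  let next = 0; // shared cursor; safe because JS runs workers on one thread
  async function worker(): Promise<void> {
    while (next < keys.length) {
      const key = keys[next++];
      try {
        await writeWithRetry(key, write, opts);
        summary.succeeded.push(key);
      } catch (err) {
        summary.failed.push({ key, error: err instanceof Error ? err.message : String(err) });
      }
    }
  }
  await Promise.all(Array.from({ length: Math.max(1, opts.concurrency) }, () => worker()));
  return summary;
}
```

Because failures are collected rather than thrown, one object that keeps returning 502 no longer discards the rest of the run. The summary also supplies the succeeded/failed counts that suggestion 4 asks for. Pairing it with a progress manifest would let a rerun retry only `summary.failed`. Jitter keeps the workers from retrying in lockstep after a burst of 502s.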

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
