
fix: reduce retry attempts and use exponential backoff for presign/stage upload #401

Merged: youngsofun merged 1 commit into databendlabs:main from youngsofun:fix/retry-backoff on May 8, 2026

@youngsofun (Member)

Summary

Fixes retry behavior for presign upload/download and stage upload.

Changes

  • MaxRetryAttempts: 20 → 5 for both PresignClient and stage upload
  • Exponential backoff: replace the linear attempts*100ms delay with min(1s * 2^n, 30s) (at most ~31s of total waiting across 5 retries)
  • Remove sinceStart timeout: the 900s wall-clock check counted actual file transfer time against the retry budget, so a large file that took 800s to transfer would exhaust its retries immediately after a single failure
  • Reduce timeouts: connectTimeout 600s→30s, writeTimeout 900s→300s, readTimeout 600s→300s (300s is still generous for large files, while a 600s connect timeout served no practical purpose)
  • Fix false retries on NonRetryableHttpStatusException: isRetryableIOException now returns false for this exception before it checks the error message for keywords, so a response body containing "timeout" no longer triggers a retry
  • Remove unused RetryableHttpFailure class
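A minimal Java sketch of the retry policy described above, assuming a simple loop around a Callable. RetryPolicy, Sleeper, backoffMillis and callWithRetry are hypothetical names, not the driver's actual API, and for brevity the sketch retries every IOException instead of consulting isRetryableIOException. It illustrates two properties from the list: the delay doubles from 1s and is capped at 30s, and the retry budget is bounded only by the attempt count, with no wall-clock sinceStart check.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch of the retry policy described in this PR; the names are
// illustrative and do not come from the databend-jdbc sources.
final class RetryPolicy {
    static final int MAX_RETRY_ATTEMPTS = 5;   // previously 20
    static final long BASE_DELAY_MS = 1_000;   // first backoff: 1s
    static final long MAX_DELAY_MS = 30_000;   // cap: 30s

    /** Pluggable sleep so tests can record delays instead of waiting. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /** Delay before retry n (0-based): min(1s * 2^n, 30s). */
    static long backoffMillis(int retry) {
        // Clamp the shift so (1L << shift) cannot overflow for large n.
        int shift = Math.min(retry, 30);
        return Math.min(BASE_DELAY_MS * (1L << shift), MAX_DELAY_MS);
    }

    /**
     * Runs op, retrying up to MAX_RETRY_ATTEMPTS times on IOException
     * (simplification: every IOException is treated as retryable here).
     * There is deliberately no wall-clock limit: time spent transferring
     * data never consumes the retry budget, only failed attempts do.
     */
    static <T> T callWithRetry(Callable<T> op, Sleeper sleeper) throws Exception {
        for (int retry = 0; ; retry++) {
            try {
                return op.call();
            } catch (IOException e) {
                if (retry >= MAX_RETRY_ATTEMPTS) {
                    throw e; // 1 initial attempt + 5 retries exhausted
                }
                sleeper.sleep(backoffMillis(retry));
            }
        }
    }
}
```

Injecting the Sleeper lets a test record the requested delays for an always-failing operation instead of actually waiting through the full backoff.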

Testing

Updated testDownloadStreamPresignedRetryExhaustionRaisesSQLException to reflect the new attempt count (6: the initial attempt plus 5 retries) and extended its timeout to 60s to accommodate the exponential backoff.
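The new attempt count and test timeout can be checked with a short calculation, assuming one backoff sleep precedes each retry and the retry index n runs from 0 to 4:

$$
\text{attempts} = 1 + 5 = 6, \qquad
T_{\text{backoff}} = \sum_{n=0}^{4} \min\left(1\,\mathrm{s}\cdot 2^{n},\ 30\,\mathrm{s}\right) = (1 + 2 + 4 + 8 + 16)\,\mathrm{s} = 31\,\mathrm{s}
$$

Under that assumption the 30s cap is never reached within 5 retries, leaving roughly 29s of the 60s test timeout for the six failing attempts themselves.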

Commit message:

- MaxRetryAttempts: 20 -> 5 for both PresignClient and stage upload
- Replace linear sleep (attempts*100ms) with exponential backoff (min(1s*2^n, 30s))
- Remove sinceStart >= 900s check: it incorrectly included file transfer time,
  causing large file uploads to exhaust retries after a single slow transfer
- Reduce timeouts: connectTimeout 600s->30s, write/readTimeout 900s/600s->300s
- Short-circuit isRetryableIOException for NonRetryableHttpStatusException to
  prevent body keywords like "timeout" from triggering false retries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
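The short-circuit described in the commit message can be sketched as below. This assumes NonRetryableHttpStatusException is an IOException subclass; the stub class, the keyword list, and RetryClassifier are hypothetical stand-ins. Only the ordering (exception type checked before message keywords) reflects the change in this PR.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical stub: the driver's real exception may carry different fields.
class NonRetryableHttpStatusException extends IOException {
    final int status;

    NonRetryableHttpStatusException(int status, String body) {
        super("HTTP " + status + ": " + body);
        this.status = status;
    }
}

final class RetryClassifier {
    // Illustrative keyword list; the list used by the driver may differ.
    private static final String[] RETRYABLE_KEYWORDS = {"timeout", "connection reset", "broken pipe"};

    static boolean isRetryableIOException(IOException e) {
        // Short-circuit first: a non-retryable HTTP status is final,
        // whatever text the response body happens to contain.
        if (e instanceof NonRetryableHttpStatusException) {
            return false;
        }
        String msg = e.getMessage();
        if (msg == null) {
            return false; // no message to inspect: not retried in this sketch
        }
        String lower = msg.toLowerCase(Locale.ROOT);
        for (String keyword : RETRYABLE_KEYWORDS) {
            if (lower.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```

Without the instanceof check first, an HTTP 400 whose body merely mentions "timeout" would match the keyword scan and be retried.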
youngsofun merged commit 0b256f5 into databendlabs:main on May 8, 2026
3 of 4 checks passed