feat(storage/s3): make S3 socket timeout configurable #233
The hardcoded socketTimeout: 3000 (introduced in d682bc0 along with stream-based downloads) aborts in-flight R2/S3 GetObject body sockets whenever the response stream is paused for >3s by backpressure from either the client response or the merge upload, surfacing as ECONNRESET on /download/<id>. Expose STORAGE_S3_SOCKET_TIMEOUT_MS so deployments hitting this can raise it without forking. Default raised to 10000 to give backpressure pauses more headroom on first-download proxy paths.
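As a rough illustration of why 3 s is easy to exceed, the sketch below estimates how long reads could stall while the merge upload drains one buffered part. It assumes a simplified model that is not taken from the code: nothing is read from the GetObject body while a single part of `partSize: 5MB` (the value given in the PR description) is being uploaded. The throughput figure and the helper names `stallMs` and `wouldAbort` are hypothetical.

```typescript
// Simplified model (an assumption, not measured behaviour): while the merge
// upload drains one buffered part, nothing is read from the GetObject body.
const PART_SIZE_BYTES = 5 * 1024 * 1024; // partSize: 5MB, read here as 5 MiB

// Time in ms needed to upload one part at the given throughput.
function stallMs(uploadBytesPerSec: number): number {
  return (PART_SIZE_BYTES / uploadBytesPerSec) * 1000;
}

// True when the estimated stall outlasts the socket timeout.
function wouldAbort(uploadBytesPerSec: number, socketTimeoutMs: number): boolean {
  return stallMs(uploadBytesPerSec) > socketTimeoutMs;
}

const oneMiBPerSec = 1024 * 1024; // hypothetical upload throughput
console.log(stallMs(oneMiBPerSec)); // 5000
console.log(wouldAbort(oneMiBPerSec, 3000)); // true
console.log(wouldAbort(oneMiBPerSec, 10000)); // false
```

Under this model, an upload running at 1 MiB/s would trip the old 3000 ms limit but not the new 10000 ms default. Real stalls also depend on client-side backpressure, so the numbers are only indicative.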
Force-pushed from 3a972b0 to b00f196.
Thanks for working on this. This looks very useful. I hit this in a Kubernetes deployment using S3 storage backed by SeaweedFS over a 10G internal network with NVMe storage. SeaweedFS metrics showed very fast PUT latency, around 50-100ms p99, with 200 responses and no volume write failures, but cache-server still logged intermittent Making the S3 timeout configurable would help a lot for setups like this.
I made a Go version of this server to investigate the performance issue: https://github.com/MxOrbit/GitHubActionCacheServer

It is designed to be fully compatible with the original server, except for GCS support. In most deployments it can be deployed by replacing the container image while keeping the same environment variables. Unfortunately, after testing it, the performance issue appears to be related to the filer IP lifecycle in SeaweedFS (seaweedfs/seaweedfs#9692) rather than the cache server implementation.
Problem

The S3 client uses a hardcoded `socketTimeout: 3000` (ms), introduced in d682bc0 ("perf: reduce memory usage by using node streams") together with stream-based downloads.

Because this timeout applies to the GetObject body socket, any pause longer than 3s in `pumpPartsToStreams` (`lib/storage.ts`), caused by backpressure from either the response stream or the merge upload (`mergerStream`, `partSize: 5MB`, `queueSize: 1`), aborts the in-flight S3/R2 socket with `ECONNRESET`. The handler has already streamed the headers, so Nitro then surfaces an unhandled `H3Error: aborted` followed by `ERR_HTTP_HEADERS_SENT`.

This is reproducible on R2 with non-trivial caches: the very first download of any cache entry always proxies through the server (since `mergedAt` is null until the merge finishes), and 3s of stalled read while backpressure resolves is easy to hit.

Fix

Expose `STORAGE_S3_SOCKET_TIMEOUT_MS` so deployments hitting this can raise the limit without forking. Default raised to `10000` to give backpressure pauses more headroom on first-download proxy paths.

Notes

`lib/storage.ts`; new env entry in `lib/schemas.ts`.
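
For readers who want a feel for the new setting, here is a minimal sketch of how `STORAGE_S3_SOCKET_TIMEOUT_MS` could be resolved with the `10000` default. It is not the code in this PR: the helper name `resolveSocketTimeoutMs` and the choice to reject non-positive or non-integer values are assumptions, and the real entry in `lib/schemas.ts` may validate differently.

```typescript
// Hypothetical helper; the PR's actual parsing lives in lib/schemas.ts and may differ.
const DEFAULT_SOCKET_TIMEOUT_MS = 10_000; // default stated in this PR

function resolveSocketTimeoutMs(
  env: Record<string, string | undefined>,
): number {
  const raw = env.STORAGE_S3_SOCKET_TIMEOUT_MS;
  // Unset or blank: fall back to the default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_SOCKET_TIMEOUT_MS;

  const parsed = Number(raw);
  // Assumption: reject NaN, fractions, zero and negatives instead of
  // silently accepting a value that would disable or break the timeout.
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(
      `STORAGE_S3_SOCKET_TIMEOUT_MS must be a positive integer, got "${raw}"`,
    );
  }
  return parsed;
}
```

A deployment would then call something like `resolveSocketTimeoutMs(process.env)` once at startup and pass the result wherever the S3 client currently receives its hardcoded `socketTimeout`.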