Retry upstream "unexpected EOF" errors instead of returning them to the client#3726
Retry upstream "unexpected EOF" errors instead of returning them to the client#3726hex-ci wants to merge 16 commits into
Conversation
Co-Authored-By: Hex <hex@codeigniter.org.cn>
Retry upstream "unexpected EOF" errors instead of returning them to the client
|
This pull request targeted The base branch has been automatically changed to |
There was a problem hiding this comment.
Code Review
This pull request introduces handling and automatic retries for transient "unexpected EOF" errors in the auth conductor. It adds a helper function isUnexpectedEOFError to detect these errors and updates shouldRetryAfterError to trigger an immediate retry when they occur, provided retries are allowed. Additionally, unit tests have been added to verify this behavior and ensure that retry limits are respected. There are no review comments, so I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7a863ebfa9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Co-Authored-By: Hex <hex@codeigniter.org.cn>
Limit immediate unexpected-EOF retry to statusless errors
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a0adba8ec
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Co-Authored-By: Hex <hex@codeigniter.org.cn>
Honor per-auth request_retry override for unexpected-EOF retries
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c67f875192
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Co-Authored-By: Hex <hex@codeigniter.org.cn>
…auth Record the EOF-producing auth at error time for per-auth retry overrides
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0473c0d3d6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Co-Authored-By: Hex <hex@codeigniter.org.cn>
…-interval Keep unexpected-EOF retries working when max-retry-interval is 0
|
I hit a Grok Build/xAI case that appears to match this PR while testing the Environment:
Observed during upgrade smoke:
I then reproduced safely with an isolated Code inspection of No token or auth values are included here. |
|
I also built and tested this together with PR #3734 on top of the Build under test:
Focused tests passed: That includes the new conductor tests from this PR for statusless Live smoke results against the combined build: I could not force a deterministic live No token or auth values are included here. |
Summary
When an upstream SSE/HTTP response is cut off mid-flight, executors surface the read failure as a plain
unexpected EOF(e.g.scanner.Err()inclaude_executor.go:512-519). This error carries no HTTP status (statusCodeFromErrorreturns0), soManager.shouldRetryAfterErrorfell through its 429-only retry path and returned the error straight to the client. Some clients abort the whole request on seeingAPI Error: unexpected EOF.This makes
shouldRetryAfterErrortreatunexpected EOFlike a normal retryable API error: it now retries immediately (honoring the configuredrequest-retrycount) rather than surfacing the error.isUnexpectedEOFErrormatches botherrors.Is(err, io.ErrUnexpectedEOF)and a case-insensitive"unexpected eof"substring, so it works through wraps and across executors. Because the detection lives in the shared conductor retry loop, it covers non-streamingExecuteand the streaming bootstrap phase (EOF before the first payload, surfaced viastreamBootstrapError, whoseError()/Unwrap()preserve the cause).Returning
wait=0is deliberate: a statusless EOF leaves the credential withNextRetryAfter == 0, soisAuthBlockedForModelkeeps it immediately re-pickable; this also avoids thewait > maxWaitearly-return that would otherwise block a retry when the error happens to carry a 5xx status (which sets a 60s cooldown > the default 30smax-retry-interval).Scope note: an EOF that occurs mid-stream (after bytes were already streamed to the client) is forwarded through
wrapStreamResultand is intentionally not retried, since partial content has already been sent.Tests
TestManager_ShouldRetryAfterError_RetriesUnexpectedEOF— statuslessunexpected EOFand wrappedio.ErrUnexpectedEOFretry immediately (wait==0); stops oncerequest-retryis exhausted.TestManager_ShouldRetryAfterError_UnexpectedEOFRespectsRetryDisabled— no retry whenrequest-retry=0.go build ./...,go vet ./sdk/cliproxy/auth/, andgo test ./...all pass.