fix(github-sync): retry transient network errors during GraphQL batch #144
Merged
GitHub's GraphQL endpoint occasionally closes connections mid-response, raising EOFError (and siblings like Errno::ECONNRESET / OpenSSL::SSL::SSLError) out of Net::HTTP. The retry loop in User::GithubSyncable#github_graphql_request only caught Net::OpenTimeout/Net::ReadTimeout, so any connection reset escaped the rescue and failed the entire batch of users without retrying. This was observed in production as EOFError "end of file reached" from UpdateGithubDataJob.

Extract the full set of transient network error classes into TRANSIENT_NETWORK_ERRORS and rescue them uniformly so they trigger the existing backoff/retry instead of aborting the batch. Added regression tests covering both paths: retry-then-succeed and retries-exhausted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes a production EOFError ("end of file reached") raised from UpdateGithubDataJob → User::GithubSyncable#github_graphql_request.

Root cause
GitHub's GraphQL endpoint occasionally closes connections mid-response, surfacing in Ruby as EOFError (and siblings like Errno::ECONNRESET, OpenSSL::SSL::SSLError). The retry loop in github_graphql_request only rescued Net::OpenTimeout / Net::ReadTimeout, so any connection reset escaped the rescue and failed the entire batch of users without retrying. UpdateGithubDataJob would bail out partway through, leaving users un-synced.

Fix
Extracted the set of transient network errors into a TRANSIENT_NETWORK_ERRORS constant and now rescue them uniformly in the retry loop, so they trigger the existing backoff/retry (2 attempts, 2s sleep) just like timeouts already did. This matches the user's request: don't skip the error; make it work by actually retrying.

Test plan
- test/models/concerns/user/github_syncable_test.rb with two cases:
  - batch_sync_github_data! retries on EOFError and succeeds on the second attempt
  - batch_sync_github_data! returns a network error after exhausting retries
- rails test: full suite (334 tests) passes.
- rubocop clean.

🤖 Generated with Claude Code
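
For readers without the diff open, the retry behavior described under Fix can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the helper name with_transient_retry, the sleeper parameter, and the shape of the returned error hash are hypothetical, and the constant's contents are assumed from the classes named in the description. "2 attempts" is read here as two total attempts.

```ruby
require "net/http"
require "openssl"

# Assumed contents: the description names these classes, but the real
# constant in User::GithubSyncable may list more or fewer.
TRANSIENT_NETWORK_ERRORS = [
  Net::OpenTimeout,
  Net::ReadTimeout,
  EOFError,
  Errno::ECONNRESET,
  OpenSSL::SSL::SSLError
].freeze

# Hypothetical helper standing in for the retry loop inside
# github_graphql_request. Yields the current attempt number and returns
# the block's value on success. After the last attempt fails with a
# transient error, it returns an error hash instead of raising
# (the hash shape is an assumption made for this sketch).
def with_transient_retry(max_attempts: 2, delay: 2, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *TRANSIENT_NETWORK_ERRORS => e
    if attempts < max_attempts
      sleeper.call(delay) # fixed pause before the next attempt
      retry               # re-runs the begin block; attempts is preserved
    end
    { error: "network error: #{e.class}: #{e.message}" }
  end
end
```

Errors outside the list (for example ArgumentError) are not rescued and propagate to the caller. Passing a no-op sleeper such as `->(_s) {}` keeps tests of the two paths, retry-then-succeed and retries-exhausted, from actually sleeping.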