[SharovBot] fix: resolve WebSocket client deadlock on oversized message #20897

Open
erigon-copilot[bot] wants to merge 2 commits into main from fix/websocket-large-call-deadlock

@erigon-copilot
Contributor

Summary

  • Fix a race condition in the RPC client dispatch loop that caused TestWebsocketLargeCall to hang indefinitely (deadlock)
  • When a client sends a message exceeding the server's read limit, the server closes the connection. If the client's write succeeded (the data was TCP-buffered) after the read loop had already exited, the in-flight request was orphaned: cancelAllRequests skipped it (so it could be re-registered on reconnect), but no reconnect happened because the write did not fail. op.resp was never closed, so op.wait() blocked forever
  • The fix detects this scenario in the reqSent handler: if the write succeeded but the read loop is dead, the in-flight request is explicitly failed with errDead

Test plan

  • go build ./... passes
  • go test -race -count=1 -timeout 2m ./rpc -run TestWebsocketLargeCall passes 10/10 consecutive runs
  • go test -race -count=1 -timeout 5m ./rpc/... full package suite passes

🤖 Generated with Claude Code

Fix a race condition in the RPC client dispatch loop that could cause
TestWebsocketLargeCall (and real clients) to hang indefinitely.

When a client sends a message that exceeds the server's read limit, the
server closes the connection. If the client's write succeeded (the data
was TCP-buffered) after the read loop had already detected the closed
connection, the in-flight request was orphaned: cancelAllRequests
intentionally skipped it (to allow re-registration on reconnect), but
no reconnect occurred because the write didn't fail. The request's resp
channel was never closed, causing op.wait() to block forever.

The fix detects this specific scenario in the reqSent handler: if the
write succeeded (err == nil) but the read loop is already dead
(!reading), the in-flight request is explicitly failed with errDead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-authored-by: Giulio Rebuffo <giulio.rebuffo@gmail.com>
