[grid] Skip the TCP tunnel read-idle close when reads are paused#17578
Conversation
Review Summary by QodoApply WebSocket frame fast path on Node and fix backpressure-induced tunnel teardown
WalkthroughsDescription• Apply WebSocket frame fast path on Node side, bypassing Message layer • Buffer pre-handshake frames and drain on upgrade completion • Skip TCP tunnel read-idle close when backpressure pauses reads • Safely truncate WebSocket close-frame reasons to RFC 6455 limit Diagramflowchart LR
A["Node WebSocket<br/>Upgrade"] -->|"install<br/>WebSocketFrameProxy"| B["Direct Frame<br/>Forwarding"]
B -->|"bypass Message<br/>layer"| C["Reduced Allocation<br/>& Executor Hops"]
D["Pre-handshake<br/>Frames"] -->|"buffer in<br/>arrival order"| E["Drain on<br/>onUpgrade"]
F["Read-idle Event"] -->|"check autoRead<br/>flag"| G{"Backpressure<br/>Active?"}
G -->|"yes"| H["Log & Ignore"]
G -->|"no"| I["Close Tunnel"]
J["Close Reason"] -->|"truncate safely<br/>at char boundary"| K["RFC 6455<br/>Compliant"]
File Changes1. java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java
|
Code Review by Qodo
1. IdleCloseHandlerTest constructs IdleStateEvent
|
a7188f9 to
a525a9c
Compare
|
Code review by qodo was updated up to the latest commit a525a9c |
a525a9c to
ae83518
Compare
|
Code review by qodo was updated up to the latest commit ae83518 |
The read-idle close that TcpUpgradeTunnelHandler installs after the tunnel is established (introduced by db9b07a, 2026-03-11, "[grid] Router WebSocket handle dropped close frames, idle disconnects, high-latency proxying", SeleniumHQ#17197) tears down both tunnel channels when no bytes have been received for 120 seconds. That is the right behaviour when an intermediate load balancer has silently dropped the TCP connection, but it interacts badly with the backpressure mirroring added in 5cf2e2a (2026-05-26, "[grid] Apply TCP backpressure across the WebSocket tunnel handler", SeleniumHQ#17543): when the peer's outbound buffer crosses its high-water mark, TcpTunnelHandler sets autoRead=false on this side, no channelRead events fire, and the read-idle timer reaches its threshold even though the stall is by design. A sustained slow consumer for more than two minutes will therefore have the tunnel torn down underneath it. Gate the close in IdleCloseHandler.userEventTriggered on the channel's autoRead flag: while reads are paused by backpressure, log at FINE and ignore the event. As soon as the peer drains and TcpTunnelHandler restores autoRead=true the read-idle clock starts again from a fresh read, so a legitimately dropped connection is still detected within the same window once traffic resumes. IdleCloseHandler is promoted from a private nested class to package-private so a focused EmbeddedChannel unit test can exercise both branches: the close-both-channels behaviour on a normal idle event, and the ignored-while-paused behaviour with autoRead=false.
ae83518 to
b2cf8a9
Compare
|
Code review by qodo was updated up to the latest commit b2cf8a9 |
The read-idle close that
TcpUpgradeTunnelHandlerinstalls after the tunnel is established (introduced by #17197) tears down both tunnel channels when no bytes have been received for 120 seconds. That is the right behaviour when an intermediate load balancer has silently dropped the TCP connection, but it interacts badly with the backpressure mirroring added in #17543: when the peer's outbound buffer crosses its high-water mark,TcpTunnelHandlersetsautoRead=falseon this side, nochannelReadevents fire, and the read-idle timer reaches its threshold even though the stall is by design. A sustained slow consumer for more than two minutes will therefore have the tunnel torn down underneath it.Gate the close in
IdleCloseHandler.userEventTriggeredon the channel'sautoReadflag: while reads are paused by backpressure, log at FINE and ignore the event. As soon as the peer drains andTcpTunnelHandlerrestoresautoRead=truethe read-idle clock starts again from a fresh read, so a legitimately dropped connection is still detected within the same window once traffic resumes.IdleCloseHandleris promoted from a private nested class to package-private so a focusedEmbeddedChannelunit test can exercise both branches: the close-both-channels behaviour on a normal idle event, and the ignored-while-paused behaviour withautoRead=false.🤖 Generated with Claude Code