fix(ilp): surface WebSocket frame-size errors to sender callers #13
bluestreak01 merged 7 commits into main
Report oversized WebSocket frames with the received frame size, configured max size, and guidance to decrease batch size. Also propagate terminal WebSocket sender failures at Sender level so async ACK errors surface promptly.
Swap them for zero-GC Utf8s helpers.
@jerrinot: critical review follow-up. After double-checking, here are the confirmed findings (a couple of my earlier points turned out to be wrong and have been withdrawn).
Confirmed concerns worth addressing before merge:
1. Terminal sender state is permanent: document the contract
2. Stale stack trace on rethrow
3. PR title mismatch
Smaller items (not blockers): 4. 5. 6. 7. 8. 9.
Points I initially raised but had to withdraw
Recommendation: land after (1), (2), (3). The rest are comments / follow-ups.
length field for non-OK responses, including empty messages.
[PR Coverage check] 😍 pass: 153 / 179 (85.47%)
Wires the drainer runtime onto the orphan-scanner foundation. With
drain_orphans=true the foreground sender now actually empties sibling
slots holding unacked data instead of just logging that they exist.
Per-drainer lifecycle:
1. Open CursorSendEngine on the slot — its constructor takes the slot
lock; if another sender or drainer holds it, the engine throws and
the drainer exits silently (LOCKED_BY_OTHER, not a failure).
2. Open a fresh WebSocketClient via the foreground sender's connect
factory — separate connection, same auth/host/port/TLS config.
3. Run a CursorWebSocketSendLoop until ackedFsn catches up to the
publishedFsn snapshot taken at startup.
4. On terminal failure (auth, recovery, budget), drop a .failed
sentinel into the slot. Future scans skip it until an operator
clears it manually — bounded retry, then human-in-the-loop.
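
The sentinel part of the lifecycle above could look roughly like the sketch below. It is illustrative only: the class name OrphanSlotScan, the assumption that each slot is a directory named after its sender_id under the group root, and the sentinel's contents are hypothetical and not taken from the client's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and directory layout are assumptions.
final class OrphanSlotScan {
    static final String FAILED_SENTINEL = ".failed";

    // Returns sibling slot directories eligible for draining:
    // not our own slot, and not marked with a .failed sentinel.
    static List<Path> eligibleSlots(Path groupRoot, String ownSenderId) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> slots = Files.newDirectoryStream(groupRoot)) {
            for (Path slot : slots) {
                if (!Files.isDirectory(slot)) {
                    continue; // stray files in the group root are ignored
                }
                if (slot.getFileName().toString().equals(ownSenderId)) {
                    continue; // the foreground sender's own slot
                }
                if (Files.exists(slot.resolve(FAILED_SENTINEL))) {
                    continue; // waits for an operator to clear the sentinel
                }
                result.add(slot);
            }
        }
        return result;
    }

    // Called on terminal failure so that future scans skip the slot.
    static void markFailed(Path slot, String reason) throws IOException {
        Files.write(slot.resolve(FAILED_SENTINEL), reason.getBytes(StandardCharsets.UTF_8));
    }
}
```

Deleting the sentinel file by hand makes the slot eligible again on the next scan, which matches the human-in-the-loop intent described above.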
Pool: bounded fixed-thread executor, daemon threads, sized by
max_background_drainers (default 4). Closes via cooperative stop +
3s grace; daemon threads ensure no JVM-exit blocking.
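
A pool with these properties can be sketched with the standard java.util.concurrent classes. The class name DrainerPool and the isStopRequested flag are hypothetical; only the fixed size, daemon threads, cooperative stop and 3s grace come from the description above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the client's actual pool class.
final class DrainerPool implements AutoCloseable {
    private final ExecutorService executor;
    private volatile boolean stopRequested;

    DrainerPool(int maxBackgroundDrainers) {
        AtomicInteger seq = new AtomicInteger();
        executor = Executors.newFixedThreadPool(maxBackgroundDrainers, r -> {
            Thread t = new Thread(r, "drainer-" + seq.incrementAndGet());
            t.setDaemon(true); // daemon threads never block JVM exit
            return t;
        });
    }

    // Drainers poll this flag between units of work (cooperative stop).
    boolean isStopRequested() {
        return stopRequested;
    }

    void submit(Runnable drainer) {
        executor.execute(drainer);
    }

    @Override
    public void close() {
        stopRequested = true;
        executor.shutdown(); // no new drainers accepted
        try {
            // 3s grace for running drainers to notice the flag and exit.
            if (!executor.awaitTermination(3, TimeUnit.SECONDS)) {
                executor.shutdownNow(); // interrupt stragglers
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the threads are daemons, a drainer that ignores both the flag and the interrupt still cannot keep the JVM alive.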
Visibility: QwpWebSocketSender#getBackgroundDrainers returns a snapshot
list of live drainers with {slot, target, acked, outcome, lastError}.
Test: ghost sender writes 30 distinct rows against a silent server and
closes fast — leaves an unacked slot. Foreground sender opens the same
group root with a different sender_id and drain_orphans=true against an
ack server; asserts every distinct payload reaches the new server. Plus
a sentinel-skip test confirming an operator-set .failed file disqualifies
the slot from the next foreground run's scan.
Empty active segments and stale hot spares are left in the slot dir per
spec decision #13 ("no automatic cleanup of empty slot dirs"); the
scanner's no-op behavior on empty slots makes this cheap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Improve QWP/WebSocket sender error reporting so failures observed by the I/O thread are surfaced back to the caller instead of getting hidden behind generic queue state errors.
The main user-visible case is an oversized server response frame: callers now get the underlying WebSocket/QWP failure, with actionable context, when they next call into the sender.
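
As a rough illustration of "actionable context", a frame-size check might build its error as in the sketch below. The class, exception type and message wording are hypothetical and do not reproduce the client's actual text; the point is only that the message carries the received size, the configured maximum and a hint to reduce the batch size.

```java
// Hypothetical sketch; names and message format are assumptions.
final class FrameSizeCheck {

    static final class FrameTooLargeException extends RuntimeException {
        final long frameSize;
        final long maxFrameSize;

        FrameTooLargeException(long frameSize, long maxFrameSize) {
            super("WebSocket frame too large [frameSize=" + frameSize
                    + ", maxFrameSize=" + maxFrameSize
                    + "]; consider decreasing the batch size");
            this.frameSize = frameSize;
            this.maxFrameSize = maxFrameSize;
        }
    }

    // Throws when the declared frame length exceeds the configured limit.
    static void checkFrameSize(long frameSize, long maxFrameSize) {
        if (frameSize > maxFrameSize) {
            throw new FrameTooLargeException(frameSize, maxFrameSize);
        }
    }
}
```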
Also clarifies the failure contract: after a WebSocket connection is established, send failures, ACK errors, server error ACKs, invalid ACKs, timeouts, or server closes put the sender into a terminal failed state. The sender retains the first failure and rethrows it from subsequent public calls; callers should close it and create a new sender to resume sending.
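
One way to picture this contract is a "first failure wins" latch shared between the I/O thread and the public API, sketched below. TerminalFailureLatch and its methods are hypothetical names. Wrapping the retained failure in a fresh exception is a choice made for this sketch (it keeps the caller's stack trace current), not necessarily what the sender does.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a terminal failed state; not the sender's real code.
final class TerminalFailureLatch {
    private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

    // Called from the I/O thread. Only the first failure is retained;
    // returns false if the sender had already failed.
    boolean fail(Throwable cause) {
        return firstFailure.compareAndSet(null, cause);
    }

    // Called at the start of every public sender method.
    void checkNotFailed() {
        Throwable cause = firstFailure.get();
        if (cause != null) {
            // New exception per call: fresh stack trace, original kept as cause.
            throw new IllegalStateException(
                    "sender failed; close it and create a new sender: " + cause.getMessage(),
                    cause);
        }
    }
}
```

There is deliberately no reset method: once failed, the latch stays failed, so recovery means closing the sender and creating a new one.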