fix(fpss): correct derived-bar accounting and escalate write failures to reconnect#888
Merged
Conversation
… to reconnect Out-of-order trade corrections (nonzero records_back) reprice a print already counted in the running bar. The derived OHLCVC accumulator dropped that field, so every correction re-added its size and trade, double-counting volume and count on the derive path. Thread records_back into the accumulator and let a correction adjust only the price extremes and the close, never volume or count. Steady-state and post-reconnect command-drain write failures were logged and swallowed, leaving the loop reading on a socket whose write side had broken and deferring recovery to the next read timeout while the queued subscribe or command was lost. Escalate a write error to an immediate reconnect, mirroring the replay-flush path: the in-session drain publishes Disconnected and breaks to the reconnect decision, and the post-reconnect drain re-enters the session loop. The login handshake looped on read with no wall-clock bound and an unbounded pending-control buffer, so a peer streaming Connected/Ping/Restart without ever sending Metadata reset the per-read timeout on every frame and grew the buffer without limit, wedging the reconnect thread. Bound the handshake with a wall-clock deadline derived from the read timeout and cap the buffered control frames, returning a timeout or protocol error past those limits. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Three correctness and liveness fixes in the FPSS streaming path, each with a test that fails on the prior behavior.
Derived-bar double-count on trade corrections
An out-of-order trade correction carries a nonzero records_back: the print refers to a trade already reported earlier in the stream, so its size and trade are already inside the running bar. The derived OHLCVC accumulator never received that field, so every correction re-added its size and bumped the count, double-counting volume and trade count on the derive path. The accumulator now takes records_back; a correction adjusts only the high, low, and close, never volume or count. A correction also never opens a fresh bar. Two tests cover the new-high correction and the low-adjust-without-reset case, plus a fresh print after a correction resuming from the un-corrupted totals.
Swallowed write failures defer reconnect
Both command-drain paths logged a write failure and kept going. The steady-state drain then kept reading on a socket whose write side had broken, deferring recovery to the next read timeout while the queued subscribe or command was lost. The post-reconnect drain ran a session whose queued command never reached the server. Both now escalate a write error the same way the re-subscribe replay does: the in-session drain publishes Disconnected and breaks to the reconnect decision, and the post-reconnect drain re-enters the session loop.
Unbounded login handshake
The handshake looped on read with no wall-clock bound and an unbounded pending-control buffer. A peer streaming Connected/Ping/Restart without ever sending Metadata reset the per-read socket timeout on every frame and grew the buffer without limit, wedging the reconnect thread. The handshake now has a wall-clock deadline derived from the read timeout and a cap on buffered control frames, returning a timeout or protocol error past those limits. The clock is injected so the deadline case is tested deterministically; the cap case is exercised by flooding control frames.
Audit findings 2 reviewed, not a defect
The finding that a server-seeded OHLCVC bar over-counts its first overlapping trade does not hold against the code. Server bars carry cumulative session volume, but the derive path only advances the accumulator for trades that arrive after the seed, and the only way an already-counted print is re-delivered is the records_back correction handled above. Reconnect clears the accumulators before a fresh seed, so there is no residual over-count. No change made there.
Verification
Full fpss lib suite passes (274 tests, default and __internal features). Format and the CI-exact workspace clippy with all targets are clean.
🤖 Generated with Claude Code