fix(fpss): treat Windows ERROR_IO_PENDING as transient read #470
Merged
On Windows the overlapped socket layer surfaces in-flight reads as ERROR_IO_PENDING (raw OS error 997) rather than WSAEWOULDBLOCK. Rust std maps 997 to ErrorKind::Uncategorized, so the existing kind matches in fpss/io_loop.rs::is_read_timeout and the two retry arms in fpss/framing.rs (pre-header and mid-payload) treated it as fatal. Python users on Windows saw "FPSS read error error=IO error: Overlapped I/O operation is in progress. (os error 997)" spam followed by a reconnect storm.

Centralise transient-read detection in framing::is_transient_read, which matches WouldBlock | TimedOut plus raw_os_error() == Some(997) (ERROR_IO_PENDING). All three sites delegate to it so the I/O loop drains queued commands and retries the way it does on Linux and macOS.

Tests: unit test pinning the helper on os_error(997), plus three integration-style tests against the existing mock readers covering the pre-header propagate-as-Io path and the mid-header / mid-payload retry-and-recover paths under raw OS error 997.

Bumps tdbe 0.12.5 -> 0.12.7 and the workspace 8.0.24 -> 8.0.26.

Closes #469

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
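The classification described in the commit message can be sketched as follows. This is a minimal reconstruction from the description, not the code merged in this PR: the names `is_transient_read` and `ERROR_IO_PENDING` appear in the PR, but the function body and the checks in `main` are assumptions.

```rust
use std::io;

/// Win32 ERROR_IO_PENDING. The value 997 is taken from the PR description.
const ERROR_IO_PENDING: i32 = 997;

/// Assumed shape of the shared helper: a read error is transient if its kind
/// is WouldBlock or TimedOut, or if its raw OS code is 997.
fn is_transient_read(err: &io::Error) -> bool {
    matches!(
        err.kind(),
        io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut
    ) || err.raw_os_error() == Some(ERROR_IO_PENDING)
}

fn main() {
    // The raw-code check catches 997 even though its kind is not WouldBlock.
    let pending = io::Error::from_raw_os_error(ERROR_IO_PENDING);
    assert!(is_transient_read(&pending));

    // The kinds that were already handled keep matching.
    assert!(is_transient_read(&io::Error::from(io::ErrorKind::WouldBlock)));
    assert!(is_transient_read(&io::Error::from(io::ErrorKind::TimedOut)));

    // A genuine disconnect is still reported as non-transient.
    assert!(!is_transient_read(&io::Error::from(
        io::ErrorKind::ConnectionReset
    )));
}
```

Checking `raw_os_error()` in addition to `kind()` is what lets a single predicate cover the Windows case without a platform-specific `cfg` branch.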
Pull request overview
This PR fixes a Windows-specific FPSS reconnect/log spam issue by classifying Win32 ERROR_IO_PENDING (os error 997) as a transient read condition (similar to WouldBlock/TimedOut), preventing benign in-flight reads from being treated as fatal disconnects. It also bumps crate/SDK versions to ship the fix.
Changes:
- Added a shared `is_transient_read(&io::Error)` helper (including `raw_os_error == 997`) and routed FPSS read-timeout logic through it.
- Updated FPSS framing retry branches to use the shared transient-read classification and added regression tests covering pre-header/mid-header/mid-payload behavior.
- Bumped versions across Rust crates/tools and TypeScript/Python SDK packaging metadata; refreshed lockfiles and changelogs.
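To illustrate the mid-frame retry behaviour that the regression tests cover, here is a hedged sketch. `ScriptedReader`, `read_exact_retrying`, and the `max_stalls` cap are hypothetical names invented for this example; the crate's actual mock readers and `read_frame` logic are not shown in this PR page and may differ.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

const ERROR_IO_PENDING: i32 = 997; // value from the PR description

fn is_transient_read(err: &io::Error) -> bool {
    matches!(err.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut)
        || err.raw_os_error() == Some(ERROR_IO_PENDING)
}

/// Hypothetical mock: each step yields either a chunk of bytes or a raw OS error code.
struct ScriptedReader {
    steps: VecDeque<Result<Vec<u8>, i32>>,
}

impl Read for ScriptedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.steps.pop_front() {
            Some(Ok(bytes)) => {
                let n = bytes.len().min(buf.len());
                buf[..n].copy_from_slice(&bytes[..n]);
                if n < bytes.len() {
                    // Keep bytes that did not fit for the next call.
                    self.steps.push_front(Ok(bytes[n..].to_vec()));
                }
                Ok(n)
            }
            Some(Err(code)) => Err(io::Error::from_raw_os_error(code)),
            None => Ok(0), // script exhausted: behave like EOF
        }
    }
}

/// Fill `buf`, retrying transient errors up to `max_stalls` times in a row.
/// The cap is an assumption of this sketch, not a documented crate setting.
fn read_exact_retrying<R: Read>(
    reader: &mut R,
    buf: &mut [u8],
    max_stalls: usize,
) -> io::Result<()> {
    let mut filled = 0;
    let mut stalls = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
            Ok(n) => {
                filled += n;
                stalls = 0;
            }
            Err(e) if is_transient_read(&e) && stalls < max_stalls => stalls += 1,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

fn main() {
    // Two payload bytes, three os-error-997 stalls, then the remaining two bytes.
    let mut reader = ScriptedReader {
        steps: VecDeque::from(vec![
            Ok(vec![0x01, 0x02]),
            Err(997),
            Err(997),
            Err(997),
            Ok(vec![0x03, 0x04]),
        ]),
    };
    let mut payload = [0u8; 4];
    read_exact_retrying(&mut reader, &mut payload, 8).unwrap();
    assert_eq!(payload, [0x01, 0x02, 0x03, 0x04]);
}
```

The sketch only models retrying once part of a frame has been read. Per the PR description, a 997 error before any header byte arrives is instead propagated as an I/O error so that the I/O loop can drain queued commands.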
Reviewed changes
Copilot reviewed 16 out of 21 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/thetadatadx/src/fpss/framing.rs | Adds ERROR_IO_PENDING constant + is_transient_read helper; uses it in mid-frame retry logic; adds Windows-997 regression tests. |
| crates/thetadatadx/src/fpss/io_loop.rs | Delegates read-timeout classification to framing::is_transient_read to keep transient logic consistent. |
| crates/thetadatadx/Cargo.toml | Bumps thetadatadx version and tdbe dependency version. |
| crates/tdbe/Cargo.toml | Bumps tdbe crate version. |
| tools/server/Cargo.toml | Bumps thetadatadx-server and tdbe dependency versions. |
| tools/server/Cargo.lock | Lockfile refresh for new thetadatadx/tdbe versions. |
| tools/mcp/Cargo.toml | Bumps thetadatadx-mcp and tdbe dependency versions. |
| tools/mcp/Cargo.lock | Lockfile refresh for new thetadatadx/tdbe versions. |
| tools/cli/Cargo.toml | Bumps thetadatadx-cli and tdbe dependency versions. |
| ffi/Cargo.toml | Bumps thetadatadx-ffi and tdbe dependency versions. |
| sdks/python/Cargo.toml | Bumps thetadatadx-py and tdbe dependency versions. |
| sdks/python/Cargo.lock | Lockfile refresh for new thetadatadx/tdbe versions. |
| sdks/typescript/Cargo.toml | Bumps thetadatadx-napi and tdbe dependency versions. |
| sdks/typescript/Cargo.lock | Lockfile refresh for new thetadatadx/tdbe versions. |
| sdks/typescript/package.json | Bumps TypeScript SDK version + optional native package versions. |
| sdks/typescript/npm/win32-x64-msvc/package.json | Bumps Windows prebuilt package version. |
| sdks/typescript/npm/linux-x64-gnu/package.json | Bumps Linux prebuilt package version. |
| sdks/typescript/npm/darwin-arm64/package.json | Bumps macOS ARM64 prebuilt package version. |
| CHANGELOG.md | Adds 8.0.26 release notes describing the Windows FPSS fix + tdbe bump. |
| docs-site/docs/changelog.md | Mirrors 8.0.26 release notes for the docs site. |
| Cargo.lock | Workspace lockfile refresh for new thetadatadx/tdbe versions. |
Comment on lines +1427 to +1438
    /// transient read. Rust `std` maps 997 to `ErrorKind::Uncategorized`,
    /// so a plain `kind()` match would miss it and treat the in-flight
    /// overlapped read as a fatal disconnect — which is exactly what the
    /// Python user reported in issue #469.
    #[test]
    fn is_transient_read_recognises_windows_error_io_pending() {
        let err = std::io::Error::from_raw_os_error(ERROR_IO_PENDING);
        // Sanity: confirm the precondition that motivates this fix —
        // `std` does not map 997 to a recognisable kind on any platform.
        assert_ne!(err.kind(), std::io::ErrorKind::WouldBlock);
        assert_ne!(err.kind(), std::io::ErrorKind::TimedOut);
        assert_eq!(err.raw_os_error(), Some(997));
Summary
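Fixes #469. A Python user on Windows reported a constant stream of `FPSS read error error=IO error: Overlapped I/O operation is in progress. (os error 997)` log lines, followed by a reconnect storm.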
`ERROR_IO_PENDING` (Win32 error 997) is what the Windows overlapped I/O layer returns while a non-blocking read is still in flight. Rust `std` maps raw OS error 997 to `ErrorKind::Uncategorized`, so the three existing transient-read checks, which only matched `WouldBlock | TimedOut`, fell through to the fatal arm and tore the connection down.

Fix
New helper `crates/thetadatadx/src/fpss/framing.rs::is_transient_read(&io::Error)` matches:
- `ErrorKind::WouldBlock`
- `ErrorKind::TimedOut`
- `raw_os_error() == Some(997)` (Windows `ERROR_IO_PENDING`)

Three patched sites all delegate to it:
- `crates/thetadatadx/src/fpss/io_loop.rs:687-696`: `is_read_timeout` (drives the I/O loop's command-drain branch)
- `crates/thetadatadx/src/fpss/framing.rs:236-242`: pre-header retry decision
- `crates/thetadatadx/src/fpss/framing.rs:349-355`: mid-frame retry decision

Behaviour on Linux / macOS is unchanged: `WouldBlock` and `TimedOut` continue to take the same path. Windows now joins them: the I/O loop drains queued commands and retries instead of escalating to a reconnect.

Tests
- `is_transient_read_recognises_windows_error_io_pending`: unit test on `io::Error::from_raw_os_error(997)`. Asserts the helper returns `true`, that `ECONNRESET` (104) does NOT, and that `WouldBlock` / `TimedOut` still match.
- `pre_header_error_io_pending_propagates_as_io`: `read_frame` on a reader that returns os error 997 with zero bytes delivered must surface as `Error::Io` with `raw_os_error() == Some(997)`, the exact path `is_read_timeout` then drains on.
- `mid_header_error_io_pending_retries_and_recovers`: header byte 1 arrives, three os-error-997 stalls, then byte 2 + payload. Frame must decode cleanly.
- `mid_payload_error_io_pending_retries_and_recovers`: header + 2 of 4 payload bytes, three os-error-997 stalls, then the remaining 2 payload bytes. Frame must decode with payload `[0x01, 0x02, 0x03, 0x04]`.

Versioning
- `thetadatadx`, `thetadatadx-cli`, `thetadatadx-server`, `thetadatadx-mcp`, `thetadatadx-ffi`, `thetadatadx-napi`, `thetadatadx-py`: 8.0.24 -> 8.0.26 (skips 8.0.25, reserved by "refactor: rename root to symbol and exp_date to expiration to match v3 vendor surface" #468)
- `tdbe`: 0.12.5 -> 0.12.7
- `Cargo.lock` files and `package.json` files refreshed
- `8.0.26` per the patch-only v8 line.

Local CI
- `cargo fmt --all -- --check` clean
- `cargo clippy --workspace --all-targets -- -D warnings` clean (also checked `tools/server` and `tools/mcp` sub-workspaces)
- `cargo test --workspace` 304/304 main + 109 ffi + sub-suites all green; the 4 new tests pass
- `cargo deny check` advisories / bans / licenses / sources all ok
- `cargo run -p thetadatadx --bin generate_sdk_surfaces --features config-file -- --check` clean

Test plan