Commit aaec3d5

Add bounded client calls and abort support
1 parent 449bd65 commit aaec3d5

70 files changed

Lines changed: 3068 additions & 207 deletions

.agents/sow/current/SOW-0015-20260605-codacy-scope-and-maintainability.md

Lines changed: 2 additions & 2 deletions
@@ -2,9 +2,9 @@

 ## Status

-Status: in-progress
+Status: paused

-Sub-state: Codacy test/bench exclusion and refreshed production-source maintainability baseline in progress.
+Sub-state: paused by user decision on 2026-06-11 so `SOW-0016` can take current active slot for a timeout/abort liveness fix. Resume from Codacy test/bench exclusion and refreshed production-source maintainability baseline work.

 ## Requirements


.agents/sow/done/SOW-0016-20260610-client-call-timeout-and-abort.md

Lines changed: 378 additions & 0 deletions
Large diffs are not rendered by default.

docs/getting-started.md

Lines changed: 6 additions & 0 deletions
@@ -153,6 +153,12 @@ if client.Ready() {
 }
 ```

+Typed client calls are bounded by a client call timeout. Passing an
+explicit timeout of zero uses the client context default, which is
+30000 ms. Clients also expose an abort signal so shutdown code can
+release a call blocked in receive; the abort remains active until it is
+cleared or the client is closed.
+
 ## Managed Server (L2)

 The managed server receives wire messages, but that is internal to the
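
A minimal sketch of the zero-means-default rule from the paragraph added above. The `CallTimeouts` type and its field are hypothetical names used only for illustration; the commit does not show how the client context stores its default.

```rust
/// Default client call timeout in milliseconds, as stated in the added docs.
pub const DEFAULT_CALL_TIMEOUT_MS: u32 = 30_000;

/// Hypothetical stand-in for the timeout setting held by a client context.
pub struct CallTimeouts {
    /// Value used whenever a call passes an explicit timeout of zero.
    pub context_default_ms: u32,
}

impl Default for CallTimeouts {
    fn default() -> Self {
        CallTimeouts {
            context_default_ms: DEFAULT_CALL_TIMEOUT_MS,
        }
    }
}

impl CallTimeouts {
    /// Zero means "use the client context default"; any other value is used as given.
    pub fn effective_ms(&self, per_call_ms: u32) -> u32 {
        if per_call_ms == 0 {
            self.context_default_ms
        } else {
            per_call_ms
        }
    }
}
```

Under this reading, a plain `call` that forwards a timeout of zero and an explicit `call_with_timeout(.., 0)` resolve to the same bound.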

docs/level1-posix-uds.md

Lines changed: 20 additions & 0 deletions
@@ -89,6 +89,26 @@ chunk header is used.

 See the wire envelope spec for chunk header layout and validation rules.

+## Receive timeout and abort
+
+The baseline receive primitive remains a blocking Level 1 operation for
+callers that want raw transport semantics. Level 1 also provides a
+timeout-aware receive form used by Level 2 typed calls.
+
+The timeout-aware receive form must:
+
+- wait for each required SEQPACKET chunk with a deadline derived from the
+  caller-provided timeout
+- apply the same deadline to the first packet and all continuation
+  packets that make up one logical message
+- poll an explicit abort file descriptor at the same time as the socket
+  file descriptor
+- return a distinct timeout error when the deadline expires
+- return a distinct aborted error when the abort descriptor is signaled
+
+Level 2 is responsible for deciding whether a timeout or abort should
+break the session. Level 1 only reports the condition precisely.
+
 ## SHM file path derivation

 When the handshake negotiates a SHM profile, the server creates a
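
The requirements listed in the "Receive timeout and abort" section can be sketched with the standard library alone. This is an illustration under stated assumptions, not the transport code from the commit: an `mpsc::Receiver` stands in for the SEQPACKET socket, an `AtomicBool` checked in short slices stands in for polling the abort file descriptor alongside the socket, and the chunk count is passed in rather than read from a chunk header. All names are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Distinct outcomes, so the caller never has to guess what happened.
#[derive(Debug, PartialEq, Eq)]
pub enum RecvError {
    Timeout,
    Aborted,
    Disconnected,
}

/// Receives `chunk_count` chunks as one logical message under ONE deadline.
pub fn recv_message(
    chunks: &Receiver<Vec<u8>>,
    abort: &AtomicBool,
    chunk_count: usize,
    timeout: Duration,
) -> Result<Vec<u8>, RecvError> {
    // The deadline is computed once and shared by the first chunk
    // and every continuation chunk.
    let deadline = Instant::now() + timeout;
    let slice = Duration::from_millis(5);
    let mut message = Vec::new();

    for _ in 0..chunk_count {
        loop {
            if abort.load(Ordering::SeqCst) {
                return Err(RecvError::Aborted);
            }
            let remaining = deadline.saturating_duration_since(Instant::now());
            if remaining.is_zero() {
                return Err(RecvError::Timeout);
            }
            // Wait in short slices so the abort flag is re-checked regularly.
            match chunks.recv_timeout(remaining.min(slice)) {
                Ok(chunk) => {
                    message.extend_from_slice(&chunk);
                    break;
                }
                Err(RecvTimeoutError::Timeout) => continue,
                Err(RecvTimeoutError::Disconnected) => return Err(RecvError::Disconnected),
            }
        }
    }
    Ok(message)
}
```

A real implementation would block in a single poll over both descriptors instead of waking every few milliseconds; the slice loop only keeps the sketch portable.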

docs/level1-windows-np.md

Lines changed: 21 additions & 0 deletions
@@ -89,6 +89,27 @@ sequentially.

 See the wire envelope spec for chunk header layout and validation.

+## Receive timeout and abort
+
+The baseline receive primitive remains a blocking Level 1 operation for
+callers that want raw transport semantics. Level 1 also provides a
+timeout-aware receive form used by Level 2 typed calls.
+
+The timeout-aware receive form must:
+
+- wait for each required pipe message with a deadline derived from the
+  caller-provided timeout
+- apply the same deadline to the first packet and all continuation
+  packets that make up one logical message
+- observe an explicit abort event while waiting for pipe readability
+- return a distinct timeout error when the deadline expires
+- return a distinct aborted error when the abort event is signaled
+
+Level 2 is responsible for deciding whether a timeout or abort should
+break the session. Level 1 reports timeout and abort distinctly; peer
+disconnect is reported when the pipe read path observes the corresponding
+Win32 pipe error.
+
 ## SHM mapping name derivation

 When the handshake negotiates a SHM profile, the shared memory region

docs/level2-typed-api.md

Lines changed: 16 additions & 0 deletions
@@ -184,6 +184,9 @@ There are two important cases:

 - For ordinary transport / peer failures, Level 2 reconnects and retries
   once.
+- Timeout and caller-requested abort are terminal for the current call:
+  Level 2 reports the explicit error, disconnects or marks the current
+  session broken, and does not reconnect/retry that same request.
 - For overflow-driven resize recovery, Level 2 may reconnect more than
   once while negotiated request/response capacities grow. This is a
   fallback safety net, not the primary sizing strategy for typed APIs.
@@ -197,6 +200,19 @@ without attempting reconnection.

 If recovery fails, Level 2 reports failure to the caller.

+Every synchronous Level 2 client call is bounded. A per-call timeout of
+zero means "use the client context default"; the default is 30000 ms.
+The deadline applies to the complete logical response, including baseline
+transport chunk continuations and SHM response waits. A timeout returns a
+distinct timeout error rather than being collapsed into disconnect or
+protocol failure.
+
+Every Level 2 client context also exposes an abort signal that is safe to
+trigger from another thread while one thread is blocked in a typed call.
+Abort is sticky until the caller explicitly clears it or closes the
+client. A call that observes the abort signal returns a distinct aborted
+error and is not retried.
+
 Negotiated request payload ceilings are capped at 1 MiB. If a peer
 proposes a larger request ceiling, handshake rejects with
 `transport_status = LIMIT_EXCEEDED`. The cap is enforced before the value
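
The retry rule in this file (ordinary transport failures retry once, timeout and abort never retry) can be sketched as a small policy function. `CallError`, `call_with_policy`, and the two closures are hypothetical names for illustration; overflow-driven resize recovery is deliberately left out.

```rust
/// Hypothetical error classes relevant to the retry decision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallError {
    Timeout,
    Aborted,
    Transport,
}

/// Runs `attempt` once, and once more after `reconnect` if the first try
/// failed with an ordinary transport error. Timeout and abort are terminal.
pub fn call_with_policy<T>(
    mut attempt: impl FnMut() -> Result<T, CallError>,
    mut reconnect: impl FnMut(),
) -> Result<T, CallError> {
    match attempt() {
        Ok(value) => Ok(value),
        // Terminal for this call: report the explicit error, do not retry.
        Err(CallError::Timeout) => Err(CallError::Timeout),
        Err(CallError::Aborted) => Err(CallError::Aborted),
        // Ordinary failure: reconnect and retry exactly once.
        Err(CallError::Transport) => {
            reconnect();
            attempt()
        }
    }
}
```

Keeping the terminal arms explicit, instead of a catch-all, means a newly added error variant forces a decision about whether it is retryable.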

docs/level3-snapshot-api.md

Lines changed: 11 additions & 0 deletions
@@ -142,6 +142,17 @@ reconnect attempt. On overflow-driven resize recovery it may reconnect
 more than once while negotiated capacities grow (up to 8 overflow
 retries). If recovery still fails, the previous cache is preserved.

+Refresh is bounded by the underlying Level 2 client call timeout. The
+default timeout is 30000 ms unless the caller sets a different client
+context default or uses an explicit timeout-capable refresh/call form.
+Timeout and caller-requested abort preserve the previous cache, return a
+distinct error, and do not retry the same snapshot request.
+
+Level 3 exposes the underlying Level 2 abort lifecycle for shutdown paths:
+an abort signal may be triggered from another thread to unblock a refresh
+that is waiting in transport receive, and remains active until explicitly
+cleared or the helper is closed.
+
 This is intentional:

 - providers may start late
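
A minimal sketch of the "preserve the previous cache" behavior described in the added Level 3 text. `SnapshotCache`, `RefreshError`, and the `fetch` closure are hypothetical; the real helper's types are not shown in this diff.

```rust
/// Hypothetical terminal refresh errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefreshError {
    Timeout,
    Aborted,
}

/// Hypothetical snapshot cache holding the last successful result.
pub struct SnapshotCache {
    items: Vec<String>,
}

impl SnapshotCache {
    pub fn new() -> Self {
        SnapshotCache { items: Vec::new() }
    }

    pub fn items(&self) -> &[String] {
        &self.items
    }

    /// Replaces the cache only when `fetch` succeeds. On timeout or abort
    /// the error is returned as-is and the previous items stay in place.
    pub fn refresh(
        &mut self,
        fetch: impl FnOnce() -> Result<Vec<String>, RefreshError>,
    ) -> Result<(), RefreshError> {
        let fresh = fetch()?; // early return leaves `self.items` untouched
        self.items = fresh;
        Ok(())
    }
}
```

Because `fetch` is `FnOnce`, the sketch cannot retry the same snapshot request, which matches the stated no-retry rule for these two errors.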

docs/netipc-integrator-skill.md

Lines changed: 17 additions & 0 deletions
@@ -324,6 +324,9 @@ as explicit assumptions, not as implementation details they can reinterpret.
 - L2 clients and L3 caches should be treated as single-owner mutable objects
   unless the integrator adds external synchronization.
 - They are not documented as internally synchronized shared objects.
+- The only cross-thread operation provided by the public client contract is
+  the abort signal, which may be triggered by a shutdown thread to release a
+  call blocked in transport receive.
 - On the same object instance, do not concurrently call:
   - `refresh()`
   - blocking typed methods
@@ -542,16 +545,30 @@ L2 typed calls are at-least-once:
 - reconnect
 - retry
 - ordinary failures retry once
+- timeout and caller-requested abort are terminal for that call and are not
+  retried
 - overflow-driven resize recovery may reconnect multiple times while capacities
   grow
 - managed servers close a session after terminal service errors such as
   `LIMIT_EXCEEDED`, `BAD_ENVELOPE`, or `INTERNAL_ERROR`; recovery is a new
   handshake on a new session

+L2 typed calls are also bounded:
+
+- a per-call timeout of zero uses the client context default
+- the default client context timeout is 30000 ms
+- the deadline covers the complete logical response, including chunked
+  baseline messages and SHM waits
+- timeout and abort have distinct errors, so shutdown handling does not need
+  to guess whether the peer disconnected
+- abort is sticky until the caller clears it or closes the client/cache helper
+
 Implication:

 - server handlers must be duplicate-safe
 - do not design exactly-once business semantics around L2 calls
+- shutdown paths can abort a blocked call, but normal shared-client access
+  still needs external synchronization in C and Go

 ### SHM attach failure

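
The sticky, cross-thread abort described in the added bullets can be sketched as follows. `AbortSignal` and `blocked_call` are hypothetical stand-ins, not the `ClientAbortHandle` from the commit, and the string errors are placeholders for the distinct error values.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical cloneable abort signal shared between threads.
#[derive(Clone, Default)]
pub struct AbortSignal {
    flag: Arc<AtomicBool>,
}

impl AbortSignal {
    /// Safe to call from a shutdown thread while another thread is blocked.
    pub fn abort(&self) {
        self.flag.store(true, Ordering::SeqCst);
    }

    /// Abort is sticky: it stays set until explicitly cleared.
    pub fn clear(&self) {
        self.flag.store(false, Ordering::SeqCst);
    }

    pub fn is_aborted(&self) -> bool {
        self.flag.load(Ordering::SeqCst)
    }
}

/// Stand-in for a call blocked in receive: it never gets a response, so it
/// ends with "aborted" if the signal is observed, otherwise with "timeout".
pub fn blocked_call(signal: &AbortSignal, max_wait: Duration) -> Result<(), &'static str> {
    let start = Instant::now();
    while start.elapsed() < max_wait {
        if signal.is_aborted() {
            return Err("aborted");
        }
        thread::sleep(Duration::from_millis(1));
    }
    Err("timeout")
}
```

Only the signal is shared across threads here; the client object itself stays single-owner, in line with the synchronization rules in this section.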

src/crates/netipc/src/protocol/mod.rs

Lines changed: 6 additions & 0 deletions
@@ -103,6 +103,10 @@ pub enum NipcError {
     BadItemCount,
     /// Builder ran out of space.
     Overflow,
+    /// Synchronous call timed out before a complete response arrived.
+    Timeout,
+    /// Synchronous call was aborted by the caller.
+    Aborted,
 }

 impl core::fmt::Display for NipcError {
@@ -119,6 +123,8 @@ impl core::fmt::Display for NipcError {
             NipcError::BadAlignment => write!(f, "item not 8-byte aligned"),
             NipcError::BadItemCount => write!(f, "item count inconsistent"),
             NipcError::Overflow => write!(f, "builder out of space"),
+            NipcError::Timeout => write!(f, "synchronous call timed out"),
+            NipcError::Aborted => write!(f, "synchronous call aborted"),
         }
     }
 }
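
A small sketch of how a caller might tell the two new variants apart. `CallFailure` is a hypothetical two-variant mirror of the additions to `NipcError` (the real enum has more variants), and `log_line` is an invented helper; only the message strings are taken from the diff.

```rust
use core::fmt;

/// Hypothetical mirror of the two variants added to `NipcError`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallFailure {
    /// Synchronous call timed out before a complete response arrived.
    Timeout,
    /// Synchronous call was aborted by the caller.
    Aborted,
}

impl fmt::Display for CallFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CallFailure::Timeout => write!(f, "synchronous call timed out"),
            CallFailure::Aborted => write!(f, "synchronous call aborted"),
        }
    }
}

/// Invented helper: shutdown code can branch on the variant instead of
/// guessing from a generic disconnect.
pub fn log_line(err: CallFailure) -> String {
    match err {
        CallFailure::Aborted => format!("shutdown: {}", err),
        CallFailure::Timeout => format!("peer slow or stuck: {}", err),
    }
}
```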

src/crates/netipc/src/service/apps_lookup.rs

Lines changed: 26 additions & 2 deletions
@@ -16,7 +16,7 @@ use crate::transport::windows::{
 use std::sync::atomic::AtomicBool;
 use std::sync::Arc;

-pub use raw::{AppsLookupHandler, ClientState, ClientStatus};
+pub use raw::{AppsLookupHandler, ClientAbortHandle, ClientState, ClientStatus};

 #[derive(Debug, Clone)]
 pub struct ClientConfig {
@@ -109,8 +109,32 @@ impl AppsLookupClient {
         self.inner.status()
     }

+    pub fn set_call_timeout(&mut self, timeout_ms: u32) {
+        self.inner.set_call_timeout(timeout_ms);
+    }
+
+    pub fn abort_handle(&self) -> ClientAbortHandle {
+        self.inner.abort_handle()
+    }
+
+    pub fn abort(&self) {
+        self.inner.abort();
+    }
+
+    pub fn clear_abort(&self) {
+        self.inner.clear_abort();
+    }
+
     pub fn call(&mut self, pids: &[u32]) -> Result<AppsLookupResponseView<'_>, NipcError> {
-        self.inner.call_apps_lookup(pids)
+        self.call_with_timeout(pids, 0)
+    }
+
+    pub fn call_with_timeout(
+        &mut self,
+        pids: &[u32],
+        timeout_ms: u32,
+    ) -> Result<AppsLookupResponseView<'_>, NipcError> {
+        self.inner.call_apps_lookup_with_timeout(pids, timeout_ms)
     }

     pub fn close(&mut self) {
