Commit 4266149

amDosion and unraid authored
fix: keep UDS peer failures structured (#375)
* fix: keep UDS peer failures structured

CodeRabbit and Claude cross-review identified that timeout and raw peer connection failures should share one observable error contract. UDS peer failures now use UdsPeerConnectionError consistently, and connectToPeer hands the socket lifecycle back to the caller after a successful connection instead of retaining an internal timeout or error listener. The tests cover the real socket paths with capability files, timeout behavior, connection failure structure, post-connect listener handoff, AgentSummary rescheduling observations, and platform-specific mailbox directory errno handling.

Constraint: Preserve the 5000ms production timeout default while allowing tests to exercise timeout paths quickly.
Rejected: Suppress CodeRabbit warnings in tests | would hide the real timeout/error contract gap.
Rejected: Keep connectToPeer post-connect error listener | it would silently swallow caller-owned socket errors.
Confidence: high
Scope-risk: narrow
Directive: Keep UDS send/connect timeout and socket-error paths on the same structured peer error contract.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/teammateMailbox.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Tested: omx ask claude simplify review artifact .omx/artifacts/claude-review-only-cross-check-for-pr-374-on-branch-codex-codecov-r-2026-04-27T08-17-47-309Z.md
Tested: omx ask claude security review artifact .omx/artifacts/claude-security-review-cross-check-for-pr-374-current-working-tree--2026-04-27T08-26-54-079Z.md
Not-tested: GitHub-hosted CodeRabbit refresh until pushed.

* docs: clarify UDS peer socket ownership

CodeRabbit's #375 pass found that connectToPeer now correctly hands socket errors to the caller, but the JSDoc needed to spell out that contract. The lifecycle test also uses a less brittle post-connect timeout so slow CI does not turn the ownership check into a connection-speed race.

Constraint: The raw socket API intentionally detaches its internal listener after successful connect so caller-owned errors are not swallowed.
Rejected: Keep the test timeout at 50ms | it tests scheduler speed instead of socket lifecycle ownership.
Confidence: high
Scope-risk: narrow
Directive: connectToPeer callers must attach their own error listener immediately after awaiting the socket.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: git diff --check
Tested: bun run test:all
Not-tested: GitHub-hosted CodeRabbit refresh until pushed.

* fix: close peer socket listener handoff window

CodeRabbit and Claude review found that documenting caller-owned raw socket errors still left a Promise handoff window and a stale timeout-listener risk. The peer connection API now requires a caller error handler and installs it before resolving, while cleanup removes internal error and timeout listeners on every path.

Constraint: Keep the fix precise to PR #375 review feedback and avoid warning suppression or fallback behavior.
Rejected: Leave the behavior documented only | still permits an unhandled socket error window between resolve and caller listener attachment.
Rejected: Keep a no-op internal error listener | would silently swallow caller-owned socket errors.
Confidence: high
Scope-risk: narrow
Directive: Do not add raw connectToPeer callers without providing a real onSocketError handler and capability handshake.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Tested: bun audit
Not-tested: Manual external ACP peer runtime beyond repository tests.

* fix: use a deadline timer for peer connects

The raw socket handoff no longer needs Socket#setTimeout; an ordinary connection deadline keeps the timeout behavior while avoiding an internal socket timeout listener that has no reliable UDS integration path to exercise.

Constraint: Keep Codecov coverage honest without adding ignore pragmas, mocks, or fallback suppression.
Rejected: c8 ignore on the timeout listener | hides the uncovered branch instead of simplifying the lifecycle.
Rejected: keep Socket#setTimeout listener | leaves a socket listener lifecycle to manage for a connect-only deadline.
Confidence: high
Scope-risk: narrow
Directive: Keep connectToPeer errors caller-owned via onSocketError and reject pre-connect failures with UdsPeerConnectionError.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun test src/utils/__tests__/udsMessaging.test.ts --coverage --coverage-reporter lcov --coverage-dir coverage-uds
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Tested: bun audit
Not-tested: Manual external ACP peer runtime beyond repository tests.

---------

Co-authored-by: unraid <local@unraid.local>
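The "unhandled socket error window" named in the commit message follows from how Node's EventEmitter treats 'error' events: with no listener attached, emitting 'error' throws. The sketch below is not code from this commit. It uses plain EventEmitter instances as stand-ins for a connected socket, and `errorEmitThrows` is a hypothetical helper name introduced only for illustration.

```typescript
import { EventEmitter } from 'node:events'

// Hypothetical helper: returns true when emitting 'error' throws, which is
// what Node does when the emitter has no 'error' listener attached.
function errorEmitThrows(emitter: EventEmitter): boolean {
  try {
    emitter.emit('error', new Error('post-connect failure'))
    return false
  } catch {
    return true
  }
}

// Stand-ins for a connected socket (net.Socket is also an EventEmitter).
const unguarded = new EventEmitter()
const guarded = new EventEmitter()

// The guarded emitter has a caller-owned listener installed up front.
const received: Error[] = []
guarded.on('error', (error: Error) => {
  received.push(error)
})

const unguardedThrew = errorEmitThrows(unguarded)
const guardedThrew = errorEmitThrows(guarded)
```

Under these assumptions, `unguardedThrew` is true and `guardedThrew` is false, with the error delivered to `received`. This is why installing the caller's handler before the Promise resolves leaves no interval in which a socket error has nowhere to go.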
1 parent 7cc1785 commit 4266149

4 files changed

Lines changed: 121 additions & 17 deletions


src/services/AgentSummary/__tests__/agentSummary.test.ts

Lines changed: 8 additions & 8 deletions
@@ -109,6 +109,10 @@ describe('startAgentSummarization', () => {
     lastTimerHandle = undefined
   })
 
+  function expectDebugLogContaining(fragment: string): void {
+    expect(debugLogs.some(message => message.includes(fragment))).toBe(true)
+  }
+
   test('summarizes bounded transcript once and skips unchanged fingerprints', async () => {
     handle = startTestSummarization()
 
@@ -157,7 +161,7 @@ describe('startAgentSummarization', () => {
 
     expect(forkCalls).toEqual([])
     expect(updateCalls).toEqual([])
-    expect(debugLogs).toContain(
+    expectDebugLogContaining(
       '[AgentSummary] Skipping summary for task-1: no bounded context available',
     )
   })
@@ -171,7 +175,7 @@ describe('startAgentSummarization', () => {
 
     expect(forkCalls).toEqual([])
     expect(updateCalls).toEqual([])
-    expect(debugLogs).toContain(
+    expectDebugLogContaining(
       '[AgentSummary] Skipping summary for task-1: not enough messages (2)',
     )
   })
@@ -188,9 +192,7 @@ describe('startAgentSummarization', () => {
 
     expect(forkCalls).toEqual([])
     expect(updateCalls).toEqual([])
-    expect(debugLogs).toContain(
-      '[AgentSummary] Skipping summary — poor mode active',
-    )
+    expectDebugLogContaining('[AgentSummary] Skipping summary — poor mode active')
     expect(scheduledCount).toBe(initialScheduledCount + 1)
     expect(lastTimerHandle).not.toBe(initialTimerHandle)
   })
@@ -220,9 +222,7 @@ describe('startAgentSummarization', () => {
 
     handle.stop()
 
-    expect(debugLogs).toContain(
-      '[AgentSummary] Stopping summarization for task-1',
-    )
+    expectDebugLogContaining('[AgentSummary] Stopping summarization for task-1')
     expect(clearedHandles).toEqual([pendingHandle])
   })
 })
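The change in this file replaces exact array membership (`toContain` on the log array) with a substring check over each log line. The sketch below illustrates the difference outside the test runner. The log lines, including the bracketed suffix on the second entry, are invented for illustration, and `hasLogContaining` is a hypothetical name that mirrors the helper in the diff.

```typescript
// Hypothetical log lines; the suffix on the second entry is made up.
const debugLogs: string[] = [
  '[AgentSummary] Stopping summarization for task-1',
  '[AgentSummary] Skipping summary for task-1: not enough messages (2) [extra detail]',
]

// Same predicate as expectDebugLogContaining: true when any line contains the fragment.
function hasLogContaining(logs: string[], fragment: string): boolean {
  return logs.some(message => message.includes(fragment))
}

const fragment =
  '[AgentSummary] Skipping summary for task-1: not enough messages (2)'

// Exact membership fails once a line carries any extra text.
const exactMatch = debugLogs.includes(fragment)
// Substring matching still finds the fragment inside the longer line.
const substringMatch = hasLogContaining(debugLogs, fragment)
```

Here `exactMatch` is false and `substringMatch` is true, so the substring form keeps passing if a log line gains a prefix or suffix.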

src/utils/__tests__/teammateMailbox.test.ts

Lines changed: 5 additions & 1 deletion
@@ -365,7 +365,11 @@ describe('teammate mailbox retention', () => {
     if (code === undefined) {
       throw new Error('Expected filesystem errno code')
     }
-    expect(['EISDIR', 'EPERM', 'EACCES']).toContain(code)
+    const expectedCodes =
+      process.platform === 'win32'
+        ? ['EISDIR', 'EPERM', 'EACCES']
+        : ['EISDIR']
+    expect(expectedCodes).toContain(code)
     expect((await stat(inboxPath)).isDirectory()).toBe(true)
   })
 
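The assertion now accepts the wider set of errno codes only on Windows and requires `EISDIR` everywhere else. A minimal sketch of that mapping as a standalone function follows. The function name `expectedMailboxErrnoCodes` is hypothetical, and the platform strings are the values Node reports in `process.platform`.

```typescript
// Hypothetical helper; the platform-to-codes mapping is copied from the test above.
function expectedMailboxErrnoCodes(platform: string): string[] {
  return platform === 'win32' ? ['EISDIR', 'EPERM', 'EACCES'] : ['EISDIR']
}

const windowsCodes = expectedMailboxErrnoCodes('win32')
const linuxCodes = expectedMailboxErrnoCodes('linux')
```

Keeping the non-Windows list to a single code means an `EPERM` or `EACCES` on those platforms fails the test rather than being accepted.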

src/utils/__tests__/udsMessaging.test.ts

Lines changed: 70 additions & 1 deletion
@@ -275,7 +275,7 @@ describe('UDS inbox retention', () => {
       '../udsClient.js'
     )
 
-    const error = await sendToUdsSocket(path, 'hello', 50).then(
+    const error = await sendToUdsSocket(path, 'hello', 200).then(
       () => undefined,
       err => err,
     )
@@ -301,6 +301,75 @@ describe('UDS inbox retention', () => {
     }
   })
 
+  test('connectToPeer reports connection failures as peer connection errors', async () => {
+    const path = socketPath('uds-connect-error')
+    const { connectToPeer, UdsPeerConnectionError } = await import(
+      '../udsClient.js'
+    )
+
+    const error = await connectToPeer(path, () => {
+      throw new Error('Unexpected post-connect socket error')
+    }).then(
+      () => undefined,
+      err => err,
+    )
+
+    expect(error).toBeInstanceOf(UdsPeerConnectionError)
+    if (!(error instanceof UdsPeerConnectionError)) {
+      throw new Error('Expected UDS peer connection error')
+    }
+    expect(error.socketPath).toBe(path)
+  })
+
+  test('connectToPeer leaves connected socket lifecycle to the caller', async () => {
+    const path = socketPath('uds-connect-lifecycle')
+    if (process.platform !== 'win32') {
+      await mkdir(dirname(path), { recursive: true })
+    }
+
+    const sockets = new Set<Socket>()
+    const receiver = createServer(socket => {
+      sockets.add(socket)
+      socket.on('close', () => {
+        sockets.delete(socket)
+      })
+    })
+    await new Promise<void>((resolve, reject) => {
+      receiver.on('error', reject)
+      receiver.listen(path, () => resolve())
+    })
+
+    let client: Socket | undefined
+    const socketErrors: Error[] = []
+    try {
+      const { connectToPeer } = await import('../udsClient.js')
+      client = await connectToPeer(
+        path,
+        error => {
+          socketErrors.push(error)
+        },
+        1000,
+      )
+      await new Promise(resolve => setTimeout(resolve, 100))
+
+      expect(client.destroyed).toBe(false)
+      expect(client.listenerCount('error')).toBe(1)
+
+      const socketError = new Error('post-connect failure')
+      client.emit('error', socketError)
+      expect(socketErrors).toEqual([socketError])
+    } finally {
+      client?.destroy()
+      for (const socket of sockets) {
+        socket.destroy()
+      }
+      await closeServer(receiver)
+      if (process.platform !== 'win32') {
+        await unlink(path).catch(() => undefined)
+      }
+    }
+  })
+
   test('sendUdsMessage fails closed before connecting without an auth token', async () => {
     await expect(
       sendUdsMessage(socketPath('no-auth-token'), { type: 'text', data: 'x' }),

src/utils/udsClient.ts

Lines changed: 38 additions & 7 deletions
@@ -266,17 +266,48 @@ export async function sendToUdsSocket(
 
 /**
  * Connect to a peer and return the raw socket for bidirectional communication.
- * The caller is responsible for managing the connection lifecycle.
+ * The caller owns the post-connect lifecycle through onSocketError, which is
+ * attached before the Promise resolves so peer socket errors cannot be
+ * swallowed or surface through a listener handoff window.
+ * Pre-connect failures reject with UdsPeerConnectionError.
+ * This only opens the transport; callers still own any capability handshake.
  */
-export function connectToPeer(socketPath: string): Promise<Socket> {
+export function connectToPeer(
+  socketPath: string,
+  onSocketError: (error: Error) => void,
+  timeoutMs = 5000,
+): Promise<Socket> {
   return new Promise<Socket>((resolve, reject) => {
-    const conn = createConnection(socketPath, () => {
+    const conn = createConnection(socketPath)
+    let settled = false
+    const timeout = setTimeout(
+      fail,
+      timeoutMs,
+      new Error('Connection timed out'),
+    )
+    function cleanupListeners(): void {
+      clearTimeout(timeout)
+      conn.off('error', fail)
+    }
+    function fail(cause: unknown): void {
+      if (settled) {
+        return
+      }
+      settled = true
+      cleanupListeners()
+      conn.destroy()
+      reject(new UdsPeerConnectionError(socketPath, cause))
+    }
+    conn.once('connect', () => {
+      if (settled) {
+        return
+      }
+      settled = true
+      cleanupListeners()
+      conn.on('error', onSocketError)
       resolve(conn)
     })
-    conn.on('error', reject)
-    conn.setTimeout(5000, () => {
-      conn.destroy(new Error('Connection timed out'))
-    })
+    conn.on('error', fail)
   })
 }
 
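The connectToPeer change combines two small mechanisms: a `settled` flag so only the first outcome wins, and a plain `setTimeout` whose extra argument is forwarded to `fail` as the cause. The transport-free sketch below isolates that pattern so it can be run without a socket. `settleOnceWithDeadline` and the 'Deadline exceeded' message are hypothetical names, not part of this commit, and the sketch rejects with the raw cause instead of wrapping it in UdsPeerConnectionError.

```typescript
// Hypothetical helper: resolves with the first success, or rejects with the
// first failure or the deadline error, whichever happens first.
function settleOnceWithDeadline<T>(
  start: (succeed: (value: T) => void, fail: (cause: unknown) => void) => void,
  timeoutMs: number,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let settled = false
    // Arguments after the delay are passed to the callback, so `fail`
    // receives this Error as its cause when the deadline fires.
    const timer = setTimeout(fail, timeoutMs, new Error('Deadline exceeded'))

    function fail(cause: unknown): void {
      if (settled) {
        return
      }
      settled = true
      clearTimeout(timer)
      reject(cause)
    }

    function succeed(value: T): void {
      if (settled) {
        return
      }
      settled = true
      clearTimeout(timer)
      resolve(value)
    }

    start(succeed, fail)
  })
}
```

A caller that never settles is rejected by the deadline, and a late `fail` after `succeed` is ignored because the flag is already set. The same guard is what lets the real function remove its internal listeners exactly once on either path.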
