Skip to content

fix: harden Node runtime cleanup#457

Merged
thymikee merged 4 commits intomainfrom
fix/node-runtime-cleanups
Apr 27, 2026
Merged

fix: harden Node runtime cleanup#457
thymikee merged 4 commits intomainfrom
fix/node-runtime-cleanups

Conversation

@thymikee
Copy link
Copy Markdown
Contributor

@thymikee thymikee commented Apr 27, 2026

Summary

Harden Node runtime, daemon shutdown, process/stream cleanup, platform discovery, and test lifecycle handling.

Fixes included:

  • Pin local/CI package management to pnpm 10.33.2 and Node 22.19.
  • Add packageManager, raise the Node engine floor, and expand format coverage for tooling config files.
  • Switch TypeScript checking to NodeNext resolution and add type-stripping guardrails: verbatimModuleSyntax, erasableSyntaxOnly, and rewriteRelativeImportExtensions.
  • Make daemon fatal errors exit non-zero and emit daemon-level diagnostics.
  • Add bounded daemon server shutdown with forced socket/HTTP connection teardown.
  • Extract daemon server shutdown into a focused helper and cover the forced-close timeout path.
  • Tear down daemon sessions in parallel with a per-session timeout during shutdown.
  • Share session resource teardown between daemon shutdown and close, including app logs, Apple runners, and retained materialized paths.
  • Re-check request cancellation around queued session execution locks.
  • Avoid per-chunk Buffer.concat in exec stdout capture and consolidate spawn/error handling.
  • Add AbortSignal support to runCmd and return the daemon-recognized request-canceled error shape on abort.
  • Write runCmd stdin through pipeline(Readable.from(...), child.stdin) and keep benign early EPIPE handling.
  • Thread request cancellation into the iOS simulator curl fallback path.
  • Kill detached process groups on command timeout or abort.
  • Move artifact upload/download and hashing paths to pipeline() for better stream teardown/backpressure.
  • Close partial artifact write streams before deleting failed/timed-out downloads.
  • Use Readable.fromWeb for fetch response download streams.
  • Use AbortSignal.timeout / AbortSignal.any for fetch timeout handling where applicable.
  • Cache only positive physical-device runner tunnel IP resolution results so transient null probes can retry.
  • Serialize iOS runner log appends to avoid unordered concurrent writes.
  • Limit Android device discovery and TV feature probe concurrency to reduce adb pressure.
  • Add subprocess timeouts to integration helpers and installed-package checks.
  • Clear test timeout timers, await killed child processes, isolate session-routing temp stores, and add recording overlay cleanup.
  • Refactor new cleanup code enough to keep fallow complexity failures cleared.

Touched-file count: 24. Scope intentionally spans tooling, daemon lifecycle, process I/O, platform discovery, and test cleanup from the requested all-around audit follow-ups.
Docs/skills were not updated because no CLI behavior, flags, workflows, or user-facing docs changed.

Validation

pnpm install --lockfile-only --frozen-lockfile
pnpm format
pnpm typecheck
pnpm vitest run src/utils/tests/exec.test.ts src/platforms/ios/tests/runner-transport.test.ts
pnpm vitest run src/daemon/tests/server-shutdown.test.ts src/utils/tests/exec.test.ts src/platforms/ios/tests/runner-transport.test.ts src/platforms/tests/install-source.test.ts
pnpm exec node --test test/integration/smoke-daemon-http.test.ts test/integration/smoke-open-remote-config.test.ts
pnpm vitest run src/daemon/tests/http-server.test.ts src/daemon/tests/artifact-materialization.test.ts src/utils/tests/daemon-client.test.ts
pnpm vitest run src/daemon/tests/upload.test.ts src/platforms/ios/tests/runner-transport.test.ts
pnpm check:fallow
pnpm check:unit
pnpm check

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 27, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-457/

Built to branch gh-pages at 2026-04-27 21:17 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d651151a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +74 to +77
if (device.kind === 'device' && deviceTunnelIp === undefined) {
deviceTunnelIp = await resolveDeviceTunnelIp(device.id, timeoutBudgetMs);
}
return resolveRunnerCommandEndpoints(device, port, deviceTunnelIp ?? null);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Retry tunnel IP resolution on each device connect attempt

This caches deviceTunnelIp after the first probe, including the null case, so later retries in waitForRunner never call resolveDeviceTunnelIp again. Because resolveDeviceTunnelIp explicitly returns null on transient states (non-zero devicectl, non-success outcome, timeout), a physical device whose tunnel address appears a bit later can never have its tunnel endpoint added, and the full retry loop can fail even though a subsequent probe would have succeeded.

Useful? React with 👍 / 👎.

Comment thread src/daemon/artifact-download.ts Outdated
output.on('finish', () => settle());
armTimeout();
source.pipe(output);
void pipeline(source, byteLimit, fs.createWriteStream(destPath)).then(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep a writable handle before deleting timed-out artifact file

The timeout/error path calls settle(error) which removes destPath, but the destination stream is now created inline and cannot be explicitly destroyed/closed first. On Windows, deleting a file that still has an open write handle can throw EPERM, so a timed-out/interrupted transfer can raise from cleanup instead of returning the intended rejection. Retaining the write stream handle and closing it before rmSync avoids this platform-specific failure.

Useful? React with 👍 / 👎.

@thymikee thymikee force-pushed the fix/node-runtime-cleanups branch from 446daa5 to b62318a Compare April 27, 2026 20:45
@thymikee thymikee merged commit 5260ac8 into main Apr 27, 2026
18 checks passed
@thymikee thymikee deleted the fix/node-runtime-cleanups branch April 27, 2026 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant