Summary
The copilot headless server (copilot --headless) accumulates kqueue file descriptors over its lifetime. Each session creates fs.watch() file watchers that are never released when sessions become idle or are closed. After enough sessions (especially with multi-agent/fleet workflows), PTY allocation for the bash tool fails with the generic error "Failed to start bash process".
Environment
- CLI version: 1.0.10
- OS: macOS 15.3 (Darwin 25.3.0, arm64)
- Node: v24.11.1
- Client type: cli-server (PolyPilot app connecting via TCP)
Reproduction
- Start the headless server: copilot --headless --port 4321
- Create many sessions over time (fleet/multi-agent workflows accelerate this; we had 93 unique sessions over the server's lifetime)
- After enough sessions, all bash tool calls fail with "Failed to start bash process"
- Existing PTY handles start throwing EIO (I/O error) on write
Restarting the headless server immediately fixes the issue.
Evidence
kqueue FD leak comparison
| Metric | Old Server (leaked) | Fresh Server (just restarted) |
| --- | --- | --- |
| Total FDs | 10,384 | 69 |
| KQUEUE FDs | 9,779 | 27 |
| Node processes | 300 | 21 |
| User processes | 922 | 362 |
That's ~105 kqueue file descriptors leaked per session (9,779 kqueues / 93 sessions).
A fresh server starts with ~27 KQUEUE FDs. After 93 sessions (including multi-agent sub-agents), this grew to 9,779 — a 362x increase.
Timeline from the process log (process-*.log)
- First PTY EIO error: 2026-03-29T14:38:07.722Z
[ERROR] Unhandled pty write error [Error: EIO: i/o error, write] {
errno: -5,
code: 'EIO',
syscall: 'write'
}
- First bash failure: 2026-03-29T14:38:07.925Z (about 200 ms after the first EIO error)
{
"tool_name": "bash",
"result_type": "FAILURE",
"error": "<exited with error: Failed to start bash process>"
}
- Last bash failure: 2026-03-29T14:46:32.763Z (~8 minutes of total bash unavailability)
- Total PTY EIO errors: 149
- Total bash spawn failures: 641
- Multiple sessions affected — the failure is server-wide, not session-specific
lsof output on the leaked server (before restart)
$ lsof -p 26081 | awk '{print $5}' | sort | uniq -c | sort -rn
9779 KQUEUE
436 unix
120 CHR
27 REG
8 DIR
6 PIPE
5 IPv4
2 IPv6
1 systm
lsof output on fresh server (after restart)
$ lsof -p 33499 | awk '{print $5}' | sort | uniq -c | sort -rn
27 KQUEUE
17 REG
6 PIPE
4 unix
4 IPv4
4 DIR
2 IPv6
2 CHR
1 systm
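To make the comparison between the two listings explicit, here is a small sketch that parses the `uniq -c` style output and totals it. The function and type names (`parseTypeCounts`, `summarize`, `FdSummary`) are made up for illustration; the counts are copied from the two lsof listings above, and 93 is the session count reported earlier.

```typescript
// Parse `uniq -c` style output ("<count> <TYPE>" per line) into a map.
function parseTypeCounts(output: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of output.split("\n")) {
    const match = /^(\d+)\s+(\S+)$/.exec(line.trim());
    if (match) {
      counts.set(match[2], Number(match[1]));
    }
  }
  return counts;
}

interface FdSummary {
  total: number;       // all descriptors in the listing
  kqueue: number;      // descriptors of type KQUEUE
  kqueueShare: number; // kqueue / total, in the range 0..1
}

function summarize(counts: Map<string, number>): FdSummary {
  let total = 0;
  counts.forEach((n) => {
    total += n;
  });
  const kqueue = counts.get("KQUEUE") ?? 0;
  return { total, kqueue, kqueueShare: total > 0 ? kqueue / total : 0 };
}

// Counts copied from the leaked-server listing (pid 26081).
const leaked = summarize(parseTypeCounts(`9779 KQUEUE
436 unix
120 CHR
27 REG
8 DIR
6 PIPE
5 IPv4
2 IPv6
1 systm`));

// Counts copied from the fresh-server listing (pid 33499).
const fresh = summarize(parseTypeCounts(`27 KQUEUE
17 REG
6 PIPE
4 unix
4 IPv4
4 DIR
2 IPv6
2 CHR
1 systm`));

// Average kqueue FDs added per session over the 93 reported sessions.
const leakedPerSession = (leaked.kqueue - fresh.kqueue) / 93;
```

With these inputs the leaked listing sums to 10,384 descriptors, about 94% of them kqueue, while the fresh listing sums to 67. The per-session figure comes out at roughly 105 kqueue descriptors, matching the estimate above.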
Root Cause Analysis
Node.js uses kqueue on macOS for fs.watch(). The headless server likely creates file watchers per session (working directory monitoring, .copilot/ config watches, session state directory watches, etc.). When sessions go idle or are explicitly closed, these watchers are never cleaned up.
With multi-agent workflows that spawn many short-lived sub-agent sessions (fleet mode, parallel task execution), the leak accelerates rapidly. In our case, 93 sessions accumulated ~9,779 kqueue FDs.
When the kqueue count gets high enough, pty.spawn() (used by the bash tool to create pseudo-terminal sessions) fails — likely due to macOS kernel resource pressure on PTY allocation or the child process inheriting too many FDs.
Impact
- All bash tool calls fail server-wide — not just the session that triggered the limit
- Existing PTY sessions get EIO errors — even previously-working bash sessions break
- Only fix is server restart — the FDs are never reclaimed without killing the process
- Multi-agent/fleet workflows hit this faster due to many short-lived sessions
- 8+ minutes of total bash unavailability in our observed incident
Suggested Fix
- Close file watchers when sessions are disposed/idle. Each fs.watch() handle should be tracked per-session and .close()d when the session ends.
- Set CLOEXEC on kqueue FDs if not already. This prevents child processes (bash) from inheriting unnecessary FDs.
- Add a resource limit guard — if the server detects its FD count exceeding a threshold, log a warning and/or proactively clean up stale watchers.
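The first and third suggestions could look roughly like the sketch below. We have not seen the server's source, so `SessionWatcherRegistry`, `Closable`, and every method name here are hypothetical. The only assumption about Node itself is that the `fs.FSWatcher` returned by `fs.watch()` has a `close()` method, which is why the registry accepts anything with that shape.

```typescript
// Anything with a close() method; Node's fs.FSWatcher satisfies this shape.
interface Closable {
  close(): void;
}

// Hypothetical registry: tracks watcher handles per session so they can be
// released together when the session is disposed or goes idle.
class SessionWatcherRegistry {
  private readonly watchers = new Map<string, Set<Closable>>();

  // Record a watcher as belonging to a session and return it unchanged,
  // so a call site can wrap its existing fs.watch(...) expression.
  track<T extends Closable>(sessionId: string, watcher: T): T {
    let owned = this.watchers.get(sessionId);
    if (!owned) {
      owned = new Set<Closable>();
      this.watchers.set(sessionId, owned);
    }
    owned.add(watcher);
    return watcher;
  }

  // Close every watcher owned by the session. Returns how many closed
  // without throwing; a failing close() does not stop the others.
  disposeSession(sessionId: string): number {
    const owned = this.watchers.get(sessionId);
    if (!owned) {
      return 0;
    }
    this.watchers.delete(sessionId);
    let closed = 0;
    owned.forEach((watcher) => {
      try {
        watcher.close();
        closed++;
      } catch {
        // Keep going so one bad handle cannot keep the rest alive.
      }
    });
    return closed;
  }

  // Total number of watchers currently tracked across all sessions.
  totalTracked(): number {
    let total = 0;
    this.watchers.forEach((owned) => {
      total += owned.size;
    });
    return total;
  }

  // Resource guard: true once the tracked watcher count passes `limit`.
  // The caller decides whether to log a warning or dispose idle sessions.
  exceeds(limit: number): boolean {
    return this.totalTracked() > limit;
  }
}
```

A call site would then wrap watcher creation, for example `registry.track(sessionId, fs.watch(dir, onChange))`, and call `registry.disposeSession(sessionId)` from whatever hook runs when a session is closed or times out. Keeping ownership in one map also gives the guard a cheap count to check without shelling out to lsof.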
Workaround
Restart the headless server periodically, or when bash failures are detected. PolyPilot users can use Settings → Save & Reconnect to trigger a server restart.
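For clients that want to automate "restart when bash failures are detected", a minimal sketch follows. The `ToolResult` shape mirrors the fields in the failure record shown in the timeline (`tool_name`, `result_type`, `error`). The `BashFailureWatchdog` class, its threshold, and the `onTrip` callback are hypothetical, and how a client actually restarts the server is left to the caller.

```typescript
// Shape of the tool result record seen in the process log.
interface ToolResult {
  tool_name: string;
  result_type: string;
  error?: string;
}

// Hypothetical watchdog: fires a callback once after `threshold`
// consecutive bash spawn failures, and re-arms after any bash call
// that does not fail this way.
class BashFailureWatchdog {
  private readonly threshold: number;
  private readonly onTrip: () => void;
  private consecutive = 0;

  constructor(threshold: number, onTrip: () => void) {
    this.threshold = threshold;
    this.onTrip = onTrip;
  }

  observe(result: ToolResult): void {
    if (result.tool_name !== "bash") {
      return; // Other tools say nothing about PTY health.
    }
    const spawnFailed =
      result.result_type === "FAILURE" &&
      (result.error ?? "").includes("Failed to start bash process");
    if (!spawnFailed) {
      this.consecutive = 0;
      return;
    }
    this.consecutive++;
    if (this.consecutive === this.threshold) {
      this.onTrip(); // For example: schedule a server restart.
    }
  }
}
```

Requiring several consecutive failures rather than one avoids restarting on an unrelated one-off spawn error. In the incident above every bash call failed for about eight minutes, so even a small threshold would have tripped within the first few calls.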