
fix: embed pool waits for worker restart, restarts on code-0 exit #3

Merged
BrainSlugs83 merged 5 commits into main from fix/embed-worker-restart-resilience on Mar 27, 2026

Conversation

BrainSlugs83 (Owner) commented Mar 27, 2026

Fixes #4

Extract worker lifecycle management into embed-pool.js with:

  • embed() awaits pending restart instead of rejecting immediately
  • Worker restarts on ALL exit codes, not just non-zero
  • shuttingDown guard prevents restart after explicit shutdown
  • Configurable workerReadyTimeout for restart wait (default 30s)
  • Timer cleanup in rejectAllPending to prevent leaks
  • try-catch in restart callback prevents permanent hang on factory throw
  • Restart timer stored and cleared in shutdown() prevents zombie workers

Includes 10 unit tests covering both original bugs and edge case fixes.
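
To illustrate the first three bullets, here is a minimal sketch of a pool whose embed() waits for a pending restart, restarts on every exit code, and stops restarting after shutdown(). It is not the code from embed-pool.js: createEmbedPool, restartDelay, and the worker shape (an object with on('exit', cb) and an async embed(text)) are assumptions made for the example. Only workerReadyTimeout, its 30s default, and the shuttingDown guard come from the description above. Factory failures are deliberately left out of this sketch.

```javascript
// Hypothetical sketch; names other than workerReadyTimeout and
// shuttingDown are illustrative, not taken from embed-pool.js.
function createEmbedPool({ workerFactory, workerReadyTimeout = 30_000, restartDelay = 1_000 }) {
  let worker = null;
  let workerReadyPromise = null; // non-null while a restart is pending
  let restartTimer = null;
  let shuttingDown = false;

  function initWorker() {
    worker = workerFactory();
    // Restart on every exit code, including 0.
    worker.on('exit', () => {
      worker = null;
      if (!shuttingDown) scheduleRestart();
    });
  }

  function scheduleRestart() {
    workerReadyPromise = new Promise((resolve) => {
      restartTimer = setTimeout(() => {
        restartTimer = null;
        initWorker();
        workerReadyPromise = null;
        resolve();
      }, restartDelay);
    });
  }

  async function embed(text) {
    if (shuttingDown) throw new Error('embed pool is shut down');
    if (!worker && workerReadyPromise) {
      // Wait for the pending restart instead of rejecting immediately,
      // but give up after workerReadyTimeout milliseconds.
      let timer;
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(
          () => reject(new Error('timed out waiting for worker restart')),
          workerReadyTimeout,
        );
      });
      try {
        await Promise.race([workerReadyPromise, timeout]);
      } finally {
        clearTimeout(timer); // do not leave the timeout timer behind
      }
    }
    if (!worker) throw new Error('no embed worker available');
    return worker.embed(text);
  }

  function shutdown() {
    shuttingDown = true;
    clearTimeout(restartTimer); // a scheduled restart must not fire later
    restartTimer = null;
  }

  initWorker();
  return { embed, shutdown };
}
```

With this shape, a call to embed() made between a worker exit and the restart resolves once the new worker exists, rather than failing because the worker is momentarily missing.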

BrainSlugs83 force-pushed the fix/embed-worker-restart-resilience branch 3 times, most recently from 531a29b to 82f4e72 on March 27, 2026 01:03
BrainSlugs83 and others added 4 commits March 26, 2026 19:10
Extract worker lifecycle management into embed-pool.js with:
- embed() awaits pending restart instead of rejecting immediately
- Worker restarts on ALL exit codes, not just non-zero
- shuttingDown guard prevents restart after explicit shutdown
- Configurable workerReadyTimeout for restart wait (default 30s)
- Timer cleanup in rejectAllPending to prevent leaks

Includes 8 new unit tests covering both bugs and fix behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
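
The "timer cleanup in rejectAllPending" item can be sketched as below. This is an illustration under assumptions, not the repository's code: createPendingRegistry, track, settle, and the Map layout are hypothetical, and the 60s default mirrors the embed timeout mentioned later in this PR. The point is that each pending request owns a timeout timer, so rejecting the request without clearing that timer leaves the timer alive until it fires.

```javascript
// Hypothetical sketch of per-request timers and their cleanup.
function createPendingRegistry(embedTimeout = 60_000) {
  const pending = new Map(); // id -> { resolve, reject, timer }
  let nextId = 0;

  // Register an in-flight request and arm its timeout.
  function track() {
    const id = nextId++;
    const promise = new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        pending.delete(id);
        reject(new Error('embed timed out'));
      }, embedTimeout);
      pending.set(id, { resolve, reject, timer });
    });
    return { id, promise };
  }

  // Resolve one request when the worker answers.
  function settle(id, value) {
    const entry = pending.get(id);
    if (!entry) return;
    clearTimeout(entry.timer);
    pending.delete(id);
    entry.resolve(value);
  }

  // Reject everything in flight, clearing each timer so none leaks.
  function rejectAllPending(reason) {
    for (const { reject, timer } of pending.values()) {
      clearTimeout(timer);
      reject(new Error(reason));
    }
    pending.clear();
  }

  return { track, settle, rejectAllPending, size: () => pending.size };
}
```

Without the clearTimeout call in rejectAllPending, every rejected request would keep a live timer for up to the full embed timeout, which also keeps the Node.js event loop from going idle.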
…n shutdown

- Wrap initWorker() in try-catch inside scheduleRestart setTimeout callback
  to prevent permanently hung workerReadyPromise when factory throws
- Store restart timer ID and clearTimeout it in shutdown() to prevent
  zombie worker spawning after explicit shutdown
- Re-check shuttingDown flag inside setTimeout callback as belt-and-suspenders
- Add 2 regression tests that reproduce both bugs before the fix

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
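
A rough sketch of the two fixes in this commit, under stated assumptions: createRestarter and the initWorker parameter are hypothetical names, and rejecting the ready promise on a factory throw or on shutdown is a choice made for this example (the commit only says the promise must not hang). The try-catch settles the promise when worker creation throws, and shutdown() clears the stored timer so no worker is spawned afterwards.

```javascript
// Hypothetical sketch; not the actual scheduleRestart from embed-pool.js.
function createRestarter({ initWorker, restartDelay = 1_000 }) {
  let restartTimer = null;
  let rejectPending = null;
  let shuttingDown = false;

  function scheduleRestart() {
    return new Promise((resolve, reject) => {
      rejectPending = reject;
      restartTimer = setTimeout(() => {
        restartTimer = null;
        rejectPending = null;
        // Belt-and-suspenders: re-check the flag inside the callback.
        if (shuttingDown) {
          reject(new Error('pool shut down'));
          return;
        }
        try {
          initWorker();
          resolve();
        } catch (err) {
          // Without this catch the promise would never settle and
          // every caller awaiting it would hang.
          reject(err);
        }
      }, restartDelay);
    });
  }

  function shutdown() {
    shuttingDown = true;
    clearTimeout(restartTimer); // prevents a zombie worker after shutdown
    restartTimer = null;
    if (rejectPending) {
      rejectPending(new Error('pool shut down')); // assumption of this sketch
      rejectPending = null;
    }
  }

  return { scheduleRestart, shutdown };
}
```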
… failed restarts

- rejectAllPending now fires on ALL exit codes (code-0 previously orphaned
  in-flight embeds until the 60s embed timeout)
- Failed workerFactory() calls in scheduleRestart now retry with exponential
  backoff (delay doubles each failure, capped at maxRestartDelay)
- Backoff resets to base delay after a successful restart
- Removed unreachable shuttingDown guard inside setTimeout callback
  (clearTimeout in shutdown() already prevents the callback from firing)

7 new tests (68 total), 100% line/branch/function coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
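
The backoff rule described in this commit (double on each failure, cap at maxRestartDelay, reset after a successful restart) can be expressed as a small state holder. This is a sketch only: createBackoff, baseDelay, and the default values are assumptions, and only the name maxRestartDelay comes from the commit message.

```javascript
// Hypothetical helper illustrating the doubling-with-cap rule.
function createBackoff({ baseDelay = 1_000, maxRestartDelay = 30_000 } = {}) {
  let current = baseDelay;
  return {
    // Delay to use for the next restart attempt.
    nextDelay() {
      return current;
    },
    // Call after workerFactory() throws: double the delay, up to the cap.
    recordFailure() {
      current = Math.min(current * 2, maxRestartDelay);
    },
    // Call after a successful restart: go back to the base delay.
    recordSuccess() {
      current = baseDelay;
    },
  };
}

// Usage: with a 1000 ms base and a 5000 ms cap, four failures in a row.
const backoff = createBackoff({ baseDelay: 1000, maxRestartDelay: 5000 });
const delays = [backoff.nextDelay()];
for (let i = 0; i < 4; i++) {
  backoff.recordFailure();
  delays.push(backoff.nextDelay());
}
console.log(delays); // [ 1000, 2000, 4000, 5000, 5000 ]
backoff.recordSuccess();
console.log(backoff.nextDelay()); // 1000
```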
- Restore clearTimeout(restartTimer) in shutdown() (primary guard)
- Remove v8 ignore pragma from shuttingDown guard (backup guard)
- Add monkeypatch test that neuters clearTimeout to exercise the
  backup guard, proving it catches the timer callback race
- 69 tests, 100% line/branch/function coverage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
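
The monkeypatch test described here could look roughly like the following. It is a sketch, not the repository's test: createRestartable and backupGuardHolds are hypothetical names, and the tiny pool stands in for embed-pool.js. The idea is to replace the global clearTimeout with a no-op so the primary guard in shutdown() does nothing, then confirm that the shuttingDown check inside the timer callback still prevents a spawn.

```javascript
// Hypothetical stand-in for the pool: primary guard in shutdown(),
// backup guard inside the timer callback.
function createRestartable(workerFactory, restartDelay) {
  let shuttingDown = false;
  let restartTimer = null;
  return {
    scheduleRestart() {
      restartTimer = setTimeout(() => {
        if (shuttingDown) return; // backup guard
        workerFactory();
      }, restartDelay);
    },
    shutdown() {
      shuttingDown = true;
      clearTimeout(restartTimer); // primary guard
    },
  };
}

// Returns true when no worker is spawned even though clearTimeout is neutered.
async function backupGuardHolds() {
  const realClearTimeout = globalThis.clearTimeout;
  let spawned = 0;
  const pool = createRestartable(() => { spawned++; }, 5);
  pool.scheduleRestart();

  globalThis.clearTimeout = () => {}; // neuter the primary guard
  try {
    pool.shutdown(); // timer stays armed because clearTimeout is a no-op
    await new Promise((resolve) => setTimeout(resolve, 30)); // let it fire
  } finally {
    globalThis.clearTimeout = realClearTimeout; // always restore the global
  }
  return spawned === 0;
}
```

Restoring the global in a finally block matters here, since a leaked no-op clearTimeout would silently break every later test in the same process.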
BrainSlugs83 force-pushed the fix/embed-worker-restart-resilience branch from 10b7e3f to e6b21c0 on March 27, 2026 02:10
- Add VECTOR_MEMORY_DATA_DIR env var to vector-memory-server.js and index.js
  for overriding the data directory (default: ~/.copilot/)
- Add test-integration.js with 7 end-to-end tests that spawn the full MCP
  STDIO proxy, perform the JSON-RPC handshake, and exercise all tools
- Tests use temp directory via VECTOR_MEMORY_DATA_DIR + random port to
  avoid touching real data or conflicting with running servers
- Add npm run test:integration script

Refs #5

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
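
For reference, resolving the data directory from VECTOR_MEMORY_DATA_DIR might look like the sketch below. The function name resolveDataDir, the env parameter, and the treatment of an empty value as "not set" are assumptions. Only the variable name and the ~/.copilot/ default come from the commit message.

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: use the override when set, else fall back to ~/.copilot.
function resolveDataDir(env = process.env) {
  const override = env.VECTOR_MEMORY_DATA_DIR;
  if (override && override.trim() !== '') {
    return path.resolve(override); // normalize relative paths against cwd
  }
  return path.join(os.homedir(), '.copilot');
}
```

Passing env as a parameter lets a test point the server at a temp directory without mutating process.env, which fits the integration tests' goal of not touching real data.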
BrainSlugs83 merged commit 3cf1b90 into main on Mar 27, 2026
2 checks passed
BrainSlugs83 deleted the fix/embed-worker-restart-resilience branch on March 27, 2026 03:22

Development

Successfully merging this pull request may close these issues.

Embed worker exits with code 0 and never restarts — vector search permanently broken
