-
Notifications
You must be signed in to change notification settings - Fork 9
fix: prevent zombie MCP processes via handshake timeout + deferred init #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,122 @@ | ||
| /** | ||
| * Integration tests for zombie process prevention. | ||
| * | ||
| * These tests verify that the MCP server exits cleanly when no client connects | ||
| * (handshake timeout) and that project initialization is deferred until after | ||
| * the MCP handshake completes. | ||
| * | ||
| * The tests spawn real child processes to exercise the actual startup path. | ||
| */ | ||
|
|
||
| import { describe, it, expect, beforeAll } from 'vitest'; | ||
| import { spawn } from 'node:child_process'; | ||
| import { existsSync } from 'node:fs'; | ||
| import path from 'node:path'; | ||
| import os from 'node:os'; | ||
| import { fileURLToPath } from 'node:url'; | ||
|
|
||
| const __dirname = path.dirname(fileURLToPath(import.meta.url)); | ||
| const ENTRY_POINT = path.resolve(__dirname, '..', 'dist', 'index.js'); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The test resolves the entry point from import { existsSync } from 'node:fs';
beforeAll(() => {
if (!existsSync(ENTRY_POINT)) {
throw new Error(
`dist/index.js not found — run \`npm run build\` before the zombie-guard tests.`
);
}
});This is especially relevant for contributors running
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed. Added a beforeAll(() => {
if (!existsSync(ENTRY_POINT)) {
throw new Error(
`dist/index.js not found - run \`npm run build\` before the zombie-guard tests.`
);
}
});Throws early with an actionable message rather than letting the test fail with a cryptic spawn rejection. |
||
|
|
||
| /** | ||
| * Spawn the MCP server as a child process and wait for it to exit. | ||
| * Returns { code, stderr, elapsed } where elapsed is in milliseconds. | ||
| */ | ||
| function spawnServer( | ||
| args: string[], | ||
| env: Record<string, string> = {}, | ||
| timeoutMs = 45_000 | ||
| ): Promise<{ code: number | null; signal: string | null; stderr: string; elapsed: number }> { | ||
| return new Promise((resolve, reject) => { | ||
| const start = Date.now(); | ||
| let stderr = ''; | ||
|
|
||
| const child = spawn(process.execPath, [ENTRY_POINT, ...args], { | ||
| stdio: ['pipe', 'pipe', 'pipe'], | ||
| env: { ...process.env, ...env }, | ||
| timeout: timeoutMs | ||
| }); | ||
|
|
||
| child.stderr?.on('data', (chunk: Buffer) => { | ||
| stderr += chunk.toString(); | ||
| }); | ||
|
|
||
| child.on('error', reject); | ||
| child.on('close', (code, signal) => { | ||
| resolve({ code, signal, stderr, elapsed: Date.now() - start }); | ||
| }); | ||
|
|
||
| // Don't write anything to stdin — simulate the zombie scenario | ||
| // where no MCP client sends an `initialize` message. | ||
| }); | ||
| } | ||
|
|
||
| describe('zombie process prevention', () => { | ||
| beforeAll(() => { | ||
| if (!existsSync(ENTRY_POINT)) { | ||
| throw new Error( | ||
| `dist/index.js not found - run \`npm run build\` before the zombie-guard tests.` | ||
| ); | ||
| } | ||
| }); | ||
|
|
||
| it('exits with code 1 when no MCP client connects within timeout', async () => { | ||
| // Use a short timeout for the test (2 seconds instead of the default 30). | ||
| // Use os.tmpdir() as a real existing directory so path validation passes — | ||
| // this tests the realistic scenario where a valid path IS provided but no | ||
| // MCP client connects (which is exactly the Codex zombie scenario). | ||
| const result = await spawnServer( | ||
| [os.tmpdir()], | ||
| { CODEBASE_CONTEXT_HANDSHAKE_TIMEOUT_MS: '2000' } | ||
| ); | ||
|
|
||
| expect(result.code).toBe(1); | ||
| expect(result.stderr).toContain('No MCP client connected within'); | ||
| expect(result.stderr).toContain('npx codebase-context --help'); | ||
| // Should exit roughly around the timeout (2s), not hang forever | ||
| expect(result.elapsed).toBeLessThan(10_000); | ||
| }, 15_000); | ||
|
|
||
| it('exits with code 1 even when invoked with no arguments at all', async () => { | ||
| const result = await spawnServer( | ||
| [], | ||
| { CODEBASE_CONTEXT_HANDSHAKE_TIMEOUT_MS: '2000' } | ||
| ); | ||
|
|
||
| expect(result.code).toBe(1); | ||
| expect(result.stderr).toContain('No MCP client connected within'); | ||
| expect(result.elapsed).toBeLessThan(10_000); | ||
| }, 15_000); | ||
|
|
||
| it('does not start indexing or file watchers before handshake', async () => { | ||
| // With DEBUG on, the server logs "[DEBUG] Server ready" inside oninitialized. | ||
| // Since no client ever connects, that log must never appear. | ||
| // Use os.tmpdir() so path validation passes before the handshake timer runs. | ||
| const result = await spawnServer( | ||
| [os.tmpdir()], | ||
| { | ||
| CODEBASE_CONTEXT_HANDSHAKE_TIMEOUT_MS: '2000', | ||
| CODEBASE_CONTEXT_DEBUG: '1' | ||
| } | ||
| ); | ||
|
|
||
| expect(result.code).toBe(1); | ||
| // "[DEBUG] Server ready" is printed inside oninitialized — should NOT appear | ||
| // because no client ever sends `initialize`. | ||
| expect(result.stderr).not.toContain('[DEBUG] Server ready'); | ||
| }, 15_000); | ||
|
|
||
| it('respects custom timeout via environment variable', async () => { | ||
| const start = Date.now(); | ||
| const result = await spawnServer( | ||
| [], | ||
| { CODEBASE_CONTEXT_HANDSHAKE_TIMEOUT_MS: '1000' } | ||
| ); | ||
| const elapsed = Date.now() - start; | ||
|
|
||
| expect(result.code).toBe(1); | ||
| // Should exit around 1 second, definitely under 5 | ||
| expect(elapsed).toBeGreaterThan(800); | ||
| expect(elapsed).toBeLessThan(5_000); | ||
| }, 10_000); | ||
| }); | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
task.surfaceis a value not present in therunnersmap (e.g., a typo in a fixture file or a newly addedDiscoverySurfacevalue without a corresponding runner),runnerisundefinedand callingawait runner(task, rootPath)throws a confusingTypeError: runner is not a functionrather than a descriptive error.Consider adding an explicit check:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Added an explicit guard before calling the runner:
This surfaces fixture typos or unregistered surfaces with a clear message instead of a cryptic TypeError.