Commit 336ac0c

fix(triage-flow): drain queued tasks during shutdown (DEREM-38)
closeActiveSandboxes snapshots both maps and clears them, then awaits sandbox.close() / coder delete on the snapshots. While those awaits are pending, the run() calls of the sandboxes being closed reject, which resolves the in-flight runTriageAgent calls. pLimit's microtask-driven runNext() then dequeues the next task, and that task's runTriageAgent calls createSandbox(), repopulating the maps with workspaces that will not be reaped before process.exit() fires.

Add a module-level shutdownRequested flag, set SYNCHRONOUSLY at the top of the signal handler (before any await, so the flip is visible to every microtask the cleanup unblocks). runTriageAgent checks it as its first synchronous step and returns { status: 'skipped', message: 'shutdown requested' } without provisioning a workspace, so post-snapshot tasks never reach the createSandbox call site.

Testing: typecheck, lint, and format:check are clean, and all 55 sandcastle tests still pass. The new path is only exercised through the signal pipeline, which the unit tests don't cover, but the synchronous flag check is a straight-line guard with no other branching.
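The race described above can be reproduced in a small, self-contained model. This is a sketch under stated assumptions, not the project's code. `limit` is a hand-rolled stand-in for pLimit: it starts the next queued task from a promise callback once a running task settles. `createSandbox`, `closeActiveSandboxes`, `closers`, and `scenario` are hypothetical simplifications, and a single Set of workspace names plays the role of both maps. `scenario(false)` runs without the guard and `scenario(true)` runs with it. Each returns the workspaces still registered after cleanup.

```typescript
// Hypothetical model of the shutdown race. `limit`, `createSandbox`,
// `closeActiveSandboxes` and the Set below are simplified stand-ins,
// not the real p-limit or @ai-hero/sandcastle APIs.

// Tiny limiter in the spirit of pLimit: when a running task settles,
// the next queued task is started from a promise callback (a microtask).
function limit(concurrency: number): <T>(task: () => Promise<T>) => Promise<T> {
  let active = 0;
  const queue: Array<() => void> = [];
  const next = (): void => {
    if (active >= concurrency || queue.length === 0) return;
    active++;
    const start = queue.shift()!;
    start();
  };
  function schedule<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      queue.push(() => {
        task()
          .then(resolve, reject)
          .then(() => {
            active--;
            next(); // dequeue the next task once this one has settled
          });
      });
      next();
    });
  }
  return schedule;
}

const activeSandboxes = new Set<string>();
const closers = new Map<string, () => void>(); // closing a sandbox rejects its run()
let shutdownRequested = false;

async function createSandbox(name: string): Promise<void> {
  activeSandboxes.add(name); // registered before the first await
}

async function runTriageAgent(issue: number, guard: boolean): Promise<string> {
  // The fix: bail out before provisioning anything once shutdown has started.
  if (guard && shutdownRequested) return "skipped";
  const name = `ws-${issue}`;
  await createSandbox(name);
  try {
    // Stand-in for sandbox.run(): stays pending until the sandbox is closed.
    await new Promise<void>((_, reject) => {
      closers.set(name, () => reject(new Error("sandbox closed")));
    });
    return "done";
  } catch {
    return "closed";
  }
}

// Snapshot and clear, then await each close. Runs unblocked here can let
// the limiter start queued tasks before this function returns.
async function closeActiveSandboxes(): Promise<void> {
  const snapshot = Array.from(activeSandboxes);
  activeSandboxes.clear();
  for (const name of snapshot) {
    closers.get(name)?.();
    await Promise.resolve(); // stand-in for `await sandbox.close()`
  }
}

const flushMicrotasks = async (): Promise<void> => {
  for (let i = 0; i < 50; i++) await Promise.resolve();
};

// Returns the workspaces still registered after cleanup, i.e. the orphans.
async function scenario(guard: boolean): Promise<string[]> {
  activeSandboxes.clear();
  closers.clear();
  shutdownRequested = false;
  const run = limit(1);
  for (const issue of [1, 2, 3]) void run(() => runTriageAgent(issue, guard));
  await flushMicrotasks(); // issue 1 is now "running"; 2 and 3 are queued
  shutdownRequested = true; // flipped before the first await of cleanup
  await closeActiveSandboxes();
  await flushMicrotasks();
  return Array.from(activeSandboxes);
}
```

Under these assumptions, `scenario(false)` resolves to `["ws-2"]`. Issue 2 is dequeued when issue 1's run rejects, and it registers its workspace after the snapshot was taken. `scenario(true)` resolves to `[]` because issues 2 and 3 return `"skipped"` before they touch the set.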
1 parent 9a4b7c1 commit 336ac0c

1 file changed

Lines changed: 28 additions & 0 deletions

.sandcastle/main.ts

```diff
@@ -348,6 +348,15 @@ const activeSandboxes = new Map<Sandbox, number>();
 // so we can also de-duplicate concurrent attempts on the same issue.
 const pendingWorkspaceNames = new Map<string, number>();
 
+// Set synchronously by the signal handler before any await. Checked at
+// the top of `runTriageAgent` so any task pLimit dequeues during cleanup
+// (in-flight runs completing as their sandboxes are closed cause queued
+// tasks to start) returns early without provisioning a new workspace.
+// Without this, those late-starting tasks would populate the maps that
+// `closeActiveSandboxes` already snapshotted, and the new workspaces
+// would be orphaned when `process.exit()` fires.
+let shutdownRequested = false;
+
 async function runTriageAgent(
   issue: TriageIssue,
   runId: string,
```
```diff
@@ -360,6 +369,20 @@ async function runTriageAgent(
   let sandbox: Sandbox | undefined;
   let result: TriageIssueSummary | undefined;
 
+  // Skip queued tasks that pLimit dequeues after a shutdown signal. The
+  // signal handler sets `shutdownRequested` synchronously before the
+  // first await in closeActiveSandboxes; any task that starts here after
+  // that point would otherwise call `createSandbox()`, populate the
+  // already-snapshotted maps, and orphan a workspace when
+  // `process.exit()` fires.
+  if (shutdownRequested) {
+    return {
+      issueNumber: issue.number,
+      status: 'skipped',
+      message: 'shutdown requested',
+    };
+  }
+
   try {
     workspaceName = workspaceNameForIssue(issue.number);
     const { createSandbox, claudeCode } = await import('@ai-hero/sandcastle');
```
```diff
@@ -586,6 +609,11 @@ function installSignalHandlers(): void {
       process.exit(signalExitCode(signal));
     }
     signalled = true;
+    // Set the shutdown flag SYNCHRONOUSLY before any await so queued
+    // pLimit tasks that get dequeued during cleanup (in-flight runs
+    // completing as their sandboxes are closed) see it and skip
+    // provisioning new workspaces.
+    shutdownRequested = true;
     console.error(
       `[afk-triage] received ${signal}; closing ${activeSandboxes.size} active sandboxes and ${pendingWorkspaceNames.size} in-flight workspaces (send ${signal} again to force-exit)`,
     );
```
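The signal-handler change depends on microtask ordering. Closing the first sandbox queues the callbacks waiting on its run() ahead of the handler's own `await` continuation. The sketch below is a hypothetical, stripped-down illustration of that ordering, not the real handler. `observe`, `closeSandbox`, and `dequeued` are invented names, and a single `.catch` callback stands in for the task pLimit would dequeue.

```typescript
// Hypothetical illustration, not the real handler: whether a callback that
// cleanup unblocks sees the flag depends on when the handler flips it.
async function observe(flipBeforeAwait: boolean): Promise<boolean> {
  let shutdownRequested = false;

  // Stand-in for an in-flight run(): rejects when its sandbox is closed.
  let closeSandbox!: () => void;
  const run = new Promise<void>((_, reject) => {
    closeSandbox = () => reject(new Error("sandbox closed"));
  });

  // Stand-in for the task pLimit would dequeue once run() settles; it reads
  // the flag at the moment its callback runs.
  const dequeued = run.catch(() => shutdownRequested);

  // Stand-in for the signal handler.
  const handler = async (): Promise<void> => {
    if (flipBeforeAwait) shutdownRequested = true; // synchronous, before any await
    closeSandbox(); // queues the dequeued callback as a microtask
    await Promise.resolve(); // stand-in for `await sandbox.close()`
    if (!flipBeforeAwait) shutdownRequested = true; // too late for that callback
  };

  await handler();
  return (await dequeued) === true;
}
```

`observe(true)` resolves to `true` and `observe(false)` resolves to `false`. With a late flip, the unblocked callback runs while the flag is still unset. In the real pipeline, the relative order would also depend on how long `sandbox.close()` takes. Setting the flag before the first await makes it visible regardless of that timing.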
