Skip to content

Commit e7a6e76

Browse files
committed
fix: cancel pending delayed snapshots regardless of completion outcome
A runner calling attempts/complete has finished executing either way - on a transient completion failure it retries the call, so cancelling only on success leaves the stale snapshot armed in the meantime. The runnerId guard already covers the stale-duplicate-runner case whose completion fails server-side validation. Addresses CodeRabbit review feedback on the PR.
1 parent 2f5b8db commit e7a6e76

1 file changed

Lines changed: 13 additions & 11 deletions

File tree

  • apps/supervisor/src/workloadServer

apps/supervisor/src/workloadServer/index.ts

Lines changed: 13 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -287,13 +287,25 @@ export class WorkloadServer extends EventEmitter<WorkloadServerEvents> {
287287
"POST",
288288
async () => {
289289
const { req, reply, params, body } = ctx;
290+
const runnerId = this.runnerIdFromRequest(req);
290291
const completeResponse = await this.workerClient.completeRunAttempt(
291292
params.runFriendlyId,
292293
params.snapshotFriendlyId,
293294
body,
294-
this.runnerIdFromRequest(req)
295+
runnerId
295296
);
296297

298+
// A completion attempt invalidates any pending delayed snapshot
299+
// regardless of outcome: the runner has finished executing, so the
300+
// suspended state the snapshot was scheduled to capture no longer
301+
// exists. Without this, the snapshot fires up to snapshotDelayMs
302+
// later and pauses a VM that has long moved on - and on a transient
303+
// completion failure the runner retries, so waiting for success
304+
// would leave the stale snapshot armed in the meantime. The
305+
// runnerId guard keeps a stale duplicate runner's failed completion
306+
// from cancelling a fresh runner's snapshot.
307+
this.snapshotService?.cancel(params.runFriendlyId, runnerId);
308+
297309
if (!completeResponse.success) {
298310
this.logger.error("Failed to complete run", {
299311
params,
@@ -303,16 +315,6 @@ export class WorkloadServer extends EventEmitter<WorkloadServerEvents> {
303315
return;
304316
}
305317

306-
// A completed attempt invalidates any pending delayed snapshot: the
307-
// suspended execution state it was scheduled to capture no longer
308-
// exists. Without this, the snapshot fires up to snapshotDelayMs
309-
// later and pauses a VM that has long moved on, e.g. mid warm-start
310-
// long-poll or already executing the next run.
311-
this.snapshotService?.cancel(
312-
params.runFriendlyId,
313-
this.runnerIdFromRequest(req)
314-
);
315-
316318
reply.json(
317319
completeResponse.data satisfies WorkloadRunAttemptCompleteResponseBody
318320
);

0 commit comments

Comments
 (0)