fix(jobs): stop JobQueueService leaking unhandled rejections on DB errors #83

Merged
ersinkoc merged 1 commit into main from fix/job-queue-unhandled-rejection on Jun 5, 2026

@ersinkoc
Collaborator

@ersinkoc ersinkoc commented Jun 5, 2026

Summary

Found during a proactive concurrency audit of JobQueueService.

The service invokes executeJob and pollWorker fire-and-forget from three callsites, but only pollAll isolated rejections (via Promise.allSettled). The gaps:

  • executeJob caught the job handler's errors but awaited repo.complete / repo.fail unguarded — both are DB writes that can throw on a transient connection loss.
  • pollWorker rejects if repo.claimJob throws, and the immediate-start poll (startWorker) and per-job re-poll (.finally) didn't .catch it.

So a transient DB failure while finalizing or claiming a job surfaced as an unhandled promise rejection. The gateway's global handler only logs (no crash), but it emitted a misleading Unhandled Promise Rejection with no job/worker context — a false crash signal.

pollAll already used allSettled while its sibling callsites did not, which suggests the gap was an oversight rather than intent.
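The difference between the bare callsites and pollAll can be sketched as follows. This is a minimal illustration, not the service's actual code: `flakyRepo`, `startWorkerLeaky`, and the string job ids are hypothetical stand-ins, and only the names pollWorker, pollAll, and claimJob come from the description above.

```typescript
// Hypothetical stand-in for the job repository: claimJob always fails,
// the way a dropped DB connection would.
const flakyRepo = {
  async claimJob(workerId: string): Promise<string | null> {
    throw new Error(`connection lost while claiming for ${workerId}`);
  },
};

// Rejects whenever claimJob rejects.
async function pollWorker(workerId: string): Promise<void> {
  const jobId = await flakyRepo.claimJob(workerId);
  if (jobId === null) return;
  // ... execute the claimed job here ...
}

// Bare fire-and-forget call: nothing observes the returned promise, so a
// claimJob failure becomes an unhandled rejection with no worker context.
function startWorkerLeaky(workerId: string): void {
  void pollWorker(workerId);
}

// pollAll-style call: allSettled records each rejection as a result object
// instead of letting it escape.
function pollAll(workerIds: string[]): Promise<PromiseSettledResult<void>[]> {
  return Promise.allSettled(workerIds.map((id) => pollWorker(id)));
}
```

Calling `pollAll(["w1", "w2"])` resolves with two `rejected` result records, while `startWorkerLeaky("w1")` would hand the same failure to the process-level unhandled rejection handler.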

Fix

  • executeJob wraps repo.fail in its own try/catch → it never rejects; the caller's finally still frees the slot.
  • New private safePoll() wrapper catches pollWorker rejections at the two bare callsites (immediate start poll + per-job re-poll). pollAll is unchanged.

Behavior is otherwise identical — the worker frees the slot and re-polls in every path; this only stops the leaked rejection.
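A rough sketch of the two changes, under stated assumptions: the `JobRepo` and `Job` shapes, the `createQueue` factory, the `log` callback, and the `activeJobs` counter are all hypothetical, and the real service may place repo.complete and the slot bookkeeping differently. Only executeJob, pollWorker, safePoll, and the repo method names are taken from this PR.

```typescript
interface Job { id: string }

// Hypothetical repository shape; the real signatures may differ.
interface JobRepo {
  claimJob(workerId: string): Promise<Job | null>;
  complete(jobId: string): Promise<void>;
  fail(jobId: string, reason: string): Promise<void>;
}

function createQueue(
  repo: JobRepo,
  handler: (job: Job) => Promise<void>,
  log: (msg: string) => void,
) {
  let activeJobs = 0;

  // Never rejects: handler and repo.complete errors fall through to
  // repo.fail, and repo.fail has its own guard.
  async function executeJob(job: Job): Promise<void> {
    try {
      await handler(job);
      await repo.complete(job.id);
    } catch (err) {
      try {
        await repo.fail(job.id, String(err));
      } catch (failErr) {
        log(`job ${job.id}: could not record failure: ${String(failErr)}`);
      }
    }
  }

  // Still rejects if claimJob throws; callers go through safePoll.
  async function pollWorker(workerId: string): Promise<void> {
    const job = await repo.claimJob(workerId);
    if (job === null) return;
    activeJobs += 1;
    void executeJob(job).finally(() => {
      activeJobs -= 1;    // free the slot in every path
      safePoll(workerId); // per-job re-poll
    });
  }

  // Catches pollWorker rejections and logs them with worker context.
  function safePoll(workerId: string): void {
    pollWorker(workerId).catch((err: unknown) => {
      log(`worker ${workerId}: poll failed: ${String(err)}`);
    });
  }

  return { executeJob, safePoll, activeJobs: () => activeJobs };
}
```

Because executeJob in this sketch cannot reject, the promise returned by `.finally` only rejects if the cleanup callback itself throws, which is why marking it with `void` is safe here.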

Test plan

4 new regression tests in job-queue-service.test.ts:

  1. executeJob resolves (not rejects) when repo.complete throws, and still calls repo.fail for retry.
  2. executeJob resolves when repo.complete and repo.fail both throw.
  3. The slot is freed via finally when the job's persistence throws.
  4. startWorker's immediate poll emits no unhandledRejection when claimJob throws (process listener assertion).
  • ✅ 6/6 tests pass (2 existing + 4 new), no type errors
  • typecheck + eslint clean
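The process listener assertion in test 4 could look roughly like the helper below. This is an assumption about the approach, not the contents of job-queue-service.test.ts: `countUnhandledRejections`, `failingPoll`, `leakyStart`, and `guardedStart` are hypothetical names, and the helper relies on Node.js emitting `unhandledRejection` on `process`.

```typescript
// Counts "unhandledRejection" events raised while `trigger` runs (Node.js only).
async function countUnhandledRejections(trigger: () => void): Promise<number> {
  let count = 0;
  const listener = (): void => { count += 1; };
  process.on("unhandledRejection", listener);
  try {
    trigger();
    // Node reports unhandled rejections once the microtask queue has drained,
    // so one timer turn is enough for promises that reject without real I/O.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  } finally {
    process.off("unhandledRejection", listener);
  }
  return count;
}

// Hypothetical poll that fails the way a throwing claimJob would.
async function failingPoll(): Promise<void> {
  throw new Error("claimJob failed");
}

// Bare fire-and-forget call: the rejection is never observed.
const leakyStart = (): void => { void failingPoll(); };

// Guarded call in the style of safePoll: the rejection is caught.
const guardedStart = (): void => {
  failingPoll().catch(() => { /* log with worker context */ });
};
```

A test would assert that the guarded start yields a count of zero. If the mocked repository rejected only after real I/O or a longer timer, the single timer turn in the helper would need to be extended.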

🤖 Generated with Claude Code

…rors

JobQueueService invokes executeJob and pollWorker fire-and-forget from
three sites; only pollAll isolated rejections (via allSettled). executeJob
caught the job handler's errors but not repo.complete/repo.fail — and
pollWorker rejects if repo.claimJob throws. A transient DB failure while
finalizing or claiming a job therefore surfaced as an unhandled promise
rejection: the gateway's global handler only logs (no crash), but it
emitted a misleading "Unhandled Promise Rejection" with no job context.

- executeJob now wraps repo.fail in its own try/catch so it never rejects;
  the caller's finally still frees the slot.
- New safePoll() wrapper catches pollWorker rejections at the immediate
  start poll and the per-job re-poll (pollAll already used allSettled).

4 regression tests: executeJob resolves when complete/fail throw, the slot
is freed via finally on persistence failure, and startWorker's immediate
poll emits no unhandledRejection when claimJob throws.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ersinkoc ersinkoc merged commit 245082e into main Jun 5, 2026
2 checks passed
@ersinkoc ersinkoc deleted the fix/job-queue-unhandled-rejection branch June 5, 2026 17:15