
fix(deployments-queue): reset service status to error when deployment fails#4138

Open
mixelburg wants to merge 1 commit into Dokploy:canary from mixelburg:fix/deployment-worker-reset-status-on-error

Conversation

Contributor

mixelburg commented Apr 3, 2026

Fixes #4083

What happened

When a deployment job fails (network error, Docker daemon crash, SSH timeout, etc.), the catch block in deployments-queue.ts only logged the error. The service status set to "running" at the beginning of the job was never rolled back, leaving the service permanently stuck in a ghost "running" state with no way to recover short of manually updating the database.

Fix

Added status rollback in the catch block for all three applicationType branches:

| applicationType | rollback call |
| --- | --- |
| application | `updateApplicationStatus(applicationId, "error")` |
| compose | `updateCompose(composeId, { composeStatus: "error" })` |
| application-preview | `updatePreviewDeployment(previewDeploymentId, { previewStatus: "error" })` |

Each rollback call is wrapped in .catch(() => {}) so that a secondary DB/network failure cannot mask the original error in the logs.

Greptile Summary

This PR fixes a bug where a failed deployment job would leave the service permanently stuck in a "running" state. The catch block in deployments-queue.ts now resets the service status to "error" for all three applicationType branches (application, compose, application-preview), with each rollback call silenced via .catch(() => {}) to prevent a secondary DB failure from masking the original error in logs.

The fix is minimal, correct, and covers all branches symmetrically. One pre-existing pattern worth noting: the catch block does not re-throw the error, so BullMQ treats the job as completed (not failed) even when it errors — this means no automatic retry will fire and the job won't appear in BullMQ's failed queue. This behavior predates the PR and is out of scope here, but could be a follow-up improvement.

Confidence Score: 5/5

Safe to merge — fixes a real stuck-state bug with a minimal, correct change and introduces no new issues.

All three applicationType branches are handled symmetrically, rollback failures are correctly silenced without masking the original error, and the only notable concern (the error not being re-thrown, so BullMQ treats the job as completed) is pre-existing behavior unrelated to this PR.

No files require special attention.


fix(deployments-queue): reset service status to error when deployment fails

When the deployment worker catches an error, the service status was left
permanently in 'running' state because the catch block only logged the
error without rolling back the status set at the beginning of the job.

Add status rollback in the catch block for all three applicationType
branches (application, compose, application-preview). Each rollback call
is wrapped in .catch(() => {}) so that a secondary DB failure cannot mask
the original error in the logs.
@mixelburg mixelburg requested a review from Siumauricio as a code owner April 3, 2026 22:14
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. bug Something isn't working labels Apr 3, 2026


Development

Successfully merging this pull request may close these issues.

Deployment worker does not reset status to error on failure — services stuck permanently in running state
