fix(deployments-queue): reset service status to error when deployment fails#4138
Open
mixelburg wants to merge 1 commit intoDokploy:canaryfrom
Open
fix(deployments-queue): reset service status to error when deployment fails#4138mixelburg wants to merge 1 commit intoDokploy:canaryfrom
mixelburg wants to merge 1 commit intoDokploy:canaryfrom
Conversation
… fails
When the deployment worker catches an error, the service status was left
permanently in 'running' state because the catch block only logged the
error without rolling back the status set at the beginning of the job.
Add status rollback in the catch block for all three applicationType
branches (application, compose, application-preview). Each rollback call
is wrapped in .catch(() => {}) so that a secondary DB failure cannot mask
the original error in the logs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #4083
What happened
When a deployment job fails (network error, Docker daemon crash, SSH timeout, etc.), the
catchblock indeployments-queue.tsonly logged the error. The service status set to"running"at the beginning of the job was never rolled back, leaving the service permanently stuck in a ghost"running"state with no way to recover short of manually updating the database.Fix
Added status rollback in the catch block for all three
applicationTypebranches:applicationupdateApplicationStatus(applicationId, "error")composeupdateCompose(composeId, { composeStatus: "error" })application-previewupdatePreviewDeployment(previewDeploymentId, { previewStatus: "error" })Each rollback call is wrapped in
.catch(() => {})so that a secondary DB/network failure cannot mask the original error in the logs.Greptile Summary
This PR fixes a bug where a failed deployment job would leave the service permanently stuck in a
"running"state. The catch block indeployments-queue.tsnow resets the service status to"error"for all threeapplicationTypebranches (application,compose,application-preview), with each rollback call silenced via.catch(() => {})to prevent a secondary DB failure from masking the original error in logs.The fix is minimal, correct, and covers all branches symmetrically. One pre-existing pattern worth noting: the catch block does not re-throw the error, so BullMQ treats the job as completed (not failed) even when it errors — this means no automatic retry will fire and the job won't appear in BullMQ's failed queue. This behavior predates the PR and is out of scope here, but could be a follow-up improvement.
Confidence Score: 5/5
Safe to merge — fixes a real stuck-state bug with a minimal, correct change and introduces no new issues.
All three applicationType branches are handled symmetrically, rollback failures are correctly silenced without masking the original error, and the only notable concern (error not re-thrown so BullMQ treats the job as completed) is a pre-existing behaviour unrelated to this PR.
No files require special attention.
Reviews (1): Last reviewed commit: "fix(deployments-queue): reset service st..." | Re-trigger Greptile