Fix hidden queued deployments in deployment history #4146
M-Hassan-Raza wants to merge 6 commits into Dokploy:canary from
Conversation
add queued status
wire queued deployment flow
show queued deployment status
update drizzle metadata
```typescript
	.set({
		status: "cancelled",
		finishedAt: cancelledAt,
	})
	.where(eq(deployments.status, "running"))
	.returning();
```
queued deployments not cancelled on server restart
initCancelDeployments only cancels deployments with status = 'running'. After this PR, a deployment record can also be in 'queued' status. There is a time window between when the queued DB record is written (in attachQueuedDeployment) and when the BullMQ job is actually enqueued (in myQueue.add). If the server crashes in that window, the record will be stuck in 'queued' forever because:
- `initCancelDeployments` runs on startup but ignores `queued` rows
- No BullMQ job exists to transition it to `running`
The same problem occurs if an admin uses the "Clean Redis" action (cleanRedis mutation in settings.ts), which flushes all Redis data and destroys any waiting BullMQ jobs, leaving their corresponding queued DB records permanently stuck.
The fix is to also cancel queued deployments on startup:
```diff
 const result = await db
 	.update(deployments)
 	.set({
 		status: "cancelled",
 		finishedAt: cancelledAt,
 	})
-	.where(eq(deployments.status, "running"))
+	.where(inArray(deployments.status, ["running", "queued"]))
 	.returning();
```
(Requires importing inArray from drizzle-orm.)
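The same guard is needed for the Clean Redis path mentioned above: cancel any `running`/`queued` rows before flushing, so no DB record outlives its BullMQ job. Below is a minimal sketch of that ordering; the `table`, `cancelStuckDeployments`, and `flushAll` names are illustrative stubs standing in for the real Drizzle update and Redis client, not the actual Dokploy code.

```typescript
// Sketch only: in-memory stand-ins for the Drizzle table and Redis client.
type Deployment = { deploymentId: string; status: string; finishedAt?: string };

export const table: Deployment[] = [
	{ deploymentId: "d1", status: "queued" },
	{ deploymentId: "d2", status: "done" },
];

// Stub for the startup-style cancellation, extended to cover `queued`.
const cancelStuckDeployments = async (): Promise<Deployment[]> => {
	const cancelledAt = new Date().toISOString();
	const hit = table.filter((d) => ["running", "queued"].includes(d.status));
	for (const d of hit) {
		d.status = "cancelled";
		d.finishedAt = cancelledAt;
	}
	return hit;
};

// Stub for redis.flushall(); only records that the flush happened.
export let flushed = false;
const flushAll = async () => {
	flushed = true;
};

// Clean Redis should cancel stranded DB rows *before* destroying the jobs,
// so a crash between the two steps never leaves orphaned `queued` rows.
export const cleanRedis = async () => {
	await cancelStuckDeployments();
	await flushAll();
	return table;
};
```

Finished rows (`done`, `error`, `cancelled`) are untouched; only rows that still depend on a live BullMQ job are cancelled.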
```typescript
export const enqueueDeploymentJob = async (jobData: DeploymentJob) => {
	const queuedJobData = await attachQueuedDeployment(jobData);

	try {
		if (IS_CLOUD && queuedJobData.serverId) {
			await deploy(queuedJobData);
			return queuedJobData;
		}

		await myQueue.add(
			"deployments",
			{ ...queuedJobData },
			{
				removeOnComplete: true,
				removeOnFail: true,
			},
		);

		return queuedJobData;
	} catch (error) {
		if (queuedJobData.deploymentId) {
			await failQueuedDeployment(queuedJobData.deploymentId, error);
		}

		throw error;
	}
};
```
attachQueuedDeployment errors bypass failQueuedDeployment
If attachQueuedDeployment itself throws (e.g., the database is momentarily unreachable or writeDeploymentLogPreamble fails mid-flight), the exception propagates before the try block is entered. The catch block that calls failQueuedDeployment is therefore never reached.
In createDeployment's own catch clause, a new deployment row with status: 'error' is inserted. However, because attachQueuedDeployment threw, queuedJobData was never assigned, so the outer caller has no deploymentId to work with either.
The current structure:
```typescript
const queuedJobData = await attachQueuedDeployment(jobData); // ← if this throws, catch below is skipped
try {
	// ...
} catch (error) {
	if (queuedJobData.deploymentId) {
		await failQueuedDeployment(queuedJobData.deploymentId, error); // never reached
	}
}
```

Consider wrapping the entire body (including `attachQueuedDeployment`) in the try block, or adding a separate try/catch around `attachQueuedDeployment` to ensure all failure paths are handled consistently.
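One possible restructuring is sketched below: the assignment moves inside the try, and the optional-chained guard keeps the catch safe even when `attachQueuedDeployment` throws before `queuedJobData` is assigned. The helper implementations here are stubs for illustration, not the PR's actual code.

```typescript
// Stubs standing in for the real deployment-service helpers.
type DeploymentJob = { applicationId?: string; deploymentId?: string };

export const failedIds: string[] = []; // records failQueuedDeployment calls

// Stub: the DB write succeeds and assigns a deployment id.
const attachQueuedDeployment = async (
	job: DeploymentJob,
): Promise<DeploymentJob> => ({ ...job, deploymentId: "dep-1" });

// Stub: the queue is unavailable (stands in for myQueue.add failing).
const addToQueue = async (_job: DeploymentJob): Promise<void> => {
	throw new Error("redis unreachable");
};

const failQueuedDeployment = async (id: string, _error: unknown) => {
	failedIds.push(id);
};

export const enqueueDeploymentJob = async (jobData: DeploymentJob) => {
	// Declared outside the assignment so the catch can still see it when
	// attachQueuedDeployment itself throws (it stays undefined in that case).
	let queuedJobData: DeploymentJob | undefined;
	try {
		queuedJobData = await attachQueuedDeployment(jobData);
		await addToQueue(queuedJobData);
		return queuedJobData;
	} catch (error) {
		if (queuedJobData?.deploymentId) {
			await failQueuedDeployment(queuedJobData.deploymentId, error);
		}
		throw error;
	}
};
```

With this shape, a failure in `attachQueuedDeployment` simply rethrows (there is no deployment row to mark yet), while any later failure marks the already-created row as failed before propagating.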
Just noticed there was a conflict in the branch; I have rebased now. Hope this PR gets a look. It may look like a common slop PR given its size (the size is mainly just the db snapshot), but I did test the code and refactored wherever I saw some slopification happening.
Closes #4072
Summary
When multiple deployments are triggered at the same time, only the active deployment gets a deployment record immediately. Waiting jobs are queued in BullMQ, but they do not appear in the deployment history until a worker starts processing them.
This change creates the deployment record before the job is added to the queue and keeps that same record through the rest of the deployment lifecycle.
What changed
- New `queued` deployment status
- `deploymentId` is passed through the queue and remote deploy API so workers update the existing row instead of creating a new one
- Cancelling a queued job marks the deployment record as `cancelled`

Manual testing
- Waiting deployments appear in the history with status `queued`
- A deployment transitions from `queued` to `running` to `done`
- Cancel Queues updates a waiting deployment from `queued` to `cancelled`

Screenshots
Service deployment view
Caption: Queued deployment visible while another deployment is running
Centralized deployments view
Caption: Queued status shown in the centralized deployments view
Queue cancellation
Caption: Queued deployment updated to cancelled after queue cleanup

Notes
Greptile Summary
This PR introduces a `queued` deployment status so that BullMQ-waiting deployments are visible in the history immediately, rather than only appearing once a worker picks them up. The approach is well-thought-out: a deployment record is created as `queued` before the job is enqueued, the same record is transitioned to `running` when the worker starts processing it, and cancellation/failure paths update the record accordingly. The cloud (`IS_CLOUD`) remote-server path correctly propagates `deploymentId` through the API schema and handler.

Key issues found:

- `queued` records stuck: `initCancelDeployments` only cancels `status = 'running'` rows. If the server crashes between writing the `queued` DB record and successfully calling `myQueue.add`, or if an admin flushes Redis via the "Clean Redis" action (`cleanRedis` mutation in `settings.ts`), every `queued` row in the database will remain permanently stuck.
- `cleanRedis` doesn't cancel outstanding `queued` deployment records: Flushing Redis destroys in-flight BullMQ jobs but leaves their corresponding `queued` DB rows orphaned.
- `startQueuedDeployment` throws `TRPCError` from worker context: This function is invoked inside BullMQ workers, not tRPC procedures. Throwing `TRPCError` causes the job to be marked as failed with misleading error output.
- `attachQueuedDeployment` error bypasses `failQueuedDeployment`: In `enqueueDeploymentJob`, the outer `catch` that calls `failQueuedDeployment` is only reachable if `attachQueuedDeployment` itself succeeds.

Confidence Score: 2/5
Not safe to merge as-is — server restarts and the existing 'Clean Redis' admin action can permanently strand deployment records in the queued state with no recovery path.
The core feature logic is sound and well-structured, but there are multiple independent scenarios where queued deployment records can become permanently stuck. The startup cancellation routine and the Redis-flush mutation both need to be updated to handle the new queued status before this is safe to ship.
packages/server/src/utils/startup/cancel-deployments.ts and apps/dokploy/server/api/routers/settings.ts (the cleanRedis mutation) require the most attention.
Comments Outside Diff (2)
packages/server/src/services/deployment.ts, line 1070-1075 (link)

`TRPCError` thrown from BullMQ worker context

`startQueuedDeployment` is called from BullMQ worker job handlers (e.g., `deployApplication`, `rebuildCompose`), which are not tRPC procedures. Throwing `TRPCError` here causes BullMQ to mark the job as failed (rather than handling the case gracefully), and emits confusing error stack traces in the worker logs that expose tRPC internals.

More concretely: if the deployment was cancelled between job enqueue and job processing (a valid race condition despite job removal being attempted), the worker crashes the job unnecessarily. The deployment DB row is already `'cancelled'`, so the correct behaviour is a silent no-op.

Consider returning the existing deployment record (or `null`) and letting the caller decide whether to abort. The callers (`deployApplication`, `rebuildApplication`, etc.) can then bail out early when `null` is returned, rather than letting a `TRPCError` unwind the BullMQ job.

apps/dokploy/server/api/routers/settings.ts, line 104-126 (link)

`cleanRedis` leaves `queued` deployment records permanently stuck

`cleanRedis` flushes all Redis data via `FLUSHALL`. After this PR, every in-flight BullMQ job has a corresponding DB row with `status = 'queued'`. Flushing Redis destroys those jobs without updating the deployment records, so every `queued` row in the database will never transition to `running`, `done`, or `cancelled`.

The fix is to cancel all `queued` deployments in the database before (or right after) flushing Redis, analogous to what `cancelQueuedJobs` does for jobs with known `deploymentId`s. Alternatively, `cleanAllDeploymentQueue` could be called first to drain and cancel the queue entries before flushing Redis.

Reviews (1): Last reviewed commit: "chore(database): update drizzle metadata"
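The null-return contract suggested for `startQueuedDeployment` could look like the following sketch. The in-memory `rows` map and the `deployApplicationSketch` caller are hypothetical stand-ins for the real Drizzle query and BullMQ worker handler.

```typescript
// Sketch of the suggested contract: return null instead of throwing when the
// row was cancelled between enqueue and processing.
type Deployment = { deploymentId: string; status: string };

// Stand-in for the deployments table.
export const rows = new Map<string, Deployment>([
	["d1", { deploymentId: "d1", status: "cancelled" }],
	["d2", { deploymentId: "d2", status: "queued" }],
]);

export const startQueuedDeployment = async (
	deploymentId: string,
): Promise<Deployment | null> => {
	const deployment = rows.get(deploymentId);
	// A cancelled (or missing) row is a valid race, not an error: signal the
	// worker to no-op instead of failing the BullMQ job with a TRPCError.
	if (!deployment || deployment.status !== "queued") {
		return null;
	}
	deployment.status = "running";
	return deployment;
};

// Worker-side usage: bail out early instead of unwinding the job.
export const deployApplicationSketch = async (deploymentId: string) => {
	const deployment = await startQueuedDeployment(deploymentId);
	if (!deployment) return "skipped";
	// ... actual deploy work would run here ...
	return "started";
};
```

This keeps the race handling in one place: callers only need a single null check, and worker logs stay free of tRPC stack traces.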