Commit 5b9e8e3

Fix JobRunningPipeline not reclaiming stale jobs for terminating runs (#3741)
1 parent 869754b commit 5b9e8e3

1 file changed: +2 -2 lines changed


src/dstack/_internal/server/background/pipeline_tasks/jobs_running.py

Lines changed: 2 additions & 2 deletions
@@ -207,7 +207,6 @@ async def fetch(self, limit: int) -> list[JobRunningPipelineItem]:
 JobModel.status.in_(
     [JobStatus.PROVISIONING, JobStatus.PULLING, JobStatus.RUNNING]
 ),
-RunModel.status.not_in([RunStatus.TERMINATING]),
 or_(
     # Process provisioning and pulling jobs quicker for low-latency provisioning.
     # Active jobs processing can be less frequent to minimize contention with `RunPipeline`.
@@ -223,10 +222,11 @@ async def fetch(self, limit: int) -> list[JobRunningPipelineItem]:
 ),
 or_(
     and_(
-        # Do not try to lock jobs if the run is waiting for the lock,
+        # Do not try to lock jobs if the run is waiting for the lock or terminating,
         # but allow retrying jobs whose own lock is stale because
         # the run pipeline cannot reclaim stale job locks.
         RunModel.lock_owner.is_(None),
+        RunModel.status.not_in([RunStatus.TERMINATING]),
         JobModel.lock_expires_at.is_(None),
     ),
     JobModel.lock_expires_at < now,
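
The effect of moving the `TERMINATING` check into the `and_(...)` branch can be illustrated with a plain-Python model of the lock-related part of the filter. This is a minimal sketch, not the actual query: `Row`, `eligible_before`, and `eligible_after` are hypothetical names introduced here, and the job status and processing-interval conditions from `fetch` are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Row:
    """Hypothetical stand-in for one joined (job, run) row seen by the query."""

    run_terminating: bool  # models RunModel.status == RunStatus.TERMINATING
    run_lock_owner: Optional[str]  # models RunModel.lock_owner
    job_lock_expires_at: Optional[datetime]  # models JobModel.lock_expires_at


def _is_stale(row: Row, now: datetime) -> bool:
    # Models `JobModel.lock_expires_at < now`; a NULL expiry never matches.
    return row.job_lock_expires_at is not None and row.job_lock_expires_at < now


def eligible_before(row: Row, now: datetime) -> bool:
    # Before the change: terminating runs are excluded at the top level,
    # so even a job with a stale lock is never picked up again.
    if row.run_terminating:
        return False
    unlocked = row.run_lock_owner is None and row.job_lock_expires_at is None
    return unlocked or _is_stale(row, now)


def eligible_after(row: Row, now: datetime) -> bool:
    # After the change: the terminating check only guards the "unlocked" branch,
    # so a stale job lock can still be reclaimed while the run is terminating.
    unlocked = (
        row.run_lock_owner is None
        and not row.run_terminating
        and row.job_lock_expires_at is None
    )
    return unlocked or _is_stale(row, now)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
stale_job_in_terminating_run = Row(True, None, now - timedelta(minutes=1))
unlocked_job_in_terminating_run = Row(True, None, None)
unlocked_job_in_active_run = Row(False, None, None)
```

Under this model, only the stale-lock case changes: `stale_job_in_terminating_run` is skipped by `eligible_before` and selected by `eligible_after`, while unlocked jobs of terminating runs stay excluded in both versions.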
