
Commit 6e5602a

Add red 'stalled' status to admin dashboard banner
The dashboard banner now distinguishes three states instead of two:

- green Queue Running (heartbeat < dashboardIdleAfter, 60s default)
- yellow Queue Idle (heartbeat >= dashboardIdleAfter, or no heartbeat and no backlog)
- red Queue Stalled (heartbeat >= dashboardStalledAfter, 120s default, with a pending backlog AND no in-flight job; OR no worker has reported at all and there is a pending backlog or stuck in-flight job)

Red surfaces the case the old "Queue Idle" yellow blurred away: jobs are piling up and nothing is processing them. The previous banner also showed a muted info notice ("No queue status available") when every worker row had aged past Queue.defaultRequeueTimeout — the worst-case full cron outage. That path now produces the red banner too.

The conditions are tuned to avoid false positives:

- workers == 0 alone does not trigger red. In cron-driven mode that is the normal idle state for a quiet system.
- runningJobs > 0 keeps a busy worker out of red. Heartbeats fire at the top of each loop, not during runJob(), so a long-running task (>2 min) with more pending behind it would otherwise look stalled by heartbeat age alone.

When red, the banner expands with a diagnostic grid (last activity absolute + relative, workers, pending) and a context-specific cause hint that names the most likely fault and the relevant CLI command.

Thresholds are exposed as two new config keys:

- Queue.dashboardIdleAfter (default 60, seconds)
- Queue.dashboardStalledAfter (default 120, seconds)

Defaults are deliberate UI policy — human-perceptible 1-min / 2-min boundaries — not derived from queue mechanics. None of the existing config knobs (workerLifetime, defaultRequeueTimeout, sleeptime) actually mean "dashboard heartbeat freshness", so deriving from them coupled the banner to unrelated semantics. Installations with unusual cron cadence (e.g. slow exitwhennothingtodo cron) should raise dashboardStalledAfter past the cron interval to avoid false-red between ticks.
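
The three-state rule in this message can be sketched as a pure function. This is a minimal illustration, not code from the commit: `bannerState()` and its parameter names are hypothetical, and `null` is an assumed stand-in for the "no worker has reported" case.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the commit): the banner decision as a pure function.
 *
 * @param int|null $secondsSinceActivity Heartbeat age in seconds, or null when no worker row is available.
 */
function bannerState(
    ?int $secondsSinceActivity,
    int $pendingJobs,
    int $runningJobs,
    int $idleAfter = 60,
    int $stalledAfter = 120
): string {
    if ($secondsSinceActivity === null) {
        // No worker reporting: red only if work is waiting or stuck in flight.
        return ($pendingJobs > 0 || $runningJobs > 0) ? 'stalled' : 'idle';
    }
    if ($secondsSinceActivity >= $stalledAfter && $pendingJobs > 0 && $runningJobs === 0) {
        // Stale heartbeat, a backlog, and nothing in flight.
        return 'stalled';
    }

    // A busy worker (runningJobs > 0) or an empty backlog never goes past yellow.
    return $secondsSinceActivity >= $idleAfter ? 'idle' : 'running';
}
```

For instance, `bannerState(300, 5, 1)` stays `'idle'` because a job is in flight, while `bannerState(300, 5, 0)` is `'stalled'`. Keeping the rule free of view helpers like this would also make it straightforward to unit test.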
1 parent 06abf17 commit 6e5602a

3 files changed

Lines changed: 180 additions & 37 deletions


config/app.example.php

Lines changed: 17 additions & 0 deletions
@@ -127,6 +127,23 @@
 // auto-refresh dashboard in seconds (0 = disabled)
 'dashboardAutoRefresh' => 0,

+// Status-banner thresholds on the admin dashboard, in seconds. The
+// banner has three colors: green (running), yellow (idle), red
+// (stalled — action required).
+//   running: fresh heartbeat (< dashboardIdleAfter)
+//   idle:    stale heartbeat, no backlog (>= dashboardIdleAfter)
+//   stalled: >= dashboardStalledAfter with a pending backlog and no
+//            in-flight job, OR no worker reporting with backlog
+// Defaults (60 / 120) are deliberate UI policy — human-perceptible
+// 1-min / 2-min boundaries — not derived from queue mechanics, since
+// no existing config knob (workerLifetime, defaultRequeueTimeout,
+// sleeptime) actually means "heartbeat freshness." Override for
+// unusual cadences (e.g. slow cron in `exitwhennothingtodo` mode —
+// raise dashboardStalledAfter past the cron interval to avoid
+// false-red between ticks).
+'dashboardIdleAfter' => 60,
+'dashboardStalledAfter' => 120,
+
 // Standalone mode for admin controllers:
 // - false (default): Extends App\Controller\AppController, inherits app auth/components
 // - true: Isolated admin, skips app's AppController setup
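
As an illustration of the override the comment recommends, here is a config fragment for an assumed installation whose cron starts a worker only every 5 minutes. The 300-second interval, the 60-second margin, and the idea that this fragment lives in the app's own `Queue` config file are assumptions for the example, not values from the commit.

```php
<?php
// Assumed app-level Queue config; adjust to wherever your app defines the `Queue` key.
return [
    'Queue' => [
        // Yellow after one minute without a heartbeat (the default).
        'dashboardIdleAfter' => 60,
        // Hypothetical 5-minute cron (300 s) plus a 60 s margin, so the
        // banner does not turn red between two regular cron ticks.
        'dashboardStalledAfter' => 360,
    ],
];
```

With these values a backlog that sits untouched for more than six minutes would still surface as red, while the normal gap between ticks shows as yellow at worst.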

templates/Admin/Queue/index.php

Lines changed: 147 additions & 37 deletions
@@ -25,58 +25,168 @@

 use Cake\Core\Configure;

+// Banner thresholds are a UI policy, not a system mechanic: how long the
+// dashboard waits before nagging the admin. Defaults are 60s yellow / 120s
+// red — human-perceptible minute boundaries — and don't derive from queue
+// config knobs because none of them actually mean "heartbeat freshness":
+//   - workerLifetime is an exit policy (a 1h-lifetime worker still
+//     heartbeats every ~sleeptime when idle).
+//   - defaultRequeueTimeout is the job-reassignment safeguard, tuned for
+//     max job duration (often 5-10 min).
+//   - sleeptime is closest to the real heartbeat cadence for an idle
+//     worker, but busy workers don't sleep — and we already cover the
+//     busy-worker case via the `runningJobs > 0` escape hatch below.
+// Override these for installations with unusual cron cadence (e.g. slow
+// `exitwhennothingtodo` cron — raise dashboardStalledAfter past the cron
+// interval to avoid false-red between ticks).
+$idleAfterSeconds = (int)Configure::read('Queue.dashboardIdleAfter', 60);
+$stalledAfterSeconds = (int)Configure::read('Queue.dashboardStalledAfter', 120);
 ?>

 <!-- Status Banner -->
-<?php if ($status): ?>
-    <?php
+<?php
+/**
+ * Three-state status (running / idle / stalled). State is computed for both the
+ * "have a recent heartbeat" path and the "no active worker rows at all" path so
+ * that a total cron outage — the worst case — surfaces as red, not as a muted
+ * info notice.
+ *
+ *   running   <idleAfterSeconds                                   green
+ *   idle      idleAfterSeconds-stalledAfterSeconds,
+ *             OR no heartbeat & no backlog                        yellow
+ *   stalled   ≥stalledAfterSeconds with pending backlog and no
+ *             in-flight job, OR no heartbeat at all with pending  red
+ *
+ * Thresholds default to 60s yellow / 120s red — human-perceptible minute
+ * boundaries — and can be tuned via Queue.dashboardIdleAfter and
+ * Queue.dashboardStalledAfter for installs with unusual cron cadence.
+ *
+ * Notes:
+ * - In cron-driven mode workers are short-lived; `workers == 0` is the normal
+ *   idle state for a quiet system, so red requires a real pending backlog too.
+ * - `runningJobs > 0` (derived in the controller from
+ *   `fetched IS NOT NULL AND completed IS NULL`) keeps a busy worker out of
+ *   red: heartbeats fire at the top of each loop, not during long jobs, so
+ *   a >2 min task with more pending behind it would look stalled by heartbeat
+ *   age alone.
+ * - When `$status` is empty, `QueueProcessesTable::status()` filtered every
+ *   worker row past `Queue.defaultRequeueTimeout`. In that case a pending
+ *   backlog or stuck in-flight job is unambiguously a problem.
+ */
+$state = 'idle';
+$time = null;
+$relTime = null;
+
+if ($status) {
     /** @var \Cake\I18n\DateTime $time */
     $time = $status['time'];
-    $running = $time->addMinutes(1)->isFuture();
+    $now = new \Cake\I18n\DateTime();
+    $secondsSinceActivity = max(0, $now->getTimestamp() - $time->getTimestamp());
+
+    $state = 'running';
+    if ($secondsSinceActivity >= $idleAfterSeconds) {
+        $state = 'idle';
+    }
+    if ($secondsSinceActivity >= $stalledAfterSeconds && $pendingJobs > 0 && $runningJobs === 0) {
+        $state = 'stalled';
+    }
+
     $relTime = method_exists($this->Time, 'relLengthOfTime')
         ? $this->Time->relLengthOfTime($status['time'])
         : $this->Time->timeAgoInWords($status['time']);
-    ?>
-    <div class="status-banner <?= $running ? 'status-running' : 'status-idle' ?>">
-        <div class="d-flex align-items-center justify-content-between">
-            <div class="d-flex align-items-center">
-                <span class="status-icon me-3">
-                    <?php if ($running): ?>
-                        <i class="fas fa-check-circle text-success"></i>
-                    <?php else: ?>
-                        <i class="fas fa-pause-circle text-warning"></i>
-                    <?php endif; ?>
-                </span>
-                <div>
-                    <strong><?= $running ? __d('queue', 'Queue Running') : __d('queue', 'Queue Idle') ?></strong>
-                    <div class="text-muted small">
+} elseif ($pendingJobs > 0 || $runningJobs > 0) {
+    // No worker has reported within `Queue.defaultRequeueTimeout`, yet jobs are
+    // either waiting (pending) or marked in-flight (fetched but not completed).
+    // Pending + no heartbeat = cron likely dead. Running + no heartbeat = worker
+    // died mid-job and left a stale fetched row, OR a job legitimately ran past
+    // the requeue timeout (which is itself a misconfiguration worth surfacing).
+    $state = 'stalled';
+}
+
+$stateMeta = [
+    'running' => ['icon' => 'check-circle', 'iconColor' => 'text-success', 'label' => __d('queue', 'Queue Running')],
+    'idle' => ['icon' => 'pause-circle', 'iconColor' => 'text-warning', 'label' => __d('queue', 'Queue Idle')],
+    'stalled' => ['icon' => 'exclamation-circle', 'iconColor' => 'text-danger', 'label' => __d('queue', 'Queue Stalled')],
+][$state];
+?>
+<div class="status-banner status-<?= $state ?>">
+    <div class="d-flex align-items-center justify-content-between">
+        <div class="d-flex align-items-center">
+            <span class="status-icon me-3">
+                <i class="fas fa-<?= $stateMeta['icon'] ?> <?= $stateMeta['iconColor'] ?>"></i>
+            </span>
+            <div>
+                <strong><?= $stateMeta['label'] ?></strong>
+                <?php if ($state === 'stalled'): ?>
+                    <span class="badge bg-danger ms-2"><?= __d('queue', 'action required') ?></span>
+                <?php endif; ?>
+                <div class="text-muted small">
+                    <?php if ($status): ?>
                         <?= __d('queue', 'Last activity {0}', $relTime) ?>
                         &bull;
-                        <?= $this->Html->link(
-                            __d('queue', '{0} worker(s)', $workers),
-                            ['action' => 'processes'],
-                            ['class' => 'text-decoration-none']
-                        ) ?>
+                    <?php else: ?>
+                        <?= __d('queue', 'No worker reporting') ?>
                         &bull;
-                        <?= __d('queue', '{0} server(s)', count($servers)) ?>
-                    </div>
+                    <?php endif; ?>
+                    <?= $this->Html->link(
+                        __d('queue', '{0} worker(s)', $workers),
+                        ['action' => 'processes'],
+                        ['class' => 'text-decoration-none']
+                    ) ?>
+                    &bull;
+                    <?= __d('queue', '{0} server(s)', count($servers)) ?>
                 </div>
             </div>
-            <div>
-                <?= $this->Html->link(
-                    '<i class="fas fa-cogs me-1"></i>' . __d('queue', 'Manage Workers'),
-                    ['action' => 'processes'],
-                    ['class' => 'btn btn-sm btn-outline-dark', 'escapeTitle' => false]
-                ) ?>
-            </div>
+        </div>
+        <div>
+            <?= $this->Html->link(
+                '<i class="fas fa-cogs me-1"></i>' . __d('queue', 'Manage Workers'),
+                ['action' => 'processes'],
+                ['class' => 'btn btn-sm btn-outline-dark', 'escapeTitle' => false]
+            ) ?>
         </div>
     </div>
-<?php else: ?>
-    <div class="alert alert-secondary">
-        <i class="fas fa-info-circle me-2"></i>
-        <?= __d('queue', 'No queue status available. Workers may not have started yet.') ?>
-    </div>
-<?php endif; ?>
+    <?php if ($state === 'stalled'): ?>
+    <?php
+    if (!$status && $runningJobs > 0) {
+        $causeHint = __d('queue', 'A job is marked in-flight but no worker is reporting. The worker likely crashed mid-job — reset stale fetched jobs and check cron.');
+    } elseif (!$status) {
+        $causeHint = __d('queue', 'No worker has reported in. Cron is likely not firing — check that {0} runs on at least one server.', '<code>bin/cake queue run</code>');
+    } elseif ($workers === 0) {
+        $causeHint = __d('queue', 'Jobs are waiting but no workers are running. Check that {0} cron is firing on at least one server.', '<code>bin/cake queue run</code>');
+    } else {
+        $causeHint = __d('queue', "Jobs are waiting but aren't being picked up. Workers may have crashed — restart the queue or clean up stale processes.");
+    }
+    ?>
+    <div class="stalled-details mt-3 pt-3 border-top border-danger-subtle">
+        <dl class="row mb-2 small">
+            <dt class="col-sm-3 text-muted fw-normal"><?= __d('queue', 'Last activity') ?></dt>
+            <dd class="col-sm-9 mb-1">
+                <?php if ($time): ?>
+                    <code><?= h($time->i18nFormat('yyyy-MM-dd HH:mm:ss')) ?></code>
+                    <span class="text-muted">· <?= $relTime ?></span>
+                <?php else: ?>
+                    <span class="text-danger fw-medium"><?= __d('queue', 'No worker has reported recently') ?></span>
+                <?php endif; ?>
+            </dd>
+            <dt class="col-sm-3 text-muted fw-normal"><?= __d('queue', 'Workers') ?></dt>
+            <dd class="col-sm-9 mb-1">
+                <strong class="<?= $workers === 0 ? 'text-danger' : '' ?>"><?= $workers ?></strong>
+                <?= __d('queue', 'on {0} server(s)', count($servers)) ?>
+            </dd>
+            <dt class="col-sm-3 text-muted fw-normal"><?= __d('queue', 'Pending') ?></dt>
+            <dd class="col-sm-9 mb-0">
+                <strong class="<?= $pendingJobs > 0 ? 'text-danger' : '' ?>"><?= $pendingJobs ?></strong>
+                <?= __d('queue', 'jobs waiting') ?>
+            </dd>
+        </dl>
+        <div class="small">
+            <i class="fas fa-info-circle me-1 text-danger"></i>
+            <?= $causeHint ?>
+        </div>
+    </div>
+    <?php endif; ?>
+</div>

 <!-- Stats Cards -->
 <div class="row g-3 mb-4">
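
The cause-hint branching in the stalled block of this template can be read as a small decision table. The sketch below restates it as a standalone function for clarity. `stalledCause()` and the short string keys it returns are hypothetical names for illustration; the template itself assigns translated messages to `$causeHint` inline.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the commit): picks the likely cause
 * for a stalled banner, in the same order as the template's if/elseif chain.
 */
function stalledCause(bool $hasStatus, int $workers, int $runningJobs): string
{
    if (!$hasStatus && $runningJobs > 0) {
        // In-flight job but no heartbeat: a worker probably died mid-job.
        return 'worker-crashed-mid-job';
    }
    if (!$hasStatus) {
        // Backlog and no heartbeat at all: cron is probably not firing.
        return 'cron-not-firing';
    }
    if ($workers === 0) {
        // A heartbeat row exists but no worker is currently running.
        return 'no-workers-running';
    }

    // Workers are listed, yet nothing is being picked up.
    return 'workers-not-picking-up';
}
```

The order matters: the two "no status" cases are checked first because they correspond to the full-outage path that previously showed only a muted info notice.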

templates/layout/queue.php

Lines changed: 16 additions & 0 deletions
@@ -248,6 +248,22 @@
     border: 1px solid #ffc107;
 }

+.status-banner.status-stalled {
+    background: linear-gradient(135deg, #f8d7da 0%, #f5c6cb 100%);
+    border: 1px solid #dc3545;
+    border-left-width: 4px;
+}
+
+.status-banner.status-stalled .stalled-details dt {
+    padding-top: 0.125rem;
+}
+
+.status-banner.status-stalled .stalled-details code {
+    background: rgba(220, 53, 69, 0.08);
+    padding: 0.125rem 0.375rem;
+    border-radius: 0.25rem;
+}
+
 .status-banner .status-icon {
     font-size: 1.5rem;
 }
