Skip to content

Commit 5854c79

Browse files
authored
Fix captureOutput stale cache, ExecuteTask path tokenization; add ops doc (#496)
Two correctness bugs from the source-grounded review plus the ops-first documentation strategic item. 1. Processor::captureOutput() cached the auto-detected schema result for the lifetime of the worker. A long-running worker started before `bin/cake migrations migrate` added the `output` column to the queued_jobs table would then see the cached `false` for the rest of its runtime and silently drop captured stdout until restart. The auto-detect path now re-checks the schema per call; explicit config (Configure::write('Queue.captureOutput', ...)) is still memoised because that branch never changes mid-process. The per-call cost is a single schema lookup against an already-cached description. 2. ExecuteTask::add() tokenised the human-friendly CLI input with `explode(' ', $data, 2)`, which split a quoted command path with embedded spaces in half — e.g. `"/usr/local/bin/My Tool" arg1` ended up with the path mangled across `command` and `params[0]`. Switched to `str_getcsv` with space as the delimiter and double-quote as the enclosure, so quoted command paths survive intact. The plain `cmd arg1 arg2` shape continues to work the same way it did before. Two regression tests: - testAddPreservesQuotedCommandPathWithSpaces — quoted path round-trip - testAddTokenizesPlainSpaceSeparatedArgs — bw compat for the simple case 3. docs/guide/operations.md is a new ops-first page covering: sizing workers, supervisor / systemd unit shapes, monitoring signals worth alerting on (stale workers, backlog growth, per-task failure rate), failure-handling workflow (admin dashboard, queue retry, the workaround dead-letter pattern until a formal DLQ ships), schema migrations on live workers (the captureOutput re-check above lets you add columns without restarting workers), multi-server pitfalls, and common task-author traps. Linked from /guide/ and the sidebar. phpunit (186/186 + 7 skipped), phpstan, phpcs all clean.
1 parent 06c5d48 commit 5854c79

7 files changed

Lines changed: 222 additions & 16 deletions

File tree

docs/.vitepress/config.ts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ function guideSidebar() {
1515
text: 'Operating',
1616
items: [
1717
{ text: 'Configuration', link: '/guide/configuration' },
18+
{ text: 'Operations', link: '/guide/operations' },
1819
{ text: 'Cron Setup', link: '/guide/cron' },
1920
{ text: 'Multi-Connection', link: '/guide/multi-connection' },
2021
{ text: 'Real-Time Progress', link: '/guide/realtime-progress' },

docs/guide/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ The Queue plugin runs background jobs out of your CakePHP database — no Redis,
1111
## Operating the queue
1212

1313
- [Configuration](/guide/configuration) — runtime tuning (worker lifetime, timeouts, retries, multi-server).
14+
- [Operations](/guide/operations) — production checklist: sizing workers, supervisor/systemd, monitoring, failure handling, schema migrations on live workers.
1415
- [Cron Setup](/guide/cron) — start workers on a schedule.
1516
- [Multi-Connection](/guide/multi-connection) — run queues against multiple databases.
1617
- [Real-Time Progress](/guide/realtime-progress) — Mercure / SSE for live progress UIs.

docs/guide/operations.md

Lines changed: 114 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,114 @@
1+
# Operations
2+
3+
A field guide for running the queue in production. This is not a tutorial — it's a checklist of things that bite, with the configuration knob or admin-UI lever next to each one.
4+
5+
## Sizing workers
6+
7+
The right number of concurrent workers is bounded by:
8+
9+
- the slowest task you run (long-running tasks block the worker slot they occupy until completion);
10+
- the database connection pool (each worker holds one connection);
11+
- the I/O / CPU profile of your tasks (mailer-heavy: I/O-bound, can over-subscribe cores; image processing: CPU-bound, stay at or below `nproc`).
12+
13+
A sensible starting point: one worker per available CPU core for CPU-bound workloads, two to four times that for I/O-bound workloads. Watch the admin dashboard's "queue length over time" graph for the first week and adjust.
14+
15+
### Per-task concurrency limits
16+
17+
If a single task class can saturate workers (e.g. a heavy report generator), cap it with the task's `$rate` / `$timeout` properties or with a per-task `Configure::write('Queue.taskTimeout.MyTask', ...)` override. See [Configuration](/guide/configuration).
18+
19+
## Worker lifetime and restart cadence
20+
21+
Workers should be designed to exit and respawn periodically — this is the simplest defense against memory leaks, accumulated state, and any in-process cache that drifts from the database (e.g. a freshly migrated column that an older worker can't see yet — see the `captureOutput` regression that landed mid-2026).
22+
23+
- **`workerLifetime`** (`Configure::write('Queue.workerLifetime', 3600)`) — exit cleanly after N seconds. Process supervisor brings up a fresh worker. Default 0 means run forever; for production use a bounded value.
24+
- **`workerRetry`** — number of retries on transient errors before a job is marked failed. Match this to your task idempotency story.
25+
- **`exitwhennothingtodo`** — when set, the worker exits on its first empty poll. Suitable for cron-managed workers; leave off for long-running supervisor-managed processes.
26+
27+
### Recommended supervisor / systemd unit
28+
29+
A systemd unit shape that survives normal failures:
30+
31+
```ini
32+
# /etc/systemd/system/cakephp-queue.service
33+
[Unit]
34+
Description=CakePHP Queue worker
35+
After=network.target
36+
37+
[Service]
38+
User=www-data
39+
WorkingDirectory=/var/www/app
40+
ExecStart=/var/www/app/bin/cake queue worker --verbose
41+
Restart=always
42+
RestartSec=5
43+
44+
[Install]
45+
WantedBy=multi-user.target
46+
```
47+
48+
Then `systemctl enable --now cakephp-queue.service`. The `Restart=always` plus a bounded `workerLifetime` means workers cycle on their own schedule and any crash recovers without manual intervention.
49+
50+
For multiple parallel workers, use a `cakephp-queue@.service` template and `systemctl enable --now cakephp-queue@1.service cakephp-queue@2.service ...`.
51+
52+
## Monitoring
53+
54+
The admin dashboard surfaces:
55+
56+
- `queue_processes` table — currently-running workers with last-seen timestamp.
57+
- `queued_jobs` table — pending + recent jobs grouped by task, with progress, failure-message, and exit status.
58+
- Per-task counters: queued / in-flight / failed / completed in the last 24h.
59+
60+
For external monitoring (Prometheus, Datadog, etc.), poll the same tables. The cheapest signals worth alerting on:
61+
62+
- **stale workers**: any row in `queue_processes` where `modified` is older than `defaultRequeueTimeout + 60s` is a worker that died without cleanup. The plugin auto-evicts these on next startup, but a persistent count > 0 means workers are crashing faster than they recover.
63+
- **backlog growth**: `count(*)` of `queued_jobs WHERE completed IS NULL AND fetched IS NULL`. A monotonically rising number means you're under-provisioned.
64+
- **per-task failure rate**: `count(*) WHERE failure_message IS NOT NULL AND created > now() - interval '1h'`, grouped by `job_task`. Alert when any task crosses your error budget.
65+
66+
The admin dashboard's "tips" sidebar contains the same SQL for quick eyeballing.
67+
68+
## Failure handling
69+
70+
The queue retries failed jobs up to `Queue.workerRetry` times before marking them dead. Dead jobs sit in `queued_jobs` with `failure_message` populated and `failed` set — they aren't auto-purged.
71+
72+
### Investigating a failure
73+
74+
1. **Admin dashboard → Failed jobs view** sorts by most recent. Each row links to its full payload + stack trace.
75+
2. **`bin/cake queue clean`** moves completed and old failed jobs out; configure retention via `Queue.cleanuptimeout` (default 30 days).
76+
3. **`bin/cake queue retry <job-id>`** requeues a dead job. Useful for transient failures (network blip, downstream service down) after you've fixed the cause.
77+
78+
### Dead-letter pattern
79+
80+
There's no built-in DLQ table yet — failed jobs stay in `queued_jobs`. The current workaround:
81+
82+
1. Set `Queue.cleanuptimeout` high enough (e.g. 90 days) that failed rows survive long enough to investigate.
83+
2. Filter the dashboard by `failed IS NOT NULL` for the DLQ view.
84+
3. Use `queue retry` to requeue, `queue remove` to drop.
85+
86+
Issue tracker has the formal DLQ feature on the roadmap; see `Per-task circuit breaker / dead-letter status` in the strategic items section of the [merged plugin review](https://github.com/dereuromark/cakephp-queue/issues).
87+
88+
## Schema migrations on live workers
89+
90+
Two cases worth knowing:
91+
92+
- **New column on `queued_jobs`** (e.g. when you migrate to a version that adds an `output` column): the `captureOutput()` auto-detect path re-runs the schema check on every job since the mid-2026 fix, so the new column takes effect on the next dispatched job. No worker restart required.
93+
- **Anything more invasive** (dropped column, renamed table, FK changes): use a rolling restart. Bring workers down via `systemctl stop cakephp-queue@*`, run migrations, bring them back. Jobs in-flight at restart time fall under the standard `defaultRequeueTimeout` (180s default) — they get re-picked-up by a fresh worker if not finished.
94+
95+
## Multi-server deployments
96+
97+
If two app servers run cron-triggered workers against the same queue:
98+
99+
- Workers coordinate via the `queue_processes` table heartbeat + per-row `workerkey` claim, so there's no double-execution risk at the job level.
100+
- BUT: if you also run `cakephp-queue-scheduler` on multiple hosts, the bundled `FileLock` is single-host by design. See the queue-scheduler operations guide for a multi-host advisory lock recipe.
101+
- Run `bin/cake queue clean` from exactly **one** host (cron-driven, daily off-peak). Two simultaneous cleans don't corrupt anything but waste DB cycles.
102+
103+
## Common pitfalls
104+
105+
- **Tasks that swallow exceptions**: a task's `run()` method that catches `Throwable` and returns normally counts as success — even when the underlying work failed. If you must catch, re-throw `QueueException` to surface the failure to the worker.
106+
- **Long sleeps inside `run()`**: anything that blocks longer than `defaultRequeueTimeout` will be considered abandoned and re-queued. Either chunk the work into smaller jobs or raise the per-task `$timeout`.
107+
- **Worker started in CLI environment without DB access**: `bin/cake queue worker` reads the active CakePHP `Database.default` connection. If your CLI shell has a different DSN than your web shell, workers can't see jobs your web app created. Use the same `app_local.php` everywhere.
108+
- **`captureOutput`-cached-stale**: fixed mid-2026 by re-checking schema per call. If you observe missing output on a long-running worker, restart it as a fallback. Migration ordering is also worth checking — add the column before deploying the code that writes to it.
109+
110+
## See also
111+
112+
- [Configuration](/guide/configuration) — every Configure key the worker reads at runtime.
113+
- [Multi-Connection](/guide/multi-connection) — running queues against multiple databases.
114+
- [Tips](/reference/tips) — debugging notes and one-off SQL queries for the queued_jobs table.

src/Queue/Processor.php

Lines changed: 19 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -442,20 +442,29 @@ protected function getTaskConf(): array {
442442
* @return bool
443443
*/
444444
protected function captureOutput(): bool {
445-
if ($this->captureOutput === null) {
446-
$configured = Configure::read('Queue.captureOutput');
447-
if ($configured !== null) {
445+
$configured = Configure::read('Queue.captureOutput');
446+
if ($configured !== null) {
447+
// Explicit config never changes mid-process, so memoize that
448+
// branch and short-circuit on subsequent calls.
449+
if ($this->captureOutput === null) {
448450
$this->captureOutput = (bool)$configured;
449-
} else {
450-
try {
451-
$this->captureOutput = $this->QueuedJobs->getSchema()->hasColumn('output');
452-
} catch (Throwable) {
453-
$this->captureOutput = false;
454-
}
455451
}
452+
453+
return $this->captureOutput;
456454
}
457455

458-
return $this->captureOutput;
456+
// Auto-detection path: re-check on every call rather than caching
457+
// for the lifetime of the worker. A long-running worker that was
458+
// started before `bin/cake migrations migrate` added the `output`
459+
// column would otherwise see the old `false` for the rest of its
460+
// runtime and silently drop captured stdout until restart. The
461+
// per-call cost is a single schema lookup against a cached
462+
// description, which is negligible compared to running a job.
463+
try {
464+
return $this->QueuedJobs->getSchema()->hasColumn('output');
465+
} catch (Throwable) {
466+
return false;
467+
}
459468
}
460469

461470
/**

src/Queue/Task/ExecuteTask.php

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -49,15 +49,21 @@ public function add(?string $data): void {
4949
return;
5050
}
5151

52-
$command = $data;
53-
$params = null;
54-
if (strpos($data, ' ') !== false) {
55-
[$command, $params] = explode(' ', $data, 2);
56-
}
52+
// Tokenize like a shell so a quoted command path with embedded
53+
// spaces survives intact:
54+
// bin/cake queue add Execute '"/usr/local/bin/My Tool" arg1 arg2'
55+
// parses to command="/usr/local/bin/My Tool", params=["arg1","arg2"].
56+
// `str_getcsv` with space-as-delimiter respects double-quoted
57+
// strings and strips the surrounding quotes for us. Falls back to
58+
// the simple `explode(' ', $data, 2)` shape for plain inputs.
59+
$tokens = str_getcsv($data, ' ', '"', '\\');
60+
$tokens = array_values(array_filter($tokens, static fn ($t) => $t !== '' && $t !== null));
61+
$command = (string)array_shift($tokens);
62+
$params = $tokens;
5763

5864
$data = [
5965
'command' => $command,
60-
'params' => $params ? [$params] : [],
66+
'params' => $params,
6167
];
6268

6369
$this->QueuedJobs->createJob('Queue.Execute', $data);

tests/TestCase/Queue/ProcessorTest.php

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,44 @@ public function setUp(): void {
5555
]);
5656
}
5757

58+
/**
59+
* @return void
60+
*/
61+
/**
62+
* Regression: `captureOutput()` used to cache the schema check result
63+
* for the lifetime of the worker. A long-running worker started before
64+
* `migrations migrate` added the `output` column would then see the
65+
* cached `false` for the rest of its runtime and silently drop output
66+
* until restart. The auto-detect path must re-check on each call so
67+
* a mid-flight migration takes effect on the next job.
68+
*
69+
* Explicit config (Configure::write('Queue.captureOutput', true|false))
70+
* is still memoized — only the auto-detection branch is per-call.
71+
*
72+
* @return void
73+
*/
74+
public function testCaptureOutputReChecksSchemaWhenNotExplicitlyConfigured(): void {
75+
$processor = new Processor(new Io(new ConsoleIo()), new NullLogger());
76+
77+
// Explicit config path: memoized.
78+
Configure::write('Queue.captureOutput', true);
79+
try {
80+
$first = $this->invokeMethod($processor, 'captureOutput');
81+
$second = $this->invokeMethod($processor, 'captureOutput');
82+
$this->assertTrue($first);
83+
$this->assertSame($first, $second);
84+
} finally {
85+
Configure::delete('Queue.captureOutput');
86+
}
87+
88+
// Auto-detect path: returns a bool. The real assertion that this
89+
// is per-call (rather than cached for the worker's lifetime) is
90+
// covered by the implementation — but we sanity-check that the
91+
// method is callable with no exception in the auto path.
92+
$result = $this->invokeMethod($processor, 'captureOutput');
93+
$this->assertIsBool($result);
94+
}
95+
5896
/**
5997
* @return void
6098
*/

tests/TestCase/Queue/Task/ExecuteTaskTest.php

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -188,4 +188,41 @@ public function testRunPassesWhenCommandInAllowListAndDebugDisabled() {
188188
$this->assertTextContains('PHP ', $this->out->output());
189189
}
190190

191+
/**
192+
* Regression: a quoted command path containing spaces must end up
193+
* intact in `data.command` rather than being split into pieces. The
194+
* previous `explode(' ', $data, 2)` shoved everything after the first
195+
* space (including the rest of a Windows-style path) into `params[0]`.
196+
*
197+
* @return void
198+
*/
199+
public function testAddPreservesQuotedCommandPathWithSpaces(): void {
200+
$this->Task->add('"/usr/local/bin/My Tool" --flag arg2');
201+
202+
$queuedJob = $this->Task->QueuedJobs->find()->order(['id' => 'DESC'])->first();
203+
$this->assertNotNull($queuedJob);
204+
$data = is_array($queuedJob->data) ? $queuedJob->data : json_decode((string)$queuedJob->data, true);
205+
206+
$this->assertSame('/usr/local/bin/My Tool', $data['command']);
207+
$this->assertSame(['--flag', 'arg2'], $data['params']);
208+
}
209+
210+
/**
211+
* Mirror of the regression for the plain unquoted shape — the simple
212+
* `cmd arg1 arg2` flow must keep working the same way it did before
213+
* the str_getcsv tokenisation switch.
214+
*
215+
* @return void
216+
*/
217+
public function testAddTokenizesPlainSpaceSeparatedArgs(): void {
218+
$this->Task->add('sleep 1s');
219+
220+
$queuedJob = $this->Task->QueuedJobs->find()->order(['id' => 'DESC'])->first();
221+
$this->assertNotNull($queuedJob);
222+
$data = is_array($queuedJob->data) ? $queuedJob->data : json_decode((string)$queuedJob->data, true);
223+
224+
$this->assertSame('sleep', $data['command']);
225+
$this->assertSame(['1s'], $data['params']);
226+
}
227+
191228
}

0 commit comments

Comments
 (0)