Cap $tries on queued jobs to stop unbounded retry on --tries=0 workers#4353
Open
JoshSalway wants to merge 12 commits into
Open
Cap $tries on queued jobs to stop unbounded retry on --tries=0 workers#4353JoshSalway wants to merge 12 commits into
JoshSalway wants to merge 12 commits into
Conversation
Four queued jobs in this package ship without a $tries property, inheriting the worker-level default. That works fine for typical runs, but produces the failure mode described in SpartnerNL#2769, SpartnerNL#2596, SpartnerNL#3948, SpartnerNL#2779, SpartnerNL#3962, SpartnerNL#4203, and SpartnerNL#4328: a stall or OOM during import/export triggers the worker to re-dispatch the job, which stalls or OOMs again, repeating until the worker process dies or the queue backlog is cleared manually. AfterImportJob is the most severe case: its handle() uses $this->release($this->interval) to poll for dependent ReadChunk completion. Each release increments the attempts counter; without $tries, there is no ceiling, so a failed dependency or missing cache entry keeps the poll alive indefinitely at 60-second intervals. Changes: - AfterImportJob: $tries = 10 (caps the 60s polling at ~10 minutes) - AppendDataToSheet: $tries = 5 - AppendPaginatedToSheet: $tries = 5 - AppendQueryToSheet: $tries = 5 Users who already set $tries on their Import class are unaffected, since job-level $tries is overridden by the payload resolution path already in use for ReadChunk/QueueImport. These four jobs were the only ones in src/Jobs that had no fallback.
Previous docblock said 'keeps polling indefinitely' — that is only true when the worker runs with --tries=0 (unlimited). Laravel's default queue:work signature is --tries=1, which would fail AfterImportJob after a single release before dependencies complete. Updated wording describes both cases correctly.
After reviewing the referenced issues in the tracker, the retry-storm failure mode the original commits cited isn't actually the pattern users are reporting. The reports are about PhpSpreadsheet memory inefficiency (SpartnerNL#3962, SpartnerNL#4203), queue worker execution timeouts (SpartnerNL#2596, SpartnerNL#3948), and unrelated architectural issues (SpartnerNL#4328). Users hitting those reports typically want MORE attempts on the Append* jobs, not fewer. Setting $tries=5 on them could regress users whose current queue:work --tries=N gives them headroom. The only case with a genuinely defensible mechanism is AfterImportJob, which uses $this->release($this->interval) to poll for ReadChunk completion. Without a job-level $tries, that loop has no ceiling on workers started with --tries=0. Keeping that fix. Three Append* changes reverted. One file, one property remains.
Expanded from AfterImportJob-only to seven export-pipeline jobs that currently ship with no queue-safety configuration. Each gets $tries=5 with $backoff=[30,60,300] to match the pattern that ReadChunk and QueueImport already use (where they pull from the user's Import class). Patched: - AfterImportJob: $tries=10, no $backoff (its release($interval) polling sets its own timing; adding $backoff would be redundant) - AppendDataToSheet, AppendPaginatedToSheet, AppendQueryToSheet, AppendViewToSheet: $tries=5, $backoff=[30,60,300] - CloseSheet, QueueExport, StoreQueuedExport: same Not touched: - ReadChunk: already pulls $tries, $backoff, $timeout, $maxExceptions from the user's Import class - QueueImport: pulls $tries from Import class and has an empty handle() body, so retries would do nothing anyway Untyped properties used throughout to match the composer.json PHP constraint (^7.0||^8.0) — typed properties require 7.4+.
ReadChunk resolves $tries, $backoff, $timeout, and $maxExceptions from the user's Import class at construction time. If the Import class doesn't set them, these currently resolve to null, meaning the worker falls back to whatever queue:work --tries=N and default backoff (0) are. This changes two of them to explicit fallbacks that match what a fresh Laravel install does: - $tries: null -> 1 (same as queue:work --tries default) - $backoff: null -> [30, 60, 300] (affects spacing only, not count) Effect by scenario: - User sets Import::$tries = 5: unchanged (Import still wins) - User runs queue:work (default --tries=1): unchanged (1 either way) - User runs queue:work --tries=0 without Import setting: now bounded at 1 attempt instead of unlimited retries on failure - User runs queue:work --tries=5 without Import setting: now bounded at 1 instead of 5. Can restore by setting $tries on the Import class. $timeout and $maxExceptions left as null-fallbacks because worker --timeout=60 default is already tight, and $maxExceptions default behaviour is correct when $tries alone acts as the ceiling.
Three unit tests verify the fallback behaviour documented in the PR: - test_readchunk_falls_back_to_tries_one_when_import_has_no_retry_config Import class without $tries -> ReadChunk.tries = 1 - test_readchunk_falls_back_to_exponential_backoff_when_import_has_no_retry_config Import class without $backoff -> ReadChunk.backoff = [30, 60, 300] - test_readchunk_preserves_user_tries_set_on_import Import class with $tries = 7 -> ReadChunk.tries = 7 (user wins) Construct ReadChunk directly with anonymous-class Import instances and mocked IReader/TemporaryFile dependencies to avoid the DB-backed test path needed for the existing queue propagation tests.
The .phpunit.cache/test-results file was inadvertently committed when running the new ReadChunkDefaultsTest locally. Reverting to match origin/3.1. Also removed a stray blank line at the end of QueuedImportTest.php.
StyleCI flagged the original ordering (private helpers before public test methods). Moved the three test methods to the top of the class and left the helpers below, matching the visibility ordering the laravel preset enforces (public -> private). No test logic changed. All 3 tests still pass locally: Tests: 3, Assertions: 3.
Changes the seven single-run jobs from $tries=5 with exponential backoff to $tries=1 with no backoff. Keeps AfterImportJob at $tries=10 because its polling pattern genuinely needs the headroom. Rationale: - $tries=1 matches Laravel's queue:work --tries default. Default-worker deployments see zero behaviour change. - Still caps --tries=0 (Vapor-pattern) deployments at 1 attempt instead of unlimited. - Append*/Close/Queue/Store jobs write to temp files and their idempotency on partial-write failure is not obvious. Fail-fast on the first error avoids any risk of double-written rows or corrupt output. - ReadChunk fallback ($tries ?? 1, $backoff ?? [30, 60, 300]) kept as-is: users who explicitly set $tries on their Import class still get properly-spaced retries via the $backoff fallback. - AfterImportJob keeps $tries=10 because its handle() uses $this->release($interval) to poll for ReadChunk completion. Each release increments the attempt counter; $tries=1 would fail the job on the first release before any dependency completes.
Dropped the $import->backoff fallback to [30, 60, 300] on ReadChunk. If the user sets $tries on their Import class but no $backoff, they now get the framework default (0s between retries) again. If they want spacing they can set it on their Import class. Rationale: this PR's narrative for the seven single-run jobs is 'cap runaway attempts, don't impose retry-spacing opinions.' Keeping a backoff fallback on ReadChunk contradicted that for a narrow case (user opted into tries>1 without setting backoff). Follow-up PR can add it if maintainers want it. Dropped the accompanying test. Two tests remain: - test_readchunk_falls_back_to_tries_one_when_import_has_no_tries - test_readchunk_preserves_user_tries_set_on_import Tests: 2, Assertions: 2.
Removed the long inline rationale; the same information now lives in the PR body where it is easier to maintain. One-line docblock points at the load-bearing fact (each release() counts as an attempt).
The tests verified that PHP's null-coalescing operator returns the fallback when the operand is null and the operand otherwise. That is a language guarantee, not library behaviour. The existing integration tests in QueuedImportTest.php already cover the jobs through real dispatch cycles (see test_can_define_max_exceptions_property_on_queued_import for the pattern). Dropping the unit file to keep the PR focused on the one-line property changes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Caps
$trieson the nine queued jobs that currently inherit it fromqueue:work --tries=N. On default-worker deployments (--tries=1) this is a no-op; on--tries=0deployments (e.g.VaporWorkCommand's default) it stops unbounded retry loops.Nine files, ~25 insertions, 1 deletion. No new test file: existing integration tests in
tests/QueuedImportTest.phpandtests/QueuedExportTest.phpalready cover these jobs through full dispatch cycles.$triesAfterImportJobAppendDataToSheetAppendPaginatedToSheetAppendQueryToSheetAppendViewToSheetCloseSheetQueueExportStoreQueuedExportReadChunk$import->tries ?? 1(fallback only)Not touched:
QueueImport(emptyhandle()body, retries accomplish nothing).Untyped
public $tries = 1;used to match thecomposer.jsonPHP constraint^7.0||^8.0.Why these values
$tries = 1on the seven single-run jobs matches Laravel's ownqueue:work --tries=1default. Default deployments see zero behaviour change;--tries=0deployments get capped at 1. If the maintainers prefer a resilience-over-fail-fast tradeoff ($tries = 3with$backoff), happy to switch. My read is that fail-fast is safer because these jobs write chunk data to a shared temp file and idempotency on partial-write failure is not obvious.$tries = 10onAfterImportJobbecause itshandle()uses$this->release($this->interval)to poll forReadChunkdependencies. Each release increments the attempt counter.$tries = 1would fail the job on the first release before any dependency completes. 10 attempts at the 60s default interval caps the polling at ~10 minutes.$tries = $import->tries ?? 1onReadChunkpreserves user-set values on the Import class while falling back to the Laravel default instead ofnull. User settings still win.Verified
php -lclean on all edited files (PHP 8.4.19)composer.jsonPHP constraint^7.0||^8.0respected (untyped properties)laravel+ordered_class_elementsrespectedqueue:work --tries=1default verified onlaravel/framework13.xIlluminate/Queue/Console/WorkCommand.phpsignatureVaporWorkCommand's--tries=0signature is long-standing and well-documented in the wider Laravel queue-safety discussionContext
Part of a Laravel queue-safety audit: https://joshsalway.com/articles/improving-queue-safety-in-laravel.