Summary
After upgrading from 1.0.69 to 1.0.70, activity failures that are caught and handled later in the same workflow lifecycle stop being recorded in workflow_logs as Workflow\\Exception rows.
I reproduced this in the durable-workflow/sample-app application with a real queue worker and a Redis queue, not with feature tests.
This looks like the same underlying problem described in discussion #372.
What I tested
I used the sample app with a real worker:
php -d opcache.enable_cli=0 artisan queue:work --sleep=1 --tries=1 --timeout=0 -v
php -d opcache.enable_cli=0 artisan app:exception-logging-repro --timeout=90
The repro workflow shape is:
class ExceptionLoggingRetryActivity extends Activity
{
    public $tries = 1;

    public function execute(string $step): string
    {
        return match ($step) {
            'first' => throw new RuntimeException('first failure from activity'),
            'second' => throw new InvalidArgumentException('second failure from activity'),
            default => "success on {$step}",
        };
    }
}

class ExceptionLoggingRetryWorkflow extends Workflow
{
    protected int $retryRequests = 0;

    #[SignalMethod]
    public function requestRetry(): void
    {
        $this->retryRequests++;
    }

    public function execute(): Generator
    {
        $caught = [];
        $stage = 0;

        while (true) {
            try {
                $result = yield activity(
                    ExceptionLoggingRetryActivity::class,
                    match ($stage) {
                        0 => 'first',
                        1 => 'second',
                        default => 'success',
                    }
                );

                return [
                    'caught' => $caught,
                    'result' => $result,
                ];
            } catch (Throwable $throwable) {
                $caught[] = get_class($throwable).': '.$throwable->getMessage();

                $requiredRetries = $stage + 1;
                yield await(fn () => $this->retryRequests >= $requiredRetries);

                $stage++;
            }
        }
    }
}
The command just starts the workflow, waits 3 seconds, sends requestRetry(), waits another 3 seconds, sends requestRetry() again, then waits for completion.
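For reference, the driving command does roughly the following (a sketch, not the sample app's exact code; the stub method names are assumed from laravel-workflow's WorkflowStub API):

```php
use App\Workflows\Repro\ExceptionLoggingRetryWorkflow;
use Workflow\WorkflowStub;

$workflow = WorkflowStub::make(ExceptionLoggingRetryWorkflow::class);
$workflow->start();

sleep(3);
$workflow->requestRetry(); // first signal: releases the await() after the first failure

sleep(3);
$workflow->requestRetry(); // second signal: releases the await() after the second failure

// Wait for the queue worker to finish the run (bounded by the command's --timeout).
while ($workflow->running()) {
    sleep(1);
}
```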
Expected behavior
The second handled failure happens at a new workflow index, so it should produce a second Workflow\\Exception row in workflow_logs, just like 1.0.69 does.
Actual behavior
On 1.0.69
The workflow completes successfully.
workflow_logs for the run:
[0] Workflow\\Exception
[1] Workflow\\Signal
[2] Workflow\\Exception
[3] Workflow\\Signal
[4] App\\Workflows\\Repro\\ExceptionLoggingRetryActivity
workflow_exceptions for the run:
RuntimeException: first failure from activity
InvalidArgumentException: second failure from activity
On 1.0.70
The workflow gets stuck in WorkflowWaitingStatus.
workflow_logs for the run:
[0] Workflow\\Exception
[1] Workflow\\Signal
workflow_exceptions for the run:
RuntimeException: first failure from activity
InvalidArgumentException: second failure from activity
InvalidArgumentException: second failure from activity
The worker output shows the later Workflow\\Exception jobs being dispatched, but they do not create new replay-log rows. Because index 2 never gets a Workflow\\Exception row, the workflow keeps replaying the same second failing activity on later retry signals.
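This is easy to confirm from tinker against the run's stored logs (model and column names assumed from laravel-workflow's schema):

```php
use Workflow\Models\StoredWorkflowLog;

// On 1.0.70 this stops at index 1 even after both retry signals;
// on 1.0.69 it continues through index 4.
StoredWorkflowLog::where('stored_workflow_id', $storedWorkflowId)
    ->orderBy('index')
    ->get(['index', 'class']);
```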
Why this seems to happen
src/Exception.php in 1.0.70 now does:
if ($this->storedWorkflow->hasLogByIndex($this->index)) {
    $workflow->resume();
} elseif (! $this->storedWorkflow->logs()->where('class', self::class)->exists()) {
    $workflow->next($this->index, $this->now, self::class, $this->exception);
}
That global exists() check looks fine for suppressing stale sibling exception logs in a parallel fan-out, but it also suppresses later legitimate exceptions at new indexes in the same workflow lifecycle.
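If the guard's only job is to avoid double-logging an exception at a given index, scoping the query to the current index would restore later exceptions at new indexes. This is an untested sketch, and whether it still covers the fan-out case from #372 would need verification:

```php
if ($this->storedWorkflow->hasLogByIndex($this->index)) {
    $workflow->resume();
} elseif (! $this->storedWorkflow->logs()
    ->where('class', self::class)
    ->where('index', $this->index) // scope the dedupe to this index only
    ->exists()
) {
    $workflow->next($this->index, $this->now, self::class, $this->exception);
}
```

Note that the scoped elseif is always true once hasLogByIndex() is false, so for new indexes this behaves like 1.0.69 while keeping the dedupe intent explicit.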
Why I think this is a real bug
workflow_exceptions keeps growing, so the later activity failures are definitely happening.
What stops working is the replay log in workflow_logs, which means the workflow cannot deterministically advance past the later failed stage.
So this is not just a visibility/logging issue. In signal-driven manual retry flows, it changes behavior and can leave the workflow stuck replaying the same failing stage.
Related context