Skip to content

Commit fed8ffe

Browse files
GitHub #60: Workflow Phase 3: Timers, schedules, and deterministic time (#529)
1 parent 398b74d commit fed8ffe

4 files changed

Lines changed: 434 additions & 0 deletions

File tree

docs/workflow/plan.md

Lines changed: 190 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -336,6 +336,196 @@ Continue-as-new lineage entries are deliberately unaffected; the
336336
continued run owns the open children and re-emits the relevant child
337337
history events on its own line.
338338

339+
## Time, Timer, and Schedule Determinism Contract
340+
341+
This section pins the determinism rules for time, timers, timeouts,
342+
and named schedules. It expands the `Timers and sleep`, `Schedules`,
343+
and timeout-bearing rows of the Feature Compatibility Matrix and is
344+
the normative reference for what every implementation must preserve
345+
so that replay is not coupled to ambient wall-clock time.
346+
347+
### Virtual time
348+
349+
Workflow code reads time through `Workflow::now()` (or the
350+
namespaced `Workflow\V2\now()` helper). The reading is durable: it
351+
returns the deterministic event time that the executor seeded onto
352+
the workflow fiber from the latest history event consumed before
353+
the resume, not ambient wall-clock time.
354+
355+
The contract pins:
356+
357+
- The first resume of a run reads `workflow_runs.started_at`.
358+
- After an activity, child, signal, update, side-effect, version,
359+
memo, or search-attribute history event drives a resume, the
360+
fiber-local time advances to that event's `recorded_at`. Timer
361+
resumes use `fired_at`; condition-wait resumes use the resolution
362+
event's `recorded_at`.
363+
- Successful parallel groups advance the fiber-local time to the
364+
latest leaf-completion `recorded_at` so a resume after a fan-out
365+
reads the most recent observable progress on the workflow line.
366+
- Outside a workflow fiber (CLI, tests, server jobs, the matching
367+
role) `Workflow::now()` falls back to wall-clock `now()` so
368+
non-workflow code is not coupled to a fiber-local clock.
369+
- Workflow code MUST NOT call `Carbon::now()`,
370+
`Carbon::setTestNow()`, the global `now()` helper, `microtime()`,
371+
`time()`, `date()`, `new DateTime()`, or any other ambient-clock
372+
primitive. Replay must produce identical decisions regardless of
373+
host wall-clock state.
374+
375+
The exit criterion "replay never depends on ambient `Carbon::now()`"
376+
is verified by `tests/Feature/V2/V2DeterministicTimeReplayTest.php`,
377+
which freezes `Carbon::setTestNow()` to a value far from the
378+
recorded history before replaying the run and asserts the replayed
379+
`Workflow::now()` reading still matches the seeded event time. The
380+
matching live-execution behavior is pinned by
381+
`tests/Feature/V2/V2DeterministicTimeTest.php`, and fiber-local
382+
seeding is pinned by
383+
`tests/Unit/V2/WorkflowFiberContextTimeTest.php`.
384+
385+
### Timer lifecycle and supersession
386+
387+
Timers are durable rows in `workflow_run_timers`/`workflow_run_timer_entries`
388+
with one logical `timer_id` per wait. The lifecycle is:
389+
390+
- **Scheduled.** The run records `TimerScheduled` with the durable
391+
`timer_id`, `delay_seconds`, and `fire_at`. A `workflow_tasks`
392+
row with `task_type=Timer` and `available_at=fire_at` is created.
393+
- **Fired.** When the timer task is dispatched the run records
394+
`TimerFired` with the same `timer_id`, and the fiber resumes at
395+
`fired_at`.
396+
- **Cancelled.** When workflow code cancels a pending timer (scope
397+
rollback, satisfied condition wait, supersession by a later
398+
command) the run records `TimerCancelled` with the same
399+
`timer_id` and the open `workflow_tasks` row is closed.
400+
- **Repaired.** When the watchdog recreates a missing timer task
401+
from typed timer history, the repaired task preserves the
402+
timer's durable `fire_at`. Future timers MUST NOT be snapped to
403+
current wall-clock time.
404+
- **Transport chunking.** `TimerTransportChunker` may split the
405+
scheduling/wake hops for very long sleeps that the queue
406+
transport cannot represent in one entry, but the durable timer
407+
row, the typed history events, and the resume value all keep one
408+
stable `timer_id` end-to-end.
409+
410+
Cancel and terminate commands targeting the run cancel every open
411+
timer, condition timeout, and signal timeout the same way: the run
412+
records `TimerCancelled` (or the corresponding wait-resolution
413+
event), the open `workflow_tasks` row is closed, and the fiber
414+
wakes through the cancellation/termination path rather than the
415+
timer-fired path.
416+
417+
### Activity timeout taxonomy
418+
419+
Every `activity_executions` row carries the four timeout deadlines
420+
as durable columns:
421+
422+
- **`schedule_to_start`** — fails the activity if no worker claims
423+
the task before the deadline. Retriable per the activity's retry
424+
policy.
425+
- **`start_to_close`** (column `close_deadline_at`) — fails the
426+
activity if the running attempt does not return before the
427+
deadline. Retriable.
428+
- **`schedule_to_close`** — fails the activity if the total
429+
scheduling+running time crosses the deadline. NOT retriable; the
430+
enforcer terminates the activity even when the retry policy still
431+
has remaining attempts.
432+
- **`heartbeat`** — fails the running attempt if no heartbeat is
433+
recorded before the deadline. Retriable.
434+
435+
`ActivityTimeoutEnforcer::enforce` is the single durable path that
436+
turns an expired deadline into a typed `ActivityTimedOut` event with
437+
the matching `timeout_kind`. The enforcer is covered by
438+
`tests/Feature/V2/V2ActivityTimeoutTest.php`.
439+
440+
### Workflow execution-timeout, run-timeout, and workflow-task-timeout
441+
442+
Workflow-level timeouts share the same separation:
443+
444+
- **`execution_timeout_seconds`** — limits the total lifetime of a
445+
workflow instance across continue-as-new chains. Breach records
446+
`WorkflowTimedOut` with `timeout_kind=execution_timeout`.
447+
- **`run_timeout_seconds`** — limits a single run's lifetime.
448+
Continue-as-new resets the run-timeout deadline. Breach records
449+
`WorkflowTimedOut` with `timeout_kind=run_timeout`.
450+
- **`workflow_task_timeout_seconds`** — limits how long a single
451+
workflow-task lease may be held by a worker. Lease expiry is
452+
reclaimed by the watchdog and does not record a workflow-level
453+
terminal event; it surfaces as a worker-plane redelivery.
454+
455+
Both `execution_timeout` and `run_timeout` flow through
456+
`WorkflowExecutor::timeoutRun` and call
457+
`ParentClosePolicyEnforcer::enforce` (see the parent-close matrix
458+
above) so child disposition is consistent across timeout kinds.
459+
460+
### Workflow-level retry first-release contract
461+
462+
Top-level workflow runs do NOT retry on failure or timeout in the
463+
first v2 release. A run that records `WorkflowFailed` or
464+
`WorkflowTimedOut` is terminal; restarting a new run is the
465+
operator's or owner's responsibility. This is the explicit
466+
"first-release unsupported" contract: the durable engine does not
467+
expose a top-level workflow retry policy, and adding one is a
468+
contract change that must be reviewed under the rules in
469+
"Changing This Contract" below.
470+
471+
Retry policies remain a first-class authoring surface for
472+
**activities** through `ActivityOptions` (`maxAttempts`, `backoff`,
473+
`nonRetryableErrorTypes`) and for **child workflows** through the
474+
`ChildWorkflowRetryPolicy` snapshot carried on the child-start
475+
command. Both default to no retry unless an explicit policy is set,
476+
matching the V2.0 default above.
477+
478+
### Named schedule lifecycle
479+
480+
`workflow_schedules` is the durable home for named schedules.
481+
`ScheduleManager` exposes the full lifecycle:
482+
483+
- `create` / `createFromSpec` — register a schedule with a cron
484+
expression, timezone, overlap policy, and visibility.
485+
- `pause` / `resume` — toggle `ScheduleStatus::Active` and
486+
`ScheduleStatus::Paused` without losing schedule history.
487+
- `update` — change the cron, timezone, overlap policy, or paused
488+
state with the next-fire time recomputed deterministically.
489+
- `trigger` — start a manual run of the schedule's workflow,
490+
gated by `ScheduleOverlapPolicy`.
491+
- `tick` — two-phase fire evaluation: drain the buffered run
492+
queue first, then evaluate due fires.
493+
- `describe` / `findByScheduleId` — read the schedule's
494+
projection and audit history.
495+
- `delete` — mark the schedule `Deleted`; the row remains for
496+
audit and is filtered out of evaluation.
497+
- `backfill` — replay `computeNextFireAt` between two boundary
498+
instants to mass-trigger missed fires under the overlap policy.
499+
500+
`ScheduleOverlapPolicy::Skip`, `BufferOne`, `AllowAll`,
501+
`CancelOther`, and `TerminateOther` are the five supported
502+
policies. The schedule history events (`ScheduleCreated`,
503+
`SchedulePaused`, `ScheduleResumed`, `ScheduleUpdated`,
504+
`ScheduleTriggered`, `ScheduleTriggerSkipped`, `ScheduleDeleted`)
505+
are the durable audit truth; schedule projections are derived from
506+
those events. Schedule-fire evaluation runs through the
507+
`workflow:v2:schedule-tick` artisan command (and the equivalent
508+
server endpoint), which is covered by
509+
`tests/Feature/V2/V2ScheduleTest.php`.
510+
511+
### Replay tests for timeout helpers and timer ordering
512+
513+
Replay-test coverage is required and pinned for:
514+
515+
- Workflow timer ordering across parallel and sequential branches
516+
(`tests/Feature/V2/V2GoldenHistoryReplayTest.php`,
517+
`tests/Feature/V2/V2WorkflowReplayerTest.php`).
518+
- Activity timeout helpers
519+
(`tests/Feature/V2/V2ActivityTimeoutTest.php`).
520+
- Deterministic time across live execution
521+
(`tests/Feature/V2/V2DeterministicTimeTest.php`) and across
522+
replay (`tests/Feature/V2/V2DeterministicTimeReplayTest.php`).
523+
- Schedule lifecycle
524+
(`tests/Feature/V2/V2ScheduleTest.php`).
525+
526+
Adding a new time-, timer-, or schedule-affecting code path that
527+
does not extend one of these test buckets is a release-blocker.
528+
339529
## Gap Analysis
340530

341531
| Item | Classification | Decision |
Lines changed: 143 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,143 @@
1+
<?php
2+
3+
declare(strict_types=1);
4+
5+
namespace Tests\Feature\V2;
6+
7+
use Illuminate\Support\Carbon;
8+
use Illuminate\Support\Facades\Queue;
9+
use Tests\Fixtures\V2\TestReplayDeterministicTimeWorkflow;
10+
use Tests\TestCase;
11+
use Workflow\V2\Enums\HistoryEventType;
12+
use Workflow\V2\Enums\TaskStatus;
13+
use Workflow\V2\Enums\TaskType;
14+
use Workflow\V2\Jobs\RunActivityTask;
15+
use Workflow\V2\Jobs\RunWorkflowTask;
16+
use Workflow\V2\Models\WorkflowHistoryEvent;
17+
use Workflow\V2\Models\WorkflowRun;
18+
use Workflow\V2\Models\WorkflowTask;
19+
use Workflow\V2\Support\WorkflowReplayer;
20+
use Workflow\V2\WorkflowStub;
21+
22+
/**
23+
* Pins the Phase 3 exit criterion: replay never depends on ambient
24+
* Carbon::now(). The test runs a workflow to completion at one frozen
25+
* clock, then replays it under a wildly different ambient clock and
26+
* asserts that `Workflow::now()` (Workflow\V2\now()) reads the
27+
* deterministic event time the executor seeded — not the ambient
28+
* Carbon::setTestNow() value.
29+
*/
30+
final class V2DeterministicTimeReplayTest extends TestCase
31+
{
32+
protected function setUp(): void
33+
{
34+
parent::setUp();
35+
36+
config()
37+
->set('queue.default', 'redis');
38+
config()
39+
->set('queue.connections.redis.driver', 'redis');
40+
config()
41+
->set('workflows.v2.task_dispatch_mode', 'poll');
42+
Queue::fake();
43+
}
44+
45+
public function testReplayReadsSeededEventTimeAndIgnoresAmbientWallClock(): void
46+
{
47+
$startedAt = Carbon::parse('2026-02-01T12:00:00Z');
48+
Carbon::setTestNow($startedAt);
49+
50+
$workflow = WorkflowStub::make(
51+
TestReplayDeterministicTimeWorkflow::class,
52+
'replay-deterministic-time-1',
53+
);
54+
$workflow->start('Taylor');
55+
56+
$this->runReadyTaskOfType(TaskType::Workflow);
57+
58+
$activityRecordedAt = $startedAt->copy()
59+
->addMinutes(5);
60+
Carbon::setTestNow($activityRecordedAt);
61+
62+
$this->runReadyTaskOfType(TaskType::Activity);
63+
$this->runReadyTaskOfType(TaskType::Workflow);
64+
65+
Carbon::setTestNow(null);
66+
67+
$this->assertTrue($workflow->refresh()->completed());
68+
69+
/** @var WorkflowRun $run */
70+
$run = WorkflowRun::query()
71+
->findOrFail($workflow->runId());
72+
73+
/** @var WorkflowHistoryEvent $activityCompleted */
74+
$activityCompleted = WorkflowHistoryEvent::query()
75+
->where('workflow_run_id', $run->id)
76+
->where('event_type', HistoryEventType::ActivityCompleted->value)
77+
->orderBy('sequence')
78+
->firstOrFail();
79+
80+
$expectedStartMs = $run->started_at?->getTimestampMs();
81+
$expectedAfterActivityMs = $activityCompleted->recorded_at?->getTimestampMs();
82+
83+
$this->assertNotNull($expectedStartMs);
84+
$this->assertNotNull($expectedAfterActivityMs);
85+
86+
$ambientFuture = Carbon::parse('2099-12-31T23:59:59Z');
87+
Carbon::setTestNow($ambientFuture);
88+
89+
try {
90+
$state = (new WorkflowReplayer())->replay($run);
91+
} finally {
92+
Carbon::setTestNow(null);
93+
}
94+
95+
$this->assertInstanceOf(TestReplayDeterministicTimeWorkflow::class, $state->workflow);
96+
97+
$this->assertSame(
98+
$expectedStartMs,
99+
$state->workflow->observedStartMs,
100+
'Replay must read the run started_at from history when sampling Workflow::now() at handle entry, even when Carbon::setTestNow() is set far in the future.',
101+
);
102+
103+
$this->assertSame(
104+
$expectedAfterActivityMs,
105+
$state->workflow->observedAfterActivityMs,
106+
'Replay must read the ActivityCompleted recorded_at from history when sampling Workflow::now() after the activity, even when Carbon::setTestNow() is set far in the future.',
107+
);
108+
109+
$this->assertNotSame(
110+
$ambientFuture->getTimestampMs(),
111+
$state->workflow->observedStartMs,
112+
'Replay must not read ambient wall-clock time at handle entry.',
113+
);
114+
115+
$this->assertNotSame(
116+
$ambientFuture->getTimestampMs(),
117+
$state->workflow->observedAfterActivityMs,
118+
'Replay must not read ambient wall-clock time after an activity completes.',
119+
);
120+
}
121+
122+
private function runReadyTaskOfType(TaskType $taskType): void
123+
{
124+
/** @var WorkflowTask|null $task */
125+
$task = WorkflowTask::query()
126+
->where('status', TaskStatus::Ready->value)
127+
->where('task_type', $taskType->value)
128+
->orderBy('created_at')
129+
->first();
130+
131+
if (! $task instanceof WorkflowTask) {
132+
$this->fail("Expected a ready task of type {$taskType->value}.");
133+
}
134+
135+
$job = match ($task->task_type) {
136+
TaskType::Workflow => new RunWorkflowTask($task->id),
137+
TaskType::Activity => new RunActivityTask($task->id),
138+
default => $this->fail("Unsupported task type {$task->task_type->value}."),
139+
};
140+
141+
$this->app->call([$job, 'handle']);
142+
}
143+
}
Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
<?php
2+
3+
declare(strict_types=1);
4+
5+
namespace Tests\Fixtures\V2;
6+
7+
use function Workflow\V2\activity;
8+
use function Workflow\V2\now;
9+
10+
use Workflow\V2\Workflow;
11+
12+
/**
13+
* Fixture used by V2DeterministicTimeReplayTest to verify that
14+
* Workflow::now() during replay returns the deterministic event time
15+
* seeded by the executor — not ambient Carbon::now() / wall-clock.
16+
*
17+
* The observed reads are stored on instance properties so the test can
18+
* inspect them on the workflow returned from WorkflowReplayer::replay().
19+
*/
20+
final class TestReplayDeterministicTimeWorkflow extends Workflow
21+
{
22+
public ?int $observedStartMs = null;
23+
24+
public ?int $observedAfterActivityMs = null;
25+
26+
public function handle(string $name): array
27+
{
28+
$timeAtStart = now();
29+
$this->observedStartMs = $timeAtStart->getTimestampMs();
30+
31+
$greeting = activity(TestGreetingActivity::class, $name);
32+
33+
$timeAfterActivity = now();
34+
$this->observedAfterActivityMs = $timeAfterActivity->getTimestampMs();
35+
36+
return [
37+
'greeting' => $greeting,
38+
'time_at_start_ms' => $timeAtStart->getTimestampMs(),
39+
'time_after_activity_ms' => $timeAfterActivity->getTimestampMs(),
40+
];
41+
}
42+
}

0 commit comments

Comments
 (0)