
Commit 02c8646

durable-workflow.github.io: update v2 changes
1 parent ac8c2a6 commit 02c8646

File tree

2 files changed (+64, -2 lines)


docs/constraints/structural-limits.md

Lines changed: 25 additions & 1 deletion
@@ -19,6 +19,7 @@ Structural limits cap the resource consumption of a single workflow run. When an
 | `payload_size_bytes` | 2 MiB | Serialized size of a single argument payload |
 | `memo_size_bytes` | 256 KiB | Serialized size of non-indexed memo metadata |
 | `search_attribute_size_bytes` | 40 KiB | Serialized size of indexed search-attribute metadata |
+| `history_transaction_size` | 5,000 | History events produced by a single workflow task execution |
 
 All limits are enforced at the point of scheduling or recording. A value of `0` disables the check for that limit kind.
 
@@ -39,6 +40,7 @@ Override any limit through `workflows.v2.structural_limits` in your config or vi
         'payload_size_bytes' => (int) env('WORKFLOW_V2_LIMIT_PAYLOAD_SIZE_BYTES', 2097152),
         'memo_size_bytes' => (int) env('WORKFLOW_V2_LIMIT_MEMO_SIZE_BYTES', 262144),
         'search_attribute_size_bytes' => (int) env('WORKFLOW_V2_LIMIT_SEARCH_ATTRIBUTE_SIZE_BYTES', 40960),
+        'history_transaction_size' => (int) env('WORKFLOW_V2_LIMIT_HISTORY_TRANSACTION_SIZE', 5000),
     ],
 ],
 ```
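The new limit is overridden the same way as the existing ones, through the `WORKFLOW_V2_LIMIT_HISTORY_TRANSACTION_SIZE` environment variable read in the config above. A minimal `.env` sketch (the value here is illustrative):

```
# Raise the per-task event cap for history-heavy workflows;
# a value of 0 would disable the check for this limit kind.
WORKFLOW_V2_LIMIT_HISTORY_TRANSACTION_SIZE=10000
```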
@@ -101,6 +103,27 @@ activity(ProcessDocumentActivity::class, $ref);
 
 When a workflow upserts memo entries via `upsertMemo()`, the executor merges the new entries into the existing memo map, then JSON-encodes the merged result and checks the byte length against `memo_size_bytes`. If the merged memo exceeds the limit, the run fails before the memo is persisted.
 
+### History transaction size
+
+Each workflow task execution (a single "turn" of replay and forward progress) may produce new history events — activity scheduling, timer creation, side-effect recording, search-attribute upserts, and so on. The `history_transaction_size` limit caps the total number of new events a single task can produce.
+
+This catches runaway loops that create unbounded events in a single task without yielding control:
+
+```php
+// If a workflow schedules thousands of operations in one task,
+// the history transaction limit prevents the task from growing
+// without bound. Process large batches in bounded chunks instead.
+foreach (array_chunk($items, 500) as $chunk) {
+    $calls = [];
+    foreach ($chunk as $item) {
+        $calls[] = startActivity(ProcessItemActivity::class, $item);
+    }
+    all($calls); // Each chunk is a separate task execution
+}
+```
+
+The check runs at the top of each iteration of the executor's main loop. Events created during replay (reading existing history) do not count toward the limit — only new events written during the current task contribute.
+
 ### Search attribute size
 
 When a workflow upserts search attributes via `upsertSearchAttributes()`, the executor merges the new attributes into the existing set, then JSON-encodes the merged result and checks the byte length against `search_attribute_size_bytes`. If the merged attributes exceed the limit, the run fails before the attributes are persisted.
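The replay-versus-new-event distinction described in the new section can be modeled with a small counter. This is an illustrative sketch, not the engine's actual code: replayed events are free, and newly written events are capped (with `0` disabling the check, as the docs state).

```php
<?php
// Illustrative sketch only: models the counting rule the docs describe.
final class HistoryTransactionCounter
{
    private int $newEvents = 0;

    public function __construct(private readonly int $limit)
    {
    }

    // Called for events read back from committed history during replay.
    public function recordReplayedEvent(): void
    {
        // Replayed events never count toward the limit.
    }

    // Called for each new event written during the current task.
    // Returns false once the limit is exceeded (0 disables the check).
    public function recordNewEvent(): bool
    {
        $this->newEvents++;
        return $this->limit === 0 || $this->newEvents <= $this->limit;
    }
}

$counter = new HistoryTransactionCounter(limit: 2);
$counter->recordReplayedEvent();      // free during replay
var_dump($counter->recordNewEvent()); // bool(true)
var_dump($counter->recordNewEvent()); // bool(true)
var_dump($counter->recordNewEvent()); // bool(false)
```

The class name and method names are hypothetical; only the counting behavior comes from the documentation.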
@@ -133,7 +156,8 @@ The current structural limits configuration is included in the v2 health check s
         "command_batch_size": 1000,
         "payload_size_bytes": 2097152,
         "memo_size_bytes": 262144,
-        "search_attribute_size_bytes": 40960
+        "search_attribute_size_bytes": 40960,
+        "history_transaction_size": 5000
     }
 }
 ```
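Since the snapshot is plain JSON, a monitoring client can read the configured limits back. A minimal sketch using only the fragment shown above (any wrapping keys in the full health check payload are omitted here):

```php
<?php
// Decode the structural-limits fragment from the health check snapshot.
$json = '{
    "command_batch_size": 1000,
    "payload_size_bytes": 2097152,
    "memo_size_bytes": 262144,
    "search_attribute_size_bytes": 40960,
    "history_transaction_size": 5000
}';

$limits = json_decode($json, true);

echo $limits['history_transaction_size'], "\n";         // 5000
echo $limits['payload_size_bytes'] / 1048576, " MiB\n"; // 2 MiB
```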

docs/failures-and-recovery.md

Lines changed: 39 additions & 1 deletion
@@ -85,7 +85,7 @@ Every failure recorded by the engine carries a `failure_category` that classifie
 | Timeout | `timeout` | Failure caused by a timeout expiration — enforced by the engine when a workflow execution or run deadline passes. |
 | Task Failure | `task_failure` | Workflow-task execution failure such as replay errors, determinism violations, or invalid command shapes. |
 | Internal | `internal` | Server or infrastructure failure (database, queue, worker crash). |
-| Structural Limit | `structural_limit` | Failure caused by exceeding a structural limit (payload size, pending fan-out count, command batch size, metadata size). See [Structural Limits](./constraints/structural-limits.md). |
+| Structural Limit | `structural_limit` | Failure caused by exceeding a structural limit (payload size, pending fan-out count, command batch size, metadata size, history transaction size). See [Structural Limits](./constraints/structural-limits.md). |
 
 The category is determined automatically when the failure is recorded:
 
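An automatic classification like the one in the table above could be sketched as a type-based mapping. The category strings come from the table; the exception class names here are purely hypothetical stand-ins, not the engine's real exception hierarchy:

```php
<?php
// Hypothetical exception types, defined only for this sketch.
class StructuralLimitExceeded extends RuntimeException {}
class DeadlineExpired extends RuntimeException {}
class WorkflowTaskFailed extends RuntimeException {}

// Maps a failure to one of the category strings from the table above.
function classifyFailure(Throwable $e): string
{
    return match (true) {
        $e instanceof StructuralLimitExceeded => 'structural_limit',
        $e instanceof DeadlineExpired => 'timeout',
        $e instanceof WorkflowTaskFailed => 'task_failure',
        default => 'internal', // server/infrastructure failures
    };
}

echo classifyFailure(new DeadlineExpired()), "\n"; // timeout
```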
@@ -146,6 +146,44 @@ The retry task records `retry_of_task_id`, `retry_after_attempt_id`, `retry_afte
 
 `Workflow\Exceptions\NonRetryableExceptionContract` still short-circuits the retry policy: throwing a non-retryable exception fails the activity execution immediately and resumes the workflow with the exception.
 
+## Workflow-Level Retry
+
+Durable Workflow v2 does **not** support automatic workflow-level retry. When a workflow run fails — whether from an unhandled exception, a structural limit, or a timeout — the run is terminal. The engine does not automatically start a new run of the same workflow instance.
+
+This is an intentional design choice:
+
+- **Activities already have retry.** Activity retry policies with configurable `$tries`, `backoff()`, and non-retryable exceptions handle transient failures at the right granularity.
+- **Workflow replay is the recovery primitive.** If a workflow task encounters a transient infrastructure failure (database error, worker crash), the durable task system re-dispatches the task, and replay resumes from committed history — no new run needed.
+- **Continue-as-new handles long-lived workflows.** Workflows that need fresh state or history compaction use `continueAsNew()` as an explicit workflow-level restart.
+- **Repair handles stuck runs.** The `repair()` command and automatic worker-loop repair recover runs where durable task transport was lost.
+
+If your application needs workflow-level retry semantics, model them explicitly:
+
+```php
+class RetryableWorkflow extends Workflow
+{
+    public function handle(string $orderId): void
+    {
+        try {
+            activity(ProcessOrderActivity::class, $orderId);
+        } catch (Throwable $e) {
+            // Record the failure; a new workflow instance can then
+            // be started explicitly if retry semantics are needed.
+            activity(NotifyFailureActivity::class, $orderId, $e->getMessage());
+        }
+    }
+}
+```
+
+## Reset (Reserved)
+
+The `reset` operation is reserved for a future release. Reset is distinct from `repair`:
+
+- **Repair** recreates missing durable transport (task rows, execution rows) so a stuck run can resume from its committed history. It does not discard any recorded progress.
+- **Reset** would discard committed progress beyond a chosen history point and re-execute the workflow from that earlier state. This requires careful handling of already-started activities, child workflows, and timers.
+
+Until reset ships, the supported recovery path for a workflow that has made incorrect progress is: fix the underlying issue, deploy the corrected code, and let replay or repair resume the run. For terminal runs, start a new workflow instance with the corrected logic.
+
 ## Task Recovery
 
 The engine separates workflow truth from queue delivery. Durable task rows stay authoritative even if a queue publish is missed or a worker dies after leasing a task.
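The "task rows stay authoritative" idea can be sketched as a lease check on the durable row. This is an illustrative model with hypothetical field names, not the engine's schema: a task that was never leased (missed queue publish) or whose lease expired (worker died) is eligible for re-dispatch regardless of queue state.

```php
<?php
// Illustrative sketch only; field names are hypothetical.
final class TaskRow
{
    public function __construct(
        public readonly string $taskId,
        public ?int $leaseExpiresAt = null, // unix timestamp; null = never leased
        public bool $completed = false,
    ) {
    }

    public function needsRedispatch(int $now): bool
    {
        if ($this->completed) {
            return false;
        }
        // Unleased (queue publish missed) or lease expired (worker died).
        return $this->leaseExpiresAt === null || $this->leaseExpiresAt <= $now;
    }
}

$row = new TaskRow('t-1', leaseExpiresAt: 100);
var_dump($row->needsRedispatch(now: 150)); // bool(true): lease expired
```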
