When a workflow upserts memo entries via `upsertMemo()`, the executor merges the new entries into the existing memo map, then JSON-encodes the merged result and checks the byte length against `memo_size_bytes`. If the merged memo exceeds the limit, the run fails before the memo is persisted.
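The merge-then-measure semantics can be sketched in plain PHP. The `$limit` value and the payload contents below are illustrative, not engine defaults:

```php
// Illustrative memo size check: the limit applies to the merged map,
// not to the size of the individual upsert.
$limit = 2048; // stand-in for the configured memo_size_bytes

$existing = ['orderId' => 'ord_123', 'notes' => str_repeat('x', 1500)];
$upsert   = ['notes' => str_repeat('y', 900)]; // small on its own

// New entries overwrite existing keys, then the merged result is encoded.
$merged = array_merge($existing, $upsert);
$size   = strlen(json_encode($merged));

if ($size > $limit) {
    // The run fails before the memo is persisted.
    throw new RuntimeException("memo exceeds memo_size_bytes: {$size} > {$limit}");
}
```

Note that the upsert replaces the large `notes` entry rather than adding to it, so the merged result can be smaller than the sum of the parts.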
### History transaction size
Each workflow task execution (a single "turn" of replay and forward progress) may produce new history events — activity scheduling, timer creation, side-effect recording, search-attribute upserts, and so on. The `history_transaction_size` limit caps the total number of new events a single task can produce.
This catches runaway loops that create unbounded events in a single task without yielding control:
```php
// If a workflow schedules thousands of operations in one task,
// the history transaction limit prevents the task from growing
// without bound. Process large batches in bounded chunks instead.
// ($items and process() are illustrative stand-ins for your batch.)
foreach (array_chunk($items, 100) as $chunk) {
    $calls = [];
    foreach ($chunk as $item) {
        $calls[] = $this->process($item);
    }

    yield all($calls); // Each chunk is a separate task execution
}
```
The check runs at the top of each iteration of the executor's main loop. Events created during replay (reading existing history) do not count toward the limit — only new events written during the current task contribute.
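A minimal sketch of that per-iteration check; the counter, command list, and tiny limit are illustrative, not the engine's internals:

```php
// Tiny limit for demonstration; real deployments configure
// history_transaction_size much higher.
$limit = 3;

$newEvents = 0;   // only events written during this task count
$commands  = ['timer', 'activity', 'activity', 'activity'];
$failed    = false;

foreach ($commands as $command) {
    // The check runs at the top of each iteration, before the
    // next event is recorded.
    if ($newEvents >= $limit) {
        $failed = true; // the run fails; the event is never persisted
        break;
    }
    $newEvents++; // record one new history event for this command
}
```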
### Search attribute size
When a workflow upserts search attributes via `upsertSearchAttributes()`, the executor merges the new attributes into the existing set, then JSON-encodes the merged result and checks the byte length against `search_attribute_size_bytes`. If the merged attributes exceed the limit, the run fails before the attributes are persisted.
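Because the limit is measured against the merged set, a small upsert can still push a run over the line. One hedged pre-check is to compute the merge locally before upserting; the limit value and attribute names here are illustrative:

```php
$limit = 1024; // stand-in for the configured search_attribute_size_bytes

$current = ['status' => 'processing', 'region' => 'eu-west'];
$patch   = ['status' => 'shipped', 'carrier' => 'dhl'];

// Project the merged set the executor would measure: new attributes
// overwrite existing keys, then the whole map is JSON-encoded.
$merged      = array_merge($current, $patch);
$projected   = strlen(json_encode($merged));
$withinLimit = $projected <= $limit;
```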
---

`docs/failures-and-recovery.md`
Every failure recorded by the engine carries a `failure_category` that classifies its cause:
| Category | Value | Description |
| --- | --- | --- |
| Timeout | `timeout` | Failure caused by a timeout expiration — enforced by the engine when a workflow execution or run deadline passes. |
| Task Failure | `task_failure` | Workflow-task execution failure such as replay errors, determinism violations, or invalid command shapes. |
| Internal | `internal` | Server or infrastructure failure (database, queue, worker crash). |
| Structural Limit | `structural_limit` | Failure caused by exceeding a structural limit (payload size, pending fan-out count, command batch size, metadata size, history transaction size). See [Structural Limits](./constraints/structural-limits.md). |
The category is determined automatically when the failure is recorded:
`Workflow\Exceptions\NonRetryableExceptionContract` still short-circuits the retry policy: throwing a non-retryable exception fails the activity execution immediately and resumes the workflow with the exception.
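A sketch of that short-circuit. The interface below is a local stand-in so the snippet is self-contained; in an application you would implement the engine's `Workflow\Exceptions\NonRetryableExceptionContract` instead, and the exception and function names are illustrative:

```php
// Local stand-in for Workflow\Exceptions\NonRetryableExceptionContract.
interface NonRetryableExceptionContract {}

// Throwing this from an activity fails the execution immediately,
// regardless of remaining $tries.
class PaymentDeclinedException extends RuntimeException implements NonRetryableExceptionContract {}

// The retry decision, sketched: non-retryable exceptions short-circuit
// the policy; everything else retries until $tries is exhausted.
function shouldRetry(Throwable $e, int $attempt, int $tries): bool
{
    if ($e instanceof NonRetryableExceptionContract) {
        return false;
    }

    return $attempt < $tries;
}
```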
## Workflow-Level Retry
Durable Workflow v2 does **not** support automatic workflow-level retry. When a workflow run fails — whether from an unhandled exception, a structural limit, or a timeout — the run is terminal. The engine does not automatically start a new run of the same workflow instance.
This is an intentional design choice:
- **Activities already have retry.** Activity retry policies with configurable `$tries`, `backoff()`, and non-retryable exceptions handle transient failures at the right granularity.
- **Workflow replay is the recovery primitive.** If a workflow task encounters a transient infrastructure failure (database error, worker crash), the durable task system re-dispatches the task, and replay resumes from committed history — no new run needed.
- **Continue-as-new handles long-lived workflows.** Workflows that need fresh state or history compaction use `continueAsNew()` as an explicit workflow-level restart.
- **Repair handles stuck runs.** The `repair()` command and automatic worker-loop repair recover runs where durable task transport was lost.
If your application needs workflow-level retry semantics, model them explicitly:
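One common pattern, sketched with illustrative names (`$runOnce`, `$attempt`, and `$maxAttempts` are not engine API): catch the failure inside the workflow and decide explicitly whether to restart via `continueAsNew()` or let the run fail terminally.

```php
// Hedged sketch of explicit workflow-level retry. In a real workflow the
// 'continue_as_new' branch would call $this->continueAsNew(...) (or your
// engine's equivalent) with the incremented attempt count.
function decideAfterFailure(callable $runOnce, int $attempt, int $maxAttempts): string
{
    try {
        $runOnce();
        return 'completed';
    } catch (Throwable $e) {
        if ($attempt >= $maxAttempts) {
            return 'failed'; // exhausted: let the run end terminally
        }

        return 'continue_as_new'; // restart with $attempt + 1
    }
}
```

Because each restart is a fresh run with fresh history, this pattern also sidesteps unbounded history growth across attempts.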
## Reset

The `reset` operation is reserved for a future release. Reset is distinct from `repair`:
- **Repair** recreates missing durable transport (task rows, execution rows) so a stuck run can resume from its committed history. It does not discard any recorded progress.
- **Reset** would discard committed progress beyond a chosen history point and re-execute the workflow from that earlier state. This requires careful handling of already-started activities, child workflows, and timers.
Until reset ships, the supported recovery path for a workflow that has made incorrect progress is: fix the underlying issue, deploy the corrected code, and let replay or repair resume the run. For terminal runs, start a new workflow instance with the corrected logic.
## Task Recovery
The engine separates workflow truth from queue delivery. Durable task rows stay authoritative even if a queue publish is missed or a worker dies after leasing a task.