# History Budget

Workflow runs accumulate history events across activities, timers, signals,
updates, side effects, child workflows, and message cursor advances. Without
explicit budgets, long-running runs can grow until replay cost or persistence
limits cause failures that are hard to diagnose and impossible to fix
retroactively on an individual run. The history budget contract gives operators
and workflow code an inspectable signal long before that boundary is reached.

## Dimensions

The budget is computed across three dimensions, each with a soft (warning)
threshold and a hard (continue-as-new) threshold:

| Dimension | Counter | Default soft | Default hard | Config keys (soft / hard) |
| --- | --- | --- | --- | --- |
| Event count | `history_event_count`: number of `workflow_history_events` rows | 8,000 | 10,000 | `workflows.v2.history_budget.event_warning_threshold` / `.continue_as_new_event_threshold` |
| Payload size | `history_size_bytes`: serialized event type plus JSON payload size, in bytes | 4 MiB | 5 MiB | `.size_bytes_warning_threshold` / `.continue_as_new_size_bytes_threshold` |
| Fan-out | `history_fan_out`: largest `parallel_group_size` recorded in any parallel group | 160 | 200 | `.fan_out_warning_threshold` / `.continue_as_new_fan_out_threshold` |

Keys that begin with `.` are relative to `workflows.v2.history_budget`.

Setting any threshold to `0` disables that dimension. Warning thresholds are
clamped to the corresponding hard threshold, so a misconfigured warning cannot
fire after continue-as-new has already been recommended.
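
The disable and clamp rules can be sketched as follows. This is a minimal Python illustration, not the library's implementation: the short dimension keys (`event`, `size_bytes`, `fan_out`), the `DimensionThresholds` type, and the `(warning, hard)` override shape are hypothetical. It also assumes the literal reading that a `0` in *either* threshold disables the whole dimension, and that clamping only applies when both thresholds are positive.

```python
from dataclasses import dataclass

# Defaults from the table (4 MiB = 4 * 1024 * 1024 bytes); keys are hypothetical short names.
DEFAULTS: dict[str, tuple[int, int]] = {
    "event": (8_000, 10_000),
    "size_bytes": (4 * 1024 * 1024, 5 * 1024 * 1024),
    "fan_out": (160, 200),
}


@dataclass(frozen=True)
class DimensionThresholds:
    warning: int  # soft threshold
    hard: int     # continue-as-new threshold

    @property
    def enabled(self) -> bool:
        # Literal reading of the contract: a 0 in either threshold disables the dimension.
        return self.warning > 0 and self.hard > 0


def resolve(warning: int, hard: int) -> DimensionThresholds:
    """Clamp the warning so it can never sit above the hard threshold."""
    if warning > 0 and hard > 0:
        warning = min(warning, hard)
    return DimensionThresholds(warning=warning, hard=hard)


def resolve_all(overrides: dict[str, tuple[int, int]]) -> dict[str, DimensionThresholds]:
    """Merge operator overrides (name -> (warning, hard)) over the defaults."""
    merged = {**DEFAULTS, **overrides}
    return {name: resolve(warning, hard) for name, (warning, hard) in merged.items()}


# A warning above its hard threshold is clamped to 10,000; a zero threshold disables fan-out.
thresholds = resolve_all({"event": (12_000, 10_000), "fan_out": (0, 200)})
```

Clamping at resolution time, rather than at every comparison, keeps the later pressure check a plain pair of `>=` tests.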

## Pressure indicator

Each run has a derived `history_budget_pressure` value with three states:

- `ok`: every enabled dimension is below its soft threshold.
- `approaching`: at least one dimension is at or above its soft threshold,
  but no dimension has reached its hard threshold.
- `continue_as_new_recommended`: at least one dimension is at or above its
  hard threshold. For backward compatibility, this state is also surfaced as
  the boolean `continue_as_new_recommended=true`.

The pressure value is computed from the same counters that drive
`continue_as_new_recommended`, so operators see the same authoritative signal
in waterline, the run detail view, and operator metrics.
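
The three-state rule can be sketched as a single pass over the dimensions. This Python sketch is illustrative only. It assumes thresholds have already been clamped and that a dimension with a zero threshold is skipped. It also makes one hypothetical choice: the reported dimensions are those that reached the threshold of the *current* state, standing in for `history_budget_pressure_dimensions`. The `Thresholds` and `Pressure` types are invented for the example.

```python
from typing import Mapping, NamedTuple


class Thresholds(NamedTuple):
    warning: int  # soft threshold; 0 disables the dimension
    hard: int     # continue-as-new threshold; 0 disables the dimension


class Pressure(NamedTuple):
    state: str                   # "ok" | "approaching" | "continue_as_new_recommended"
    dimensions: tuple[str, ...]  # dimensions that reached the current state's threshold
    continue_as_new_recommended: bool  # backward-compatible boolean


def classify(counters: Mapping[str, int], thresholds: Mapping[str, Thresholds]) -> Pressure:
    hard_hits: list[str] = []
    soft_hits: list[str] = []
    for name, limit in thresholds.items():
        if limit.warning <= 0 or limit.hard <= 0:
            continue  # a disabled dimension never contributes pressure
        value = counters.get(name, 0)
        if value >= limit.hard:
            hard_hits.append(name)
        elif value >= limit.warning:
            soft_hits.append(name)
    # Any hard hit wins over soft hits, mirroring "at least one dimension ... hard".
    if hard_hits:
        return Pressure("continue_as_new_recommended", tuple(hard_hits), True)
    if soft_hits:
        return Pressure("approaching", tuple(soft_hits), False)
    return Pressure("ok", (), False)
```

Deriving the boolean from the same pass is what keeps it from disagreeing with the three-state value.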

## Surfaces

- `Workflow::historyLength()`, `historySize()`, `historyFanOut()`,
  `historyBudgetPressure()`, and `shouldContinueAsNew()` are advisory
  authoring signals exposed on the v2 workflow base class.
- `WorkflowRunSummary` persists `history_event_count`, `history_size_bytes`,
  `history_fan_out`, `continue_as_new_recommended`, and
  `history_budget_pressure`. `RunListItemView` and `RunDetailView` project
  these directly.
- `RunDetailView` additionally returns the active soft and hard thresholds
  (`history_event_threshold`, `history_event_warning_threshold`,
  `history_size_bytes_threshold`, `history_size_bytes_warning_threshold`,
  `history_fan_out_threshold`, `history_fan_out_warning_threshold`) and the
  list of dimensions that triggered the current pressure
  (`history_budget_pressure_dimensions`), so operators can explain *why* a run
  is approaching the boundary.
- `OperatorMetrics::history` reports `continue_as_new_recommended_runs`,
  `approaching_budget_runs`, `max_event_count`, `max_size_bytes`,
  `max_fan_out`, and the configured thresholds for each dimension.
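
The `OperatorMetrics::history` aggregates can be pictured as a fold over persisted run summaries. This Python sketch reuses the field and metric names listed above, but the `RunSummary` type and the `history_metrics` function are hypothetical. It also assumes `approaching_budget_runs` counts only runs in the `approaching` state, excluding runs that already have continue-as-new recommended. The configured thresholds, which the real report also includes, are omitted here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RunSummary:
    # Mirrors the persisted WorkflowRunSummary columns named above.
    history_event_count: int
    history_size_bytes: int
    history_fan_out: int
    history_budget_pressure: str  # "ok" | "approaching" | "continue_as_new_recommended"


def history_metrics(runs: Iterable[RunSummary]) -> dict[str, int]:
    metrics = {
        "continue_as_new_recommended_runs": 0,
        "approaching_budget_runs": 0,
        "max_event_count": 0,
        "max_size_bytes": 0,
        "max_fan_out": 0,
    }
    for run in runs:
        # Assumption: each run is counted under exactly one pressure bucket.
        if run.history_budget_pressure == "continue_as_new_recommended":
            metrics["continue_as_new_recommended_runs"] += 1
        elif run.history_budget_pressure == "approaching":
            metrics["approaching_budget_runs"] += 1
        metrics["max_event_count"] = max(metrics["max_event_count"], run.history_event_count)
        metrics["max_size_bytes"] = max(metrics["max_size_bytes"], run.history_size_bytes)
        metrics["max_fan_out"] = max(metrics["max_fan_out"], run.history_fan_out)
    return metrics
```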

## Replay-safety

Counters come straight from frozen history-event payloads. Fan-out is derived
by taking the maximum `parallel_group_size` across the distinct
`parallel_group_id` values recorded in the run's history events, so the value
is deterministic for a given history and re-derives identically on any replay.
Workflow code that branches on `historyBudgetPressure()` or
`shouldContinueAsNew()` therefore reaches the same decision on the original
attempt and on every subsequent replay.
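
Deriving all three counters can be sketched as a pure function of the recorded events. This Python sketch assumes each event stores its payload as frozen JSON text (`payload_json`). Under that assumption, size is the UTF-8 byte length of the event type plus that stored text, with nothing re-encoded. The `HistoryEvent` type and the event type names in the usage are hypothetical. The sketch also assumes that, during replay, the counters are evaluated over the events replayed so far, which is what makes a prefix give the same answer every time.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryEvent:
    event_type: str
    payload_json: str  # frozen, already-serialized payload as stored


def derive_counters(events: list[HistoryEvent]) -> dict[str, int]:
    event_count = 0
    size_bytes = 0
    group_sizes: dict[str, int] = {}  # parallel_group_id -> largest recorded size
    for event in events:
        event_count += 1
        # Size uses the stored bytes only, so it cannot drift between replays.
        size_bytes += len(event.event_type.encode("utf-8")) + len(event.payload_json.encode("utf-8"))
        payload = json.loads(event.payload_json)
        if isinstance(payload, dict):
            group_id = payload.get("parallel_group_id")
            group_size = payload.get("parallel_group_size")
            if group_id is not None and isinstance(group_size, int):
                group_sizes[group_id] = max(group_sizes.get(group_id, 0), group_size)
    return {
        "history_event_count": event_count,
        "history_size_bytes": size_bytes,
        "history_fan_out": max(group_sizes.values(), default=0),
    }


# Hypothetical history: two activities in group g1 (size 2), one in g2 (size 5).
events = [
    HistoryEvent("WorkflowStarted", "{}"),
    HistoryEvent("ActivityScheduled", '{"parallel_group_id":"g1","parallel_group_size":2}'),
    HistoryEvent("ActivityScheduled", '{"parallel_group_id":"g1","parallel_group_size":2}'),
    HistoryEvent("ActivityScheduled", '{"parallel_group_id":"g2","parallel_group_size":5}'),
]
```

Because the function reads nothing but the event list, replaying the same prefix always yields the same counters. For example, `derive_counters(events[:3])` reports a fan-out of 2, and the full list reports 5.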

## What this contract does *not* cover

- Snapshot and history compaction are intentionally out of scope for the first
  release. The budget contract ships first so that archive support can land on
  top of an inspectable correctness signal before any compaction protocol is
  committed.
- Reset semantics (truncating history at a chosen sequence) remain a reserved
  operator command and are tracked separately in the v2 plan.