durable-workflow
diff --git a/docs/architecture/authoring-definition-boundary.md b/docs/architecture/authoring-definition-boundary.md
Lines changed: 12 additions & 3 deletions
diff --git a/docs/architecture/history-budget.md b/docs/architecture/history-budget.md
Lines changed: 77 additions & 0 deletions
diff --git a/docs/workflow/plan.md b/docs/workflow/plan.md
Lines changed: 4 additions & 0 deletions
diff --git a/src/V2/Models/WorkflowRunSummary.php b/src/V2/Models/WorkflowRunSummary.php
Lines changed: 1 addition & 0 deletions
diff --git a/src/V2/Support/HealthCheck.php b/src/V2/Support/HealthCheck.php
Lines changed: 55 additions & 16 deletions
@@ -94,9 +94,18 @@ findings from historical definition drift.
 Workflow code may observe its current history budget without reading storage
 internals. `historyLength()` returns the current history event count,
 `historySize()` returns the serialized history size estimate, and
-`shouldContinueAsNew()` returns the continue-as-new suggestion flag. These are
-advisory authoring signals; workflow code still chooses when to call
-`continueAsNew()`.
+`historyFanOut()` returns the largest parallel-group breadth recorded in this
+run's history. `shouldContinueAsNew()` returns the continue-as-new suggestion
+flag, which is true when any dimension reaches its hard threshold.
+
+The budget is reported as a three-state pressure indicator:
+`historyBudgetPressure()` returns `ok`, `approaching`, or
+`continue_as_new_recommended`. Each dimension (event count, payload size,
+fan-out) has a soft (warning) threshold and a hard (continue-as-new) threshold.
+Pressure becomes `approaching` once any soft threshold is reached and
+`continue_as_new_recommended` once any hard threshold is reached, at which
+point `shouldContinueAsNew()` also returns true. These are advisory authoring
+signals; workflow code still chooses when to call `continueAsNew()`.
 
 ## Activity idempotency surface
 
 
@@ -0,0 +1,77 @@
+# History Budget
+
+Workflow runs accumulate history events across activities, timers, signals,
+updates, side effects, child workflows, and message cursor advances. Without
+explicit budgets, a long-running run can grow until replay cost or persistence
+limits cause failures that are hard to diagnose and cannot be fixed
+retroactively on that run. The history budget contract gives operators and
+workflow code an inspectable signal long before that boundary is reached.
+
+## Dimensions
+
+The budget is computed across three dimensions, each with a soft (warning)
+threshold and a hard (continue-as-new) threshold. Config keys are relative to
+`workflows.v2.history_budget`:
+
+| Dimension | Counter | Default soft | Default hard | Config keys (soft / hard) |
+| --- | --- | --- | --- | --- |
+| Event count | `history_event_count`: number of `workflow_history_events` rows | 8 000 | 10 000 | `.event_warning_threshold` / `.continue_as_new_event_threshold` |
+| Payload size | `history_size_bytes`: serialized size of event types plus JSON payloads | 4 MiB | 5 MiB | `.size_bytes_warning_threshold` / `.continue_as_new_size_bytes_threshold` |
+| Fan-out | `history_fan_out`: largest `parallel_group_size` recorded in any parallel group | 160 | 200 | `.fan_out_warning_threshold` / `.continue_as_new_fan_out_threshold` |
+
+Setting any threshold to `0` disables that dimension. Warning thresholds are
+clamped to at most the corresponding hard threshold, so a misconfigured warning
+can never sit above the point where continue-as-new is already recommended.
+
+## Pressure indicator
+
+Each run has a derived `history_budget_pressure` value with three states:
+
+- `ok`: every dimension is below its soft threshold.
+- `approaching`: at least one dimension is at or above its soft threshold,
+  but no dimension has reached its hard threshold.
+- `continue_as_new_recommended`: at least one dimension is at or above its
+  hard threshold. The `continue_as_new_recommended` boolean is also set to
+  `true` for backward compatibility.
+
+The pressure value is computed from the same counters that drive
+`continue_as_new_recommended`, so operators see the same authoritative signal
+across Waterline, the run detail view, and operator metrics.
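+
+The three-state rule can be sketched as a pure function over the counters.
+`classifyPressure()` and the dimension keys (`event_count`, `size_bytes`,
+`fan_out`) are hypothetical; thresholds use a normalized shape where `null`
+means disabled. The returned `dimensions` list assumes only the dimensions at
+the winning level are reported, which is one plausible reading of
+`history_budget_pressure_dimensions`.
+
+```php
+<?php
+
+declare(strict_types=1);
+
+/**
+ * Hypothetical sketch of the pressure rule.
+ *
+ * @param array<string, int> $counters dimension => current value
+ * @param array<string, array{warning: ?int, hard: ?int}> $thresholds
+ * @return array{pressure: string, dimensions: list<string>}
+ */
+function classifyPressure(array $counters, array $thresholds): array
+{
+    $hardHits = [];
+    $softHits = [];
+
+    foreach ($counters as $dimension => $value) {
+        $warning = $thresholds[$dimension]['warning'] ?? null;
+        $hard = $thresholds[$dimension]['hard'] ?? null;
+
+        // A hard hit outranks a soft hit on the same dimension.
+        if ($hard !== null && $value >= $hard) {
+            $hardHits[] = $dimension;
+        } elseif ($warning !== null && $value >= $warning) {
+            $softHits[] = $dimension;
+        }
+    }
+
+    if ($hardHits !== []) {
+        return ['pressure' => 'continue_as_new_recommended', 'dimensions' => $hardHits];
+    }
+
+    if ($softHits !== []) {
+        return ['pressure' => 'approaching', 'dimensions' => $softHits];
+    }
+
+    return ['pressure' => 'ok', 'dimensions' => []];
+}
+```
+
+With the default thresholds, a run at 8 500 events that also records a
+200-wide parallel group classifies as `continue_as_new_recommended` with
+`fan_out` as the triggering dimension: the hard fan-out hit outranks the soft
+event-count hit.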
+
+## Surfaces
+
+- `Workflow::historyLength()`, `historySize()`, `historyFanOut()`,
+  `historyBudgetPressure()`, and `shouldContinueAsNew()` are advisory
+  authoring signals exposed on the v2 workflow base class.
+- `WorkflowRunSummary` persists `history_event_count`, `history_size_bytes`,
+  `history_fan_out`, `continue_as_new_recommended`, and
+  `history_budget_pressure`. `RunListItemView` and `RunDetailView` project
+  these directly.
+- `RunDetailView` additionally returns the active soft and hard thresholds
+  (`history_event_threshold`, `history_event_warning_threshold`,
+  `history_size_bytes_threshold`, `history_size_bytes_warning_threshold`,
+  `history_fan_out_threshold`, `history_fan_out_warning_threshold`) and the
+  list of dimensions that triggered the current pressure
+  (`history_budget_pressure_dimensions`) so operators can explain *why* a run
+  is approaching the boundary.
+- `OperatorMetrics::history` reports
+  `continue_as_new_recommended_runs`, `approaching_budget_runs`,
+  `max_event_count`, `max_size_bytes`, `max_fan_out`, and the configured
+  thresholds for each dimension.
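+
+The history metrics can be read as a fold over run summaries. The sketch below
+is hypothetical: `aggregateHistoryMetrics()` works on plain arrays shaped like
+`WorkflowRunSummary` attributes, omits the threshold fields, and assumes
+`approaching_budget_runs` counts only runs whose pressure is exactly
+`approaching`. A real implementation could push the same aggregation into SQL.
+
+```php
+<?php
+
+declare(strict_types=1);
+
+/**
+ * Hypothetical aggregation over summary rows.
+ *
+ * @param list<array<string, mixed>> $summaries
+ * @return array<string, int>
+ */
+function aggregateHistoryMetrics(array $summaries): array
+{
+    $metrics = [
+        'continue_as_new_recommended_runs' => 0,
+        'approaching_budget_runs' => 0,
+        'max_event_count' => 0,
+        'max_size_bytes' => 0,
+        'max_fan_out' => 0,
+    ];
+
+    foreach ($summaries as $row) {
+        if (($row['continue_as_new_recommended'] ?? false) === true) {
+            $metrics['continue_as_new_recommended_runs']++;
+        }
+
+        // Assumption: runs already past a hard threshold are not double-counted here.
+        if (($row['history_budget_pressure'] ?? 'ok') === 'approaching') {
+            $metrics['approaching_budget_runs']++;
+        }
+
+        $metrics['max_event_count'] = max($metrics['max_event_count'], (int) ($row['history_event_count'] ?? 0));
+        $metrics['max_size_bytes'] = max($metrics['max_size_bytes'], (int) ($row['history_size_bytes'] ?? 0));
+        $metrics['max_fan_out'] = max($metrics['max_fan_out'], (int) ($row['history_fan_out'] ?? 0));
+    }
+
+    return $metrics;
+}
+```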
+
+## Replay-safety
+
+Counters come straight from frozen history-event payloads. Fan-out is derived
+by taking the maximum `parallel_group_size` across distinct
+`parallel_group_id` values recorded in the run's history events; the value is
+deterministic for a given history and re-derives correctly on any replay.
+Workflow code that branches on `historyBudgetPressure()` or
+`shouldContinueAsNew()` therefore reaches the same decision on the original
+attempt and on every subsequent replay.
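+
+A sketch of the fan-out derivation over already-decoded payloads.
+`deriveFanOut()` is hypothetical, and it assumes grouped events carry
+`parallel_group_id` and `parallel_group_size` keys directly in their payload.
+Because it takes a maximum over distinct groups, the result depends only on the
+recorded history, not on the order in which events are visited.
+
+```php
+<?php
+
+declare(strict_types=1);
+
+/**
+ * Hypothetical sketch: largest parallel_group_size across distinct groups.
+ *
+ * @param list<array<string, mixed>> $payloads
+ */
+function deriveFanOut(array $payloads): int
+{
+    $groupSizes = [];
+
+    foreach ($payloads as $payload) {
+        $groupId = $payload['parallel_group_id'] ?? null;
+        $size = $payload['parallel_group_size'] ?? null;
+
+        // Skip events that are not part of a parallel group.
+        if (! (is_string($groupId) || is_int($groupId)) || ! is_int($size)) {
+            continue;
+        }
+
+        // Several events can share one group; keep one size per distinct id.
+        $groupSizes[$groupId] = max($groupSizes[$groupId] ?? 0, $size);
+    }
+
+    return $groupSizes === [] ? 0 : max($groupSizes);
+}
+```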
+
+## What this contract does *not* cover
+
+- Snapshot or history compaction is intentionally out of scope for the first
+  release. The budget contract ships first so archive can land on top of an
+  inspectable correctness signal before any compaction protocol is committed.
+- Reset semantics (truncating history at a chosen sequence) remain a reserved
+  operator command and are tracked separately in the v2 plan.
@@ -275,6 +275,10 @@ explicitly reserved for a future contract before support is advertised.
   defines first-class deployment lifecycle and rollout blockage.
 - [`docs/architecture/sticky-execution.md`](../architecture/sticky-execution.md)
   defines sticky replay-cache routing and cold-replay fallback.
+- [`docs/architecture/history-budget.md`](../architecture/history-budget.md)
+  defines the soft and hard thresholds for history event count, payload
+  size, and parallel fan-out, and the `history_budget_pressure` indicator
+  derived from the same counters as `continue_as_new_recommended`.
 - [`docs/architecture/workflow-service-calls-architecture.md`](../architecture/workflow-service-calls-architecture.md)
   defines cross-namespace service-call lifecycle and outcome semantics.
 - [`docs/architecture/cross-namespace-service-policy.md`](../architecture/cross-namespace-service-policy.md)
 
@@ -42,6 +42,7 @@ class WorkflowRunSummary extends Model
         'projection_schema_version' => 'integer',
         'history_event_count' => 'integer',
         'history_size_bytes' => 'integer',
+        'history_fan_out' => 'integer',
         'continue_as_new_recommended' => 'bool',
         'started_at' => 'datetime',
         'sort_timestamp' => 'datetime',
 
@@ -5,8 +5,6 @@
 namespace Workflow\V2\Support;
 
 use Carbon\CarbonInterface;
-use Illuminate\Contracts\Cache\Repository as CacheRepository;
-use Illuminate\Support\Facades\App;
 
 final class HealthCheck
 {
@@ -577,21 +575,27 @@ private static function longPollWakeAccelerationCheck(): array
             'reason' => null,
         ];
 
-        $cache = self::resolveCacheRepository();
+        $defaultStore = self::configuredDefaultCacheStore();
+        $configuredDriver = self::configuredCacheDriver($defaultStore);
+
+        if ($configuredDriver === null) {
+            $data['backend'] = null;
+            $data['capable'] = false;
+            $data['safe'] = false;
+            $data['reason'] = 'Cache backend is not resolvable; wake acceleration may be disabled. Durable discovery continues via bounded polling.';
 
-        if ($cache === null) {
             return self::check(
                 'long_poll_wake_acceleration',
                 'warning',
-                'Cache repository is not resolvable; wake acceleration may be disabled. Durable discovery continues via bounded polling.',
+                $data['reason'],
                 self::CATEGORY_ACCELERATION,
                 $data,
             );
         }
 
         $validator = new LongPollCacheValidator();
-        $capability = $validator->validateMultiNodeCapable($cache);
-        $safety = $validator->checkMultiNodeSafety($cache, $multiNode);
+        $capability = $validator->validateMultiNodeCapableFromDriver($configuredDriver);
+        $safety = $validator->checkMultiNodeSafetyFromDriver($configuredDriver, $multiNode);
 
         $data['backend'] = is_string($capability['backend'] ?? null) ? $capability['backend'] : null;
         $data['capable'] = (bool) ($capability['capable'] ?? false);
@@ -621,18 +625,53 @@ private static function longPollWakeAccelerationCheck(): array
         );
     }
 
-    private static function resolveCacheRepository(): ?CacheRepository
+    /**
+     * Read the currently configured default cache store name. The check
+     * deliberately reads `cache.default` (with a fall-through to the older
+     * `cache.driver` alias) every snapshot so operator-visible config is
+     * the source of truth, not a previously-resolved store memoized in the
+     * cache manager.
+     */
+    private static function configuredDefaultCacheStore(): ?string
     {
-        try {
-            // Resolve through the CacheManager so the check reflects the
-            // currently configured default store. The cache.store container
-            // singleton is bound on first access and does not reflect later
-            // changes to cache.default, which drifts from the advertised
-            // backend when operators reconfigure cache at runtime.
-            return App::make('cache')->store();
-        } catch (\Throwable) {
+        $value = config('cache.default') ?? config('cache.driver');
+
+        if (! is_string($value)) {
             return null;
         }
+
+        $normalized = trim($value);
+
+        return $normalized === '' ? null : $normalized;
+    }
+
+    /**
+     * Resolve the driver name configured for the given default cache store.
+     * Falls back to the store name itself when the store is defined but has
+     * no usable `driver` entry, treating the store key as the driver name.
+     */
+    private static function configuredCacheDriver(?string $store): ?string
+    {
+        if ($store === null) {
+            return null;
+        }
+
+        $driver = config(sprintf('cache.stores.%s.driver', $store));
+
+        if (is_string($driver)) {
+            $normalized = trim($driver);
+
+            if ($normalized !== '') {
+                return $normalized;
+            }
+        }
+
+        if (config(sprintf('cache.stores.%s', $store)) !== null) {
+            return $store;
+        }
+
+        return null;
     }
 
     /**