Commit e2498b8

Fail closed incompatible workflow starts (#354)
1 parent 6c5086a commit e2498b8

2 files changed

Lines changed: 28 additions & 0 deletions

docs/architecture/rollout-safety.md

Lines changed: 25 additions & 0 deletions
@@ -229,6 +229,31 @@ Guarantees:
   Scopes with no heartbeats yet and runs without a required
   marker fall back to normal dispatch so the first worker
   heartbeat never races an incoming dispatch.
+- `Workflow\V2\Support\WorkflowStartGate` is the authority on
+  per-call admission for new workflow runs under
+  `DW_V2_FLEET_VALIDATION_MODE=fail`. It refuses to admit a start
+  when the run's resolved connection/queue has at least one
+  active worker heartbeat but none of them advertise the
+  `WorkerCompatibility::current()` marker the producer is about
+  to write onto the run. The start callers
+  (`Workflow\V2\WorkflowStub::attemptStart`,
+  `Workflow\V2\WorkflowStub::attemptSignalWithStart`, and
+  `Workflow\V2\Support\DefaultWorkflowControlPlane::startWorkflow`)
+  consult the gate inside the same transaction that would have
+  created the run, persist a rejected `WorkflowCommand` carrying
+  `CommandOutcome::RejectedCompatibilityBlocked` with
+  `rejection_reason = compatibility_blocked`, and surface the
+  refusal as
+  `Workflow\V2\Exceptions\WorkflowExecutionUnavailableException`
+  with `blockedReason() = compatibility_blocked` so
+  `Workflow\V2\Support\ScheduleManager` can record a `skipped`
+  trigger without creating a run. Scopes with no heartbeats yet
+  and producers with no required marker fall back to normal
+  start admission so the first worker heartbeat never races an
+  incoming start. The rejection is observable through the
+  `runs.compatibility_blocked` and
+  `backlog.oldest_compatibility_blocked_started_at` keys on
+  `OperatorMetrics::snapshot()`.
 - Operators see mixed-build state explicitly through
   `workers.active_worker_scopes` (how many distinct
   namespace/queue/compatibility tuples are live) and through the
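
The admission rule described in the added bullet can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in `WorkflowStartGate`: the function name `admitStart`, its parameters, and the use of plain strings for compatibility markers are all hypothetical, and the sketch assumes that any validation mode other than `fail` admits the start, which the diff does not state.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the per-call admission decision.
 *
 * @param string       $validationMode   value of the fleet validation mode (only 'fail' blocks here)
 * @param list<string> $heartbeatMarkers markers advertised by active workers on the resolved connection/queue
 * @param string|null  $requiredMarker   marker the producer would write onto the run, or null if none
 *
 * @return string|null null when the start is admitted, otherwise the rejection reason
 */
function admitStart(string $validationMode, array $heartbeatMarkers, ?string $requiredMarker): ?string
{
    // Assumption: modes other than 'fail' never refuse a start.
    if ($validationMode !== 'fail') {
        return null;
    }

    // No required marker, or no heartbeats yet: fall back to normal admission
    // so the first worker heartbeat never races an incoming start.
    if ($requiredMarker === null || $heartbeatMarkers === []) {
        return null;
    }

    // At least one active worker advertises the required marker: admit.
    if (in_array($requiredMarker, $heartbeatMarkers, true)) {
        return null;
    }

    // Workers are live on this scope, but none are compatible.
    return 'compatibility_blocked';
}

// Usage: a scope whose only live worker runs a different build is refused.
var_dump(admitStart('fail', ['build-a'], 'build-b'));
```

Returning the reason as a nullable string keeps the sketch short. Per the bullet, the real callers persist a rejected `WorkflowCommand` and throw `WorkflowExecutionUnavailableException` instead.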

tests/Unit/V2/RolloutSafetyDocumentationTest.php

Lines changed: 3 additions & 0 deletions
@@ -84,6 +84,9 @@ final class RolloutSafetyDocumentationTest extends TestCase
         'StructuralLimits',
         'WorkerProtocolVersionResolver',
         'ControlPlaneVersionResolver',
+        'WorkflowStartGate',
+        'WorkflowExecutionUnavailableException',
+        'ScheduleManager',
     ];
 
     private const REQUIRED_HTTP_ROUTES = [
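
The three added entries suggest that the test asserts each listed name appears somewhere in `docs/architecture/rollout-safety.md`. The test body is not part of this diff, so the following is only a sketch of that kind of check. The helper `missingNames` is hypothetical, and the substring matching is an assumption about how the real test compares names against the document.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the required names that the document never mentions.
 *
 * @param string       $documentText  full text of the documentation file
 * @param list<string> $requiredNames class or symbol names the document must mention
 *
 * @return list<string> names that do not occur in the text, in their original order
 */
function missingNames(string $documentText, array $requiredNames): array
{
    // Keep only names that are absent, then reindex so the result is a plain list.
    return array_values(array_filter(
        $requiredNames,
        static fn (string $name): bool => !str_contains($documentText, $name)
    ));
}

// Usage: a document that mentions the gate but not the schedule manager.
$text = '`Workflow\V2\Support\WorkflowStartGate` is the authority on per-call admission.';
var_dump(missingNames($text, ['WorkflowStartGate', 'ScheduleManager']));
```

A check of this shape fails when a documented class is renamed or dropped from the prose, which would explain why the documentation and the name list change in the same commit.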
