diff --git a/docs/content/deep-dives/compiler-magic-swc-plugin-blog.mdx b/docs/content/deep-dives/compiler-magic-swc-plugin-blog.mdx
new file mode 100644
index 0000000000..80a314d9d7
--- /dev/null
+++ b/docs/content/deep-dives/compiler-magic-swc-plugin-blog.mdx
@@ -0,0 +1,340 @@
+---
+title: "One File, Three Bundles — How the Workflow DevKit Compiler Splits Your Code"
+description: A developer-facing deep dive into the SWC compiler plugin that transforms a single workflow source file into three execution targets — step, workflow, and client — with stable IDs, dead code elimination, and build-time validation.
+type: conceptual
+summary: The Workflow DevKit SWC plugin takes a single source file containing directives and produces three outputs. Step mode preserves function bodies for execution. Workflow mode replaces step bodies with WORKFLOW_USE_STEP proxy calls for deterministic replay. Client mode attaches workflow IDs while preventing direct execution. Stable IDs are derived from module path and function name. A dead code elimination pass removes unreachable imports after body replacement.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/code-transform
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/durable-streaming-reference
+---
+
+Most durable execution frameworks ask you to do one of two things: either write your code in a special DSL that the runtime understands, or manually separate your orchestration logic from your side effects into different files with explicit registrations.
+
+Workflow DevKit takes a different approach. You write one file. You add directives — `"use step"` and `"use workflow"` — to your functions. The compiler does the rest.
+
+Behind those two simple strings is an SWC compiler plugin that produces three distinct JavaScript bundles from the same source, each serving a different execution context. This article walks through exactly how that transformation works, what each mode produces, and why the design decisions matter.
+
+Here's the example we'll follow throughout this article — a user signup workflow that creates a user, sends emails, waits for a webhook, and returns:
+
+```ts
+export async function handleUserSignup(email: string) {
+ 'use workflow';
+
+ const user = await createUser(email);
+ await sendWelcomeEmail(user);
+ await sleep('5s');
+ const webhook = createWebhook();
+ await sendOnboardingEmail(user, webhook.url);
+ await webhook;
+ console.log('Webhook Resolved');
+ return { userId: user.id, status: 'onboarded' };
+}
+
+async function createUser(email: string) {
+ 'use step';
+ console.log(`Creating a new user with email: ${email}`);
+ return { id: crypto.randomUUID(), email };
+}
+```
+
+This single file contains both the orchestration logic (`handleUserSignup`) and a side-effecting step (`createUser`). The compiler splits them apart.
+
+## The Three Modes
+
+The plugin runs three separate passes over each workflow file:
+
+| Mode | Who consumes it | What happens to step bodies | What happens to workflow bodies |
+|------|----------------|-----------------------------|-------------------------------|
+| **Step** | Step execution runtime | Preserved intact | Replaced with error throws |
+| **Workflow** | Sandboxed VM | Replaced with `WORKFLOW_USE_STEP` proxy calls | Preserved intact |
+| **Client** | Your application code | Preserved (with `stepId` attached) | Replaced with error throws |
+
+Each mode answers a different question:
+- Step mode: "What code should run when a step executes?"
+- Workflow mode: "What code should run inside the deterministic replay VM?"
+- Client mode: "How does application code reference workflows without executing them?"
+
+Let's see what each mode produces from the `handleUserSignup` source.
+
+## Step Mode: Preserve and Register
+
+Step mode keeps step function bodies exactly as written — they need full Node.js access to call APIs, query databases, and perform side effects. The plugin strips the directive, appends a `registerStepFunction()` call, and replaces workflow bodies with error stubs:
+
+```ts
+// Step mode output
+import { registerStepFunction } from "workflow/internal/private";
+
+export async function createUser(email) {
+  // Body preserved exactly as written
+  console.log(`Creating a new user with email: ${email}`);
+  return { id: crypto.randomUUID(), email };
+}
+registerStepFunction("step//./workflows/user//createUser", createUser);
+
+// Workflow functions throw — they belong in the flow bundle
+export async function handleUserSignup(email) {
+ throw new Error(
+ "You attempted to execute workflow handleUserSignup function directly. " +
+ "To start a workflow, use start(handleUserSignup) from workflow/api"
+ );
+}
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup";
+```
+
+The `registerStepFunction` call links the step ID to its implementation so the step execution runtime can look it up when a queue message arrives. The step ID is the same in all three modes — we'll see how it's generated shortly.
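+
+As a mental model, `registerStepFunction` can be thought of as populating a module-level map from step ID to implementation. The sketch below is an assumption about the shape of that registry, not the actual `workflow/internal/private` code:
+
+```ts
+// Hypothetical sketch of the registry behind registerStepFunction;
+// the real workflow/internal/private module is not shown here.
+type StepFn = (...args: unknown[]) => Promise<unknown>;
+
+const stepRegistry = new Map<string, StepFn>();
+
+function registerStepFunction(stepId: string, fn: StepFn): void {
+  stepRegistry.set(stepId, fn);
+}
+
+// When a step queue message arrives, the runtime maps the ID back to code.
+function lookupStep(stepId: string): StepFn {
+  const fn = stepRegistry.get(stepId);
+  if (!fn) throw new Error(`No step registered for ID: ${stepId}`);
+  return fn;
+}
+
+registerStepFunction("step//./workflows/user//createUser", async (email) => ({
+  id: "stub",
+  email,
+}));
+```
+
+Because registration happens at module load, importing the step bundle is enough to make every step in it resolvable by ID.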
+
+## Workflow Mode: The Central Transformation
+
+This is where the magic happens. Workflow mode replaces step function bodies with proxy calls through `globalThis[Symbol.for("WORKFLOW_USE_STEP")]` — a well-known symbol that the runtime binds to the `useStep` function inside the sandboxed VM:
+
+```ts
+// Workflow mode output
+export var createUser = globalThis[Symbol.for("WORKFLOW_USE_STEP")](
+ "step//./workflows/user//createUser"
+);
+
+// Workflow body preserved — deterministic orchestration
+export async function handleUserSignup(email) {
+  const user = await createUser(email);
+  // ...sendWelcomeEmail, sleep, webhook, and onboarding steps as in the source...
+  return { userId: user.id, status: 'onboarded' };
+}
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup";
+globalThis.__private_workflows.set(
+ "workflow//./workflows/user//handleUserSignup",
+ handleUserSignup
+);
+```
+
+When the workflow VM calls `await createUser(email)`, it's actually calling through the `WORKFLOW_USE_STEP` proxy. That proxy consults the `EventsConsumer`: if a matching `step_completed` event exists in the log, it returns the cached result. If not, it adds the step to the pending invocations queue and throws a `WorkflowSuspension` to exit the VM.
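+
+That decision logic can be sketched as follows. This is a simplified model: `completedEvents` and `pendingSteps` are invented stand-ins for the `EventsConsumer` and the pending invocations queue, and the real proxy resolves through the VM's promise machinery rather than throwing synchronously.
+
+```ts
+// Simplified model of the WORKFLOW_USE_STEP proxy; names below are
+// stand-ins, not the runtime's actual internals.
+class WorkflowSuspension extends Error {}
+
+const completedEvents = new Map<string, unknown>(); // stepId -> cached result
+const pendingSteps: string[] = [];
+
+function useStep(stepId: string) {
+  return (..._args: unknown[]): Promise<unknown> => {
+    if (completedEvents.has(stepId)) {
+      // Replay hit: return the result recorded by a previous execution.
+      return Promise.resolve(completedEvents.get(stepId));
+    }
+    // First encounter: schedule the step, then unwind out of the VM.
+    pendingSteps.push(stepId);
+    throw new WorkflowSuspension(stepId);
+  };
+}
+```
+
+On the first pass the call suspends; once a `step_completed` result is in the log, the same call site resolves with the cached value.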
+
+The transformation is why workflows are deterministic. The original step body — the one that calls `crypto.randomUUID()` — is gone. In its place is a proxy that returns cached results from the **Event log**. Same inputs, same outputs, every replay.
+
+
+The **Workflow bundle** omits step bodies because replay must never re-run side effects. The VM already supplies deterministic randomness and fixed time; excluding step code from the bundle entirely is what actually guarantees side-effect isolation.
+
+
+The `__private_workflows.set()` call registers the workflow function so the runtime can find it by ID when a **Queue message** arrives.
+
+### What about the runtime side?
+
+The `WORKFLOW_USE_STEP` symbol is injected in `packages/core/src/workflow.ts`:
+
+```ts
+// From packages/core/src/workflow.ts
+const useStep = createUseStep(workflowContext);
+// @ts-expect-error — `@types/node` says symbol is not valid, but it does work
+vmGlobalThis[WORKFLOW_USE_STEP] = useStep;
+```
+
+The `createUseStep` function in `packages/core/src/step.ts` returns a factory that, given a step ID, produces a function. That function subscribes to the `EventsConsumer`, checks for cached results, and either resolves the promise or triggers suspension.
+
+## Client Mode: Safe References
+
+Client mode ensures application code can reference workflows (to pass to `start()`) without accidentally executing them:
+
+```ts
+// Client mode output
+export async function handleUserSignup(email) {
+ throw new Error(
+ "You attempted to execute workflow handleUserSignup function directly. " +
+ "To start a workflow, use start(handleUserSignup) from workflow/api"
+ );
+}
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup";
+```
+
+The `workflowId` property is what `start()` reads to know which workflow to launch. The error stub prevents the common mistake of calling `await handleUserSignup(email)` directly — which would bypass the entire durable execution model.
+
+Step functions in client mode keep their bodies intact (for local testing) and have a `stepId` property attached directly. Unlike step mode, client mode doesn't import `registerStepFunction` — that module has server-side dependencies that shouldn't appear in client bundles.
+
+## Stable ID Generation
+
+The compiler generates deterministic IDs from two inputs: the module path and the function name. The pattern is:
+
+```
+{type}//{modulePath}//{identifier}
+```
+
+Where:
+- **type** is `workflow`, `step`, or `class`
+- **modulePath** is a relative path prefixed with `./` (e.g., `./workflows/user-signup`) with the file extension stripped; when a versioned module specifier is configured, that specifier is used instead (e.g., `@myorg/tasks@2.0.0`)
+- **identifier** is the function name, with nested functions using `/` separators and class members using `.` (static) or `#` (instance)
+
+Examples:
+```
+workflow//./workflows/user-signup//handleUserSignup
+step//./workflows/user-signup//createUser
+step//./src/jobs/order//processOrder/innerStep (nested step)
+step//./src/jobs/order//MyClass.staticMethod (static method)
+step//./src/jobs/order//MyClass#instanceMethod (instance method)
+```
+
+IDs are stable across builds and deployments — they only change if you rename a file or function. The same ID is generated in all three modes, so step mode's `registerStepFunction()`, workflow mode's `WORKFLOW_USE_STEP` proxy, and client mode's `workflowId`/`stepId` all agree on the identity of each function.
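+
+The naming rule is mechanical enough to express as a small helper. This is an illustrative re-implementation, not the plugin's code (the real plugin derives IDs from the AST and bundler configuration, and the extension list here is an assumption):
+
+```ts
+// Illustrative re-implementation of the {type}//{modulePath}//{identifier}
+// rule; the stripped-extension list is an assumption.
+type IdType = "workflow" | "step" | "class";
+
+function makeId(type: IdType, modulePath: string, identifier: string): string {
+  // File extensions are stripped; the leading "./" prefix is kept.
+  const cleaned = modulePath.replace(/\.(tsx?|jsx?|mjs|cjs)$/, "");
+  return `${type}//${cleaned}//${identifier}`;
+}
+```
+
+For example, `makeId("step", "./workflows/user-signup.ts", "createUser")` yields `step//./workflows/user-signup//createUser`, matching the examples above.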
+
+This stability is what makes durable execution work across deploys. A workflow that suspended before a deploy resumes with new code, but the step IDs in the event log still match the new code's step registrations.
+
+## Nested Steps and Closures
+
+Steps defined inside workflow functions get special treatment. Consider:
+
+```ts
+export async function myWorkflow(config: Config) {
+ "use workflow";
+ let count = 0;
+
+ async function increment() {
+ "use step";
+ count++;
+ return count;
+ }
+
+ return await increment();
+}
+```
+
+In workflow mode, the nested step becomes an inline `WORKFLOW_USE_STEP` call with a closure function:
+
+```ts
+// Workflow mode — nested step with closure
+export async function myWorkflow(config) {
+ let count = 0;
+
+ var increment = globalThis[Symbol.for("WORKFLOW_USE_STEP")](
+ "step//./input//myWorkflow/increment",
+ () => ({ count }) // Closure variables serialized at call time
+ );
+
+ return await increment();
+}
+```
+
+The second argument to `WORKFLOW_USE_STEP` captures the variables the nested step needs. On the step side, the hoisted function retrieves these via `__private_getClosureVars()`. The ID includes the parent function name: `myWorkflow/increment`.
+
+Steps can also nest inside deeply nested object properties. The plugin recursively walks the AST, generating compound path-based IDs like `step//./input//vade/tools/VercelRequest/execute` and hoisting the functions with `$`-separated variable names.
+
+## Dead Code Elimination
+
+After replacing step bodies in workflow mode (or workflow bodies in client mode), the original code's imports and helpers become orphaned. A helper function only called from a step body is now unreachable — the step body was replaced with a proxy call.
+
+The plugin runs a DCE (dead code elimination) pass that removes these unreachable references:
+
+```mermaid
+flowchart TD
+ A["Source: imports axios, calls it in step body"] --> B["Workflow mode: step body → WORKFLOW_USE_STEP proxy"]
+ B --> C["axios import is now unreferenced"]
+ C --> D["DCE pass removes axios import"]
+ D --> E["Workflow bundle doesn't include axios"]
+```
+
+This matters for bundle size and correctness. Without DCE, the workflow bundle would try to import server-side dependencies like database drivers or HTTP clients — modules that don't belong in the sandboxed VM. The DCE pass ensures that only code reachable from the surviving function bodies makes it into the final bundle.
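+
+Conceptually, the pass is mark-and-sweep reachability over a reference graph. The sketch below works on a toy name graph rather than the SWC AST, and the graph contents are invented for illustration:
+
+```ts
+// Toy mark-and-sweep reachability; `refs` maps each declaration to the
+// names it references. The real pass walks the SWC AST, not a name graph.
+function deadCodeEliminate(
+  roots: string[],
+  refs: Map<string, string[]>
+): Set<string> {
+  const live = new Set<string>();
+  const stack = [...roots];
+  while (stack.length > 0) {
+    const name = stack.pop()!;
+    if (live.has(name)) continue;
+    live.add(name);
+    for (const dep of refs.get(name) ?? []) stack.push(dep);
+  }
+  return live; // anything outside `live` can be dropped from the bundle
+}
+
+// After the step body becomes a proxy call, nothing references axios:
+const refs = new Map<string, string[]>([
+  ["handleUserSignup", ["createUser"]],
+  ["createUser", []], // proxy call; the original body's axios use is gone
+  ["fetchProfile", ["axios"]],
+  ["axios", []],
+]);
+const live = deadCodeEliminate(["handleUserSignup"], refs);
+```
+
+Only `handleUserSignup` and `createUser` survive; `fetchProfile` and the `axios` import are pruned from the workflow bundle.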
+
+## Build-Time Validation
+
+The plugin validates directive usage during compilation, catching mistakes before deployment:
+
+| Error | What it catches |
+|-------|----------------|
+| Non-async function | `"use step"` or `"use workflow"` on a synchronous function |
+| Instance methods with `"use workflow"` | Only static methods can be workflows |
+| Misplaced directive | Directive not at the top of the file or start of the function body |
+| Conflicting directives | Both `"use step"` and `"use workflow"` at module level |
+| Invalid exports | Module-level directive files can only export async functions |
+| Misspelled directive | Catches typos like `"use steps"` or `"use workflows"` |
+
+These are build-time errors, not runtime surprises. If you misspell a directive, you find out during `next build`, not at 2 AM when the production workflow fails silently.
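+
+A toy version of two of these checks (misspelled directives and non-async functions) illustrates the idea. The function signature and error messages are invented; the real validation runs on the parsed AST inside the plugin:
+
+```ts
+// Toy directive validation; messages and signature are invented, and the
+// real checks operate on the AST, not on strings.
+const KNOWN_DIRECTIVES = ["use step", "use workflow"];
+
+function checkDirective(directive: string, isAsync: boolean): string[] {
+  const errors: string[] = [];
+  if (!KNOWN_DIRECTIVES.includes(directive)) {
+    // Flag near-misses like "use steps" or "use workflows"
+    if (KNOWN_DIRECTIVES.some((k) => directive.startsWith(k))) {
+      errors.push(`Misspelled directive: "${directive}"`);
+    }
+    return errors;
+  }
+  if (!isAsync) {
+    errors.push(`"${directive}" requires an async function`);
+  }
+  return errors;
+}
+```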
+
+## Before and After: Manual Separation vs. Compiler-Driven
+
+Consider a traditional approach where the developer must manually maintain the separation:
+
+**Manual separation:**
+- Step functions live in one file, workflow logic in another, client references in a third
+- Step IDs are manually defined strings — typos cause silent failures at runtime
+- Renaming a function requires updating the ID in every file that references it
+- Imports must be manually managed to avoid pulling server-side code into client bundles
+- No build-time validation — misconfigurations surface as runtime errors
+
+**Compiler-driven transformation:**
+- One file. Two directives. Three bundles generated automatically
+- IDs are derived from file path and function name — deterministic, stable, no manual management
+- Rename a function and the ID changes everywhere in the next build
+- DCE eliminates orphaned imports from replaced function bodies
+- Build-time validation catches structural errors before deployment
+
+The compiler is the mechanism that makes the programming model feel like writing normal JavaScript while providing the runtime guarantees that durable execution requires.
+
+## Follow one `await` across the system
+
+To understand why the compiler splits one file into three, trace what happens when your application calls `start(handleUserSignup, ['alice@example.com'])`:
+
+### 1. Source transform into three modes
+
+The SWC plugin produces the three bundles described above. The **Client bundle** keeps the `workflowId` property. The **Workflow bundle** replaces `createUser`'s body with a `WORKFLOW_USE_STEP` proxy. The **Step bundle** preserves `createUser`'s body and registers it via `registerStepFunction()`.
+
+### 2. Client stub contributes `workflowId`
+
+Your application imports the **Client bundle**. When `start()` is called, it reads `handleUserSignup.workflowId` — set by the compiler — to identify the workflow. Without this property, `start()` throws a `WorkflowRuntimeError`.
+
+### 3. `start()` queues the workflow
+
+`start()` creates a `run_created` event in the **Event log** and sends a **Queue message** containing the `runId` and trace context to the workflow queue. The **Queue message** is a trigger, not durable state — the **Event log** is the source of truth.
+
+### 4. Workflow VM installs `WORKFLOW_USE_STEP`
+
+The workflow handler loads the **Workflow bundle** into a sandboxed Node.js VM. Before executing any user code, the runtime creates a deterministic context (seeded `Math.random()`, fixed `Date.now()`, deterministic `crypto.getRandomValues()` and `crypto.randomUUID()`) and binds `WORKFLOW_USE_STEP` to the `useStep` proxy:
+
+```ts
+// From packages/core/src/workflow.ts
+const useStep = createUseStep(workflowContext);
+vmGlobalThis[WORKFLOW_USE_STEP] = useStep;
+```
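+
+To see why seeding matters, here is mulberry32, a well-known tiny seeded PRNG, standing in for the runtime's seeded `Math.random` replacement. The runtime's actual generator is not specified here; only the property it relies on is being demonstrated:
+
+```ts
+// mulberry32: a standard small seeded PRNG, used here only to show the
+// determinism property; the runtime's actual generator may differ.
+function mulberry32(seed: number): () => number {
+  let a = seed >>> 0;
+  return () => {
+    a = (a + 0x6d2b79f5) >>> 0;
+    let t = Math.imul(a ^ (a >>> 15), 1 | a);
+    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+
+// Two replays seeded from the same run metadata observe identical values.
+const replayA = mulberry32(42);
+const replayB = mulberry32(42);
+```
+
+Because the seed is derived from the `runId`, workflow name, and start timestamp, every replay of the same run draws the same "random" sequence.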
+
+### 5. Missing `step_completed` causes `WorkflowSuspension`
+
+The workflow code runs. When it hits `await createUser(email)`, the `WORKFLOW_USE_STEP` proxy subscribes to the `EventsConsumer` and looks for a `step_completed` event with a matching correlation ID. On the first invocation, no such event exists — so the proxy adds the step to the pending invocations queue and throws a `WorkflowSuspension` to exit the VM.
+
+### 6. Suspension handler writes `step_created` and queues the step
+
+The suspension handler catches the `WorkflowSuspension`, writes a `step_created` event to the **Event log** (persisting the step's input), and sends a **Queue message** to the step queue. Step input is hydrated from the persisted `step_created` event, not from the queue message.
+
+### 7. Step handler executes real side effects and writes `step_completed`
+
+The step handler receives the **Queue message**, loads the **Step bundle**, looks up `createUser` via `registerStepFunction()`, hydrates the input from the `step_created` event in the **Event log**, and executes the function with full Node.js runtime access. The result is persisted as a `step_completed` event.
+
+### 8. Workflow is queued again and replays with cached result
+
+After writing `step_completed`, the step handler sends a **Queue message** back to the workflow queue. The workflow handler replays the code from the beginning — but this time, when `await createUser(email)` hits the `WORKFLOW_USE_STEP` proxy, the `EventsConsumer` finds the `step_completed` event and returns the cached result. Workflow state is reconstructed by replaying code against the **Event log**. Execution continues to the next step.
+
+The three bundles are placed in predictable locations by the framework integration:
+- **Step bundle**: `.well-known/workflow/v1/step`
+- **Workflow bundle**: `.well-known/workflow/v1/flow`
+- **Client bundle**: your normal import paths
+
+This convention means the runtime can discover and load bundles without any explicit registration or configuration beyond the directives themselves.
+
+## The Full Pipeline
+
+```mermaid
+flowchart TD
+ A["Source file with directives"] --> B["SWC plugin"]
+ B --> C["Step mode"]
+ B --> D["Workflow mode"]
+ B --> E["Client mode"]
+ C --> F["Bodies preserved
registerStepFunction() appended"]
+ D --> G["Step bodies → WORKFLOW_USE_STEP
Workflow bodies preserved"]
+ E --> H["Step bodies preserved with stepId
Workflow bodies → error throw"]
+ G --> I["DCE removes orphaned imports"]
+ E --> I
+ F --> J["step.js → Step execution runtime"]
+ I --> K["flow.js → Sandboxed workflow VM"]
+ I --> L["Client bundle → Application code"]
+```
+
+## Conclusion
+
+The SWC plugin is the bridge between the programming model and the runtime model. It lets you write code that looks and feels like ordinary JavaScript — functions, imports, `async`/`await` — while producing the three specialized bundles that the durable execution runtime needs. Stable IDs ensure consistency across modes. DCE keeps bundles clean. Build-time validation catches errors early.
+
+One file. Two directives. Three bundles. Zero manual plumbing.
diff --git a/docs/content/deep-dives/compiler-magic-swc-plugin-reference.mdx b/docs/content/deep-dives/compiler-magic-swc-plugin-reference.mdx
new file mode 100644
index 0000000000..afdd5003fe
--- /dev/null
+++ b/docs/content/deep-dives/compiler-magic-swc-plugin-reference.mdx
@@ -0,0 +1,452 @@
+---
+title: Compiler Magic (SWC Plugin)
+description: How the Workflow DevKit SWC plugin transforms a single source file into three execution targets — step, workflow, and client — with stable IDs, step proxies, and dead code elimination.
+type: conceptual
+summary: A technical reference covering the three-mode compiler transformation, WORKFLOW_USE_STEP symbol replacement, client-mode error stubs, stable ID generation from module path and function name, nested step handling, and the dead code elimination pass.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/code-transform
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durability-replay-reference
+---
+
+
+The SWC compiler plugin is the mechanism that makes the directive-based programming model work. It takes a single source file containing `"use workflow"` and `"use step"` functions and produces three distinct outputs — one for step execution, one for workflow orchestration, and one for client-side references. Without this transformation, the runtime would have no way to separate side-effecting step bodies from deterministic workflow logic, and developers would need to manually split their code across files and maintain separate ID registries.
+
+
+## Overview
+
+The Workflow DevKit compiler is an SWC plugin that operates in three modes, each producing a different transformation of the same source file:
+
+| Mode | Purpose | Output bundle | Key transformation |
+|----------|--------------------------------------------|------------------------------------|---------------------------------------------------|
+| Step | Bundles step function bodies for execution | `.well-known/workflow/v1/step` | Bodies kept intact, registered via `registerStepFunction()` |
+| Workflow | Bundles orchestration logic for the VM | `.well-known/workflow/v1/flow` | Step bodies replaced with `WORKFLOW_USE_STEP` proxy calls |
+| Client | Provides type-safe workflow references | Your application code | Workflow bodies replaced with error throws, `workflowId` attached |
+
+All three modes emit a JSON manifest comment at the top of the output containing metadata about discovered workflows, steps, and serializable classes. This manifest is consumed by bundlers and the runtime to discover and register functions.
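+
+Purely as an illustration of the mechanism, a consumer could extract such metadata from a leading JSON comment like this. The `workflow-manifest` marker and the field names below are invented; the plugin's real manifest schema is internal:
+
+```ts
+// Invented marker and schema, only to illustrate "metadata in a leading
+// comment"; the plugin's real manifest format is internal.
+interface BundleManifest {
+  workflows: string[];
+  steps: string[];
+}
+
+function parseManifest(source: string): BundleManifest | null {
+  const match = source.match(
+    /^\/\*\s*workflow-manifest\s*(\{[\s\S]*?\})\s*\*\//
+  );
+  return match ? (JSON.parse(match[1]) as BundleManifest) : null;
+}
+```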
+
+## Lifecycle
+
+The following diagram shows how a single source file is transformed into three execution targets:
+
+```mermaid
+flowchart TD
+ A["Source file
'use workflow' + 'use step' directives"] --> B["SWC plugin invoked per mode"]
+ B --> C["Step mode"]
+ B --> D["Workflow mode"]
+ B --> E["Client mode"]
+ C --> F["Step bodies kept intact
registerStepFunction() appended
Workflow bodies → error throw"]
+ D --> G["Step bodies → WORKFLOW_USE_STEP proxy
Workflow bodies intact
Registered via __private_workflows.set()"]
+ E --> H["Step bodies kept intact, stepId attached
Workflow bodies → error throw
workflowId attached"]
+ G --> I["DCE pass removes unreachable code"]
+ E --> I
+ F --> J["step.js bundle"]
+ I --> K["flow.js bundle"]
+ I --> L["Client application code"]
+```
+
+After the workflow-mode and client-mode transforms, the plugin runs a dead code elimination (DCE) pass. In workflow mode, because step bodies are replaced with proxy calls, imports and helper functions that were only referenced from those original step bodies become unreachable and are removed. In client mode, the same pruning applies to code reachable only from workflow bodies that were replaced with error stubs. Exports and identifiers still referenced by the surviving code are preserved.
+
+## Code Walkthrough
+
+### Step mode: register and preserve
+
+In step mode, step function bodies are kept completely intact — they execute with full Node.js runtime access. The plugin strips the directive, appends a `registerStepFunction()` call, and attaches an error-throwing stub to any workflow functions (since workflows should never execute directly in the step bundle):
+
+```ts title="Step mode output (from spec.md)" lineNumbers
+import { registerStepFunction } from "workflow/internal/private";
+
+export async function createUser(email) {
+ // Body preserved exactly as written
+ return { id: crypto.randomUUID(), email };
+}
+registerStepFunction("step//./workflows/user//createUser", createUser);
+
+// Workflow functions throw in step mode — they belong in the flow bundle
+export async function handleUserSignup(email) {
+ throw new Error(
+ "You attempted to execute workflow handleUserSignup function directly. " +
+ "To start a workflow, use start(handleUserSignup) from workflow/api"
+ );
+}
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup";
+```
+
+### Workflow mode: step proxy replacement
+
+This is the central transformation. Step function bodies are replaced with calls through `globalThis[Symbol.for("WORKFLOW_USE_STEP")]` — a well-known symbol bound to the runtime's `useStep` function inside the sandboxed VM. Workflow function bodies are left intact because they contain deterministic orchestration logic that must replay identically:
+
+```ts title="Workflow mode output (from spec.md)" lineNumbers
+// Step body replaced — the runtime proxy checks the event log
+// and either returns a cached result or triggers suspension
+export var createUser = globalThis[Symbol.for("WORKFLOW_USE_STEP")](
+ "step//./workflows/user//createUser"
+);
+
+// Workflow body preserved — deterministic orchestration
+export async function handleUserSignup(email) {
+ const user = await createUser(email);
+ return { userId: user.id };
+}
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup";
+globalThis.__private_workflows.set(
+ "workflow//./workflows/user//handleUserSignup",
+ handleUserSignup
+);
+```
+
+At runtime, when the workflow calls `await createUser(email)`, the `WORKFLOW_USE_STEP` proxy consults the `EventsConsumer`. If a matching `step_completed` event exists in the log, it returns the cached result. If not, it adds the step to the pending invocations queue and eventually throws a `WorkflowSuspension` to exit the VM.
+
+### Client mode: error stubs with workflow IDs
+
+Client mode prevents direct execution of workflow functions while preserving the `workflowId` property that `start()` needs to identify which workflow to launch:
+
+```ts title="Client mode output (from spec.md)" lineNumbers
+// Workflow body replaced with error throw
+export async function handleUserSignup(email) {
+ throw new Error(
+ "You attempted to execute workflow handleUserSignup function directly. " +
+ "To start a workflow, use start(handleUserSignup) from workflow/api"
+ );
+}
+// workflowId attached — same value as workflow mode
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup";
+```
+
+Step functions in client mode keep their bodies intact (allowing local testing) and have their `stepId` property set directly on the function — unlike step mode, client mode does not import `registerStepFunction` because that module contains server-side dependencies.
+
+
+Client mode is optional but recommended. Without it, you must manually construct workflow IDs using the `{type}//{modulePath}//{identifier}` pattern or look them up in the build manifest. All framework integrations include client mode as a loader by default.
+
+
+### Stable ID generation
+
+The compiler generates deterministic IDs from the module path and function name using the pattern:
+
+```
+{type}//{modulePath}//{identifier}
+```
+
+Where:
+- **type** is `workflow`, `step`, or `class`
+- **modulePath** is either a module specifier with version (e.g., `@myorg/tasks@2.0.0`) when configured, or a relative path prefixed with `./` (e.g., `./src/jobs/order`) — file extensions are stripped
+- **identifier** is the function name, with nested functions using `/` separators and class members using `.` (static) or `#` (instance)
+
+Examples:
+- `workflow//./workflows/user-signup//handleUserSignup`
+- `step//./workflows/user-signup//createUser`
+- `step//./src/jobs/order//processOrder/innerStep` (nested step)
+- `step//./src/jobs/order//MyClass.staticMethod` (static method)
+- `step//./src/jobs/order//MyClass#instanceMethod` (instance method)
+
+IDs are stable: they don't change unless you rename files or functions. The same ID is generated in all three modes for the same function, ensuring that step mode's `registerStepFunction()`, workflow mode's `WORKFLOW_USE_STEP` proxy, and client mode's `workflowId`/`stepId` property all agree.
+
+### Nested step handling
+
+Steps defined inside workflow functions are hoisted to module level with compound names. In step mode, the nested function is extracted and registered independently. In workflow mode, the nested step becomes an inline `WORKFLOW_USE_STEP` call with an optional closure function:
+
+```ts title="Nested step — workflow mode output (from spec.md)" lineNumbers
+export async function myWorkflow(config) {
+ let count = 0;
+
+ // Nested step replaced with proxy — closure function captures `count`
+ var increment = globalThis[Symbol.for("WORKFLOW_USE_STEP")](
+ "step//./input//myWorkflow/increment",
+ () => ({ count }) // Closure variables serialized at call time
+ );
+
+ return await increment();
+}
+myWorkflow.workflowId = "workflow//./input//myWorkflow";
+globalThis.__private_workflows.set("workflow//./input//myWorkflow", myWorkflow);
+```
+
+The second argument to `WORKFLOW_USE_STEP` is a closure function that captures the variables the nested step needs. On the step side, the hoisted function retrieves these variables via `__private_getClosureVars()`.
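+
+The hand-off can be pictured as a keyed store that the workflow side writes and the step side reads. The helper names below are invented; `__private_getClosureVars` is internal, and the real implementation persists closure snapshots with the step invocation rather than in a local map:
+
+```ts
+// Invented helper names; the real __private_getClosureVars is internal and
+// persists snapshots with the step invocation, not in a local Map.
+const closureStore = new Map<string, Record<string, unknown>>();
+
+// Workflow side: the snapshot function passed to WORKFLOW_USE_STEP captures
+// the closed-over variables at call time.
+function captureClosure(
+  stepId: string,
+  snapshot: () => Record<string, unknown>
+): void {
+  closureStore.set(stepId, snapshot());
+}
+
+// Step side: the hoisted function rehydrates the same variables by step ID.
+function getClosureVars(stepId: string): Record<string, unknown> {
+  return closureStore.get(stepId) ?? {};
+}
+```
+
+Snapshotting at call time is what makes the captured values part of the step's durable input rather than live references into the workflow's memory.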
+
+Steps can also be defined inside deeply nested object properties. The plugin recursively walks the AST to find step-annotated functions, generating compound path-based IDs (e.g., `step//./input//vade/tools/VercelRequest/execute`) and hoisting them with `$`-separated variable names (e.g., `vade$tools$VercelRequest$execute`).
+
+### Build-time validation
+
+The plugin validates directive usage during compilation and emits errors for invalid patterns:
+
+| Error | Description |
+|-------|-------------|
+| Non-async function | Functions with `"use step"` or `"use workflow"` must be async |
+| Instance methods with `"use workflow"` | Only static methods can have `"use workflow"` |
+| Misplaced directive | Directive must be at the top of the file or start of the function body |
+| Conflicting directives | Cannot have both `"use step"` and `"use workflow"` at module level |
+| Invalid exports | Module-level directive files can only export async functions |
+| Misspelled directive | Detects typos like `"use steps"` or `"use workflows"` |
+
+Most invalid patterns are caught as build-time errors before deployment.
+
+## Runtime Integration
+
+The three bundles are consumed by different parts of the Workflow DevKit infrastructure at runtime. Each bundle plugs into a specific stage of the execution lifecycle — the **Client bundle** initiates a run, the **Workflow bundle** orchestrates it inside a deterministic VM, and the **Step bundle** performs real side effects.
+
+### Client bundle runtime
+
+The **Client bundle** is imported by your application code. When you write `import { handleUserSignup } from './workflows/user'`, you get the client-mode output — a function whose body throws an error and whose `workflowId` property identifies the workflow.
+
+At runtime, `start(handleUserSignup)` reads that `workflowId` property, creates a `run_created` event in the **Event log**, and sends a **Queue message** to the workflow queue. The error-throwing body ensures that a stray `await handleUserSignup(email)` throws a helpful message instead of silently bypassing durable execution.
+
+```ts title="packages/core/src/runtime/start.ts — reading workflowId and queuing"
+return await waitedUntil(() => {
+ // @ts-expect-error this field is added by our client transform
+ const workflowName = workflow?.workflowId;
+
+ if (!workflowName) {
+ throw new WorkflowRuntimeError(
+ `'start' received an invalid workflow function. Ensure the Workflow Development Kit is configured correctly and the function includes a 'use workflow' directive.`,
+ { slug: 'start-invalid-workflow-function' }
+ );
+ }
+```
+
+```ts title="packages/core/src/runtime/start.ts — queuing the workflow"
+await world.queue(
+ getWorkflowQueueName(workflowName),
+ {
+ runId,
+ traceCarrier,
+ } satisfies WorkflowInvokePayload,
+ {
+ deploymentId,
+ }
+);
+```
+
+The **Queue message** is a trigger, not durable state — it carries the `runId` and trace context so the workflow handler knows which run to replay. All durable state lives in the **Event log**.
+
+### Workflow bundle runtime
+
+The **Workflow bundle** is loaded into a sandboxed Node.js VM when the workflow handler processes the queue message. The runtime creates a deterministic execution context seeded from run metadata, then injects the `WORKFLOW_USE_STEP` symbol so compiler-generated proxy calls resolve at runtime:
+
+```ts title="packages/core/src/workflow.ts — VM context setup"
+const {
+ context,
+ globalThis: vmGlobalThis,
+ updateTimestamp,
+} = createContext({
+ seed: `${workflowRun.runId}:${workflowRun.workflowName}:${+startedAt}`,
+ fixedTimestamp: +startedAt,
+});
+
+const useStep = createUseStep(workflowContext);
+// @ts-expect-error — `@types/node` says symbol is not valid, but it does work
+vmGlobalThis[WORKFLOW_USE_STEP] = useStep;
+// @ts-expect-error — `@types/node` says symbol is not valid, but it does work
+vmGlobalThis[WORKFLOW_GET_STREAM_ID] = (namespace?: string) =>
+ getWorkflowRunStreamId(workflowRun.runId, namespace);
+```
+
+The VM patches `Math.random()`, `Date.now()`, `crypto.getRandomValues()`, and `crypto.randomUUID()` with deterministic, seeded alternatives so every replay sees exactly the same values:
+
+```ts title="packages/core/src/vm/index.ts — deterministic patches"
+g.Math.random = rng;
+
+const Date_ = g.Date;
+(g as any).Date = function Date(
+ ...args: Parameters<(typeof globalThis)['Date']>[]
+) {
+ if (args.length === 0) {
+ return new Date_(fixedTimestamp);
+ }
+ // @ts-expect-error — Args is `Date` constructor arguments
+ return new Date_(...args);
+};
+(g as any).Date.prototype = Date_.prototype;
+Object.setPrototypeOf(g.Date, Date_);
+g.Date.now = () => fixedTimestamp;
+```
+
+```ts title="packages/core/src/vm/index.ts — deterministic crypto"
+if (prop === 'getRandomValues') {
+ return getRandomValues;
+}
+if (prop === 'randomUUID') {
+ return randomUUID;
+}
+```
+
+Step bodies are excluded from the **Workflow bundle** because replay must never re-run side effects. When the workflow calls `await createUser(email)`, the `WORKFLOW_USE_STEP` proxy subscribes to the `EventsConsumer`. If a matching `step_completed` event exists in the **Event log**, it returns the cached result. If not, it queues the step for execution and throws a `WorkflowSuspension` to exit the VM:
+
+```ts title="packages/core/src/step.ts — WORKFLOW_USE_STEP proxy behavior"
+const correlationId = `step_${ctx.generateUlid()}`;
+
+const queueItem: StepInvocationQueueItem = {
+ type: 'step',
+ correlationId,
+ stepName,
+ args,
+};
+
+ctx.invocationsQueue.set(correlationId, queueItem);
+
+ctx.eventsConsumer.subscribe((event) => {
+ if (!event) {
+ // End of events — step hasn't completed yet.
+ // Trigger suspension so the workflow handler can queue the step.
+ scheduleWhenIdle(ctx, () => {
+ ctx.onWorkflowError(
+ new WorkflowSuspension(ctx.invocationsQueue, ctx.globalThis)
+ );
+ });
+ return EventConsumerResult.NotConsumed;
+ }
+
+ if (event.eventType === 'step_completed') {
+ ctx.invocationsQueue.delete(event.correlationId);
+
+ ctx.pendingDeliveries++;
+ ctx.promiseQueue = ctx.promiseQueue.then(async () => {
+ try {
+ const hydratedResult = await hydrateStepReturnValue(
+ event.eventData.result,
+ ctx.runId,
+ ctx.encryptionKey,
+ ctx.globalThis
+ );
+ resolve(hydratedResult as Result);
+ } catch (error) {
+ reject(error);
+ } finally {
+ ctx.pendingDeliveries--;
+ }
+ });
+ return EventConsumerResult.Finished;
+ }
+
+ return EventConsumerResult.NotConsumed;
+});
+```
+
+### Step bundle runtime
+
+The **Step bundle** is loaded by the step execution handler when a **Queue message** arrives. That message originates in the suspension handler, which first writes a `step_created` event to the **Event log** (persisting the step's input) and then sends a lightweight queue message to trigger execution:
+
+```ts title="packages/core/src/runtime/suspension-handler.ts — queue the step"
+await queueMessage(
+ world,
+ `__wkf_step_${queueItem.stepName}`,
+ {
+ workflowName,
+ workflowRunId: runId,
+ workflowStartedAt,
+ stepId: queueItem.correlationId,
+ traceCarrier,
+ requestedAt: new Date(),
+ },
+ {
+ idempotencyKey: queueItem.correlationId,
+ headers: {
+ ...extractTraceHeaders(traceCarrier),
+ },
+ }
+);
+```
+
+The step handler parses the queue payload, emits a `step_started` event, then hydrates the step's input from the persisted `step_created` event — not from the queue message. The step function executes with full Node.js runtime access, and the result is persisted as a `step_completed` event. Finally, the handler re-queues the workflow so it can replay with the new result:
+
+```ts title="packages/core/src/runtime/step-handler.ts — complete and re-queue"
+// Run step_completed and trace serialization concurrently
+let stepCompleted409 = false;
+const [, traceCarrier] = await Promise.all([
+ world.events
+ .create(
+ workflowRunId,
+ {
+ eventType: 'step_completed',
+ specVersion: SPEC_VERSION_CURRENT,
+ correlationId: stepId,
+ eventData: {
+ result: result as Uint8Array,
+ },
+ },
+ { requestId }
+ )
+ .catch((err: unknown) => {
+ if (EntityConflictError.is(err)) {
+ runtimeLogger.info(
+ 'Tried completing step, but step has already finished.',
+ {
+ workflowRunId,
+ stepId,
+ stepName,
+ message: err.message,
+ }
+ );
+ stepCompleted409 = true;
+ return;
+ }
+ throw err;
+ }),
+ serializeTraceCarrier(),
+]);
+
+if (stepCompleted409) {
+ return;
+}
+
+await queueMessage(world, getWorkflowQueueName(workflowName), {
+ runId: workflowRunId,
+ traceCarrier,
+ requestedAt: new Date(),
+});
+```
+
+A finished step wakes the workflow back up by writing `step_completed` to the **Event log** and sending a **Queue message** to the workflow queue. On the next invocation, the workflow replays from the beginning — but this time the `WORKFLOW_USE_STEP` proxy finds the `step_completed` event and returns the cached result instead of suspending. Workflow state is reconstructed by replaying code against the event log.
+
+### One call, end to end
+
+The following diagram traces a single `await createUser(email)` call through the entire system:
+
+```mermaid
+sequenceDiagram
+ participant App as Application Code
+ participant Client as Client Bundle
+ participant WQ as Workflow Queue
+ participant VM as Workflow VM
+ participant EL as Event Log
+ participant SQ as Step Queue
+ participant Step as Step Handler
+
+ App->>Client: start(handleUserSignup, [email])
+ Client->>EL: write run_created event
+ Client->>WQ: queue message (runId)
+
+ WQ->>VM: invoke workflow handler
+ VM->>VM: replay workflow code in sandbox
+ VM->>VM: await createUser(email) → WORKFLOW_USE_STEP proxy
+ VM->>EL: check for step_completed
+ Note over VM,EL: No matching event found
+ VM->>VM: throw WorkflowSuspension
+
+ VM->>EL: write step_created (with input)
+ VM->>SQ: queue message (stepId)
+
+ SQ->>Step: invoke step handler
+ Step->>EL: read step_created (hydrate input)
+ Step->>Step: execute createUser(email) with full Node.js
+ Step->>EL: write step_completed (with result)
+ Step->>WQ: re-queue workflow
+
+ WQ->>VM: invoke workflow handler (replay)
+ VM->>VM: replay workflow code in sandbox
+ VM->>VM: await createUser(email) → WORKFLOW_USE_STEP proxy
+ VM->>EL: check for step_completed
+ Note over VM,EL: Event found — return cached result
+ VM->>VM: continue to next step
+```
+
+## Why This Matters
+
+The three-mode transformation is what enables writing workflows as ordinary JavaScript functions while gaining durable execution:
+
+1. **Single source of truth** — developers write one file. The compiler handles the separation of concerns, eliminating an entire class of consistency bugs where step registrations, workflow IDs, or function signatures could drift.
+
+2. **Correct by construction** — build-time validation catches invalid patterns (non-async functions, misplaced directives, forbidden imports) before code reaches production. The DCE pass ensures that workflow and client bundles don't carry dead code from replaced function bodies.
+
+3. **Stable, portable identifiers** — IDs derived from module path and function name are deterministic across builds, deployments, and runtime environments. Renaming a file changes the ID, but the build process surfaces this as a manifest change rather than a silent runtime failure.
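+
+As a concrete sketch of point 3, the ID can be thought of as a pure function of mode, module path, and function name. The `kind//path//name` separator format below mirrors the example ID shown elsewhere in these docs; the real derivation in the plugin may differ:
+
+```ts
+// Hypothetical reconstruction of stable ID derivation — illustrative only
+function stableId(
+  kind: 'step' | 'workflow',
+  modulePath: string,
+  fnName: string
+): string {
+  // Deterministic: the same inputs produce the same ID on every build
+  return `${kind}//${modulePath}//${fnName}`;
+}
+
+// Renaming the file or the function changes the ID — surfaced as a
+// manifest change at build time rather than a silent runtime failure
+const id = stableId('step', './workflows/user', 'createUser');
+// 'step//./workflows/user//createUser'
+```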
diff --git a/docs/content/deep-dives/compiler-magic-swc-plugin-social.mdx b/docs/content/deep-dives/compiler-magic-swc-plugin-social.mdx
new file mode 100644
index 0000000000..7f5da3ca34
--- /dev/null
+++ b/docs/content/deep-dives/compiler-magic-swc-plugin-social.mdx
@@ -0,0 +1,49 @@
+---
+title: "One File In. Three Bundles Out."
+description: A concise explainer on how the Workflow DevKit SWC compiler plugin transforms a single source file into three execution targets — making durable execution feel like writing normal JavaScript.
+type: conceptual
+summary: The Workflow DevKit SWC plugin takes a single file with "use workflow" and "use step" directives and produces three bundles — step (bodies preserved), workflow (step bodies replaced with WORKFLOW_USE_STEP proxy calls), and client (safe references with workflowId). Stable IDs derived from file path and function name ensure all three modes agree on function identity.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/code-transform
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/durable-streaming-reference
+---
+
+You write one file with two directives. The compiler produces three bundles. That's the trick behind Workflow DevKit's programming model — and it eliminates an entire category of infrastructure code you'd otherwise write by hand.
+
+## The Split
+
+Mark functions with `"use step"` (side effects) or `"use workflow"` (orchestration). The SWC plugin generates three outputs from the same source:
+
+```ts
+export async function handleUserSignup(email: string) {
+ 'use workflow';
+ const user = await createUser(email);
+ await sendWelcomeEmail(user);
+ await sleep('5s');
+ const webhook = createWebhook();
+ await sendOnboardingEmail(user, webhook.url);
+ await webhook;
+ console.log('Webhook Resolved');
+ return { userId: user.id, status: 'onboarded' };
+}
+
+async function createUser(email: string) {
+ 'use step';
+ console.log(`Creating a new user with email: ${email}`);
+ return { id: crypto.randomUUID(), email };
+}
+```
+
+The compiler produces three bundles from this file. The **Step bundle** preserves `createUser`'s body and registers it via `registerStepFunction()`. The **Workflow bundle** replaces `createUser`'s body with a `WORKFLOW_USE_STEP` proxy call — so `await createUser(email)` checks the Event log for a cached `step_completed` result or suspends the workflow if the step hasn't run yet. The **Client bundle** attaches `workflowId` to `handleUserSignup` so `start()` knows which workflow to queue.
+
+Each function gets a stable ID derived from the file path and function name — like `step//./workflows/user//createUser`. Same ID in all three modes. Rename the function, and the ID changes everywhere in the next build.
+
+Here's the key insight: step bodies are excluded from the Workflow bundle because replay must never re-run side effects. When a step hasn't completed yet, the proxy throws a `WorkflowSuspension`. The suspension handler writes a `step_created` event to the Event log (persisting the input) and sends a Queue message to trigger execution. The step handler runs the real function, writes `step_completed`, and re-queues the workflow. On replay, the proxy finds the cached result and returns it immediately. Workflow state is reconstructed by replaying code against the Event log — the VM's deterministic `Math.random()`, `Date.now()`, `crypto.getRandomValues()`, and `crypto.randomUUID()` ensure every replay sees identical values.
+
+## Why It Matters
+
+No manual file separation. No hand-maintained ID registries. No server-side imports leaking into client bundles. You write ordinary JavaScript, add two directive strings, and the compiler handles the three-way split that makes durable execution possible. One file in, three bundles out.
diff --git a/docs/content/deep-dives/cost-model-fluid-compute-blog.mdx b/docs/content/deep-dives/cost-model-fluid-compute-blog.mdx
new file mode 100644
index 0000000000..51a5a44f44
--- /dev/null
+++ b/docs/content/deep-dives/cost-model-fluid-compute-blog.mdx
@@ -0,0 +1,358 @@
+---
+title: "Why Your Workflow That Sleeps for a Week Costs Almost Nothing"
+description: A developer-facing deep dive into Workflow DevKit's cost model — how queue-driven execution, idle-free suspension, and delayed re-enqueue eliminate always-on workers and make workflow compute proportional to work performed, not wall-clock time.
+type: conceptual
+summary: Workflow DevKit's cost model is a consequence of three runtime mechanics — queue-driven invocation, idle-free suspension for external work, and delayed re-enqueue. Workflow handlers are invoked by queue messages, not long-lived processes. When orchestration hits a step, the handler persists step_created, queues the step, and exits. When orchestration hits a hook, the handler persists durable hook state and exits until external delivery records hook_received. Completed steps wake the workflow by persisting step_completed and re-enqueueing the run. For timed waits, the queue schedules the next delivery after the next delay interval; on Vercel, long waits are chained across multiple delayed messages. No process exists between invocations.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/event-sourcing
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durable-streaming-reference
+---
+
+Here's a workflow that sends an onboarding email, waits 3 days for the user to activate, sends a reminder if they haven't, waits another 4 days, then marks the account for cleanup.
+
+In a traditional system, that workflow ties up a worker process for a week. The process sits idle for 168 hours, doing nothing, costing money.
+
+In Workflow DevKit, that same workflow consumes compute only in a handful of brief invocations spread across seven days. No worker process stays resident between those decision points. Event writes and queue operations still occur at the suspension boundaries, but the runtime does not hold compute open while the workflow is waiting. The state lives in durable events and queued wake-ups, not in a running machine.
+
+This article explains how that works: the queue-driven execution model, the suspension mechanics, and the delayed re-enqueue infrastructure that makes wall-clock time effectively free.
+
+## The Core Insight: Run Only When There's Work
+
+
+**Think in decision points, not duration.** A 7-day workflow with 4–6 decision points consumes compute only when it starts, dispatches work, resumes after external completion, or finishes. The 168 hours between those points are represented by durable events and delayed queue messages — not a live worker.
+
+
+The cost model isn't a feature you configure. It's a direct consequence of how the runtime works:
+
+1. **Queue-driven invocation** — workflow handlers run because a queue message arrives, not because a long-lived worker stays alive. No message, no compute.
+2. **External-work suspension** — when orchestration hits a step or hook, the handler persists durable state, dispatches any needed work, and returns. Steps resume when the step handler writes `step_completed` and explicitly re-queues the workflow. Hooks stay suspended until external delivery records `hook_received`.
+3. **Timed re-entry** — waits persist `wait_created` and resume through `timeoutSeconds`. On Vercel, waits longer than 23 hours chain across multiple delayed messages because single-message delay is capped.
+
+
+**What "no polling" means here:** the workflow engine does not keep a worker alive and poll while a run is sleeping or waiting on external work. Suspended state lives in durable events plus queued wake-ups. The public `Run.returnValue` helper is separate: it polls run status once per second until the run reaches a terminal state.
+
+
+A workflow in Workflow DevKit does not stay alive while it waits. Each suspension kind persists durable state but re-enters differently: steps resume when the step handler writes `step_completed` and explicitly re-enqueues the run, hooks resume when external delivery records `hook_received`, and waits resume when the queue redelivers after the returned `timeoutSeconds`. The one immediate re-entry is `hook_conflict`, which returns `timeoutSeconds: 0` to force a replay so the conflicting hook promise fails deterministically on the next invocation.
+
+These three properties compose into something powerful: a workflow that sleeps for a week consumes compute only during the brief moments when it replays state and dispatches or collects results.
+
+## How a Run Starts
+
+When your application calls `start()`, two things happen:
+
+```ts
+// From packages/core/src/runtime/start.ts
+const runId = `wrun_${ulid()}`;
+
+// 1. Persist a run_created event — the run now exists in the event log
+const result = await world.events.create(
+ runId,
+ {
+ eventType: 'run_created',
+ specVersion,
+ eventData: {
+ deploymentId: deploymentId,
+ workflowName: workflowName,
+ input: workflowArguments,
+ executionContext: { traceCarrier, workflowCoreVersion },
+ },
+ },
+ { v1Compat }
+);
+
+// 2. Queue a message to invoke the workflow handler
+await world.queue(
+ getWorkflowQueueName(workflowName),
+ {
+ runId,
+ traceCarrier,
+ } satisfies WorkflowInvokePayload,
+ {
+ deploymentId,
+ }
+);
+
+return new Run(runId);
+```
+
+`start()` returns immediately with a `Run` handle. It does not wait for the workflow to execute. The workflow handler that processes the queued message runs in a separate invocation — potentially on a different compute instance, potentially seconds later.
+
+This is the first cost insight: **starting a workflow doesn't block the caller.** The HTTP request that calls `start()` can return in milliseconds. The workflow itself runs asynchronously.
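+
+The two-write contract above can be simulated end to end in a few lines. Everything here is an illustrative stand-in — this toy `Run`, `start`, log, and queue model the real runtime pieces quoted above, not their actual implementations:
+
+```ts
+// Self-contained sketch of the non-blocking contract: `start` persists an
+// event, queues a message, and returns a handle without running the workflow
+class Run {
+  constructor(public readonly runId: string) {}
+}
+
+const eventLog: Array<{ runId: string; eventType: string }> = [];
+const queued: Array<{ queue: string; runId: string }> = [];
+
+async function start(workflowName: string, input: unknown[]): Promise<Run> {
+  const runId = `wrun_${Math.random().toString(36).slice(2)}`;
+  eventLog.push({ runId, eventType: 'run_created' }); // 1. persist the run
+  queued.push({ queue: `__wkf_workflow_${workflowName}`, runId }); // 2. enqueue
+  return new Run(runId); // 3. hand back a handle — no workflow has executed yet
+}
+```
+
+The caller gets a `Run` handle in microseconds; the queued message drives everything else on separate compute.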
+
+## Suspension: The Moment Compute Goes to Zero
+
+When the workflow VM encounters a step that hasn't completed, it throws a `WorkflowSuspension` — a structured control-flow signal that collects all pending operations:
+
+```ts
+// From packages/core/src/runtime.ts
+if (WorkflowSuspension.is(err)) {
+ const result = await handleSuspension({
+ suspension: err,
+ world,
+ run: workflowRun,
+ span,
+ requestId,
+ });
+
+ if (result.timeoutSeconds !== undefined) {
+ return { timeoutSeconds: result.timeoutSeconds };
+ }
+
+ return;
+}
+```
+
+Inside `handleSuspension`, each pending item is processed:
+
+- **Steps** get a `step_created` event persisted and a queue message dispatched
+- **Hooks** get a `hook_created` event — the workflow stays suspended until external delivery records `hook_received`
+- **Waits** get a `wait_created` event with a `resumeAt` timestamp
+
+
+Hooks are durable suspension points, not queued jobs. The create phase records `hook_created`; the receive phase records `hook_received`. Unlike a step, hook creation does not itself queue executable work.
+
+
+Then the handler calculates the minimum timeout from any pending waits:
+
+```ts
+// From packages/core/src/runtime/suspension-handler.ts
+const now = Date.now();
+const minTimeoutSeconds = waitItems.reduce(
+ (min, queueItem) => {
+ const resumeAtMs = queueItem.resumeAt.getTime();
+ const delayMs = Math.max(1000, resumeAtMs - now);
+ const timeoutSeconds = Math.ceil(delayMs / 1000);
+ if (min === null) return timeoutSeconds;
+ return Math.min(min, timeoutSeconds);
+ },
+ null
+);
+
+if (minTimeoutSeconds !== null) {
+ return { timeoutSeconds: minTimeoutSeconds };
+}
+
+return {};
+```
+
+**The handler returns. The compute is freed.** No worker process stays resident between decision points. Event writes and queue operations still happen at the suspension boundaries, but the runtime does not hold compute open while the workflow is waiting. When the next message arrives — whether from a step completion or a delayed re-enqueue — a fresh invocation loads the log and replays from the beginning.
+
+Hooks are absent from this timeout calculation: a hook keeps the run suspended until external delivery records `hook_received`, with no scheduled wake-up. Two other re-entry paths appear in the code below — `hook_conflict` forces an immediate replay by returning `timeoutSeconds: 0`, and step completion re-enters the workflow because the step handler explicitly re-enqueues it:
+
+```ts
+// packages/core/src/runtime/suspension-handler.ts
+if (hasHookConflict) {
+ return { timeoutSeconds: 0 };
+}
+if (minTimeoutSeconds !== null) {
+ return { timeoutSeconds: minTimeoutSeconds };
+}
+return {};
+```
+
+```ts
+// packages/core/src/runtime/step-handler.ts
+await queueMessage(world, getWorkflowQueueName(workflowName), {
+ runId: workflowRunId,
+ traceCarrier,
+ requestedAt: new Date(),
+});
+```
+
+## Delayed Re-Enqueue: Making Time Free
+
+When `timeoutSeconds` is returned, the queue infrastructure uses it to schedule the next delivery. On Vercel, this is the Queue Service:
+
+```ts
+// From packages/world-vercel/src/queue.ts
+if (typeof result?.timeoutSeconds === 'number') {
+ const delaySeconds =
+ result.timeoutSeconds > 0
+ ? Math.min(result.timeoutSeconds, MAX_DELAY_SECONDS)
+ : undefined;
+
+ // Send new message BEFORE acknowledging current message.
+ // Crash safety: if process dies after send but before ack,
+ // we get a duplicate invocation but don't lose the scheduled wakeup.
+ await queue(queueName, payload, { deploymentId, delaySeconds });
+}
+```
+
+Two details matter here:
+
+**Maximum delay clamping.** The Vercel Queue Service has a maximum single-message delay of 23 hours. For longer sleeps, the system chains messages automatically. Each time the delayed message fires, the workflow handler checks if `now >= resumeAt`. If the sleep hasn't elapsed, it returns another `timeoutSeconds` and the cycle repeats. A `sleep('30d')` chains approximately 32 delayed messages — each one consuming only the milliseconds of replay compute.
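+
+The chaining arithmetic is simple enough to sketch. Under the 23-hour cap, the number of delayed deliveries a sleep needs is a ceiling division (the helper name is hypothetical — the runtime chains implicitly rather than precomputing a count):
+
+```ts
+// Illustrative: delayed messages needed to cover a sleep under a
+// 23-hour single-message delay cap
+const MAX_DELAY_SECONDS = 23 * 60 * 60; // 82,800s cap per message
+
+function chainedDeliveries(sleepSeconds: number): number {
+  // Each delivery covers at most MAX_DELAY_SECONDS; at least one is needed
+  return Math.max(1, Math.ceil(sleepSeconds / MAX_DELAY_SECONDS));
+}
+
+chainedDeliveries(30 * 24 * 60 * 60); // sleep('30d') → 32 deliveries
+chainedDeliveries(5);                 // sleep('5s')  → 1 delivery
+```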
+
+**Crash safety ordering.** The new message is sent *before* the current message is acknowledged. If the process crashes between the send and the acknowledgment, the worst case is a duplicate invocation — not a lost wakeup. The event log's exactly-once guarantees (terminal state enforcement on step completions) handle the duplicate safely.
+
+In local development, the same contract is implemented with `setTimeout`:
+
+```ts
+// From packages/world-local/src/queue.ts
+if (response.ok) {
+ try {
+ const timeoutSeconds = Number(JSON.parse(text).timeoutSeconds);
+ if (Number.isFinite(timeoutSeconds) && timeoutSeconds >= 0) {
+ if (timeoutSeconds > 0) {
+ const timeoutMs = Math.min(
+ timeoutSeconds * 1000,
+ MAX_SAFE_TIMEOUT_MS
+ );
+ await setTimeout(timeoutMs);
+ }
+ continue;
+ }
+ } catch {}
+ return;
+}
+```
+
+## Step Completion: The Wake-Up Call
+
+When a step finishes executing, the step handler re-enqueues the workflow:
+
+```ts
+// From packages/core/src/runtime/step-handler.ts
+if (EntityConflictError.is(err)) {
+ runtimeLogger.debug(
+ 'Step in terminal state, re-enqueuing workflow',
+ {
+ stepName,
+ stepId,
+ workflowRunId,
+ error: err.message,
+ }
+ );
+
+ await queueMessage(world, getWorkflowQueueName(workflowName), {
+ runId: workflowRunId,
+ traceCarrier: await serializeTraceCarrier(),
+ requestedAt: new Date(),
+ });
+
+ return;
+}
+```
+
+This is the mechanism that drives the workflow forward. There's no persistent orchestrator process watching for step completions. Each step completion is a discrete event that triggers exactly one workflow re-invocation. The workflow replays through all completed steps (returning cached results in milliseconds), then continues execution until it either completes or suspends at the next pending operation.
+
+## Run Completion: Clean Exit
+
+When the workflow VM finishes without throwing a `WorkflowSuspension`, the handler persists a `run_completed` event:
+
+```ts
+// From packages/core/src/runtime.ts
+await world.events.create(
+ runId,
+ {
+ eventType: 'run_completed',
+ specVersion: SPEC_VERSION_CURRENT,
+ eventData: {
+ output: workflowResult,
+ },
+ },
+ { requestId }
+);
+```
+
+No further messages are queued. No compute resources remain allocated. The workflow is done.
+
+## The Complete Lifecycle
+
+```mermaid
+flowchart TD
+ A["Client calls start()"] --> B["run_created event + queue message"]
+ B --> C["Workflow handler invoked"]
+ C --> D["Event log loaded, VM replays"]
+ D --> E{{"Workflow reaches suspension point"}}
+
+ E -->|"step"| F["step_created persisted, step queued"]
+ F --> G["Handler returns — compute freed"]
+ G --> H["Step executes on separate compute"]
+ H --> I["step_completed persisted"]
+ I --> J["Step handler re-enqueues workflow"]
+ J --> C
+
+ E -->|"hook"| K["hook_created persisted"]
+ K --> L["Handler returns — compute freed"]
+ L --> M["External system delivers payload"]
+ M --> N["hook_received persisted, workflow re-enqueued"]
+ N --> C
+
+ E -->|"sleep()"| O["wait_created persisted"]
+ O --> P["Handler returns timeoutSeconds"]
+ P --> Q["Queue delays next delivery"]
+ Q --> C
+
+ D -->|"All operations done"| R["run_completed — workflow finished"]
+```
+
+## Before and After: Always-On vs. Queue-Driven
+
+Consider a multi-step order pipeline: charge the card, reserve inventory, send confirmation, wait for shipping notification from a third-party API.
+
+**Always-on orchestration:**
+- A worker process stays alive for the entire workflow. If the shipping API takes 6 hours to respond, the worker idles for 6 hours.
+- If the process crashes between steps, you need explicit checkpointing, idempotency keys, and manual retry logic.
+- Scaling means provisioning enough persistent workers for peak concurrency. Each worker is occupied for the full workflow lifetime, regardless of active vs. idle time.
+- A 100-workflow spike requires 100 workers. A workflow that runs for a week occupies one worker for a week.
+
+**Queue-driven execution:**
+- The workflow handler runs for milliseconds per invocation. Between steps, nothing exists. The event log is the complete state.
+- Crashes are invisible. If the process dies after charging the card, the `step_completed` event is already in the log. On re-invocation, the workflow replays through the cached result and suspends at the next uncompleted step.
+- Steps execute as independent queue messages on any available compute. A workflow with 100 concurrent steps (`Promise.all(...)`) dispatches 100 messages — the queue distributes them across available capacity without dedicated workers.
+- Wall-clock time doesn't hold a worker open. A `sleep('7d')` and a `sleep('5s')` both produce delayed queue messages and exit. Short waits usually need one delayed delivery; longer waits chain multiple delayed messages (Vercel caps single-message delay at 23 hours). Either way, no compute stays resident while the workflow is waiting.
+
+The important comparison is not one exact benchmark number. It is the billing shape. An always-on design pays for wall-clock residency. Workflow DevKit pays for short invocations that persist events, dispatch steps, replay orchestration, and finish the run. The gap between those invocations is durable state plus delayed queue delivery — not an allocated worker.
+
+The difference is most dramatic for workflows with long waits. An email drip campaign that sends messages over two weeks costs compute only for the handful of milliseconds each invocation takes — not for the 14 days between them.
+
+## What This Means for Scaling
+
+The queue-driven model changes the scaling equation:
+
+**Traditional:** workers × uptime = cost. A workflow running for a week on a $0.10/hr instance costs $16.80, regardless of how much actual work it does.
+
+**Queue-driven:** invocations × duration per invocation = cost. A workflow that invokes 10 times for 50ms each costs 10 × 50ms = 500ms of compute total — spread across a week.
+
+This scales in both directions. A burst of 10,000 new workflows doesn't require 10,000 workers — it requires 10,000 queue messages, distributed across whatever compute capacity the platform provides. And long-running workflows don't hold resources — they exist only as events in a log, taking no compute until their next invocation.
+
+## Parallel Steps: Shared Nothing
+
+When a workflow calls `Promise.all([stepA(), stepB(), stepC()])`, the runtime doesn't execute all three steps in the same invocation. It:
+
+1. Dispatches three independent queue messages — one per step
+2. Suspends the workflow handler
+3. Each step runs on separate compute, potentially concurrently
+4. Each step completion re-enqueues the workflow
+5. The workflow replays, collecting results as `step_completed` events appear in the log
+6. When all three results are cached, `Promise.all()` resolves and the workflow continues
+
+No shared memory. No thread pool. No worker coordination. Each step is an independent unit of work that runs wherever the queue delivers it.
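+
+The fan-out bookkeeping can be simulated in a few lines. This is a toy model of the proxy behavior described above — the cached-result map and pending set stand in for the event log and invocations queue, and the plain `Error` stands in for the structured `WorkflowSuspension` signal:
+
+```ts
+// Toy model: a step call either returns a cached result from the event log
+// or records itself as pending so the suspension can dispatch it
+const cachedResults = new Map<string, unknown>([
+  ['stepA', 'result-a'], // completed on a previous invocation
+  ['stepB', 'result-b'],
+]);
+const pending = new Set<string>();
+
+async function useStep(stepName: string): Promise<unknown> {
+  if (cachedResults.has(stepName)) return cachedResults.get(stepName); // replay hit
+  pending.add(stepName); // recorded for dispatch by the suspension handler
+  throw new Error('WorkflowSuspension'); // real runtime throws a structured signal
+}
+
+// Promise.all invokes all three calls before the rejection propagates,
+// so every uncompleted step lands in `pending` during this invocation
+Promise.all([useStep('stepA'), useStep('stepB'), useStep('stepC')]).catch(() => {
+  // pending now contains 'stepC' — exactly the work left to dispatch
+});
+```
+
+On the next invocation, `stepC`'s result would be in the log and the same `Promise.all` would resolve entirely from cache.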
+
+## Replay Is Cheap
+
+A natural concern: if the workflow replays from the beginning on every invocation, doesn't replay get expensive as the workflow grows?
+
+No. Replay re-executes only the orchestration logic — the `"use workflow"` function. Each step call hits the `EventsConsumer`, finds the cached `step_completed` result, deserializes it, and returns. No network calls. No database queries. No external service interactions. The time is dominated by deserialization, not computation — typically sub-millisecond per event.
+
+A workflow with 200 completed steps replays in a few milliseconds. The 201st step invocation adds negligible overhead to the previous 200 cached results. Replay cost is linear in events but with a tiny constant — it's not the bottleneck.
+
+## Error Handling and Retry
+
+The cost model extends naturally to error recovery. When a step fails with a `RetryableError`, the runtime re-queues it with exponential backoff. When a step fails with a `FatalError`, the step is marked as permanently failed and the workflow is re-invoked to handle the failure in its orchestration logic.
+
+Neither retry path requires a persistent process. Each retry is a new queue message, a new invocation, a few milliseconds of compute. The cost of retrying a failed step is the same as the cost of executing it the first time — just the step's compute duration, not the entire workflow's wall-clock time.
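+
+The backoff schedule itself isn't specified here, but the shape is the familiar one. A hypothetical version — the base and cap values below are invented for illustration and are not the DevKit's actual retry configuration:
+
+```ts
+// Hypothetical exponential backoff — each retry is just another delayed
+// queue message, so nothing is billed while the delay elapses
+function backoffDelaySeconds(
+  attempt: number,
+  baseSeconds = 1,
+  capSeconds = 3600
+): number {
+  return Math.min(capSeconds, baseSeconds * 2 ** attempt);
+}
+
+backoffDelaySeconds(0);  // 1s
+backoffDelaySeconds(4);  // 16s
+backoffDelaySeconds(20); // capped at 3600s
+```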
+
+
+The cost characteristics described here are a consequence of the queue and suspension mechanics, not a separately configurable feature. Any deployment target that provides queue-based message delivery with delay support inherits these properties automatically.
+
+
+## Conclusion
+
+The cost model isn't an optimization layered on top of the runtime — it *is* the runtime. Queue-driven invocation means no idle processes. Suspension frees all compute while the workflow waits — event writes and queue operations happen at the boundaries, but nothing is billed in between. Delayed re-enqueue means wall-clock time is free. Together, they make workflow compute proportional to the work performed, not the time elapsed.
+
+A week-long workflow. Milliseconds of compute. That's the model.
diff --git a/docs/content/deep-dives/cost-model-fluid-compute-reference.mdx b/docs/content/deep-dives/cost-model-fluid-compute-reference.mdx
new file mode 100644
index 0000000000..667c58ac4f
--- /dev/null
+++ b/docs/content/deep-dives/cost-model-fluid-compute-reference.mdx
@@ -0,0 +1,320 @@
+---
+title: Cost Model and Fluid Compute
+description: How Workflow DevKit eliminates always-on workers by using queue-driven execution, idle-free suspension, and delayed re-enqueue to run workflow logic only when there is work to do.
+type: conceptual
+summary: A technical reference tracing the runtime control flow that makes workflow execution pay-per-use — from run creation through queue dispatch, suspension, step execution, delayed re-enqueue, and completion.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/event-sourcing
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durable-streaming-reference
+---
+
+
+Traditional long-running background jobs keep a process alive for the entire duration of a workflow — even when the workflow is waiting for a step to finish, an external webhook, or a timer to expire. Workflow DevKit inverts this model: the orchestrator runs only when there is a decision to make, suspends by returning from the handler, and wakes up again via queue delivery when new data arrives. The compute cost is proportional to the work performed, not the wall-clock time of the workflow.
+
+
+## Overview
+
+The cost model comes from three distinct re-entry paths in the runtime:
+
+1. **Queue-driven invocation** — `start()` persists `run_created`, queues the workflow message, and returns a `Run` handle immediately.
+2. **External-work suspension** — when orchestration hits a step, the handler persists `step_created`, queues the step, and exits. When orchestration hits a hook, the runtime records hook state durably and exits. The workflow stays suspended until external delivery records `hook_received`.
+3. **Timed re-entry** — waits persist `wait_created` and resume through `timeoutSeconds`.
+
+
+`timeoutSeconds` is not the generic suspension mechanism. In the current runtime, waits return a positive `timeoutSeconds`, and `hook_conflict` returns `timeoutSeconds: 0` for an immediate replay. Step completion resumes the workflow because the step handler writes `step_completed` and explicitly re-queues the workflow.
+
+
+These three properties mean a workflow that sleeps for a week consumes compute only during the brief moments when it replays state and dispatches or collects results — not for the seven days in between.
+
+## Lifecycle
+
+The following diagram traces a single workflow run from creation through suspension, step execution, timed waits, and completion:
+
+```mermaid
+flowchart TD
+ A["Client calls start()"] --> B["run_created event persisted"]
+ B --> C["Workflow message queued"]
+ C --> D["Workflow handler invoked"]
+ D --> E["run_started event — status: running"]
+ E --> F["Event log loaded, VM replays workflow code"]
+ F --> G{"Next pending operation?"}
+ G -->|"step / hook"| H["step_created / hook_created events persisted"]
+ H --> I["Step messages queued"]
+ I --> J["Handler returns — compute released"]
+ J --> K["Step handler or external delivery completes"]
+ K --> L["step_completed / hook_received event persisted"]
+ L --> M["Workflow message re-enqueued"]
+ M --> D
+ G -->|"wait / sleep"| N["wait_created event persisted"]
+ N --> O["Handler returns timeoutSeconds"]
+ O --> P["Queue delays next delivery"]
+ P --> D
+ F -->|"No pending work"| Q["Workflow runs to completion"]
+ Q --> R["run_completed event persisted"]
+```
+
+## Code Walkthrough
+
+### Run creation and initial queue dispatch
+
+When client code calls `start()`, two things happen in sequence: a `run_created` event is persisted to the event log, and a message is placed on the workflow queue. The function returns immediately with a `Run` handle — it does not wait for the workflow to execute.
+
+```ts title="packages/core/src/runtime/start.ts" lineNumbers {29-30,33-39}
+// Generate runId client-side so we have it before serialization
+const runId = `wrun_${ulid()}`;
+
+// ...
+
+// Create run via run_created event (event-sourced architecture)
+const result = await world.events.create(
+ runId,
+ {
+ eventType: 'run_created',
+ specVersion,
+ eventData: {
+ deploymentId: deploymentId,
+ workflowName: workflowName,
+ input: workflowArguments,
+ executionContext: { traceCarrier, workflowCoreVersion },
+ },
+ },
+ { v1Compat }
+);
+
+// ...
+
+await world.queue(
+ getWorkflowQueueName(workflowName),
+ {
+ runId,
+ traceCarrier,
+ } satisfies WorkflowInvokePayload,
+ {
+ deploymentId,
+ }
+);
+
+return new Run(runId);
+```
+
+The call to `world.queue()` is awaited only until the queue accepts the message; `start()` then returns the `Run` handle without waiting for the workflow to execute. The workflow handler that processes this message runs in a separate invocation, potentially on a different compute instance.
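+
+The whole contract fits in a deliberately tiny in-memory model: persist first, queue second, return immediately. `eventLog`, `queued`, and this `start` are illustrative stand-ins, not the runtime's implementation:
+
+```ts
+// In-memory stand-ins for the event log and queue (illustration only).
+const eventLog: Array<{ runId: string; eventType: string }> = [];
+const queued: Array<{ queueName: string; runId: string }> = [];
+
+class Run {
+  constructor(readonly runId: string) {}
+}
+
+async function start(workflowName: string): Promise<Run> {
+  const runId = `wrun_${Date.now().toString(36)}`; // stand-in for a real ULID
+  // 1. Persist the run_created event first...
+  eventLog.push({ runId, eventType: 'run_created' });
+  // 2. ...then queue the workflow invocation...
+  queued.push({ queueName: `workflow:${workflowName}`, runId });
+  // 3. ...and return the handle without waiting for execution.
+  return new Run(runId);
+}
+```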
+
+### Suspension: recording pending work and exiting
+
+When the workflow VM encounters a step, hook, or sleep, it throws a `WorkflowSuspension` — a structured control-flow signal, not an error. The workflow handler in `runtime.ts` catches this, delegates to `handleSuspension`, and returns the result:
+
+```ts title="packages/core/src/runtime.ts" lineNumbers
+// WorkflowSuspension is normal control flow — not an error
+if (WorkflowSuspension.is(err)) {
+ const result = await handleSuspension({
+ suspension: err,
+ world,
+ run: workflowRun,
+ span,
+ requestId,
+ });
+
+ if (result.timeoutSeconds !== undefined) {
+ return { timeoutSeconds: result.timeoutSeconds };
+ }
+
+ // Suspension handled, no further work needed
+ return;
+}
+```
+
+Inside `handleSuspension`, each pending item is recorded as an event and dispatched:
+
+- **Steps** get a `step_created` event and a queue message to the step handler
+- **Hooks** get a `hook_created` event — the workflow stays suspended until external delivery records `hook_received`
+- **Waits** get a `wait_created` event with a `resumeAt` timestamp
+
+
+Hooks are durable suspension points, not queued jobs. The create phase records `hook_created`; the receive phase records `hook_received`. Unlike a step, hook creation does not itself queue executable work.
+
+
+The handler then calculates the minimum timeout from any pending waits:
+
+```ts title="packages/core/src/runtime/suspension-handler.ts" lineNumbers
+// Calculate minimum timeout from waits
+const now = Date.now();
+const minTimeoutSeconds = waitItems.reduce<number | null>(
+ (min, queueItem) => {
+ const resumeAtMs = queueItem.resumeAt.getTime();
+ const delayMs = Math.max(1000, resumeAtMs - now);
+ const timeoutSeconds = Math.ceil(delayMs / 1000);
+ if (min === null) return timeoutSeconds;
+ return Math.min(min, timeoutSeconds);
+ },
+ null
+);
+
+// ...
+
+if (hasHookConflict) {
+ return { timeoutSeconds: 0 };
+}
+
+if (minTimeoutSeconds !== null) {
+ return { timeoutSeconds: minTimeoutSeconds };
+}
+
+return {};
+```
+
+Timed waits are the normal source of delayed workflow wake-ups here. A step-driven suspension usually returns `{}` from `handleSuspension`; the later wake-up happens when the step handler persists `step_completed` and explicitly re-enqueues the workflow. One edge case exists: hook conflicts force an immediate replay via `timeoutSeconds: 0` so the next invocation can surface `hook_conflict` deterministically.
+
+When `timeoutSeconds` is returned, the queue infrastructure uses it to schedule the next delivery. **The handler exits and the compute is freed.** No process sleeps or polls during the delay.
+
+### Delayed re-enqueue in production (Vercel Queue Service)
+
+On Vercel, the queue handler uses `delaySeconds` to schedule a new message rather than holding the current one:
+
+```ts title="packages/world-vercel/src/queue.ts" lineNumbers
+if (typeof result?.timeoutSeconds === 'number') {
+ // When timeoutSeconds is 0, skip delaySeconds entirely for immediate re-enqueue.
+ // Otherwise, clamp to max delay (23h) - for longer sleeps, the workflow will chain
+ // multiple delayed messages until the full sleep duration has elapsed.
+ const delaySeconds =
+ result.timeoutSeconds > 0
+ ? Math.min(result.timeoutSeconds, MAX_DELAY_SECONDS)
+ : undefined;
+
+ // Send new message BEFORE acknowledging current message.
+ // This ensures crash safety: if process dies after send but before ack,
+ // we may get a duplicate invocation but won't lose the scheduled wakeup.
+ await queue(queueName, payload, { deploymentId, delaySeconds });
+}
+```
+
+For sleeps longer than 23 hours (the maximum single-message delay), the system chains messages automatically. Each time the delayed message fires, the workflow handler checks whether `now >= resumeAt`. If the sleep has not elapsed, it returns another `timeoutSeconds` and the cycle repeats.
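+
+The chaining arithmetic follows directly from the cap. A small sketch, assuming only the 23-hour limit described above:
+
+```ts
+const MAX_DELAY_SECONDS = 23 * 60 * 60; // 82,800s: the single-message delay cap
+
+// Number of delayed deliveries needed before resumeAt is reached.
+// Every delivery except the last simply re-delays and exits.
+function chainedDeliveries(sleepSeconds: number): number {
+  return Math.max(1, Math.ceil(sleepSeconds / MAX_DELAY_SECONDS));
+}
+
+chainedDeliveries(5 * 60); // 1: a short sleep fits in one delayed message
+chainedDeliveries(7 * 24 * 60 * 60); // 8: a week-long sleep chains 8 deliveries
+```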
+
+### Delayed re-enqueue in local development
+
+The local queue implements the same contract using `setTimeout`:
+
+```ts title="packages/world-local/src/queue.ts" lineNumbers
+if (response.ok) {
+ try {
+ const timeoutSeconds = Number(JSON.parse(text).timeoutSeconds);
+ if (Number.isFinite(timeoutSeconds) && timeoutSeconds >= 0) {
+ if (timeoutSeconds > 0) {
+ const timeoutMs = Math.min(
+ timeoutSeconds * 1000,
+ MAX_SAFE_TIMEOUT_MS
+ );
+ await setTimeout(timeoutMs);
+ }
+ continue;
+ }
+ } catch {}
+ return;
+}
+```
+
+The local queue keeps the message in-process and uses a `setTimeout` to delay the next loop iteration. This simulates the production behavior where the message is invisible for the delay period.
+
+### Step completion triggers workflow re-invocation
+
+When a step finishes — whether successfully or with a terminal failure — the step handler re-enqueues the workflow so it can replay with the new result:
+
+```ts title="packages/core/src/runtime/step-handler.ts" lineNumbers
+if (EntityConflictError.is(err)) {
+ runtimeLogger.debug(
+ 'Step in terminal state, re-enqueuing workflow',
+ {
+ stepName,
+ stepId,
+ workflowRunId,
+ error: err.message,
+ }
+ );
+
+ await queueMessage(world, getWorkflowQueueName(workflowName), {
+ runId: workflowRunId,
+ traceCarrier: await serializeTraceCarrier(),
+ requestedAt: new Date(),
+ });
+
+ return;
+}
+```
+
+This is the mechanism that drives the workflow forward without a persistent orchestrator process. Each step completion is a discrete event that triggers exactly one workflow re-invocation.
+
+### Run completion
+
+When the workflow VM runs to the end of the function without throwing a `WorkflowSuspension`, the handler persists a `run_completed` event and returns without re-enqueueing:
+
+```ts title="packages/core/src/runtime.ts" lineNumbers
+await world.events.create(
+ runId,
+ {
+ eventType: 'run_completed',
+ specVersion: SPEC_VERSION_CURRENT,
+ eventData: {
+ output: workflowResult,
+ },
+ },
+ { requestId }
+);
+```
+
+No further messages are queued. The workflow is done, and no compute resources remain allocated.
+
+## Why This Matters
+
+The queue-driven execution model means workflow compute is consumed only during active processing:
+
+- **No always-on worker loop** — workflow handlers are invoked on demand by queue messages. Between invocations, no process exists. This is fundamentally different from traditional job runners that poll a database or maintain long-lived connections.
+- **Wall-clock time is free** — a `sleep('7d')` call costs the same as a `sleep('5s')` call in terms of compute. Both produce a delayed queue message and release all resources immediately.
+- **Replay is cheap** — re-executing the workflow VM to reconstruct state takes milliseconds for typical workflows. Step results are cached in the event log, so replayed steps return instantly without re-executing their bodies.
+- **Parallel steps share nothing** — `Promise.all([stepA(), stepB()])` dispatches both steps as independent queue messages. Each step runs in its own invocation with full Node.js access, and both can execute concurrently on separate compute instances.
+- **Crash safety without cost** — if a handler crashes mid-execution, the queue automatically re-delivers the message. The event log ensures that any work already persisted is not repeated. Recovery is a normal re-invocation, not a special monitoring process.
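+
+The parallel-steps point above can be written as ordinary workflow code. A hedged sketch with placeholder step and workflow names; only the directives are framework semantics:
+
+```ts
+// Placeholder steps, not from the codebase.
+async function fetchInvoice(orderId: string) {
+  'use step';
+  return { orderId, total: 42 }; // full Node.js access inside a step
+}
+
+async function fetchCustomer(orderId: string) {
+  'use step';
+  return { orderId, name: 'Ada' };
+}
+
+export async function billingWorkflow(orderId: string) {
+  'use workflow';
+  // Both steps are dispatched as independent queue messages; the workflow
+  // suspends once and resumes when both step_completed events are logged.
+  const [invoice, customer] = await Promise.all([
+    fetchInvoice(orderId),
+    fetchCustomer(orderId),
+  ]);
+  return { invoice, customer };
+}
+```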
+
+
+The cost characteristics described here are a consequence of the queue and suspension mechanics, not a separately configurable feature. Any deployment target that provides queue-based message delivery with delay support (such as Vercel Queue Service in production, or the local filesystem queue in development) inherits these properties automatically.
+
+
+
+Workflow execution is event-driven. The `Run.returnValue` convenience getter is separate client-side behavior and currently polls run state once per second.
+
+
+```ts title="packages/core/src/runtime/run.ts" lineNumbers
+private async pollReturnValue(): Promise<unknown> {
+ while (true) {
+ try {
+ const run = await this.world.runs.get(this.runId);
+
+ if (run.status === 'completed') {
+ const encryptionKey = await this.getEncryptionKey();
+ return await hydrateWorkflowReturnValue(
+ run.output,
+ this.runId,
+ encryptionKey
+ );
+ }
+
+ if (run.status === 'cancelled') {
+ throw new WorkflowRunCancelledError(this.runId);
+ }
+
+ if (run.status === 'failed') {
+ throw new WorkflowRunFailedError(this.runId, run.error);
+ }
+
+ throw new WorkflowRunNotCompletedError(this.runId, run.status);
+ } catch (error) {
+ if (WorkflowRunNotCompletedError.is(error)) {
+ await new Promise((resolve) => setTimeout(resolve, 1_000));
+ continue;
+ }
+ throw error;
+ }
+ }
+}
+```
diff --git a/docs/content/deep-dives/cost-model-fluid-compute-social.mdx b/docs/content/deep-dives/cost-model-fluid-compute-social.mdx
new file mode 100644
index 0000000000..fb8294bc1c
--- /dev/null
+++ b/docs/content/deep-dives/cost-model-fluid-compute-social.mdx
@@ -0,0 +1,52 @@
+---
+title: "A Week-Long Workflow. Milliseconds of Compute."
+description: A concise explainer on how Workflow DevKit's queue-driven runtime avoids idle worker residency — compute runs only at durable decision points, not throughout the wall-clock lifetime of a workflow.
+type: conceptual
+summary: Workflow DevKit's cost model comes from queue-driven invocation, external-work suspension, and timed re-entry. `start()` persists `run_created` and queues the run. Waits wake it through `timeoutSeconds`, and completed steps wake it by persisting `step_completed` and re-queuing the workflow. Between those decision points, no worker process stays resident.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/event-sourcing
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durable-streaming-reference
+---
+
+A workflow that sends an email and waits 3 days for a response. In a traditional system, a worker sits idle for 72 hours. In Workflow DevKit, no worker process stays resident during those 72 hours — the state lives in durable events and queued wake-ups.
+
+## The Key Insight
+
+Compute exists only at decision points — not during the gaps between them.
+
+A workflow has **decision points** — dispatching steps, collecting results, advancing — and **gaps** between them. Traditional orchestrators keep a process alive during gaps. Workflow DevKit doesn't.
+
+## Three Mechanics
+
+1. **Queue-driven invocation** — workflow handlers run only when a queue message arrives. No message, no compute.
+2. **No idle compute during suspension** — when orchestration hits a step or hook, the runtime persists durable state and exits. Nothing stays resident while the workflow waits.
+3. **Delayed re-enqueue** — waits like `sleep()` resume by returning `timeoutSeconds`; the queue delivers the next invocation later.
+
+Only waits use `timeoutSeconds` as the wake-up path. When a step finishes, the step handler writes `step_completed` and re-enqueues the workflow directly:
+
+```ts
+// From packages/core/src/runtime/step-handler.ts
+await queueMessage(world, getWorkflowQueueName(workflowName), {
+ runId: workflowRunId,
+ traceCarrier: await serializeTraceCarrier(),
+ requestedAt: new Date(),
+});
+```
+
+
+Hooks are durable suspension points, not queued jobs. The create phase records `hook_created`; the receive phase records `hook_received`. Unlike a step, hook creation does not itself queue executable work.
+
+
+## Why It Matters
+
+A long sleep and a short sleep both avoid holding a worker open, which is the real cost-model win. They are not literally identical in the current implementation: short waits usually need one delayed delivery, while longer waits can chain multiple delayed queue messages before the workflow finally resumes. On Vercel, single-message delay is capped at 23 hours, so a `sleep('30d')` chains roughly 32 messages.
+
+The workflow engine itself does not poll while suspended; the separate `Run.returnValue` helper polls run status once per second.
+
+Replay stays cheap because the workflow re-runs orchestration code and reuses cached step results instead of re-executing side effects. For wait-heavy workflows, orchestration cost is dramatically lower than with always-on workers because compute is billed at decision points, not across idle wall-clock time. No always-on worker. No sleeping process. No paying to wait.
+
+A week-long workflow. Milliseconds of compute. That's the model.
diff --git a/docs/content/deep-dives/durability-replay-blog.mdx b/docs/content/deep-dives/durability-replay-blog.mdx
new file mode 100644
index 0000000000..7b44f449d5
--- /dev/null
+++ b/docs/content/deep-dives/durability-replay-blog.mdx
@@ -0,0 +1,391 @@
+---
+title: "Your Workflow Crashed. Here's How It Picks Up Exactly Where It Left Off."
+description: A developer-facing deep dive into Workflow DevKit's event-sourced durability model — how an append-only event log, ULID-ordered events, and deterministic replay reconstruct workflow state after cold starts without snapshots or serialized heaps.
+type: conceptual
+summary: Workflow DevKit persists every state transition as an immutable event in an append-only log. When a workflow resumes after a crash, deploy, or scale-from-zero event, the runtime loads the event log, recreates a deterministic VM sandbox, and replays the workflow code. Completed steps return cached results instantly from the log — no heap snapshots, no state serialization format, no re-execution of side effects.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/event-sourcing
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durable-streaming-reference
+ - /deep-dives/cost-model-fluid-compute-reference
+---
+
+Your workflow charged a credit card, reserved inventory, and sent a confirmation email. Then the process died. What happens next?
+
+In a traditional system, the answer is complicated. You'd need to figure out what already happened — maybe by querying a database, checking an external service, or reading from a checkpoint file. If your checkpoint was stale or your serialization format changed between deploys, you'd be debugging state reconstruction bugs at 2 AM.
+
+Workflow DevKit takes a fundamentally different approach. There's no checkpoint. There's no serialized heap snapshot. Instead, there's an append-only **Event log** — and a runtime that can reconstruct any workflow's state by replaying that log from the beginning.
+
+This article explains exactly how that works: the **Event log** structure, the ordering guarantees, the replay engine, and why this model eliminates entire categories of durability bugs.
+
+## The Event Log Is the State
+
+Most workflow systems store state as a mutable object — a row in a database, a JSON blob, a serialized heap. When something changes, the state object is updated in place. This works until it doesn't: concurrent updates conflict, partial writes corrupt state, and version mismatches between the serialization format and the running code produce silent bugs.
+
+Workflow DevKit stores no mutable state. Instead, every transition — a run starting, a step completing, a webhook arriving, a sleep elapsing — is recorded as an immutable, typed event appended to a log. The current state of any entity is derived by reading its events in order.
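+
+Deriving state this way is a fold over the log. A minimal sketch with a reduced event shape; the real schema carries typed `eventData` per event:
+
+```ts
+type StepEventType =
+  | 'step_created'
+  | 'step_started'
+  | 'step_completed'
+  | 'step_failed';
+type StepStatus = 'pending' | 'running' | 'completed' | 'failed';
+
+// Status is a pure function of the ordered event sequence: no mutable row,
+// no in-place update, nothing to corrupt with a partial write.
+function deriveStepStatus(events: StepEventType[]): StepStatus {
+  return events.reduce<StepStatus>((_status, eventType) => {
+    switch (eventType) {
+      case 'step_created': return 'pending';
+      case 'step_started': return 'running';
+      case 'step_completed': return 'completed';
+      case 'step_failed': return 'failed';
+    }
+  }, 'pending');
+}
+
+deriveStepStatus(['step_created', 'step_started']); // 'running'
+deriveStepStatus(['step_created', 'step_started', 'step_completed']); // 'completed'
+```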
+
+The event types are defined in `packages/world/src/events.ts` as a Zod discriminated union:
+
+```ts
+// From packages/world/src/events.ts
+export const EventTypeSchema = z.enum([
+ // Run lifecycle events
+ 'run_created',
+ 'run_started',
+ 'run_completed',
+ 'run_failed',
+ 'run_cancelled',
+ // Step lifecycle events
+ 'step_created',
+ 'step_completed',
+ 'step_failed',
+ 'step_retrying',
+ 'step_started',
+ // Hook lifecycle events
+ 'hook_created',
+ 'hook_received',
+ 'hook_disposed',
+ 'hook_conflict',
+ // Wait lifecycle events
+ 'wait_created',
+ 'wait_completed',
+]);
+```
+
+Every event shares a common structure — an `eventType`, an optional `correlationId` that links events for the same entity, and typed `eventData` specific to that event:
+
+```ts
+// From packages/world/src/events.ts
+export const BaseEventSchema = z.object({
+ eventType: EventTypeSchema,
+ correlationId: z.string().optional(),
+ specVersion: z.number().optional(),
+});
+
+// Server response adds run-level and temporal fields
+export const EventSchema = AllEventsSchema.and(
+ z.object({
+ runId: z.string(),
+ eventId: z.string(),
+ createdAt: z.coerce.date(),
+ specVersion: z.number().optional(),
+ })
+);
+```
+
+Four entity categories — runs, steps, hooks, and waits — each follow a defined state machine:
+
+| Entity | Events | Terminal States |
+|--------|--------|-----------------|
+| Run | `run_created` → `run_started` → `run_completed` / `run_failed` / `run_cancelled` | completed, failed, cancelled |
+| Step | `step_created` → `step_started` → `step_completed` / `step_failed` (with optional `step_retrying` loops) | completed, failed |
+| Hook | `hook_created` → `hook_received` → `hook_disposed` (or `hook_conflict` on token collision) | disposed, conflicted |
+| Wait | `wait_created` → `wait_completed` | completed |
+
+
+Terminal states are enforced at the world backend layer. Attempting to create an event that would transition an entity out of a terminal state — for example, a second `step_completed` on an already-completed step — results in an `EntityConflictError`. This is the mechanism that guarantees exactly-once step execution, even when duplicate messages arrive from the queue.
+
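+The enforcement described above reduces to a guard evaluated before every append. A sketch of the idea, not the world backend's actual implementation:
+
+```ts
+class EntityConflictError extends Error {}
+
+const TERMINAL_STEP_STATES = new Set(['completed', 'failed']);
+
+// Reject any transition out of a terminal state; with an atomic
+// check-and-append, concurrent completions resolve to a single winner.
+function guardStepTransition(currentStatus: string, eventType: string): void {
+  if (TERMINAL_STEP_STATES.has(currentStatus)) {
+    throw new EntityConflictError(
+      `step is already ${currentStatus}; rejecting ${eventType}`
+    );
+  }
+}
+```
+
+A duplicate `step_completed` delivered by the queue hits this guard and surfaces as a conflict instead of a second execution.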
+
+## ULID Ordering: Time in the ID
+
+Events need to be ordered. Most event-sourced systems use a separate sequence number — an auto-incrementing integer managed by the database. Workflow DevKit embeds the ordering directly in the event ID using ULIDs (Universally Unique Lexicographically Sortable Identifiers).
+
+A ULID encodes a millisecond timestamp in its first 48 bits, followed by 80 bits of randomness. Sorting ULIDs lexicographically produces chronological order without any additional column or index. This means the event log is self-ordering — you can sort by `eventId` and get the correct temporal sequence.
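+
+The self-ordering property is easy to see by encoding just the time component. A sketch of the millisecond prefix in Crockford base32, as the ULID spec defines it:
+
+```ts
+const CROCKFORD32 = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
+
+// Encode a millisecond timestamp as the 10-character ULID time prefix.
+function encodeTime(ms: number): string {
+  let out = '';
+  for (let i = 0; i < 10; i++) {
+    out = CROCKFORD32[ms % 32] + out;
+    ms = Math.floor(ms / 32);
+  }
+  return out;
+}
+
+const earlier = encodeTime(Date.parse('2024-01-01T00:00:00Z'));
+const later = encodeTime(Date.parse('2024-06-01T00:00:00Z'));
+earlier < later; // true: plain string comparison matches chronological order
+```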
+
+But embedding timestamps in client-generated IDs creates a trust problem. What if a client sends an event with a forged timestamp that sorts before legitimate events? The `validateUlidTimestamp` function in `packages/world/src/ulid.ts` prevents this:
+
+```ts
+// From packages/world/src/ulid.ts
+export const DEFAULT_TIMESTAMP_THRESHOLD_MS = 5 * 60 * 1000;
+
+export function validateUlidTimestamp(
+ prefixedUlid: string,
+ prefix: string,
+ thresholdMs: number = DEFAULT_TIMESTAMP_THRESHOLD_MS
+): string | null {
+ const raw = prefixedUlid.startsWith(prefix)
+ ? prefixedUlid.slice(prefix.length)
+ : prefixedUlid;
+
+ const ulidTimestamp = ulidToDate(raw);
+ if (!ulidTimestamp) {
+ return `Invalid runId: "${prefixedUlid}" is not a valid ULID`;
+ }
+
+ const serverTimestamp = new Date();
+ const driftMs = Math.abs(
+ serverTimestamp.getTime() - ulidTimestamp.getTime()
+ );
+
+ if (driftMs <= thresholdMs) {
+ return null;
+ }
+
+ const driftSeconds = Math.round(driftMs / 1000);
+ const thresholdSeconds = Math.round(thresholdMs / 1000);
+ return `Invalid runId timestamp: embedded timestamp differs from server time by ${driftSeconds}s (threshold: ${thresholdSeconds}s)`;
+}
+```
+
+Any client-generated ULID whose embedded timestamp drifts more than 5 minutes from server time is rejected. This prevents clock-skew attacks where manipulated IDs could sort before or after legitimate events, corrupting the log's chronological integrity.
+
+Inside the workflow VM, ULIDs use the seeded RNG for their random component, so IDs generated during replay match those from the original execution:
+
+```ts
+// From packages/core/src/workflow.ts
+const ulid = monotonicFactory(() => vmGlobalThis.Math.random());
+```
+
+## The Replay Engine: EventsConsumer
+
+When a workflow needs to resume — after a cold start, a step completion, or a sleep elapsing — the runtime doesn't try to deserialize a snapshot. It loads the **Event log** and replays the **Workflow bundle** from the beginning. The `EventsConsumer` class in `packages/core/src/events-consumer.ts` is the mechanism that makes this work. Workflow state is reconstructed by replaying code against the **Event log**, not by resuming in-memory state.
+
+The consumer holds the full event array and a cursor (`eventIndex`) that starts at zero and advances as events are matched to callbacks:
+
+```ts
+// From packages/core/src/events-consumer.ts
+export class EventsConsumer {
+ eventIndex: number;
+ readonly events: Event[] = [];
+ readonly callbacks: EventConsumerCallback[] = [];
+
+ private consume = () => {
+ const currentEvent = this.events[this.eventIndex] ?? null;
+ for (let i = 0; i < this.callbacks.length; i++) {
+ const callback = this.callbacks[i];
+ let handled = EventConsumerResult.NotConsumed;
+ try {
+ handled = callback(currentEvent);
+ } catch (error) {
+ eventsLogger.error('EventConsumer callback threw an error', { error });
+ }
+ if (
+ handled === EventConsumerResult.Consumed ||
+ handled === EventConsumerResult.Finished
+ ) {
+ this.eventIndex++;
+ if (handled === EventConsumerResult.Finished) {
+ this.callbacks.splice(i, 1);
+ }
+ process.nextTick(this.consume);
+ return;
+ }
+ }
+ };
+}
+```
+
+Each callback returns one of three results:
+
+- **`Consumed`** — the callback handled this event but wants to stay registered for future events (used for multi-event entity lifecycles)
+- **`NotConsumed`** — the callback doesn't match this event; pass it to the next callback
+- **`Finished`** — the callback handled this event and is done; remove it from the list
+
+When a callback consumes an event, the cursor advances by one, and the next `consume()` call is scheduled via `process.nextTick`. This microtask-based scheduling ensures events are processed one at a time in order, while yielding to other async work between events.
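+
+The consume semantics can be exercised with a synchronous miniature of the loop. This sketch drops the `process.nextTick` scheduling and error handling to isolate the cursor-and-callbacks mechanics:
+
+```ts
+enum EventConsumerResult { Consumed, NotConsumed, Finished }
+type Callback = (eventType: string | null) => EventConsumerResult;
+
+// Returns how many events were claimed before no callback matched.
+function drain(events: string[], callbacks: Callback[]): number {
+  let cursor = 0;
+  outer: while (cursor <= events.length) {
+    const current = events[cursor] ?? null; // null means end-of-log
+    for (let i = 0; i < callbacks.length; i++) {
+      const handled = callbacks[i](current);
+      if (handled === EventConsumerResult.NotConsumed) continue;
+      cursor++; // Consumed and Finished both advance the cursor
+      if (handled === EventConsumerResult.Finished) callbacks.splice(i, 1);
+      continue outer;
+    }
+    break; // nothing claimed the current event
+  }
+  return cursor;
+}
+
+const claimed = drain(
+  ['run_created', 'step_completed'],
+  [
+    (e) => (e === 'run_created' ? EventConsumerResult.Finished : EventConsumerResult.NotConsumed),
+    (e) => (e === 'step_completed' ? EventConsumerResult.Finished : EventConsumerResult.NotConsumed),
+  ]
+); // 2: both events matched in order, then the loop stopped at end-of-log
+```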
+
+### How Callbacks Map to Workflow Primitives
+
+When the runtime sets up a replay, it registers several categories of callbacks:
+
+1. **Timestamp subscriber** — a passive callback that never consumes events but updates the VM's `fixedTimestamp` from each event's `createdAt`. This makes `Date.now()` inside the workflow advance through the original execution timeline during replay.
+
+2. **Run lifecycle consumer** — consumes `run_created` and `run_started` events to advance past the structural events that precede workflow code execution.
+
+3. **Step/hook/wait consumers** — registered dynamically as the workflow code re-executes. When the replaying workflow calls `await myStep(input)`, the `useStep` proxy subscribes a callback that looks for the matching `step_completed` event. If found, the cached result is returned. If the cursor reaches end-of-log (`null`), the step hasn't completed yet and the workflow suspends.
+
+```ts
+// From packages/core/src/workflow.ts — timestamp advancement
+workflowContext.eventsConsumer.subscribe((event) => {
+ const createdAt = event?.createdAt;
+ if (createdAt) {
+ updateTimestamp(+createdAt);
+ }
+ // Never consume events - this is only a passive subscriber
+ return EventConsumerResult.NotConsumed;
+});
+
+// Run lifecycle events must be consumed to advance past them
+workflowContext.eventsConsumer.subscribe((event) => {
+ if (!event) {
+ return EventConsumerResult.NotConsumed;
+ }
+ if (event.eventType === 'run_created') {
+ return EventConsumerResult.Consumed;
+ }
+ if (event.eventType === 'run_started') {
+ return EventConsumerResult.Consumed;
+ }
+ return EventConsumerResult.NotConsumed;
+});
+```
+
+### Orphan Event Protection
+
+What happens if an event in the log doesn't match any registered callback? This indicates either a corrupted log or a code change that removed a step the log still contains events for. The `EventsConsumer` handles this with a deferred orphan check:
+
+```ts
+// From packages/core/src/events-consumer.ts
+if (currentEvent !== null) {
+ const checkVersion = ++this.unconsumedCheckVersion;
+ this.pendingUnconsumedCheck = this.getPromiseQueue().then(() => {
+ this.pendingUnconsumedTimeout = setTimeout(() => {
+ if (this.unconsumedCheckVersion === checkVersion) {
+ this.onUnconsumedEvent(currentEvent);
+ }
+ }, 100);
+ });
+}
+```
+
+The check is deliberately deferred in two stages. First, it chains onto the promise queue — this ensures that any pending async deserialization (which might trigger downstream `subscribe()` calls) has time to complete. Then it waits 100ms via `setTimeout`, giving cross-VM-boundary microtasks time to settle. If a new `subscribe()` call arrives during this window, it increments `unconsumedCheckVersion`, which cancels the check. Only a truly orphaned event — one that no callback claims after all async work is done — triggers the `onUnconsumedEvent` handler, which flags the log as corrupted.
+
+## The Cold Start Sequence
+
+Here's the complete reconstruction sequence when a workflow resumes. This happens after a cold start, a step completion, a webhook delivery, or a sleep elapsing:
+
+```mermaid
+flowchart TD
+ A["Workflow handler invoked via queue message"] --> B["Load all events for the run, sorted by eventId (ascending)"]
+ B --> C["Check for elapsed waits — create wait_completed events"]
+ C --> D["Create VM sandbox with seeded RNG from runId + workflowName + startedAt"]
+ D --> E["Initialize EventsConsumer with full event array"]
+ E --> F["Register timestamp subscriber and run lifecycle consumer"]
+ F --> G["Execute workflow function from the beginning"]
+ G --> H["Each step call subscribes to EventsConsumer"]
+ H --> I{"Matching step_completed in log?"}
+ I -->|"Yes"| J["Return cached result, advance cursor and timestamp"]
+ J --> K["Workflow continues to next step"]
+ K --> H
+ I -->|"No — end of log"| L["WorkflowSuspension — new steps queued"]
+ L --> M["Handler returns, nothing in memory"]
+```
+
+The key code in `packages/core/src/runtime.ts` loads and pre-processes events before handing them to the workflow:
+
+```ts
+// From packages/core/src/runtime.ts
+// Load all events into memory before running
+const events = await getAllWorkflowRunEvents(workflowRun.runId);
+
+// Check for any elapsed waits and create wait_completed events
+const now = Date.now();
+
+const completedWaitIds = new Set(
+ events
+ .filter((e) => e.eventType === 'wait_completed')
+ .map((e) => e.correlationId)
+);
+
+const waitsToComplete = events
+ .filter(
+ (e) =>
+ e.eventType === 'wait_created' &&
+ e.correlationId !== undefined &&
+ !completedWaitIds.has(e.correlationId) &&
+      now >= (e.eventData.resumeAt as Date).getTime()
+ )
+ .map((e) => ({
+ eventType: 'wait_completed' as const,
+ specVersion: SPEC_VERSION_CURRENT,
+ correlationId: e.correlationId,
+ }));
+```
+
+Then the VM context is created with the same deterministic seed:
+
+```ts
+// From packages/core/src/workflow.ts
+const {
+ context,
+ globalThis: vmGlobalThis,
+ updateTimestamp,
+} = createContext({
+ seed: `${workflowRun.runId}:${workflowRun.workflowName}:${+startedAt}`,
+ fixedTimestamp: +startedAt,
+});
+```
+
+Same run ID, same workflow name, same start timestamp — same seed. Same seed — same `Math.random()` sequence, same initial `Date.now()`. The **Workflow bundle** re-executes identically, and every step call returns its cached result from the **Event log**. Step input is hydrated from the persisted `step_created` event, not from the **Queue message**.
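+
+The determinism can be demonstrated with any seeded PRNG. The sketch below uses mulberry32 and a trivial string hash for illustration; it is not the runtime's actual RNG:
+
+```ts
+// mulberry32: a tiny seeded PRNG, used here purely for illustration.
+function mulberry32(seed: number): () => number {
+  let a = seed >>> 0;
+  return () => {
+    a = (a + 0x6d2b79f5) | 0;
+    let t = Math.imul(a ^ (a >>> 15), 1 | a);
+    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+
+// Hash a seed string of the `${runId}:${workflowName}:${+startedAt}` shape.
+function hashSeed(s: string): number {
+  let h = 0;
+  for (let i = 0; i < s.length; i++) h = (Math.imul(h, 31) + s.charCodeAt(i)) | 0;
+  return h >>> 0;
+}
+
+const seed = 'wrun_example:myWorkflow:1700000000000';
+const original = mulberry32(hashSeed(seed));
+const replay = mulberry32(hashSeed(seed));
+original() === replay(); // true for every draw: replay sees the same sequence
+```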
+
+The `getAllWorkflowRunEvents` function in `packages/core/src/runtime/helpers.ts` paginates through all events, always in ascending sort order:
+
+```ts
+// From packages/core/src/runtime/helpers.ts
+const response = await world.events.list({
+ runId,
+ pagination: {
+ sortOrder: 'asc', // Required: events must be in chronological order for replay
+ cursor: cursor ?? undefined,
+ },
+});
+```
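+
+The loop around that call might look like the following sketch. The page and cursor shapes here are assumptions for illustration, not the actual `world.events.list` contract:
+
+```ts
+// Hypothetical pagination loop — page/cursor shapes are assumed, not verbatim
+type LogEvent = { eventId: string };
+type Page = { events: LogEvent[]; pagination: { nextCursor: string | null } };
+
+async function getAllEvents(
+  list: (cursor?: string) => Promise<Page>
+): Promise<LogEvent[]> {
+  const all: LogEvent[] = [];
+  let cursor: string | undefined;
+  do {
+    const page = await list(cursor); // always ascending order under the hood
+    all.push(...page.events);
+    cursor = page.pagination.nextCursor ?? undefined;
+  } while (cursor !== undefined);
+  return all;
+}
+```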
+
+## Why Not Snapshots?
+
+The event-sourced replay model is counterintuitive at first. Why replay from the beginning when you could just checkpoint the current state? Because event sourcing delivers three guarantees that snapshot-based approaches can't match without extra machinery:
+
+**1. No serialization format to version.** Snapshot systems must serialize the workflow's in-flight state — local variables, call stack, pending promises — into a storable format. When you deploy new code that changes a variable name, adds a field, or restructures a function, existing snapshots become invalid. You need migration logic, versioned serialization, or both.
+
+With event sourcing, the log contains step results, not heap state. New code replays against the same events and reconstructs the same derived state. There's no snapshot format to migrate.
+
+**2. Complete auditability.** Every state transition is an immutable event with a timestamp, correlation ID, and typed payload. Debugging a failed workflow means reading a flat list of events — not reconstructing an opaque binary snapshot. You can see exactly what happened, in what order, and what each step returned.
+
+**3. Exactly-once step execution.** Terminal state enforcement at the world layer guarantees that concurrent step completions resolve to a single winner. If two queue workers try to complete the same step simultaneously, one succeeds and one receives an `EntityConflictError`. The event log never contains duplicate completions.
+
+Snapshot systems don't get this for free. They need external locking, distributed consensus, or idempotency keys bolted on after the fact.
+
+## Replay Performance
+
+A natural concern with replay-from-the-beginning is performance. If a workflow has 200 completed steps, does replay take 200× longer than a single step?
+
+No. Replay re-executes only the orchestration logic — the `"use workflow"` function — which is lightweight branching and `await` calls. Each step call hits the `EventsConsumer`, finds the cached `step_completed` result, deserializes it, and returns. No network calls, no database queries, no external service interactions.
+
+The workflow handler is invoked once per new step completion. A workflow with 200 steps invokes the handler roughly 200 times over its lifetime. But each invocation replays all previously completed steps from cache in milliseconds — the time is dominated by deserialization, not computation. A single invocation's replay cost is proportional to the number of persisted events times the per-event deserialization cost, which is typically sub-millisecond.
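+
+A back-of-envelope sketch makes this concrete. The 0.1 ms per-event cost below is an assumed, illustrative number, not a measured one. Since invocation *k* replays *k* cached events, the cumulative work over a run grows quadratically with step count, yet stays small in absolute terms:
+
+```ts
+// Assumed cost model for illustration only
+const steps = 200;
+const perEventMs = 0.1; // assumed deserialization cost per cached event
+
+// Invocation k replays k cached events, so total replayed events = N(N+1)/2
+const totalReplayedEvents = (steps * (steps + 1)) / 2; // 20_100
+const cumulativeReplayMs = totalReplayedEvents * perEventMs; // about 2s across the run
+const worstSingleInvocationMs = steps * perEventMs; // about 20 ms for the final replay
+```
+
+Even the final, most expensive replay stays in the tens of milliseconds under these assumptions.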
+
+## The Lifecycle in Full
+
+Putting it all together, here's the complete lifecycle of a workflow run from creation to completion:
+
+```mermaid
+flowchart TD
+ A["run_created — Run entity created in pending state"] --> B["run_started — Run transitions to running"]
+ B --> C["Workflow code executes in sandboxed VM"]
+ C --> D{"Step encountered"}
+ D -->|"No cached result"| E["step_created — Step entity created in pending state"]
+ E --> F["WorkflowSuspension thrown, step queued"]
+ F --> G["Step executes out-of-band with full Node.js access"]
+ G --> H["step_completed — Result persisted to event log"]
+ H --> I["Workflow re-enqueued for replay"]
+ I --> C
+ D -->|"Cached result found"| J["EventsConsumer returns cached step_completed result"]
+ J --> K{"More steps?"}
+ K -->|"Yes"| D
+ K -->|"No"| L["run_completed — Run reaches terminal state"]
+
+ C --> M{"Hook encountered"}
+ M --> N["hook_created — Hook entity created"]
+ N --> O["Workflow suspends, waits for external delivery"]
+ O --> P["hook_received — Payload persisted"]
+ P --> I
+
+ C --> Q{"Sleep encountered"}
+ Q --> R["wait_created — Wait entity created with resumeAt"]
+ R --> S["Workflow suspends, delayed re-enqueue scheduled"]
+ S --> T["wait_completed — Delay elapsed"]
+ T --> I
+```
+
+Every arrow in this diagram corresponds to an immutable event in the log. The workflow's state at any point in time is the projection of all events up to that point. There's no separate state store, no mutable row, no snapshot — the event log is the single source of truth.
+
+## What This Enables
+
+The durability model is the foundation that makes the rest of the Workflow DevKit architecture possible:
+
+- **Cost efficiency** — Between steps, nothing is in memory. The **Event log** is the complete state. A **Queue message** is a trigger, not durable state. A workflow waiting 3 days for a webhook costs zero compute during those 3 days.
+- **Durable streaming** — Stream state is persisted alongside the **Event log**, surviving restarts and replays.
+- **Deployment safety** — New code deploys don't invalidate in-flight workflows. The **Event log** doesn't change; the new **Workflow bundle** replays against the same events.
+- **Debuggability** — A failed workflow's **Event log** is a complete, ordered record of everything that happened. No guesswork.
+
+The **Event log** is the state. Everything else is derived.
diff --git a/docs/content/deep-dives/durability-replay-reference.mdx b/docs/content/deep-dives/durability-replay-reference.mdx
new file mode 100644
index 0000000000..b85017ff4c
--- /dev/null
+++ b/docs/content/deep-dives/durability-replay-reference.mdx
@@ -0,0 +1,220 @@
+---
+title: Durability & Replay
+description: How Workflow DevKit uses event sourcing with ULID-ordered logs and an EventsConsumer to deterministically reconstruct workflow state after cold starts.
+type: conceptual
+summary: A technical reference covering the event-sourced persistence model — entity lifecycles, correlation IDs, terminal states, ULID-based ordering, the EventsConsumer callback system, orphan event detection, and how the runtime reconstructs workflow state from a persisted event log.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/event-sourcing
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durable-streaming-reference
+ - /deep-dives/cost-model-fluid-compute-reference
+---
+
+
+Durability in the Workflow DevKit means that no workflow progress is ever lost. Every step result, hook delivery, and sleep completion is persisted as an immutable event. When a workflow resumes — after a cold start, a deploy, or a scale-from-zero event — the runtime reads the event log, recreates the VM context, and replays the workflow code. Cached results are returned instantly from the log, so the workflow arrives at exactly the point where it left off without re-executing any side effects.
+
+
+## Overview
+
+The Workflow DevKit's durability model is built on three pillars:
+
+1. **Event sourcing** — All state mutations are stored as an append-only sequence of typed events in the **Event log**. Entity state (runs, steps, hooks, waits) is derived from events, never stored independently of them.
+2. **ULID-based ordering** — Event IDs use ULIDs (Universally Unique Lexicographically Sortable Identifiers), embedding a millisecond timestamp in the first 48 bits. Sorting by event ID produces chronological order without a separate sequence number.
+3. **Deterministic replay** — The `EventsConsumer` feeds persisted events to registered callbacks in order. Each callback either consumes the event (advancing the cursor) or passes it to the next callback. The **Workflow bundle** re-executes orchestration code in the VM, but every step call returns its cached result from the **Event log** instead of re-executing. A **Queue message** is a trigger, not durable state — the **Event log** is the source of truth for replay.
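+
+Pillar 1 in miniature: deriving a step's state is a pure fold over its events. The event names come from this doc; the reducer itself is a sketch, not the world backend's code:
+
+```ts
+// Sketch: entity state is derived from events, never stored independently
+type StepEventType =
+  | 'step_created'
+  | 'step_started'
+  | 'step_retrying'
+  | 'step_completed'
+  | 'step_failed';
+
+function deriveStepState(events: StepEventType[]): string {
+  return events.reduce((state, eventType) => {
+    switch (eventType) {
+      case 'step_created':   return 'pending';
+      case 'step_started':   return 'running';
+      case 'step_retrying':  return 'running'; // modeling choice for this sketch
+      case 'step_completed': return 'completed';
+      case 'step_failed':    return 'failed';
+      default:               return state;
+    }
+  }, 'unknown');
+}
+```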
+
+## Lifecycle
+
+```mermaid
+flowchart TD
+ A["run_created — Run entity created in pending state"] --> B["run_started — Run transitions to running"]
+ B --> C["Workflow code executes in sandboxed VM"]
+ C --> D{"Step encountered"}
+ D -->|"No cached result"| E["step_created — Step entity created in pending state"]
+ E --> F["WorkflowSuspension thrown, step queued"]
+ F --> G["Step executes out-of-band"]
+ G --> H["step_completed — Result persisted to event log"]
+ H --> I["Workflow re-enqueued for replay"]
+ I --> C
+ D -->|"Cached result found"| J["EventsConsumer returns cached step_completed result"]
+ J --> K{"More steps?"}
+ K -->|"Yes"| D
+ K -->|"No"| L["run_completed — Run reaches terminal state"]
+
+ C --> M{"Hook encountered"}
+ M --> N["hook_created — Hook entity created"]
+ N --> O["Workflow suspends, waits for external delivery"]
+ O --> P["hook_received — Payload persisted"]
+ P --> I
+
+ C --> Q{"Sleep encountered"}
+ Q --> R["wait_created — Wait entity created with resumeAt"]
+ R --> S["Workflow suspends, delayed re-enqueue scheduled"]
+ S --> T["wait_completed — Delay elapsed"]
+ T --> I
+```
+
+## Code Walkthrough
+
+### Event types and entity lifecycles
+
+The **Event log** is a flat, append-only sequence of typed events defined in `packages/world/src/events.ts`. Each event belongs to one of four entity categories — run, step, hook, or wait — and transitions the entity through a defined state machine:
+
+| Entity | Events | Terminal States |
+|--------|--------|-----------------|
+| Run | `run_created` → `run_started` → `run_completed` / `run_failed` / `run_cancelled` | completed, failed, cancelled |
+| Step | `step_created` → `step_started` → `step_completed` / `step_failed` (with optional `step_retrying` loops) | completed, failed |
+| Hook | `hook_created` → `hook_received`* → `hook_disposed` (or `hook_conflict` on token collision) | disposed, conflicted |
+| Wait | `wait_created` → `wait_completed` | completed |
+
+Events within a run share the `runId`. Events within a step, hook, or wait share a `correlationId` — a prefixed ULID that links all events for that entity:
+
+```ts title="packages/world/src/events.ts (base schema)" lineNumbers
+export const BaseEventSchema = z.object({
+ eventType: EventTypeSchema,
+ correlationId: z.string().optional(),
+ specVersion: z.number().optional(),
+});
+
+// Server response adds run-level and temporal fields
+export const EventSchema = AllEventsSchema.and(
+ z.object({
+ runId: z.string(),
+ eventId: z.string(),
+ createdAt: z.coerce.date(),
+ specVersion: z.number().optional(),
+ })
+);
+```
+
+
+Terminal states are enforced by the world backend. Attempting to create an event that would transition an entity out of a terminal state (e.g., `step_completed` on an already-completed step) results in an `EntityConflictError`. This guarantees that every step completes exactly once, even under concurrent execution.
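+
+A minimal sketch of that enforcement follows. The `EntityConflictError` name comes from this doc; the transition check itself is assumed, not the backend's actual code:
+
+```ts
+// Sketch of world-layer terminal-state enforcement
+class EntityConflictError extends Error {}
+
+const TERMINAL_STEP_STATES = new Set(['completed', 'failed']);
+
+function applyStepEvent(currentState: string, eventType: string): string {
+  // No event may transition an entity out of a terminal state
+  if (TERMINAL_STEP_STATES.has(currentState)) {
+    throw new EntityConflictError(
+      `step already ${currentState}; rejecting ${eventType}`
+    );
+  }
+  switch (eventType) {
+    case 'step_started':   return 'running';
+    case 'step_completed': return 'completed';
+    case 'step_failed':    return 'failed';
+    default:               return currentState;
+  }
+}
+```
+
+Two workers racing to complete the same step both call `applyStepEvent`; the second caller sees a terminal state and gets the conflict error.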
+
+
+### EventsConsumer: replay engine
+
+The `EventsConsumer` class in `packages/core/src/events-consumer.ts` is the core replay mechanism. It holds the full event array for a run and a cursor (`eventIndex`) that advances as events are consumed by registered callbacks:
+
+```ts title="packages/core/src/events-consumer.ts" lineNumbers
+export class EventsConsumer {
+ eventIndex: number;
+ readonly events: Event[] = [];
+  readonly callbacks: EventConsumerCallback[] = [];
+  // Orphan-detection members (unconsumedCheckVersion, pendingUnconsumedCheck,
+  // pendingUnconsumedTimeout) and their helpers are elided from this excerpt.
+
+ private consume = () => {
+ const currentEvent = this.events[this.eventIndex] ?? null;
+ for (let i = 0; i < this.callbacks.length; i++) {
+ const callback = this.callbacks[i];
+ let handled = EventConsumerResult.NotConsumed;
+ try {
+ handled = callback(currentEvent);
+ } catch (error) {
+ eventsLogger.error('EventConsumer callback threw an error', { error });
+ }
+ if (
+ handled === EventConsumerResult.Consumed ||
+ handled === EventConsumerResult.Finished
+ ) {
+ this.eventIndex++;
+ if (handled === EventConsumerResult.Finished) {
+ this.callbacks.splice(i, 1);
+ }
+ process.nextTick(this.consume);
+ return;
+ }
+ }
+
+ // If no callback consumed a real event, schedule orphan detection
+ if (currentEvent !== null) {
+ const checkVersion = ++this.unconsumedCheckVersion;
+ this.pendingUnconsumedCheck = this.getPromiseQueue().then(() => {
+ this.pendingUnconsumedTimeout = setTimeout(() => {
+ if (this.unconsumedCheckVersion === checkVersion) {
+ this.onUnconsumedEvent(currentEvent);
+ }
+ }, 100);
+ });
+ }
+ };
+}
+```
+
+**How replay works step by step:**
+
+1. The runtime loads the event array for the run from the world backend.
+2. `EventsConsumer` is initialized with the full array and `eventIndex = 0`.
+3. Callbacks are registered via `subscribe()` — one passive subscriber for timestamp advancement, one for run lifecycle events (`run_created`, `run_started`), and one per step/hook/wait as the workflow code re-executes.
+4. Each call to `consume()` tries the current event against all callbacks. The first callback that returns `Consumed` or `Finished` advances the cursor and schedules the next `consume()` via `process.nextTick`.
+5. When the cursor reaches past all persisted events, step callbacks receive `null` (end of log). A step callback seeing `null` knows the step hasn't executed yet and triggers suspension.
+
+**Orphan event protection:** If a non-null event passes through all callbacks without being consumed, the consumer defers an orphan check. It chains onto the promise queue (to let pending async deserialization complete) and then waits 100 ms. If no new `subscribe()` call cancels the check by incrementing `unconsumedCheckVersion`, the event is flagged as orphaned — indicating a corrupted or invalid event log.
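+
+A synchronous model of the consume loop, stripped of the promise scheduling and orphan timers, makes the cursor semantics concrete. This is a simplification for illustration, not the real class:
+
+```ts
+// Simplified model: the first callback to consume an event advances the cursor
+type ConsumeResult = 'consumed' | 'not_consumed' | 'finished';
+type Callback = (event: string) => ConsumeResult;
+
+function drain(events: string[], callbacks: Callback[]): number {
+  let cursor = 0;
+  outer: while (cursor < events.length) {
+    const current = events[cursor];
+    for (let i = 0; i < callbacks.length; i++) {
+      const handled = callbacks[i](current);
+      if (handled === 'consumed' || handled === 'finished') {
+        cursor++;
+        if (handled === 'finished') callbacks.splice(i, 1); // one-shot subscriber done
+        continue outer;
+      }
+    }
+    break; // nothing consumed this event — the real consumer flags it as orphaned
+  }
+  return cursor;
+}
+```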
+
+### ULID ordering and timestamp validation
+
+All entity IDs in the Workflow DevKit are prefixed ULIDs (e.g., `wrun_01HXYZ...`, `step_01HXYZ...`, `evnt_01HXYZ...`). The `packages/world/src/ulid.ts` module provides utilities for extracting and validating the embedded timestamps:
+
+```ts title="packages/world/src/ulid.ts" lineNumbers
+export const DEFAULT_TIMESTAMP_THRESHOLD_MS = 5 * 60 * 1000;
+
+export function validateUlidTimestamp(
+ prefixedUlid: string,
+ prefix: string,
+ thresholdMs: number = DEFAULT_TIMESTAMP_THRESHOLD_MS
+): string | null {
+ const raw = prefixedUlid.startsWith(prefix)
+ ? prefixedUlid.slice(prefix.length)
+ : prefixedUlid;
+
+ const ulidTimestamp = ulidToDate(raw);
+ if (!ulidTimestamp) {
+ return `Invalid runId: "${prefixedUlid}" is not a valid ULID`;
+ }
+
+ const serverTimestamp = new Date();
+ const driftMs = Math.abs(
+ serverTimestamp.getTime() - ulidTimestamp.getTime()
+ );
+
+ if (driftMs <= thresholdMs) {
+ return null;
+ }
+
+ const driftSeconds = Math.round(driftMs / 1000);
+ const thresholdSeconds = Math.round(thresholdMs / 1000);
+ return `Invalid runId timestamp: embedded timestamp differs from server time by ${driftSeconds}s (threshold: ${thresholdSeconds}s)`;
+}
+```
+
+ULIDs provide two guarantees that the event log depends on:
+
+- **Chronological sortability** — Sorting events by their ULID-based `eventId` produces the correct temporal order. No separate sequence column is needed.
+- **Timestamp validation** — `validateUlidTimestamp` rejects client-generated IDs whose embedded timestamp drifts more than 5 minutes from server time. This prevents clock-skew attacks where a malicious client could forge IDs that sort before or after legitimate events.
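+
+The timestamp mechanics can be sketched in a few lines. The Crockford base32 alphabet follows the ULID spec; the framework's own `ulidToDate` is internal, so this encoder/decoder pair is illustrative:
+
+```ts
+// Sketch of the ULID time component: 48-bit ms timestamp as 10 Crockford-base32 chars
+const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
+
+function encodeTime(ms: number): string {
+  let out = '';
+  for (let i = 0; i < 10; i++) {
+    out = CROCKFORD[ms % 32] + out;
+    ms = Math.floor(ms / 32);
+  }
+  return out;
+}
+
+function decodeTime(ulid: string): number {
+  let ms = 0;
+  for (const char of ulid.slice(0, 10)) {
+    ms = ms * 32 + CROCKFORD.indexOf(char);
+  }
+  return ms;
+}
+
+// Fixed width + ordered alphabet → lexicographic order equals chronological order
+const earlier = encodeTime(1700000000000);
+const later = encodeTime(1700000000001);
+```
+
+Because the width is fixed and the alphabet is ordered, plain string comparison of event IDs is also a comparison of their creation times.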
+
+Inside the workflow VM, ULIDs are generated using the seeded RNG for the random component, ensuring that IDs created during replay match those created during the original execution:
+
+```ts title="packages/core/src/workflow.ts (ULID generation)" lineNumbers
+const ulid = monotonicFactory(() => vmGlobalThis.Math.random());
+```
+
+### Deterministic reconstruction after cold starts
+
+When a workflow is re-invoked (after suspension, a deploy, or a cold start), the reconstruction sequence is:
+
+1. **Event log loaded** — The world backend returns all events for the run, ordered by event ID.
+2. **VM context created** — `createContext()` builds a fresh sandbox with the same seed (derived from run ID, workflow name, and start timestamp). This produces the same seeded RNG and the same initial `fixedTimestamp`.
+3. **Timestamp subscriber registered** — A passive `EventsConsumer` callback updates the VM's `fixedTimestamp` from each event's `createdAt`. As events are consumed during replay, `Date.now()` inside the workflow advances to match the original execution timeline.
+4. **Workflow code re-executed** — The **Workflow bundle** runs from the beginning. Each step call hits the `WORKFLOW_USE_STEP` proxy, which consults the `EventsConsumer`. For completed steps, the consumer returns the cached `step_completed` result instantly. Step input is hydrated from the persisted `step_created` event, not from the **Queue message**. For steps not yet in the log, the workflow suspends.
+5. **Promise queue drained** — Async operations (deserialization, decryption of step results) are chained onto a shared promise queue. The `EventsConsumer` waits for the queue to drain before checking for orphan events, preventing false positives from async gaps.
+
+The result: workflow state is reconstructed by replaying code against the **Event log**. The workflow arrives at exactly the point where it last suspended, with all local variables holding the same values, all branching decisions following the same paths, and `Date.now()` reflecting the time of the most recent event — not wall-clock time.
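+
+The decision each replayed step call makes can be sketched as follows. The `WORKFLOW_USE_STEP` and `WorkflowSuspension` names come from this doc; the lookup logic is a simplification of the real proxy:
+
+```ts
+// Simplified model of the WORKFLOW_USE_STEP replay check
+class WorkflowSuspension extends Error {}
+
+type LogEvent = { eventType: string; correlationId: string; result?: unknown };
+
+function replayStepCall(correlationId: string, log: LogEvent[]): unknown {
+  const completed = log.find(
+    (e) => e.eventType === 'step_completed' && e.correlationId === correlationId
+  );
+  if (completed) {
+    // Cached result — the step body never re-runs during replay
+    return completed.result;
+  }
+  // Not in the log: the real runtime queues the step, then suspends the workflow
+  throw new WorkflowSuspension();
+}
+```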
+
+## Why This Matters
+
+The event-sourced replay model delivers three guarantees that snapshot-based workflow systems struggle to provide:
+
+1. **No state serialization format** — The workflow doesn't checkpoint its JavaScript heap. Instead, the **Event log** contains step results, and the deterministic VM reconstructs everything else. Step bodies are excluded from the **Workflow bundle** because replay must never re-run side effects. This means there's no versioning problem when workflow code changes between deploys — new code replays against the same events.
+
+2. **Complete auditability** — Every state transition is an immutable event with a timestamp, correlation ID, and typed payload. Debugging a failed workflow means reading a flat list of events, not reconstructing opaque snapshots.
+
+3. **Exactly-once step execution** — Terminal state enforcement at the world layer guarantees that concurrent step completions resolve to a single winner. The `EntityConflictError` mechanism means a step can never complete twice, even if duplicate messages arrive from the queue.
diff --git a/docs/content/deep-dives/durability-replay-social.mdx b/docs/content/deep-dives/durability-replay-social.mdx
new file mode 100644
index 0000000000..3b666ca525
--- /dev/null
+++ b/docs/content/deep-dives/durability-replay-social.mdx
@@ -0,0 +1,35 @@
+---
+title: "The Event Log Is the State. Replay Just Recomputes the Stack Frame."
+description: A concise explainer on how Workflow DevKit uses an append-only event log and deterministic replay to reconstruct workflow state after crashes — no heap snapshots, no serialized checkpoints.
+type: conceptual
+summary: Workflow DevKit doesn't checkpoint your workflow's memory. It persists every step result as an immutable event in an append-only log. When the workflow resumes, the runtime replays the log — the event log is the complete state, and replay just recomputes the call stack to arrive at the right suspension point.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/event-sourcing
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durable-streaming-reference
+ - /deep-dives/cost-model-fluid-compute-reference
+---
+
+Your workflow charged a credit card, then the process died. How does it pick up without charging again?
+
+Most durable execution systems snapshot the heap — serialize local variables, the call stack, pending promises — and persist that blob. When the process restarts, they deserialize the snapshot and resume. This works until you deploy new code and the snapshot format doesn't match. Now you need migration logic, versioned serialization, or a painful manual recovery.
+
+Workflow DevKit skips snapshots entirely. The **Event log** *is* the state.
+
+## No Snapshots, Just Events
+
+Every transition — `step_completed`, `hook_received`, `wait_completed` — is an immutable event appended to the **Event log**. Events are ordered by ULID-based IDs that embed millisecond timestamps, so sorting by event ID produces chronological order without a separate sequence number. The `validateUlidTimestamp` function rejects any client-generated ID that drifts more than 5 minutes from server time, preventing clock-skew attacks on the log's ordering.
+
+When the workflow resumes after a crash, deploy, or cold start, the runtime doesn't try to deserialize a heap. It loads the **Event log**, creates a fresh VM sandbox with the same deterministic seed — same `Math.random()`, same `Date.now()`, same `crypto.randomUUID()` — and re-executes the **Workflow bundle** from the beginning.
+
+The `EventsConsumer` feeds events to the workflow in order. Each step call checks: is there a `step_completed` for this step? If yes, return the cached result instantly. If no, suspend and queue the step for background execution.
+
+The workflow re-runs its orchestration logic — the `"use workflow"` function — but it never re-runs a step. Step bodies are excluded from the **Workflow bundle** because replay must never re-run side effects. Local variables, branching decisions, and promise chains are all recomputed by replaying code against the **Event log**. The stack frame is ephemeral; the log is permanent.
+
+## Why It Matters
+
+There's no serialization format to version across deploys. There's no opaque binary snapshot to debug when things go wrong. Every state transition is an auditable, immutable record with a timestamp and typed payload. And terminal state enforcement at the storage layer guarantees exactly-once step execution — even if duplicate messages arrive from the queue.
+
+The **Event log** is the state. Replay just recomputes the stack frame. That's the entire durability model.
diff --git a/docs/content/deep-dives/durable-streaming-blog.mdx b/docs/content/deep-dives/durable-streaming-blog.mdx
new file mode 100644
index 0000000000..a260d427e7
--- /dev/null
+++ b/docs/content/deep-dives/durable-streaming-blog.mdx
@@ -0,0 +1,364 @@
+---
+title: "Streaming Data That Survives a Crash — How Workflow DevKit Makes Streams Durable"
+description: A developer-facing deep dive into Workflow DevKit's durable streaming architecture — how streams survive function suspensions and crashes through world-backed persistence, namespaced IDs, buffered flushes, and lock-aware completion semantics.
+type: conceptual
+summary: Workflow DevKit persists streaming data through world-backed storage so that streams outlive the functions that create them. Chunks are batched in a 10 ms window and flushed to the configured backend — filesystem locally, Vercel's workflow-server over HTTP in hosted deployments, or Postgres when using the `world-postgres` backend. A lock-polling mechanism detects when the writer is done without requiring explicit stream closure, and namespaced stream IDs allow a single run to own multiple independent streams.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/understanding-directives
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/cost-model-fluid-compute-reference
+---
+
+You're building an AI-powered feature. Your step function calls an LLM, and tokens stream back one at a time. The user sees them appear in the UI as they arrive. Then the serverless function hits its timeout, or the instance scales down, or a deploy rolls through.
+
+What happens to those tokens?
+
+In most frameworks: they're gone. The in-memory stream buffer evaporates with the process. The UI shows a spinner forever, or an error, or — worst case — partial results with no indication that they're incomplete.
+
+Workflow DevKit's durable streaming solves this. Every chunk is persisted to a world backend as it arrives. Readers can reconnect from any index. And the stream outlives the function that created it.
+
+This article walks through the full streaming pipeline — from the `getWritable()` call in your code, through serialization, buffered flushes, and lock-aware completion, to the persistence backends that store chunks for local development and production.
+
+## Two Sides of getWritable()
+
+The same `getWritable()` function call does fundamentally different things depending on where it's called.
+
+### The Workflow Side: A Serializable Handle
+
+Inside a `"use workflow"` function, `getWritable()` returns a lightweight handle — not a real writable stream. It's an object that carries a `STREAM_NAME_SYMBOL` property with the stream's deterministic ID, but sets up no I/O pipeline:
+
+```ts
+// From packages/core/src/workflow/writable-stream.ts
+export function getWritable(
+ options: WorkflowWritableStreamOptions = {}
+): WritableStream {
+ const { namespace } = options;
+ const name = (globalThis as any)[WORKFLOW_GET_STREAM_ID](namespace);
+ return Object.create(globalThis.WritableStream.prototype, {
+ [STREAM_NAME_SYMBOL]: {
+ value: name,
+ writable: false,
+ },
+ });
+}
+```
+
+This handle looks like a `WritableStream` to TypeScript, but it's really a token. The framework's serialization layer recognizes the `STREAM_NAME_SYMBOL` and reconstitutes a real writable when the object crosses the workflow → step boundary.
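+
+A hedged sketch of that recognition step: the symbol and token shape below are stand-ins for illustration, not the framework's actual serialization format:
+
+```ts
+// Illustrative only: how a serializer can spot the handle and emit a token
+const STREAM_NAME_SYMBOL = Symbol('workflow.streamName'); // stand-in symbol
+
+function makeHandle(name: string): WritableStream {
+  return Object.create(WritableStream.prototype, {
+    [STREAM_NAME_SYMBOL]: { value: name, writable: false },
+  });
+}
+
+function serializeValue(value: unknown): unknown {
+  const streamName = (value as any)?.[STREAM_NAME_SYMBOL];
+  if (typeof streamName === 'string') {
+    // The token crosses the workflow → step boundary; the step side
+    // reconstitutes a real writable from the name
+    return { $stream: streamName };
+  }
+  return value;
+}
+```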
+
+Why not set up real I/O in the workflow? Because workflow functions run in a sandboxed VM without network access. They orchestrate — they don't do I/O. The handle is a promise of a stream, not the stream itself.
+
+
+Real stream I/O cannot happen inside workflow functions. The `getWritable()` call in a `"use workflow"` context returns a serializable handle, not a functioning stream. The full I/O pipeline — serialization, buffered flushes, world-backend persistence — is set up only when the handle reaches a step function. This restriction exists because workflow code must be deterministic and replayable; streaming is a side effect that belongs in step functions.
+
+
+### The Step Side: The Full Pipeline
+
+Inside a `"use step"` function (or any step-level runtime context), `getWritable()` sets up the complete I/O pipeline:
+
+```ts
+// From packages/core/src/step/writable-stream.ts
+export function getWritable(
+ options: WorkflowWritableStreamOptions = {}
+): WritableStream {
+ const ctx = contextStorage.getStore();
+ if (!ctx) {
+ throw new Error(
+ '`getWritable()` can only be called inside a workflow or step function'
+ );
+ }
+
+ const { namespace } = options;
+ const runId = ctx.workflowMetadata.workflowRunId;
+ const name = getWorkflowRunStreamId(runId, namespace);
+
+ const serialize = getSerializeStream(
+ getExternalReducers(globalThis, ctx.ops, runId, ctx.encryptionKey),
+ ctx.encryptionKey
+ );
+
+ const serverWritable = new WorkflowServerWritableStream(name, runId);
+ const state = createFlushableState();
+ ctx.ops.push(state.promise);
+
+ flushablePipe(serialize.readable, serverWritable, state).catch(() => {});
+ pollWritableLock(serialize.writable, state);
+
+ return serialize.writable;
+}
+```
+
+Five things happen in sequence:
+
+1. **Stream ID generated** — `getWorkflowRunStreamId` builds a deterministic ID from the run ID and optional namespace
+2. **Serialization transform created** — a `TransformStream` that handles serialization (and optional encryption) of chunks
+3. **Server writable created** — `WorkflowServerWritableStream` is the bridge to the world backend
+4. **Flush pipe connected** — `flushablePipe` reads from the transform and writes to the server writable
+5. **Lock polling started** — `pollWritableLock` watches for the user to release their writer lock
+
+The `state.promise` is pushed onto `ctx.ops`, which ties stream completion to step completion. The step can't finish until the stream is done.
+
+## Namespaced Stream IDs
+
+Every stream needs a unique, deterministic identifier. `getWorkflowRunStreamId` in `packages/core/src/util.ts` builds it:
+
+```ts
+// From packages/core/src/util.ts
+export function getWorkflowRunStreamId(runId: string, namespace?: string) {
+ const streamId = `${runId.replace('wrun_', 'strm_')}_user`;
+ if (!namespace) {
+ return streamId;
+ }
+ const encodedNamespace = Buffer.from(namespace, 'utf-8').toString(
+ 'base64url'
+ );
+ return `${streamId}_${encodedNamespace}`;
+}
+```
+
+The format is `strm_{ULID}_user_{base64url(namespace)?}`. When you call `getWritable({ namespace: 'progress' })`, the namespace is base64url-encoded and appended. This lets a single workflow run own multiple independent streams — one for LLM tokens, one for progress events, one for structured logs — without collision.
+
+The `_user` segment distinguishes user-created streams from any internal framework streams. It's a namespace for namespaces.
+
+On the reader side, the same namespace flows through `run.getReadable({ namespace })` — which calls `getWorkflowRunStreamId` with the run ID and namespace to reconstruct the same deterministic stream ID. Readers and writers agree on the stream identity without any external coordination.
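+
+Concretely (the function is repeated from above so the example is self-contained; the run ID is made up):
+
+```ts
+function getWorkflowRunStreamId(runId: string, namespace?: string) {
+  const streamId = `${runId.replace('wrun_', 'strm_')}_user`;
+  if (!namespace) return streamId;
+  return `${streamId}_${Buffer.from(namespace, 'utf-8').toString('base64url')}`;
+}
+
+// One run, three independent streams
+const plain = getWorkflowRunStreamId('wrun_01HXYZ');
+const progress = getWorkflowRunStreamId('wrun_01HXYZ', 'progress');
+const tokens = getWorkflowRunStreamId('wrun_01HXYZ', 'ai-tokens');
+// plain    → 'strm_01HXYZ_user'
+// progress → 'strm_01HXYZ_user_cHJvZ3Jlc3M'
+// tokens   → 'strm_01HXYZ_user_YWktdG9rZW5z'
+```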
+
+## Buffered Writes: The 10 ms Window
+
+Individual chunk writes to an HTTP backend would be catastrophically slow. If you're streaming LLM tokens, you might get 50-100 chunks per second. One HTTP request per chunk would saturate connections and add seconds of latency.
+
+`WorkflowServerWritableStream` solves this with a buffered write strategy:
+
+```ts
+// From packages/core/src/serialization.ts (simplified)
+const STREAM_FLUSH_INTERVAL_MS = 10;
+
+let buffer: Uint8Array[] = [];
+
+const flush = async (): Promise<void> => {
+  if (buffer.length === 0) return;
+  const chunksToFlush = buffer;
+  // Reset before awaiting so chunks written during the flush aren't dropped
+  buffer = [];
+
+  if (typeof world.writeToStreamMulti === 'function' && chunksToFlush.length > 1) {
+    await world.writeToStreamMulti(name, runId, chunksToFlush);
+  } else {
+    for (const chunk of chunksToFlush) {
+      await world.writeToStream(name, runId, chunk);
+    }
+  }
+};
+```
+
+Chunks accumulate in a buffer. Every 10 milliseconds, a flush timer fires and sends the entire batch in a single `writeToStreamMulti` call. Each `write()` caller awaits the flush result, so backpressure propagates naturally — if the backend is slow, the buffer grows but callers block.
+
+The result: however fast chunks arrive, the backend sees at most one batched call per 10 ms window instead of one call per chunk. The backend handles far less traffic, and the writer barely notices the batching.
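+
+A deterministic model of the batching follows. The `writeToStreamMulti` call shape is taken from the snippet above; everything else is a sketch:
+
+```ts
+// Sketch: chunks accumulate between flushes; each flush is one backend call
+type WriteMulti = (name: string, runId: string, chunks: Uint8Array[]) => Promise<void>;
+
+function createBatcher(name: string, runId: string, writeMulti: WriteMulti) {
+  let buffer: Uint8Array[] = [];
+  return {
+    write(chunk: Uint8Array): void {
+      buffer.push(chunk); // accumulates until the next flush window
+    },
+    async flush(): Promise<void> {
+      if (buffer.length === 0) return;
+      const batch = buffer;
+      buffer = []; // reset before awaiting so concurrent writes aren't lost
+      await writeMulti(name, runId, batch);
+    },
+  };
+}
+```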
+
+## The Lock Polling Problem
+
+Here's a subtle problem with the Web Streams API. When you get a writer from a `WritableStream`, the stream is "locked" — other code can't write to it. When you release the writer (by calling `releaseLock()`), the stream becomes unlocked. But there's no event for "the lock was released."
+
+This matters because a step function might look like this:
+
+```ts
+export async function generateResponse(prompt: string) {
+ "use step";
+ const stream = getWritable({ namespace: 'ai-tokens' });
+ const writer = stream.getWriter();
+ for await (const token of llm.stream(prompt)) {
+ await writer.write(token);
+ }
+ writer.releaseLock();
+ return { status: 'done' };
+}
+```
+
+The developer releases the lock and returns. But how does the framework know the stream is "done"? The stream itself isn't closed — `releaseLock()` doesn't close it. And `.pipeTo()` only resolves when the stream closes, so using that would hang the serverless function until timeout.
+
+`flushable-stream.ts` solves this with a 100 ms polling loop:
+
+```ts
+// From packages/core/src/flushable-stream.ts (simplified)
+export const LOCK_POLL_INTERVAL_MS = 100;
+
+export function pollWritableLock(
+ writable: WritableStream,
+ state: FlushableStreamState
+): void {
+ const intervalId = setInterval(() => {
+ if (state.doneResolved || state.streamEnded) {
+ clearInterval(intervalId);
+ return;
+ }
+ if (isWritableUnlockedNotClosed(writable) && state.pendingOps === 0) {
+ state.doneResolved = true;
+ state.resolve();
+ clearInterval(intervalId);
+ }
+ }, LOCK_POLL_INTERVAL_MS);
+}
+```
+
+Every 100 ms, the poller checks: is the writable unlocked? Are there zero pending flush operations? If both conditions are true, the state promise resolves and the step can complete. The framework handles the stream lifecycle so the developer doesn't have to call `.close()` explicitly.
+
+## The Pump: flushablePipe
+
+`flushablePipe` is the read-write loop that connects the serialization transform to the server writable. It tracks `pendingOps` to coordinate with the lock poller:
+
+```ts
+// From packages/core/src/flushable-stream.ts (excerpt)
+export async function flushablePipe(
+ source: ReadableStream,
+ sink: WritableStream,
+ state: FlushableStreamState
+): Promise<void> {
+ const reader = source.getReader();
+ const writer = sink.getWriter();
+
+ try {
+ while (true) {
+ const readResult = await reader.read();
+ if (readResult.done) {
+ state.streamEnded = true;
+ await writer.close();
+ if (!state.doneResolved) {
+ state.doneResolved = true;
+ state.resolve();
+ }
+ return;
+ }
+ state.pendingOps++;
+ try {
+ await writer.write(readResult.value);
+ } finally {
+ state.pendingOps--;
+ }
+ }
+ } catch (err) {
+ state.streamEnded = true;
+ if (!state.doneResolved) {
+ state.doneResolved = true;
+ state.reject(err);
+ }
+ throw err;
+ } finally {
+ reader.releaseLock();
+ writer.releaseLock();
+ }
+}
+```
+
+There are two paths to completion:
+
+1. **Stream close** — the readable side ends (`readResult.done`), the sink is closed, and the state promise resolves
+2. **Lock release** — the user releases their writer lock, the poller detects it after pending ops drain, and the state promise resolves
+
+Both paths lead to the same result: the step's `ctx.ops` promise resolves, and the step can return its result. This dual-path design means developers can either explicitly close their stream or simply release the lock — the framework handles both correctly.
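+
+Both paths settle the same shared state object. Its exact definition lives in `packages/core/src/flushable-stream.ts`; a plausible shape, assumed here for illustration, looks like this:
+
+```ts
+// Assumed shape of the shared state; the real definition may differ.
+interface FlushableStreamState {
+  promise: Promise<void>; // pushed onto ctx.ops; resolves when the stream is done
+  resolve: () => void;
+  reject: (err: unknown) => void;
+  pendingOps: number; // in-flight writes tracked by flushablePipe
+  doneResolved: boolean;
+  streamEnded: boolean;
+}
+
+function createFlushableState(): FlushableStreamState {
+  let resolve!: () => void;
+  let reject!: (err: unknown) => void;
+  const promise = new Promise<void>((res, rej) => {
+    resolve = res;
+    reject = rej;
+  });
+  return {
+    promise,
+    resolve,
+    reject,
+    pendingOps: 0,
+    doneResolved: false,
+    streamEnded: false,
+  };
+}
+```
+
+Because the step awaits `state.promise` (via `ctx.ops`), whichever path resolves it first unblocks step completion; the `doneResolved` flag keeps the other path from settling it twice.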
+
+## Persistence: Backend-Agnostic Storage
+
+The streaming pipeline is backend-agnostic. The `Streamer` interface in `@workflow/world` defines the contract; each world implementation stores chunks differently.
+
+### Local Development (world-local)
+
+The local backend in `packages/world-local/src/streamer.ts` persists each chunk as a binary file:
+
+- **Path format:** `streams/chunks/{streamName}-chnk_{ULID}.bin`
+- **Chunk format:** 1 byte EOF flag + payload bytes
+- **Ordering:** Monotonic ULID generation ensures lexicographic sort matches chronological order
+- **Multi-chunk batching:** `writeToStreamMulti` generates all ULIDs synchronously before any async I/O, preserving call order
+- **EOF:** `closeStream` writes a zero-payload chunk with the EOF byte set to `1`
+
+The reader uses Node.js `EventEmitter` for real-time updates — when new chunk files appear, the reader is notified immediately. It also reconciles disk state with buffered events to avoid duplicates.
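+
+The chunk layout described above (one EOF flag byte followed by the payload) can be sketched as a pair of helpers. The function names are illustrative, not the real world-local API:
+
+```ts
+// Sketch of the 1-byte-flag-plus-payload chunk layout (names are illustrative).
+function encodeChunk(payload: Uint8Array, eof: boolean): Uint8Array {
+  const out = new Uint8Array(1 + payload.length);
+  out[0] = eof ? 1 : 0; // leading EOF flag byte
+  out.set(payload, 1);
+  return out;
+}
+
+function decodeChunk(buf: Uint8Array): { eof: boolean; payload: Uint8Array } {
+  return { eof: buf[0] === 1, payload: buf.subarray(1) };
+}
+```
+
+Under this layout, `closeStream`'s EOF sentinel is simply `encodeChunk(new Uint8Array(0), true)`: a single `0x01` byte with no payload.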
+
+### Vercel Platform (world-vercel)
+
+The Vercel backend in `packages/world-vercel/src/streamer.ts` delegates to the Vercel workflow-server over HTTP:
+
+- **Write:** `PUT /v2/runs/{runId}/stream/{name}` with the chunk body
+- **Multi-chunk write:** Same endpoint with `X-Stream-Multi: true` header and length-prefixed binary encoding
+- **Close:** `PUT` with `X-Stream-Done: true` header
+- **Read:** `GET /v2/stream/{name}` returns a streaming response body
+- **Pagination:** `GET /v2/runs/{runId}/streams/{name}/chunks` supports `limit` and `cursor`
+
+The Vercel backend doesn't use an in-process event emitter — the workflow-server handles chunk storage and real-time delivery to readers over HTTP streaming responses.
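+
+The actual `encodeMultiChunks` wire format isn't shown here, but "length-prefixed binary encoding" can be illustrated with an assumed 4-byte big-endian length before each chunk:
+
+```ts
+// Hedged sketch of a length-prefixed multi-chunk encoding. The 4-byte
+// big-endian prefix is an assumption, not the real encodeMultiChunks format.
+function encodeMultiChunks(chunks: Uint8Array[]): Uint8Array {
+  const total = chunks.reduce((n, c) => n + 4 + c.length, 0);
+  const out = new Uint8Array(total);
+  const view = new DataView(out.buffer);
+  let offset = 0;
+  for (const chunk of chunks) {
+    view.setUint32(offset, chunk.length, false); // big-endian length prefix
+    out.set(chunk, offset + 4);
+    offset += 4 + chunk.length;
+  }
+  return out;
+}
+
+function decodeMultiChunks(buf: Uint8Array): Uint8Array[] {
+  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
+  const chunks: Uint8Array[] = [];
+  let offset = 0;
+  while (offset < buf.byteLength) {
+    const len = view.getUint32(offset, false);
+    chunks.push(buf.subarray(offset + 4, offset + 4 + len));
+    offset += 4 + len;
+  }
+  return chunks;
+}
+```
+
+Any such framing lets the server split one HTTP body back into individually addressable chunks while preserving write order.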
+
+### Postgres (world-postgres)
+
+The Postgres backend in `packages/world-postgres/src/streamer.ts` stores chunks as rows in a `streams` table via Drizzle ORM. Each chunk gets a monotonic ULID-based `chunkId` for ordering, and real-time delivery uses PostgreSQL `LISTEN`/`NOTIFY` on a `workflow_event_chunk` channel. `writeToStreamMulti` batch-inserts all chunks in a single query with pre-generated IDs to preserve ordering.
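+
+Both the local and Postgres backends pre-generate IDs synchronously before any async I/O. A simplified model shows why that preserves ordering even when writes complete out of order (the names here are illustrative, not the real API):
+
+```ts
+// Simplified model: concurrent batch writes may interleave their (simulated)
+// I/O, but because each call reserves its monotonic IDs up front, sorting by
+// ID always reproduces call order.
+let counter = 0;
+const nextId = () => `chnk_${String(counter++).padStart(8, '0')}`;
+
+async function writeToStreamMulti(
+  store: Map<string, string>,
+  chunks: string[]
+): Promise<void> {
+  const ids = chunks.map(() => nextId()); // reserve IDs before any await
+  await Promise.all(
+    ids.map(async (id, i) => {
+      // simulated out-of-order I/O completion
+      await new Promise((r) => setTimeout(r, Math.random() * 5));
+      store.set(id, chunks[i]);
+    })
+  );
+}
+```
+
+If IDs were generated inside the async callbacks instead, interleaved awaits could assign them out of call order and readers would see chunks shuffled.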
+
+## The Complete Picture
+
+```mermaid
+flowchart TD
+    A["getWritable({ namespace: 'progress' })"] --> B["Stream ID: strm_{ULID}_user_{base64url('progress')}"]
+ B --> C["Serialization TransformStream created"]
+ C --> D["flushablePipe: readable → WorkflowServerWritableStream"]
+ D --> E["Chunks buffered, flushed every 10 ms"]
+ E --> F{"World backend"}
+ F -->|"Local"| G["ULID-named .bin files on disk"]
+ F -->|"Vercel"| H["HTTP PUT to workflow-server"]
+ F -->|"Postgres"| P["Rows via Drizzle + LISTEN/NOTIFY"]
+ G --> I["Reader with EventEmitter + disk reconciliation"]
+ H --> J["HTTP streaming response to reader"]
+ P --> Q["Reader via pg NOTIFY events"]
+ D --> K["pollWritableLock checks every 100 ms"]
+ K --> L["Lock released + pending ops = 0 → step completes"]
+```
+
+## Before and After: Ephemeral vs. Durable Streams
+
+Consider an AI feature that streams LLM tokens to a UI.
+
+**Ephemeral streaming (traditional approach):**
+- Tokens exist only in the HTTP response body. If the function times out or the connection drops, the partial response is lost.
+- The client must detect the broken connection, decide whether to retry, and handle deduplication if the LLM generates different output on retry.
+- No persistence means no reconnection — a new reader has to start over or accept a gap.
+- In a serverless environment, you're fighting the execution time limit. Long generations either need to fit within the timeout or require a separate long-lived process.
+
+**Durable streaming (Workflow DevKit):**
+- Every chunk is persisted to the world backend as it arrives. If the function suspends, the chunks survive. When the workflow replays and the step re-creates the stream, previously persisted chunks are already available.
+- Readers can reconnect from any index using `startIndex`. A UI that reconnects after a network blip picks up right where it left off — no gap, no duplicate tokens.
+- The stream outlives any single function invocation. A step that times out doesn't lose its progress. The step can be retried, and new chunks append to the same stream.
+- Batched 10 ms flushes and `writeToStreamMulti` mean high-frequency writes don't generate one HTTP request per chunk. The overhead is minimal.
+
+The difference is most visible during failures. With ephemeral streaming, a function timeout means data loss. With durable streaming, it means a brief pause — the persisted chunks are still there, and the stream continues when the step resumes.
+
+## Writing Streaming Code
+
+Understanding the pipeline changes how you think about streaming in workflows:
+
+**Use namespaces for multiple streams per run:**
+```ts
+const tokens = getWritable({ namespace: 'tokens' });
+const progress = getWritable({ namespace: 'progress' });
+```
+
+**Release the lock when you're done writing — you don't need to close the stream:**
+```ts
+const writer = stream.getWriter();
+for (const item of items) {
+ await writer.write(item);
+}
+writer.releaseLock();
+// The framework detects the unlock and completes the stream
+```
+
+**Stream handles cross the workflow → step boundary automatically:**
+```ts
+export async function orchestrate() {
+ "use workflow";
+ const stream = getWritable({ namespace: 'results' });
+ // stream is a handle here — it becomes a real writable inside the step
+ await processItems(stream);
+}
+```
+
+The handle serializes across the boundary. Inside the step, it reconstitutes as a full writable with the pipeline attached.
+
+## Conclusion
+
+Durable streaming is what makes Workflow DevKit's real-time features work in a serverless environment. The pipeline — serialization transforms, buffered flushes, lock-aware completion, backend-agnostic persistence — turns the ephemeral nature of serverless functions into something that streams data reliably across suspensions, crashes, and scale events.
+
+Every chunk persisted. Every reader reconnectable. Streams that outlive the functions that write them.
diff --git a/docs/content/deep-dives/durable-streaming-reference.mdx b/docs/content/deep-dives/durable-streaming-reference.mdx
new file mode 100644
index 0000000000..dd2769dd5f
--- /dev/null
+++ b/docs/content/deep-dives/durable-streaming-reference.mdx
@@ -0,0 +1,293 @@
+---
+title: Durable Streaming
+description: How Workflow DevKit persists streaming data through world-backed storage with namespaced stream IDs, batched flushes, and lock-aware completion semantics.
+type: conceptual
+summary: A technical reference covering the full lifecycle of a durable stream — from handle creation in workflow code, through serialization and buffered writes in step functions, to chunk persistence in local and production backends, and reader reconnection.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/understanding-directives
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/cost-model-fluid-compute-reference
+---
+
+
+Durable streaming lets step functions write data that outlives any single function invocation. Chunks are persisted to the world backend as they arrive, so external consumers can read the stream even if the producing step suspends, crashes, or scales to zero. This mechanism powers real-time UI updates, progress reporting, and any scenario where a workflow needs to emit incremental results.
+
+
+## Overview
+
+A durable stream has three distinct phases:
+
+1. **Workflow-side handle creation** — inside a `"use workflow"` function, `getWritable()` returns a serializable handle that carries the deterministic stream ID. It does **not** create a real writable or any I/O pipeline.
+2. **Step-side reconstitution and flush pipeline** — when that handle reaches a `"use step"` function — or when `getWritable()` is called directly in step code — the runtime creates the real `WritableStream`, attaches the serialization transform, connects `flushablePipe`, and starts `pollWritableLock`.
+3. **Persistence** — the configured world backend (`world-local`, `world-vercel`, or `world-postgres`) stores each chunk as an individually addressable record that readers can consume from any index.
+
+### Namespaced stream IDs
+
+Every stream is identified by a deterministic ID derived from the workflow run ID. The function `getWorkflowRunStreamId` in `packages/core/src/util.ts` builds it:
+
+```ts title="packages/core/src/util.ts" lineNumbers
+// Format: strm_{ULID}_user_{base64url(namespace)?}
+export function getWorkflowRunStreamId(runId: string, namespace?: string) {
+ const streamId = `${runId.replace('wrun_', 'strm_')}_user`;
+ if (!namespace) {
+ return streamId;
+ }
+ const encodedNamespace = Buffer.from(namespace, 'utf-8').toString(
+ 'base64url'
+ );
+ return `${streamId}_${encodedNamespace}`;
+}
+```
+
+When you call `getWritable({ namespace: 'progress' })`, the namespace is base64url-encoded and appended to the stream ID. This lets a single run own multiple independent streams without collision.
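+
+A worked example with a shortened, hypothetical run ID (`wrun_ABC123` is not a real ULID). The function is reproduced from above so the snippet stands alone:
+
+```ts
+// Reproduced from packages/core/src/util.ts for a self-contained example.
+function getWorkflowRunStreamId(runId: string, namespace?: string): string {
+  const streamId = `${runId.replace('wrun_', 'strm_')}_user`;
+  if (!namespace) return streamId;
+  const encodedNamespace = Buffer.from(namespace, 'utf-8').toString('base64url');
+  return `${streamId}_${encodedNamespace}`;
+}
+
+getWorkflowRunStreamId('wrun_ABC123'); // → 'strm_ABC123_user'
+getWorkflowRunStreamId('wrun_ABC123', 'progress'); // → 'strm_ABC123_user_cHJvZ3Jlc3M'
+```
+
+Because the namespace is encoded rather than interpolated raw, names containing `_`, `/`, or non-ASCII characters can never collide with the ID's own delimiters.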
+
+## Lifecycle
+
+```mermaid
+flowchart TD
+ A{"Where is `getWritable()` called?"}
+ A -->|"Workflow"| B["Serializable handle created with stream ID only"]
+ B --> C["Handle crosses workflow → step boundary"]
+ A -->|"Step"| D["Real writable created for current run"]
+ C --> D
+ D --> E["Serialization TransformStream created"]
+ E --> F["flushablePipe connects readable → WorkflowServerWritableStream"]
+ F --> G["Chunks buffered, flushed every 10 ms via writeToStreamMulti"]
+ G --> H{"World backend"}
+ H -->|"world-local"| I["Chunk files written to streams/chunks/ as ULID-named .bin files"]
+ H -->|"world-vercel"| J["HTTP PUT to workflow-server /v2/runs/:runId/stream/:name"]
+ H -->|"world-postgres"| K["Rows via Drizzle ORM + LISTEN/NOTIFY"]
+ I --> L["Reader calls readFromStream with optional startIndex"]
+ J --> L
+ K --> L
+ L --> M["ReadableStream delivered to external consumer"]
+ F --> N["pollWritableLock detects lock release at 100 ms intervals"]
+ N --> O["State promise resolves → step completion unblocked"]
+```
+
+### Workflow-side handle creation
+
+In **workflow code** (inside a `"use workflow"` function), `getWritable()` creates a lightweight handle — an object that carries a `STREAM_NAME_SYMBOL` property but does **not** set up any I/O pipeline:
+
+```ts title="packages/core/src/workflow/writable-stream.ts" lineNumbers
+export function getWritable(
+ options: WorkflowWritableStreamOptions = {}
+): WritableStream {
+ const { namespace } = options;
+ const name = (globalThis as any)[WORKFLOW_GET_STREAM_ID](namespace);
+ return Object.create(globalThis.WritableStream.prototype, {
+ [STREAM_NAME_SYMBOL]: {
+ value: name,
+ writable: false,
+ },
+ });
+}
+```
+
+The `WORKFLOW_GET_STREAM_ID` symbol is injected into the sandboxed VM's `globalThis` by the workflow runtime (`packages/core/src/workflow.ts`). The returned object looks like a `WritableStream` but is really a serializable token — the framework's serialization layer recognizes `STREAM_NAME_SYMBOL` and reconstitutes a real writable on the step side.
+
+
+Streams cannot be accessed from workflow functions. The workflow-side `getWritable()` returns only a serializable handle — no I/O pipeline is created. Real stream persistence (serialization, buffered flushes, world-backend writes) happens exclusively in step functions, where full Node.js runtime access is available. This enforces the fundamental split: workflow functions orchestrate deterministically, step functions perform side effects.
+
+
+### Step-side stream setup
+
+In **step code** (inside a `"use step"` function or runtime context), `getWritable()` sets up the full I/O pipeline:
+
+```ts title="packages/core/src/step/writable-stream.ts" lineNumbers
+export function getWritable(
+ options: WorkflowWritableStreamOptions = {}
+): WritableStream {
+ const ctx = contextStorage.getStore();
+ if (!ctx) {
+ throw new Error(
+ '`getWritable()` can only be called inside a workflow or step function'
+ );
+ }
+
+ const { namespace } = options;
+ const runId = ctx.workflowMetadata.workflowRunId;
+ const name = getWorkflowRunStreamId(runId, namespace);
+
+ const serialize = getSerializeStream(
+ getExternalReducers(globalThis, ctx.ops, runId, ctx.encryptionKey),
+ ctx.encryptionKey
+ );
+
+ const serverWritable = new WorkflowServerWritableStream(name, runId);
+ const state = createFlushableState();
+ ctx.ops.push(state.promise);
+
+ flushablePipe(serialize.readable, serverWritable, state).catch(() => {});
+ pollWritableLock(serialize.writable, state);
+
+ return serialize.writable;
+}
+```
+
+Key points:
+
+- A `TransformStream` handles serialization (and optional encryption).
+- `flushablePipe` reads from the transform's readable side and writes to `WorkflowServerWritableStream`.
+- `pollWritableLock` watches for the user releasing their writer lock so the step can complete without waiting for an explicit `.close()`.
+- The `state.promise` is pushed onto `ctx.ops`, tying stream completion to step completion.
+
+## Code Walkthrough
+
+### Buffered writes in WorkflowServerWritableStream
+
+`WorkflowServerWritableStream` (defined in `packages/core/src/serialization.ts`) batches chunks before sending them to the world:
+
+```ts title="packages/core/src/serialization.ts (simplified)" lineNumbers
+const STREAM_FLUSH_INTERVAL_MS = 10;
+
+// Inside the constructor:
+let buffer: Uint8Array[] = [];
+let flushTimer: ReturnType<typeof setTimeout> | null = null;
+
+const flush = async (): Promise<void> => {
+ if (buffer.length === 0) return;
+ const chunksToFlush = buffer.slice();
+
+ if (typeof world.writeToStreamMulti === 'function' && chunksToFlush.length > 1) {
+ await world.writeToStreamMulti(name, runId, chunksToFlush);
+ } else {
+ for (const chunk of chunksToFlush) {
+ await world.writeToStream(name, runId, chunk);
+ }
+ }
+ buffer = [];
+};
+
+// write() buffers a chunk and schedules a flush
+async write(chunk) {
+ buffer.push(chunk);
+ scheduleFlush(); // arms a 10 ms setTimeout
+ await new Promise((resolve, reject) => {
+ flushWaiters.push({ resolve, reject });
+ });
+}
+```
+
+The 10 ms batching window reduces network overhead: rapid successive writes are coalesced into a single `writeToStreamMulti` call. Each `write()` caller awaits the flush result, so `flushablePipe` correctly tracks pending operations.
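+
+The simplified excerpt above references `scheduleFlush` and `flushWaiters` without showing them. Here is a self-contained sketch of how that coalescing could work; everything beyond the 10 ms window is an assumption, not the real implementation:
+
+```ts
+// Assumed coalescing logic: a single armed timer batches rapid writes,
+// and every waiting write() caller settles with the flush result.
+const STREAM_FLUSH_INTERVAL_MS = 10;
+
+let buffer: string[] = [];
+let flushTimer: ReturnType<typeof setTimeout> | null = null;
+let flushWaiters: Array<{ resolve: () => void; reject: (err: unknown) => void }> = [];
+const flushed: string[][] = []; // stand-in for the world backend
+
+async function flush(): Promise<void> {
+  if (buffer.length === 0) return;
+  flushed.push(buffer.slice()); // one batched "network call"
+  buffer = [];
+}
+
+function scheduleFlush(): void {
+  if (flushTimer !== null) return; // a single armed timer coalesces rapid writes
+  flushTimer = setTimeout(async () => {
+    flushTimer = null;
+    const waiters = flushWaiters;
+    flushWaiters = [];
+    try {
+      await flush();
+      waiters.forEach((w) => w.resolve());
+    } catch (err) {
+      waiters.forEach((w) => w.reject(err));
+    }
+  }, STREAM_FLUSH_INTERVAL_MS);
+}
+
+async function write(chunk: string): Promise<void> {
+  buffer.push(chunk);
+  scheduleFlush();
+  await new Promise<void>((resolve, reject) => {
+    flushWaiters.push({ resolve, reject });
+  });
+}
+```
+
+Three rapid `write()` calls land inside the same 10 ms window, so they produce a single three-chunk batch rather than three separate backend calls.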
+
+### Lock polling and completion
+
+The Web Streams API has no event for "lock released but stream still open." `flushable-stream.ts` bridges this gap with a 100 ms polling loop:
+
+```ts title="packages/core/src/flushable-stream.ts (simplified)" lineNumbers
+export const LOCK_POLL_INTERVAL_MS = 100;
+
+export function pollWritableLock(
+ writable: WritableStream,
+ state: FlushableStreamState
+): void {
+ const intervalId = setInterval(() => {
+ if (state.doneResolved || state.streamEnded) {
+ clearInterval(intervalId);
+ return;
+ }
+ if (isWritableUnlockedNotClosed(writable) && state.pendingOps === 0) {
+ state.doneResolved = true;
+ state.resolve();
+ clearInterval(intervalId);
+ }
+ }, LOCK_POLL_INTERVAL_MS);
+}
+```
+
+This means a step function can write to a stream, release the writer lock, and return — without explicitly closing the stream. The polling loop detects the unlock, waits for in-flight flushes to settle, then resolves the state promise so the step can complete. Without this mechanism, Vercel functions would hang until the runtime timeout because `.pipeTo()` only resolves on stream close.
+
+### `flushablePipe` — the pump
+
+`flushablePipe` is the core read-write loop that connects the serialization transform to the server writable. It tracks `pendingOps` to coordinate with the lock poller:
+
+```ts title="packages/core/src/flushable-stream.ts (excerpt)" lineNumbers
+export async function flushablePipe(
+ source: ReadableStream,
+ sink: WritableStream,
+ state: FlushableStreamState
+): Promise<void> {
+ const reader = source.getReader();
+ const writer = sink.getWriter();
+
+ try {
+ while (true) {
+ const readResult = await reader.read();
+ if (readResult.done) {
+ state.streamEnded = true;
+ await writer.close();
+ if (!state.doneResolved) {
+ state.doneResolved = true;
+ state.resolve();
+ }
+ return;
+ }
+ state.pendingOps++;
+ try {
+ await writer.write(readResult.value);
+ } finally {
+ state.pendingOps--;
+ }
+ }
+ } catch (err) {
+ state.streamEnded = true;
+ if (!state.doneResolved) {
+ state.doneResolved = true;
+ state.reject(err);
+ }
+ throw err;
+ } finally {
+ reader.releaseLock();
+ writer.releaseLock();
+ }
+}
+```
+
+The dual resolution paths — stream-close via the pump loop and lock-release via polling — ensure the step completes promptly regardless of how the user finishes writing.
+
+### Local persistence (world-local)
+
+`packages/world-local/src/streamer.ts` persists each chunk as a binary file:
+
+- **Path format:** `streams/chunks/{streamName}-chnk_{ULID}.bin`
+- **Chunk format:** 1 byte EOF flag + payload bytes
+- **Ordering:** Monotonic ULID generation ensures lexicographic sort equals chronological order
+- **Multi-chunk batching:** `writeToStreamMulti` generates all ULIDs synchronously before any async I/O, preserving call order even when `runId` is a promise
+- **EOF:** `closeStream` writes a zero-payload chunk with the EOF byte set to `1`
+- **Run association:** A JSON file at `streams/runs/{runId}` tracks which stream IDs belong to a run
+
+The reader (`readFromStream`) sets up event listeners **before** reading from disk, then reconciles disk state with buffered real-time events to avoid duplicates and maintain order.
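+
+That reconciliation step can be sketched as a merge that deduplicates by chunk ID and then restores ULID order. The names and shapes here are assumptions for illustration:
+
+```ts
+// Illustrative reconciliation: merge chunks read from disk with events
+// buffered while that read was in flight, dropping duplicates by ID.
+interface Chunk {
+  id: string; // monotonic ULID-style identifier
+  payload: string;
+}
+
+function reconcile(diskChunks: Chunk[], bufferedEvents: Chunk[]): Chunk[] {
+  const seen = new Set<string>();
+  const merged: Chunk[] = [];
+  for (const chunk of [...diskChunks, ...bufferedEvents]) {
+    if (seen.has(chunk.id)) continue; // event duplicates a chunk read from disk
+    seen.add(chunk.id);
+    merged.push(chunk);
+  }
+  // Monotonic IDs make lexicographic order equal chronological order.
+  return merged.sort((a, b) => a.id.localeCompare(b.id));
+}
+```
+
+Registering listeners before the disk read means a chunk can arrive through both channels; deduplicating by ID makes that race harmless.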
+
+### Production persistence (world-vercel)
+
+`packages/world-vercel/src/streamer.ts` delegates to the Vercel workflow-server over HTTP:
+
+- **Write:** `PUT /v2/runs/{runId}/stream/{name}` with the chunk as the request body
+- **Multi-chunk write:** Same endpoint with an `X-Stream-Multi: true` header and a length-prefixed binary encoding (`encodeMultiChunks`)
+- **Close:** `PUT` with an `X-Stream-Done: true` header and no body
+- **Read:** `GET /v2/stream/{name}` returns a streaming response body; `startIndex` is supported as a query parameter
+- **Chunk pagination:** `GET /v2/runs/{runId}/streams/{name}/chunks` supports `limit` and `cursor` parameters
+
+The production backend does not use an in-process event emitter — the workflow-server handles chunk storage and real-time delivery to readers.
+
+### Postgres persistence (world-postgres)
+
+`packages/world-postgres/src/streamer.ts` stores chunks as rows via Drizzle ORM:
+
+- **Chunk ID:** Monotonic ULID-based `chunkId` for ordering
+- **Multi-chunk batching:** `writeToStreamMulti` batch-inserts all chunks in a single query with pre-generated IDs to preserve ordering
+- **Real-time delivery:** PostgreSQL `LISTEN`/`NOTIFY` on a `workflow_event_chunk` channel
+
+## Why This Matters
+
+Durable streaming solves the fundamental tension between serverless execution (short-lived, stateless) and streaming output (long-lived, stateful):
+
+- **Incremental results without blocking.** A step function can write progress updates, AI-generated tokens, or partial results that reach the UI immediately — even if the step later suspends and replays.
+- **Automatic completion detection.** The lock-polling mechanism means step functions don't need explicit stream lifecycle management. Write your data, release the lock (or let it go out of scope), and the framework handles the rest.
+- **Backend-agnostic persistence.** The same `getWritable()` call works identically against the local filesystem backend, the Vercel HTTP backend, and the Postgres backend. The `Streamer` interface in `@workflow/world` defines the contract; backends implement it.
+- **Ordered, resumable reads.** Readers can start from any index and receive chunks in guaranteed ULID order. The local backend supports real-time subscriptions via `EventEmitter`; the production backend streams via HTTP response bodies.
+- **Batched efficiency.** The 10 ms flush window and `writeToStreamMulti` support mean high-frequency writes don't generate one HTTP request per chunk — they're coalesced automatically.
diff --git a/docs/content/deep-dives/durable-streaming-social.mdx b/docs/content/deep-dives/durable-streaming-social.mdx
new file mode 100644
index 0000000000..d2c8145aac
--- /dev/null
+++ b/docs/content/deep-dives/durable-streaming-social.mdx
@@ -0,0 +1,58 @@
+---
+title: "Your Stream Crashed. Your Data Didn't."
+description: A concise explainer on how Workflow DevKit makes streams durable — persisting every chunk to a world backend so streams survive function suspensions, crashes, and scale-to-zero events.
+type: conceptual
+summary: Workflow DevKit persists streaming data through world-backed storage. Streams bypass the event log — chunks don't replay — but they don't bypass persistence. A 10 ms batched flush writes every chunk to the world backend as it's produced, so streams survive suspension, crashes, and scale-to-zero without data loss.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/understanding-directives
+ - /deep-dives/step-execution-model-reference
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/cost-model-fluid-compute-reference
+---
+
+Serverless functions have timeouts. Streams don't care about timeouts. That's a problem — unless every chunk is persisted the moment it arrives.
+
+## The Key Insight
+
+Streams survive suspension because they bypass the event log but not persistence.
+
+The event log records step inputs, outputs, and orchestration state — it's what makes replay deterministic. Streaming data doesn't belong there. Replaying 10,000 LLM tokens as events would make every re-invocation slower.
+
+Instead, durable streams write directly to the configured world backend — filesystem in local development, Vercel's workflow-server over HTTP in hosted deployments, or Postgres when using `world-postgres`. Each chunk is flushed within a 10 ms batching window via `writeToStreamMulti`. The data is durable from the moment it hits the backend, but never pollutes the replay log.
+
+This separation is why streams survive suspension, crashes, and scale-to-zero without affecting replay performance.
+
+## The Stream Access Rule
+
+Real stream I/O only happens inside step functions. When you call `getWritable()` in a `"use workflow"` function, you get a lightweight handle — an object that carries a stream name but sets up no I/O pipeline. The handle crosses the workflow-to-step boundary through serialization, and the step function reconstitutes it into a real writable with the full pipeline attached.
+
+This restriction exists because workflow functions run in a sandboxed VM without network access. They orchestrate — they don't do I/O. The handle is a promise of a stream, not the stream itself.
+
+## How It Looks in Practice
+
+```ts
+// A single run can own multiple independent streams via namespaces
+const tokens = getWritable({ namespace: 'tokens' });
+const progress = getWritable({ namespace: 'progress' });
+// IDs: strm_{ULID}_user_{base64url('tokens')}, strm_{ULID}_user_{base64url('progress')}
+```
+
+A 100 ms lock-polling loop detects when you release your writer — no need to explicitly close the stream. The framework sees the unlock, waits for pending flushes to drain, and completes the step automatically.
+
+```ts
+const writer = stream.getWriter();
+for await (const token of llm.stream(prompt)) {
+ await writer.write(token);
+}
+writer.releaseLock();
+// Framework detects the unlock and completes the stream
+return { status: 'done' };
+```
+
+## Why It Matters
+
+Traditional serverless streaming loses data when functions time out — the in-memory buffer vanishes with the process. Durable streaming makes that impossible. Chunks are persisted as they arrive, readers reconnect from any offset via `startIndex`, and the stream outlives any single invocation.
+
+Your stream crashed. Your data didn't. That's the point.
diff --git a/docs/content/deep-dives/step-execution-model-blog.mdx b/docs/content/deep-dives/step-execution-model-blog.mdx
new file mode 100644
index 0000000000..f69d1523f4
--- /dev/null
+++ b/docs/content/deep-dives/step-execution-model-blog.mdx
@@ -0,0 +1,294 @@
+---
+title: "How Workflow DevKit Splits Your Code in Two — And Why That's the Key to Durable Execution"
+description: A developer-facing deep dive into the step execution model that powers Workflow DevKit's durable workflows — how directives split orchestration from side effects, what happens inside the sandboxed VM, and why replay works without re-running your business logic.
+type: conceptual
+summary: Workflow DevKit uses two directives to separate deterministic orchestration from side-effecting steps. The workflow VM replaces Math.random(), Date.now(), and crypto with seeded alternatives, then suspends execution when a step hasn't completed yet. On replay, cached step results are returned instantly — orchestration re-runs, but your code that talks to databases and APIs never executes twice.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/understanding-directives
+ - /docs/how-it-works/code-transform
+ - /deep-dives/compiler-magic-swc-plugin-reference
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/durable-streaming-reference
+---
+
+Every durable workflow framework has to answer one fundamental question: when the process crashes and restarts, how do you get back to where you were without doing everything over again?
+
+Some frameworks checkpoint state into serialized blobs. Others require you to write explicit state machines. Workflow DevKit takes a different path: it lets you write ordinary `async`/`await` JavaScript, then splits your code into two execution contexts at compile time — one that orchestrates, and one that does work. The orchestration half replays from an event log. The work half never runs twice.
+
+This article walks through exactly how that split works, from the directives you write in source code down to the suspension mechanics that make it all possible.
+
+## The Two Directives
+
+Workflow DevKit introduces two JavaScript directives that tell the compiler how to treat each function:
+
+```ts
+export async function createUser(email: string) {
+ "use step";
+ return { id: crypto.randomUUID(), email };
+}
+
+export async function handleUserSignup(email: string) {
+ "use workflow";
+ const user = await createUser(email);
+ return { userId: user.id };
+}
+```
+
+`"use workflow"` marks a function as a **deterministic orchestrator**. It decides what steps to run and in what order, but it cannot perform side effects directly — no API calls, no database writes, no file system access.
+
+`"use step"` marks a function as a **side-effecting operation**. It has full Node.js runtime access: network, disk, environment variables, the works. Its return value is persisted to an append-only **Event log** and cached for replay.
+
+This is the entire API surface for the split. You don't need to learn a state machine DSL or decorate classes with metadata. You write functions, add a directive string, and the compiler handles the rest.
+
+## What the Compiler Does
+
+The SWC compiler plugin runs three transform passes over every workflow file. The one that matters most for understanding the execution model is the **workflow mode** transform.
+
+In workflow mode, step function bodies are replaced with proxy calls through a well-known symbol — this is what produces the **Workflow bundle**. The entire function declaration is replaced with a variable assignment:
+
+```ts
+// Workflow mode output — step function replaced entirely
+export var createUser = globalThis[Symbol.for("WORKFLOW_USE_STEP")](
+ "step//./workflows/user//createUser"
+);
+
+export async function handleUserSignup(email: string) {
+ // Workflow body stays intact
+ const user = await createUser(email);
+ return { userId: user.id };
+}
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup";
+globalThis.__private_workflows.set(
+ "workflow//./workflows/user//handleUserSignup",
+ handleUserSignup
+);
+```
+
+`WORKFLOW_USE_STEP` returns a function — so `await createUser(email)` still works. The step ID (`"step//./workflows/user//createUser"`) is derived from the file path and function name at build time — it's stable across deployments and deterministic across replays.
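+
+Reconstructing the ID scheme from the examples above, the derivation is a simple join of mode, module path, and function name (the plugin's actual rules may handle more cases, such as file extensions or nested scopes):
+
+```ts
+// ID scheme as it appears in the compiled output above (illustrative helper
+// names; the SWC plugin builds these strings internally).
+function stepId(modulePath: string, functionName: string): string {
+  return `step//${modulePath}//${functionName}`;
+}
+
+function workflowId(modulePath: string, functionName: string): string {
+  return `workflow//${modulePath}//${functionName}`;
+}
+
+stepId('./workflows/user', 'createUser');
+// → 'step//./workflows/user//createUser'
+```
+
+Because the inputs are the module path and export name rather than anything positional, renaming a variable or reordering functions inside the file doesn't change the ID, while renaming the function or moving the file does.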
+
+At runtime, the `WORKFLOW_USE_STEP` symbol is bound to the `useStep` function from the core runtime:
+
+```ts
+// From packages/core/src/workflow.ts
+const useStep = createUseStep(workflowContext);
+vmGlobalThis[WORKFLOW_USE_STEP] = useStep;
+```
+
+When workflow code calls `await createUser(email)`, it's calling the function returned by the proxy, which checks the **Event log** for a cached result or suspends execution if the step hasn't run yet.
+
+## Inside the Sandbox
+
+Workflow functions execute inside a Node.js VM context — not in your normal runtime. This sandbox is created by `createContext()` in `packages/core/src/vm/index.ts`, and it replaces every source of non-determinism with a seeded, reproducible alternative.
+
+The seed is derived from three values that are identical on every replay of the same run:
+
+```ts
+// From packages/core/src/workflow.ts
+const {
+ context,
+ globalThis: vmGlobalThis,
+ updateTimestamp,
+} = createContext({
+ seed: `${workflowRun.runId}:${workflowRun.workflowName}:${+startedAt}`,
+ fixedTimestamp: +startedAt,
+});
+```
+
+Here's what gets replaced inside the VM:
+
+### Math.random()
+
+```ts
+const rng = seedrandom(seed);
+g.Math.random = rng;
+```
+
+Every call to `Math.random()` returns the next value from a seeded PRNG. Same seed, same sequence, every time. If your workflow uses randomness for ID generation, load balancing, or jitter — it produces identical results on replay.
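+
+The replay guarantee is easy to demonstrate with any seeded PRNG. The runtime uses the `seedrandom` package; mulberry32 stands in here so the example is dependency-free:
+
+```ts
+// mulberry32 is a stand-in for seedrandom, used only to show that
+// same seed → same sequence on every "replay".
+function mulberry32(seed: number): () => number {
+  let a = seed >>> 0;
+  return () => {
+    a = (a + 0x6d2b79f5) | 0;
+    let t = Math.imul(a ^ (a >>> 15), 1 | a);
+    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+
+const firstRun = mulberry32(42);
+const replay = mulberry32(42); // same seed on replay
+// Both generators yield identical values, draw after draw.
+```
+
+Since the workflow seed combines `runId`, `workflowName`, and `startedAt`, every replay of the same run reconstructs exactly this situation: a fresh generator with an identical seed.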
+
+### Date.now() and new Date()
+
+```ts
+const Date_ = g.Date;
+(g as any).Date = function Date(
+ ...args: Parameters<(typeof globalThis)['Date']>[]
+) {
+ if (args.length === 0) {
+ return new Date_(fixedTimestamp);
+ }
+ return new Date_(...args);
+};
+g.Date.now = () => fixedTimestamp;
+```
+
+Time doesn't advance with wall-clock time inside the workflow. Instead, `fixedTimestamp` advances only when events are consumed from the log:
+
+```ts
+// From packages/core/src/workflow.ts
+workflowContext.eventsConsumer.subscribe((event) => {
+ const createdAt = event?.createdAt;
+ if (createdAt) {
+ updateTimestamp(+createdAt);
+ }
+ return EventConsumerResult.NotConsumed;
+});
+```
+
+This means the workflow experiences time progressing through the event log — not through real elapsed time. A workflow that took 3 hours to complete will replay in milliseconds, but `Date.now()` inside that workflow will still return the correct timestamps at each decision point.
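
The mechanics reduce to a closure over a mutable timestamp. A minimal sketch, with a simplified event shape:

```typescript
type LogEvent = { createdAt: number };

// A clock frozen at the workflow's start time that advances only
// when events are consumed from the log.
function createClock(startedAt: number) {
  let fixedTimestamp = startedAt;
  return {
    now: () => fixedTimestamp,
    // Called by the event subscriber as each event is replayed.
    consume: (event: LogEvent) => {
      fixedTimestamp = event.createdAt;
    },
  };
}

const clock = createClock(1_700_000_000_000);
console.log(clock.now()); // 1700000000000 (frozen, no wall-clock drift)
clock.consume({ createdAt: 1_700_000_180_000 }); // e.g. a step_completed event
console.log(clock.now()); // 1700000180000 (advanced by the log, not by time)
```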
+
+### crypto.getRandomValues() and crypto.randomUUID()
+
+Both use the same seeded RNG, so UUIDs generated inside workflow functions are deterministic and reproducible.
+
+### Disallowed globals
+
+`fetch`, `setTimeout`, `setInterval`, and other non-deterministic APIs throw helpful errors directing you to use step functions instead. `process.env` is available as a frozen snapshot — readable but not writable.
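
The shape of those guards can be sketched like this. The names and messages below are illustrative, not the runtime's actual wording:

```typescript
// Replace a non-deterministic global with a stub that throws a
// helpful, actionable error.
function disallow(name: string, hint: string): () => never {
  return () => {
    throw new Error(`${name} is not available inside workflows. ${hint}`);
  };
}

const sandboxPatches = {
  fetch: disallow("fetch", "Move the call into a 'use step' function."),
  setTimeout: disallow("setTimeout", "Use sleep() for durable delays."),
};

try {
  sandboxPatches.fetch();
} catch (err) {
  console.log((err as Error).message);
}
```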
+
+## The Suspension Mechanism
+
+Here's where the model gets interesting. When a workflow reaches a step that hasn't completed yet, it doesn't block or poll. It **suspends**.
+
+The `useStep` proxy created by `createUseStep()` in `packages/core/src/step.ts` subscribes to the `EventsConsumer` for each step call. When the consumer reaches the end of the **Event log** without finding a matching `step_completed` event, it triggers a suspension:
+
+```ts
+// From packages/core/src/step.ts
+if (!event) {
+ // End of the event log — this step hasn't completed yet.
+ // The Promise never resolves, stopping workflow execution.
+ scheduleWhenIdle(ctx, () => {
+ ctx.onWorkflowError(
+ new WorkflowSuspension(ctx.invocationsQueue, ctx.globalThis)
+ );
+ });
+ return EventConsumerResult.NotConsumed;
+}
+```
+
+The `WorkflowSuspension` collects all pending operations — steps, hooks, and waits — and propagates up through the call stack:
+
+```ts
+// From packages/core/src/workflow.ts
+try {
+ const result = await Promise.race([
+ workflowFn(...args),
+ workflowDiscontinuation.promise,
+ ]);
+ // Workflow completed
+} catch (err) {
+ if (WorkflowSuspension.is(err)) {
+ throw err; // Propagated to suspension handler
+ }
+ throw err;
+}
+```
+
+The suspension handler in `packages/core/src/runtime/suspension-handler.ts` then processes the pending queue in a specific order:
+
+1. **Hooks first** — webhook receivers are created before steps to prevent race conditions
+2. **Steps and waits in parallel** — each step gets a `step_created` event persisted and a **Queue message** sent for background execution
+3. **Timeout calculation** — if any waits exist, the minimum `resumeAt` time determines when the workflow re-enqueues
+
+After suspension handling completes, the workflow handler returns. Nothing stays in memory. The **Event log** is the complete state. The **Queue message** is a trigger, not durable state — it carries the run ID and trace context so the step handler knows which step to execute.
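
That ordering can be sketched as follows. The types and dispatch helpers are hypothetical stand-ins for event persistence and queue sends:

```typescript
type Pending =
  | { kind: "hook"; id: string }
  | { kind: "step"; id: string }
  | { kind: "wait"; id: string; resumeAt: number };

// Hypothetical helpers standing in for event persistence and queue sends.
const createHookReceiver = async (id: string) => void id;
const dispatchStep = async (id: string) => void id;

async function handleSuspension(pending: Pending[]): Promise<number | undefined> {
  // 1. Hooks first, so webhook receivers exist before any step runs.
  for (const p of pending) {
    if (p.kind === "hook") await createHookReceiver(p.id);
  }
  // 2. Steps dispatch in parallel (step_created event + queue message).
  await Promise.all(
    pending.filter((p) => p.kind === "step").map((p) => dispatchStep(p.id))
  );
  // 3. The earliest wait's resumeAt decides when to re-enqueue the workflow.
  const resumeTimes = pending
    .filter((p): p is Extract<Pending, { kind: "wait" }> => p.kind === "wait")
    .map((p) => p.resumeAt);
  return resumeTimes.length ? Math.min(...resumeTimes) : undefined;
}
```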
+
+## Replay: Orchestration Re-runs, Side Effects Don't
+
+When a step completes in the background, the step handler writes `step_completed` to the **Event log** and sends a **Queue message** to re-enqueue the workflow. The workflow then starts from the beginning — same code, same seed, same starting timestamp — and replays through the **Event log**.
+
+This is where the split pays off. The `EventsConsumer` feeds events to subscribers in order. Workflow state is reconstructed by replaying code against the **Event log**. When the `useStep` proxy encounters a `step_completed` event for the current step, it returns the cached result immediately:
+
+```ts
+// From packages/core/src/step.ts
+if (event.eventType === 'step_completed') {
+ ctx.invocationsQueue.delete(event.correlationId);
+
+ ctx.pendingDeliveries++;
+ ctx.promiseQueue = ctx.promiseQueue.then(async () => {
+ try {
+ const hydratedResult = await hydrateStepReturnValue(
+ event.eventData.result,
+ ctx.runId,
+ ctx.encryptionKey,
+ ctx.globalThis
+ );
+ resolve(hydratedResult as Result);
+ } catch (error) {
+ reject(error);
+ } finally {
+ ctx.pendingDeliveries--;
+ }
+ });
+ return EventConsumerResult.Finished;
+}
+```
+
+The promise queue ensures results are delivered in **Event log** order, even if deserialization takes variable time. The workflow code sees exactly the same values it would have seen on the first execution — same step results, same `Date.now()` values, same `Math.random()` sequence.
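
The ordering guarantee comes from chaining every delivery onto a single promise. A stripped-down sketch:

```typescript
// Each delivery chains onto the queue, so results resolve in the
// order they were enqueued even when hydration times vary.
let promiseQueue: Promise<void> = Promise.resolve();
const delivered: string[] = [];

function deliver(stepName: string, hydrationMs: number): void {
  promiseQueue = promiseQueue.then(async () => {
    // Simulate variable deserialization time.
    await new Promise((resolve) => setTimeout(resolve, hydrationMs));
    delivered.push(stepName);
  });
}

deliver("createUser", 30); // slow to hydrate
deliver("sendEmail", 1);   // fast, but still delivered second
promiseQueue.then(() => console.log(delivered)); // ["createUser", "sendEmail"]
```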
+
+Replay re-runs the orchestration logic (the workflow function) but never the side effects. This is why step bodies are excluded from the **Workflow bundle**: step functions that call APIs, write to databases, or send emails execute exactly once, and on replay the workflow receives their cached results from the **Event log**.
+
+## The Lifecycle: A Complete Picture
+
+```mermaid
+flowchart TD
+ A["Workflow handler invoked"] --> B["VM sandbox created with seeded RNG and fixed timestamp"]
+ B --> C["Workflow function begins execution"]
+ C --> D["Step call hits useStep proxy"]
+ D --> E{"step_completed in event log?"}
+ E -->|"Yes"| F["Return cached result, advance VM timestamp"]
+ F --> G["Workflow continues to next step"]
+ G --> D
+ E -->|"No — end of log"| H["WorkflowSuspension thrown"]
+ H --> I["Suspension handler persists step_created events"]
+ I --> J["Step messages queued for background execution"]
+ J --> K["Workflow handler returns — nothing in memory"]
+ K --> L["Step executes with full Node.js access"]
+ L --> M["step_completed event persisted"]
+ M --> N["Workflow re-enqueued"]
+ N --> A
+```
+
+A workflow with 10 steps will invoke the workflow handler repeatedly — once per suspension, plus a final replay that runs to completion. But each invocation replays all previously completed steps from cache in milliseconds, then suspends at the next uncompleted step. The total compute cost is proportional to the orchestration logic, not the step execution time.
+
+## Before and After: Always-On vs. Suspend/Resume
+
+Consider a traditional always-on worker processing a multi-step order pipeline — charge the card, reserve inventory, send a confirmation email, then wait for shipping confirmation from a third-party API.
+
+**Always-on orchestration:**
+- A worker process stays alive for the entire workflow duration. If the shipping API takes 6 hours to respond, the worker sits idle for 6 hours holding a connection, memory, and a compute slot.
+- If the process crashes between steps — say, after charging the card but before reserving inventory — you need explicit checkpointing, idempotency keys, and manual retry logic to avoid double-charging.
+- Scaling means provisioning enough persistent workers to handle peak concurrency. Each worker is occupied for the full lifetime of the workflow it's running, regardless of how much time is spent waiting vs. computing.
+- Error handling is defensive: you wrap every step in try/catch, manage partial failure states, and hope your checkpoint logic covers every edge case.
+
+**Workflow DevKit's suspend/resume model:**
+- The workflow handler runs only during replay and suspension — typically milliseconds per invocation. Between steps, nothing is in memory. The **Event log** is the complete state.
+- Crashes are invisible to the developer. If the process dies after charging the card, the `step_completed` event for the charge is already in the **Event log**. On re-enqueue, the workflow replays through that cached result and suspends at the next uncompleted step — no double-charge, no manual recovery code.
+- Steps execute as individual queued messages that can run on any available compute. A workflow with 100 concurrent steps doesn't need 100 workers — the queue distributes work across whatever capacity is available.
+- Error handling is built into the model: `RetryableError` triggers automatic retries with backoff; `FatalError` terminates the step permanently. The orchestration code doesn't need try/catch around step calls unless you want to handle failures as part of the workflow logic.
+
+The difference is most dramatic for workflows with long waits. A workflow that sends an email and waits 3 days for a customer response costs exactly zero compute during those 3 days. No worker. No connection. No memory. When the response arrives (via a webhook that records `hook_received`, or a scheduled wake-up that re-invokes the handler), the workflow replays from the event log — milliseconds of compute — processes the response in a step, and either completes or suspends again at the next pending operation.
+
+## Writing Workflow Code: Practical Implications
+
+Understanding the execution model changes how you write code:
+
+**Inside `"use workflow"` functions:**
+- `Math.random()`, `Date.now()`, and `crypto.randomUUID()` are safe — they're deterministic
+- Don't call `fetch`, `setTimeout`, or any I/O — the sandbox will throw
+- Use `await` and `Promise.all()` / `Promise.race()` freely — they work as expected
+- Every execution path must be deterministic given the same event history
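
Put together, orchestration code looks like ordinary async JavaScript, just restricted to determinism-safe APIs. A sketch, with local async stubs standing in for `"use step"` functions so it runs standalone:

```typescript
// Stand-ins for step functions. In real code these would carry
// "use step" and run outside the sandbox.
const checkStatus = async (jobId: string) => ({ jobId, state: "done" });
const lookupOwner = async (jobId: string) => ({ jobId, owner: "ada" });

// Orchestration-style logic: awaits and Promise.all are fine,
// Math.random()/Date.now() would be deterministic in the sandbox,
// and fetch()/setTimeout() would throw.
async function reviewJob(jobId: string) {
  const [status, owner] = await Promise.all([
    checkStatus(jobId),
    lookupOwner(jobId),
  ]);
  return { jobId, state: status.state, assignee: owner.owner };
}

reviewJob("job_42").then((r) => console.log(r.state)); // "done"
```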
+
+**Inside `"use step"` functions:**
+- Full Node.js access — call APIs, query databases, write files
+- Return values must be serializable (they're persisted to the event log)
+- Each step executes exactly once, even across crashes and replays
+- Use `FatalError` for permanent failures, `RetryableError` for transient ones
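
A sketch of that failure classification, with local stand-ins for the framework's error classes so the example is self-contained:

```typescript
// Local stand-ins; the real classes ship with the Workflow DevKit.
class RetryableError extends Error {}
class FatalError extends Error {}

// A step-style function: full runtime access, with failures
// classified so the runtime knows whether to retry.
async function chargeCard(amountCents: number): Promise<{ chargeId: string }> {
  if (amountCents <= 0) {
    // Retrying can never fix bad input: fail permanently.
    throw new FatalError(`Invalid amount: ${amountCents}`);
  }
  try {
    // Real step body: call the payment provider here.
    return { chargeId: `ch_${amountCents}` };
  } catch (err) {
    // Network blips and rate limits are worth retrying with backoff.
    throw new RetryableError((err as Error).message);
  }
}

chargeCard(-100).catch((err) => {
  console.log(err instanceof FatalError); // true
});
```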
+
+The boundary between the two contexts is the only thing you need to think about. Everything else — suspension, replay, event log management, timestamp advancement — is handled by the runtime.
+
+## Conclusion
+
+The step execution model is the foundation that makes Workflow DevKit's other properties possible. Durability comes from the **Event log**. Replay comes from deterministic orchestration. Cost efficiency comes from suspension. And all of it traces back to a single architectural decision: split the code into what orchestrates and what does work, then make the orchestration half perfectly reproducible.
+
+Two directives. One split. That's the entire model.
diff --git a/docs/content/deep-dives/step-execution-model-reference.mdx b/docs/content/deep-dives/step-execution-model-reference.mdx
new file mode 100644
index 0000000000..b33004f090
--- /dev/null
+++ b/docs/content/deep-dives/step-execution-model-reference.mdx
@@ -0,0 +1,198 @@
+---
+title: Step Execution Model
+description: How Workflow DevKit splits orchestration from side effects using directives, a sandboxed VM with deterministic globals, and suspend/replay mechanics.
+type: conceptual
+summary: A technical reference tracing the full path from directive-annotated source through compiler transformation, into deterministic VM execution, through suspension and queued step dispatch, to replay with cached results.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/understanding-directives
+ - /docs/how-it-works/code-transform
+ - /deep-dives/compiler-magic-swc-plugin-reference
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/durable-streaming-reference
+---
+
+The step execution model is the mechanism that makes durable workflows possible. It lets you write ordinary `async`/`await` code while the runtime transparently handles suspension, background step execution, and deterministic replay. Understanding this model explains why workflow functions run in a sandbox, why `Math.random()` and `Date.now()` are deterministic, and how cached step results enable replay without re-executing side effects.
+
+## Overview
+
+Workflow DevKit separates code into two execution contexts using JavaScript directives:
+
+- **Workflow functions** (`"use workflow"`) — deterministic orchestrators that run inside a sandboxed Node.js VM. They coordinate which steps to run and in what order, but cannot perform side effects directly.
+- **Step functions** (`"use step"`) — side-effecting operations with full Node.js runtime access. They run outside the VM via the **Step bundle**, and their results are persisted to the **Event log**.
+
+The compiler transforms a single source file into three bundles — **Client bundle** (ID-bearing stub used by `start()`), **Workflow bundle** (deterministic orchestration code loaded into the VM), and **Step bundle** (real side-effecting code executed by the step handler). At runtime, the workflow VM calls steps through a symbol-based proxy (`WORKFLOW_USE_STEP`) that either returns a cached result from the **Event log** or suspends the workflow and sends a **Queue message** to enqueue the step for background execution.
+
+## Lifecycle
+
+```mermaid
+flowchart TD
+ A["Source file with 'use workflow' and 'use step' directives"] --> B["SWC compiler runs three transform modes"]
+ B --> C["Workflow mode: step bodies replaced with WORKFLOW_USE_STEP proxy calls"]
+ B --> D["Step mode: step functions registered with registerStepFunction()"]
+ B --> E["Client mode: workflow bodies replaced with error throw + workflowId attached"]
+ C --> F["Workflow code loaded into sandboxed Node.js VM via runInContext()"]
+ F --> G{"Step already in event log?"}
+ G -->|"Yes"| H["EventsConsumer returns cached result, VM timestamp advanced"]
+ G -->|"No"| I["WorkflowSuspension thrown with pending operations"]
+ I --> J["Suspension handler creates step_created event and queues step message"]
+ J --> K["Step executes out-of-band with full Node.js access"]
+ K --> L["step_completed event persisted to event log"]
+ L --> M["Workflow re-enqueued, replays from beginning with cached results"]
+ M --> F
+ H --> N["Workflow continues to next step or completes"]
+```
+
+## Code Walkthrough
+
+### Deterministic VM context
+
+When a workflow run executes, `createContext()` in `packages/core/src/vm/index.ts` builds a sandboxed environment where all sources of non-determinism are replaced with seeded, reproducible alternatives. The seed is derived from the run ID, workflow name, and start timestamp — so every replay of the same run produces identical values:
+
+```ts title="packages/core/src/vm/index.ts" lineNumbers
+export function createContext(options: CreateContextOptions) {
+ let { fixedTimestamp } = options;
+ const { seed } = options;
+ const rng = seedrandom(seed);
+ const context = vmCreateContext();
+
+ const g: typeof globalThis = runInContext('globalThis', context);
+
+ // Deterministic `Math.random()`
+ g.Math.random = rng;
+
+ // Override `Date` constructor to return fixed time when called without arguments
+ const Date_ = g.Date;
+ (g as any).Date = function Date(
+ ...args: Parameters<(typeof globalThis)['Date']>[]
+ ) {
+ if (args.length === 0) {
+ return new Date_(fixedTimestamp);
+ }
+ return new Date_(...args);
+ };
+ (g as any).Date.prototype = Date_.prototype;
+ Object.setPrototypeOf(g.Date, Date_);
+ g.Date.now = () => fixedTimestamp;
+
+ // ... crypto.getRandomValues and crypto.randomUUID also use rng
+
+ return {
+ context,
+ globalThis: g,
+ updateTimestamp: (timestamp: number) => {
+ fixedTimestamp = timestamp;
+ },
+ };
+}
+```
+
+Key properties of the sandbox:
+
+- **`Math.random()`** is replaced with a seeded PRNG (`seedrandom`). The seed is `${runId}:${workflowName}:${+startedAt}`, making the sequence identical across replays.
+- **`Date.now()`** and `new Date()` return a fixed timestamp that advances only when events are consumed from the log (via `updateTimestamp`).
+- **`crypto.getRandomValues()`** and **`crypto.randomUUID()`** use the same seeded RNG.
+- **`fetch`**, **`setTimeout`**, **`setInterval`**, and other non-deterministic globals throw helpful errors directing developers to use step functions or `sleep()` instead.
+- **`process.env`** is provided as a frozen snapshot — readable but not mutable.
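
The `process.env` behavior can be sketched with `Object.freeze`. This is a simplification, and the variable names and values below are hypothetical:

```typescript
// Hypothetical snapshot of process.env taken at workflow start.
const envSnapshot = Object.freeze({
  DEPLOY_REGION: "iad1",
  NODE_ENV: "production",
});

console.log(envSnapshot.DEPLOY_REGION); // "iad1"

// Writes either throw (strict mode) or are silently ignored;
// either way the snapshot never changes.
try {
  (envSnapshot as { DEPLOY_REGION: string }).DEPLOY_REGION = "sfo1";
} catch {
  // TypeError in strict mode
}
console.log(envSnapshot.DEPLOY_REGION); // still "iad1"
```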
+
+### WORKFLOW_USE_STEP transformation
+
+The SWC compiler transforms step function bodies in workflow mode into proxy calls through a well-known symbol. Given this source:
+
+```ts title="Source" lineNumbers
+export async function createUser(email: string) {
+ "use step";
+ return { id: crypto.randomUUID(), email };
+}
+
+export async function handleUserSignup(email: string) {
+ "use workflow";
+ const user = await createUser(email);
+ return { userId: user.id };
+}
+```
+
+The workflow-mode output replaces the step function entirely with a variable assignment:
+
+```ts title="Workflow mode output" lineNumbers
+// Step function replaced — WORKFLOW_USE_STEP returns a callable function
+export var createUser = globalThis[Symbol.for("WORKFLOW_USE_STEP")](
+ "step//./workflows/user//createUser"
+);
+
+// Workflow body stays intact — it's deterministic orchestration
+export async function handleUserSignup(email: string) {
+ const user = await createUser(email);
+ return { userId: user.id };
+}
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup";
+globalThis.__private_workflows.set(
+ "workflow//./workflows/user//handleUserSignup",
+ handleUserSignup
+);
+```
+
+`WORKFLOW_USE_STEP` returns a function, so `await createUser(email)` still works as expected. At runtime, the `WORKFLOW_USE_STEP` symbol is bound to the `useStep` function created in `packages/core/src/workflow.ts`:
+
+```ts title="packages/core/src/workflow.ts (simplified)" lineNumbers
+const useStep = createUseStep(workflowContext);
+
+// Injected into the VM's globalThis
+vmGlobalThis[WORKFLOW_USE_STEP] = useStep;
+```
+
+When the workflow calls `await createUser(email)`, the proxy checks the `EventsConsumer` for a matching `step_completed` event in the **Event log**. If found, it returns the cached result. If not found, it adds the step to the invocations queue and eventually throws a `WorkflowSuspension`. Step bodies are excluded from the **Workflow bundle** because replay must never re-run side effects.
+
+### Suspension and step dispatch
+
+When a workflow reaches a step that hasn't been executed yet, execution doesn't block — it **suspends**. The `WorkflowSuspension` collects all pending operations (steps, hooks, waits) and propagates up to the runtime:
+
+```ts title="packages/core/src/workflow.ts (execution)" lineNumbers
+try {
+ const result = await Promise.race([
+ workflowFn(...args),
+ workflowDiscontinuation.promise,
+ ]);
+ // ... workflow completed successfully
+} catch (err) {
+ if (WorkflowSuspension.is(err)) {
+ throw err; // Propagated to the suspension handler
+ }
+  throw err; // A genuine workflow failure
+}
+```
+
+The suspension handler in `packages/core/src/runtime/suspension-handler.ts` then processes the pending queue:
+
+1. **Hooks first** — created before steps to prevent race conditions with webhook receivers
+2. **Steps and waits in parallel** — each step gets a `step_created` event in the **Event log** and a **Queue message** sent for background execution. Step input is hydrated from the persisted `step_created` event, not from the **Queue message**
+3. **Timeout calculation** — if any waits exist, the minimum `resumeAt` time determines when the workflow should be re-enqueued
+
+### Replay with timestamp advancement
+
+On replay, the `EventsConsumer` feeds events to registered callbacks. A passive subscriber advances the VM's clock as events are consumed:
+
+```ts title="packages/core/src/workflow.ts (timestamp subscriber)" lineNumbers
+workflowContext.eventsConsumer.subscribe((event) => {
+ const createdAt = event?.createdAt;
+ if (createdAt) {
+ updateTimestamp(+createdAt);
+ }
+ return EventConsumerResult.NotConsumed;
+});
+```
+
+This means `Date.now()` inside the workflow returns the timestamp of the most recently consumed event — not wall-clock time. The workflow experiences time progressing through the **Event log**, making temporal logic consistent across replays. Workflow state is reconstructed by replaying code against the **Event log**.
+
+## Why This Matters
+
+The split execution model delivers three properties that enable durable workflows on stateless compute:
+
+1. **Zero re-execution cost** — Step results are cached in the **Event log**. Replay never re-runs a step; it reads the cached result and continues. A workflow with 50 completed steps replays in milliseconds.
+
+2. **Deterministic orchestration** — The sandboxed VM guarantees that workflow code produces the same decisions given the same event history. This makes replay safe: the workflow will always arrive at the same suspension point or completion.
+
+3. **Stateless suspension** — When a workflow suspends, nothing stays in memory. The **Event log** is the complete state. A **Queue message** is a trigger, not durable state. The workflow can resume on any machine, in any process, at any time — and produce the same result.
diff --git a/docs/content/deep-dives/step-execution-model-social.mdx b/docs/content/deep-dives/step-execution-model-social.mdx
new file mode 100644
index 0000000000..d2d6dd5b40
--- /dev/null
+++ b/docs/content/deep-dives/step-execution-model-social.mdx
@@ -0,0 +1,48 @@
+---
+title: "Await Still Reads Linearly — Because Suspension Is Hidden Behind Replay"
+description: A concise explainer on how Workflow DevKit lets you write sequential async/await code that survives crashes, restarts, and scale-to-zero — without changing how you read or write it.
+type: conceptual
+summary: Workflow DevKit splits code into replayable orchestration and once-only side effects. The key insight is that your async/await code reads exactly the same as normal JavaScript — suspension and replay happen transparently behind the await boundary, so you never see the machinery that makes execution durable.
+prerequisites:
+ - /docs/foundations/workflows-and-steps
+related:
+ - /docs/how-it-works/understanding-directives
+ - /docs/how-it-works/code-transform
+ - /deep-dives/compiler-magic-swc-plugin-reference
+ - /deep-dives/durability-replay-reference
+ - /deep-dives/durable-streaming-reference
+---
+
+Here's a workflow that charges a card, reserves inventory, and sends a confirmation email:
+
+```ts
+export async function processOrder(orderId: string) {
+ "use workflow";
+ const charge = await chargeCard(9900);
+ const reservation = await reserveInventory(orderId);
+ await sendConfirmation(charge.id, reservation.id);
+ return { orderId, status: "confirmed" };
+}
+```
+
+It reads like normal async JavaScript. Three awaits, top to bottom, one after another. But this code survives process crashes, cold starts, and scale-to-zero events — without you writing a single line of recovery logic.
+
+## The Trick: Suspension Is Invisible
+
+When this workflow hits `await chargeCard(9900)` for the first time, there's no cached result yet. The runtime **suspends** — it throws a `WorkflowSuspension`, queues the step for background execution, and returns. Nothing stays in memory.
+
+The step runs separately with full Node.js access via the **Step bundle**, charges the card via Stripe, and persists the result to an append-only **Event log**. Then a **Queue message** re-enqueues the workflow.
+
+On replay, the runtime recreates a deterministic VM sandbox — same seeded `Math.random()`, same frozen `Date.now()`, same `crypto.randomUUID()` sequence — and re-executes the **Workflow bundle** from the top. When it hits `await chargeCard(9900)` again, the `EventsConsumer` finds the cached `step_completed` result in the **Event log** and returns it instantly. No network call. No re-charge. Workflow state is reconstructed by replaying code against the **Event log**.
+
+The workflow continues to `await reserveInventory(orderId)`. If that step hasn't completed yet, the same suspension cycle repeats. If it has, the cached result comes back immediately.
+
+From the developer's perspective, execution flows top-to-bottom through each `await`, exactly as written. The suspension, queuing, replay, and cache lookup all happen behind the `await` boundary. You never see it. You never code around it.
+
+## Why This Matters
+
+Other durable execution frameworks ask you to restructure code into state machines, generator functions, or callback chains. Workflow DevKit doesn't. The `"use workflow"` and `"use step"` directives are the only API — the compiler and runtime handle the rest.
+
+Your orchestration code replays deterministically. Your side-effecting steps execute exactly once. And your `await` still reads linearly, because suspension is hidden behind replay.
+
+That's the entire model: write sequential JavaScript, get durable execution for free.
diff --git a/docs/content/docs/how-it-works/code-transform.mdx b/docs/content/docs/how-it-works/code-transform.mdx
index 2998640773..d9a7cd9c4f 100644
--- a/docs/content/docs/how-it-works/code-transform.mdx
+++ b/docs/content/docs/how-it-works/code-transform.mdx
@@ -95,7 +95,7 @@ export async function createUser(email: string) {
return { id: crypto.randomUUID(), email };
}
-registerStepFunction("step//workflows/user.js//createUser", createUser); // [!code highlight]
+registerStepFunction("step//./workflows/user//createUser", createUser); // [!code highlight]
```
**What happens:**
@@ -107,7 +107,7 @@ registerStepFunction("step//workflows/user.js//createUser", createUser); // [!co
**Why no transformation?** Step functions execute in your main runtime with full access to Node.js APIs, file system, databases, etc. They don't need any special handling—they just run normally.
-**ID Format:** Step IDs follow the pattern `step//{filepath}//{functionName}`, where the filepath is relative to your project root.
+**ID Format:** Step IDs follow the pattern `step//{modulePath}//{functionName}`, where the module path is a `./`-prefixed relative path with file extensions stripped.
@@ -134,20 +134,18 @@ export async function handleUserSignup(email: string) {
{/* @skip-typecheck: incomplete code sample */}
```typescript
-export async function createUser(email: string) {
- return globalThis[Symbol.for("WORKFLOW_USE_STEP")]("step//workflows/user.js//createUser")(email); // [!code highlight]
-}
+export var createUser = globalThis[Symbol.for("WORKFLOW_USE_STEP")]("step//./workflows/user//createUser"); // [!code highlight]
export async function handleUserSignup(email: string) {
const user = await createUser(email);
return { userId: user.id };
}
-handleUserSignup.workflowId = "workflow//workflows/user.js//handleUserSignup"; // [!code highlight]
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup"; // [!code highlight]
```
**What happens:**
-- Step function bodies are **replaced** with calls to `globalThis[Symbol.for("WORKFLOW_USE_STEP")]`
+- Step functions are **replaced entirely** with a `var` assignment calling `globalThis[Symbol.for("WORKFLOW_USE_STEP")]`, which returns a callable function
- Workflow function bodies remain **intact**—they execute deterministically during replay
- The workflow function gets a `workflowId` property for runtime identification
- The `"use workflow"` directive is removed
@@ -158,7 +156,7 @@ handleUserSignup.workflowId = "workflow//workflows/user.js//handleUserSignup"; /
2. If yes: Returns the cached result
3. If no: Triggers a suspension and enqueues the step for background execution
-**ID Format:** Workflow IDs follow the pattern `workflow//{filepath}//{functionName}`. The `workflowId` property is attached to the function to allow [`start()`](/docs/api-reference/workflow-api/start) to work at runtime.
+**ID Format:** Workflow IDs follow the pattern `workflow//{modulePath}//{functionName}`. The `workflowId` property is attached to the function to allow [`start()`](/docs/api-reference/workflow-api/start) to work at runtime.
@@ -183,7 +181,7 @@ export async function handleUserSignup(email: string) {
export async function handleUserSignup(email: string) {
throw new Error("You attempted to execute ..."); // [!code highlight]
}
-handleUserSignup.workflowId = "workflow//workflows/user.js//handleUserSignup"; // [!code highlight]
+handleUserSignup.workflowId = "workflow//./workflows/user//handleUserSignup"; // [!code highlight]
```
**What happens:**
@@ -305,13 +303,15 @@ Learn more about [Workflows and Steps](/docs/foundations/workflows-and-steps).
The compiler generates stable IDs for workflows and steps based on file paths and function names:
-**Pattern:** `{type}//{filepath}//{functionName}`
+**Pattern:** `{type}//{modulePath}//{functionName}`
+
+Where `modulePath` is a relative path prefixed with `./` and file extensions are stripped. When a module specifier with version is configured (for npm packages), it uses that instead (e.g., `@myorg/tasks@2.0.0`).
**Examples:**
-- `workflow//workflows/user-signup.js//handleUserSignup`
-- `step//workflows/user-signup.js//createUser`
-- `step//workflows/payments/checkout.ts//processPayment`
+- `workflow//./workflows/user-signup//handleUserSignup`
+- `step//./workflows/user-signup//createUser`
+- `step//./workflows/payments/checkout//processPayment`
**Key properties:**