Tom-Ryder
diff --git a/Sources/AgentRunKit/Documentation.docc/AgentRunKit.md b/Sources/AgentRunKit/Documentation.docc/AgentRunKit.md
Lines changed: 19 additions & 1 deletion
diff --git a/Sources/AgentRunKit/Documentation.docc/Articles/AgentAndChat.md b/Sources/AgentRunKit/Documentation.docc/Articles/AgentAndChat.md
Lines changed: 3 additions & 0 deletions
diff --git a/Sources/AgentRunKit/Documentation.docc/Articles/CheckpointAndResume.md b/Sources/AgentRunKit/Documentation.docc/Articles/CheckpointAndResume.md
Lines changed: 179 additions & 0 deletions
diff --git a/Sources/AgentRunKit/Documentation.docc/Articles/MCPIntegration.md b/Sources/AgentRunKit/Documentation.docc/Articles/MCPIntegration.md
Lines changed: 6 additions & 0 deletions
diff --git a/Sources/AgentRunKit/Documentation.docc/Articles/StreamingAndSwiftUI.md b/Sources/AgentRunKit/Documentation.docc/Articles/StreamingAndSwiftUI.md
Lines changed: 16 additions & 4 deletions
diff --git a/Sources/AgentRunKit/Documentation.docc/Extensions/Agent.md b/Sources/AgentRunKit/Documentation.docc/Extensions/Agent.md
Lines changed: 7 additions & 2 deletions
diff --git a/Sources/AgentRunKit/Documentation.docc/Extensions/AgentCheckpoint.md b/Sources/AgentRunKit/Documentation.docc/Extensions/AgentCheckpoint.md
Lines changed: 32 additions & 0 deletions
diff --git a/Sources/AgentRunKit/Documentation.docc/Extensions/AgentCheckpointError.md b/Sources/AgentRunKit/Documentation.docc/Extensions/AgentCheckpointError.md
Lines changed: 17 additions & 0 deletions
diff --git a/Sources/AgentRunKit/Documentation.docc/Extensions/AgentCheckpointer.md b/Sources/AgentRunKit/Documentation.docc/Extensions/AgentCheckpointer.md
Lines changed: 22 additions & 0 deletions
@@ -38,7 +38,7 @@ if let content = result.content {
 }
 ```
 
-If a run ends because `maxIterations` or `tokenBudget` is reached before the model calls `finish`, ``Agent/run(userMessage:history:context:tokenBudget:requestContext:approvalHandler:)`` still returns an ``AgentResult`` with a structural ``FinishReason`` and `content == nil`.
+If a run ends because `maxIterations` or `tokenBudget` is reached before the model calls `finish`, ``Agent/run(userMessage:history:context:tokenBudget:requestContext:approvalHandler:)-(String,_,_,_,_,_)`` still returns an ``AgentResult`` with a structural ``FinishReason`` and `content == nil`.
 
 For a complete walkthrough, see <doc:GettingStarted>.
 
@@ -54,9 +54,27 @@ For a complete walkthrough, see <doc:GettingStarted>.
 
 - <doc:StreamingAndSwiftUI>
 - ``StreamEvent``
+- ``EventOrigin``
 - ``AgentStream``
+- ``StreamEventBuffer``
+- ``BufferReplayError``
 - ``ToolCallInfo``
 
+### Checkpoint and Resume
+
+- <doc:CheckpointAndResume>
+- ``AgentCheckpoint``
+- ``AgentCheckpointer``
+- ``InMemoryCheckpointer``
+- ``FileCheckpointer``
+- ``MCPToolBinding``
+- ``ContextBudgetCheckpointState``
+- ``AgentCheckpointError``
+- ``CheckpointID``
+- ``SessionID``
+- ``RunID``
+- ``EventID``
+
 ### Tool Approval
 
 - <doc:ToolApproval>
 
@@ -23,6 +23,8 @@ if let content = result.content {
 
 ``Agent`` also exposes `stream()`, which returns an `AsyncThrowingStream<StreamEvent, Error>` for real-time token delivery and tool progress. See <doc:StreamingAndSwiftUI>.
 
+For long-running sessions, `stream()` accepts `sessionID:` and `checkpointer:` parameters that persist iteration state to a backend implementing ``AgentCheckpointer``. ``Agent/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)`` reconstructs a stopped run from any saved ``CheckpointID``. See <doc:CheckpointAndResume>.
+
 Key behaviors:
 - Injects a `finish` tool automatically. The model must call it to end the loop.
 - Alternate termination for on-device clients: when the LLM client cannot surface tool calls in its response (e.g., `FoundationModelsClient`), the loop terminates on the first iteration that produces content without tool calls. The user-visible contract is unchanged.
@@ -135,3 +137,4 @@ let (_, history2) = try await chat.send("Tell me more.", history: history)
 - <doc:DefiningTools>
 - <doc:StreamingAndSwiftUI>
 - <doc:ContextManagement>
+- <doc:CheckpointAndResume>
@@ -0,0 +1,179 @@
+# Checkpoint and Resume
+
+Persist iteration state mid-run, then resume from any saved checkpoint into a fresh streaming continuation.
+
+## Overview
+
+A checkpointer captures the agent's full loop state at the end of every iteration: messages, accumulated token usage, per-iteration usage, context budget phase, session and run identity, the local-rewrite flag, the session approval allowlist, and any participating MCP tool bindings. ``Agent/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)`` loads a saved checkpoint, replays its history into the consuming stream as one synthetic event, then continues live from the next iteration.
+
+This unblocks long-running sessions that need to survive process restarts, UI re-renders, or planned suspension. Checkpoints are written automatically by ``Agent/stream(userMessage:history:context:tokenBudget:requestContext:approvalHandler:sessionID:checkpointer:)-(String,_,_,_,_,_,_,_)`` when both a `sessionID` and a `checkpointer` are passed.
+
+## What a Checkpoint Captures
+
+``AgentCheckpoint`` is a `Codable` snapshot. It is written at the end of each iteration after tools execute and before the next request is built.
+
+| Field | Description |
+|---|---|
+| `messages` | Full conversation including system prompt, user, assistant, and tool messages |
+| `iteration` | One-based iteration number that produced this snapshot |
+| `tokenUsage` | Cumulative input/output usage across all iterations to date |
+| `iterationUsage` | Token usage for this iteration alone, when the provider reported it |
+| `contextBudgetState` | ``ContextBudgetCheckpointState`` capturing config, window size, last budget snapshot, and the soft-advisory armed flag |
+| `historyWasRewrittenLocally` | Whether the agent rewrote history (compaction, pruning) before this iteration |
+| `sessionAllowlist` | Tool names the user accepted with `.approveAlways` during this session |
+| `sessionID` | Logical session that owns the run |
+| `runID` | Run that produced this checkpoint |
+| `checkpointID` | Stable identity for the snapshot |
+| `timestamp` | UTC time the snapshot was taken |
+| `mcpToolBindings` | ``MCPToolBinding`` set: which MCP tools participated in this checkpoint's history |
+
+## Backends
+
+``AgentCheckpointer`` is a three-method protocol. Two backends ship with the framework.
+
+| Backend | Use When |
+|---|---|
+| ``InMemoryCheckpointer`` | The session is bounded by a single process lifetime: previews, tests, transient UI |
+| ``FileCheckpointer`` | The session must survive process restart: production apps, server workers, recovery flows |
+
+``FileCheckpointer`` stores one JSON file per checkpoint under `<directory>/checkpoints/<uuid>.json`. ``FileCheckpointer/list(session:)`` skips files it cannot read or decode, so unrelated debris in the directory does not break enumeration. ``FileCheckpointer/load(_:)``, by contrast, throws ``AgentCheckpointError/fileSystem(_:)`` if the requested file is corrupt.
+
+Custom backends conform to ``AgentCheckpointer`` directly. The framework does not ship database-backed or remote-storage implementations.
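+
+As a rough illustration of what a custom backend involves, the sketch below implements the three operations against stand-in types. Every `Sketch…` name and `DictionaryCheckpointer` is hypothetical and exists only so the block compiles on its own; the real ``AgentCheckpointer`` requirements, including their exact signatures and `Sendable` constraints, may differ.
+
+```swift
+import Foundation
+
+// Stand-in types (assumptions): in a real project these come from AgentRunKit.
+struct SketchSessionID: Hashable { let raw = UUID() }
+struct SketchCheckpointID: Hashable { let raw = UUID() }
+struct SketchCheckpoint {
+    let checkpointID: SketchCheckpointID
+    let sessionID: SketchSessionID
+    let iteration: Int
+}
+enum SketchCheckpointError: Error { case notFound(SketchCheckpointID) }
+
+// Assumed protocol shape: the three operations named in this article.
+protocol SketchCheckpointer {
+    func save(_ checkpoint: SketchCheckpoint) async throws
+    func load(_ id: SketchCheckpointID) async throws -> SketchCheckpoint
+    func list(session: SketchSessionID) async throws -> [SketchCheckpointID]
+}
+
+// An actor serializes access, so concurrent saves cannot race on the dictionaries.
+actor DictionaryCheckpointer: SketchCheckpointer {
+    private var byID: [SketchCheckpointID: SketchCheckpoint] = [:]
+    private var order: [SketchSessionID: [SketchCheckpointID]] = [:]
+
+    func save(_ checkpoint: SketchCheckpoint) {
+        // Record insertion order only the first time an ID is seen.
+        if byID[checkpoint.checkpointID] == nil {
+            order[checkpoint.sessionID, default: []].append(checkpoint.checkpointID)
+        }
+        byID[checkpoint.checkpointID] = checkpoint
+    }
+
+    func load(_ id: SketchCheckpointID) throws -> SketchCheckpoint {
+        guard let found = byID[id] else { throw SketchCheckpointError.notFound(id) }
+        return found
+    }
+
+    func list(session: SketchSessionID) -> [SketchCheckpointID] {
+        order[session] ?? []
+    }
+}
+```
+
+A database-backed backend would keep the same three entry points and replace the dictionaries with queries. Returning IDs in save order is a choice made by this sketch, not a documented guarantee of the built-in backends.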
+
+## Enabling Checkpointing on a Stream
+
+Pass `sessionID:` and `checkpointer:` to either `stream()` overload.
+
+```swift
+let session = SessionID()
+let checkpointer = InMemoryCheckpointer()
+
+let stream = agent.stream(
+    userMessage: "Plan and execute the migration.",
+    context: ctx,
+    sessionID: session,
+    checkpointer: checkpointer
+)
+for try await event in stream {
+    handle(event)
+}
+
+let savedIDs = try await checkpointer.list(session: session)
+```
+
+If either argument is omitted, no checkpoint is written. The `stream()` overloads continue to default both to `nil`, so existing call sites are unaffected.
+
+## Resuming a Run
+
+``Agent/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)`` loads the named checkpoint, replays its history as one synthetic ``StreamEvent/Kind/iterationCompleted(usage:iteration:history:)`` event tagged with ``EventOrigin/replayed(from:)``, then continues from `iteration + 1`.
+
+```swift
+let stream = try await agent.resume(
+    from: checkpointID,
+    checkpointer: checkpointer,
+    context: ctx
+)
+for try await event in stream {
+    if case .replayed(let id) = event.origin {
+        applySnapshot(id)
+        continue
+    }
+    handle(event)
+}
+```
+
+The resumed run gets a fresh ``RunID`` under the same ``SessionID``. Callers can distinguish replayed events from the live continuation by inspecting ``StreamEvent/origin``.
+
+### Preflight Termination
+
+If the saved checkpoint already exceeds the new `tokenBudget`, the stream replays and finishes with ``FinishReason/tokenBudgetExceeded(budget:used:)`` without making any LLM call. If `iteration >= maxIterations`, it replays and finishes with ``FinishReason/maxIterationsReached(limit:)``.
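+
+The preflight decision can be pictured as a pure function over the saved state. The following is a minimal sketch, not the framework's implementation: `PreflightOutcome` and `preflight` are hypothetical names, the order of the two checks is an assumption, and `tokensUsed` stands in for whatever total the framework compares against the budget.
+
+```swift
+// Hypothetical outcome type; the framework reports these through FinishReason.
+enum PreflightOutcome: Equatable {
+    case continueLive(nextIteration: Int)
+    case tokenBudgetExceeded(budget: Int, used: Int)
+    case maxIterationsReached(limit: Int)
+}
+
+// Decides, before any LLM call, whether a resumed run may continue.
+func preflight(
+    savedIteration: Int,
+    tokensUsed: Int,
+    maxIterations: Int,
+    tokenBudget: Int?
+) -> PreflightOutcome {
+    // Assumption: the budget check runs first, and "exceeds" means strictly greater.
+    if let budget = tokenBudget, tokensUsed > budget {
+        return .tokenBudgetExceeded(budget: budget, used: tokensUsed)
+    }
+    // Mirrors the `iteration >= maxIterations` condition described above.
+    if savedIteration >= maxIterations {
+        return .maxIterationsReached(limit: maxIterations)
+    }
+    return .continueLive(nextIteration: savedIteration + 1)
+}
+```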
+
+### Cursor-State Providers
+
+Providers with conversation cursor state (the OpenAI Responses API's `previous_response_id`) cannot reuse a stale cursor after resume because the resumed run is a different run. The first live request after resume forces full history (`.forceFullRequest`) so cursor-state providers reconstruct the conversation from messages rather than from a vanished cursor.
+
+### MCP Binding Validation
+
+Before replay begins, ``Agent/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)`` checks that every ``MCPToolBinding`` recorded in the checkpoint has a live counterpart on the resuming agent. If any are missing, resume throws ``AgentCheckpointError/mcpBindingMismatch(_:)`` with the missing bindings before any event is yielded. This catches deployment skew where the agent that resumes is configured against fewer or different MCP servers than the agent that saved.
+
+See <doc:MCPIntegration> for how MCP tools are discovered.
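+
+Conceptually, the validation above is a set difference keyed on server and tool name. The sketch below illustrates that idea with stand-in types; `SketchBinding`, `SketchResumeError`, and `validateBindings` are hypothetical and do not reflect the actual ``MCPToolBinding`` layout beyond its `serverName` and `toolName` fields.
+
+```swift
+// Stand-in for MCPToolBinding (assumption: identity is server name plus tool name).
+struct SketchBinding: Hashable {
+    let serverName: String
+    let toolName: String
+}
+
+// Stand-in for the mismatch error case.
+enum SketchResumeError: Error, Equatable {
+    case mcpBindingMismatch([SketchBinding])
+}
+
+// Throws before any replay if a recorded binding has no live counterpart.
+func validateBindings(recorded: Set<SketchBinding>, live: Set<SketchBinding>) throws {
+    let missing = recorded.subtracting(live)
+    guard missing.isEmpty else {
+        // Sort so the reported list is deterministic.
+        let sorted = missing.sorted {
+            ($0.serverName, $0.toolName) < ($1.serverName, $1.toolName)
+        }
+        throw SketchResumeError.mcpBindingMismatch(sorted)
+    }
+}
+```
+
+Extra live bindings that the checkpoint never used are not an error in this sketch; only recorded bindings that have gone missing are reported.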
+
+## AgentStream Resume
+
+``AgentStream/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)`` is the SwiftUI-side entry point. It cancels any in-flight prior task before any await runs, loads the checkpoint exactly once, then synchronously preloads observable state before yielding control back to the caller.
+
+```swift
+@State private var stream = AgentStream(agent: agent, bufferCapacity: 256)
+
+try await stream.resume(
+    from: checkpointID,
+    checkpointer: checkpointer,
+    context: ctx
+)
+```
+
+When `resume` returns, these properties are already populated from the checkpoint:
+
+| Property | Source |
+|---|---|
+| ``AgentStream/sessionID`` | `target.sessionID` |
+| ``AgentStream/history`` | `target.messages` |
+| ``AgentStream/tokenUsage`` | `target.tokenUsage` |
+| ``AgentStream/currentCheckpoint`` | `target.checkpointID` |
+
+The live continuation runs in a background task; ``AgentStream/iterationsReplayed`` increments once the synthetic replay event is observed, then the live iteration cycle proceeds normally. Because the counter tracks only replayed iterations, a nonzero value distinguishes a resume from a fresh send.
+
+See <doc:StreamingAndSwiftUI> for the full SwiftUI contract.
+
+## Cancellation Safety
+
+``AgentStream/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)`` calls ``AgentStream/cancel()`` and resets observable state before any await. A prior in-flight task cannot continue mutating observers while the new checkpoint loads. The same generation-token discipline that protects ``AgentStream/send(_:history:context:tokenBudget:requestContext:approvalHandler:sessionID:checkpointer:)-(String,_,_,_,_,_,_,_)`` against late-arriving stale events applies to resume.
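+
+For readers unfamiliar with the pattern, a generation token can be sketched as a counter that each new run bumps and each event must match. This is a generic illustration with a hypothetical `GenerationGate` type, not the code inside ``AgentStream``.
+
+```swift
+// Generic generation-token gate: events carrying a stale token are dropped.
+final class GenerationGate {
+    private(set) var generation = 0
+    private(set) var applied: [String] = []
+
+    // Starts a new run: invalidates every earlier token and resets state.
+    func begin() -> Int {
+        generation += 1
+        applied.removeAll()
+        return generation
+    }
+
+    // Applies an event only if it belongs to the current run.
+    @discardableResult
+    func apply(_ event: String, from token: Int) -> Bool {
+        guard token == generation else { return false }
+        applied.append(event)
+        return true
+    }
+}
+```
+
+Because the token is bumped synchronously before any suspension point, an event from a cancelled run that arrives late fails the comparison and cannot touch the new run's state.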
+
+## Cross-Process Resume
+
+``FileCheckpointer`` is safe to use from a fresh process. The directory layout is stable; reopening the same directory and calling ``FileCheckpointer/list(session:)`` returns checkpoints written by an earlier process.
+
+```swift
+// Process A
+let writer = FileCheckpointer(directory: stateDirectory)
+for try await _ in agent.stream(
+    userMessage: "Long task...",
+    context: ctx, sessionID: session, checkpointer: writer
+) {}
+
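+// Process B (later). `session` must be the same SessionID that Process A used,
+// so the caller has to persist it between processes.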
+let reader = FileCheckpointer(directory: stateDirectory)
+let ids = try await reader.list(session: session)
+guard let last = ids.last else { return }
+let stream = try await agent.resume(
+    from: last, checkpointer: reader, context: ctx
+)
+```
+
+The file backend is single-writer oriented. Multi-process coordination over the same directory is the caller's responsibility; for concurrent writers, use a database-backed custom ``AgentCheckpointer``.
+
+## Errors
+
+``AgentCheckpointError`` covers the three failure modes that resume can surface:
+
+| Case | Meaning |
+|---|---|
+| ``AgentCheckpointError/notFound(_:)`` | The named ``CheckpointID`` is not present in the backend |
+| ``AgentCheckpointError/fileSystem(_:)`` | A file backend operation failed (read, write, decode for the requested ID) |
+| ``AgentCheckpointError/mcpBindingMismatch(_:)`` | Resume cannot continue because one or more recorded MCP bindings have no live counterpart |
+
+## See Also
+
+- <doc:StreamingAndSwiftUI>
+- <doc:MCPIntegration>
+- ``AgentCheckpoint``
+- ``AgentCheckpointer``
+- ``InMemoryCheckpointer``
+- ``FileCheckpointer``
+- ``MCPToolBinding``
+- ``AgentCheckpointError``
+- ``ContextBudgetCheckpointState``
+- ``EventOrigin``
+- ``CheckpointID``
+- ``SessionID``
+- ``RunID``
@@ -54,6 +54,10 @@ Pass multiple ``MCPServerConfiguration`` values to a single session. All servers
 
 ``MCPTool`` adapts each discovered MCP tool to the ``AnyTool`` protocol. Once inside the `withTools` closure, MCP tools are indistinguishable from native ``Tool`` instances. The agent calls them through the same interface, and their results follow the same ``ToolResult`` type.
 
+## Checkpoint Binding Validation
+
+When a checkpointed run includes MCP tool calls, the agent loop records each participating tool as an ``MCPToolBinding`` in ``AgentCheckpoint/mcpToolBindings``. On resume, ``Agent/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)`` validates that every recorded binding has a live counterpart with the same `serverName` and `toolName`. Missing bindings throw ``AgentCheckpointError/mcpBindingMismatch(_:)`` before any event is yielded, catching deployment skew where the resuming agent is configured against a different MCP server set. See <doc:CheckpointAndResume>.
+
 ## Error Handling
 
 ``MCPError`` covers all failure modes:
@@ -112,10 +116,12 @@ For session-based usage with custom transports, use the internal initializer tha
 
 - <doc:DefiningTools>
 - <doc:AgentAndChat>
+- <doc:CheckpointAndResume>
 - ``MCPClient``
 - ``MCPSession``
 - ``MCPTool``
 - ``MCPToolInfo``
+- ``MCPToolBinding``
 - ``MCPServerConfiguration``
 - ``StdioMCPTransport``
 - ``MCPTransport``
 
@@ -49,12 +49,13 @@ Every event includes:
 |---|---|
 | ``StreamEvent/id`` | Stable event identity for transcript rendering and correlation |
 | ``StreamEvent/timestamp`` | Emission time in UTC |
-| ``StreamEvent/sessionID`` | Optional session identity |
-| ``StreamEvent/runID`` | Optional run identity |
+| ``StreamEvent/sessionID`` | Session identity, populated when a stream is started with `sessionID:` |
+| ``StreamEvent/runID`` | Run identity, freshly assigned on each `stream()` or `resume(...)` |
 | ``StreamEvent/parentEventID`` | Optional parent correlation identity |
+| ``StreamEvent/origin`` | ``EventOrigin/live`` or ``EventOrigin/replayed(from:)`` (set on resume) |
 | ``StreamEvent/kind`` | The semantic payload |
 
-Today, direct `Agent` and `Chat` streams leave `sessionID`, `runID`, and `parentEventID` unset. A future session layer will populate those fields consistently.
+Pass `sessionID:` to ``Agent/stream(userMessage:history:context:tokenBudget:requestContext:approvalHandler:sessionID:checkpointer:)-(String,_,_,_,_,_,_,_)`` to thread an explicit session through events; otherwise a fresh ``SessionID`` is minted per stream. ``Chat`` continues to leave identity envelope fields unset.
 
 ## StreamEvent Kinds
 
@@ -135,12 +136,20 @@ This canonical codec uses the framework's fixed JSON settings for event transcri
 | `toolCalls` | [``ToolCallInfo``] | Top-level and nested tool calls with live state (`.running`, `.awaitingApproval`, `.completed`, `.failed`) |
 | `iterationUsages` | [``TokenUsage``] | Per-iteration usage, one entry per `.iterationCompleted` |
 | `contextBudget` | ``ContextBudget``? | Latest budget snapshot from `.budgetUpdated` |
+| `sessionID` | ``SessionID``? | Session identity threaded through emitted events |
+| `currentCheckpoint` | ``CheckpointID``? | Last replayed or live checkpoint observed; preloaded on resume |
+| `iterationsReplayed` | `Int` | Count of replayed `.iterationCompleted` events; only incremented on `.replayed` origin |
 
 **Methods:**
 
-- `send(_:history:context:tokenBudget:requestContext:approvalHandler:)` cancels any active stream, resets state, and starts a new one.
+- `send(_:history:context:tokenBudget:requestContext:approvalHandler:sessionID:checkpointer:)` cancels any active stream, resets state, and starts a new one. Pass `sessionID:` and `checkpointer:` to persist iteration state.
+- `resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)` synchronously preloads observable state from the loaded checkpoint, then starts the live continuation. See <doc:CheckpointAndResume>.
 - `cancel()` cancels the active stream without resetting state. It is a local cancellation API and does not guarantee a terminal `.finished` event.
 
+### Late-Binding Replay
+
+Construct ``AgentStream`` with a `bufferCapacity:` to capture every emitted event in a ``StreamEventBuffer``. Late observers reattach via ``AgentStream/replay(from:)``, which streams every buffered event from the given monotonic cursor, or fails with ``BufferReplayError`` if buffering is disabled. The buffer is isolated per send: a new `send` or `resume` clears it so that cursors stay comparable within one logical run.
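+
+To make the cursor semantics concrete, here is a minimal sketch of a bounded buffer with monotonic cursors. `SketchEventBuffer` is hypothetical and is not the ``StreamEventBuffer`` API; in particular, clamping a cursor that points at an evicted event to the oldest retained one is an assumption of this sketch.
+
+```swift
+// Bounded buffer whose cursors keep increasing even after old events are evicted.
+struct SketchEventBuffer<Event> {
+    let capacity: Int
+    private var events: [Event] = []
+    private var firstCursor = 0  // cursor of events[0]
+
+    init(capacity: Int) {
+        precondition(capacity > 0, "capacity must be positive")
+        self.capacity = capacity
+    }
+
+    // Appends an event, evicting the oldest if full, and returns the new event's cursor.
+    @discardableResult
+    mutating func append(_ event: Event) -> Int {
+        events.append(event)
+        if events.count > capacity {
+            events.removeFirst()
+            firstCursor += 1
+        }
+        return firstCursor + events.count - 1
+    }
+
+    // Returns every retained event at or after `cursor`.
+    func replay(from cursor: Int) -> [Event] {
+        let offset = max(cursor, firstCursor) - firstCursor
+        return offset < events.count ? Array(events[offset...]) : []
+    }
+
+    // Clears the buffer, as a new send or resume would.
+    mutating func reset() {
+        events.removeAll()
+        firstCursor = 0
+    }
+}
+```
+
+Resetting the cursor to zero on each new run is what keeps cursors comparable only within one logical run: a cursor saved from an earlier run has no meaning after `reset()`.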
+
 When sub-agents emit nested tool events, `toolCalls` flattens them into the same collection and prefixes names using `parent > child`.
 
 ## SwiftUI Example
@@ -196,7 +205,10 @@ for (index, usage) in stream.iterationUsages.enumerated() {
 
 - <doc:AgentAndChat>
 - <doc:SubAgents>
+- <doc:CheckpointAndResume>
 - ``StreamEvent``
+- ``EventOrigin``
 - ``AgentStream``
+- ``StreamEventBuffer``
 - ``ToolCallInfo``
 - ``TokenUsage``
@@ -13,5 +13,10 @@
 
 ### Streaming
 
-- ``stream(userMessage:history:context:tokenBudget:requestContext:approvalHandler:)-(String,_,_,_,_,_)``
-- ``stream(userMessage:history:context:tokenBudget:requestContext:approvalHandler:)-(ChatMessage,_,_,_,_,_)``
+- ``stream(userMessage:history:context:tokenBudget:requestContext:approvalHandler:sessionID:checkpointer:)-(String,_,_,_,_,_,_,_)``
+- ``stream(userMessage:history:context:tokenBudget:requestContext:approvalHandler:sessionID:checkpointer:)-(ChatMessage,_,_,_,_,_,_,_)``
+
+### Resuming
+
+- ``resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)``
+- <doc:CheckpointAndResume>
@@ -0,0 +1,32 @@
+# ``AgentRunKit/AgentCheckpoint``
+
+Snapshot of agent loop state captured at the end of an iteration.
+
+The snapshot is what ``Agent/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)`` reads to reconstruct a run. See <doc:CheckpointAndResume> for the full lifecycle.
+
+## Topics
+
+### Identity
+
+- ``checkpointID``
+- ``sessionID``
+- ``runID``
+- ``timestamp``
+
+### Loop State
+
+- ``messages``
+- ``iteration``
+- ``tokenUsage``
+- ``iterationUsage``
+
+### Resume Inputs
+
+- ``contextBudgetState``
+- ``historyWasRewrittenLocally``
+- ``sessionAllowlist``
+- ``mcpToolBindings``
+
+### Initialization
+
+- ``init(messages:iteration:tokenUsage:iterationUsage:contextBudgetState:historyWasRewrittenLocally:sessionAllowlist:sessionID:runID:checkpointID:timestamp:mcpToolBindings:)``
@@ -0,0 +1,17 @@
+# ``AgentRunKit/AgentCheckpointError``
+
+Errors thrown by ``AgentCheckpointer`` backends and ``Agent/resume(from:checkpointer:context:tokenBudget:requestContext:approvalHandler:)``.
+
+See <doc:CheckpointAndResume> for the resume contract that surfaces these.
+
+## Topics
+
+### Cases
+
+- ``notFound(_:)``
+- ``fileSystem(_:)``
+- ``mcpBindingMismatch(_:)``
+
+### LocalizedError
+
+- ``errorDescription``
@@ -0,0 +1,22 @@
+# ``AgentRunKit/AgentCheckpointer``
+
+Persistence backend for ``AgentCheckpoint`` snapshots.
+
+Conform to this protocol to write a custom backend (database, remote storage). The two built-in conformances are ``InMemoryCheckpointer`` and ``FileCheckpointer``. See <doc:CheckpointAndResume>.
+
+## Topics
+
+### Operations
+
+- ``save(_:)``
+- ``load(_:)``
+- ``list(session:)``
+
+### Built-In Backends
+
+- ``InMemoryCheckpointer``
+- ``FileCheckpointer``
+
+### Errors
+
+- ``AgentCheckpointError``