CopilotKit
diff --git a/DRIFT.md b/DRIFT.md
Lines changed: 15 additions & 8 deletions
diff --git a/README.md b/README.md
Lines changed: 10 additions & 10 deletions
diff --git a/docs/websocket/index.html b/docs/websocket/index.html
Lines changed: 121 additions & 10 deletions
@@ -107,7 +107,7 @@ When a model is deprecated:
 
 ## WebSocket Drift Coverage
 
-In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (4 verified + 2 canary = 6 WS tests):
+In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (6 verified + 2 canary = 8 WS tests):
 
 ### Gemini Interactions API (Beta)
 
@@ -120,13 +120,20 @@ The Gemini Interactions API (`/v1beta/interactions`) is covered by 4 drift tests
 
 Uses `describe.skipIf(!GOOGLE_API_KEY)` like other Gemini tests. The Interactions API is in Beta — shapes may shift as Google iterates on the endpoint.
 
-| Protocol            | Text | Tool Call | Real Endpoint                                                       | Status     |
-| ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
-| OpenAI Responses WS | ✓    | ✓         | `wss://api.openai.com/v1/responses`                                 | Verified   |
-| OpenAI Realtime     | ✓    | ✓         | `wss://api.openai.com/v1/realtime`                                  | Verified   |
-| Gemini Live         | —    | —         | `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
+| Protocol               | Text | Tool Call | Real Endpoint                                                       | Status     |
+| ---------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
+| OpenAI Responses WS    | ✓    | ✓         | `wss://api.openai.com/v1/responses`                                 | Verified   |
+| OpenAI Realtime (GA)   | ✓    | ✓         | `wss://api.openai.com/v1/realtime`                                  | Verified   |
+| OpenAI Realtime (Beta) | ✓    | ✓         | `wss://api.openai.com/v1/realtime` + `OpenAI-Beta: realtime=v1`     | Verified   |
+| Gemini Live            | —    | —         | `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
 
-**Models**: `gpt-4o-mini` for Responses WS, `gpt-4o-mini-realtime-preview` for Realtime.
+**Models**: `gpt-4o-mini` for Responses WS, `gpt-realtime-2` for Realtime GA (was `gpt-4o-mini-realtime-preview`).
+
+**GA Realtime Drift Tests**:
+
+- **Model canary** — Verifies all 5 GA models exist (`gpt-realtime-2`, `gpt-realtime-1.5`, `gpt-realtime-mini`, `gpt-realtime-translate`, `gpt-realtime-whisper`) and flags unknown realtime models
+- **Protocol probe** — Connects with both GA and Beta protocol, normalizes event sequences, and verifies consistency
+- **Event shape validation** — Validates GA event names (`response.output_text.delta`, `conversation.item.added`, `conversation.item.done`) and the nested session config (`session.audio.*`, `session.type`, `session.reasoning`)
 
 **Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.
 
@@ -175,4 +182,4 @@ The fix workflow also supports `workflow_dispatch` for manual runs.
 
 ## Cost
 
-~29 API calls per run (20 HTTP response-shape + 3 model listing + 6 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.20/week at daily cadence. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 4 to 6.
+~31 API calls per run (20 HTTP response-shape + 3 model listing + 8 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-realtime-2`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.25/week at daily cadence. The GA protocol probe adds a second Realtime WS connection (one GA, one Beta) per run. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 6 to 8.
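
Reviewer sketch (not part of the patch): the protocol probe described in DRIFT.md normalizes the GA and Beta event sequences before comparing them. The snippet below is one way that comparison could look. The helper names (`normalizeSequence`, `firstDivergence`) and the `RealtimeEvent` type are hypothetical, and the rename table contains only the two renames this diff actually names; the real test may handle more events or work differently.

```typescript
// Hypothetical rename table: Beta event names mapped to their GA equivalents.
// Only the renames named in this diff are listed.
const BETA_TO_GA: Record<string, string> = {
  "response.text.delta": "response.output_text.delta",
  "conversation.item.created": "conversation.item.added",
};

// Events the docs describe as GA-only, so they have no Beta counterpart.
const GA_ONLY = new Set<string>(["conversation.item.done"]);

interface RealtimeEvent {
  type: string;
}

// Reduce a captured event stream to an ordered list of GA-style type names,
// dropping GA-only events so both protocols yield comparable sequences.
function normalizeSequence(events: RealtimeEvent[]): string[] {
  return events
    .map((e) => BETA_TO_GA[e.type] ?? e.type)
    .filter((type) => !GA_ONLY.has(type));
}

// Return the index of the first divergence, or -1 if the sequences agree.
function firstDivergence(ga: RealtimeEvent[], beta: RealtimeEvent[]): number {
  const a = normalizeSequence(ga);
  const b = normalizeSequence(beta);
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return -1;
}
```

Returning the index of the first mismatch, rather than a boolean, would let a drift report point at the exact event where the two protocols stop agreeing.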
@@ -35,29 +35,29 @@ await mock.stop();
 
 aimock mocks everything your AI app talks to:
 
-| Tool           | What it mocks                                                                                                        | Docs                                                |
-| -------------- | -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
-| **LLMock**     | OpenAI (Chat/Responses/Realtime), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs)     |
-| **MCPMock**    | MCP tools, resources, prompts with session management                                                                | [MCP](https://aimock.copilotkit.dev/mcp-mock)       |
-| **A2AMock**    | Agent-to-agent protocol with SSE streaming                                                                           | [A2A](https://aimock.copilotkit.dev/a2a-mock)       |
-| **AGUIMock**   | AG-UI agent-to-UI event streams for frontend testing                                                                 | [AG-UI](https://aimock.copilotkit.dev/agui-mock)    |
-| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints                                                                      | [Vector](https://aimock.copilotkit.dev/vector-mock) |
-| **Services**   | Tavily search, Cohere rerank, OpenAI moderation                                                                      | [Services](https://aimock.copilotkit.dev/services)  |
+| Tool           | What it mocks                                                                                                                | Docs                                                |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
+| **LLMock**     | OpenAI (Chat/Responses/Realtime GA+Beta), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs)     |
+| **MCPMock**    | MCP tools, resources, prompts with session management                                                                        | [MCP](https://aimock.copilotkit.dev/mcp-mock)       |
+| **A2AMock**    | Agent-to-agent protocol with SSE streaming                                                                                   | [A2A](https://aimock.copilotkit.dev/a2a-mock)       |
+| **AGUIMock**   | AG-UI agent-to-UI event streams for frontend testing                                                                         | [AG-UI](https://aimock.copilotkit.dev/agui-mock)    |
+| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints                                                                              | [Vector](https://aimock.copilotkit.dev/vector-mock) |
+| **Services**   | Tavily search, Cohere rerank, OpenAI moderation                                                                              | [Services](https://aimock.copilotkit.dev/services)  |
 
 Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or use the programmatic API to compose exactly what you need.
 
 ## Features
 
 - **[Record & Replay](https://aimock.copilotkit.dev/record-replay)** — Proxy real APIs, save as fixtures, replay deterministically forever
 - **[Multi-turn Conversations](https://aimock.copilotkit.dev/multi-turn)** — Record and replay multi-turn traces with tool rounds; match distinct turns via `turnIndex`, `hasToolResult`, `toolCallId`, `sequenceIndex`, `systemMessage` (gate on host-supplied agent context), or custom predicates
-- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime, Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
+- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime (GA + Beta shim), Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
 - **Multimedia APIs** — [image generation](https://aimock.copilotkit.dev/images) (DALL-E, Imagen), [text-to-speech](https://aimock.copilotkit.dev/speech), [audio transcription](https://aimock.copilotkit.dev/transcription), [video generation](https://aimock.copilotkit.dev/video)
 - **[MCP](https://aimock.copilotkit.dev/mcp-mock) / [A2A](https://aimock.copilotkit.dev/a2a-mock) / [AG-UI](https://aimock.copilotkit.dev/agui-mock) / [Vector](https://aimock.copilotkit.dev/vector-mock)** — Mock every protocol your AI agents use
 - **[Chaos Testing](https://aimock.copilotkit.dev/chaos-testing)** — 500 errors, malformed JSON, mid-stream disconnects at any probability
 - **Per-Request Strict Mode** — `X-AIMock-Strict` header overrides the server-level `--strict` flag per request (`true`/`1` = strict, `false`/`0` = lenient)
 - **[Drift Detection](https://aimock.copilotkit.dev/drift-detection)** — Daily CI validation against real APIs
 - **[Streaming Physics](https://aimock.copilotkit.dev/streaming-physics)** — Configurable `ttft`, `tps`, and `jitter`
-- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime, Responses WS, Gemini Live
+- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime (GA protocol with 5 models: gpt-realtime-2, gpt-realtime-1.5, gpt-realtime-mini, gpt-realtime-translate, gpt-realtime-whisper; transcription/translation session types; image input; commentary phase), Responses WS, Gemini Live
 - **[Prometheus Metrics](https://aimock.copilotkit.dev/metrics)** — Request counts, latencies, fixture match rates
 - **[Docker + Helm](https://aimock.copilotkit.dev/docker)** — Container image and Helm chart for CI/CD
 - **[Vitest & Jest Plugins](https://aimock.copilotkit.dev/test-plugins)** — Zero-config `useAimock()` with auto lifecycle and env patching
 
@@ -109,23 +109,132 @@ <h2>OpenAI Responses (WebSocket)</h2>
         </div>
 
         <h2>OpenAI Realtime</h2>
-        <p>The Realtime API uses a conversational protocol with session management.</p>
+        <p>
+          The Realtime API uses a conversational protocol with session management. aimock implements
+          the
+          <strong>GA (General Availability) protocol</strong> natively &mdash; event names like
+          <code>response.output_text.delta</code>, <code>conversation.item.added</code>, and nested
+          <code>audio</code> session config are the defaults. The Beta protocol is supported via the
+          <code>OpenAI-Beta: realtime=v1</code> header, which activates a translation shim that
+          converts GA events back to Beta names (<code>response.text.delta</code>,
+          <code>conversation.item.created</code>, flat session config).
+        </p>
+
+        <h3>Supported Models</h3>
+        <table class="endpoint-table">
+          <thead>
+            <tr>
+              <th>Model</th>
+              <th>Session Types</th>
+              <th>Notes</th>
+            </tr>
+          </thead>
+          <tbody>
+            <tr>
+              <td>gpt-realtime-2</td>
+              <td>conversation</td>
+              <td>Default model &mdash; GA successor to gpt-4o-realtime-preview</td>
+            </tr>
+            <tr>
+              <td>gpt-realtime-1.5</td>
+              <td>conversation</td>
+              <td>Previous generation GA model</td>
+            </tr>
+            <tr>
+              <td>gpt-realtime-mini</td>
+              <td>conversation</td>
+              <td>Smaller, faster GA model</td>
+            </tr>
+            <tr>
+              <td>gpt-realtime-translate</td>
+              <td>translation</td>
+              <td>Real-time speech translation</td>
+            </tr>
+            <tr>
+              <td>gpt-realtime-whisper</td>
+              <td>transcription</td>
+              <td>Real-time speech transcription</td>
+            </tr>
+          </tbody>
+        </table>
+
+        <h3>Session Types</h3>
+        <ul>
+          <li>
+            <strong>conversation</strong> (default) &mdash; Standard conversational interaction with
+            text and audio modalities
+          </li>
+          <li>
+            <strong>transcription</strong> &mdash; Audio-to-text transcription (requires
+            <code>gpt-realtime-whisper</code>)
+          </li>
+          <li>
+            <strong>translation</strong> &mdash; Real-time speech translation (requires
+            <code>gpt-realtime-translate</code>)
+          </li>
+        </ul>
+
+        <h3>GA Protocol Features</h3>
+        <ul>
+          <li>
+            <strong>GA event names</strong> &mdash; <code>response.output_text.delta</code> (was
+            <code>response.text.delta</code>), <code>conversation.item.added</code> (was
+            <code>conversation.item.created</code>), etc.
+          </li>
+          <li>
+            <strong>Nested audio config</strong> &mdash; Session config uses
+            <code>session.audio.voice</code> instead of flat <code>session.voice</code>
+          </li>
+          <li>
+            <strong>Image input</strong> &mdash; <code>input_image</code> content parts in
+            <code>conversation.item.create</code>
+          </li>
+          <li>
+            <strong>Commentary phase</strong> &mdash; <code>phase</code> field on
+            <code>response.output_item.added/done</code> events (<code>final_answer</code> or
+            <code>commentary</code>)
+          </li>
+          <li>
+            <strong><code>conversation.item.done</code></strong> &mdash; New event emitted after
+            each completed response item
+          </li>
+          <li>
+            <strong><code>response.cancel</code></strong> &mdash; Client message to cancel in-flight
+            responses
+          </li>
+        </ul>
+
+        <h3>Beta Compatibility</h3>
+        <p>
+          Clients that send the <code>OpenAI-Beta: realtime=v1</code> header receive Beta-format
+          events automatically. The shim translates event names, flattens the nested audio config,
+          and suppresses GA-only events like <code>conversation.item.done</code>. No code changes
+          are needed in tests that target the Beta protocol.
+        </p>
 
         <div class="code-block">
-          <div class="code-block-header">ws-realtime.test.ts <span class="lang-tag">ts</span></div>
-          <pre><code><span class="kw">const</span> <span class="op">ws</span> = <span class="kw">await</span> <span class="fn">connectWebSocket</span>(<span class="op">instance</span>.<span class="prop">url</span>, <span class="str">"/v1/realtime"</span>);
+          <div class="code-block-header">
+            ws-realtime.test.ts (GA protocol) <span class="lang-tag">ts</span>
+          </div>
+          <pre><code><span class="kw">const</span> <span class="op">ws</span> = <span class="kw">await</span> <span class="fn">connectWebSocket</span>(<span class="op">instance</span>.<span class="prop">url</span>, <span class="str">"/v1/realtime?model=gpt-realtime-2"</span>);
 
 <span class="cm">// Server sends session.created on connect</span>
 <span class="kw">const</span> [<span class="op">sessionMsg</span>] = <span class="kw">await</span> <span class="op">ws</span>.<span class="fn">waitForMessages</span>(<span class="num">1</span>);
-<span class="fn">expect</span>(<span class="type">JSON</span>.<span class="fn">parse</span>(<span class="op">sessionMsg</span>).<span class="prop">type</span>).<span class="fn">toBe</span>(<span class="str">"session.created"</span>);
+<span class="kw">const</span> <span class="op">session</span> = <span class="type">JSON</span>.<span class="fn">parse</span>(<span class="op">sessionMsg</span>);
+<span class="fn">expect</span>(<span class="op">session</span>.<span class="prop">type</span>).<span class="fn">toBe</span>(<span class="str">"session.created"</span>);
+<span class="fn">expect</span>(<span class="op">session</span>.<span class="prop">session</span>.<span class="prop">type</span>).<span class="fn">toBe</span>(<span class="str">"conversation"</span>);
+<span class="fn">expect</span>(<span class="op">session</span>.<span class="prop">session</span>.<span class="prop">audio</span>).<span class="fn">toBeDefined</span>();
 
-<span class="cm">// Configure session</span>
+<span class="cm">// Configure session with nested audio config</span>
 <span class="op">ws</span>.<span class="fn">send</span>(<span class="type">JSON</span>.<span class="fn">stringify</span>({
   <span class="prop">type</span>: <span class="str">"session.update"</span>,
-  <span class="prop">session</span>: { <span class="prop">modalities</span>: [<span class="str">"text"</span>] }
+  <span class="prop">session</span>: {
+    <span class="prop">modalities</span>: [<span class="str">"text"</span>],
+    <span class="prop">audio</span>: { <span class="prop">voice</span>: <span class="str">"alloy"</span> }
+  }
 }));
 
-<span class="cm">// Add a user message</span>
+<span class="cm">// Add a user message (supports input_text + input_image content)</span>
 <span class="op">ws</span>.<span class="fn">send</span>(<span class="type">JSON</span>.<span class="fn">stringify</span>({
   <span class="prop">type</span>: <span class="str">"conversation.item.create"</span>,
   <span class="prop">item</span>: {
@@ -138,10 +247,12 @@ <h2>OpenAI Realtime</h2>
 <span class="cm">// Request a response</span>
 <span class="op">ws</span>.<span class="fn">send</span>(<span class="type">JSON</span>.<span class="fn">stringify</span>({ <span class="prop">type</span>: <span class="str">"response.create"</span> }));
 
-<span class="cm">// Wait for response events</span>
-<span class="kw">const</span> <span class="op">msgs</span> = <span class="kw">await</span> <span class="op">ws</span>.<span class="fn">waitForMessages</span>(<span class="num">8</span>);
+<span class="cm">// GA events: output_text instead of text, item.added instead of item.created</span>
+<span class="kw">const</span> <span class="op">msgs</span> = <span class="kw">await</span> <span class="op">ws</span>.<span class="fn">waitForMessages</span>(<span class="num">10</span>);
 <span class="kw">const</span> <span class="op">events</span> = <span class="op">msgs</span>.<span class="fn">map</span>(<span class="op">m</span> <span class="kw">=&gt;</span> <span class="type">JSON</span>.<span class="fn">parse</span>(<span class="op">m</span>));
-<span class="fn">expect</span>(<span class="op">events</span>.<span class="fn">some</span>(<span class="op">e</span> <span class="kw">=&gt;</span> <span class="op">e</span>.<span class="prop">type</span> === <span class="str">"response.text.delta"</span>)).<span class="fn">toBe</span>(<span class="kw">true</span>);</code></pre>
+<span class="fn">expect</span>(<span class="op">events</span>.<span class="fn">some</span>(<span class="op">e</span> <span class="kw">=&gt;</span> <span class="op">e</span>.<span class="prop">type</span> === <span class="str">"response.output_text.delta"</span>)).<span class="fn">toBe</span>(<span class="kw">true</span>);
+<span class="fn">expect</span>(<span class="op">events</span>.<span class="fn">some</span>(<span class="op">e</span> <span class="kw">=&gt;</span> <span class="op">e</span>.<span class="prop">type</span> === <span class="str">"conversation.item.added"</span>)).<span class="fn">toBe</span>(<span class="kw">true</span>);
+<span class="fn">expect</span>(<span class="op">events</span>.<span class="fn">some</span>(<span class="op">e</span> <span class="kw">=&gt;</span> <span class="op">e</span>.<span class="prop">type</span> === <span class="str">"conversation.item.done"</span>)).<span class="fn">toBe</span>(<span class="kw">true</span>);</code></pre>
         </div>
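
Reviewer sketch (not part of the patch): the Beta Compatibility paragraph names three shim behaviors: renaming events, flattening the nested audio config, and suppressing GA-only events. The function below illustrates those three steps on a single outbound event. All names (`toBeta`, `flattenSession`, `SUPPRESSED_IN_BETA`) are hypothetical, and only `session.audio.voice` is flattened because it is the only field the docs spell out; aimock's actual shim is not shown in this diff and may differ.

```typescript
type Json = Record<string, unknown>;

// GA event names mapped back to Beta names (only the renames named in the docs).
const GA_TO_BETA: Record<string, string> = {
  "response.output_text.delta": "response.text.delta",
  "conversation.item.added": "conversation.item.created",
};

// GA-only events that a Beta client should never receive.
const SUPPRESSED_IN_BETA = new Set<string>(["conversation.item.done"]);

// Hoist session.audio.voice to session.voice and drop the nested audio object.
// Other audio fields are not guessed at here.
function flattenSession(session: Json): Json {
  const { audio, ...rest } = session;
  if (typeof audio === "object" && audio !== null && "voice" in audio) {
    return { ...rest, voice: (audio as Json).voice };
  }
  return rest;
}

// Translate one GA server event to its Beta form, or return null to drop it.
function toBeta(event: Json): Json | null {
  const type = String(event.type);
  if (SUPPRESSED_IN_BETA.has(type)) return null;
  const out: Json = { ...event, type: GA_TO_BETA[type] ?? type };
  const session = event.session;
  if (typeof session === "object" && session !== null) {
    out.session = flattenSession(session as Json);
  }
  return out;
}
```

Returning `null` for suppressed events keeps the caller's loop simple: it can skip the send whenever the translation yields nothing.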
 
         <h2>Gemini Live</h2>