Commit bc5a742

Update docs, drift coverage, and competitive matrix for GA realtime
1 parent d526d3d commit bc5a742

5 files changed

Lines changed: 230 additions & 28 deletions

DRIFT.md

Lines changed: 15 additions & 8 deletions
@@ -107,7 +107,7 @@ When a model is deprecated:
 
 ## WebSocket Drift Coverage
 
-In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (4 verified + 2 canary = 6 WS tests):
+In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (6 verified + 2 canary = 8 WS tests):
 
 ### Gemini Interactions API (Beta)
 
@@ -120,13 +120,20 @@ The Gemini Interactions API (`/v1beta/interactions`) is covered by 4 drift tests
 
 Uses `describe.skipIf(!GOOGLE_API_KEY)` like other Gemini tests. The Interactions API is in Beta — shapes may shift as Google iterates on the endpoint.
 
-| Protocol | Text | Tool Call | Real Endpoint | Status |
-| ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
-| OpenAI Responses WS ||| `wss://api.openai.com/v1/responses` | Verified |
-| OpenAI Realtime ||| `wss://api.openai.com/v1/realtime` | Verified |
-| Gemini Live ||| `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
+| Protocol | Text | Tool Call | Real Endpoint | Status |
+| ---------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
+| OpenAI Responses WS ||| `wss://api.openai.com/v1/responses` | Verified |
+| OpenAI Realtime (GA) ||| `wss://api.openai.com/v1/realtime` | Verified |
+| OpenAI Realtime (Beta) ||| `wss://api.openai.com/v1/realtime` + `OpenAI-Beta: realtime=v1` | Verified |
+| Gemini Live ||| `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
 
-**Models**: `gpt-4o-mini` for Responses WS, `gpt-4o-mini-realtime-preview` for Realtime.
+**Models**: `gpt-4o-mini` for Responses WS, `gpt-realtime-2` for Realtime GA (was `gpt-4o-mini-realtime-preview`).
+
+**GA Realtime Drift Tests**:
+
+- **Model canary** — Verifies all 5 GA models exist (`gpt-realtime-2`, `gpt-realtime-1.5`, `gpt-realtime-mini`, `gpt-realtime-translate`, `gpt-realtime-whisper`) and flags unknown realtime models
+- **Protocol probe** — Connects with both GA and Beta protocol, normalizes event sequences, and verifies consistency
+- **Event shape validation** — GA event names (`response.output_text.delta`, `conversation.item.added`, `conversation.item.done`) and nested session config (`session.audio.*`, `session.type`, `session.reasoning`)
 
 **Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.
 
@@ -175,4 +182,4 @@ The fix workflow also supports `workflow_dispatch` for manual runs.
 
 ## Cost
 
-~29 API calls per run (20 HTTP response-shape + 3 model listing + 6 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.20/week at daily cadence. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 4 to 6.
+~31 API calls per run (20 HTTP response-shape + 3 model listing + 8 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-realtime-2`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.25/week at daily cadence. The GA protocol probe adds a second Realtime WS connection (one GA, one Beta) per run. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 6 to 8.
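
Editor's sketch (not part of the commit): the model canary described in the DRIFT.md change can be pictured as a set comparison between the listed model IDs and the five GA models. This is a minimal illustration under assumptions, not aimock's test code. The function `checkRealtimeModels`, the `CanaryResult` type, and the rule that any ID containing `realtime` counts as a realtime model are all hypothetical; only the five model IDs come from the diff.

```typescript
// Hypothetical sketch of a model canary; not taken from aimock's test suite.
// The five GA model IDs are the ones listed in the DRIFT.md change.
const EXPECTED_GA_MODELS: readonly string[] = [
  "gpt-realtime-2",
  "gpt-realtime-1.5",
  "gpt-realtime-mini",
  "gpt-realtime-translate",
  "gpt-realtime-whisper",
];

interface CanaryResult {
  missing: string[]; // expected GA models absent from the listing
  unknown: string[]; // realtime-looking models outside the expected set
}

// `listedIds` stands in for the IDs returned by a model-listing call.
// Assumption: a "realtime model" is any ID containing the substring "realtime".
function checkRealtimeModels(listedIds: string[]): CanaryResult {
  const listed = new Set(listedIds);
  const expected = new Set(EXPECTED_GA_MODELS);
  const missing = EXPECTED_GA_MODELS.filter((id) => !listed.has(id));
  const unknown = listedIds.filter(
    (id) => id.includes("realtime") && !expected.has(id),
  );
  return { missing, unknown };
}
```

Under this reading, a non-empty `missing` list would fail the canary, while a non-empty `unknown` list would only flag a new model for review. Note that the substring rule would also flag the older `gpt-4o-mini-realtime-preview` ID; a real test may filter more precisely.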

README.md

Lines changed: 10 additions & 10 deletions
@@ -35,29 +35,29 @@ await mock.stop();
 
 aimock mocks everything your AI app talks to:
 
-| Tool | What it mocks | Docs |
-| -------------- | -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
-| **LLMock** | OpenAI (Chat/Responses/Realtime), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs) |
-| **MCPMock** | MCP tools, resources, prompts with session management | [MCP](https://aimock.copilotkit.dev/mcp-mock) |
-| **A2AMock** | Agent-to-agent protocol with SSE streaming | [A2A](https://aimock.copilotkit.dev/a2a-mock) |
-| **AGUIMock** | AG-UI agent-to-UI event streams for frontend testing | [AG-UI](https://aimock.copilotkit.dev/agui-mock) |
-| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints | [Vector](https://aimock.copilotkit.dev/vector-mock) |
-| **Services** | Tavily search, Cohere rerank, OpenAI moderation | [Services](https://aimock.copilotkit.dev/services) |
+| Tool | What it mocks | Docs |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
+| **LLMock** | OpenAI (Chat/Responses/Realtime GA+Beta), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs) |
+| **MCPMock** | MCP tools, resources, prompts with session management | [MCP](https://aimock.copilotkit.dev/mcp-mock) |
+| **A2AMock** | Agent-to-agent protocol with SSE streaming | [A2A](https://aimock.copilotkit.dev/a2a-mock) |
+| **AGUIMock** | AG-UI agent-to-UI event streams for frontend testing | [AG-UI](https://aimock.copilotkit.dev/agui-mock) |
+| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints | [Vector](https://aimock.copilotkit.dev/vector-mock) |
+| **Services** | Tavily search, Cohere rerank, OpenAI moderation | [Services](https://aimock.copilotkit.dev/services) |
 
 Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or use the programmatic API to compose exactly what you need.
 
 ## Features
 
 - **[Record & Replay](https://aimock.copilotkit.dev/record-replay)** — Proxy real APIs, save as fixtures, replay deterministically forever
 - **[Multi-turn Conversations](https://aimock.copilotkit.dev/multi-turn)** — Record and replay multi-turn traces with tool rounds; match distinct turns via `turnIndex`, `hasToolResult`, `toolCallId`, `sequenceIndex`, `systemMessage` (gate on host-supplied agent context), or custom predicates
-- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime, Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
+- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime (GA + Beta shim), Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
 - **Multimedia APIs** — [image generation](https://aimock.copilotkit.dev/images) (DALL-E, Imagen), [text-to-speech](https://aimock.copilotkit.dev/speech), [audio transcription](https://aimock.copilotkit.dev/transcription), [video generation](https://aimock.copilotkit.dev/video)
 - **[MCP](https://aimock.copilotkit.dev/mcp-mock) / [A2A](https://aimock.copilotkit.dev/a2a-mock) / [AG-UI](https://aimock.copilotkit.dev/agui-mock) / [Vector](https://aimock.copilotkit.dev/vector-mock)** — Mock every protocol your AI agents use
 - **[Chaos Testing](https://aimock.copilotkit.dev/chaos-testing)** — 500 errors, malformed JSON, mid-stream disconnects at any probability
 - **Per-Request Strict Mode** — `X-AIMock-Strict` header overrides the server-level `--strict` flag per request (`true`/`1` = strict, `false`/`0` = lenient)
 - **[Drift Detection](https://aimock.copilotkit.dev/drift-detection)** — Daily CI validation against real APIs
 - **[Streaming Physics](https://aimock.copilotkit.dev/streaming-physics)** — Configurable `ttft`, `tps`, and `jitter`
-- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime, Responses WS, Gemini Live
+- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime (GA protocol with 5 models: gpt-realtime-2, gpt-realtime-1.5, gpt-realtime-mini, gpt-realtime-translate, gpt-realtime-whisper; transcription/translation session types; image input; commentary phase), Responses WS, Gemini Live
 - **[Prometheus Metrics](https://aimock.copilotkit.dev/metrics)** — Request counts, latencies, fixture match rates
 - **[Docker + Helm](https://aimock.copilotkit.dev/docker)** — Container image and Helm chart for CI/CD
 - **[Vitest & Jest Plugins](https://aimock.copilotkit.dev/test-plugins)** — Zero-config `useAimock()` with auto lifecycle and env patching

docs/websocket/index.html

Lines changed: 121 additions & 10 deletions
@@ -109,23 +109,132 @@ <h2>OpenAI Responses (WebSocket)</h2>
 </div>
 
 <h2>OpenAI Realtime</h2>
-<p>The Realtime API uses a conversational protocol with session management.</p>
+<p>
+The Realtime API uses a conversational protocol with session management. aimock implements
+the
+<strong>GA (General Availability) protocol</strong> natively &mdash; event names like
+<code>response.output_text.delta</code>, <code>conversation.item.added</code>, and nested
+<code>audio</code> session config are the defaults. The Beta protocol is supported via the
+<code>OpenAI-Beta: realtime=v1</code> header, which activates a translation shim that
+converts GA events back to Beta names (<code>response.text.delta</code>,
+<code>conversation.item.created</code>, flat session config).
+</p>
+
+<h3>Supported Models</h3>
+<table class="endpoint-table">
+<thead>
+<tr>
+<th>Model</th>
+<th>Session Types</th>
+<th>Notes</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>gpt-realtime-2</td>
+<td>conversation</td>
+<td>Default model &mdash; GA successor to gpt-4o-realtime-preview</td>
+</tr>
+<tr>
+<td>gpt-realtime-1.5</td>
+<td>conversation</td>
+<td>Previous generation GA model</td>
+</tr>
+<tr>
+<td>gpt-realtime-mini</td>
+<td>conversation</td>
+<td>Smaller, faster GA model</td>
+</tr>
+<tr>
+<td>gpt-realtime-translate</td>
+<td>translation</td>
+<td>Real-time speech translation</td>
+</tr>
+<tr>
+<td>gpt-realtime-whisper</td>
+<td>transcription</td>
+<td>Real-time speech transcription</td>
+</tr>
+</tbody>
+</table>
+
+<h3>Session Types</h3>
+<ul>
+<li>
+<strong>conversation</strong> (default) &mdash; Standard conversational interaction with
+text and audio modalities
+</li>
+<li>
+<strong>transcription</strong> &mdash; Audio-to-text transcription (requires
+<code>gpt-realtime-whisper</code>)
+</li>
+<li>
+<strong>translation</strong> &mdash; Real-time speech translation (requires
+<code>gpt-realtime-translate</code>)
+</li>
+</ul>
+
+<h3>GA Protocol Features</h3>
+<ul>
+<li>
+<strong>GA event names</strong> &mdash; <code>response.output_text.delta</code> (was
+<code>response.text.delta</code>), <code>conversation.item.added</code> (was
+<code>conversation.item.created</code>), etc.
+</li>
+<li>
+<strong>Nested audio config</strong> &mdash; Session config uses
+<code>session.audio.voice</code> instead of flat <code>session.voice</code>
+</li>
+<li>
+<strong>Image input</strong> &mdash; <code>input_image</code> content parts in
+<code>conversation.item.create</code>
+</li>
+<li>
+<strong>Commentary phase</strong> &mdash; <code>phase</code> field on
+<code>response.output_item.added/done</code> events (<code>final_answer</code> or
+<code>commentary</code>)
+</li>
+<li>
+<strong><code>conversation.item.done</code></strong> &mdash; New event emitted after
+each completed response item
+</li>
+<li>
+<strong><code>response.cancel</code></strong> &mdash; Client message to cancel in-flight
+responses
+</li>
+</ul>
+
+<h3>Beta Compatibility</h3>
+<p>
+Clients that send the <code>OpenAI-Beta: realtime=v1</code> header receive Beta-format
+events automatically. The shim translates event names, flattens the nested audio config,
+and suppresses GA-only events like <code>conversation.item.done</code>. No code changes
+needed in tests that target the Beta protocol.
+</p>
 
 <div class="code-block">
-<div class="code-block-header">ws-realtime.test.ts <span class="lang-tag">ts</span></div>
-<pre><code><span class="kw">const</span> <span class="op">ws</span> = <span class="kw">await</span> <span class="fn">connectWebSocket</span>(<span class="op">instance</span>.<span class="prop">url</span>, <span class="str">"/v1/realtime"</span>);
+<div class="code-block-header">
+ws-realtime.test.ts (GA protocol) <span class="lang-tag">ts</span>
+</div>
+<pre><code><span class="kw">const</span> <span class="op">ws</span> = <span class="kw">await</span> <span class="fn">connectWebSocket</span>(<span class="op">instance</span>.<span class="prop">url</span>, <span class="str">"/v1/realtime?model=gpt-realtime-2"</span>);
 
 <span class="cm">// Server sends session.created on connect</span>
 <span class="kw">const</span> [<span class="op">sessionMsg</span>] = <span class="kw">await</span> <span class="op">ws</span>.<span class="fn">waitForMessages</span>(<span class="num">1</span>);
-<span class="fn">expect</span>(<span class="type">JSON</span>.<span class="fn">parse</span>(<span class="op">sessionMsg</span>).<span class="prop">type</span>).<span class="fn">toBe</span>(<span class="str">"session.created"</span>);
+<span class="kw">const</span> <span class="op">session</span> = <span class="type">JSON</span>.<span class="fn">parse</span>(<span class="op">sessionMsg</span>);
+<span class="fn">expect</span>(<span class="op">session</span>.<span class="prop">type</span>).<span class="fn">toBe</span>(<span class="str">"session.created"</span>);
+<span class="fn">expect</span>(<span class="op">session</span>.<span class="prop">session</span>.<span class="prop">type</span>).<span class="fn">toBe</span>(<span class="str">"conversation"</span>);
+<span class="fn">expect</span>(<span class="op">session</span>.<span class="prop">session</span>.<span class="prop">audio</span>).<span class="fn">toBeDefined</span>();
 
-<span class="cm">// Configure session</span>
+<span class="cm">// Configure session with nested audio config</span>
 <span class="op">ws</span>.<span class="fn">send</span>(<span class="type">JSON</span>.<span class="fn">stringify</span>({
 <span class="prop">type</span>: <span class="str">"session.update"</span>,
-<span class="prop">session</span>: { <span class="prop">modalities</span>: [<span class="str">"text"</span>] }
+<span class="prop">session</span>: {
+<span class="prop">modalities</span>: [<span class="str">"text"</span>],
+<span class="prop">audio</span>: { <span class="prop">voice</span>: <span class="str">"alloy"</span> }
+}
 }));
 
-<span class="cm">// Add a user message</span>
+<span class="cm">// Add a user message (supports input_text + input_image content)</span>
 <span class="op">ws</span>.<span class="fn">send</span>(<span class="type">JSON</span>.<span class="fn">stringify</span>({
 <span class="prop">type</span>: <span class="str">"conversation.item.create"</span>,
 <span class="prop">item</span>: {
@@ -138,10 +247,12 @@ <h2>OpenAI Realtime</h2>
 <span class="cm">// Request a response</span>
 <span class="op">ws</span>.<span class="fn">send</span>(<span class="type">JSON</span>.<span class="fn">stringify</span>({ <span class="prop">type</span>: <span class="str">"response.create"</span> }));
 
-<span class="cm">// Wait for response events</span>
-<span class="kw">const</span> <span class="op">msgs</span> = <span class="kw">await</span> <span class="op">ws</span>.<span class="fn">waitForMessages</span>(<span class="num">8</span>);
+<span class="cm">// GA events: output_text instead of text, item.added instead of item.created</span>
+<span class="kw">const</span> <span class="op">msgs</span> = <span class="kw">await</span> <span class="op">ws</span>.<span class="fn">waitForMessages</span>(<span class="num">10</span>);
 <span class="kw">const</span> <span class="op">events</span> = <span class="op">msgs</span>.<span class="fn">map</span>(<span class="op">m</span> <span class="kw">=&gt;</span> <span class="type">JSON</span>.<span class="fn">parse</span>(<span class="op">m</span>));
-<span class="fn">expect</span>(<span class="op">events</span>.<span class="fn">some</span>(<span class="op">e</span> <span class="kw">=&gt;</span> <span class="op">e</span>.<span class="prop">type</span> === <span class="str">"response.text.delta"</span>)).<span class="fn">toBe</span>(<span class="kw">true</span>);</code></pre>
+<span class="fn">expect</span>(<span class="op">events</span>.<span class="fn">some</span>(<span class="op">e</span> <span class="kw">=&gt;</span> <span class="op">e</span>.<span class="prop">type</span> === <span class="str">"response.output_text.delta"</span>)).<span class="fn">toBe</span>(<span class="kw">true</span>);
+<span class="fn">expect</span>(<span class="op">events</span>.<span class="fn">some</span>(<span class="op">e</span> <span class="kw">=&gt;</span> <span class="op">e</span>.<span class="prop">type</span> === <span class="str">"conversation.item.added"</span>)).<span class="fn">toBe</span>(<span class="kw">true</span>);
+<span class="fn">expect</span>(<span class="op">events</span>.<span class="fn">some</span>(<span class="op">e</span> <span class="kw">=&gt;</span> <span class="op">e</span>.<span class="prop">type</span> === <span class="str">"conversation.item.done"</span>)).<span class="fn">toBe</span>(<span class="kw">true</span>);</code></pre>
 </div>
 
 <h2>Gemini Live</h2>
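
Editor's sketch (not part of the commit): the Beta compatibility behavior documented in the docs/websocket/index.html change (rename events, flatten the nested audio config, suppress GA-only events) can be pictured as a pure function over one event. This is a hedged illustration, not aimock's shim. The names `toBetaEvent`, `RealtimeEvent`, `GA_TO_BETA`, and `GA_ONLY` are hypothetical, and the flattening rule (lift every key of `session.audio` into `session`) is an assumption; only the two renames, the suppressed `conversation.item.done`, and the `session.audio.voice` to `session.voice` mapping are taken from the diff.

```typescript
// Hypothetical event shape for illustration only.
type RealtimeEvent = { type: string; [key: string]: unknown };

// GA -> Beta renames named in the docs; a real shim may cover more events.
const GA_TO_BETA: Record<string, string> = {
  "response.output_text.delta": "response.text.delta",
  "conversation.item.added": "conversation.item.created",
};

// GA-only events that the docs say are suppressed for Beta clients.
const GA_ONLY = new Set<string>(["conversation.item.done"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the Beta-format event, or null when the event should be dropped.
function toBetaEvent(event: RealtimeEvent): RealtimeEvent | null {
  if (GA_ONLY.has(event.type)) return null;

  const out: RealtimeEvent = {
    ...event,
    type: GA_TO_BETA[event.type] ?? event.type,
  };

  // Assumed flattening rule: session.audio.voice becomes session.voice, etc.
  const session = out.session;
  if (isRecord(session)) {
    const audio = session.audio;
    if (isRecord(audio)) {
      const flat: Record<string, unknown> = { ...session, ...audio };
      delete flat.audio;
      out.session = flat;
    }
  }
  return out;
}
```

A caller would apply `toBetaEvent` to each outgoing event only when the connection carried the `OpenAI-Beta: realtime=v1` header, and skip sending whenever it returns `null`.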
