diff --git a/CHANGELOG.md b/CHANGELOG.md
index 90243579..61afe860 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,20 @@
- **Model-aware fixture recording** — recorded fixtures now include the model name in match criteria, preventing collisions when an app makes multiple LLM calls with the same user message but different models. Model names are normalized by stripping date/version suffixes (e.g., `claude-opus-4-20250514` → `claude-opus-4`) so fixtures survive version bumps. Disable with `recordFullModelVersion: true`. ([#185](https://github.com/CopilotKit/aimock/issues/185))
- **Drift detection metadata** — recorded fixtures include `systemHash` and `toolsHash` in a `metadata` block for detecting system prompt or tool definition changes since recording.
- **Prefix model matching** — fixture router uses `startsWith` for string model matching, so `model: "claude-opus-4"` matches any `claude-opus-4-*` version.
+- **GA Realtime protocol migration with Beta compatibility shim** — handler emits GA event names natively; `sendEvent()` wrapper translates back for Beta clients detected via `OpenAI-Beta` header. Default model changed to `gpt-realtime-2`.
+- **5 new GA Realtime models** — `gpt-realtime-2`, `gpt-realtime-1.5`, `gpt-realtime-mini`, `gpt-realtime-translate`, `gpt-realtime-whisper`.
+- **Translate and whisper session types** — dedicated session configurations for translation and transcription workloads on the Realtime API.
+- **Image input support** — Realtime sessions accept image content parts alongside text and audio.
+- **Commentary phase** — Realtime handler supports the GA commentary phase for model-generated annotations.
+- **`conversation.item.done` and `response.cancel` events** — new GA Realtime event types for item completion tracking and response cancellation.
+- **Endpoint type routing for Realtime** — router distinguishes GA vs Beta Realtime endpoints for fixture matching.
+- **Drift detection for GA Realtime** — drift test suite extended with GA protocol shapes, Beta conformance shapes, and three-way triangulation.
+
+### Tests
+
+- **73 GA Realtime integration tests** — comprehensive test coverage for all GA event types, Beta compatibility, session management, model routing, image input, translate/whisper, commentary, and cancellation.
+- **GA and Beta Realtime conformance suites** — API conformance tests validating event shapes against both GA and Beta protocol specs.
+- **GA Realtime drift detection** — SDK shape tests and provider triangulation for the GA Realtime protocol.
## [1.22.1] - 2026-05-12
diff --git a/DRIFT.md b/DRIFT.md
index 35ec2261..ebb9c06a 100644
--- a/DRIFT.md
+++ b/DRIFT.md
@@ -107,7 +107,7 @@ When a model is deprecated:
## WebSocket Drift Coverage
-In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (4 verified + 2 canary = 6 WS tests):
+In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (6 verified + 2 canary = 8 WS tests):
### Gemini Interactions API (Beta)
@@ -120,13 +120,20 @@ The Gemini Interactions API (`/v1beta/interactions`) is covered by 4 drift tests
Uses `describe.skipIf(!GOOGLE_API_KEY)` like other Gemini tests. The Interactions API is in Beta — shapes may shift as Google iterates on the endpoint.
-| Protocol | Text | Tool Call | Real Endpoint | Status |
-| ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
-| OpenAI Responses WS | ✓ | ✓ | `wss://api.openai.com/v1/responses` | Verified |
-| OpenAI Realtime | ✓ | ✓ | `wss://api.openai.com/v1/realtime` | Verified |
-| Gemini Live | — | — | `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
+| Protocol | Text | Tool Call | Real Endpoint | Status |
+| ---------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
+| OpenAI Responses WS | ✓ | ✓ | `wss://api.openai.com/v1/responses` | Verified |
+| OpenAI Realtime (GA) | ✓ | ✓ | `wss://api.openai.com/v1/realtime` | Verified |
+| OpenAI Realtime (Beta) | ✓ | ✓ | `wss://api.openai.com/v1/realtime` + `OpenAI-Beta: realtime=v1` | Verified |
+| Gemini Live | — | — | `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
-**Models**: `gpt-4o-mini` for Responses WS, `gpt-4o-mini-realtime-preview` for Realtime.
+**Models**: `gpt-4o-mini` for Responses WS, `gpt-realtime-2` for Realtime GA (was `gpt-4o-mini-realtime-preview`).
+
+**GA Realtime Drift Tests**:
+
+- **Model canary** — Verifies all 5 GA models exist (`gpt-realtime-2`, `gpt-realtime-1.5`, `gpt-realtime-mini`, `gpt-realtime-translate`, `gpt-realtime-whisper`) and flags unknown realtime models
+- **Protocol probe** — Connects with both GA and Beta protocol, normalizes event sequences, and verifies consistency
+- **Event shape validation** — GA event names (`response.output_text.delta`, `conversation.item.added`, `conversation.item.done`) and nested session config (`session.audio.*`, `session.type`, `session.reasoning`)
**Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.
@@ -175,4 +182,4 @@ The fix workflow also supports `workflow_dispatch` for manual runs.
## Cost
-~29 API calls per run (20 HTTP response-shape + 3 model listing + 6 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.20/week at daily cadence. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 4 to 6.
+~31 API calls per run (20 HTTP response-shape + 3 model listing + 8 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-realtime-2`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.25/week at daily cadence. The GA protocol probe adds a second Realtime WS connection (one GA, one Beta) per run. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 6 to 8.
diff --git a/README.md b/README.md
index f8e58271..64dbb03e 100644
--- a/README.md
+++ b/README.md
@@ -35,14 +35,14 @@ await mock.stop();
aimock mocks everything your AI app talks to:
-| Tool | What it mocks | Docs |
-| -------------- | -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
-| **LLMock** | OpenAI (Chat/Responses/Realtime), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs) |
-| **MCPMock** | MCP tools, resources, prompts with session management | [MCP](https://aimock.copilotkit.dev/mcp-mock) |
-| **A2AMock** | Agent-to-agent protocol with SSE streaming | [A2A](https://aimock.copilotkit.dev/a2a-mock) |
-| **AGUIMock** | AG-UI agent-to-UI event streams for frontend testing | [AG-UI](https://aimock.copilotkit.dev/agui-mock) |
-| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints | [Vector](https://aimock.copilotkit.dev/vector-mock) |
-| **Services** | Tavily search, Cohere rerank, OpenAI moderation | [Services](https://aimock.copilotkit.dev/services) |
+| Tool | What it mocks | Docs |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
+| **LLMock** | OpenAI (Chat/Responses/Realtime GA+Beta), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs) |
+| **MCPMock** | MCP tools, resources, prompts with session management | [MCP](https://aimock.copilotkit.dev/mcp-mock) |
+| **A2AMock** | Agent-to-agent protocol with SSE streaming | [A2A](https://aimock.copilotkit.dev/a2a-mock) |
+| **AGUIMock** | AG-UI agent-to-UI event streams for frontend testing | [AG-UI](https://aimock.copilotkit.dev/agui-mock) |
+| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints | [Vector](https://aimock.copilotkit.dev/vector-mock) |
+| **Services** | Tavily search, Cohere rerank, OpenAI moderation | [Services](https://aimock.copilotkit.dev/services) |
Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or use the programmatic API to compose exactly what you need.
@@ -50,14 +50,14 @@ Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or
- **[Record & Replay](https://aimock.copilotkit.dev/record-replay)** — Proxy real APIs, save as fixtures, replay deterministically forever
- **[Multi-turn Conversations](https://aimock.copilotkit.dev/multi-turn)** — Record and replay multi-turn traces with tool rounds; match distinct turns via `turnIndex`, `hasToolResult`, `toolCallId`, `sequenceIndex`, `systemMessage` (gate on host-supplied agent context), or custom predicates
-- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime, Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
+- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime (GA + Beta shim), Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
- **Multimedia APIs** — [image generation](https://aimock.copilotkit.dev/images) (DALL-E, Imagen), [text-to-speech](https://aimock.copilotkit.dev/speech), [audio transcription](https://aimock.copilotkit.dev/transcription), [video generation](https://aimock.copilotkit.dev/video)
- **[MCP](https://aimock.copilotkit.dev/mcp-mock) / [A2A](https://aimock.copilotkit.dev/a2a-mock) / [AG-UI](https://aimock.copilotkit.dev/agui-mock) / [Vector](https://aimock.copilotkit.dev/vector-mock)** — Mock every protocol your AI agents use
- **[Chaos Testing](https://aimock.copilotkit.dev/chaos-testing)** — 500 errors, malformed JSON, mid-stream disconnects at any probability
- **Per-Request Strict Mode** — `X-AIMock-Strict` header overrides the server-level `--strict` flag per request (`true`/`1` = strict, `false`/`0` = lenient)
- **[Drift Detection](https://aimock.copilotkit.dev/drift-detection)** — Daily CI validation against real APIs
- **[Streaming Physics](https://aimock.copilotkit.dev/streaming-physics)** — Configurable `ttft`, `tps`, and `jitter`
-- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime, Responses WS, Gemini Live
+- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime (GA protocol with 5 models: gpt-realtime-2, gpt-realtime-1.5, gpt-realtime-mini, gpt-realtime-translate, gpt-realtime-whisper; transcription/translation session types; image input; commentary phase), Responses WS, Gemini Live
- **[Prometheus Metrics](https://aimock.copilotkit.dev/metrics)** — Request counts, latencies, fixture match rates
- **[Docker + Helm](https://aimock.copilotkit.dev/docker)** — Container image and Helm chart for CI/CD
- **[Vitest & Jest Plugins](https://aimock.copilotkit.dev/test-plugins)** — Zero-config `useAimock()` with auto lifecycle and env patching
diff --git a/docs/websocket/index.html b/docs/websocket/index.html
index d807d3b0..19d54b51 100644
--- a/docs/websocket/index.html
+++ b/docs/websocket/index.html
@@ -109,23 +109,132 @@
OpenAI Responses (WebSocket)
OpenAI Realtime
- The Realtime API uses a conversational protocol with session management.
+
+ The Realtime API uses a conversational protocol with session management. aimock implements
+ the
+ GA (General Availability) protocol natively — event names like
+ response.output_text.delta, conversation.item.added, and nested
+ audio session config are the defaults. The Beta protocol is supported via the
+ OpenAI-Beta: realtime=v1 header, which activates a translation shim that
+ converts GA events back to Beta names (response.text.delta,
+ conversation.item.created, flat session config).
+
+
+ Supported Models
+
+
+
+ | Model |
+ Session Types |
+ Notes |
+
+
+
+
+ | gpt-realtime-2 |
+ conversation |
+ Default model — GA successor to gpt-4o-realtime-preview |
+
+
+ | gpt-realtime-1.5 |
+ conversation |
+ Previous generation GA model |
+
+
+ | gpt-realtime-mini |
+ conversation |
+ Smaller, faster GA model |
+
+
+ | gpt-realtime-translate |
+ translation |
+ Real-time speech translation |
+
+
+ | gpt-realtime-whisper |
+ transcription |
+ Real-time speech transcription |
+
+
+
+
+ Session Types
+
+ -
+ conversation (default) — Standard conversational interaction with
+ text and audio modalities
+
+ -
+ transcription — Audio-to-text transcription (requires
+
gpt-realtime-whisper)
+
+ -
+ translation — Real-time speech translation (requires
+
gpt-realtime-translate)
+
+
+
+ GA Protocol Features
+
+ -
+ GA event names —
response.output_text.delta (was
+ response.text.delta), conversation.item.added (was
+ conversation.item.created), etc.
+
+ -
+ Nested audio config — Session config uses
+
session.audio.voice instead of flat session.voice
+
+ -
+ Image input —
input_image content parts in
+ conversation.item.create
+
+ -
+ Commentary phase —
phase field on
+ response.output_item.added/done events (final_answer or
+ commentary)
+
+ -
+
conversation.item.done — New event emitted after
+ each completed response item
+
+ -
+
response.cancel — Client message to cancel in-flight
+ responses
+
+
+
+ Beta Compatibility
+
+ Clients that send the OpenAI-Beta: realtime=v1 header receive Beta-format
+ events automatically. The shim translates event names, flattens the nested audio config,
+ and suppresses GA-only events like conversation.item.done. No code changes
+ needed in tests that target the Beta protocol.
+
-
-
const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2");
// Server sends session.created on connect
const [sessionMsg] = await ws.waitForMessages(1);
-expect(JSON.parse(sessionMsg).type).toBe("session.created");
+const session = JSON.parse(sessionMsg);
+expect(session.type).toBe("session.created");
+expect(session.session.type).toBe("conversation");
+expect(session.session.audio).toBeDefined();
-// Configure session
+// Configure session with nested audio config
ws.send(JSON.stringify({
type: "session.update",
- session: { modalities: ["text"] }
+ session: {
+ modalities: ["text"],
+ audio: { voice: "alloy" }
+ }
}));
-// Add a user message
+// Add a user message (supports input_text + input_image content)
ws.send(JSON.stringify({
type: "conversation.item.create",
item: {
@@ -138,10 +247,12 @@ OpenAI Realtime
// Request a response
ws.send(JSON.stringify({ type: "response.create" }));
-// Wait for response events
-const msgs = await ws.waitForMessages(8);
+// GA events: output_text instead of text, item.added instead of item.created
+const msgs = await ws.waitForMessages(10);
const events = msgs.map(m => JSON.parse(m));
-expect(events.some(e => e.type === "response.text.delta")).toBe(true);
+expect(events.some(e => e.type === "response.output_text.delta")).toBe(true);
+expect(events.some(e => e.type === "conversation.item.added")).toBe(true);
+expect(events.some(e => e.type === "conversation.item.done")).toBe(true);
Gemini Live
diff --git a/scripts/update-competitive-matrix.ts b/scripts/update-competitive-matrix.ts
index 6283b40c..edbbe5a9 100644
--- a/scripts/update-competitive-matrix.ts
+++ b/scripts/update-competitive-matrix.ts
@@ -69,6 +69,43 @@ const FEATURE_RULES: FeatureRule[] = [
rowLabel: "WebSocket APIs",
keywords: ["websocket", "realtime", "ws://", "wss://"],
},
+ {
+ rowLabel: "Realtime GA protocol",
+ keywords: [
+ "gpt-realtime-2",
+ "realtime.*ga",
+ "ga.*protocol",
+ "output_text\\.delta",
+ "conversation\\.item\\.added",
+ ],
+ },
+ {
+ rowLabel: "Realtime Beta compatibility",
+ keywords: [
+ "openai-beta.*realtime",
+ "realtime=v1",
+ "beta.*shim",
+ "beta.*compat",
+ "response\\.text\\.delta",
+ ],
+ },
+ {
+ rowLabel: "Realtime translate/whisper",
+ keywords: [
+ "gpt-realtime-translate",
+ "gpt-realtime-whisper",
+ "realtime.*transcription",
+ "realtime.*translation",
+ ],
+ },
+ {
+ rowLabel: "Realtime image input",
+ keywords: ["input_image.*realtime", "realtime.*image", "realtime.*vision"],
+ },
+ {
+ rowLabel: "Realtime commentary phase",
+ keywords: ["commentary.*phase", "phase.*commentary", "final_answer.*commentary"],
+ },
{
rowLabel: "Embeddings API",
keywords: ["/v1/embeddings", "embeddings api", "embedding endpoint", "embedding model"],
@@ -350,6 +387,11 @@ function buildMigrationRowPatterns(rowLabel: string): string[] {
"Request journal": ["Request journal"],
"Drift detection": ["Drift detection"],
"AG-UI event mocking": ["AG-UI event mocking", "AG-UI mocking", "AG-UI"],
+ "Realtime GA protocol": ["Realtime GA protocol", "GA Realtime"],
+ "Realtime Beta compatibility": ["Realtime Beta compatibility", "Beta Realtime"],
+ "Realtime translate/whisper": ["Realtime translate/whisper", "Translate/Whisper"],
+ "Realtime image input": ["Realtime image input"],
+ "Realtime commentary phase": ["Realtime commentary phase", "Commentary phase"],
};
if (variants[rowLabel]) {
diff --git a/src/__tests__/competitive-matrix.test.ts b/src/__tests__/competitive-matrix.test.ts
index 9a9367ed..145209de 100644
--- a/src/__tests__/competitive-matrix.test.ts
+++ b/src/__tests__/competitive-matrix.test.ts
@@ -80,6 +80,43 @@ const FEATURE_RULES: FeatureRule[] = [
rowLabel: "Structured output / JSON mode",
keywords: ["json_object", "json_schema", "structured output", "response_format"],
},
+ {
+ rowLabel: "Realtime GA protocol",
+ keywords: [
+ "gpt-realtime-2",
+ "realtime.*ga",
+ "ga.*protocol",
+ "output_text\\.delta",
+ "conversation\\.item\\.added",
+ ],
+ },
+ {
+ rowLabel: "Realtime Beta compatibility",
+ keywords: [
+ "openai-beta.*realtime",
+ "realtime=v1",
+ "beta.*shim",
+ "beta.*compat",
+ "response\\.text\\.delta",
+ ],
+ },
+ {
+ rowLabel: "Realtime translate/whisper",
+ keywords: [
+ "gpt-realtime-translate",
+ "gpt-realtime-whisper",
+ "realtime.*transcription",
+ "realtime.*translation",
+ ],
+ },
+ {
+ rowLabel: "Realtime image input",
+ keywords: ["input_image.*realtime", "realtime.*image", "realtime.*vision"],
+ },
+ {
+ rowLabel: "Realtime commentary phase",
+ keywords: ["commentary.*phase", "phase.*commentary", "final_answer.*commentary"],
+ },
];
function extractFeatures(text: string): Record {
@@ -117,6 +154,11 @@ function buildMigrationRowPatterns(rowLabel: string): string[] {
"Error injection (one-shot)": ["Error injection"],
"Request journal": ["Request journal"],
"Drift detection": ["Drift detection"],
+ "Realtime GA protocol": ["Realtime GA protocol", "GA Realtime"],
+ "Realtime Beta compatibility": ["Realtime Beta compatibility", "Beta Realtime"],
+ "Realtime translate/whisper": ["Realtime translate/whisper", "Translate/Whisper"],
+ "Realtime image input": ["Realtime image input"],
+ "Realtime commentary phase": ["Realtime commentary phase", "Commentary phase"],
};
if (variants[rowLabel]) {
patterns.push(...variants[rowLabel]);
diff --git a/src/__tests__/drift/sdk-shapes.ts b/src/__tests__/drift/sdk-shapes.ts
index 491c2f3f..b1e53785 100644
--- a/src/__tests__/drift/sdk-shapes.ts
+++ b/src/__tests__/drift/sdk-shapes.ts
@@ -720,7 +720,223 @@ export function openaiRealtimeTextEventShapes(): SSEEventShape[] {
session: {
id: "sess_abc123",
object: "realtime.session",
- model: "gpt-4o-mini",
+ model: "gpt-realtime-2",
+ modalities: ["text"],
+ instructions: "",
+ tools: [],
+ audio: {
+ voice: null,
+ input_audio_format: null,
+ output_audio_format: null,
+ input_audio_noise_reduction: null,
+ input_audio_transcription: null,
+ },
+ turn_detection: null,
+ temperature: 0.8,
+ expires_at: 1700000000,
+ max_response_output_tokens: "inf",
+ input_audio_transcription: null,
+ tool_choice: "auto",
+ type: "conversation",
+ reasoning: null,
+ },
+ }),
+ },
+ {
+ type: "session.updated",
+ dataShape: extractShape({
+ type: "session.updated",
+ event_id: "event_abc123",
+ session: {
+ object: "realtime.session",
+ model: "gpt-realtime-2",
+ modalities: ["text"],
+ instructions: "",
+ tools: [],
+ audio: {
+ voice: null,
+ input_audio_format: null,
+ output_audio_format: null,
+ input_audio_noise_reduction: null,
+ input_audio_transcription: null,
+ },
+ turn_detection: null,
+ temperature: 0.8,
+ expires_at: 1700000000,
+ max_response_output_tokens: "inf",
+ input_audio_transcription: null,
+ tool_choice: "auto",
+ type: "conversation",
+ reasoning: null,
+ },
+ }),
+ },
+ {
+ type: "conversation.item.added",
+ dataShape: extractShape({
+ type: "conversation.item.added",
+ event_id: "event_abc123",
+ previous_item_id: null,
+ item: {
+ type: "message",
+ id: "item_abc123",
+ role: "user",
+ content: [{ type: "input_text", text: "Say hello" }],
+ },
+ }),
+ },
+ {
+ type: "response.created",
+ dataShape: extractShape({
+ type: "response.created",
+ event_id: "event_abc123",
+ response: {
+ id: "resp_abc123",
+ object: "realtime.response",
+ status: "in_progress",
+ status_details: null,
+ output: [],
+ usage: null,
+ },
+ }),
+ },
+ {
+ type: "response.output_item.added",
+ dataShape: extractShape({
+ type: "response.output_item.added",
+ event_id: "event_abc123",
+ response_id: "resp_abc123",
+ output_index: 0,
+ item: {
+ id: "item_abc123",
+ type: "message",
+ role: "assistant",
+ status: "in_progress",
+ content: [],
+ phase: "final_answer",
+ },
+ }),
+ },
+ {
+ type: "response.content_part.added",
+ dataShape: extractShape({
+ type: "response.content_part.added",
+ event_id: "event_abc123",
+ response_id: "resp_abc123",
+ item_id: "item_abc123",
+ output_index: 0,
+ content_index: 0,
+ part: { type: "output_text", text: "" },
+ }),
+ },
+ {
+ type: "response.output_text.delta",
+ dataShape: extractShape({
+ type: "response.output_text.delta",
+ event_id: "event_abc123",
+ response_id: "resp_abc123",
+ item_id: "item_abc123",
+ output_index: 0,
+ content_index: 0,
+ delta: "Hello",
+ }),
+ },
+ {
+ type: "response.output_text.done",
+ dataShape: extractShape({
+ type: "response.output_text.done",
+ event_id: "event_abc123",
+ response_id: "resp_abc123",
+ item_id: "item_abc123",
+ output_index: 0,
+ content_index: 0,
+ text: "Hello!",
+ }),
+ },
+ {
+ type: "response.content_part.done",
+ dataShape: extractShape({
+ type: "response.content_part.done",
+ event_id: "event_abc123",
+ response_id: "resp_abc123",
+ item_id: "item_abc123",
+ output_index: 0,
+ content_index: 0,
+ part: { type: "output_text", text: "Hello!" },
+ }),
+ },
+ {
+ type: "response.output_item.done",
+ dataShape: extractShape({
+ type: "response.output_item.done",
+ event_id: "event_abc123",
+ response_id: "resp_abc123",
+ output_index: 0,
+ item: {
+ id: "item_abc123",
+ type: "message",
+ role: "assistant",
+ status: "completed",
+ content: [{ type: "output_text", text: "Hello!" }],
+ phase: "final_answer",
+ },
+ }),
+ },
+ {
+ type: "conversation.item.done",
+ dataShape: extractShape({
+ type: "conversation.item.done",
+ event_id: "event_abc123",
+ item: {
+ id: "item_abc123",
+ object: "realtime.item",
+ type: "message",
+ role: "assistant",
+ status: "completed",
+ content: [{ type: "output_text", text: "Hello!" }],
+ },
+ }),
+ },
+ {
+ type: "response.done",
+ dataShape: extractShape({
+ type: "response.done",
+ event_id: "event_abc123",
+ response: {
+ id: "resp_abc123",
+ object: "realtime.response",
+ status: "completed",
+ output: [
+ {
+ id: "item_abc123",
+ type: "message",
+ role: "assistant",
+ status: "completed",
+ content: [{ type: "output_text", text: "Hello!" }],
+ },
+ ],
+ usage: { total_tokens: 0, input_tokens: 0, output_tokens: 0 },
+ },
+ }),
+ },
+ ];
+}
+
+/**
+ * Beta Realtime event shapes — uses legacy Beta event names.
+ * Useful for three-way comparison when testing Beta compatibility shim.
+ */
+export function openaiRealtimeBetaTextEventShapes(): SSEEventShape[] {
+ return [
+ {
+ type: "session.created",
+ dataShape: extractShape({
+ type: "session.created",
+ event_id: "event_abc123",
+ session: {
+ id: "sess_abc123",
+ object: "realtime.session",
+ model: "gpt-realtime-2",
modalities: ["text"],
instructions: "",
tools: [],
@@ -743,7 +959,7 @@ export function openaiRealtimeTextEventShapes(): SSEEventShape[] {
event_id: "event_abc123",
session: {
object: "realtime.session",
- model: "gpt-4o-mini",
+ model: "gpt-realtime-2",
modalities: ["text"],
instructions: "",
tools: [],
diff --git a/src/__tests__/drift/ws-providers.ts b/src/__tests__/drift/ws-providers.ts
index ba840922..d83059fb 100644
--- a/src/__tests__/drift/ws-providers.ts
+++ b/src/__tests__/drift/ws-providers.ts
@@ -344,15 +344,19 @@ export async function openaiRealtimeWS(
config: ProviderConfig,
text: string,
tools?: object[],
+ beta = true,
): Promise {
// Realtime API requires a realtime-specific model (gpt-4o-mini doesn't work)
+ const headers: Record = {
+ Authorization: `Bearer ${config.apiKey}`,
+ };
+ if (beta) {
+ headers["OpenAI-Beta"] = "realtime=v1";
+ }
const ws = await connectTLSWebSocket(
"api.openai.com",
- "/v1/realtime?model=gpt-4o-mini-realtime-preview",
- {
- Authorization: `Bearer ${config.apiKey}`,
- "OpenAI-Beta": "realtime=v1",
- },
+ "/v1/realtime?model=gpt-realtime-2",
+ headers,
);
// Step 1: Wait for session.created
@@ -360,7 +364,7 @@ export async function openaiRealtimeWS(
// Step 2: Send session.update
const session: Record = {
- model: "gpt-4o-mini-realtime-preview",
+ model: "gpt-realtime-2",
modalities: ["text"],
};
if (tools) session.tools = tools;
@@ -381,8 +385,11 @@ export async function openaiRealtimeWS(
}),
);
- // Step 5: Wait for conversation.item.created
- const itemCreated = await ws.waitUntil((msg: any) => msg?.type === "conversation.item.created");
+ // Step 5: Wait for conversation.item.created (Beta) or conversation.item.added (GA)
+ const itemCreated = await ws.waitUntil(
+ (msg: any) =>
+ msg?.type === "conversation.item.created" || msg?.type === "conversation.item.added",
+ );
// Step 6: Send response.create
ws.send(JSON.stringify({ type: "response.create" }));
diff --git a/src/__tests__/drift/ws-realtime.drift.ts b/src/__tests__/drift/ws-realtime.drift.ts
index ecd5203c..56c84bfb 100644
--- a/src/__tests__/drift/ws-realtime.drift.ts
+++ b/src/__tests__/drift/ws-realtime.drift.ts
@@ -2,6 +2,7 @@
* OpenAI Realtime API WebSocket drift tests.
*
* Three-way comparison: SDK types x real API (WS) x aimock output (WS).
+ * Updated for GA protocol — uses gpt-realtime-2 and GA event names.
*/
import { describe, it, expect, beforeAll, afterAll } from "vitest";
@@ -13,6 +14,22 @@ import { listOpenAIModels } from "./providers.js";
import { startDriftServer, stopDriftServer, collectMockWSMessages } from "./helpers.js";
import { connectWebSocket } from "../ws-test-client.js";
+// ---------------------------------------------------------------------------
+// GA <-> Beta event name mapping (local copy for normalization in tests)
+// ---------------------------------------------------------------------------
+
+const GA_TO_BETA_EVENT: Record = {
+ "response.output_text.delta": "response.text.delta",
+ "response.output_text.done": "response.text.done",
+ "response.output_audio.delta": "response.audio.delta",
+ "response.output_audio.done": "response.audio.done",
+ "response.output_audio_transcript.delta": "response.audio_transcript.delta",
+ "response.output_audio_transcript.done": "response.audio_transcript.done",
+ "conversation.item.added": "conversation.item.created",
+};
+
+const BETA_SUPPRESSED_EVENTS = new Set(["conversation.item.done"]);
+
// ---------------------------------------------------------------------------
// Server lifecycle
// ---------------------------------------------------------------------------
@@ -32,32 +49,49 @@ afterAll(async () => {
// Tests
// ---------------------------------------------------------------------------
-const REALTIME_MODEL = "gpt-4o-mini-realtime-preview";
-
describe.skipIf(!OPENAI_API_KEY)("OpenAI Realtime API drift", () => {
const config = { apiKey: OPENAI_API_KEY! };
- it("canary: realtime preview model still available", async () => {
+ it("canary: GA realtime models available", async () => {
const models = await listOpenAIModels(config.apiKey);
- const found = models.some((m) => m === REALTIME_MODEL || m.startsWith(`${REALTIME_MODEL}-`));
- if (!found) {
- // Check if a GA model replaced it
- const ga = models.find((m) => m === "gpt-4o-mini-realtime" || m === "gpt-realtime-mini");
- const hint = ga ? ` Found GA model "${ga}" — update REALTIME_MODEL.` : "";
- expect.fail(
- `Realtime model "${REALTIME_MODEL}" no longer in model listing.${hint} ` +
- `Update ws-providers.ts and this test.`,
- );
+
+ const gaModels = [
+ "gpt-realtime-2",
+ "gpt-realtime-1.5",
+ "gpt-realtime-mini",
+ "gpt-realtime-translate",
+ "gpt-realtime-whisper",
+ ];
+ const knownModels = new Set([
+ ...gaModels,
+ // Legacy preview models (may still appear)
+ "gpt-4o-realtime-preview",
+ "gpt-4o-mini-realtime-preview",
+ "gpt-4o-realtime-preview-2024-10-01",
+ "gpt-4o-mini-realtime-preview-2024-12-17",
+ ]);
+
+ const realtimeModels = models.filter((m) => m.includes("realtime"));
+
+ // At least one GA model should exist
+ const hasGA = realtimeModels.some((m) => gaModels.includes(m));
+ expect(hasGA).toBe(true);
+
+ // Flag unknown realtime models
+ const unknown = realtimeModels.filter((m) => !knownModels.has(m));
+ if (unknown.length > 0) {
+ console.warn(`[DRIFT] Unknown realtime models detected: ${unknown.join(", ")}`);
}
+ expect(unknown).toEqual([]);
});
- it("WS text event sequence and shapes match", async () => {
+ it("WS text event sequence and shapes match (GA)", async () => {
const sdkEvents = openaiRealtimeTextEventShapes();
- // Real API
- const realResult = await openaiRealtimeWS(config, "Say hello");
+ // Real API — GA mode (no Beta header)
+ const realResult = await openaiRealtimeWS(config, "Say hello", undefined, false);
- // Mock — replicate the Realtime protocol sequence
+ // Mock — replicate the Realtime protocol sequence (GA mode)
const mockWs = await connectWebSocket(instance.url, "/v1/realtime");
// session.created is sent automatically on connect
@@ -114,7 +148,7 @@ describe.skipIf(!OPENAI_API_KEY)("OpenAI Realtime API drift", () => {
expect(mockEvents.length, "Mock returned no WS messages").toBeGreaterThan(0);
const diffs = compareSSESequences(sdkEvents, realResult.events, mockEvents);
- const report = formatDriftReport("OpenAI Realtime WS (text events)", diffs);
+ const report = formatDriftReport("OpenAI Realtime WS (GA text events)", diffs);
expect(
diffs.filter((d) => d.severity === "critical"),
@@ -122,13 +156,13 @@ describe.skipIf(!OPENAI_API_KEY)("OpenAI Realtime API drift", () => {
).toEqual([]);
});
- it("WS tool call event sequence matches", async () => {
+ it("WS tool call event sequence matches (GA)", async () => {
const sdkEvents = [
...openaiRealtimeTextEventShapes().filter(
(e) =>
e.type === "session.created" ||
e.type === "session.updated" ||
- e.type === "conversation.item.created" ||
+ e.type === "conversation.item.added" ||
e.type === "response.created" ||
e.type === "response.done",
),
@@ -148,8 +182,8 @@ describe.skipIf(!OPENAI_API_KEY)("OpenAI Realtime API drift", () => {
},
];
- // Real API
- const realResult = await openaiRealtimeWS(config, "Weather in Paris", tools);
+ // Real API — GA mode
+ const realResult = await openaiRealtimeWS(config, "Weather in Paris", tools, false);
// Mock — replicate the Realtime protocol sequence
const mockWs = await connectWebSocket(instance.url, "/v1/realtime");
@@ -208,11 +242,35 @@ describe.skipIf(!OPENAI_API_KEY)("OpenAI Realtime API drift", () => {
expect(mockEvents.length, "Mock returned no WS messages").toBeGreaterThan(0);
const diffs = compareSSESequences(sdkEvents, realResult.events, mockEvents);
- const report = formatDriftReport("OpenAI Realtime WS (tool call events)", diffs);
+ const report = formatDriftReport("OpenAI Realtime WS (GA tool call events)", diffs);
expect(
diffs.filter((d) => d.severity === "critical"),
report,
).toEqual([]);
});
+
+ it("GA and Beta event sequences are consistent after normalization", async () => {
+ // GA connection (no Beta header)
+ const gaResult = await openaiRealtimeWS(config, "Say hello in one word.", undefined, false);
+
+ // Beta connection
+ const betaResult = await openaiRealtimeWS(config, "Say hello in one word.", undefined, true);
+
+ // Normalize GA events to Beta names for comparison
+ const gaToComparable = (type: string) => GA_TO_BETA_EVENT[type] ?? type;
+
+ const gaTypes = gaResult.events
+ .map((e) => e.type)
+ .filter((t) => !BETA_SUPPRESSED_EVENTS.has(t))
+ .map(gaToComparable);
+ const betaTypes = betaResult.events.map((e) => e.type);
+
+ // Deduplicate consecutive repeated types so that differences in delta
+ // count (non-deterministic LLM output length) don't cause false failures.
+ function dedupeConsecutive(types: string[]): string[] {
+ return types.filter((t, i) => i === 0 || t !== types[i - 1]);
+ }
+ expect(dedupeConsecutive(gaTypes)).toEqual(dedupeConsecutive(betaTypes));
+ });
});
diff --git a/src/__tests__/ws-api-conformance.test.ts b/src/__tests__/ws-api-conformance.test.ts
index 234be227..edd8a787 100644
--- a/src/__tests__/ws-api-conformance.test.ts
+++ b/src/__tests__/ws-api-conformance.test.ts
@@ -391,15 +391,15 @@ describe("WS Realtime API conformance", () => {
});
});
- describe("conversation.item.created", () => {
- it("conversation.item.created has item with id", async () => {
+ describe("conversation.item.added", () => {
+ it("conversation.item.added has item with id", async () => {
const ws = await connectWebSocket(instance.url, "/v1/realtime");
await ws.waitForMessages(1); // session.created
ws.send(realtimeItemCreate("hello"));
const raw = await ws.waitForMessages(2);
ws.close();
const frame = JSON.parse(raw[1]) as any;
- expect(frame.type).toBe("conversation.item.created");
+ expect(frame.type).toBe("conversation.item.added");
expect(typeof frame.item.id).toBe("string");
expect(frame.item.id.length).toBeGreaterThan(0);
});
@@ -410,12 +410,12 @@ describe("WS Realtime API conformance", () => {
const ws = await connectWebSocket(instance.url, "/v1/realtime");
await ws.waitForMessages(1); // session.created
ws.send(realtimeItemCreate("hello"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(realtimeResponseCreate());
- // session.created + item.created + response.created + output_item.added
- // + content_part.added + text.delta(s) + text.done + content_part.done
- // + output_item.done + response.done = 10 min
- const raw = await ws.waitForMessages(10);
+ // session.created + item.added + response.created + output_item.added
+ // + content_part.added + output_text.delta(s) + output_text.done + content_part.done
+ // + output_item.done + conversation.item.done + response.done = 11 min
+ const raw = await ws.waitForMessages(11);
ws.close();
return raw.slice(2).map((m) => JSON.parse(m) as any);
}
@@ -445,18 +445,18 @@ describe("WS Realtime API conformance", () => {
expect((itemAdded.item as any).role).toBe("assistant");
});
- it("response.content_part.added has part with type text", async () => {
+ it("response.content_part.added has part with type output_text", async () => {
const frames = await getTextResponseFrames();
const partAdded = frames.find((f: any) => f.type === "response.content_part.added")!;
expect(partAdded).toBeDefined();
const part = (partAdded as any).part;
- expect(part.type).toBe("text");
+ expect(part.type).toBe("output_text");
expect(part.text).toBe("");
});
- it("response.text.delta has response_id, item_id, output_index, content_index, delta as string", async () => {
+ it("response.output_text.delta has response_id, item_id, output_index, content_index, delta as string", async () => {
const frames = await getTextResponseFrames();
- const deltas = frames.filter((f: any) => f.type === "response.text.delta");
+ const deltas = frames.filter((f: any) => f.type === "response.output_text.delta");
expect(deltas.length).toBeGreaterThan(0);
for (const d of deltas) {
expect(typeof (d as any).response_id).toBe("string");
@@ -467,19 +467,19 @@ describe("WS Realtime API conformance", () => {
}
});
- it("response.text.done has full text", async () => {
+ it("response.output_text.done has full text", async () => {
const frames = await getTextResponseFrames();
- const textDone = frames.find((f: any) => f.type === "response.text.done")!;
+ const textDone = frames.find((f: any) => f.type === "response.output_text.done")!;
expect(textDone).toBeDefined();
expect((textDone as any).text).toBe("Hi there!");
});
- it("response.content_part.done has part with type text and text content", async () => {
+ it("response.content_part.done has part with type output_text and text content", async () => {
const frames = await getTextResponseFrames();
const partDone = frames.find((f: any) => f.type === "response.content_part.done")!;
expect(partDone).toBeDefined();
const part = (partDone as any).part;
- expect(part.type).toBe("text");
+ expect(part.type).toBe("output_text");
expect(typeof part.text).toBe("string");
expect(part.text).toBe("Hi there!");
});
@@ -510,11 +510,11 @@ describe("WS Realtime API conformance", () => {
const ws = await connectWebSocket(instance.url, "/v1/realtime");
await ws.waitForMessages(1); // session.created
ws.send(realtimeItemCreate("weather"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(realtimeResponseCreate());
- // session.created + item.created + response.created + output_item.added
- // + args.delta(s) + args.done + output_item.done + response.done = 8 min
- const raw = await ws.waitForMessages(8);
+ // session.created + item.added + response.created + output_item.added
+ // + args.delta(s) + args.done + output_item.done + conversation.item.done + response.done = 9 min
+ const raw = await ws.waitForMessages(9);
ws.close();
return raw.slice(2).map((m) => JSON.parse(m) as any);
}
@@ -560,7 +560,7 @@ describe("WS Realtime API conformance", () => {
const ws = await connectWebSocket(instance.url, "/v1/realtime");
await ws.waitForMessages(1); // session.created
ws.send(realtimeItemCreate("no-match-xyz-9999"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(realtimeResponseCreate());
const raw = await ws.waitForMessages(4); // + response.created + response.done
ws.close();
@@ -577,7 +577,7 @@ describe("WS Realtime API conformance", () => {
const ws = await connectWebSocket(instance.url, "/v1/realtime");
await ws.waitForMessages(1); // session.created
ws.send(realtimeItemCreate("error-test"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(realtimeResponseCreate());
const raw = await ws.waitForMessages(4); // + response.created + response.done
ws.close();
@@ -602,6 +602,168 @@ describe("WS Realtime API conformance", () => {
});
});
+// ---------------------------------------------------------------------------
+// 7a. GA Realtime conformance
+// ---------------------------------------------------------------------------
+
+describe("GA Realtime conformance", () => {
+ it("session.created includes nested audio config, type field, and reasoning field", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2");
+ const raw = await ws.waitForMessages(1);
+ ws.close();
+ const frame = JSON.parse(raw[0]) as any;
+ expect(frame.type).toBe("session.created");
+ const session = frame.session;
+ expect(session).toHaveProperty("audio");
+ expect(session).toHaveProperty("type", "conversation");
+ expect(session).toHaveProperty("reasoning");
+ expect(session.audio).toMatchObject({
+ voice: null,
+ input_audio_format: null,
+ output_audio_format: null,
+ input_audio_noise_reduction: null,
+ input_audio_transcription: null,
+ });
+ });
+
+ it("emits conversation.item.added (not .created) and conversation.item.done", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2");
+ await ws.waitForMessages(1); // session.created
+ ws.send(realtimeItemCreate("hello"));
+ await ws.waitForMessages(2); // + conversation.item.added
+ ws.send(realtimeResponseCreate());
+ const raw = await ws.waitForMessages(11);
+ ws.close();
+ const types = raw.map((m) => (JSON.parse(m) as any).type);
+ expect(types).toContain("conversation.item.added");
+ expect(types).not.toContain("conversation.item.created");
+ expect(types).toContain("conversation.item.done");
+ });
+
+ it("output_item events include phase field", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2");
+ await ws.waitForMessages(1); // session.created
+ ws.send(realtimeItemCreate("hello"));
+ await ws.waitForMessages(2); // + conversation.item.added
+ ws.send(realtimeResponseCreate());
+ const raw = await ws.waitForMessages(11);
+ ws.close();
+ const frames = raw.map((m) => JSON.parse(m) as any);
+ const outputItemAdded = frames.find((f: any) => f.type === "response.output_item.added");
+ expect(outputItemAdded).toBeDefined();
+ expect(outputItemAdded.item).toHaveProperty("phase");
+ expect(outputItemAdded.item.phase).toBe("final_answer");
+
+ const outputItemDone = frames.find((f: any) => f.type === "response.output_item.done");
+ expect(outputItemDone).toBeDefined();
+ expect(outputItemDone.item).toHaveProperty("phase");
+ });
+
+ it("response.cancel emits response.cancelled", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2");
+ await ws.waitForMessages(1); // session.created
+ ws.send(JSON.stringify({ type: "response.cancel" }));
+ const raw = await ws.waitForMessages(2);
+ ws.close();
+ const frame = JSON.parse(raw[1]) as any;
+ expect(frame.type).toBe("response.cancelled");
+ });
+
+ it("content types use GA names (output_text, not text)", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2");
+ await ws.waitForMessages(1); // session.created
+ ws.send(realtimeItemCreate("hello"));
+ await ws.waitForMessages(2); // + conversation.item.added
+ ws.send(realtimeResponseCreate());
+ const raw = await ws.waitForMessages(11);
+ ws.close();
+ const frames = raw.map((m) => JSON.parse(m) as any);
+ const contentPartAdded = frames.find((f: any) => f.type === "response.content_part.added");
+ expect(contentPartAdded).toBeDefined();
+ expect(contentPartAdded.part.type).toBe("output_text");
+
+ const contentPartDone = frames.find((f: any) => f.type === "response.content_part.done");
+ expect(contentPartDone).toBeDefined();
+ expect(contentPartDone.part.type).toBe("output_text");
+ });
+});
+
+// ---------------------------------------------------------------------------
+// 7b. Beta Realtime conformance (OpenAI-Beta: realtime=v1)
+// ---------------------------------------------------------------------------
+
+describe("Beta Realtime conformance (OpenAI-Beta: realtime=v1)", () => {
+ it("session.created uses flat config (no nested audio, no type, no reasoning)", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+ const raw = await ws.waitForMessages(1);
+ ws.close();
+ const frame = JSON.parse(raw[0]) as any;
+ expect(frame.type).toBe("session.created");
+ const session = frame.session;
+ // Beta: flat config — voice at top level, no nested audio object
+ expect(session).toHaveProperty("voice");
+ expect(session).not.toHaveProperty("audio");
+ expect(session).not.toHaveProperty("type");
+ expect(session).not.toHaveProperty("reasoning");
+ });
+
+ it("emits Beta event names (response.text.delta, conversation.item.created)", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+ await ws.waitForMessages(1); // session.created
+ ws.send(realtimeItemCreate("hello"));
+ await ws.waitForMessages(2); // + conversation.item.created (Beta name)
+ ws.send(realtimeResponseCreate());
+ // Beta: no conversation.item.done, so fewer events than GA
+ const raw = await ws.waitForMessages(10);
+ ws.close();
+ const types = raw.map((m) => (JSON.parse(m) as any).type);
+ expect(types).toContain("conversation.item.created");
+ expect(types).not.toContain("conversation.item.added");
+ expect(types).toContain("response.text.delta");
+ expect(types).not.toContain("response.output_text.delta");
+ expect(types).toContain("response.text.done");
+ expect(types).not.toContain("response.output_text.done");
+ });
+
+ it("does NOT emit conversation.item.done (suppressed in Beta)", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+ await ws.waitForMessages(1); // session.created
+ ws.send(realtimeItemCreate("hello"));
+ await ws.waitForMessages(2); // + conversation.item.created
+ ws.send(realtimeResponseCreate());
+ const raw = await ws.waitForMessages(10);
+ ws.close();
+ const types = raw.map((m) => (JSON.parse(m) as any).type);
+ expect(types).not.toContain("conversation.item.done");
+ });
+
+ it("uses Beta content type names (text, not output_text)", async () => {
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+ await ws.waitForMessages(1); // session.created
+ ws.send(realtimeItemCreate("hello"));
+ await ws.waitForMessages(2); // + conversation.item.created
+ ws.send(realtimeResponseCreate());
+ const raw = await ws.waitForMessages(10);
+ ws.close();
+ const frames = raw.map((m) => JSON.parse(m) as any);
+ const contentPartAdded = frames.find((f: any) => f.type === "response.content_part.added");
+ expect(contentPartAdded).toBeDefined();
+ expect(contentPartAdded.part.type).toBe("text");
+
+ const contentPartDone = frames.find((f: any) => f.type === "response.content_part.done");
+ expect(contentPartDone).toBeDefined();
+ expect(contentPartDone.part.type).toBe("text");
+ });
+});
+
// ---------------------------------------------------------------------------
// 8. WS Gemini Live BidiGenerateContent conformance
// ---------------------------------------------------------------------------
diff --git a/src/__tests__/ws-realtime.test.ts b/src/__tests__/ws-realtime.test.ts
index 1775a839..90877162 100644
--- a/src/__tests__/ws-realtime.test.ts
+++ b/src/__tests__/ws-realtime.test.ts
@@ -18,6 +18,14 @@ const toolFixture: Fixture = {
},
};
+const contentWithToolCallsFixture: Fixture = {
+ match: { userMessage: "commentary-phase" },
+ response: {
+ content: "Let me check the weather for you.",
+ toolCalls: [{ name: "get_weather", arguments: '{"city":"NYC"}' }],
+ },
+};
+
const errorFixture: Fixture = {
match: { userMessage: "fail" },
response: {
@@ -26,7 +34,12 @@ const errorFixture: Fixture = {
},
};
-const allFixtures: Fixture[] = [textFixture, toolFixture, errorFixture];
+const allFixtures: Fixture[] = [
+ textFixture,
+ toolFixture,
+ contentWithToolCallsFixture,
+ errorFixture,
+];
// --- helpers ---
@@ -129,12 +142,20 @@ describe("WebSocket /v1/realtime", () => {
expect(session.modalities).toEqual(["text"]);
expect(session.instructions).toBe("");
expect(session.tools).toEqual([]);
- expect(session.voice).toBeNull();
expect(session.temperature).toBe(0.8);
expect(typeof session.expires_at).toBe("number");
expect(session.max_response_output_tokens).toBe("inf");
- expect(session.input_audio_transcription).toBeNull();
expect(session.tool_choice).toBe("auto");
+ expect(session.type).toBe("conversation");
+ expect(session.reasoning).toBeNull();
+ // GA nested audio config
+ const audio = session.audio as Record;
+ expect(audio).toBeDefined();
+ expect(audio.voice).toBeNull();
+ expect(audio.input_audio_format).toBeNull();
+ expect(audio.output_audio_format).toBeNull();
+ expect(audio.input_audio_noise_reduction).toBeNull();
+ expect(audio.input_audio_transcription).toBeNull();
ws.close();
});
@@ -163,8 +184,12 @@ describe("WebSocket /v1/realtime", () => {
expect(session.object).toBe("realtime.session");
expect(typeof session.expires_at).toBe("number");
expect(session.max_response_output_tokens).toBe("inf");
- expect(session.input_audio_transcription).toBeNull();
expect(session.tool_choice).toBe("auto");
+ expect(session.type).toBe("conversation");
+ // GA nested audio config
+ const audio = session.audio as Record;
+ expect(audio).toBeDefined();
+ expect(audio.voice).toBeNull();
ws.close();
});
@@ -178,32 +203,33 @@ describe("WebSocket /v1/realtime", () => {
ws.send(conversationItemCreate("user", "hello"));
- // Wait for conversation.item.created ack
+ // Wait for conversation.item.added ack (GA name)
const ackRaw = await ws.waitForMessages(2);
const ackEvent = JSON.parse(ackRaw[1]) as WSEvent;
- expect(ackEvent.type).toBe("conversation.item.created");
+ expect(ackEvent.type).toBe("conversation.item.added");
ws.send(responseCreate());
// Text stream: response.created + output_item.added + content_part.added
- // + text.delta(s) + text.done + content_part.done + output_item.done + response.done
- // = 8 minimum events (1 delta for small text with default chunkSize=20)
- // Total messages: 2 (session.created + item.created) + 8 = 10
- const allRaw = await ws.waitForMessages(10);
+ // + output_text.delta(s) + output_text.done + content_part.done + output_item.done
+ // + conversation.item.done + response.done
+ // = 9 minimum events (1 delta for small text with default chunkSize=20)
+ // Total messages: 2 (session.created + item.added) + 9 = 11
+ const allRaw = await ws.waitForMessages(11);
const responseEvents = parseEvents(allRaw.slice(2));
const types = responseEvents.map((e) => e.type);
expect(types[0]).toBe("response.created");
expect(types).toContain("response.output_item.added");
expect(types).toContain("response.content_part.added");
- expect(types).toContain("response.text.delta");
- expect(types).toContain("response.text.done");
+ expect(types).toContain("response.output_text.delta");
+ expect(types).toContain("response.output_text.done");
expect(types).toContain("response.content_part.done");
expect(types).toContain("response.output_item.done");
expect(types[types.length - 1]).toBe("response.done");
// Verify text deltas reconstruct to "Hi there!"
- const deltas = responseEvents.filter((e) => e.type === "response.text.delta");
+ const deltas = responseEvents.filter((e) => e.type === "response.output_text.delta");
const fullText = deltas.map((d) => d.delta).join("");
expect(fullText).toBe("Hi there!");
@@ -234,10 +260,10 @@ describe("WebSocket /v1/realtime", () => {
expect(resp.usage).toEqual({ total_tokens: 0, input_tokens: 0, output_tokens: 0 });
expect(Array.isArray(resp.output)).toBe(true);
- // Verify conversation.item.created has previous_item_id (null for first item)
- const itemCreatedEvent = parseEvents(allRaw.slice(1, 2))[0];
- expect(itemCreatedEvent.type).toBe("conversation.item.created");
- expect(itemCreatedEvent.previous_item_id).toBeNull();
+ // Verify conversation.item.added has previous_item_id (null for first item)
+ const itemAddedEvent = parseEvents(allRaw.slice(1, 2))[0];
+ expect(itemAddedEvent.type).toBe("conversation.item.added");
+ expect(itemAddedEvent.previous_item_id).toBeNull();
// Send a second item and verify previous_item_id points to the last conversation item
// (the assistant response item pushed during handleResponseCreate)
@@ -245,7 +271,7 @@ describe("WebSocket /v1/realtime", () => {
ws.send(conversationItemCreate("user", "how are you?"));
const secondAckRaw = await ws.waitForMessages(allRaw.length + 1);
const secondAckEvent = JSON.parse(secondAckRaw[secondAckRaw.length - 1]) as WSEvent;
- expect(secondAckEvent.type).toBe("conversation.item.created");
+ expect(secondAckEvent.type).toBe("conversation.item.added");
expect(secondAckEvent.previous_item_id).toBe(assistantItemId);
ws.close();
@@ -258,15 +284,15 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "weather"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
// Tool call stream: response.created + output_item.added
// + function_call_arguments.delta(s) + function_call_arguments.done
- // + output_item.done + response.done = 6 min events
- // Total: 2 + 6 = 8
- const allRaw = await ws.waitForMessages(8);
+ // + output_item.done + conversation.item.done + response.done = 7 min events
+ // Total: 2 + 7 = 9
+ const allRaw = await ws.waitForMessages(9);
const responseEvents = parseEvents(allRaw.slice(2));
const types = responseEvents.map((e) => e.type);
@@ -317,7 +343,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "unknown-message-that-matches-nothing"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -353,7 +379,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "fail"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -388,12 +414,12 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "hello"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
- // Wait for full text response sequence
- await ws.waitForMessages(10);
+ // Wait for full text response sequence (9 response events + 2 initial = 11)
+ await ws.waitForMessages(11);
// Small pause to ensure the journal write has completed
await new Promise((r) => setTimeout(r, 50));
@@ -425,23 +451,23 @@ describe("WebSocket /v1/realtime", () => {
// Add both conversation items
ws.send(conversationItemCreate("user", "ser-a"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
// Now send two response.create messages rapidly without waiting
// The realtime handler adds "ser-a" to conversation, so the second one
// also sees it. To make the second match "ser-b", add it to conversation first.
ws.send(conversationItemCreate("user", "ser-b"));
- await ws.waitForMessages(3); // + second conversation.item.created
+ await ws.waitForMessages(3); // + second conversation.item.added
// Fire two response.create messages back-to-back
ws.send(responseCreate());
ws.send(responseCreate());
// Each text response: response.created + output_item.added + content_part.added
- // + delta(s) + text.done + content_part.done + output_item.done + response.done
+ // + delta(s) + text.done + content_part.done + output_item.done + conversation.item.done + response.done
// "Alpha response" / 5 = 3 deltas, "Bravo response" / 5 = 3 deltas
- // So 10 events per response = 20 total, plus the 3 initial messages = 23
- const allRaw = await ws.waitForMessages(23);
+ // So 11 events per response = 22 total, plus the 3 initial messages = 25
+ const allRaw = await ws.waitForMessages(25);
const responseEvents = parseEvents(allRaw.slice(3));
// Find response.done boundaries
@@ -461,11 +487,11 @@ describe("WebSocket /v1/realtime", () => {
// Verify no interleaving: deltas in each batch should form a complete string
const firstDeltas = firstBatch
- .filter((e) => e.type === "response.text.delta")
+ .filter((e) => e.type === "response.output_text.delta")
.map((e) => e.delta)
.join("");
const secondDeltas = secondBatch
- .filter((e) => e.type === "response.text.delta")
+ .filter((e) => e.type === "response.output_text.delta")
.map((e) => e.delta)
.join("");
@@ -496,15 +522,15 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "multi-tool-rt"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
// 2 tool calls: response.created
- // + (output_item.added + 1 delta + arguments.done + output_item.done) * 2
- // + response.done = 1 + 8 + 1 = 10 events
- // Total: 2 (session.created + item.created) + 10 = 12
- const allRaw = await ws.waitForMessages(12);
+ // + (output_item.added + 1 delta + arguments.done + output_item.done + conversation.item.done) * 2
+ // + response.done = 1 + 10 + 1 = 12 events
+ // Total: 2 (session.created + item.created) + 12 = 14
+ const allRaw = await ws.waitForMessages(14);
const responseEvents = parseEvents(allRaw.slice(2));
const types = responseEvents.map((e) => e.type);
@@ -546,7 +572,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "truncate-rt"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -557,7 +583,7 @@ describe("WebSocket /v1/realtime", () => {
await new Promise((r) => setTimeout(r, 50));
// The connection was destroyed, so whatever messages arrived should NOT include response.done
- // We got at least session.created + conversation.item.created = 2 before the response
+ // We got at least session.created + conversation.item.added = 2 before the response
const raw = await ws.waitForMessages(2).catch(() => [] as string[]);
if (raw.length > 2) {
const responseEvents = parseEvents(raw.slice(2));
@@ -580,7 +606,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "truncate-journal-rt"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -612,7 +638,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "truncate-tool-rt"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -642,7 +668,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "disconnect-rt"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -709,7 +735,7 @@ describe("WebSocket /v1/realtime", () => {
const raw = await ws.waitForMessages(2);
const event = JSON.parse(raw[1]) as WSEvent;
- expect(event.type).toBe("conversation.item.created");
+ expect(event.type).toBe("conversation.item.added");
const item = event.item as Record;
expect(item.id).toBeDefined();
expect((item.id as string).startsWith("item_")).toBe(true);
@@ -741,8 +767,11 @@ describe("WebSocket /v1/realtime", () => {
expect(session.object).toBe("realtime.session");
expect(typeof session.expires_at).toBe("number");
expect(session.max_response_output_tokens).toBe("inf");
- expect(session.input_audio_transcription).toBeNull();
expect(session.tool_choice).toBe("auto");
+ expect(session.type).toBe("conversation");
+ // GA nested audio config
+ const audio = session.audio as Record;
+ expect(audio).toBeDefined();
ws.close();
});
@@ -762,7 +791,7 @@ describe("WebSocket /v1/realtime", () => {
const raw = await ws.waitForMessages(2);
const event = JSON.parse(raw[1]) as WSEvent;
// The unknown message is silently ignored, so next message is the item.created
- expect(event.type).toBe("conversation.item.created");
+ expect(event.type).toBe("conversation.item.added");
ws.close();
});
@@ -780,26 +809,26 @@ describe("WebSocket /v1/realtime", () => {
// Add function_call item
ws.send(functionCallItem("get_weather", "call_123", '{"city":"NYC"}'));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
// Add function_call_output item
ws.send(functionCallOutputItem("call_123", "Sunny, 72F"));
- await ws.waitForMessages(3); // + conversation.item.created
+ await ws.waitForMessages(3); // + conversation.item.added
ws.send(responseCreate());
// Text response: response.created + output_item.added + content_part.added
- // + text.delta(s) + text.done + content_part.done + output_item.done + response.done
- // "Tool result processed" = 21 chars / chunkSize 20 = 2 deltas = 9 events
- // Total: 3 + 9 = 12
- const allRaw = await ws.waitForMessages(12);
+ // + text.delta(s) + text.done + content_part.done + output_item.done + conversation.item.done + response.done
+ // "Tool result processed" = 21 chars / chunkSize 20 = 2 deltas = 10 events
+ // Total: 3 + 10 = 13
+ const allRaw = await ws.waitForMessages(13);
const responseEvents = parseEvents(allRaw.slice(3));
const types = responseEvents.map((e) => e.type);
expect(types[0]).toBe("response.created");
expect(types[types.length - 1]).toBe("response.done");
// Verify text deltas reconstruct correctly
- const deltas = responseEvents.filter((e) => e.type === "response.text.delta");
+ const deltas = responseEvents.filter((e) => e.type === "response.output_text.delta");
const fullText = deltas.map((d) => d.delta).join("");
expect(fullText).toBe("Tool result processed");
@@ -814,16 +843,16 @@ describe("WebSocket /v1/realtime", () => {
// Add system message item
ws.send(systemMessageItem("You are a helpful assistant"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
// Add user message
ws.send(conversationItemCreate("user", "hello"));
- await ws.waitForMessages(3); // + conversation.item.created
+ await ws.waitForMessages(3); // + conversation.item.added
ws.send(responseCreate());
- // Wait for text response
- const allRaw = await ws.waitForMessages(11);
+ // Wait for text response (9 response events + 3 initial = 12)
+ const allRaw = await ws.waitForMessages(12);
const responseEvents = parseEvents(allRaw.slice(3));
expect(responseEvents[0].type).toBe("response.created");
expect(responseEvents[responseEvents.length - 1].type).toBe("response.done");
@@ -838,7 +867,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "unknown-no-match"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -857,12 +886,12 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(2); // + session.updated
ws.send(conversationItemCreate("user", "hello"));
- await ws.waitForMessages(3); // + conversation.item.created
+ await ws.waitForMessages(3); // + conversation.item.added
ws.send(responseCreate());
- // Wait for text response
- const allRaw = await ws.waitForMessages(11);
+ // Wait for text response (9 response events + 3 initial = 12)
+ const allRaw = await ws.waitForMessages(12);
const responseEvents = parseEvents(allRaw.slice(3));
expect(responseEvents[0].type).toBe("response.created");
expect(responseEvents[responseEvents.length - 1].type).toBe("response.done");
@@ -878,24 +907,24 @@ describe("WebSocket /v1/realtime", () => {
// First conversation turn
ws.send(conversationItemCreate("user", "hello"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
- // Wait for full text response (8 events) => total 10
- await ws.waitForMessages(10);
+ // Wait for full text response (9 events) => total 11
+ await ws.waitForMessages(11);
// Second conversation turn — add another user message
ws.send(conversationItemCreate("user", "weather"));
- // + conversation.item.created => total 11
- await ws.waitForMessages(11);
+ // + conversation.item.added => total 12
+ await ws.waitForMessages(12);
ws.send(responseCreate());
- // Tool call response (6 events) => total 17
- const allRaw = await ws.waitForMessages(17);
- const secondResponseEvents = parseEvents(allRaw.slice(11));
+ // Tool call response (7 events) => total 19
+ const allRaw = await ws.waitForMessages(19);
+ const secondResponseEvents = parseEvents(allRaw.slice(12));
const types = secondResponseEvents.map((e) => e.type);
expect(types[0]).toBe("response.created");
@@ -922,7 +951,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "error-no-status-rt"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -947,7 +976,7 @@ describe("WebSocket /v1/realtime", () => {
await ws.waitForMessages(1); // session.created
ws.send(conversationItemCreate("user", "weird-response-rt"));
- await ws.waitForMessages(2); // + conversation.item.created
+ await ws.waitForMessages(2); // + conversation.item.added
ws.send(responseCreate());
@@ -960,92 +989,1113 @@ describe("WebSocket /v1/realtime", () => {
ws.close();
});
-});
-// ─── Unit tests: realtimeItemsToMessages ─────────────────────────────────────
+ // ── GA session config tests ───────────────────────────────────────────
+ it("session.update accepts reasoning.effort", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
-describe("realtimeItemsToMessages", () => {
- it("converts message items with all role types", () => {
- const items = [
- { type: "message" as const, role: "user" as const, content: [{ type: "text", text: "hi" }] },
- {
- type: "message" as const,
- role: "assistant" as const,
- content: [{ type: "text", text: "hello" }],
- },
- {
- type: "message" as const,
- role: "system" as const,
- content: [{ type: "text", text: "you are helpful" }],
- },
- ];
+ // Skip session.created
+ await ws.waitForMessages(1);
- const messages = realtimeItemsToMessages(items);
- expect(messages).toEqual([
- { role: "user", content: "hi" },
- { role: "assistant", content: "hello" },
- { role: "system", content: "you are helpful" },
- ]);
+ ws.send(
+ JSON.stringify({
+ type: "session.update",
+ session: { reasoning: { effort: "high" } },
+ }),
+ );
+
+ const raw = await ws.waitForMessages(2);
+ const event = JSON.parse(raw[1]) as WSEvent;
+ expect(event.type).toBe("session.updated");
+ const session = event.session as Record;
+ expect(session.reasoning).toEqual({ effort: "high" });
+
+ ws.close();
});
- it("adds system message when instructions provided", () => {
- const items = [
- { type: "message" as const, role: "user" as const, content: [{ type: "text", text: "hi" }] },
- ];
- const messages = realtimeItemsToMessages(items, "Be helpful");
- expect(messages[0]).toEqual({ role: "system", content: "Be helpful" });
- expect(messages[1]).toEqual({ role: "user", content: "hi" });
+ it("session.update accepts input_audio_noise_reduction via nested audio config", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ // Skip session.created
+ await ws.waitForMessages(1);
+
+ ws.send(
+ JSON.stringify({
+ type: "session.update",
+ session: {
+ audio: { input_audio_noise_reduction: { type: "near_field" } },
+ },
+ }),
+ );
+
+ const raw = await ws.waitForMessages(2);
+ const event = JSON.parse(raw[1]) as WSEvent;
+ expect(event.type).toBe("session.updated");
+ const session = event.session as Record;
+ const audio = session.audio as Record;
+ expect(audio.input_audio_noise_reduction).toEqual({ type: "near_field" });
+
+ ws.close();
});
- it("converts function_call items with fallback for missing name", () => {
- const mockLogger = { warn: () => {}, error: () => {}, info: () => {}, debug: () => {} };
- const items = [
- {
- type: "function_call" as const,
- call_id: "call_123",
- arguments: '{"q":"test"}',
- // name is missing
- },
- ];
- // eslint-disable-next-line @typescript-eslint/no-explicit-any
- const messages = realtimeItemsToMessages(items, undefined, mockLogger as any);
- expect(messages.length).toBe(1);
- expect(messages[0].role).toBe("assistant");
- expect(messages[0].tool_calls![0].id).toBe("call_123");
- expect(messages[0].tool_calls![0].function.name).toBe("");
- expect(messages[0].tool_calls![0].function.arguments).toBe('{"q":"test"}');
+ it("session.update accepts input_audio_transcription via nested audio config", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ // Skip session.created
+ await ws.waitForMessages(1);
+
+ ws.send(
+ JSON.stringify({
+ type: "session.update",
+ session: {
+ audio: { input_audio_transcription: { model: "whisper-1" } },
+ },
+ }),
+ );
+
+ const raw = await ws.waitForMessages(2);
+ const event = JSON.parse(raw[1]) as WSEvent;
+ expect(event.type).toBe("session.updated");
+ const session = event.session as Record;
+ const audio = session.audio as Record;
+ expect(audio.input_audio_transcription).toEqual({ model: "whisper-1" });
+
+ ws.close();
});
- it("converts function_call items with auto-generated call_id and empty arguments", () => {
- const items = [
- {
- type: "function_call" as const,
- name: "search",
- // call_id and arguments missing
- },
- ];
- const messages = realtimeItemsToMessages(items);
- expect(messages.length).toBe(1);
- expect(messages[0].tool_calls![0].id).toMatch(/^call_/);
- expect(messages[0].tool_calls![0].function.name).toBe("search");
- expect(messages[0].tool_calls![0].function.arguments).toBe("");
+ it("session.update accepts GA nested audio config (voice, formats)", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ // Skip session.created
+ await ws.waitForMessages(1);
+
+ ws.send(
+ JSON.stringify({
+ type: "session.update",
+ session: {
+ audio: {
+ voice: "alloy",
+ input_audio_format: "pcm16",
+ output_audio_format: "pcm16",
+ },
+ modalities: ["text", "audio"],
+ },
+ }),
+ );
+
+ const raw = await ws.waitForMessages(2);
+ const event = JSON.parse(raw[1]) as WSEvent;
+ expect(event.type).toBe("session.updated");
+ const session = event.session as Record;
+ const audio = session.audio as Record;
+ expect(audio.voice).toBe("alloy");
+ expect(audio.input_audio_format).toBe("pcm16");
+ expect(audio.output_audio_format).toBe("pcm16");
+ expect(session.modalities).toEqual(["text", "audio"]);
+
+ ws.close();
});
- it("converts function_call_output items with fallback for missing output", () => {
- const mockLogger = { warn: () => {}, error: () => {}, info: () => {}, debug: () => {} };
- const items = [
- {
- type: "function_call_output" as const,
- call_id: "call_456",
- // output is missing
+ it("reasoning persists across session.update calls", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ // Skip session.created
+ await ws.waitForMessages(1);
+
+ // First update: set reasoning
+ ws.send(
+ JSON.stringify({
+ type: "session.update",
+ session: { reasoning: { effort: "high" } },
+ }),
+ );
+
+ const raw1 = await ws.waitForMessages(2);
+ const event1 = JSON.parse(raw1[1]) as WSEvent;
+ expect((event1.session as Record).reasoning).toEqual({ effort: "high" });
+
+ // Second update: change something else, reasoning should persist
+ ws.send(sessionUpdate({ instructions: "Be helpful" }));
+
+ const raw2 = await ws.waitForMessages(3);
+ const event2 = JSON.parse(raw2[2]) as WSEvent;
+ expect(event2.type).toBe("session.updated");
+ const session2 = event2.session as Record;
+ expect(session2.reasoning).toEqual({ effort: "high" });
+ expect(session2.instructions).toBe("Be helpful");
+
+ // Third update: clear reasoning
+ ws.send(
+ JSON.stringify({
+ type: "session.update",
+ session: { reasoning: null },
+ }),
+ );
+
+ const raw3 = await ws.waitForMessages(4);
+ const event3 = JSON.parse(raw3[3]) as WSEvent;
+ expect((event3.session as Record).reasoning).toBeNull();
+
+ ws.close();
+ });
+
+ // ── Image input tests ────────────────────────────────────────────────
+ it("accepts input_image content in conversation.item.create", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(
+ JSON.stringify({
+ type: "conversation.item.create",
+ item: {
+ type: "message",
+ role: "user",
+ content: [
+ { type: "input_text", text: "What is in this image?" },
+ { type: "input_image", url: "https://example.com/photo.jpg" },
+ ],
+ },
+ }),
+ );
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ expect(event.type).toBe("conversation.item.added");
+ const item = event.item as Record;
+ const content = item.content as Array>;
+ expect(content).toHaveLength(2);
+ expect(content[0].type).toBe("input_text");
+ expect(content[0].text).toBe("What is in this image?");
+ expect(content[1].type).toBe("input_image");
+ expect(content[1].url).toBe("https://example.com/photo.jpg");
+
+ ws.close();
+ });
+
+ it("maps input_image to ChatMessage image_url format for fixture matching", async () => {
+ // Use a predicate to verify the ChatMessage structure produced by realtimeItemsToMessages
+ let capturedMessages: unknown[] | null = null;
+ const imageFixture: Fixture = {
+ match: {
+ predicate: (req) => {
+ capturedMessages = req.messages;
+ // Match any request so we get a response
+ const lastUser = req.messages.filter((m) => m.role === "user").pop();
+ return !!lastUser;
+ },
},
- ];
- // eslint-disable-next-line @typescript-eslint/no-explicit-any
- const messages = realtimeItemsToMessages(items, undefined, mockLogger as any);
- expect(messages.length).toBe(1);
- expect(messages[0].role).toBe("tool");
- expect(messages[0].content).toBe("");
- expect(messages[0].tool_call_id).toBe("call_456");
+ response: { content: "I see a cat." },
+ };
+ instance = await createServer([imageFixture]);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(
+ JSON.stringify({
+ type: "conversation.item.create",
+ item: {
+ type: "message",
+ role: "user",
+ content: [
+ { type: "input_text", text: "What is in this image?" },
+ { type: "input_image", url: "https://example.com/photo.jpg" },
+ ],
+ },
+ }),
+ );
+ await ws.waitForMessages(2); // + conversation.item.added
+
+ ws.send(responseCreate());
+
+ // Text response: response.created + output_item.added + content_part.added
+ // + output_text.delta + output_text.done + content_part.done + output_item.done + response.done
+ const allRaw = await ws.waitForMessages(10);
+ const responseEvents = parseEvents(allRaw.slice(2));
+ const textDelta = responseEvents.find((e) => e.type === "response.output_text.delta");
+ expect(textDelta).toBeDefined();
+ expect(textDelta!.delta).toContain("I see a cat.");
+
+ // Verify the ChatMessage structure passed to fixture matching
+ expect(capturedMessages).not.toBeNull();
+ const userMsg = (capturedMessages as Record[]).find((m) => m.role === "user");
+ expect(userMsg).toBeDefined();
+ // After mapping, content should be an array with text + image_url parts
+ const content = userMsg!.content as Array>;
+ expect(Array.isArray(content)).toBe(true);
+ expect(content).toHaveLength(2);
+ expect(content[0]).toEqual({ type: "text", text: "What is in this image?" });
+ expect(content[1]).toEqual({
+ type: "image_url",
+ image_url: { url: "https://example.com/photo.jpg" },
+ });
+
+ ws.close();
+ });
+
+ // ── Beta shim tests ──────────────────────────────────────────────────
+ it("emits Beta event names when OpenAI-Beta header is present", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+
+ // First message: session.created
+ const raw = await ws.waitForMessages(1);
+ const session = parseEvents(raw)[0];
+ expect(session.type).toBe("session.created");
+
+ // Beta: flat session config (no nested audio)
+ const sess = session.session as Record;
+ expect(sess.voice).toBeDefined();
+ expect(sess.audio).toBeUndefined();
+ expect(sess.type).toBeUndefined();
+ expect(sess.reasoning).toBeUndefined();
+
+ // Send conversation item and response
+ ws.send(conversationItemCreate("user", "hello"));
+
+ const ackRaw = await ws.waitForMessages(2);
+ const ackEvent = parseEvents(ackRaw.slice(1))[0];
+ // Beta: conversation.item.created (not .added)
+ expect(ackEvent.type).toBe("conversation.item.created");
+
+ ws.send(responseCreate());
+
+ // Wait for full text response
+ const allRaw = await ws.waitForMessages(10);
+ const responseEvents = parseEvents(allRaw.slice(2));
+ const types = responseEvents.map((e) => e.type);
+
+ // Beta event names
+ expect(types).toContain("response.text.delta"); // not output_text
+ expect(types).toContain("response.text.done"); // not output_text
+ expect(types).not.toContain("response.output_text.delta");
+ expect(types).not.toContain("response.output_text.done");
+
+ // Beta content type: "text" not "output_text"
+ const contentPartAdded = responseEvents.find((e) => e.type === "response.content_part.added");
+ expect((contentPartAdded!.part as Record).type).toBe("text");
+
+ ws.close();
+ });
+
+ // ── Translate/Whisper session types + audio buffer ─────────────────────
+ it("accepts transcription session type and acknowledges audio buffer commit", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ // Skip session.created
+ await ws.waitForMessages(1);
+
+ // Update session to transcription mode with whisper model
+ ws.send(sessionUpdate({ type: "transcription", model: "gpt-realtime-whisper" }));
+
+ const updateRaw = await ws.waitForMessages(2);
+ const updateEvent = parseEvents(updateRaw.slice(1))[0];
+ expect(updateEvent.type).toBe("session.updated");
+ expect((updateEvent.session as Record).type).toBe("transcription");
+ expect((updateEvent.session as Record).model).toBe("gpt-realtime-whisper");
+
+ // Send audio buffer messages
+ ws.send(JSON.stringify({ type: "input_audio_buffer.append", audio: "base64data" }));
+ ws.send(JSON.stringify({ type: "input_audio_buffer.commit" }));
+
+ // Should get input_audio_buffer.committed + conversation.item.added (placeholder)
+ const audioRaw = await ws.waitForMessages(4);
+ const audioEvents = parseEvents(audioRaw.slice(2));
+ const types = audioEvents.map((e) => e.type);
+ expect(types).toContain("input_audio_buffer.committed");
+ expect(types).toContain("conversation.item.added");
+
+ // The placeholder item should have input_audio content
+ const itemAdded = audioEvents.find((e) => e.type === "conversation.item.added");
+ const item = itemAdded!.item as Record;
+ expect(item.role).toBe("user");
+ const content = item.content as Array>;
+ expect(content[0].type).toBe("input_audio");
+ expect(content[0].transcript).toBeNull();
+
+ ws.close();
+ });
+
+ it("input_audio_buffer.append is silently accepted", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ // Send append — should be silently accepted (no event emitted)
+ ws.send(JSON.stringify({ type: "input_audio_buffer.append", audio: "base64data" }));
+
+ // Send a known message to verify processing continues
+ ws.send(conversationItemCreate("user", "hello"));
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ // The append was silent, so next event is from the conversation.item.create
+ expect(event.type).toBe("conversation.item.added");
+
+ ws.close();
+ });
+
+ it("input_audio_buffer.clear emits cleared event", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(JSON.stringify({ type: "input_audio_buffer.clear" }));
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ expect(event.type).toBe("input_audio_buffer.cleared");
+
+ ws.close();
+ });
+
+ it("input_audio_buffer.commit in conversation mode does not add placeholder item", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ // In default conversation mode, commit should only emit committed, no item
+ ws.send(JSON.stringify({ type: "input_audio_buffer.commit" }));
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ expect(event.type).toBe("input_audio_buffer.committed");
+
+ // Send another message to verify no extra events were emitted
+ ws.send(conversationItemCreate("user", "hello"));
+ const raw2 = await ws.waitForMessages(3);
+ const event2 = parseEvents(raw2.slice(2))[0];
+ expect(event2.type).toBe("conversation.item.added");
+
+ ws.close();
+ });
+
+ it("accepts translation session type", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(sessionUpdate({ type: "translation", model: "gpt-realtime-translate" }));
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ expect(event.type).toBe("session.updated");
+ expect((event.session as Record).type).toBe("translation");
+ expect((event.session as Record).model).toBe("gpt-realtime-translate");
+
+ ws.close();
+ });
+
+ it("rejects invalid session type + model combination (transcription with wrong model)", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(sessionUpdate({ type: "transcription", model: "gpt-realtime-2" }));
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ expect(event.type).toBe("error");
+ const error = event.error as Record;
+ expect(error.type).toBe("invalid_request_error");
+ expect(error.code).toBe("invalid_session_config");
+ expect(error.message).toContain("transcription");
+
+ ws.close();
+ });
+
+ it("rejects invalid session type + model combination (translation with wrong model)", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(sessionUpdate({ type: "translation", model: "gpt-realtime-2" }));
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ expect(event.type).toBe("error");
+ const error = event.error as Record;
+ expect(error.type).toBe("invalid_request_error");
+ expect(error.code).toBe("invalid_session_config");
+ expect(error.message).toContain("translation");
+
+ ws.close();
+ });
+
+ it("audio buffer commit in translation mode adds placeholder conversation item", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(sessionUpdate({ type: "translation", model: "gpt-realtime-translate" }));
+ await ws.waitForMessages(2); // session.updated
+
+ ws.send(JSON.stringify({ type: "input_audio_buffer.commit" }));
+
+ // Should get committed + conversation.item.added (placeholder)
+ const raw = await ws.waitForMessages(4);
+ const events = parseEvents(raw.slice(2));
+ const types = events.map((e) => e.type);
+ expect(types).toContain("input_audio_buffer.committed");
+ expect(types).toContain("conversation.item.added");
+
+ ws.close();
+ });
+
+ // ── conversation.item.done tests ────────────────────────────────────
+ it("emits conversation.item.done after response completes", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(2); // + conversation.item.added
+
+ ws.send(responseCreate());
+
+ // Text response events + conversation.item.done = 9 events after item.added
+ // response.created + output_item.added + content_part.added + delta(s) + output_text.done
+ // + content_part.done + output_item.done + conversation.item.done + response.done
+ const allRaw = await ws.waitForMessages(11);
+ const responseEvents = parseEvents(allRaw.slice(2));
+ const types = responseEvents.map((e) => e.type);
+
+ // conversation.item.done should appear after response.output_item.done
+ const outputItemDoneIdx = types.lastIndexOf("response.output_item.done");
+ const itemDoneIdx = types.indexOf("conversation.item.done");
+ expect(itemDoneIdx).toBeGreaterThan(-1);
+ expect(itemDoneIdx).toBeGreaterThan(outputItemDoneIdx);
+
+ const itemDone = responseEvents[itemDoneIdx];
+ const item = itemDone.item as Record;
+ expect(item.id).toBeDefined();
+ expect((item.id as string).startsWith("item_")).toBe(true);
+ expect(item.type).toBe("message");
+ expect(item.role).toBe("assistant");
+ expect(item.status).toBe("completed");
+ expect(Array.isArray(item.content)).toBe(true);
+
+ ws.close();
+ });
+
+ // ── response.cancel tests ───────────────────────────────────────────
+ it("handles response.cancel by emitting response.cancelled", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ // Send response.cancel (no active response needed — aimock just acknowledges)
+ ws.send(JSON.stringify({ type: "response.cancel" }));
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ expect(event.type).toBe("response.cancelled");
+
+ ws.close();
+ });
+
+ // ── Beta suppression of conversation.item.done ──────────────────────
+ it("suppresses conversation.item.done in Beta mode", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(2); // + conversation.item.created (Beta name)
+
+ ws.send(responseCreate());
+
+ // Wait for response events — Beta does not include conversation.item.done
+ const allRaw = await ws.waitForMessages(10);
+ const responseEvents = parseEvents(allRaw.slice(2));
+ const types = responseEvents.map((e) => e.type);
+
+ expect(types).not.toContain("conversation.item.done");
+
+ ws.close();
+ });
+
+ // ── GA model acceptance tests ───────────────────────────────────────────
+ it.each([
+ "gpt-realtime-2",
+ "gpt-realtime-1.5",
+ "gpt-realtime-mini",
+ "gpt-realtime-translate",
+ "gpt-realtime-whisper",
+ ])("accepts GA model %s via query parameter", async (model) => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, `/v1/realtime?model=${model}`);
+
+ const raw = await ws.waitForMessages(1);
+ const event = parseEvents(raw)[0];
+ expect(event.type).toBe("session.created");
+ const session = event.session as Record;
+ expect(session.model).toBe(model);
+
+ ws.close();
+ });
+
+ it.each(["gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview"])(
+ "accepts legacy model %s via query parameter",
+ async (model) => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, `/v1/realtime?model=${model}`);
+
+ const raw = await ws.waitForMessages(1);
+ const event = parseEvents(raw)[0];
+ expect(event.type).toBe("session.created");
+ const session = event.session as Record;
+ expect(session.model).toBe(model);
+
+ ws.close();
+ },
+ );
+
+ // ── endpointType routing tests ──────────────────────────────────────────
+ it("sets _endpointType to realtime for default conversation sessions", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(2); // + conversation.item.added
+
+ ws.send(responseCreate());
+
+ // Wait for full text response
+ await ws.waitForMessages(10);
+ await new Promise((r) => setTimeout(r, 50));
+
+ const entry = instance.journal.getLast();
+ expect(entry).not.toBeNull();
+ expect(entry!.body._endpointType).toBe("realtime");
+
+ ws.close();
+ });
+
+ it("sets _endpointType to realtime-transcription for transcription sessions", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-whisper");
+
+ await ws.waitForMessages(1); // session.created
+
+ // Update session to transcription type
+ ws.send(sessionUpdate({ type: "transcription" }));
+ await ws.waitForMessages(2); // + session.updated
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(3); // + conversation.item.added
+
+ ws.send(responseCreate());
+
+ // Response events (no match in non-strict = 2 events: response.created + response.done)
+ await ws.waitForMessages(5);
+ await new Promise((r) => setTimeout(r, 50));
+
+ const entry = instance.journal.getLast();
+ expect(entry).not.toBeNull();
+ expect(entry!.body._endpointType).toBe("realtime-transcription");
+
+ ws.close();
+ });
+
+ it("sets _endpointType to realtime-translation for translation sessions", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-translate");
+
+ await ws.waitForMessages(1); // session.created
+
+ // Update session to translation type
+ ws.send(sessionUpdate({ type: "translation" }));
+ await ws.waitForMessages(2); // + session.updated
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(3); // + conversation.item.added
+
+ ws.send(responseCreate());
+
+ // Response events
+ await ws.waitForMessages(5);
+ await new Promise((r) => setTimeout(r, 50));
+
+ const entry = instance.journal.getLast();
+ expect(entry).not.toBeNull();
+ expect(entry!.body._endpointType).toBe("realtime-translation");
+
+ ws.close();
+ });
+
+ // ── Commentary phase tests ──────────────────────────────────────────────
+ it("emits phase: final_answer on output_item.added and output_item.done for text response", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(2); // + conversation.item.added
+
+ ws.send(responseCreate());
+
+ // Text response: 9 events after item.added
+ const allRaw = await ws.waitForMessages(11);
+ const responseEvents = parseEvents(allRaw.slice(2));
+
+ const outputItemAdded = responseEvents.find(
+ (e) =>
+ e.type === "response.output_item.added" &&
+ (e.item as Record).type === "message",
+ );
+ expect(outputItemAdded).toBeDefined();
+ expect((outputItemAdded!.item as Record).phase).toBe("final_answer");
+
+ const outputItemDone = responseEvents.find(
+ (e) =>
+ e.type === "response.output_item.done" &&
+ (e.item as Record).type === "message",
+ );
+ expect(outputItemDone).toBeDefined();
+ expect((outputItemDone!.item as Record).phase).toBe("final_answer");
+
+ ws.close();
+ });
+
+ it("emits phase: final_answer on output_item.added and output_item.done for tool call response", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "weather"));
+ await ws.waitForMessages(2); // + conversation.item.added
+
+ ws.send(responseCreate());
+
+ // Tool call response: 7 events after item.added
+ const allRaw = await ws.waitForMessages(9);
+ const responseEvents = parseEvents(allRaw.slice(2));
+
+ const outputItemAdded = responseEvents.find(
+ (e) =>
+ e.type === "response.output_item.added" &&
+ (e.item as Record).type === "function_call",
+ );
+ expect(outputItemAdded).toBeDefined();
+ expect((outputItemAdded!.item as Record).phase).toBe("final_answer");
+
+ const outputItemDone = responseEvents.find(
+ (e) =>
+ e.type === "response.output_item.done" &&
+ (e.item as Record).type === "function_call",
+ );
+ expect(outputItemDone).toBeDefined();
+ expect((outputItemDone!.item as Record).phase).toBe("final_answer");
+
+ ws.close();
+ });
+
+ it("emits phase: commentary on text output_item and phase: final_answer on tool call when both present", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "commentary-phase"));
+ await ws.waitForMessages(2); // + conversation.item.added
+
+ ws.send(responseCreate());
+
+ // ContentWithToolCalls: text part + tool call part + all their events
+ // response.created + output_item.added(text) + content_part.added + delta(s) + output_text.done
+ // + content_part.done + output_item.done(text) + conversation.item.done(text)
+ // + output_item.added(tool) + delta(s) + arguments.done + output_item.done(tool)
+ // + conversation.item.done(tool) + response.done
+ // Text "Let me check the weather for you." = 34 chars / chunkSize 20 = 2 deltas
+ // Tool args '{"city":"NYC"}' = 14 chars / chunkSize 20 = 1 delta
+ // Total response events = 1 + 1 + 1 + 2 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 15
+ // Total messages = 2 (session.created + item.added) + 15 = 17
+ const allRaw = await ws.waitForMessages(17);
+ const responseEvents = parseEvents(allRaw.slice(2));
+
+ // Find text output_item.added
+ const textItemAdded = responseEvents.find(
+ (e) =>
+ e.type === "response.output_item.added" &&
+ (e.item as Record).type === "message",
+ );
+ expect(textItemAdded).toBeDefined();
+ expect((textItemAdded!.item as Record).phase).toBe("commentary");
+
+ // Find text output_item.done
+ const textItemDone = responseEvents.find(
+ (e) =>
+ e.type === "response.output_item.done" &&
+ (e.item as Record).type === "message",
+ );
+ expect(textItemDone).toBeDefined();
+ expect((textItemDone!.item as Record).phase).toBe("commentary");
+
+ // Find tool call output_item.added
+ const toolItemAdded = responseEvents.find(
+ (e) =>
+ e.type === "response.output_item.added" &&
+ (e.item as Record).type === "function_call",
+ );
+ expect(toolItemAdded).toBeDefined();
+ expect((toolItemAdded!.item as Record).phase).toBe("final_answer");
+
+ // Find tool call output_item.done
+ const toolItemDone = responseEvents.find(
+ (e) =>
+ e.type === "response.output_item.done" &&
+ (e.item as Record).type === "function_call",
+ );
+ expect(toolItemDone).toBeDefined();
+ expect((toolItemDone!.item as Record).phase).toBe("final_answer");
+
+ ws.close();
+ });
+
+ // ── Beta content type translation conformance ──────────────────────────
+ it("Beta mode: response.output_item.done translates item.content[].type output_text -> text", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(2); // + conversation.item.created (Beta name)
+
+ ws.send(responseCreate());
+
+ // Beta text response: session.created + item.created + response.created + output_item.added
+ // + content_part.added + text.delta + text.done + content_part.done + output_item.done + response.done
+ const allRaw = await ws.waitForMessages(10);
+ const responseEvents = parseEvents(allRaw.slice(2));
+
+ // Find response.output_item.done
+ const outputItemDone = responseEvents.find((e) => e.type === "response.output_item.done");
+ expect(outputItemDone).toBeDefined();
+ const item = outputItemDone!.item as Record;
+ const content = item.content as Array>;
+ expect(content).toBeDefined();
+ expect(content[0].type).toBe("text"); // not "output_text"
+
+ ws.close();
+ });
+
+ it("Beta mode: response.done translates response.output[].content[].type output_text -> text", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(2); // + conversation.item.created
+
+ ws.send(responseCreate());
+
+ const allRaw = await ws.waitForMessages(10);
+ const responseEvents = parseEvents(allRaw.slice(2));
+
+ // Find response.done
+ const responseDone = responseEvents.find((e) => e.type === "response.done");
+ expect(responseDone).toBeDefined();
+ const resp = responseDone!.response as Record;
+ const output = resp.output as Array>;
+ expect(output).toBeDefined();
+ expect(output.length).toBeGreaterThan(0);
+ const outputContent = output[0].content as Array>;
+ expect(outputContent).toBeDefined();
+ expect(outputContent[0].type).toBe("text"); // not "output_text"
+
+ ws.close();
+ });
+
+ it("Beta mode: output_item events do NOT have phase field", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime?model=gpt-realtime-2", {
+ "OpenAI-Beta": "realtime=v1",
+ });
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(conversationItemCreate("user", "hello"));
+ await ws.waitForMessages(2); // + conversation.item.created
+
+ ws.send(responseCreate());
+
+ const allRaw = await ws.waitForMessages(10);
+ const responseEvents = parseEvents(allRaw.slice(2));
+
+ // Check output_item.added
+ const outputItemAdded = responseEvents.find((e) => e.type === "response.output_item.added");
+ expect(outputItemAdded).toBeDefined();
+ const addedItem = outputItemAdded!.item as Record;
+ expect(addedItem.phase).toBeUndefined();
+
+ // Check output_item.done
+ const outputItemDone = responseEvents.find((e) => e.type === "response.output_item.done");
+ expect(outputItemDone).toBeDefined();
+ const doneItem = outputItemDone!.item as Record;
+ expect(doneItem.phase).toBeUndefined();
+
+ ws.close();
+ });
+
+ // ── Session type validation tests ──────────────────────────────────────
+ it("rejects invalid session.type value", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ ws.send(sessionUpdate({ type: "invalid_type" }));
+
+ const raw = await ws.waitForMessages(2);
+ const event = parseEvents(raw.slice(1))[0];
+ expect(event.type).toBe("error");
+ const error = event.error as Record;
+ expect(error.type).toBe("invalid_request_error");
+ expect(error.code).toBe("invalid_session_config");
+ expect(error.message).toContain("Invalid session type");
+ expect(error.message).toContain("invalid_type");
+
+ ws.close();
+ });
+
+ it("rejected session.update does not corrupt session state", async () => {
+ instance = await createServer(allFixtures);
+ const ws = await connectWebSocket(instance.url, "/v1/realtime");
+
+ await ws.waitForMessages(1); // session.created
+
+ // First, set a known valid state
+ ws.send(sessionUpdate({ instructions: "Be helpful", model: "gpt-realtime-2" }));
+ const raw1 = await ws.waitForMessages(2);
+ const event1 = parseEvents(raw1.slice(1))[0];
+ expect(event1.type).toBe("session.updated");
+ expect((event1.session as Record).model).toBe("gpt-realtime-2");
+
+ // Now send an invalid model+type combination that should be rejected
+ ws.send(sessionUpdate({ type: "transcription", model: "gpt-realtime-2" }));
+ const raw2 = await ws.waitForMessages(3);
+ const event2 = parseEvents(raw2.slice(2))[0];
+ expect(event2.type).toBe("error");
+
+ // Verify state was rolled back by sending another valid update and checking the echoed state
+ ws.send(sessionUpdate({ instructions: "Updated instructions" }));
+ const raw3 = await ws.waitForMessages(4);
+ const event3 = parseEvents(raw3.slice(3))[0];
+ expect(event3.type).toBe("session.updated");
+ const session = event3.session as Record;
+ // Model and type should still be the pre-rejection values
+ expect(session.model).toBe("gpt-realtime-2");
+ expect(session.type).toBe("conversation");
+ expect(session.instructions).toBe("Updated instructions");
+
+ ws.close();
+ });
+});
+
+// ─── Unit tests: realtimeItemsToMessages ─────────────────────────────────────
+
+describe("realtimeItemsToMessages", () => {
+ it("converts message items with all role types", () => {
+ const items = [
+ { type: "message" as const, role: "user" as const, content: [{ type: "text", text: "hi" }] },
+ {
+ type: "message" as const,
+ role: "assistant" as const,
+ content: [{ type: "text", text: "hello" }],
+ },
+ {
+ type: "message" as const,
+ role: "system" as const,
+ content: [{ type: "text", text: "you are helpful" }],
+ },
+ ];
+
+ const messages = realtimeItemsToMessages(items);
+ expect(messages).toEqual([
+ { role: "user", content: "hi" },
+ { role: "assistant", content: "hello" },
+ { role: "system", content: "you are helpful" },
+ ]);
+ });
+
+ it("adds system message when instructions provided", () => {
+ const items = [
+ { type: "message" as const, role: "user" as const, content: [{ type: "text", text: "hi" }] },
+ ];
+ const messages = realtimeItemsToMessages(items, "Be helpful");
+ expect(messages[0]).toEqual({ role: "system", content: "Be helpful" });
+ expect(messages[1]).toEqual({ role: "user", content: "hi" });
+ });
+
+ it("converts function_call items with fallback for missing name", () => {
+ const mockLogger = { warn: () => {}, error: () => {}, info: () => {}, debug: () => {} };
+ const items = [
+ {
+ type: "function_call" as const,
+ call_id: "call_123",
+ arguments: '{"q":"test"}',
+ // name is missing
+ },
+ ];
+ // eslint-disable-next-line @typescript-eslint/no-explicit-any
+ const messages = realtimeItemsToMessages(items, undefined, mockLogger as any);
+ expect(messages.length).toBe(1);
+ expect(messages[0].role).toBe("assistant");
+ expect(messages[0].tool_calls![0].id).toBe("call_123");
+ expect(messages[0].tool_calls![0].function.name).toBe("");
+ expect(messages[0].tool_calls![0].function.arguments).toBe('{"q":"test"}');
+ });
+
+ it("converts function_call items with auto-generated call_id and empty arguments", () => {
+ const items = [
+ {
+ type: "function_call" as const,
+ name: "search",
+ // call_id and arguments missing
+ },
+ ];
+ const messages = realtimeItemsToMessages(items);
+ expect(messages.length).toBe(1);
+ expect(messages[0].tool_calls![0].id).toMatch(/^call_/);
+ expect(messages[0].tool_calls![0].function.name).toBe("search");
+ expect(messages[0].tool_calls![0].function.arguments).toBe("");
+ });
+
+ it("converts function_call_output items with fallback for missing output", () => {
+ const mockLogger = { warn: () => {}, error: () => {}, info: () => {}, debug: () => {} };
+ const items = [
+ {
+ type: "function_call_output" as const,
+ call_id: "call_456",
+ // output is missing
+ },
+ ];
+ // eslint-disable-next-line @typescript-eslint/no-explicit-any
+ const messages = realtimeItemsToMessages(items, undefined, mockLogger as any);
+ expect(messages.length).toBe(1);
+ expect(messages[0].role).toBe("tool");
+ expect(messages[0].content).toBe("");
+ expect(messages[0].tool_call_id).toBe("call_456");
+ });
+
+ it("maps input_text content parts to text format", () => {
+ const items = [
+ {
+ type: "message" as const,
+ role: "user" as const,
+ content: [{ type: "input_text", text: "hello world" }],
+ },
+ ];
+ const messages = realtimeItemsToMessages(items);
+ expect(messages).toEqual([{ role: "user", content: [{ type: "text", text: "hello world" }] }]);
+ });
+
+ it("maps input_image content parts to image_url format", () => {
+ const items = [
+ {
+ type: "message" as const,
+ role: "user" as const,
+ content: [
+ { type: "input_text", text: "What is in this image?" },
+ { type: "input_image", url: "https://example.com/photo.jpg" },
+ ],
+ },
+ ];
+ const messages = realtimeItemsToMessages(items);
+ expect(messages).toEqual([
+ {
+ role: "user",
+ content: [
+ { type: "text", text: "What is in this image?" },
+ { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
+ ],
+ },
+ ]);
+ });
+
+ it("maps input_audio content parts to placeholder text", () => {
+ const items = [
+ {
+ type: "message" as const,
+ role: "user" as const,
+ content: [{ type: "input_audio", transcript: null }],
+ },
+ ];
+ const messages = realtimeItemsToMessages(items);
+ expect(messages).toEqual([
+ { role: "user", content: [{ type: "text", text: "[audio input]" }] },
+ ]);
+ });
+
+ it("maps mixed multimodal content (input_text + input_image + input_audio)", () => {
+ const items = [
+ {
+ type: "message" as const,
+ role: "user" as const,
+ content: [
+ { type: "input_text", text: "Describe this" },
+ { type: "input_image", url: "https://example.com/img.png" },
+ { type: "input_audio", transcript: null },
+ ],
+ },
+ ];
+ const messages = realtimeItemsToMessages(items);
+ expect(messages).toEqual([
+ {
+ role: "user",
+ content: [
+ { type: "text", text: "Describe this" },
+ { type: "image_url", image_url: { url: "https://example.com/img.png" } },
+ { type: "text", text: "[audio input]" },
+ ],
+ },
+ ]);
+ });
+
+ it("preserves existing text content format (backward compat)", () => {
+ const items = [
+ {
+ type: "message" as const,
+ role: "user" as const,
+ content: [{ type: "text", text: "hello" }],
+ },
+ ];
+ const messages = realtimeItemsToMessages(items);
+ // Existing format should still extract simple text
+ expect(messages).toEqual([{ role: "user", content: "hello" }]);
});
it("handles message items with missing content", () => {
diff --git a/src/__tests__/ws-test-client.ts b/src/__tests__/ws-test-client.ts
index 5c7a4e65..bbedbdae 100644
--- a/src/__tests__/ws-test-client.ts
+++ b/src/__tests__/ws-test-client.ts
@@ -15,20 +15,29 @@ export interface WSTestClient {
waitForClose(): Promise;
}
-export function connectWebSocket(url: string, path: string): Promise {
+export function connectWebSocket(
+ url: string,
+ path: string,
+ headers?: Record,
+): Promise {
return new Promise((resolve, reject) => {
const parsed = new URL(url);
const socket = net.connect(parseInt(parsed.port), parsed.hostname, () => {
const key = randomBytes(16).toString("base64");
- socket.write(
+ let headerStr =
`GET ${path} HTTP/1.1\r\n` +
- `Host: ${parsed.host}\r\n` +
- `Upgrade: websocket\r\n` +
- `Connection: Upgrade\r\n` +
- `Sec-WebSocket-Key: ${key}\r\n` +
- `Sec-WebSocket-Version: 13\r\n` +
- `\r\n`,
- );
+ `Host: ${parsed.host}\r\n` +
+ `Upgrade: websocket\r\n` +
+ `Connection: Upgrade\r\n` +
+ `Sec-WebSocket-Key: ${key}\r\n` +
+ `Sec-WebSocket-Version: 13\r\n`;
+ if (headers) {
+ for (const [name, value] of Object.entries(headers)) {
+ headerStr += `${name}: ${value}\r\n`;
+ }
+ }
+ headerStr += `\r\n`;
+ socket.write(headerStr);
let handshakeDone = false;
let buffer = Buffer.alloc(0);
diff --git a/src/recorder.ts b/src/recorder.ts
index 83727913..d85cae83 100644
--- a/src/recorder.ts
+++ b/src/recorder.ts
@@ -1161,7 +1161,10 @@ type EndpointType =
| "embedding"
| "audio-gen"
| "fal-audio"
- | "fal";
+ | "fal"
+ | "realtime"
+ | "realtime-transcription"
+ | "realtime-translation";
function buildFixtureMatch(
request: ChatCompletionRequest,
diff --git a/src/router.ts b/src/router.ts
index 12e6b4da..ede7598b 100644
--- a/src/router.ts
+++ b/src/router.ts
@@ -74,7 +74,12 @@ export function matchFixture(
const reqEndpoint = effective._endpointType as string | undefined;
if (match.endpoint !== undefined) {
if (match.endpoint !== reqEndpoint) continue;
- } else if (reqEndpoint && reqEndpoint !== "chat" && reqEndpoint !== "embedding") {
+ } else if (
+ reqEndpoint &&
+ reqEndpoint !== "chat" &&
+ reqEndpoint !== "embedding" &&
+ !reqEndpoint.startsWith("realtime")
+ ) {
// Fixture has no endpoint restriction but request is multimedia —
// only match if the response type is compatible.
// Function responses cannot be checked statically, so treat them as compatible.
diff --git a/src/server.ts b/src/server.ts
index e87afc96..e98c5fd4 100644
--- a/src/server.ts
+++ b/src/server.ts
@@ -2065,7 +2065,7 @@ export async function createServer(
upgradeHeaders: req.headers,
});
} else if (pathname === REALTIME_PATH) {
- const model = parsedUrl.searchParams.get("model") ?? "gpt-4o-realtime";
+ const model = parsedUrl.searchParams.get("model") ?? "gpt-realtime-2";
handleWebSocketRealtime(ws, fixtures, journal, {
...defaults,
model,
diff --git a/src/types.ts b/src/types.ts
index 0c9b21d8..738151cf 100644
--- a/src/types.ts
+++ b/src/types.ts
@@ -98,7 +98,10 @@ export interface FixtureMatch {
| "embedding"
| "audio-gen"
| "fal-audio"
- | "fal";
+ | "fal"
+ | "realtime"
+ | "realtime-transcription"
+ | "realtime-translation";
}
// Fixture response types
@@ -226,6 +229,32 @@ export type FixtureResponse =
| VideoResponse
| RawJSONResponse;
+// GA Realtime session types
+
+export type RealtimePhase = "final_answer" | "commentary";
+
+export interface GASessionAudioConfig {
+ voice: string | null;
+ input_audio_format: string | null;
+ output_audio_format: string | null;
+ input_audio_noise_reduction: { type: string } | null;
+ input_audio_transcription: { model: string } | null;
+}
+
+export interface GASessionConfig {
+ model: string;
+ modalities: string[];
+ instructions: string;
+ tools: unknown[];
+ temperature: number;
+ max_response_output_tokens: number | "inf";
+ audio: GASessionAudioConfig;
+ turn_detection: unknown | null;
+ input_audio_transcription: { model: string } | null;
+ type: "conversation" | "transcription" | "translation";
+ reasoning: { effort: string } | null;
+}
+
// Streaming physics
export interface StreamingProfile {
@@ -353,7 +382,10 @@ export interface FixtureFileEntry {
| "embedding"
| "audio-gen"
| "fal-audio"
- | "fal";
+ | "fal"
+ | "realtime"
+ | "realtime-transcription"
+ | "realtime-translation";
// predicate not supported in JSON files
};
response: FixtureFileResponse;
diff --git a/src/ws-realtime.ts b/src/ws-realtime.ts
index 636a89d1..634d3a5e 100644
--- a/src/ws-realtime.ts
+++ b/src/ws-realtime.ts
@@ -37,7 +37,7 @@ interface RealtimeItem {
type: "message" | "function_call" | "function_call_output";
id?: string;
role?: "user" | "assistant" | "system";
- content?: Array<{ type: string; text?: string }>;
+ content?: Array<{ type: string; text?: string; url?: string; transcript?: string | null }>;
name?: string;
call_id?: string;
arguments?: string;
@@ -52,8 +52,12 @@ interface SessionConfig {
voice: string | null;
input_audio_format: string | null;
output_audio_format: string | null;
+ input_audio_noise_reduction: { type: string } | null;
+ input_audio_transcription: { model: string } | null;
turn_detection: unknown | null;
temperature: number;
+ type: "conversation" | "transcription" | "translation";
+ reasoning: { effort: string } | null;
}
interface RealtimeMessage {
@@ -83,10 +87,38 @@ export function realtimeItemsToMessages(
for (const item of items) {
if (item.type === "message") {
- const text = item.content?.[0]?.text ?? "";
const role =
item.role === "assistant" ? "assistant" : item.role === "system" ? "system" : "user";
- messages.push({ role, content: text });
+
+ // Check if content contains multimodal input types (input_text, input_image, input_audio)
+ const hasMultimodal = item.content?.some(
+ (p) => p.type === "input_text" || p.type === "input_image" || p.type === "input_audio",
+ );
+
+ if (hasMultimodal && item.content) {
+ // Map realtime input content types to ChatMessage content parts
+ const mappedContent = item.content.map((part) => {
+ if (part.type === "input_text") {
+ return { type: "text" as const, text: part.text ?? "" };
+ }
+ if (part.type === "input_image") {
+ return {
+ type: "image_url" as const,
+ image_url: { url: part.url ?? "" },
+ };
+ }
+ if (part.type === "input_audio") {
+ return { type: "text" as const, text: "[audio input]" };
+ }
+ // Pass through unknown content types as-is
+ return part;
+ });
+ messages.push({ role, content: mappedContent });
+ } else {
+ // Existing behavior: extract text from first content element
+ const text = item.content?.[0]?.text ?? "";
+ messages.push({ role, content: text });
+ }
} else if (item.type === "function_call") {
if (!item.name) {
logger?.warn("Realtime function_call item missing 'name'");
@@ -120,18 +152,136 @@ export function realtimeItemsToMessages(
return messages;
}
-// ─── Event builders ─────────────────────────────────────────────────────────
+// ─── GA -> Beta translation ─────────────────────────────────────────────────
+
+/** GA -> Beta event name mapping */
+const GA_TO_BETA_EVENT: Record = {
+ "response.output_text.delta": "response.text.delta",
+ "response.output_text.done": "response.text.done",
+ "response.output_audio.delta": "response.audio.delta",
+ "response.output_audio.done": "response.audio.done",
+ "response.output_audio_transcript.delta": "response.audio_transcript.delta",
+ "response.output_audio_transcript.done": "response.audio_transcript.done",
+ "conversation.item.added": "conversation.item.created",
+};
+
+/** GA -> Beta content type mapping */
+const GA_TO_BETA_CONTENT_TYPE: Record = {
+ output_text: "text",
+ output_audio: "audio",
+};
+
+/** Events suppressed in Beta mode (GA-only events) */
+const BETA_SUPPRESSED_EVENTS = new Set(["conversation.item.done"]);
+
+function translateGAToBeta(event: Record): Record | null {
+ const type = event.type as string;
+ if (BETA_SUPPRESSED_EVENTS.has(type)) return null;
+
+ const translated = { ...event };
+ if (GA_TO_BETA_EVENT[type]) {
+ translated.type = GA_TO_BETA_EVENT[type];
+ }
+
+ // Translate content types in nested structures
+ if (translated.part && typeof translated.part === "object") {
+ const part = { ...(translated.part as Record) };
+ if (typeof part.type === "string" && GA_TO_BETA_CONTENT_TYPE[part.type]) {
+ part.type = GA_TO_BETA_CONTENT_TYPE[part.type];
+ }
+ translated.part = part;
+ }
+ if (translated.content_part && typeof translated.content_part === "object") {
+ const cp = { ...(translated.content_part as Record) };
+ if (typeof cp.type === "string" && GA_TO_BETA_CONTENT_TYPE[cp.type]) {
+ cp.type = GA_TO_BETA_CONTENT_TYPE[cp.type];
+ }
+ translated.content_part = cp;
+ }
+ // Translate content arrays
+ if (Array.isArray(translated.content)) {
+ translated.content = (translated.content as Record[]).map((c) => {
+ if (typeof c.type === "string" && GA_TO_BETA_CONTENT_TYPE[c.type]) {
+ return { ...c, type: GA_TO_BETA_CONTENT_TYPE[c.type] };
+ }
+ return c;
+ });
+ }
+ // Translate item.content arrays (response.output_item.added/done, conversation.item.added)
+ if (translated.item && typeof translated.item === "object") {
+ const item = { ...(translated.item as Record) };
+ delete item.phase; // GA-only field
+ if (Array.isArray(item.content)) {
+ item.content = (item.content as Record[]).map((c) => {
+ if (typeof c.type === "string" && GA_TO_BETA_CONTENT_TYPE[c.type]) {
+ return { ...c, type: GA_TO_BETA_CONTENT_TYPE[c.type] };
+ }
+ return c;
+ });
+ }
+ translated.item = item;
+ }
+ // Translate response.output[].content arrays (response.done)
+ if (translated.response && typeof translated.response === "object") {
+ const resp = { ...(translated.response as Record) };
+ if (Array.isArray(resp.output)) {
+ resp.output = (resp.output as Record[]).map((outItem) => {
+ const o = { ...(outItem as Record) };
+ if (Array.isArray(o.content)) {
+ o.content = (o.content as Record[]).map((c) =>
+ typeof c.type === "string" && GA_TO_BETA_CONTENT_TYPE[c.type]
+ ? { ...c, type: GA_TO_BETA_CONTENT_TYPE[c.type] }
+ : c,
+ );
+ }
+ return o;
+ });
+ }
+ translated.response = resp;
+ }
+
+ // Flatten GA session config for Beta (session.created / session.updated)
+ if (type === "session.created" || type === "session.updated") {
+ if (translated.session && typeof translated.session === "object") {
+ const session = { ...(translated.session as Record) };
+ if (session.audio && typeof session.audio === "object") {
+ const audio = session.audio as Record;
+ session.voice = audio.voice;
+ session.input_audio_format = audio.input_audio_format;
+ session.output_audio_format = audio.output_audio_format;
+ session.input_audio_transcription = audio.input_audio_transcription;
+ delete session.audio;
+ }
+ delete session.type;
+ delete session.reasoning;
+ translated.session = session;
+ }
+ }
+
+ return translated;
+}
+
+// ─── Event sending ──────────────────────────────────────────────────────────
-function evt(type: string, extra: Record = {}): string {
- return JSON.stringify({ type, event_id: realtimeId("event"), ...extra });
+function sendEvent(ws: WebSocketConnection, event: Record, isBeta: boolean): void {
+ const out = { ...event, event_id: event.event_id ?? realtimeId("event") };
+ if (isBeta) {
+ const translated = translateGAToBeta(out);
+ if (translated === null) return; // suppressed in Beta mode
+ ws.send(JSON.stringify(translated));
+ } else {
+ ws.send(JSON.stringify(out));
+ }
}
function buildErrorRealtimeEvent(
+ ws: WebSocketConnection,
message: string,
+ isBeta: boolean,
type = "invalid_request_error",
code?: string,
-): string {
- return evt("error", { error: { message, type, code } });
+): void {
+ sendEvent(ws, { type: "error", error: { message, type, code } }, isBeta);
}
// ─── Main handler ───────────────────────────────────────────────────────────
@@ -154,6 +304,10 @@ export function handleWebSocketRealtime(
const { logger } = defaults;
const sessionId = realtimeId("sess");
+ const isBeta = defaults.upgradeHeaders?.["openai-beta"]
+ ? String(defaults.upgradeHeaders["openai-beta"]).includes("realtime=v1")
+ : false;
+
const session: SessionConfig = {
model: defaults.model,
modalities: ["text"],
@@ -162,44 +316,71 @@ export function handleWebSocketRealtime(
voice: null,
input_audio_format: null,
output_audio_format: null,
+ input_audio_noise_reduction: null,
+ input_audio_transcription: null,
turn_detection: null,
temperature: 0.8,
+ type: "conversation",
+ reasoning: null,
};
const conversationItems: RealtimeItem[] = [];
- // Send session.created immediately on connect
- ws.send(
- evt("session.created", {
+ // Send session.created immediately on connect (GA format — shim flattens for Beta)
+ sendEvent(
+ ws,
+ {
+ type: "session.created",
session: {
id: sessionId,
object: "realtime.session",
- ...session,
+ model: session.model,
expires_at: Math.floor(Date.now() / 1000) + 3600,
- max_response_output_tokens: "inf",
- input_audio_transcription: null,
+ modalities: session.modalities,
+ instructions: session.instructions,
+ tools: session.tools,
tool_choice: "auto",
+ temperature: session.temperature,
+ max_response_output_tokens: "inf",
+ audio: {
+ voice: session.voice,
+ input_audio_format: session.input_audio_format,
+ output_audio_format: session.output_audio_format,
+ input_audio_noise_reduction: session.input_audio_noise_reduction,
+ input_audio_transcription: session.input_audio_transcription,
+ },
+ turn_detection: session.turn_detection,
+ type: session.type,
+ reasoning: session.reasoning,
},
- }),
+ },
+ isBeta,
);
// Serialize message processing to prevent event interleaving
let pending = Promise.resolve();
ws.on("message", (raw: string) => {
pending = pending.then(() =>
- processMessage(raw, ws, fixtures, journal, defaults, session, conversationItems).catch(
- (err: unknown) => {
- const msg = err instanceof Error ? err.message : "Internal error";
- logger.error(`WebSocket realtime error: ${msg}`);
- try {
- ws.send(buildErrorRealtimeEvent(msg, "server_error"));
- } catch (sendErr) {
- defaults.logger.debug(
- `Failed to send error to client: ${sendErr instanceof Error ? sendErr.message : "unknown"}`,
- );
- }
- },
- ),
+ processMessage(
+ raw,
+ ws,
+ fixtures,
+ journal,
+ defaults,
+ session,
+ conversationItems,
+ isBeta,
+ ).catch((err: unknown) => {
+ const msg = err instanceof Error ? err.message : "Internal error";
+ logger.error(`WebSocket realtime error: ${msg}`);
+ try {
+ buildErrorRealtimeEvent(ws, msg, isBeta, "server_error");
+ } catch (sendErr) {
+ defaults.logger.debug(
+ `Failed to send error to client: ${sendErr instanceof Error ? sendErr.message : "unknown"}`,
+ );
+ }
+ }),
);
});
}
@@ -221,14 +402,19 @@ async function processMessage(
},
session: SessionConfig,
conversationItems: RealtimeItem[],
+ isBeta: boolean,
): Promise {
let parsed: RealtimeMessage;
try {
parsed = JSON.parse(raw) as RealtimeMessage;
} catch (parseErr) {
const detail = parseErr instanceof Error ? parseErr.message : "unknown";
- ws.send(
- buildErrorRealtimeEvent(`Malformed JSON: ${detail}`, "invalid_request_error", "invalid_json"),
+ buildErrorRealtimeEvent(
+ ws,
+ `Malformed JSON: ${detail}`,
+ isBeta,
+ "invalid_request_error",
+ "invalid_json",
);
return;
}
@@ -238,33 +424,133 @@ async function processMessage(
// ── session.update ────────────────────────────────────────────────────
if (msgType === "session.update") {
if (parsed.session) {
- if (parsed.session.instructions !== undefined) {
- session.instructions = parsed.session.instructions;
- }
- if (parsed.session.tools !== undefined) {
- session.tools = parsed.session.tools;
+ const s = parsed.session;
+
+ // Validate session.type value before applying any mutations
+ const validTypes = new Set(["conversation", "transcription", "translation"]);
+ if ((s as Record).type !== undefined) {
+ if (!validTypes.has((s as Record).type as string)) {
+ sendEvent(
+ ws,
+ {
+ type: "error",
+ error: {
+ message: `Invalid session type: ${(s as Record).type}`,
+ type: "invalid_request_error",
+ code: "invalid_session_config",
+ },
+ },
+ isBeta,
+ );
+ return;
+ }
}
- if (parsed.session.modalities !== undefined) {
- session.modalities = parsed.session.modalities;
+
+ // Capture pre-mutation values for rollback on validation failure
+ const prevModel = session.model;
+ const prevType = session.type;
+
+ if (s.instructions !== undefined) session.instructions = s.instructions;
+ if (s.tools !== undefined) session.tools = s.tools;
+ if (s.modalities !== undefined) session.modalities = s.modalities;
+ if (s.model !== undefined) session.model = s.model;
+ if (s.temperature !== undefined) session.temperature = s.temperature;
+ if ((s as Record).type !== undefined)
+ session.type = (s as Record).type as SessionConfig["type"];
+ // GA nested audio config
+ if ((s as Record).audio) {
+ const audio = (s as Record).audio as Record;
+ if (audio.voice !== undefined) session.voice = audio.voice as string | null;
+ if (audio.input_audio_format !== undefined)
+ session.input_audio_format = audio.input_audio_format as string | null;
+ if (audio.output_audio_format !== undefined)
+ session.output_audio_format = audio.output_audio_format as string | null;
+ if (audio.input_audio_noise_reduction !== undefined)
+ session.input_audio_noise_reduction = audio.input_audio_noise_reduction as {
+ type: string;
+ } | null;
+ if (audio.input_audio_transcription !== undefined)
+ session.input_audio_transcription = audio.input_audio_transcription as {
+ model: string;
+ } | null;
}
- if (parsed.session.model !== undefined) {
- session.model = parsed.session.model;
+ // Beta flat fields (backward compat)
+ if (s.voice !== undefined) session.voice = s.voice;
+ if (s.input_audio_format !== undefined) session.input_audio_format = s.input_audio_format;
+ if (s.output_audio_format !== undefined) session.output_audio_format = s.output_audio_format;
+ // reasoning config
+ if ((s as Record).reasoning !== undefined)
+ session.reasoning = (s as Record).reasoning as {
+ effort: string;
+ } | null;
+
+ // Validate model+type combinations (rollback on failure)
+ const transcriptionModels = new Set(["gpt-realtime-whisper"]);
+ const translationModels = new Set(["gpt-realtime-translate"]);
+
+ if (session.type === "transcription" && !transcriptionModels.has(session.model)) {
+ session.model = prevModel;
+ session.type = prevType;
+ sendEvent(
+ ws,
+ {
+ type: "error",
+ error: {
+ message: `Model ${s.model ?? prevModel} does not support session type transcription`,
+ type: "invalid_request_error",
+ code: "invalid_session_config",
+ },
+ },
+ isBeta,
+ );
+ return;
}
- if (parsed.session.temperature !== undefined) {
- session.temperature = parsed.session.temperature;
+ if (session.type === "translation" && !translationModels.has(session.model)) {
+ session.model = prevModel;
+ session.type = prevType;
+ sendEvent(
+ ws,
+ {
+ type: "error",
+ error: {
+ message: `Model ${s.model ?? prevModel} does not support session type translation`,
+ type: "invalid_request_error",
+ code: "invalid_session_config",
+ },
+ },
+ isBeta,
+ );
+ return;
}
}
- ws.send(
- evt("session.updated", {
+
+ sendEvent(
+ ws,
+ {
+ type: "session.updated",
session: {
- ...session,
object: "realtime.session",
+ model: session.model,
expires_at: Math.floor(Date.now() / 1000) + 3600,
- max_response_output_tokens: "inf",
- input_audio_transcription: null,
+ modalities: session.modalities,
+ instructions: session.instructions,
+ tools: session.tools,
tool_choice: "auto",
+ temperature: session.temperature,
+ max_response_output_tokens: "inf",
+ audio: {
+ voice: session.voice,
+ input_audio_format: session.input_audio_format,
+ output_audio_format: session.output_audio_format,
+ input_audio_noise_reduction: session.input_audio_noise_reduction,
+ input_audio_transcription: session.input_audio_transcription,
+ },
+ turn_detection: session.turn_detection,
+ type: session.type,
+ reasoning: session.reasoning,
},
- }),
+ },
+ isBeta,
);
return;
}
@@ -272,11 +558,11 @@ async function processMessage(
// ── conversation.item.create ──────────────────────────────────────────
if (msgType === "conversation.item.create") {
if (!parsed.item) {
- ws.send(
- buildErrorRealtimeEvent(
- "Missing 'item' in conversation.item.create",
- "invalid_request_error",
- ),
+ buildErrorRealtimeEvent(
+ ws,
+ "Missing 'item' in conversation.item.create",
+ isBeta,
+ "invalid_request_error",
);
return;
}
@@ -289,7 +575,7 @@ async function processMessage(
? (conversationItems[conversationItems.length - 1].id ?? null)
: null;
conversationItems.push(item);
- ws.send(evt("conversation.item.created", { previous_item_id: previousId, item }));
+ sendEvent(ws, { type: "conversation.item.added", previous_item_id: previousId, item }, isBeta);
return;
}
@@ -302,11 +588,54 @@ async function processMessage(
defaults,
session,
conversationItems,
+ isBeta,
parsed.response,
);
return;
}
+ // ── input_audio_buffer.append ────────────────────────────────────────
+ if (msgType === "input_audio_buffer.append") {
+ // Accept silently — aimock doesn't process actual audio
+ return;
+ }
+
+ // ── input_audio_buffer.commit ──────────────────────────────────────
+ if (msgType === "input_audio_buffer.commit") {
+ sendEvent(ws, { type: "input_audio_buffer.committed" }, isBeta);
+ // In transcription/translation mode, add a placeholder user item
+ if (session.type === "transcription" || session.type === "translation") {
+ const audioItem: RealtimeItem = {
+ type: "message",
+ id: realtimeId("item"),
+ role: "user",
+ content: [{ type: "input_audio", transcript: null }],
+ };
+ conversationItems.push(audioItem);
+ sendEvent(
+ ws,
+ {
+ type: "conversation.item.added",
+ item: audioItem,
+ },
+ isBeta,
+ );
+ }
+ return;
+ }
+
+ // ── input_audio_buffer.clear ───────────────────────────────────────
+ if (msgType === "input_audio_buffer.clear") {
+ sendEvent(ws, { type: "input_audio_buffer.cleared" }, isBeta);
+ return;
+ }
+
+ // ── response.cancel ────────────────────────────────────────────────
+ if (msgType === "response.cancel") {
+ sendEvent(ws, { type: "response.cancelled" }, isBeta);
+ return;
+ }
+
// Unknown message type — ignore silently (matches OpenAI behavior)
}
@@ -326,15 +655,23 @@ async function handleResponseCreate(
},
session: SessionConfig,
conversationItems: RealtimeItem[],
+ isBeta: boolean,
responseOverrides?: { instructions?: string; [key: string]: unknown },
): Promise {
const instructions = (responseOverrides?.instructions ?? session.instructions) || undefined;
const messages = realtimeItemsToMessages(conversationItems, instructions, defaults.logger);
+ const endpointTypeMap: Record = {
+ conversation: "realtime",
+ transcription: "realtime-transcription",
+ translation: "realtime-translation",
+ };
+ const endpointType = endpointTypeMap[session.type] ?? "realtime";
+
const completionReq: ChatCompletionRequest = {
model: session.model,
messages,
- _endpointType: "chat",
+ _endpointType: endpointType,
};
const testId = defaults.testId ?? DEFAULT_TEST_ID;
@@ -379,8 +716,10 @@ async function handleResponseCreate(
},
});
// Send response.created with failed status then response.done with error
- ws.send(
- evt("response.created", {
+ sendEvent(
+ ws,
+ {
+ type: "response.created",
response: {
id: responseId,
object: "realtime.response",
@@ -389,10 +728,13 @@ async function handleResponseCreate(
output: [],
usage: null,
},
- }),
+ },
+ isBeta,
);
- ws.send(
- evt("response.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.done",
response: {
id: responseId,
object: "realtime.response",
@@ -408,7 +750,8 @@ async function handleResponseCreate(
},
usage: { total_tokens: 0, input_tokens: 0, output_tokens: 0 },
},
- }),
+ },
+ isBeta,
);
return;
}
@@ -427,8 +770,10 @@ async function handleResponseCreate(
body: completionReq,
response: { status, fixture },
});
- ws.send(
- evt("response.created", {
+ sendEvent(
+ ws,
+ {
+ type: "response.created",
response: {
id: responseId,
object: "realtime.response",
@@ -437,10 +782,13 @@ async function handleResponseCreate(
output: [],
usage: null,
},
- }),
+ },
+ isBeta,
);
- ws.send(
- evt("response.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.done",
response: {
id: responseId,
object: "realtime.response",
@@ -456,7 +804,8 @@ async function handleResponseCreate(
},
usage: { total_tokens: 0, input_tokens: 0, output_tokens: 0 },
},
- }),
+ },
+ isBeta,
);
return;
}
@@ -472,8 +821,10 @@ async function handleResponseCreate(
});
// response.created
- ws.send(
- evt("response.created", {
+ sendEvent(
+ ws,
+ {
+ type: "response.created",
response: {
id: responseId,
object: "realtime.response",
@@ -482,7 +833,8 @@ async function handleResponseCreate(
output: [],
usage: null,
},
- }),
+ },
+ isBeta,
);
const interruption = createInterruptionSignal(fixture);
@@ -499,12 +851,18 @@ async function handleResponseCreate(
type: "message",
role: "assistant",
status: "completed",
- content: [{ type: "text", text: response.content }],
+ content: [{ type: "output_text", text: response.content }],
};
+ // Determine phase: text is "commentary" when tool calls are also present
+ const hasToolCalls = response.toolCalls && response.toolCalls.length > 0;
+ const textPhase = hasToolCalls ? "commentary" : "final_answer";
+
// response.output_item.added (text)
- ws.send(
- evt("response.output_item.added", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_item.added",
response_id: responseId,
output_index: textOutputIndex,
item: {
@@ -513,22 +871,27 @@ async function handleResponseCreate(
role: "assistant",
status: "in_progress",
content: [],
+ phase: textPhase,
},
- }),
+ },
+ isBeta,
);
// response.content_part.added
- ws.send(
- evt("response.content_part.added", {
+ sendEvent(
+ ws,
+ {
+ type: "response.content_part.added",
response_id: responseId,
item_id: textItemId,
output_index: textOutputIndex,
content_index: contentIndex,
- part: { type: "text", text: "" },
- }),
+ part: { type: "output_text", text: "" },
+ },
+ isBeta,
);
- // response.text.delta (chunked)
+ // response.output_text.delta (chunked) — GA name
const content = response.content;
for (let i = 0; i < content.length; i += chunkSize) {
if (ws.isClosed) break;
@@ -539,14 +902,17 @@ async function handleResponseCreate(
}
if (ws.isClosed) break;
const chunk = content.slice(i, i + chunkSize);
- ws.send(
- evt("response.text.delta", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_text.delta",
response_id: responseId,
item_id: textItemId,
output_index: textOutputIndex,
content_index: contentIndex,
delta: chunk,
- }),
+ },
+ isBeta,
);
interruption?.tick();
if (interruption?.signal.aborted) {
@@ -568,15 +934,18 @@ async function handleResponseCreate(
return;
}
- // response.text.done
- ws.send(
- evt("response.text.done", {
+ // response.output_text.done
+ sendEvent(
+ ws,
+ {
+ type: "response.output_text.done",
response_id: responseId,
item_id: textItemId,
output_index: textOutputIndex,
content_index: contentIndex,
text: content,
- }),
+ },
+ isBeta,
);
if (ws.isClosed) {
@@ -585,14 +954,17 @@ async function handleResponseCreate(
}
// response.content_part.done
- ws.send(
- evt("response.content_part.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.content_part.done",
response_id: responseId,
item_id: textItemId,
output_index: textOutputIndex,
content_index: contentIndex,
- part: { type: "text", text: content },
- }),
+ part: { type: "output_text", text: content },
+ },
+ isBeta,
);
if (ws.isClosed) {
@@ -601,12 +973,32 @@ async function handleResponseCreate(
}
// response.output_item.done (text)
- ws.send(
- evt("response.output_item.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_item.done",
response_id: responseId,
output_index: textOutputIndex,
- item: textOutputItem,
- }),
+ item: { ...textOutputItem, phase: textPhase },
+ },
+ isBeta,
+ );
+
+ // conversation.item.done (text message)
+ sendEvent(
+ ws,
+ {
+ type: "conversation.item.done",
+ item: {
+ id: textItemId,
+ object: "realtime.item",
+ type: "message",
+ role: "assistant",
+ status: "completed",
+ content: textOutputItem.content,
+ },
+ },
+ isBeta,
);
if (ws.isClosed) {
@@ -633,8 +1025,10 @@ async function handleResponseCreate(
};
// response.output_item.added
- ws.send(
- evt("response.output_item.added", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_item.added",
response_id: responseId,
output_index: outputIndex,
item: {
@@ -644,8 +1038,10 @@ async function handleResponseCreate(
call_id: callId,
name: tc.name,
arguments: "",
+ phase: "final_answer",
},
- }),
+ },
+ isBeta,
);
// response.function_call_arguments.delta (chunked)
@@ -659,14 +1055,17 @@ async function handleResponseCreate(
}
if (ws.isClosed) break;
const chunk = args.slice(i, i + chunkSize);
- ws.send(
- evt("response.function_call_arguments.delta", {
+ sendEvent(
+ ws,
+ {
+ type: "response.function_call_arguments.delta",
response_id: responseId,
item_id: itemId,
output_index: outputIndex,
call_id: callId,
delta: chunk,
- }),
+ },
+ isBeta,
);
interruption?.tick();
if (interruption?.signal.aborted) {
@@ -680,25 +1079,49 @@ async function handleResponseCreate(
if (ws.isClosed) break;
// response.function_call_arguments.done
- ws.send(
- evt("response.function_call_arguments.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.function_call_arguments.done",
response_id: responseId,
item_id: itemId,
output_index: outputIndex,
call_id: callId,
arguments: args,
- }),
+ },
+ isBeta,
);
if (ws.isClosed) break;
// response.output_item.done
- ws.send(
- evt("response.output_item.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_item.done",
response_id: responseId,
output_index: outputIndex,
- item: toolOutputItem,
- }),
+ item: { ...toolOutputItem, phase: "final_answer" },
+ },
+ isBeta,
+ );
+
+ // conversation.item.done (tool call)
+ sendEvent(
+ ws,
+ {
+ type: "conversation.item.done",
+ item: {
+ id: itemId,
+ object: "realtime.item",
+ type: "function_call",
+ status: "completed",
+ call_id: callId,
+ name: tc.name,
+ arguments: args,
+ },
+ },
+ isBeta,
);
if (ws.isClosed) break;
@@ -719,8 +1142,10 @@ async function handleResponseCreate(
if (ws.isClosed) return;
// response.done
- ws.send(
- evt("response.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.done",
response: {
id: responseId,
object: "realtime.response",
@@ -728,7 +1153,8 @@ async function handleResponseCreate(
output: allOutputItems,
usage: { total_tokens: 0, input_tokens: 0, output_tokens: 0 },
},
- }),
+ },
+ isBeta,
);
// Accumulate into conversation for multi-turn
@@ -763,12 +1189,14 @@ async function handleResponseCreate(
type: "message",
role: "assistant",
status: "completed",
- content: [{ type: "text", text: response.content }],
+ content: [{ type: "output_text", text: response.content }],
};
// response.created
- ws.send(
- evt("response.created", {
+ sendEvent(
+ ws,
+ {
+ type: "response.created",
response: {
id: responseId,
object: "realtime.response",
@@ -777,12 +1205,15 @@ async function handleResponseCreate(
output: [],
usage: null,
},
- }),
+ },
+ isBeta,
);
// response.output_item.added
- ws.send(
- evt("response.output_item.added", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_item.added",
response_id: responseId,
output_index: outputIndex,
item: {
@@ -791,22 +1222,27 @@ async function handleResponseCreate(
role: "assistant",
status: "in_progress",
content: [],
+ phase: "final_answer",
},
- }),
+ },
+ isBeta,
);
// response.content_part.added
- ws.send(
- evt("response.content_part.added", {
+ sendEvent(
+ ws,
+ {
+ type: "response.content_part.added",
response_id: responseId,
item_id: itemId,
output_index: outputIndex,
content_index: contentIndex,
- part: { type: "text", text: "" },
- }),
+ part: { type: "output_text", text: "" },
+ },
+ isBeta,
);
- // response.text.delta (chunked)
+ // response.output_text.delta (chunked) — GA name
const content = response.content;
const interruption = createInterruptionSignal(fixture);
let interrupted = false;
@@ -820,14 +1256,17 @@ async function handleResponseCreate(
}
if (ws.isClosed) break;
const chunk = content.slice(i, i + chunkSize);
- ws.send(
- evt("response.text.delta", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_text.delta",
response_id: responseId,
item_id: itemId,
output_index: outputIndex,
content_index: contentIndex,
delta: chunk,
- }),
+ },
+ isBeta,
);
interruption?.tick();
if (interruption?.signal.aborted) {
@@ -848,40 +1287,68 @@ async function handleResponseCreate(
if (ws.isClosed) return;
- // response.text.done
- ws.send(
- evt("response.text.done", {
+ // response.output_text.done
+ sendEvent(
+ ws,
+ {
+ type: "response.output_text.done",
response_id: responseId,
item_id: itemId,
output_index: outputIndex,
content_index: contentIndex,
text: content,
- }),
+ },
+ isBeta,
);
// response.content_part.done
- ws.send(
- evt("response.content_part.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.content_part.done",
response_id: responseId,
item_id: itemId,
output_index: outputIndex,
content_index: contentIndex,
- part: { type: "text", text: content },
- }),
+ part: { type: "output_text", text: content },
+ },
+ isBeta,
);
// response.output_item.done
- ws.send(
- evt("response.output_item.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_item.done",
response_id: responseId,
output_index: outputIndex,
- item: outputItem,
- }),
+ item: { ...outputItem, phase: "final_answer" },
+ },
+ isBeta,
+ );
+
+ // conversation.item.done (text message)
+ sendEvent(
+ ws,
+ {
+ type: "conversation.item.done",
+ item: {
+ id: itemId,
+ object: "realtime.item",
+ type: "message",
+ role: "assistant",
+ status: "completed",
+ content: outputItem.content,
+ },
+ },
+ isBeta,
);
// response.done
- ws.send(
- evt("response.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.done",
response: {
id: responseId,
object: "realtime.response",
@@ -889,7 +1356,8 @@ async function handleResponseCreate(
output: [outputItem],
usage: { total_tokens: 0, input_tokens: 0, output_tokens: 0 },
},
- }),
+ },
+ isBeta,
);
// Accumulate assistant response into conversation for multi-turn
@@ -913,8 +1381,10 @@ async function handleResponseCreate(
});
// response.created
- ws.send(
- evt("response.created", {
+ sendEvent(
+ ws,
+ {
+ type: "response.created",
response: {
id: responseId,
object: "realtime.response",
@@ -923,7 +1393,8 @@ async function handleResponseCreate(
output: [],
usage: null,
},
- }),
+ },
+ isBeta,
);
const outputItems: unknown[] = [];
@@ -945,8 +1416,10 @@ async function handleResponseCreate(
};
// response.output_item.added
- ws.send(
- evt("response.output_item.added", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_item.added",
response_id: responseId,
output_index: tcIdx,
item: {
@@ -956,8 +1429,10 @@ async function handleResponseCreate(
call_id: callId,
name: tc.name,
arguments: "",
+ phase: "final_answer",
},
- }),
+ },
+ isBeta,
);
// response.function_call_arguments.delta (chunked)
@@ -971,14 +1446,17 @@ async function handleResponseCreate(
}
if (ws.isClosed) break;
const chunk = args.slice(i, i + chunkSize);
- ws.send(
- evt("response.function_call_arguments.delta", {
+ sendEvent(
+ ws,
+ {
+ type: "response.function_call_arguments.delta",
response_id: responseId,
item_id: itemId,
output_index: tcIdx,
call_id: callId,
delta: chunk,
- }),
+ },
+ isBeta,
);
interruption?.tick();
if (interruption?.signal.aborted) {
@@ -992,23 +1470,47 @@ async function handleResponseCreate(
if (ws.isClosed) break;
// response.function_call_arguments.done
- ws.send(
- evt("response.function_call_arguments.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.function_call_arguments.done",
response_id: responseId,
item_id: itemId,
output_index: tcIdx,
call_id: callId,
arguments: args,
- }),
+ },
+ isBeta,
);
// response.output_item.done
- ws.send(
- evt("response.output_item.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.output_item.done",
response_id: responseId,
output_index: tcIdx,
- item: outputItem,
- }),
+ item: { ...outputItem, phase: "final_answer" },
+ },
+ isBeta,
+ );
+
+ // conversation.item.done (tool call)
+ sendEvent(
+ ws,
+ {
+ type: "conversation.item.done",
+ item: {
+ id: itemId,
+ object: "realtime.item",
+ type: "function_call",
+ status: "completed",
+ call_id: callId,
+ name: tc.name,
+ arguments: args,
+ },
+ },
+ isBeta,
);
outputItems.push(outputItem);
@@ -1027,8 +1529,10 @@ async function handleResponseCreate(
if (ws.isClosed) return;
// response.done
- ws.send(
- evt("response.done", {
+ sendEvent(
+ ws,
+ {
+ type: "response.done",
response: {
id: responseId,
object: "realtime.response",
@@ -1036,7 +1540,8 @@ async function handleResponseCreate(
output: outputItems,
usage: { total_tokens: 0, input_tokens: 0, output_tokens: 0 },
},
- }),
+ },
+ isBeta,
);
// Accumulate assistant tool calls into conversation for multi-turn
@@ -1055,5 +1560,10 @@ async function handleResponseCreate(
body: completionReq,
response: { status: 500, fixture },
});
- ws.send(buildErrorRealtimeEvent("Fixture response did not match any known type", "server_error"));
+ buildErrorRealtimeEvent(
+ ws,
+ "Fixture response did not match any known type",
+ isBeta,
+ "server_error",
+ );
}