Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,20 @@
- **Model-aware fixture recording** — recorded fixtures now include the model name in match criteria, preventing collisions when an app makes multiple LLM calls with the same user message but different models. Model names are normalized by stripping date/version suffixes (e.g., `claude-opus-4-20250514` → `claude-opus-4`) so fixtures survive version bumps. Disable with `recordFullModelVersion: true`. ([#185](https://github.com/CopilotKit/aimock/issues/185))
- **Drift detection metadata** — recorded fixtures include `systemHash` and `toolsHash` in a `metadata` block for detecting system prompt or tool definition changes since recording.
- **Prefix model matching** — fixture router uses `startsWith` for string model matching, so `model: "claude-opus-4"` matches any `claude-opus-4-*` version.
- **GA Realtime protocol migration with Beta compatibility shim** — handler emits GA event names natively; `sendEvent()` wrapper translates back for Beta clients detected via `OpenAI-Beta` header. Default model changed to `gpt-realtime-2`.
- **5 new GA Realtime models** — `gpt-realtime-2`, `gpt-realtime-1.5`, `gpt-realtime-mini`, `gpt-realtime-translate`, `gpt-realtime-whisper`.
- **Translate and whisper session types** — dedicated session configurations for translation and transcription workloads on the Realtime API.
- **Image input support** — Realtime sessions accept image content parts alongside text and audio.
- **Commentary phase** — Realtime handler supports the GA commentary phase for model-generated annotations.
- **`conversation.item.done` and `response.cancel` events** — new GA Realtime event types for item completion tracking and response cancellation.
- **Endpoint type routing for Realtime** — router distinguishes GA vs Beta Realtime endpoints for fixture matching.
- **Drift detection for GA Realtime** — drift test suite extended with GA protocol shapes, Beta conformance shapes, and three-way triangulation.

### Tests

- **73 GA Realtime integration tests** — comprehensive test coverage for all GA event types, Beta compatibility, session management, model routing, image input, translate/whisper, commentary, and cancellation.
- **GA and Beta Realtime conformance suites** — API conformance tests validating event shapes against both GA and Beta protocol specs.
- **GA Realtime drift detection** — SDK shape tests and provider triangulation for the GA Realtime protocol.

## [1.22.1] - 2026-05-12

Expand Down
23 changes: 15 additions & 8 deletions DRIFT.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,7 @@ When a model is deprecated:

## WebSocket Drift Coverage

In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (4 verified + 2 canary = 6 WS tests):
In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (6 verified + 2 canary = 8 WS tests):

### Gemini Interactions API (Beta)

Expand All @@ -120,13 +120,20 @@ The Gemini Interactions API (`/v1beta/interactions`) is covered by 4 drift tests

Uses `describe.skipIf(!GOOGLE_API_KEY)` like other Gemini tests. The Interactions API is in Beta — shapes may shift as Google iterates on the endpoint.

| Protocol | Text | Tool Call | Real Endpoint | Status |
| ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
| OpenAI Responses WS | ✓ | ✓ | `wss://api.openai.com/v1/responses` | Verified |
| OpenAI Realtime | ✓ | ✓ | `wss://api.openai.com/v1/realtime` | Verified |
| Gemini Live | — | — | `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
| Protocol | Text | Tool Call | Real Endpoint | Status |
| ---------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
| OpenAI Responses WS | ✓ | ✓ | `wss://api.openai.com/v1/responses` | Verified |
| OpenAI Realtime (GA) | ✓ | ✓ | `wss://api.openai.com/v1/realtime` | Verified |
| OpenAI Realtime (Beta) | ✓ | ✓ | `wss://api.openai.com/v1/realtime` + `OpenAI-Beta: realtime=v1` | Verified |
| Gemini Live | — | — | `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |

**Models**: `gpt-4o-mini` for Responses WS, `gpt-4o-mini-realtime-preview` for Realtime.
**Models**: `gpt-4o-mini` for Responses WS, `gpt-realtime-2` for Realtime GA (was `gpt-4o-mini-realtime-preview`).

**GA Realtime Drift Tests**:

- **Model canary** — Verifies all 5 GA models exist (`gpt-realtime-2`, `gpt-realtime-1.5`, `gpt-realtime-mini`, `gpt-realtime-translate`, `gpt-realtime-whisper`) and flags unknown realtime models
- **Protocol probe** — Connects with both GA and Beta protocol, normalizes event sequences, and verifies consistency
- **Event shape validation** — GA event names (`response.output_text.delta`, `conversation.item.added`, `conversation.item.done`) and nested session config (`session.audio.*`, `session.type`, `session.reasoning`)

**Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.

Expand Down Expand Up @@ -175,4 +182,4 @@ The fix workflow also supports `workflow_dispatch` for manual runs.

## Cost

~29 API calls per run (20 HTTP response-shape + 3 model listing + 6 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.20/week at daily cadence. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 4 to 6.
~31 API calls per run (20 HTTP response-shape + 3 model listing + 8 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-realtime-2`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.25/week at daily cadence. The GA protocol probe adds a second Realtime WS connection (one GA, one Beta) per run. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 6 to 8.
20 changes: 10 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,29 +35,29 @@ await mock.stop();

aimock mocks everything your AI app talks to:

| Tool | What it mocks | Docs |
| -------------- | -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
| **LLMock** | OpenAI (Chat/Responses/Realtime), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs) |
| **MCPMock** | MCP tools, resources, prompts with session management | [MCP](https://aimock.copilotkit.dev/mcp-mock) |
| **A2AMock** | Agent-to-agent protocol with SSE streaming | [A2A](https://aimock.copilotkit.dev/a2a-mock) |
| **AGUIMock** | AG-UI agent-to-UI event streams for frontend testing | [AG-UI](https://aimock.copilotkit.dev/agui-mock) |
| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints | [Vector](https://aimock.copilotkit.dev/vector-mock) |
| **Services** | Tavily search, Cohere rerank, OpenAI moderation | [Services](https://aimock.copilotkit.dev/services) |
| Tool | What it mocks | Docs |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
| **LLMock** | OpenAI (Chat/Responses/Realtime GA+Beta), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs) |
| **MCPMock** | MCP tools, resources, prompts with session management | [MCP](https://aimock.copilotkit.dev/mcp-mock) |
| **A2AMock** | Agent-to-agent protocol with SSE streaming | [A2A](https://aimock.copilotkit.dev/a2a-mock) |
| **AGUIMock** | AG-UI agent-to-UI event streams for frontend testing | [AG-UI](https://aimock.copilotkit.dev/agui-mock) |
| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints | [Vector](https://aimock.copilotkit.dev/vector-mock) |
| **Services** | Tavily search, Cohere rerank, OpenAI moderation | [Services](https://aimock.copilotkit.dev/services) |

Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or use the programmatic API to compose exactly what you need.

## Features

- **[Record & Replay](https://aimock.copilotkit.dev/record-replay)** — Proxy real APIs, save as fixtures, replay deterministically forever
- **[Multi-turn Conversations](https://aimock.copilotkit.dev/multi-turn)** — Record and replay multi-turn traces with tool rounds; match distinct turns via `turnIndex`, `hasToolResult`, `toolCallId`, `sequenceIndex`, `systemMessage` (gate on host-supplied agent context), or custom predicates
- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime, Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime (GA + Beta shim), Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
- **Multimedia APIs** — [image generation](https://aimock.copilotkit.dev/images) (DALL-E, Imagen), [text-to-speech](https://aimock.copilotkit.dev/speech), [audio transcription](https://aimock.copilotkit.dev/transcription), [video generation](https://aimock.copilotkit.dev/video)
- **[MCP](https://aimock.copilotkit.dev/mcp-mock) / [A2A](https://aimock.copilotkit.dev/a2a-mock) / [AG-UI](https://aimock.copilotkit.dev/agui-mock) / [Vector](https://aimock.copilotkit.dev/vector-mock)** — Mock every protocol your AI agents use
- **[Chaos Testing](https://aimock.copilotkit.dev/chaos-testing)** — 500 errors, malformed JSON, mid-stream disconnects at any probability
- **Per-Request Strict Mode** — `X-AIMock-Strict` header overrides the server-level `--strict` flag per request (`true`/`1` = strict, `false`/`0` = lenient)
- **[Drift Detection](https://aimock.copilotkit.dev/drift-detection)** — Daily CI validation against real APIs
- **[Streaming Physics](https://aimock.copilotkit.dev/streaming-physics)** — Configurable `ttft`, `tps`, and `jitter`
- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime, Responses WS, Gemini Live
- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime (GA protocol with 5 models: gpt-realtime-2, gpt-realtime-1.5, gpt-realtime-mini, gpt-realtime-translate, gpt-realtime-whisper; transcription/translation session types; image input; commentary phase), Responses WS, Gemini Live
- **[Prometheus Metrics](https://aimock.copilotkit.dev/metrics)** — Request counts, latencies, fixture match rates
- **[Docker + Helm](https://aimock.copilotkit.dev/docker)** — Container image and Helm chart for CI/CD
- **[Vitest & Jest Plugins](https://aimock.copilotkit.dev/test-plugins)** — Zero-config `useAimock()` with auto lifecycle and env patching
Expand Down
Loading
Loading