
Commit 752966a

feat: add multimedia endpoint support (image, TTS, transcription, video) (#101)
## Summary

- Add four new multimedia endpoint types: image generation (`/v1/images/generations`, `/v1beta/models/{model}:predict`), text-to-speech (`/v1/audio/speech`), audio transcription (`/v1/audio/transcriptions`), and video generation (`/v1/videos`, `/v1/videos/{id}`)
- Add `match.endpoint` field to `FixtureMatch` for isolating fixtures by endpoint type, preventing cross-matching (e.g., image fixtures won't match chat requests)
- Add convenience methods (`onImage`, `onSpeech`, `onTranscription`, `onVideo`) on `LLMock` and backfill `_endpointType` on all existing handlers

## New Endpoints

| Route | Method | Format | Match field |
|-------|--------|--------|-------------|
| `/v1/images/generations` | POST | OpenAI | `prompt` → `userMessage` |
| `/v1beta/models/{model}:predict` | POST | Gemini Imagen | `instances[0].prompt` → `userMessage` |
| `/v1/audio/speech` | POST | OpenAI | `input` → `userMessage` |
| `/v1/audio/transcriptions` | POST | OpenAI (multipart) | `match.endpoint` only |
| `/v1/videos` | POST | OpenAI | `prompt` → `userMessage` |
| `/v1/videos/{id}` | GET | OpenAI | Stored video ID |

## Test plan

- [x] Image generation: single, multiple, base64, Gemini Imagen format
- [x] TTS: correct Content-Type for mp3/opus, default format fallback
- [x] Transcription: simple JSON and verbose_json with words/segments
- [x] Video: create + status check, processing state, 404 for unknown ID
- [x] X-Test-Id isolation for image endpoint
- [x] Endpoint cross-matching prevention (image vs chat)
- [x] Convenience methods (onImage, onSpeech, onTranscription, onVideo)
- [x] Backfill: `endpoint: "chat"` and `endpoint: "embedding"` fixtures match existing handlers
- [x] Full suite: 2216 tests pass, 0 failures
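
The endpoint isolation described in the summary can be illustrated with a small, self-contained sketch. This is not aimock's implementation: `findFixture`, `IncomingRequest`, and the substring comparison on `userMessage` are hypothetical names and assumptions made for illustration. Treating a fixture without `endpoint` as matching any endpoint is also a simplification, and the library's handling of generic fixtures may differ.

```typescript
// Hypothetical model of endpoint-scoped fixture matching; not aimock's actual code.
type EndpointType = "chat" | "embedding" | "image" | "speech" | "transcription" | "video";

interface FixtureMatch {
  userMessage?: string;    // assumed: substring test against the extracted prompt/input
  endpoint?: EndpointType; // when set, the fixture only applies to this endpoint type
}

interface Fixture {
  match: FixtureMatch;
  response: unknown;
}

interface IncomingRequest {
  endpointType: EndpointType; // assumed to be set by the route handler
  userMessage?: string;       // e.g. `prompt` for images, `input` for speech
}

// Returns the first fixture whose endpoint and userMessage constraints both hold.
function findFixture(fixtures: Fixture[], req: IncomingRequest): Fixture | undefined {
  return fixtures.find(({ match }) => {
    // An explicit endpoint on the fixture must equal the request's endpoint type.
    if (match.endpoint !== undefined && match.endpoint !== req.endpointType) return false;
    if (match.userMessage !== undefined) {
      return req.userMessage !== undefined && req.userMessage.includes(match.userMessage);
    }
    return true;
  });
}

const fixtures: Fixture[] = [
  { match: { userMessage: "sunset", endpoint: "image" }, response: { image: { url: "https://example.com/sunset.png" } } },
  { match: { userMessage: "sunset", endpoint: "chat" }, response: { content: "Sunsets are red." } },
];

// The same prompt text resolves to different fixtures depending on the endpoint.
const imageHit = findFixture(fixtures, { endpointType: "image", userMessage: "a sunset over water" });
const chatHit = findFixture(fixtures, { endpointType: "chat", userMessage: "describe a sunset" });
const speechHit = findFixture(fixtures, { endpointType: "speech", userMessage: "sunset" });
```

In this sketch the image request skips the chat fixture and the chat request skips the image fixture, even though both fixtures share the `"sunset"` matcher. The speech request finds nothing because neither fixture is scoped to `speech`.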
2 parents 74fb890 + a76ea32 commit 752966a

34 files changed

Lines changed: 3388 additions & 12 deletions

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -1,5 +1,18 @@
 # @copilotkit/aimock
 
+## 1.12.0
+
+### Minor Changes
+
+- Multimedia endpoint support: image generation (OpenAI DALL-E + Gemini Imagen), text-to-speech, audio transcription, and video generation with async polling (#101)
+- `match.endpoint` field for fixture isolation — prevents cross-matching between chat, image, speech, transcription, video, and embedding fixtures (#101)
+- Bidirectional endpoint filtering — generic fixtures only match compatible endpoint types (#101)
+- Convenience methods: `onImage`, `onSpeech`, `onTranscription`, `onVideo` (#101)
+- Record & replay for all multimedia endpoints — proxy to real APIs, save fixtures with correct format/type detection (#101)
+- `_endpointType` explicit field on `ChatCompletionRequest` for type safety (#101)
+- Comparison matrix and drift detection rules updated for multimedia (#101)
+- 54 new tests (32 integration, 11 record/replay, 12 type/routing)
+
 ## 1.11.0
 
 ### Minor Changes

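The "video generation with async polling" entry pairs a `POST /v1/videos` create call with a `GET /v1/videos/{id}` status check. A minimal in-memory sketch of that flow follows. `VideoStore`, the `video_N` ID format, the `"completed"` status name, and the 200 status code are all assumptions for illustration; only the processing state and the 404 for an unknown ID come from the commit's test plan.

```typescript
// Hypothetical in-memory model of the create-then-poll video flow; not aimock's code.
type VideoStatus = "processing" | "completed";

interface VideoJob {
  id: string;
  status: VideoStatus;
  url?: string; // only present once the job is completed
}

interface HttpResult {
  status: number;
  body: VideoJob | { error: string };
}

class VideoStore {
  private jobs = new Map<string, VideoJob>();
  private nextId = 1;

  // Models POST /v1/videos: store a job and return it with a fresh ID.
  create(status: VideoStatus, url?: string): HttpResult {
    const id = `video_${this.nextId++}`;
    const job: VideoJob = { id, status };
    if (url !== undefined) job.url = url;
    this.jobs.set(id, job);
    return { status: 200, body: job };
  }

  // Models GET /v1/videos/{id}: return the stored job, or 404 for an unknown ID.
  get(id: string): HttpResult {
    const job = this.jobs.get(id);
    return job
      ? { status: 200, body: job }
      : { status: 404, body: { error: `Unknown video ID: ${id}` } };
  }
}

const store = new VideoStore();
const created = store.create("processing");
const polled = store.get((created.body as VideoJob).id);
const missing = store.get("video_does_not_exist");
```

Storing the job under its generated ID is what lets a later status check return the same fixture data that the create call produced.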
README.md

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 https://github.com/user-attachments/assets/646bf106-0320-41f2-a9b1-5090454830f3
 
-Mock infrastructure for AI application testing — LLM APIs, MCP tools, A2A agents, AG-UI event streams, vector databases, search, rerank, and moderation. One package, one port, zero dependencies.
+Mock infrastructure for AI application testing — LLM APIs, image generation, text-to-speech, transcription, video generation, MCP tools, A2A agents, AG-UI event streams, vector databases, search, rerank, and moderation. One package, one port, zero dependencies.
 
 ## Quick Start
 
@@ -43,6 +43,7 @@ Run them all on one port with `npx aimock --config aimock.json`, or use the prog
 
 - **[Record & Replay](https://aimock.copilotkit.dev/record-replay)** — Proxy real APIs, save as fixtures, replay deterministically forever
 - **[11 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere — full streaming support
+- **[Multimedia APIs](https://aimock.copilotkit.dev/images)** — Image generation (DALL-E, Imagen), text-to-speech, audio transcription, video generation
 - **[MCP / A2A / AG-UI / Vector](https://aimock.copilotkit.dev/mcp-mock)** — Mock every protocol your AI agents use
 - **[Chaos Testing](https://aimock.copilotkit.dev/chaos-testing)** — 500 errors, malformed JSON, mid-stream disconnects at any probability
 - **[Drift Detection](https://aimock.copilotkit.dev/drift-detection)** — Daily CI validation against real APIs

charts/aimock/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ name: aimock
 description: Mock infrastructure for AI application testing (OpenAI, Anthropic, Gemini, MCP, A2A, vector)
 type: application
 version: 0.1.0
-appVersion: "1.11.0"
+appVersion: "1.12.0"

docs/fixtures/index.html

Lines changed: 24 additions & 0 deletions
@@ -162,6 +162,26 @@ <h2>Response Types</h2>
 <td>embedding[]</td>
 <td>Vector of numbers</td>
 </tr>
+<tr>
+<td>Image</td>
+<td>image.url or images[].url</td>
+<td>Generated image URL(s) or base64 data</td>
+</tr>
+<tr>
+<td>Speech</td>
+<td>audio</td>
+<td>Base64-encoded audio data</td>
+</tr>
+<tr>
+<td>Transcription</td>
+<td>transcription.text, words?, segments?</td>
+<td>Transcribed text with optional timestamps</td>
+</tr>
+<tr>
+<td>Video</td>
+<td>video.url, video.duration?</td>
+<td>Generated video URL with async polling</td>
+</tr>
 </tbody>
 </table>
 
@@ -239,6 +259,10 @@ <h3>Programmatically</h3>
 <span class="op">mock</span>.<span class="fn">onMessage</span>(<span class="str">"hello"</span>, { <span class="prop">content</span>: <span class="str">"Hi!"</span> });
 <span class="op">mock</span>.<span class="fn">onToolCall</span>(<span class="str">"get_weather"</span>, { <span class="prop">content</span>: <span class="str">"72F"</span> });
 <span class="op">mock</span>.<span class="fn">onEmbedding</span>(<span class="str">"my text"</span>, { <span class="prop">embedding</span>: [<span class="num">0.1</span>, <span class="num">0.2</span>] });
+<span class="op">mock</span>.<span class="fn">onImage</span>(<span class="str">"sunset"</span>, { <span class="prop">image</span>: { <span class="prop">url</span>: <span class="str">"https://example.com/sunset.png"</span> } });
+<span class="op">mock</span>.<span class="fn">onSpeech</span>(<span class="str">"hello"</span>, { <span class="prop">audio</span>: <span class="str">"SGVsbG8="</span> });
+<span class="op">mock</span>.<span class="fn">onTranscription</span>(<span class="str">"audio.mp3"</span>, { <span class="prop">transcription</span>: { <span class="prop">text</span>: <span class="str">"Hello"</span> } });
+<span class="op">mock</span>.<span class="fn">onVideo</span>(<span class="str">"cats"</span>, { <span class="prop">video</span>: { <span class="prop">url</span>: <span class="str">"https://example.com/cats.mp4"</span> } });
 <span class="op">mock</span>.<span class="fn">onJsonOutput</span>(<span class="str">"data"</span>, { <span class="prop">key</span>: <span class="str">"value"</span> });
 <span class="op">mock</span>.<span class="fn">onToolResult</span>(<span class="str">"call_123"</span>, { <span class="prop">content</span>: <span class="str">"Done"</span> });

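The Speech fixture above carries base64-encoded audio in its `audio` field, and the test plan mentions a correct Content-Type for mp3/opus with a default format fallback. The sketch below shows one way such a fixture could be turned into a binary response. `speechResponse`, `decodeBase64`, the MIME table, and the choice of mp3 as the default format are assumptions for illustration, not aimock's actual code.

```typescript
// Hypothetical helper for serving a Speech fixture; the format table is an assumption.
const AUDIO_CONTENT_TYPES: Record<string, string> = {
  mp3: "audio/mpeg",
  opus: "audio/opus",
  aac: "audio/aac",
  flac: "audio/flac",
  wav: "audio/wav",
};

// Assumed fallback when the request omits a format or names an unknown one.
const DEFAULT_CONTENT_TYPE = "audio/mpeg";

// Decode a base64 string into raw bytes using the global atob().
function decodeBase64(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Build the response pieces: a Content-Type header value and the binary body.
function speechResponse(
  audioBase64: string,
  responseFormat?: string,
): { contentType: string; body: Uint8Array } {
  const contentType =
    (responseFormat !== undefined ? AUDIO_CONTENT_TYPES[responseFormat] : undefined) ??
    DEFAULT_CONTENT_TYPE;
  return { contentType, body: decodeBase64(audioBase64) };
}

// "SGVsbG8=" is the fixture value from the docs example; it decodes to the bytes of "Hello".
const defaultAudio = speechResponse("SGVsbG8=");
const opusAudio = speechResponse("SGVsbG8=", "opus");
const unknownAudio = speechResponse("SGVsbG8=", "not-a-format");
```

Keeping the fixture as base64 text lets it live in a JSON fixture file, while the client still receives raw bytes with a matching Content-Type.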