
feat: mock non-speech audio generation (ElevenLabs, fal.ai, Gemini) #140

Open
jpr5 wants to merge 9 commits into main from blitz/audio-gen-118/integration

Conversation

jpr5 (Contributor) commented Apr 27, 2026

Summary

Adds mock support for non-speech audio generation endpoints, closing #118:

  • ElevenLabs sound effects (/v1/sound-generation) and music (/v1/music/*) endpoints with support for generation, streaming, and composition plans
  • fal.ai queue-based audio generation (/fal/queue/submit/*, /fal/queue/requests/*, /fal/run/*) with full lifecycle (submit → status → result → cancel)
  • Gemini HTTP audio via generateContent/streamGenerateContent with inlineData parts containing audio MIME types
  • Gemini WebSocket audio via the BidiGenerateContent Live API
  • Recording/replay for Gemini audio responses (both streaming SSE and non-streaming JSON)
  • Convenience methods: onAudio(), onSoundEffect(), onMusic(), onFalAudio()
  • AudioResponse broadened: audio field now supports both string (base64) and { b64Json, contentType } object form
  • Router: bidirectional endpoint filtering for audio-gen and fal-audio endpoint types
  • Stream collapse: Gemini reasoning/thought accumulation and audio inlineData extraction
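
The broadened `audio` field in the list above can be illustrated with a small normalizer. This is a minimal sketch, not the PR's code: the type names, the `normalizeAudio` helper, the optional `contentType`, and the `audio/mpeg` default are all assumptions made for illustration; the real definitions live in types.ts and may differ.

```typescript
// Hypothetical shapes; only the two accepted forms (string or { b64Json, contentType })
// come from the PR description.
type AudioPayload = string | { b64Json: string; contentType?: string };

interface NormalizedAudio {
  b64: string;
  contentType: string;
}

// Assumed default; the PR does not say which content type the string form implies.
const DEFAULT_CONTENT_TYPE = "audio/mpeg";

// Collapse both accepted forms into one shape so handlers only deal with a single case.
function normalizeAudio(audio: AudioPayload): NormalizedAudio {
  if (typeof audio === "string") {
    return { b64: audio, contentType: DEFAULT_CONTENT_TYPE };
  }
  return {
    b64: audio.b64Json,
    contentType: audio.contentType ?? DEFAULT_CONTENT_TYPE,
  };
}
```

Normalizing at the edge like this is one way a handler written for the string form could be guarded against the object form.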

Files changed (23 files, +3159/-50)

| Category | Files |
| --- | --- |
| New handlers | elevenlabs-audio.ts, fal-audio.ts |
| Modified handlers | gemini.ts, ws-gemini-live.ts, speech.ts |
| Infrastructure | server.ts, router.ts, types.ts, helpers.ts, recorder.ts, stream-collapse.ts, llmock.ts, index.ts, fixture-loader.ts |
| Tests (6 new) | elevenlabs-audio.test.ts, fal-audio.test.ts, gemini-audio.test.ts, gemini-audio-record.test.ts, ws-gemini-live-audio.test.ts, multimedia-types.test.ts |
| Docs | index.html, mcp/index.html, sidebar.js |

Test plan

  • 2584 tests pass (6 new test files, 36+ new tests)
  • tsc --noEmit clean
  • prettier --check clean
  • eslint clean
  • 3-round CR (7 agents per round) converged to 0 critical/high findings

Closes #118

pkg-pr-new (Bot) commented Apr 27, 2026

npm i https://pkg.pr.new/@copilotkit/aimock@140

commit: 0fe22fb

jpr5 added 9 commits April 27, 2026 12:13
AudioResponse.audio now accepts string (base64) or { b64Json, contentType }
object form. Adds "audio-gen" and "fal-audio" endpoint types to FixtureMatch,
RecordProviderKey, and router compatibility checks. Exports FORMAT_TO_CONTENT_TYPE
from helpers. Guards speech handler against object-form audio.

Handles AudioResponse fixtures in both non-streaming and streaming
Gemini paths by emitting inlineData parts with base64 audio. Adds
reasoning/thought accumulation and audio inlineData extraction to
collapseGeminiSSE. Fixes Cohere SSE collapse to preserve content
alongside tool calls.
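
As a rough illustration of the stream-collapse behavior this commit describes, the sketch below folds already-parsed Gemini chunks into text, reasoning, and audio. It is not the PR's collapseGeminiSSE: the `collapseChunks` name and the result shape are hypothetical, and it assumes parts follow the Gemini REST shape (`text`, `thought`, `inlineData.mimeType`, `inlineData.data`).

```typescript
// Minimal part/chunk shapes, assumed from the Gemini REST response format.
interface GeminiPart {
  text?: string;
  thought?: boolean;
  inlineData?: { mimeType: string; data: string };
}
interface GeminiChunk {
  candidates?: { content?: { parts?: GeminiPart[] } }[];
}
interface Collapsed {
  text: string;
  reasoning: string;
  audio: { mimeType: string; data: string }[];
}

// Hypothetical helper: accumulate text and thought parts separately and
// collect inlineData parts whose MIME type is audio/*.
function collapseChunks(chunks: GeminiChunk[]): Collapsed {
  const out: Collapsed = { text: "", reasoning: "", audio: [] };
  for (const chunk of chunks) {
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    for (const part of parts) {
      if (part.inlineData && part.inlineData.mimeType.startsWith("audio/")) {
        out.audio.push(part.inlineData);
      } else if (typeof part.text === "string") {
        if (part.thought) out.reasoning += part.text;
        else out.text += part.text;
      }
    }
  }
  return out;
}
```

Keeping thought parts out of `text` mirrors the "filters thought parts from text extraction" behavior mentioned for the WebSocket handler.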

Handles AudioResponse fixtures by sending inlineData frames with
turnComplete. Adds ContentWithToolCallsResponse handler that streams
text then sends tool calls. Filters thought parts from text extraction.
Pre-computes tool call IDs for wire/history consistency.

Handles /v1/sound-generation and /v1/music/* endpoints. Supports
sound-generation (text field), music compose/stream/plan (prompt field
with composition_plan fallback). Returns binary audio, chunked stream,
or JSON plan based on subType. Full journal integration with fixture
matching and proxy-and-record fallback.
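
The field selection this commit describes (the `text` field for sound generation, the `prompt` field with a `composition_plan` fallback for music) might look roughly like the sketch below. The `extractMatchText` helper is hypothetical, and serializing the plan with `JSON.stringify` is an assumption; the commit does not say how the fallback is turned into matchable text.

```typescript
type RequestBody = Record<string, unknown>;

// Hypothetical helper: pick the string a fixture matcher would compare against.
function extractMatchText(
  endpoint: "sound-generation" | "music",
  body: RequestBody,
): string | undefined {
  if (endpoint === "sound-generation") {
    return typeof body.text === "string" ? body.text : undefined;
  }
  // Music endpoints: prefer the prompt when it is a non-empty string.
  if (typeof body.prompt === "string" && body.prompt.length > 0) {
    return body.prompt;
  }
  // Assumed fallback: use the serialized composition plan when no prompt is given.
  if (body.composition_plan != null) {
    return JSON.stringify(body.composition_plan);
  }
  return undefined;
}
```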

Handles /fal/queue/submit, /fal/queue/requests (status/result/cancel),
and /fal/run endpoints. FalJobMap singleton with TTL and size-bound
eviction. Translates AudioResponse to fal.ai file objects with computed
sizes. Full journal integration, proxy-and-record fallback, test-ID
scoped job isolation.
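
A job store with TTL and size-bound eviction, as named in this commit, can be sketched as follows. This is an illustration only, not the PR's FalJobMap: the `BoundedJobMap` class, its constructor parameters, and the injectable clock are assumptions. It relies on `Map` preserving insertion order so the oldest entry is evicted first.

```typescript
// Hypothetical bounded store: entries expire after ttlMs and the oldest
// entries are dropped once more than maxSize jobs are held.
class BoundedJobMap<T> {
  private jobs = new Map<string, { value: T; createdAt: number }>();
  private maxSize: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(id: string, value: T): void {
    this.evictExpired();
    this.jobs.delete(id); // re-inserting moves the entry to the newest position
    this.jobs.set(id, { value, createdAt: this.now() });
    while (this.jobs.size > this.maxSize) {
      const oldest = this.jobs.keys().next().value;
      if (oldest === undefined) break;
      this.jobs.delete(oldest);
    }
  }

  get(id: string): T | undefined {
    const entry = this.jobs.get(id);
    if (!entry) return undefined;
    if (this.now() - entry.createdAt > this.ttlMs) {
      this.jobs.delete(id); // lazily drop expired entries on read
      return undefined;
    }
    return entry.value;
  }

  clear(): void {
    this.jobs.clear();
  }

  private evictExpired(): void {
    const t = this.now();
    for (const [id, entry] of this.jobs) {
      if (t - entry.createdAt > this.ttlMs) this.jobs.delete(id);
    }
  }
}
```

Bounding both age and count keeps a long-running mock server from accumulating jobs that clients submitted but never polled, and a `clear()` method gives a reset hook a single place to drop all state.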

…thods

Adds server routes for ElevenLabs (/v1/sound-generation, /v1/music/*),
fal.ai (/fal/queue/*, /fal/run/*) with CORS, chaos, and journal.
Adds falJobs.clear() to reset handler and LLMock.reset(). Adds
onAudio, onSoundEffect, onMusic, onFalAudio convenience methods.
Fixes nextRequestError splice race.

Detects audio inlineData in both non-streaming JSON and streaming SSE
Gemini responses, producing AudioResponse fixtures with b64Json and
contentType. Widens recorder EndpointType to include audio-gen and
fal-audio. Adds Float32Array alignment guard for embeddings.
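
The commit does not spell out what the Float32Array alignment guard does. One common form of such a guard is sketched below, as an assumption rather than a description of the recorder's code: a `Float32Array` view can only be created at a byte offset that is a multiple of 4, so misaligned bytes are copied into a fresh buffer first. The `bytesToFloat32` name is hypothetical.

```typescript
// Hypothetical helper: view raw bytes as 32-bit floats without throwing
// a RangeError when the byte offset is not 4-byte aligned.
function bytesToFloat32(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new RangeError("byte length must be a multiple of 4");
  }
  if (bytes.byteOffset % 4 === 0) {
    // Aligned: create a zero-copy view over the existing buffer.
    return new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4);
  }
  // Misaligned: copy into a new buffer, which always starts at offset 0.
  const copy = new Uint8Array(bytes);
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}
```

Misaligned offsets are plausible in practice because a decoded byte buffer can be a slice of a larger pooled allocation.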

6 new/modified test files covering ElevenLabs sound/music, fal.ai
queue lifecycle, Gemini HTTP audio (streaming + non-streaming),
Gemini WebSocket audio, audio recording/replay, and multimedia
type guard + endpoint filtering.

jpr5 force-pushed the blitz/audio-gen-118/integration branch from 70ace03 to 0fe22fb on April 27, 2026 19:14

@tombeckenham

Just created tests with this for the elevenlabs adapter. It's looking good!

Development

Successfully merging this pull request may close these issues.

Mock music / non-speech audio generation (Gemini Lyria, ElevenLabs, fal)

2 participants