feat(voice): pluggable voice backend with Gemini Live & Qwen Realtime by heavygee · Pull Request #692 · tiann/hapi

heavygee · 2026-05-25T18:03:46Z

Overview

Rebased and completed @Overbaker's #401 onto current `main` after it went fallow (~4 weeks). All original design and implementation credit belongs to @Overbaker and @TennyDDDD — I've only done the merge work and fixed up the test runner.

Adds a pluggable voice backend architecture extending the existing ElevenLabs integration with two new providers:

Gemini 2.5 Live (`VOICE_BACKEND=gemini-live`): Google's real-time audio WebSocket API with full function calling (`messageCodingAgent`, `processPermissionRequest`) via a hub-side proxy that handles region restrictions
Qwen Realtime (`VOICE_BACKEND=qwen-realtime`): Alibaba DashScope via hub WebSocket proxy (browser WebSocket API cannot set `Authorization` headers, so the hub proxies and injects the key server-side)
ElevenLabs remains the default — existing behaviour completely unchanged

Changes from original PR

Rebased 135 upstream commits cleanly, including scheduling, goal state, conversation outline, composer enter behaviour setting, and more
HappyComposer: kept upstream's configurable enter-behaviour setting (feat(web): add configurable enter behavior setting #586) instead of Overbaker's hard-coded Ctrl+Enter change — the upstream setting covers both behaviours and is better for all users
Test runner fix: converted `gemini/toolAdapter.test.ts` and `gemini/pcmUtils.test.ts` from `bun:test` to `vitest` — the web package uses vitest, not bun's test runner
All HAPI Bot findings from the original PR were addressed by @TennyDDDD; this rebase inherits those fixes

Configuration

```bash

Gemini Live (free tier available, full function calling)

VOICE_BACKEND=gemini-live
GEMINI_API_KEY=your-google-api-key # also accepts GOOGLE_API_KEY

Qwen Realtime (voice conversation; function calling limited by model support)

VOICE_BACKEND=qwen-realtime
DASHSCOPE_API_KEY=your-dashscope-key # also accepts QWEN_API_KEY

ElevenLabs (default — unchanged)

VOICE_BACKEND=elevenlabs
ELEVENLABS_API_KEY=your-elevenlabs-key
```

Files changed

Area	Files	Description
Shared	`shared/src/voice.ts`	Backend types, Gemini/Qwen model constants, improved system prompt with explicit tool-call priority rule, Chinese language block separated from ElevenLabs config
Hub routes	`hub/src/web/routes/voice.ts`	Backend discovery (`GET /voice/backend`), `POST /voice/gemini-token`, `POST /voice/qwen-token`
Hub server	`hub/src/web/server.ts`	Gemini/Qwen WebSocket proxy handlers with JWT auth and message queueing during upstream connect
Hub socket	`hub/src/socket/server.ts`	`maxHttpBufferSize: 55 MB` to match upload limit
Gemini session	`web/src/realtime/GeminiLiveVoiceSession.tsx`	Full Gemini Live implementation — WebSocket + AudioWorklet, serial tool calls, mobile AudioContext
Qwen session	`web/src/realtime/QwenVoiceSession.tsx`	Qwen Realtime via hub proxy, OpenAI-compatible realtime protocol
Backend selector	`web/src/realtime/VoiceBackendSession.tsx`	Dynamic backend selector with `React.lazy`, gates voice button until module is registered
Audio pipeline	`web/src/realtime/gemini/`	PCM utils, AudioWorklet recorder, 24 kHz player, Gemini function-call adapter
Integration	`web/src/components/SessionChat.tsx`	Uses `VoiceBackendSession`, gates voice toggle on backend readiness
Config	`web/tsconfig.json`	Exclude test files from TS compilation

Voice settings additions

"Start voice session with summary" toggle (Settings → Voice Assistant, default off)

Surfaced from @Overbaker's original design intent in #401: rather than silently dropping the proactive narration behaviour, it's now user-configurable.

Off (default): Voice session opens with a greeting and waits for the user to speak — matching current `main` behaviour for ElevenLabs users.
On: Voice session opens by speaking a summary of current agent activity.

Backend behaviour notes:

Gemini: Toggle fully controls session-open speech. Off sends a greeting trigger (Gemini has no built-in first message); on speaks the context summary immediately.
ElevenLabs: ElevenLabs agents have a built-in first message configured at the platform level, so the session-open greeting is always present regardless of this toggle. The toggle does affect ongoing `onReady` events during the session — whether the assistant speaks aloud when an agent finishes a task (on) or feeds that update silently as context (off).
Qwen Realtime: Untested — requires `DASHSCOPE_API_KEY`. Toggle wired identically to ElevenLabs path.

Language setting now respected by all three backends. Selecting Chinese in voice settings appends `VOICE_CHINESE_LANGUAGE_BLOCK` to the system prompt for Gemini and Qwen (previously ignored; ElevenLabs uses its own language override field).

Test plan

All 221 hub tests pass (`bun test hub/src`)
All 636 web tests pass (`bun run test` in `web/`)
TypeScript clean (`tsc --noEmit` in both `web/` and `hub/`)
PCM audio round-trip tests pass
Gemini tool adapter tests pass
ElevenLabs default flow: existing voice behaviour unchanged
Gemini Live end-to-end: voice + function calling (requires `GEMINI_API_KEY`)
Qwen Realtime end-to-end: voice via hub proxy (requires `DASHSCOPE_API_KEY`)
Mobile browsers: iOS Safari, Android Chrome (AudioContext created in user gesture)

github-actions

Findings

[Major] Gemini turn completion ignores the user mute state — when Gemini starts speaking, the recorder is force-muted, but turnComplete always calls setMuted(false). If the user had muted the mic, the next model turn re-enables the MediaStream track and can stream microphone audio while the UI still shows muted, evidence web/src/realtime/GeminiLiveVoiceSession.tsx:217.
Suggested fix:
```
state.modelSpeaking = false
state.recorder?.setMuted(state.micMuted)
```

Summary

Review mode: initial
One major issue found in the new Gemini voice backend mute handling.

Testing

Not run (automation; bun is not installed in this runner).

HAPI Bot

github-actions

Findings

[Major] Gemini restart can keep microphone input disabled after stopping during playback — cleanup() closes the old session but leaves state.modelSpeaking unchanged. If the user stops or the socket closes while Gemini is speaking, the next startSession() begins with modelSpeaking === true, and sendAudioChunk() drops every mic chunk until a page reload or a later model turn flips the flag, evidence web/src/realtime/GeminiLiveVoiceSession.tsx:60 and related guard at web/src/realtime/GeminiLiveVoiceSession.tsx:301.
Suggested fix:
```
function cleanup() {
    // existing recorder/player/ws cleanup...
    state.ws = null
    state.modelSpeaking = false
}
```

Summary

Review mode: follow-up after new commits
One major issue remains in the new Gemini Live backend lifecycle handling. The previous mute-state finding is addressed by the latest commit.

Testing

Not run (automation; bun is not installed in this runner).

HAPI Bot

github-actions

Findings

[Major] Stale Gemini close handlers can tear down the next session — ws.onclose always calls module-level cleanup(), but cleanup() operates on state.ws, not necessarily the socket that fired the event. If the user stops/restarts quickly, the old socket can close after the new socket is assigned and close the new session, leaving voice disconnected, evidence web/src/realtime/GeminiLiveVoiceSession.tsx:258.
Suggested fix:
```
ws.onclose = (event) => {
    if (state.ws !== ws) return
    cleanup()
    resetRealtimeSessionState()
    // existing setupDone/status handling...
}
```
[Major] Stale Qwen close handlers can tear down the next session — same module-level cleanup race exists in the Qwen backend, so an old socket close can reset/close a newer session after a quick reconnect, evidence web/src/realtime/QwenVoiceSession.tsx:285.
Suggested fix:
```
ws.onclose = (event) => {
    if (state.ws !== ws) return
    cleanup()
    resetRealtimeSessionState()
    // existing sessionReady/status handling...
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
The previous Gemini modelSpeaking restart issue is fixed in the latest head. Two lifecycle races remain in the new WebSocket-backed voice sessions.

Testing

Not run (automation; bun is not installed in this runner).

HAPI Bot

heavygee · 2026-05-25T20:33:21Z

Follow-on: per-user backend picker in Settings

The current VOICE_BACKEND env var approach works for initial setup, but switching backends requires an SSH + restart. Now that multiple backends are viable, a settings dropdown would be much more useful.

Rough shape:

GET /api/voice/backends returns only the backends with API keys configured on the hub (e.g. ["elevenlabs", "gemini-live"] — no Qwen entry if DASHSCOPE_API_KEY is unset)
Settings page shows a Voice backend picker alongside the existing voice picker, hidden if only one backend is available
Selection stored in localStorage; passed to VoiceBackendSession to route to the right session component

This composes cleanly with the voice picker from #690 — if you switch back to ElevenLabs, the voice picker reappears with your account voices (including clones). Gemini/Qwen don't have per-voice selection yet, so that section would hide itself when a non-ElevenLabs backend is active.

Will file this as a follow-on once #690 and this PR are settled, since both touch the settings page and it makes sense to land them together.

github-actions

Findings

[Minor] Gemini Live omits the non-ElevenLabs language block — VOICE_CHINESE_LANGUAGE_BLOCK is documented as the block appended for Gemini/Qwen backends, and Qwen appends it before sending session instructions, but buildGeminiLiveConfig() still sends only VOICE_SYSTEM_PROMPT. With VOICE_BACKEND=gemini-live, the assistant will not get the Mandarin response instruction that the new Qwen path gets, evidence shared/src/voice.ts:347.
Suggested fix:

export function buildGeminiLiveConfig(): GeminiLiveConfig {
    return {
        model: GEMINI_LIVE_MODEL,
        systemInstruction: VOICE_SYSTEM_PROMPT + VOICE_CHINESE_LANGUAGE_BLOCK,
        tools: [
            {
                functionDeclarations: buildGeminiLiveFunctionDeclarations()
            }
        ],
        responseModalities: ['AUDIO']
    }
}

Questions

None.

Summary

Review mode: follow-up after new commits
The previous stale-close findings are fixed in the latest head. One Gemini/Qwen behavior mismatch remains.

Testing

Not run (automation; PR code not executed).

HAPI Bot

heavygee · 2026-05-25T21:29:49Z

Update: Qwen Realtime now fully dogfooded.

A DashScope (Alibaba Cloud Model Studio) international account has been set up and the Qwen Realtime backend is confirmed working end-to-end.

What was tested:

Live voice session on qwen3.5-omni-flash-realtime via the hub proxy
Tool calls (messageCodingAgent, processPermissionRequest) — both captured correctly
Language detection — English and Chinese (auto-detect mode)
Session setup sequence — hub-owned session.update after session.created (protocol ordering fix confirmed)
Client frame filtering — isQwenSafeClientFrame gate confirmed working
All 11 paralinguistic scenarios via the voice test harness: laugh, chuckle, whisper, barely-audible whisper, singing/humming, crying, sigh, gasp, excited, slow/deliberate, thoughtful hmm — all produce audio output

Voice availability probe: All 6 catalog voices (Tina, Cherry, Mia, Chelsie, Serena, Ethan) confirmed valid on qwen3.5-omni-flash-realtime. voicePickerCatalog.ts is correct as-shipped.

Voice test harness (scripts/voice-test/) is now in feat/voice-selection-all-backends — CLI tool that outputs WAVs per scenario, with a probe-qwen-voices.ts for voice availability checks.

WAV artefacts from today's run: ~/voice-test-output/2026-06-02-test-harness/.

heavygee · 2026-05-25T21:38:52Z

On removing VOICE_CHINESE_LANGUAGE_BLOCK from both backends: language still works correctly for users.

VOICE_CHINESE_LANGUAGE_BLOCK forced Mandarin unconditionally — the right behaviour for a Chinese-only deployment but wrong for anyone else. Both Gemini and Qwen will naturally mirror the user's spoken language without an explicit instruction: a Mandarin speaker gets Mandarin responses, an English speaker gets English. The block was over-constraining, not enabling.

ElevenLabs handles this separately via its language field in the agent config (with override support per-conversation), so it was never affected.

github-actions

Findings

[Major] Global visibility suppresses native push for unrelated sessions — the new always-on { all: true } SSE connection is also reported as visible, so PushNotificationChannel sees the namespace as visible and sends the toast to that global connection instead of falling back to Web Push. When the user is viewing session A, a ready/permission/task notification for session B is delivered only to the open tab and no native push is sent, even though the selected-session connection is not subscribed to B. Evidence web/src/App.tsx:327 and existing push fallback logic in hub/src/push/pushNotificationChannel.ts:39.
Suggested fix:
```
const isGlobalVisibilityAuthoritative = !sessionEventSubscription

useVisibilityReporter({
    api,
    subscriptionId: globalSubscriptionId,
    enabled: sseEnabled && isGlobalVisibilityAuthoritative
})

useVisibilityReporter({
    api,
    subscriptionId: sessionSubscriptionId,
    enabled: sseEnabled && Boolean(sessionEventSubscription)
})
```

Questions

None.

Summary

Review mode: follow-up after new commits
The new split-SSE wiring still keeps session-list updates live, but its visibility reporting can now swallow native notifications for sessions other than the selected one.

Testing

Not run (automation; PR code not executed). Add coverage for the split-SSE visibility case: selected session visible, notification for another session should still call Web Push rather than only delivering an in-app toast.

HAPI Bot

heavygee · 2026-05-25T21:50:00Z

On the proactive/reactive toggle (scope increase explanation)

The HAPI Bot flagged that VOICE_CHINESE_LANGUAGE_BLOCK was present in Qwen but absent from Gemini. Looking into it, the bot had the direction backwards — we'd intentionally removed it from Gemini because it was causing the assistant to respond in Chinese regardless of the user's language. The right fix was to remove it from Qwen too for consistency.

But that review led somewhere useful: digging into why the original code behaved that way, it became clear the Chinese block wasn't just a language setting — it was tied to a proactive narration style where the assistant speaks unprompted when context changes or the agent finishes a task. That's a legitimate and useful mode; it's just not the right default for everyone.

Rather than silently drop that behaviour (which was part of @Overbaker's original design intent in #401), we've surfaced it as a Proactive voice toggle in Settings, defaulting to off (reactive, matching current main behaviour). Users who want the assistant to narrate agent activity unprompted can enable it.

The toggle applies uniformly to all three backends: Gemini, Qwen, and ElevenLabs.

github-actions

Findings

[Major] Normalize upstream close codes before closing the client WebSocket — both new voice proxies forward event.code from the upstream socket. Abnormal upstream closes commonly surface as 1006, and 1005/1006/1015 are reserved codes that cannot be sent in a close frame. If clientWs.close(event.code, ...) throws, the catch swallows it after the upstream is removed, leaving the browser socket open while later messages are dropped. Evidence hub/src/web/server.ts:68.
Suggested fix:

function toClientCloseCode(code: number): number {
    return code >= 1000 && code <= 4999 && code !== 1005 && code !== 1006 && code !== 1015
        ? code
        : 1011
}

function closeClientFromUpstream(clientWs: ServerWebSocket<unknown>, event: CloseEvent): void {
    try { clientWs.close(toClientCloseCode(event.code), event.reason || 'Upstream closed') } catch { /* client gone */ }
}

upstream.onclose = (event) => {
    pendingMap.delete(clientWs)
    closeClientFromUpstream(clientWs, event)
    upstreamMap.delete(clientWs)
}

[Minor] Barrel export defeats lazy backend loading — VoiceBackendSession lazy-imports Gemini/Qwen to keep alternate backends out of the initial path, but the barrel now statically re-exports those same modules. Any @/realtime import creates static dependencies on both backend modules, so the lazy split is unreliable. Evidence web/src/realtime/index.ts:20.
Suggested fix:
```
export { RealtimeVoiceSession, type RealtimeVoiceSessionProps } from './RealtimeVoiceSession'
export { VoiceBackendSession, type VoiceBackendSessionProps } from './VoiceBackendSession'
```

Questions

None.

Summary

Review mode: follow-up after new commits
Found two issues in the final diff: upstream abnormal-close handling can leave proxied voice sockets hung, and the realtime barrel export undermines the intended lazy backend split.

Testing

Not run (automation; PR code not executed). Static fallback attempted: bun typecheck is unavailable in this runner (bun: command not found), and direct tsc -p web/tsconfig.json --noEmit / tsc -p hub/tsconfig.json --noEmit are blocked by missing workspace type packages (vite/client, bun-types, node).

HAPI Bot

github-actions

Findings

[Major] Keep the session SSE connection responsible for session catch-up — after the split, only the global SSE passes onConnect, so reconnecting while a session is open invalidates the session query and fetches latest messages from the global connection lifecycle. That global connection does not restart when selectedSessionId changes or when the session-scoped SSE reconnects, so a dropped /api/events?sessionId=... stream can reconnect at web/src/hooks/useSSE.ts:543 without running the catch-up at web/src/App.tsx:206, leaving the active chat stale until another event arrives. Evidence web/src/App.tsx:312.
Suggested fix:

const handleSessionSseConnect = useCallback(() => {
    if (!selectedSessionId || !api) return
    void Promise.all([
        queryClient.invalidateQueries({ queryKey: queryKeys.session(selectedSessionId) }),
        fetchLatestMessages(api, selectedSessionId)
    ]).catch((error) => {
        console.error('Failed to refresh session after SSE connect:', error)
    })
}, [api, queryClient, selectedSessionId])

const { subscriptionId: sessionSubscriptionId } = useSSE({
    enabled: sseEnabled && Boolean(sessionEventSubscription),
    token: token ?? '',
    baseUrl,
    subscription: sessionEventSubscription ?? undefined,
    scope: 'full',
    onConnect: handleSessionSseConnect,
    onEvent: handleSseEvent
})

[Major] Do not disable visibility reporting for the global subscription while a session is open — useVisibilityReporter only clears its retry/local refs when disabled; it does not POST hidden or unregister the SSE subscription. With this new enabled: sseEnabled && !sessionEventSubscription, the all-sessions connection stays registered as visible from its initial visibility query, and SSEManager.sendToast delivers to any visible connection in the namespace. Result: while the user is reading a session, global toast delivery can count as visible and suppress push/Telegram fallback even if the session-scoped reporter later goes hidden or fails. Evidence web/src/App.tsx:330.
Suggested fix:
```
useVisibilityReporter({
    api,
    subscriptionId: globalSubscriptionId,
    enabled: sseEnabled
})

useVisibilityReporter({
    api,
    subscriptionId: sessionSubscriptionId,
    enabled: sseEnabled && Boolean(sessionEventSubscription)
})
```

Questions

None.

Summary

Review mode: follow-up after new commits
The previous bot findings were addressed, but the SSE split introduces two regressions: active-session reconnects no longer run session catch-up, and the global SSE subscription can remain visible without a reporter while a session is open.

Testing

Not run (automation; bun is unavailable in this runner: bun: command not found).

HAPI Bot

…#694 The following files were accidentally included from work intended for a separate PR (session-list-status / SSE split subscriptions) due to that branch being submitted from main rather than its own worktree branch: - web/src/App.tsx (reverted to upstream/main) - web/src/hooks/useSSE.ts (reverted to upstream/main) - web/src/hooks/useSSE.test.ts (removed — added by other branch) - web/src/lib/appSseSubscriptions.ts (removed — added by other branch) - web/src/lib/appSseSubscriptions.test.ts (removed — added by other branch) - hub/src/sync/syncEngine.ts (reverted to upstream/main) Apologies for the noise. This PR (tiann#692) is scoped to the pluggable voice backend only. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

heavygee · 2026-05-25T22:16:53Z

Apologies for the noise on the previous push — a separate agent was working on SSE split-subscription changes (for a different PR) and accidentally submitted from main rather than its own worktree branch, which caused those files to land here. The revert commit above removes them. This PR is now clean and scoped to the pluggable voice backend only.

github-actions

Findings

[Minor] Preserve the voice language preference for Gemini/Qwen — VoiceProvider still passes the selected ElevenLabs language code into startRealtimeSession, but the new Gemini and Qwen backends never read config.language. The PR also adds VOICE_CHINESE_LANGUAGE_BLOCK specifically for Gemini/Qwen, but buildGeminiLiveConfig() is called without the language and Qwen builds instructions from VOICE_SYSTEM_PROMPT only, so selecting Chinese in settings silently works only for ElevenLabs. Evidence web/src/realtime/GeminiLiveVoiceSession.tsx:127, web/src/realtime/QwenVoiceSession.tsx:160.
Suggested fix:

// shared/src/voice.ts
export function buildGeminiLiveConfig(language?: string): GeminiLiveConfig {
    const systemInstruction = language === 'zh'
        ? `${VOICE_SYSTEM_PROMPT}${VOICE_CHINESE_LANGUAGE_BLOCK}`
        : VOICE_SYSTEM_PROMPT

    return {
        model: GEMINI_LIVE_MODEL,
        systemInstruction,
        tools: [{ functionDeclarations: buildGeminiLiveFunctionDeclarations() }],
        responseModalities: ['AUDIO']
    }
}

// web/src/realtime/GeminiLiveVoiceSession.tsx
const liveConfig = buildGeminiLiveConfig(config.language)

// web/src/realtime/QwenVoiceSession.tsx
const basePrompt = config.language === 'zh'
    ? `${VOICE_SYSTEM_PROMPT}${VOICE_CHINESE_LANGUAGE_BLOCK}`
    : VOICE_SYSTEM_PROMPT

Questions

None.

Summary

Review mode: follow-up after new commits
The previous bot findings were reviewed as context. I found one integration gap in the new voice backends: the existing voice language setting is ignored outside ElevenLabs.

Testing

Not run (automation; bun is unavailable in this runner).

HAPI Bot

heavygee · 2026-05-26T08:20:15Z

Re: SSE visibility regression (Major) — this finding is against the split-subscription code that was already reverted in commit 91e52ca. The PR no longer contains any useVisibilityReporter or SSE changes; those belonged to a separate PR and were removed. No action needed here.

Re: voice language ignored for Gemini/Qwen (Minor) — fixed in a8918fe. buildGeminiLiveConfig() now accepts an optional language parameter and appends VOICE_CHINESE_LANGUAGE_BLOCK only when language === 'zh'. Both Gemini and Qwen pass config.language through, so selecting Chinese in voice settings now works correctly for all three backends.

Also corrected the UI label for the voice-start toggle ('Proactive voice' → 'Start voice session with summary') to accurately describe what the setting controls: whether a voice session opens with a spoken summary of current agent activity, or with a greeting and a wait for the user to speak.

github-actions

Findings
No findings.

Questions
None.

Summary
Review mode: follow-up after new commits
No issues found in the current diff. The previous HAPI Bot finding around Gemini/Qwen language propagation is addressed in the latest head. Residual risk: the new Gemini/Qwen WebSocket proxy and browser audio paths were reviewed statically only.

Testing
Not run (automation; bun is unavailable in this runner).

HAPI Bot

github-actions

Findings
No findings.

Questions
None.

Summary
Review mode: follow-up after new commits
No issues found in the current diff. I reviewed the full latest PR diff, including the two commits added after the previous HAPI Bot review. Residual risk: the new Gemini/Qwen WebSocket proxy and browser audio paths were reviewed statically only.

Testing
Not run (automation; bun is unavailable in this runner).

HAPI Bot

github-actions

Findings

[Major] Qwen session.update uses a schema the upstream service rejects — the Qwen backend configures qwen3-omni-flash-realtime, but sends flattened tool definitions plus pcm audio formats. Qwen validates session.update before returning session.updated, so this can keep the Qwen voice backend from ever reaching connected. Evidence web/src/realtime/QwenVoiceSession.tsx:153, web/src/realtime/QwenVoiceSession.tsx:172.
Suggested fix:
```
const tools = VOICE_TOOL_DEFINITIONS.map((td) => ({
    type: 'function' as const,
    function: {
        name: td.name,
        description: td.description,
        parameters: td.parameters
    }
}))

// ...
input_audio_format: 'pcm16',
output_audio_format: 'pcm24',
tools,
```

[Minor] HAPI_PUBLIC_URL with a trailing slash generates a double-slash WebSocket path — https://host/ becomes wss://host//api/voice/..., but the Bun fetch handler only matches /api/voice/gemini-ws and /api/voice/qwen-ws, so the proxy upgrade misses. Evidence hub/src/web/routes/voice.ts:149, hub/src/web/routes/voice.ts:172.
Suggested fix:

function buildVoiceWsUrl(base: string, pathname: string): string {
    const url = new URL(base)
    url.protocol = url.protocol === 'https:' ? 'wss:' : 'ws:'
    url.pathname = pathname
    url.search = ''
    url.hash = ''
    return url.toString()
}

const wsProxyUrl = buildVoiceWsUrl(publicUrl, '/api/voice/gemini-ws')

Questions
None.

Summary
Review mode: follow-up after new commits
Two issues found in the current full diff. The main risk is Qwen Realtime failing during session setup; the smaller risk is proxy URL generation for common public URL formatting.

Testing
Not run (security; review prompt prohibits executing code from PR content).

HAPI Bot

github-actions

Findings

[Major] Initial muted state is applied before the recorder has a media stream, so a session restarted while muted can still open the microphone. GeminiAudioRecorder.start() assigns mediaStream only after its first await, but startAudioCapture() does not await it before calling setMuted; the same pattern exists in Qwen. Evidence web/src/realtime/GeminiLiveVoiceSession.tsx:330, related web/src/realtime/QwenVoiceSession.tsx:341.
Suggested fix:
```
async function startAudioCapture(playbackContext: AudioContext): Promise<void> {
    state.player = new GeminiAudioPlayer(playbackContext)
    state.recorder = new GeminiAudioRecorder()

    await state.recorder.start(onChunk, onError)
    state.recorder.setMuted(state.micMuted)
}

await startAudioCapture(state.playbackContext!)
state.statusCallback?.('connected')
```

[Major] Client close codes are forwarded directly to upstream providers, which can leave paid realtime sessions running. Browser/server close callbacks can report reserved codes like 1006; WebSocket.close(1006, ...) throws, the catch swallows it, and the map entry is deleted without closing the Gemini/Qwen upstream socket. Evidence hub/src/web/server.ts:95, related hub/src/web/server.ts:146.
Suggested fix:

function closeUpstream(upstream: WebSocket, code: number, reason: string) {
    const safeCode = toClientCloseCode(code)
    const safeReason = (reason || 'Client closed').slice(0, 123)
    try {
        upstream.close(safeCode, safeReason)
    } catch {
        upstream.close(1011, 'Client closed')
    }
}

closeUpstream(upstream, code, reason)

Questions
None.

Summary
Review mode: follow-up after new commits
Two issues remain in the current full diff. The main risks are privacy/user expectation around muted voice restarts and orphaned upstream realtime voice sessions after abnormal client disconnects.

Testing
Not run (automation review; no PR code executed).

HAPI Bot

github-actions

Findings

[Major] Audio capture failures after WebSocket setup can leave voice startup permanently pending. GeminiAudioRecorder.start() still performs a second getUserMedia call and builds the audio graph; after the follow-up change, await startAudioCapture(...) runs inside the async onmessage handler, but any rejection there is not caught or passed to the outer Promise reject path. The UI can remain in connecting, startRealtimeSession() never finishes, and the partially-open provider socket is not cleaned up. Same pattern in web/src/realtime/QwenVoiceSession.tsx:195. Evidence web/src/realtime/GeminiLiveVoiceSession.tsx:181.
Suggested fix:
```
try {
    await startAudioCapture(state.playbackContext!)
} catch (error) {
    const message = error instanceof Error ? error.message : "Microphone error"
    cleanup()
    state.statusCallback?.("error", message)
    reject(error instanceof Error ? error : new Error(message))
    return
}
state.statusCallback?.("connected")
```

Questions
None.

Summary
Review mode: follow-up after new commits
The previous two HAPI Bot findings are addressed, but the new awaited audio-capture path needs an explicit rejection/cleanup path in both Gemini and Qwen. Residual testing gap: add a startup failure test/mock for GeminiAudioRecorder.start() rejecting after the provider WebSocket reaches ready state.

Testing
Not run (automation review; no PR code executed).

HAPI Bot

github-actions

Findings

[Major] Socket.IO limit still rejects valid 50 MiB uploads — MAX_UPLOAD_BYTES allows 50 MiB decoded files, but the RPC payload carries that file as base64 JSON through Socket.IO. A 50 MiB file expands to about 66.7 MiB before JSON overhead, so the new 55 MiB maxHttpBufferSize still makes files above roughly 41 MiB pass the REST check and then fail/disconnect during the hub-to-CLI RPC. Evidence hub/src/socket/server.ts:69; related size check hub/src/web/routes/sessions.ts:22.
Suggested fix:
```
maxHttpBufferSize: 68 * 1024 * 1024, // 50 MiB decoded uploads are ~66.7 MiB as base64 JSON
```

Questions
None.

Summary
Review mode: follow-up after new commits
The prior HAPI Bot audio-startup finding is addressed. One upload-size regression remains in the updated Socket.IO buffer limit.

Testing
Not run (automation review; no PR code executed).

HAPI Bot

heavygee · 2026-05-26T17:12:10Z

Cold review: Gemini hub config still hits ElevenLabs token path

Severity: functional bug on non-default backends (reproduced on daily driver with VOICE_BACKEND=gemini-live).

Symptom

User taps voice with hub configured for Gemini Live (VOICE_BACKEND=gemini-live, GEMINI_API_KEY set, ElevenLabs key absent/commented). UI shows ElevenLabs API key not configured (from POST /api/voice/token).

That error string cannot come from the Gemini path — it is emitted only by the ElevenLabs token route in hub/src/web/routes/voice.ts.

What we verified (not hub config)

Live hub on daily driver:

Process env: VOICE_BACKEND=gemini-live, GEMINI_API_KEY present
GET /api/voice/backend → {"backend":"gemini-live"}
POST /api/voice/gemini-token (authed) → allowed: true, proxied wsUrl

Conclusion: hub side is correct. Web client is mounting the ElevenLabs session implementation instead of GeminiLiveVoiceSession.

Root cause (this PR, not upstream main)

Upstream main still mounts RealtimeVoiceSession directly and has no pluggable backend — this regression is in #692 only.

Smoking gun — web/src/api/voice.ts:

export async function fetchVoiceBackend(api: ApiClient): Promise<VoiceBackendResponse> {
    try {
        const result = await api.fetchVoiceBackend()
        // ...
    } catch {
        return { backend: 'elevenlabs' }
    }
}

Any failure discovering the backend silently defaults to ElevenLabs. VoiceBackendSession then renders RealtimeVoiceSession, which calls POST /api/voice/token → exactly the observed error when ELEVENLABS_API_KEY is unset.

Likely triggers for the catch (any one suffices):

Auth on discovery — GET /api/voice/backend sits behind JWT middleware (hub/src/web/middleware/auth.ts). A 401/expired token on that single request falls through to ElevenLabs even while other authed calls work.
Transient network / HTTP error — same silent fallback, no retry.
Stale backend state (secondary) — VoiceBackendSession does not reset backend when props.api changes; a prior failed discovery could leave ElevenLabs registered until remount.

Recommended fix (implement on this branch)

Remove silent catch → elevenlabs fallback. Propagate the error; surface a user-visible “could not detect voice backend” message instead of misrouting.
Consider making GET /api/voice/backend public (returns only { backend }, no secrets). Removes auth timing as a failure mode for discovery. Token/proxy routes stay authed.
VoiceBackendSession: reset backend to null when api changes; only call onReadyChange(true) after successful discovery + session registration.
Optional UX: Settings → Voice still shows ElevenLabs language labels regardless of active backend — confusing but not the crash.

Falsification test (before/after)

DevTools → Network → tap voice mic:

Broken: POST /api/voice/token without prior successful GET /api/voice/backend returning gemini-live
Fixed: GET /api/voice/backend → gemini-live, then POST /api/voice/gemini-token, WebSocket to /api/voice/gemini-ws

Scope

Layer	Owner	Action
Upstream `main`	N/A	No upstream issue — feature does not exist there
This PR (#692)	Fix here	See above

Please do not “fix” by uncommenting ELEVENLABS_API_KEY when Gemini is intended — that masks the routing bug.

Posted from tooling triage (daily driver repro). Implementing agent: treat as blocking for Gemini/Qwen backend selection.

heavygee · 2026-05-26T19:24:46Z

Fixed in b485754 + ce9b0ac.

Two commits:

b485754 — fetchVoiceBackend() no longer catches errors and defaults to 'elevenlabs'. Any network/server failure now throws so VoiceBackendSession surfaces it via onStatusChange('error', ...). Also added setBackend(null) at effect start so a stale ElevenLabs registration cannot persist when api changes.
ce9b0ac — Unrecognised backend strings (future values, typos) now throw instead of silently falling back to ElevenLabs, closing the narrow remaining form of the misrouting bug.

Falsification test: DevTools → Network → tap voice mic with VOICE_BACKEND=gemini-live: should see GET /api/voice/backend → gemini-live, then POST /api/voice/gemini-token, then WebSocket to /api/voice/gemini-ws. No call to POST /api/voice/token.

github-actions

Findings

[Major] Qwen startup never triggers the initial spoken summary/greeting — after session.updated, Qwen only starts capture, marks connected, and resolves. Unlike the Gemini path, it never sends a conversation.item.create plus response.create, so hapi-voice-proactive=true does not speak the startup summary and the reactive path does not greet before waiting. Evidence web/src/realtime/QwenVoiceSession.tsx:204.
Suggested fix:

const proactive = localStorage.getItem('hapi-voice-proactive') === 'true'
if (proactive && config.initialContext) {
    this.sendTextMessage(`[Context] ${config.initialContext}`)
} else {
    if (config.initialContext) {
        this.sendContextualUpdate(config.initialContext)
    }
    this.sendTextMessage('[Greet the user as HAPI. Say a brief hello and invite them to speak. Do not mention Qwen or any model name.]')
}

[Major] Socket.IO limit still rejects valid 50 MiB uploads — MAX_UPLOAD_BYTES validates decoded files up to 50 MiB, but the hub sends the file to the CLI as base64 JSON over Socket.IO. A 50 MiB file expands to about 66.7 MiB before JSON overhead, so the new 55 MiB limit still lets REST accept files that fail during RPC. Evidence hub/src/socket/server.ts:69; related size check hub/src/web/routes/sessions.ts:22.
Suggested fix:
```
maxHttpBufferSize: 68 * 1024 * 1024, // 50 MiB decoded uploads are ~66.7 MiB as base64 JSON
```

Questions
None.

Summary
Review mode: follow-up after new commits
The prior bot pass is still relevant for the upload-size mismatch, and the updated Qwen backend misses the startup speech path covered by the new proactive setting. No additional findings above 80% confidence.

Testing
Not run (automation review; no PR code executed).

HAPI Bot

github-actions

Findings

[Major] Qwen sends unsupported conversation.item.create messages — Qwen's client-events reference only supports conversation.item.create for function_call_output, but this implementation uses it with item.type: 'message' for startup greetings, proactive updates, and contextual updates. The follow-up change now calls sendTextMessage() immediately after session.updated, so VOICE_BACKEND=qwen-realtime can fail before the user speaks. Evidence web/src/realtime/QwenVoiceSession.tsx:336. Reference: https://docs.qwencloud.com/api-reference/real-time-multimodal/client-events#conversation-item-create
Suggested fix:

type QwenSessionConfig = {
    modalities: ['text', 'audio']
    voice: string
    input_audio_format: 'pcm16'
    output_audio_format: 'pcm24'
    instructions: string
    temperature: number
    turn_detection: {
        type: 'server_vad'
        threshold: number
        silence_duration_ms: number
        prefix_padding_ms: number
    }
    tools: Array<unknown>
    tool_choice: 'auto'
}

class QwenVoiceSessionImpl implements VoiceSession {
    private currentSessionConfig: QwenSessionConfig | null = null

    private updateInstructions(update: string): void {
        if (!this.currentSessionConfig) return
        this.currentSessionConfig = {
            ...this.currentSessionConfig,
            instructions: `${this.currentSessionConfig.instructions}\n\n[System Context Update]\n${update}`
        }
        sendEvent('session.update', { session: this.currentSessionConfig })
    }

    sendTextMessage(message: string): void {
        this.updateInstructions(message)
        sendEvent('response.create')
    }

    sendContextualUpdate(update: string): void {
        this.updateInstructions(update)
    }
}

Questions
None.

Summary
Review mode: follow-up after new commits
The earlier upload-size finding is fixed, and the Qwen startup-greeting finding was addressed structurally, but the Qwen implementation still sends unsupported user-message items to the realtime API. No other findings above 80% confidence.

Testing
Not run (automation review; no PR code executed).

HAPI Bot

github-actions

Findings
No issues found above 80% confidence.

Questions
None.

Summary
Review mode: follow-up after new commits
No actionable regressions found in the latest full diff. The previous Qwen conversation.item.create issue appears addressed; residual risk remains around provider integration coverage, especially Qwen Realtime end-to-end, which the PR test plan marks untested.

Testing
Not run (automation review; no PR code executed).

HAPI Bot

hub/voice.ts: - Replace string-concat WS URL construction with buildVoiceWsUrl() which uses URL API to set protocol/pathname cleanly — fixes double-slash when HAPI_PUBLIC_URL has a trailing slash (would silently skip the proxy route) QwenVoiceSession.tsx: - Wrap tool definitions in {type:'function', function:{...}} as required by Qwen-Omni realtime schema — previous flat shape caused session.update rejection before audio capture could start - Use pcm16/pcm24 audio formats matching DashScope spec; remove input_audio_sample_rate (encoded in format name) via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

…ose codes GeminiLiveVoiceSession + QwenVoiceSession: - startAudioCapture() is now async and awaits recorder.start() before calling setMuted() — previously setMuted ran before getUserMedia resolved so a session restarted while muted would open the mic anyway - statusCallback('connected') now fires after audio is ready - setMuted() called unconditionally (not just when true) to correctly apply saved state in either direction hub/src/web/server.ts: - Both Gemini and Qwen close() handlers now pass the client code through toClientCloseCode() before forwarding to upstream — prevents reserved codes (e.g. 1006) from causing WebSocket.close() to throw and leave the upstream session open until provider timeout - Reason string capped at 123 bytes (WebSocket protocol limit) via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

An unhandled rejection inside the async onmessage callback does not propagate to the outer startSession Promise — the UI hangs on 'connecting' and the provider socket stays partially open. Wrapping the await in try/catch calls cleanup()/statusCallback('error')/reject() so failures surface correctly. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

…alling back to ElevenLabs fetchVoiceBackend no longer catches errors and defaults to 'elevenlabs' — any network or server failure now throws so VoiceBackendSession can surface it via onStatusChange('error', ...) rather than silently mounting the wrong backend. VoiceBackendSession also resets backend state to null when api changes, so a stale ElevenLabs registration from a prior discovery cannot persist into a new session. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

…alling back to ElevenLabs Unknown backend strings (future values, typos) now throw rather than defaulting to elevenlabs, closing the narrow remaining form of the original misrouting bug. Also removes the unnecessary `as VoiceBackendResponse` cast. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

…r base64 uploads Qwen session.updated handler now sends the same proactive summary or greeting trigger that Gemini does — previously it started silently in both proactive and reactive modes. maxHttpBufferSize raised to 68 MiB to account for base64 expansion: 50 MiB decoded files become ~66.7 MiB as base64 JSON, so the previous 55 MiB ceiling would disconnect uploads above ~41 MiB before they reached the CLI. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

….update for Qwen text Qwen's realtime API only supports conversation.item.create for function_call_output. Sending it with type:'message' for greetings/context was invalid and could fail before the user spoke. sendTextMessage and sendContextualUpdate now update session instructions via session.update (accumulating context into the system prompt) and trigger response.create only when a spoken reply is needed — matching Qwen's supported client event surface. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

…n start session.updated now returns early after the first ack — subsequent session.update calls (instruction appends) also echo session.updated but must not re-trigger audio capture or the greeting path. currentSessionConfig is now reset to null at the top of startSession so a stale config from a failed previous session cannot leak into the new one. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

Without this guard, a missing wsUrl in the hub token response would silently attempt to connect directly to Google with "proxied" as the API key — producing a confusing auth failure instead of a clear error. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

DashScope realtime API accepts only 'pcm' for both input and output audio formats. The pcm16/pcm24 values caused session.update rejection before audio capture could start, leaving the Qwen backend unusable. Also updates the default voice from Mia (not in the qwen3-omni-flash- realtime voice list) to Cherry, which is documented as supported. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

Failed token fetch, microphone denial, or WebSocket error during setup left state.playbackContext open. Each failure path now calls cleanup() before throwing/rejecting, preventing AudioContext leaks on mobile browsers with hard limits on concurrent contexts. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

Reverts changes to files that shouldn't differ from upstream: - .gitignore: remove fork-only AGENTS.local.md entry - web/src/App.tsx: restore dual-subscription SSE pattern (scope-aware) - web/src/hooks/useSSE.ts: restore SSEScope/scope parameter - web/src/hooks/useSSE.test.ts: restore (was accidentally deleted) - web/src/lib/appSseSubscriptions.ts: restore (was accidentally deleted) - web/src/lib/appSseSubscriptions.test.ts: restore (was accidentally deleted) - hub/src/sync/syncEngine.ts: restore (off-topic change)

Hub sends HAPI-owned Gemini setup on proxy connect and rejects client setup frames. Qwen proxy always uses QWEN_REALTIME_MODEL instead of a client query parameter. Shared buildGeminiLiveSetupMessage() keeps wire format in one place. Co-authored-by: Cursor <cursoragent@cursor.com>

Mirror the Gemini proxy security model for Qwen: - Hub sends initial session.update (voice/tools/instructions) on upstream connect so the browser cannot override config fields. - Proxy message() now calls isQwenSafeClientFrame() and closes the connection (1008) if a client session.update touches any field other than 'instructions' (blocks tool/voice/modality overrides). - QwenVoiceSession no longer sends session.update on session.created; it waits for the hub-relayed session.updated and then sends only instruction-only updates for context/proactive content. - Language passed as query param (?language=zh) so hub builds the correct Chinese system prompt without a client-supplied session.update. - buildQwenSessionUpdateMessage() and isQwenSafeClientFrame() added to @hapi/protocol/voice; 9 new unit tests cover filter edge cases.

…ring DashScope requires session.update to be sent AFTER session.created is received, not immediately on WebSocket open. Previously the hub sent session.update in upstream.onopen, which violated this ordering and risked the config being processed in an uninitialized session context. Add pendingSetupMap to buffer the hub-owned session.update payload. The onmessage handler now relays session.created to the browser first, then immediately sends the pending session.update to DashScope — matching the protocol ordering the old browser-side code used (which waited for session.created before sending session.update). Also remove maxHttpBufferSize from the socket.io Engine config. That setting is unrelated to voice backends; upstream/main had no such limit set and it is not introduced by this PR.

…-completions) Qwen Realtime session.update expects tools as flat objects: { type: 'function', name, description, parameters } The previous code used the chat-completions shape: { type: 'function', function: { name, description, parameters } } DashScope may reject session.update or silently ignore tools with the nested shape, causing tool calls to fail at runtime. Fix applied in buildQwenSessionUpdateMessage(); test updated to assert flat shape and that no nested `function` key is present.

…service Live-tested against DashScope international API: - Model: qwen3-omni-flash-realtime → qwen3.5-omni-flash-realtime (previous model ID did not exist on DashScope) - Default voice: Cherry → Tina (confirmed from session.created response on qwen3.5-omni-flash-realtime) - Default WS base: dashscope.aliyuncs.com → dashscope-intl.aliyuncs.com (international accounts use the -intl endpoint; China endpoint rejects international API keys; QWEN_REALTIME_WS_URL env var still overrides)

Two dogfooding fixes verified against live Qwen Realtime session: sendTextMessage: switch from instruction-injection to conversation.item.create Qwen Realtime requires a user conversation item before response.create. The previous approach (updateInstructions + response.create) produced "input messages do not contain elements with role user" errors. Now sends {type:message, role:user, content:[{type:input_text}]} then response.create. sendContextualUpdate is unchanged (instruction-only, no response trigger). Language handling: replace zh-only branch with buildVoiceLanguageBlock() Previously, only language='zh' added any instruction; all other languages (including English) sent no language block, causing Qwen to drift to Chinese. buildVoiceLanguageBlock() now covers three cases: - 'zh'/'zh-*': existing Chinese block (unchanged) - explicit code ('en','es','fr',...): "Always respond in [Language]" - undefined/auto: "Detect the user's language and maintain it" Applied to buildGeminiLiveConfig, buildQwenSessionUpdateMessage, and the client-side currentInstructions mirror in QwenVoiceSession. Also removes the Gemini hub proxy's zh-only filter, which was discarding explicit language selections other than Chinese.

Hub sends its owned setup on upstream open, then waits for Google's setupComplete acknowledgment before flushing queued client frames. isGeminiSetupCompleteFrame() detects the {"setupComplete":{}} message; message() queues instead of forwarding while pendingMap is live. Addresses the repeated Major finding from bot review on PR tiann#743. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

An authenticated client could flood the queue between upstream.onopen and Google's setupComplete acknowledgment. Add pendingBytesMap tracking and close with 1009 if the budget is exceeded. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

Language selection for French, Spanish, Japanese etc. was silently dropped — only 'zh' was forwarded as a query param. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

Codes like 'no', 'da', 'fi', 'pt-br', 'bg', 'ro', 'cs', 'el', 'ms', 'tl', 'uk', 'hu', 'hr', 'sk' were falling through to raw-code prompts ("Always respond in no"). Now resolve to proper display names. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

github-actions

Findings
No findings.

Questions
None.

Summary
Review mode: follow-up after new commits
Full current diff reviewed. I did not find a high-confidence issue introduced by the added or modified lines. Residual risk: the Gemini/Qwen websocket proxy and browser audio paths still depend on provider/browser behavior that repo tests do not fully simulate. Local validation could not be run because bun is not installed in this environment.

Testing
Not run (bun not found in environment).

HAPI Bot

…or ACP Merge tiann/hapi upstream (50 commits, v0.20.0 + v0.20.1) into the liuxin fork. Notable adoptions: - POST /sessions/:id/reopen + web Reopen button (tiann#826), archive metadata preservation (tiann#825), clearSessionArchiveMetadata / restoreSessionArchiveMetadata on sessionCache - runner self-restart resilience under external supervision (tiann#814) - session export (tiann#808), scratchlist v1.1 (tiann#798), voice backend enabled (Gemini Live / Qwen Realtime, tiann#692/tiann#742) - cursor legacy→ACP auto-migration machinery (tiann#799/tiann#824/tiann#835/tiann#844) - stale queued-message ghost fix (tiann#811), composer send-error restore path, mermaid/remark-math web fixes Conflict resolutions (32 files) — fork features preserved: - shared/models.ts: adopt upstream `fable`/`fable[1m]` aliases (verified working via `claude --model fable -p`), drop our `claude-fable-5` long-form ids, keep opus-4-6/4-7 1M pins - syncEngine.resumeSession: keep fork Restart behavior (archive active session then respawn — web Restart depends on it) over upstream's return-success-if-active; keep resumeWithSessionId override, --continue fallback, auto-discovery scan, and retry-without-token; adopt upstream's cursor auto-migrate hook, never-started fresh-spawn path, and the no-set-session-config fix; adapt upstream's 11-arg spawnSession call to our 13-arg signature (sandbox + continueLatest) - sessions routes: keep fork /resume-options + /pin alongside upstream /reopen + /migrate-to-acp - sessionBase.onSessionFound: merged signature (sessionId, extras?, sessionFilePath?) serving upstream's cursor metadata extras and our transcript-path scanner callback - useSendMessage.onError: upstream's attachment-aware composer restore + fork's clearTurnLock/pauseQueue queue stop - SessionChat: keep fork InactiveSessionBanner (one-click resume + browse) over upstream's plain-text banner; adopt voice integration; keep recent-12h section + pin + compact postTokens - web/server.ts: keep 100MB body cap + options-object push routes; adopt voice WS proxies and codexDesktop routes - telegram bot: adopt upstream formatReadyNotification Validation: typecheck green (cli/web/hub); 983/984 tests pass. The one failure is runner.integration stress test (spawn/stop 20 concurrent sessions) — environmental on this loaded dev machine; not in the deploy script's focused test set. via [HAPI](https://hapi.run) Co-Authored-By: HAPI <noreply@hapi.run>

heavygee mentioned this pull request May 25, 2026

feat(voice): pluggable voice backend with Gemini Live & Qwen Realtime #401

Closed

10 tasks

heavygee force-pushed the feat/pluggable-voice-backend branch from 21d2417 to 9081a9b Compare May 25, 2026 18:10