## Problem

When using `adk web` with native-audio models (e.g., `gemini-live-2.5-flash-native-audio`), requesting the `TEXT` modality causes an error: native-audio models only support the `AUDIO` modality.
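For context, the error is triggered by a live WebSocket request that explicitly asks for `TEXT`. A URL of roughly this shape reproduces it (host, port, and the session query parameters are illustrative, not confirmed against the endpoint's signature):

```
ws://localhost:8000/run_live?app_name=my_app&user_id=u1&session_id=s1&modalities=TEXT
```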
## Root Cause

In `src/google/adk/cli/adk_web_server.py`, the `/run_live` WebSocket endpoint accepts modalities from query parameters without validating them against the model's capabilities:
```python
# Lines 1636-1638
modalities: List[Literal["TEXT", "AUDIO"]] = Query(
    default=["AUDIO"]
),  # Only allows "TEXT" or "AUDIO"

# Line 1655
run_config = RunConfig(response_modalities=modalities)
```
The code at `runners.py:985-988` only sets a default when `response_modalities` is `None`; it does not validate the value when it is explicitly set to `TEXT` for native-audio models.
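To make the gap concrete, the default handling described above can be paraphrased as a small helper (the function name is illustrative; this is not the actual `runners.py` code):

```python
from typing import List, Optional


def apply_default_modalities(
    response_modalities: Optional[List[str]],
) -> List[str]:
    """Paraphrase of the behavior at runners.py:985-988: a default is
    applied only when nothing was set; an explicit ["TEXT"] passes
    through unvalidated and later fails on native-audio models."""
    if response_modalities is None:
        return ["AUDIO"]
    return response_modalities
```

Here `apply_default_modalities(["TEXT"])` returns `["TEXT"]` unchanged, which is exactly the value the Live API then rejects for native-audio models.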
## Proposed Solution

- Detect native-audio models by checking whether the model name contains `"native-audio"`
- Force the `AUDIO` modality for these models instead of returning an error
- Ensure `output_audio_transcription` is enabled so users can see a text transcription of the audio response
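The three bullets above can be sketched as one pure helper that builds the `RunConfig` keyword arguments (the `output_audio_transcription` field name follows the issue text; the exact value type `RunConfig` expects for it should be verified against the installed ADK version):

```python
from typing import Any, Dict, List


def effective_run_config_kwargs(
    model_name: str, requested_modalities: List[str]
) -> Dict[str, Any]:
    """Force AUDIO (plus transcription) for native-audio models;
    pass the requested modalities through for everything else."""
    if "native-audio" in model_name:
        return {
            "response_modalities": ["AUDIO"],
            # Enabled so users still get text for the audio response.
            "output_audio_transcription": {},
        }
    return {"response_modalities": requested_modalities}
```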
## Implementation

In `adk_web_server.py`, around lines 1653-1655:
```python
async def forward_events():
    runner = await self.get_runner_async(app_name)

    # Check whether the agent uses a native-audio model
    agent_or_app = self.agent_loader.load_agent(app_name)
    root_agent = self._get_root_agent(agent_or_app)
    model_name = root_agent.model if isinstance(root_agent.model, str) else ""

    # Native-audio models only support the AUDIO modality
    if "native-audio" in model_name:
        effective_modalities = ["AUDIO"]
    else:
        effective_modalities = modalities

    run_config = RunConfig(response_modalities=effective_modalities)
```
## Additional Issue: Transcription not displayed in UI

When using the `AUDIO` modality, `output_audio_transcription` is enabled by default in `RunConfig`, and transcription events are created by `TranscriptionManager`. However, the frontend does not currently render `outputTranscription.text`; it only displays `content.parts[].text`.

This means users won't see any text when using native-audio models, even though transcription data is available in the events.
Options to address this:

- Update the frontend to display `outputTranscription.text` when present
- Convert the transcription to `content` in the backend so the existing UI can display it
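The second option could look like the following backend shim, applied to each outgoing event dict before it is sent over the WebSocket. The camelCase field names match the JSON names mentioned above; the exact event structure is an assumption to verify against the serialized `Event` model:

```python
from typing import Any, Dict


def mirror_transcription_into_content(event: Dict[str, Any]) -> Dict[str, Any]:
    """If an event carries outputTranscription text but no displayable
    content, copy the text into content.parts so the existing UI,
    which only renders content.parts[].text, can show it."""
    text = (event.get("outputTranscription") or {}).get("text")
    if text and not event.get("content"):
        event = dict(event)  # shallow copy; avoid mutating the caller's event
        event["content"] = {"role": "model", "parts": [{"text": text}]}
    return event
```

Events that already carry `content` are passed through unchanged, so normal text-modality traffic is unaffected.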
## Affected Files

- `src/google/adk/cli/adk_web_server.py` - WebSocket endpoint for live sessions
- `src/google/adk/runners.py` - Default modality handling
- `src/google/adk/flows/llm_flows/transcription_manager.py` - Transcription event creation