fix: Improve Gemini Live model provider #25

Merged: mehtarac merged 7 commits into mehtarac:main from mkmeral:bidi-gemini-improvements on Nov 11, 2025

@mkmeral mkmeral commented Nov 10, 2025

Description

This PR fixes critical issues in the Gemini Live bidirectional streaming implementation that were preventing proper audio streaming and transcription functionality.

Key Changes

  1. Fixed "non-text parts in the response" warning

    • Changed text extraction to read message.server_content.model_turn.parts instead of the deprecated message.text property
    • Reordered event handling to check audio (message.data) before text to avoid triggering SDK warnings on mixed content
    • Added support for concatenating multiple text parts from Gemini responses
  2. Updated default model to support bidirectional streaming

    • Changed default model from models/gemini-2.0-flash-live-preview-04-09 to gemini-2.5-flash-native-audio-preview-09-2025
    • New model has native support for bidirectional audio streaming
  3. Enabled transcription by default

    • Added default live_config with outputAudioTranscription and inputAudioTranscription enabled
    • User-provided configs now merge with defaults instead of replacing them, ensuring transcription remains enabled unless explicitly disabled
  4. Improved error handling

    • Changed error conversion to return ErrorEvent instead of None for better error propagation and debugging
  5. Re-enabled Gemini Live in integration tests

    • Removed temporary skip/disable of Gemini Live tests
    • Updated test configuration to use default model and config
    • Adjusted silence duration to 1.5s (Gemini has good VAD similar to OpenAI)
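
The conversion order described in items 1 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Part`, `ModelTurn`, `ServerContent`, and `Message` classes are hypothetical stand-ins for the SDK types, the `(kind, payload)` tuples stand in for the provider's real event classes, and joining text parts without a separator is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for the SDK message shape. Only the attribute names
# (data, server_content.model_turn.parts, part.text) come from the description.
@dataclass
class Part:
    text: Optional[str] = None


@dataclass
class ModelTurn:
    parts: list = field(default_factory=list)


@dataclass
class ServerContent:
    model_turn: Optional[ModelTurn] = None


@dataclass
class Message:
    data: Optional[bytes] = None
    server_content: Optional[ServerContent] = None


def convert_message(message):
    """Return a list of (kind, payload) tuples; never None."""
    try:
        # Audio is checked first, so audio messages never reach the text path.
        if message.data:
            return [("audio", message.data)]
        content = message.server_content
        if content and content.model_turn:
            # Collect every non-empty text part instead of reading message.text.
            texts = [p.text for p in content.model_turn.parts if p.text]
            if texts:
                # Joining without a separator is an assumption.
                return [("text", "".join(texts))]
        return []
    except Exception as exc:
        # Surface the failure as an event instead of returning None.
        return [("error", str(exc))]
```

Because the function always returns a list, a caller can iterate over the result unconditionally, which matches the `for event in ...: yield event` loop in the diff.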

Issues Resolved

  • ✅ Eliminated SDK warning: "there are non-text parts in the response: ['inline_data']"
  • ✅ Fixed audio interruption handling (VAD properly detects "stop" commands)
  • ✅ Fixed default model not supporting bidirectional streaming
  • ✅ Enabled audio transcription for both input and output by default
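
The merge-with-defaults behavior from Key Changes item 3 could look something like the sketch below. The two key names come from the description; the empty-dict values, the `resolve_live_config` helper name, and the shallow-merge strategy are assumptions made for illustration, and the PR may express "disabled" differently.

```python
import copy

# Assumed defaults: key names are from the PR description, values are a guess
# at how "enabled" is expressed.
DEFAULT_LIVE_CONFIG = {
    "outputAudioTranscription": {},
    "inputAudioTranscription": {},
}


def resolve_live_config(user_config=None):
    """Start from a copy of the defaults, then let user keys override them."""
    merged = copy.deepcopy(DEFAULT_LIVE_CONFIG)
    merged.update(user_config or {})
    return merged


# A user config that says nothing about transcription keeps it enabled.
config = resolve_live_config({"response_modalities": ["AUDIO"]})
```

With a shallow merge like this, a user overrides a default only by supplying the same top-level key.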

@mkmeral mkmeral marked this pull request as draft November 10, 2025 13:08
@github-actions github-actions Bot added size/m and removed size/m labels Nov 11, 2025
@mkmeral mkmeral marked this pull request as ready for review November 11, 2025 11:42
yield provider_event
# Convert to provider-agnostic format (always returns list)
for event in self._convert_gemini_live_event(message):
    yield event
@pgrayy (Collaborator) commented Nov 11, 2025

I think it is okay to start by returning the tool uses one at a time (we will execute them concurrently). But if a model supports returning multiple tool uses at once, we should give users the ability to control the execution pattern, just as we do for uni agents. A user may want the tool uses processed sequentially, for example.

@mkmeral (Author)

That makes sense, but I wanted to start by reusing our tool events. I think later on we can introduce a list of tool events as another event? Releasing this right now is not a one-way door, so I'd be in favor of going forward now and taking it as a feature request after launch.

So far, I have not seen any model return multiple tool uses at once; they always arrive as separate events.
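
For reference, the two execution patterns discussed in this thread (concurrent versus sequential tool execution) can be sketched with plain asyncio. This is only an illustration of the trade-off, not code from the PR: `run_tool` and `execute` are hypothetical helpers, and the delays simulate tool latency.

```python
import asyncio


async def run_tool(name, delay, log):
    # Hypothetical tool call: waits, then records when it finished.
    await asyncio.sleep(delay)
    log.append(name)
    return f"{name}-result"


async def execute(tool_uses, concurrent=True):
    """Run (name, delay) tool uses; return results and completion order."""
    log = []
    if concurrent:
        # All tools start together; results keep request order.
        results = await asyncio.gather(
            *(run_tool(name, delay, log) for name, delay in tool_uses)
        )
    else:
        # Each tool finishes before the next one starts.
        results = [await run_tool(name, delay, log) for name, delay in tool_uses]
    return list(results), log


tool_uses = [("slow", 0.05), ("fast", 0.0)]
concurrent_results, concurrent_order = asyncio.run(execute(tool_uses, concurrent=True))
sequential_results, sequential_order = asyncio.run(execute(tool_uses, concurrent=False))
```

In both modes the results come back in request order; only the completion order differs, which is the behavior a user-facing execution setting would control.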

@github-actions github-actions Bot added size/m and removed size/m labels Nov 11, 2025
@mehtarac mehtarac merged commit 3864bc9 into mehtarac:main Nov 11, 2025
2 of 13 checks passed

3 participants