feat: propose interface change to bidi #11

Closed
JackYPCOnline wants to merge 2 commits into main from interface

Conversation

Collaborator

@JackYPCOnline JackYPCOnline commented Oct 23, 2025

Description

This PR refactors the bidirectional streaming interface to provide a cleaner, more generic API that aligns with industry patterns (Google Gemini, OpenAI) while maintaining strong typing and provider flexibility.

This PR should also kick off the bar-raise discussion. We should agree on the interface before making any code changes to model providers.

What has changed?

1. Unified Send Interface

Before:

await session.send_text_content("Hello")
await session.send_audio_content(audio_event)
await session.send_image_content(image_event)
await session.send_tool_result(tool_id, result)

After:

await session.send("Hello")
await session.send(AudioInputEvent(...))
await session.send(ImageInputEvent(...))
await session.send(ToolResultInputEvent(...))

Need to discuss: do we want to separate static content (image, text, etc.) from realtime content (audio for now, maybe video in the future)?
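
One way the unified `send` could route its inputs is sketched below. Because `TypedDict` instances are plain dicts at runtime, `isinstance` checks against the event classes are not available, so this sketch dispatches on distinguishing keys. The `SketchSession` class, the `audioData` field, and the recorded `sent` list are assumptions made for illustration and are not part of the PR.

```python
import asyncio
from typing import TypedDict, Union


class AudioInputEvent(TypedDict):
    # Hypothetical shape; the PR does not show this event's fields.
    audioData: bytes


class ImageInputEvent(TypedDict, total=False):
    image_url: str
    imageData: bytes
    mimeType: str


InputEvent = Union[str, AudioInputEvent, ImageInputEvent]


class SketchSession:
    """Stand-in session that records which handler each payload reached."""

    def __init__(self) -> None:
        self.sent = []  # list of (kind, payload) tuples

    async def send(self, content: InputEvent) -> None:
        # Plain strings are treated as text input.
        if isinstance(content, str):
            self.sent.append(("text", content))
        # TypedDicts are dicts at runtime, so route on their keys.
        elif "audioData" in content:
            self.sent.append(("audio", content))
        elif "image_url" in content or "imageData" in content:
            self.sent.append(("image", content))
        else:
            raise TypeError(f"Unsupported input: {content!r}")


async def demo() -> list:
    session = SketchSession()
    await session.send("Hello")
    await session.send(AudioInputEvent(audioData=b"\x00\x01"))
    await session.send(ImageInputEvent(image_url="https://example.com/cat.png"))
    return [kind for kind, _ in session.sent]
```

If static and realtime content end up separated, the same key-based routing could live behind two methods instead of one; the dispatch logic itself would not change much.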

2. New Typed Events

2.1 Enhanced ImageInputEvent

Supports OpenAI/Gemini realtime API patterns:

from typing import Optional, TypedDict

class ImageInputEvent(TypedDict):
    image_url: Optional[str]      # Data URLs, hosted URLs, file IDs
    imageData: Optional[bytes]    # Raw bytes alternative
    mimeType: Optional[str]       # Required with imageData
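
The field comments imply a constraint that the type alone cannot enforce: `mimeType` is required whenever `imageData` is set. A minimal validation sketch follows. The helper name `validate_image_event` is hypothetical, and the "exactly one of `image_url` or `imageData`" rule is an assumption read from the "alternative" wording, not something the PR states.

```python
from typing import Optional, TypedDict


class ImageInputEvent(TypedDict):
    image_url: Optional[str]      # Data URLs, hosted URLs, file IDs
    imageData: Optional[bytes]    # Raw bytes alternative
    mimeType: Optional[str]       # Required with imageData


def validate_image_event(event: ImageInputEvent) -> None:
    """Raise ValueError if the event breaks the documented field rules."""
    has_url = event.get("image_url") is not None
    has_bytes = event.get("imageData") is not None
    # Assumption: the two sources are mutually exclusive alternatives.
    if has_url == has_bytes:
        raise ValueError("Provide exactly one of image_url or imageData")
    # Stated in the field comment: raw bytes need a MIME type.
    if has_bytes and not event.get("mimeType"):
        raise ValueError("mimeType is required when imageData is set")
```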

Let's discuss the interface along the following dimensions:

  • Flat / two-layer model interface
    • one file or two files
  • Unified / separate send functions

3. Model/Session Separation Clarified OR Combined

Model (Stateless):

  • Configuration management
  • Session factory
  • Client initialization

Session (Stateful):

  • Active connection state
  • Real-time communication
  • Event streaming
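
The split described above can be sketched with two abstract base classes. `BidirectionalModelSession` and `create_bidirectional_connection` are names that appear in this PR; the `BidirectionalModel` base name, the `send`/`close` signatures, and the in-memory implementations are assumptions for illustration only.

```python
import abc
import asyncio
from typing import List, Optional


class BidirectionalModelSession(abc.ABC):
    """Stateful side: owns one live connection."""

    @abc.abstractmethod
    async def send(self, content: str) -> None:
        """Send one input over this session's connection."""

    @abc.abstractmethod
    async def close(self) -> None:
        """Release the connection."""


class BidirectionalModel(abc.ABC):
    """Stateless side: holds configuration and creates sessions."""

    @abc.abstractmethod
    async def create_bidirectional_connection(
        self, system_prompt: Optional[str] = None
    ) -> BidirectionalModelSession:
        """Open a new, independent session."""


class InMemorySession(BidirectionalModelSession):
    # Hypothetical implementation that records inputs instead of using a network.
    def __init__(self, system_prompt: Optional[str]) -> None:
        self.system_prompt = system_prompt
        self.history: List[str] = []
        self.active = True

    async def send(self, content: str) -> None:
        if not self.active:
            raise RuntimeError("session is closed")
        self.history.append(content)

    async def close(self) -> None:
        self.active = False


class InMemoryModel(BidirectionalModel):
    def __init__(self, region: str) -> None:
        self.region = region  # configuration only, no connection state

    async def create_bidirectional_connection(
        self, system_prompt: Optional[str] = None
    ) -> BidirectionalModelSession:
        return InMemorySession(system_prompt)


async def demo() -> tuple:
    model = InMemoryModel(region="us-east-1")
    first = await model.create_bidirectional_connection("You are helping customer 1")
    second = await model.create_bidirectional_connection("You are helping customer 2")
    await first.send("I need help with billing")
    await second.send("I want to cancel my order")
    await first.close()  # closing one session leaves the other untouched
    return first.history, second.history, second.active
```

Keeping all connection state on the session object is what lets one model instance serve several conversations at once.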

Pros:

Multiple Concurrent Sessions

# Use case: Customer service with multiple conversations
model = NovaSonicBidirectionalModel(region="us-east-1")

# Handle multiple customers simultaneously
customer1_session = await model.create_bidirectional_connection(
    system_prompt="You are helping customer 1"
)
customer2_session = await model.create_bidirectional_connection(
    system_prompt="You are helping customer 2"
)

await customer1_session.send_text_content("I need help with billing")
await customer2_session.send_text_content("I want to cancel my order")

Cons:

  1. API Complexity
  2. Session Management Burden

Unified:

Users should only need to implement one interface per model, which provides a user experience similar to the general agent:

agent = BidirectionalAgent(
    model=NovaSonic(region="us-east-1"),
    tools=[calculator_tool]
)

await agent.start()

await agent("What's 2 + 2?")

Pros:

  1. Simpler API
  2. Familiar Pattern

Cons:

  1. No Concurrent Sessions
  2. Hidden State Management

Recommendation:

Use Unified Interface with Internal Separation

  • AWS officially uses a unified interface in its documentation
  • Simpler user experience for common use cases
  • Internal separation still provides clean architecture
  • Best of both worlds - simple API + robust implementation
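
A rough sketch of "unified interface with internal separation": the agent exposes `start()` and `__call__`, as in the example earlier in this description, and keeps the session as a private detail. `EchoModel`, `_EchoSession`, and the `stop()` method are hypothetical stand-ins, not the PR's implementation.

```python
import asyncio
from typing import Any, List, Optional


class _EchoSession:
    """Internal stateful object; callers never touch it directly."""

    def __init__(self) -> None:
        self.received: List[Any] = []
        self.closed = False

    async def send(self, content: Any) -> None:
        self.received.append(content)

    async def close(self) -> None:
        self.closed = True


class EchoModel:
    """Hypothetical stateless model that acts only as a session factory."""

    async def create_bidirectional_connection(self) -> _EchoSession:
        return _EchoSession()


class BidirectionalAgent:
    def __init__(self, model: EchoModel, tools: Optional[list] = None) -> None:
        self._model = model
        self._tools = list(tools or [])
        self._session: Optional[_EchoSession] = None  # hidden state

    async def start(self) -> None:
        # Create the single internal session on first start only.
        if self._session is None:
            self._session = await self._model.create_bidirectional_connection()

    async def __call__(self, content: Any) -> None:
        if self._session is None:
            raise RuntimeError("call start() before sending input")
        await self._session.send(content)

    async def stop(self) -> None:
        # Hypothetical counterpart to start(); not shown in the PR.
        if self._session is not None:
            await self._session.close()
            self._session = None


async def demo() -> tuple:
    agent = BidirectionalAgent(model=EchoModel())
    await agent.start()
    await agent("What's 2 + 2?")
    session = agent._session  # peeked at here only to inspect the result
    await agent.stop()
    return session.received, session.closed
```

The sketch also makes the listed cons concrete: the agent holds exactly one session, so concurrent conversations would need one agent each.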

Related Issues

Documentation PR

Type of Change

Bug fix
New feature
Breaking change
Documentation update
Other (please describe):

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

logger = logging.getLogger(__name__)


class BidirectionalModelSession(abc.ABC):
Owner

I think we should move away from the name Session so that it is not confused with our SessionManager. I suggest "Connection".

Comment thread src/strands/experimental/bidirectional_streaming/models/base_session.py Outdated
metadata: Optional[Dict[str, Any]]


class ToolResultInputEvent(TypedDict):
Owner

I think we can use the existing ToolResult?

Collaborator Author

remove this class.

pass

@abc.abstractmethod
def _format_tools_for_provider(self, tool_specs: list[ToolSpec]) -> Any:
Collaborator Author

remove private function

Comment thread src/strands/experimental/bidirectional_streaming/models/base_model.py Outdated
@JackYPCOnline JackYPCOnline requested a review from pgrayy October 29, 2025 20:39
@JackYPCOnline JackYPCOnline mentioned this pull request Oct 29, 2025
7 tasks
@mkmeral mkmeral mentioned this pull request Oct 30, 2025
7 tasks
6 participants