feat: (Agent): Finalize Bidirectional Agent class#12
Conversation
There was a problem hiding this comment.
Can we build this into Session? Then in the agent init, we can do something like:
self._session = Session()And in agent start, we can do:
self._session.start()From here, session itself can raise a "not started" error if we try to use a member method that requires activation (e.g., send_interrupt).
There was a problem hiding this comment.
Also, would we be able to rename to something like self._loop_connection to avoid overloading session.
There was a problem hiding this comment.
i think this will work -- I am going to leave this comment open to do this change once we merge in the agent class, and the bidirectional event loop so that this trivial change can be made on top. Currently both of the PRs are open and the loops are in different state.
…ntation. Will be added later when implementation is added.
|
For interface. First, I'll assume we wont have audio config as part of the interface, as it's related to audio adapter. For methods, given that we have a class, do we need a method that returns callable (e.g. |
| async with BidirectionalAgent(model=model, tools=[calculator]) as agent: | ||
| print("New BidirectionalAgent Experience") | ||
| print("Try asking: 'What is 25 times 8?' or 'Calculate the square root of 144'") | ||
| await agent.connect() |
There was a problem hiding this comment.
This looks good but it's difficult for me to really understand what consumers will really experience. Would be great to see more full-fledged examples to truly understand the devex more than a simple toy example:
- Integrated with a client UI
- One component as part of a larger system (e.g. in-vehicle assistant)
- Integrated with non-bidi agents
- Real-time interrupts
- Listening for activation phrases - "Hey Alexa"
- Remove adapter from constructor - Implement BidirectionlIO interface - Add adapter the run() method
| @@ -1,244 +1,136 @@ | |||
| """Bidirectional Agent for real-time streaming conversations. | |||
There was a problem hiding this comment.
I personally think Bidi is an acceptable shorthand and so I would suggest every package, class, method, and variable use that naming convention. Examples:
src/strands/experimental/bidi/agent/agent.pyclass BidiAgent
The benefit here is that customers have less to type. I also notice this is a convention used out in the wild. Examples:
There was a problem hiding this comment.
Side quest: Would you be able to come up with a distinguishing name for normal agents? I have been calling them converse stream agents so I can talk "bidi agent" vs "converse stream agent". It would be nice to have something shorter though. You can then share the name with the team so going forward it is easier for everyone to talk about.
There was a problem hiding this comment.
I personally don't like "bidi" shorthand in the repo for customer facing things. That said, I'd like to avoid bidirectional name in most of the stuff, unless it'll be used together with other agents/packages, and it requires some clarification. I think this only applies to bidi agent for now.
For the rest, they are under bidi namespace, and if customers want to have it more explanatory they can just use import aliases
There was a problem hiding this comment.
I think it is good to have the prefix since customers are going to be importing the components top-level:
from strands.experimental.bidi.models import GeminiBidiModel
# vs
from strands.experimental.bidirectional.models import GeminiModel- The second import is longer
- The second import has a less descriptive name that'll be used in the customer code.
- In Python, it is conventional for people to use shorthand names (e.g.,
import numpy as np). Here we just go ahead shorten on the customer's behalf. bidiis a commonly accepted shorthand for bidirectional.
For these reasons, I feel pretty strongly about utilizing the Bidi/bidi prefix.
| Unified method for sending text, audio, and image input to the model during | ||
| an active conversation session. | ||
|
|
||
| logger.debug("Conversation start - initializing connection") |
There was a problem hiding this comment.
Nit: I would recommend having an LLM adjust the formatting of the logs to follow the guidelines outlined in https://github.com/strands-agents/sdk-python/blob/main/STYLE_GUIDE.md.
There was a problem hiding this comment.
Will apply these changes in a follow-up PR
| from ..event_loop.bidirectional_event_loop import start_bidirectional_connection, stop_bidirectional_connection | ||
| from ....types.tools import ToolResult, ToolUse, AgentTool | ||
|
|
||
| from ..event_loop.bidirectional_event_loop import BidirectionalAgentLoop |
There was a problem hiding this comment.
Let's go ahead and make BidiAgentLoop private (i.e., call it _BidiAgentLoop).
There was a problem hiding this comment.
+1 It might be nice to see what we have private vs public in all bidi code
There was a problem hiding this comment.
Will do this in a follow-up commit in the PR here: #10
| @@ -1,244 +1,136 @@ | |||
| """Bidirectional Agent for real-time streaming conversations. | |||
There was a problem hiding this comment.
I personally don't like "bidi" shorthand in the repo for customer facing things. That said, I'd like to avoid bidirectional name in most of the stuff, unless it'll be used together with other agents/packages, and it requires some clarification. I think this only applies to bidi agent for now.
For the rest, they are under bidi namespace, and if customers want to have it more explanatory they can just use import aliases
| from ..event_loop.bidirectional_event_loop import start_bidirectional_connection, stop_bidirectional_connection | ||
| from ....types.tools import ToolResult, ToolUse, AgentTool | ||
|
|
||
| from ..event_loop.bidirectional_event_loop import BidirectionalAgentLoop |
There was a problem hiding this comment.
+1 It might be nice to see what we have private vs public in all bidi code
| @@ -43,7 +47,6 @@ | |||
| "BidirectionalStreamEvent", | |||
| "VoiceActivityEvent", | |||
| "UsageMetricsEvent", | |||
There was a problem hiding this comment.
I would say let's not expose the Events top level. I think for now it is good to encourage users to import from the subpackage to avoid confusion. We have both stream events and hook events and so it'll be good to help users distinguish. Also, I think it will be good to be consistent with what we expose top level for the uni agent
There was a problem hiding this comment.
So this was a discussion I had with @zastrowm
I think we want to start exposing these events. There are a couple of reasons. First TS consistency, as they will also do that.
Also it allows us to expose more complicated events that can be used in different places. For example, error event was a problem, because if you want to make the event (they were dicts) serializable, then you cannot include exception object, but using the typed dicts, we can have our cake and eat it too :D
The class below is technically still a dict (so can be send to websockets, and used with json.dumps, etc), but it also gives customers the ability to access exceptions directly, if they want to raise a new exception from this one. Essentially exposing events allows us to do this.
Additionally, we can start to expose the typed dicts in agent and multi-agent as they'd be backwards compatible: they extend dicts, they are dicts
class ErrorEvent(TypedEvent):
"""Error occurred during the session.
Stores the full Exception object as an instance attribute for debugging while
keeping the event dict JSON-serializable. The exception can be accessed via
the `error` property for re-raising or type-based error handling.
Parameters:
error: The exception that occurred.
details: Optional additional error information.
"""
def __init__(
self,
error: Exception,
details: Optional[Dict[str, Any]] = None,
):
# Store serializable data in dict (for JSON serialization)
super().__init__(
{
"type": "bidirectional_error",
"message": str(error),
"code": type(error).__name__,
"details": details,
}
)
# Store exception as instance attribute (not serialized)
self._error = error
@property
def error(self) -> Exception:
"""The original exception that occurred.
Can be used for re-raising or type-based error handling.
"""
return self._error
There was a problem hiding this comment.
Things to take into consideration:
- In exposing typed-events, I would expect all events coming out of BiDi to be typed, not a subset. Is that the case?
- For consistency, UniDi events should also be typed, not just BiDi. Open question whether that's at the same time or as a follow-up
- If we start publishing typed events, the entire class + shape should be bar-raised, not just the dict emitted. In the past we haven't cared about the concrete members (or class naming, or init members) because they weren't public apis - if we're making them public they should be bar-raised
There was a problem hiding this comment.
In exposing typed-events, I would expect all events coming out of BiDi to be typed, not a subset. Is that the case?
It should be the case, I will double check
For consistency, UniDi events should also be typed, not just BiDi. Open question whether that's at the same time or as a follow-up
Yes, I agree. I'd prefer a followup, not to overburden ourselves right before re:invent. That said, we should also do a poc to make sure such change won't break existing customers. I'd expect so, but it's better to make sure
If we start publishing typed events, the entire class + shape should be bar-raised, not just the dict emitted. In the past we haven't cared about the concrete members (or class naming, or init members) because they weren't public apis - if we're making them public they should be bar-raised
That makes sense. I will work on a doc for it to show what are the events/types we have
| @@ -3,6 +3,9 @@ | |||
| # Main components - Primary user interface | |||
There was a problem hiding this comment.
Follow up: Per discussion, we are Rachit is going to apply the Bidi/bidi prefix in a follow PR.
| @@ -1,244 +1,136 @@ | |||
| """Bidirectional Agent for real-time streaming conversations. | |||
There was a problem hiding this comment.
I think it is good to have the prefix since customers are going to be importing the components top-level:
from strands.experimental.bidi.models import GeminiBidiModel
# vs
from strands.experimental.bidirectional.models import GeminiModel- The second import is longer
- The second import has a less descriptive name that'll be used in the customer code.
- In Python, it is conventional for people to use shorthand names (e.g.,
import numpy as np). Here we just go ahead shorten on the customer's behalf. bidiis a commonly accepted shorthand for bidirectional.
For these reasons, I feel pretty strongly about utilizing the Bidi/bidi prefix.
| model: BidirectionalModel, | ||
| tools: list | None = None, | ||
| model: BidirectionalModel| str | None = None, | ||
| tools: list[str| AgentTool| ToolProvider]| None = None, |
There was a problem hiding this comment.
Nit: list[...]| vs list[...] |
| Args: | ||
| sender: Async callable that sends events to the client (e.g., websocket.send_json). | ||
| receiver: Async callable that receives events from the client (e.g., websocket.receive_json). | ||
| async def run(self, io_channels: list[BidiIO | tuple[Callable, Callable]]) -> None: |
There was a problem hiding this comment.
Follow up: Let's try to define a more specific Callable type for the tuple.
| if not io_channels: | ||
| raise ValueError("io_channels parameter cannot be empty. Provide either an IO channel or (sender, receiver) tuple.") | ||
|
|
||
| transport = io_channels[0] |
There was a problem hiding this comment.
We can do a for loop correct? Also, let's use io or io_channel every where in place of transport and adapter.
|
|
||
| async def _run_with_transport( | ||
| self, | ||
| transport: BidiIO | tuple[Callable, Callable], |
There was a problem hiding this comment.
Let's convert the tuple to a BidiIO so that we don't need to build conditions around the type in this method.
| - input_channels (int): Input channels (default: 1) | ||
| - output_channels (int): Output channels (default: 1) | ||
| """ | ||
| if pyaudio is None: |
There was a problem hiding this comment.
Looks like we don't need this check anymore since we import pyaudio top-level.
There was a problem hiding this comment.
you're right -- will remove
Summary
This PR introduces a clean agent-driven architecture for bidirectional streaming, transforming complex manual PyAudio coordination into simple, reusable patterns. The architecture separates core agent logic from hardware IO channels, with the agent as the primary interface managing IO channel lifecycle and enabling extensible multi-modal capabilities.
BidirectionalIO Interface:
Protocol:
Audio Implementation:
Purpose:
Pure hardware abstraction layer - bridges PyAudio (microphone/speakers) ↔ BidirectionalAgent
Before: (200+ lines)
After: (2-3 lines)
Primary Pattern: Agent-Driven with IO Channels (Recommended)
Context Manager Pattern: Guaranteed Cleanup
Custom Transport Pattern: Maximum Flexibility
Agent-Driven Design:
run()method, not constructorAPI Simplifications:
input_channel()andoutput_channel()methodsBidirectionalInputfor cleaner signatures with modern|syntaxFuture Roadmap
This agent-driven architecture with BidirectionalIO abstraction enables: