Commit 4499329

add webrtc support to python sdk
1 parent 46a9d19 commit 4499329

11 files changed

Lines changed: 1443 additions & 0 deletions

WEBRTC_CHANGES.md

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
# WebRTC Support Implementation for ElevenLabs Python SDK

This document summarizes the WebRTC support added to the ElevenLabs Python SDK, which follows the same architecture as the JavaScript SDK.

## Overview

WebRTC support enables real-time, low-latency conversations with ElevenLabs agents over the LiveKit WebRTC infrastructure. It is an alternative to the existing WebSocket-based connections, intended to perform better for real-time audio applications.

## Files Added

### Core Implementation

1. **`src/elevenlabs/conversational_ai/base_connection.py`**
   - Abstract base class for all connection types
   - Defines the common interface for WebSocket and WebRTC connections
   - Includes `ConnectionType` enum with `WEBSOCKET` and `WEBRTC` options

2. **`src/elevenlabs/conversational_ai/websocket_connection.py`**
   - WebSocket connection implementation extending `BaseConnection`
   - Maintains existing WebSocket functionality in the new architecture

3. **`src/elevenlabs/conversational_ai/webrtc_connection.py`**
   - WebRTC connection implementation using LiveKit Python SDK
   - Handles LiveKit room management, audio tracks, and data channels
   - Supports automatic conversation token fetching from ElevenLabs API

4. **`src/elevenlabs/conversational_ai/connection_factory.py`**
   - Factory functions for creating connections based on type
   - Includes logic for determining connection type based on parameters

5. **`src/elevenlabs/conversational_ai/webrtc_conversation.py`**
   - WebRTC-specific conversation class extending `BaseConversation`
   - Provides async interface for WebRTC conversations
   - Integrates with LiveKit for real-time audio streaming

6. **`src/elevenlabs/conversational_ai/conversation_factory.py`**
   - High-level factory functions for creating different conversation types
   - Includes convenience functions `create_webrtc_conversation()` and `create_websocket_conversation()`
   - Provides unified `create_conversation()` function with connection type selection

### Testing

7. **`tests/test_webrtc_conversation.py`**
   - Comprehensive test suite for WebRTC functionality
   - Tests connection type determination, factory functions, and conversation lifecycle
   - Includes mocked LiveKit integration tests

### Examples

8. **`examples/webrtc_conversation_example.py`**
   - Complete working examples of WebRTC conversation usage
   - Shows both explicit token and automatic token fetching approaches
   - Demonstrates the differences between WebSocket and WebRTC connections

## Files Modified

### Dependencies

1. **`pyproject.toml`**
   - Added `livekit = ">=0.15.0"` dependency for WebRTC support

### Core Conversation Module

2. **`src/elevenlabs/conversational_ai/conversation.py`**
   - Updated `ConversationInitiationData` to include `connection_type` and `conversation_token` parameters
   - Added imports for the new connection system
   - Added helper methods `_determine_connection_type()` and `_create_connection()` to `BaseConversation`

## Key Features

### Connection Types

- **WebSocket (existing)**: Traditional WebSocket-based connections
- **WebRTC (new)**: Real-time connections using LiveKit infrastructure

### Authentication Methods

- **Agent ID**: For public agents, no additional authentication required
- **Conversation Token**: For private agents, obtained from ElevenLabs API
- **Automatic Token Fetching**: SDK can automatically fetch tokens when agent ID is provided
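
The precedence between an explicit token and automatic fetching can be sketched roughly as follows. This is an illustration only, not the SDK's code: `resolve_conversation_token` and the injected `fetch_token` callable are hypothetical names, and the stand-in fetcher returns a made-up string instead of calling the ElevenLabs API.

```python
from typing import Callable, Optional


def resolve_conversation_token(
    agent_id: Optional[str],
    conversation_token: Optional[str],
    fetch_token: Callable[[str], str],
) -> str:
    """Prefer an explicit token; otherwise fetch one for the given agent ID."""
    if conversation_token:
        return conversation_token
    if not agent_id:
        raise ValueError("either agent_id or conversation_token is required")
    return fetch_token(agent_id)


# Hypothetical stand-in fetcher; a real one would request a token from the API.
def fake_fetch(agent_id: str) -> str:
    return f"token-for-{agent_id}"


explicit = resolve_conversation_token("agent-1", "tok-123", fake_fetch)
fetched = resolve_conversation_token("agent-1", None, fake_fetch)
```

Injecting the fetcher keeps the precedence logic testable without network access.
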

### API Design

The implementation follows the same patterns as the JavaScript SDK:

```python
from elevenlabs.conversational_ai.base_connection import ConnectionType
from elevenlabs.conversational_ai.conversation_factory import (
    create_conversation,
    create_webrtc_conversation,
)

# `client`, `async_audio_interface`, and `on_response` are defined elsewhere.

# WebRTC conversation with explicit token
conversation = create_webrtc_conversation(
    client=client,
    agent_id="your-agent-id",
    conversation_token="your-token",
    audio_interface=async_audio_interface,
    callback_agent_response=on_response
)

# WebRTC conversation with automatic token fetching
conversation = create_webrtc_conversation(
    client=client,
    agent_id="your-agent-id",  # Token will be fetched automatically
    audio_interface=async_audio_interface
)

# Generic factory with connection type
conversation = create_conversation(
    client=client,
    agent_id="your-agent-id",
    connection_type=ConnectionType.WEBRTC,
    audio_interface=async_audio_interface
)
```

### Backward Compatibility

- All existing WebSocket-based conversation code continues to work unchanged
- New connection types are opt-in through explicit parameters
- Default behavior remains WebSocket connections

## Technical Architecture

### Connection Hierarchy

```
BaseConnection (abstract)
├── WebSocketConnection
└── WebRTCConnection (uses LiveKit)
```
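
As a rough illustration of this hierarchy, the sketch below defines an abstract base with a toy subclass. The method names (`connect`, `send_message`, `close`), the enum values, and `InMemoryConnection` are assumptions made for the example; the actual interface in `base_connection.py` may differ.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum


class ConnectionType(Enum):
    # Member names come from the document; the string values are assumed.
    WEBSOCKET = "websocket"
    WEBRTC = "webrtc"


class BaseConnection(ABC):
    """Common interface; the method names are illustrative assumptions."""

    connection_type: ConnectionType

    @abstractmethod
    async def connect(self) -> None: ...

    @abstractmethod
    async def send_message(self, message: dict) -> None: ...

    @abstractmethod
    async def close(self) -> None: ...


class InMemoryConnection(BaseConnection):
    """Toy subclass that records messages instead of touching the network."""

    connection_type = ConnectionType.WEBSOCKET

    def __init__(self) -> None:
        self.connected = False
        self.sent = []

    async def connect(self) -> None:
        self.connected = True

    async def send_message(self, message: dict) -> None:
        if not self.connected:
            raise RuntimeError("not connected")
        self.sent.append(message)

    async def close(self) -> None:
        self.connected = False


async def demo() -> list:
    conn = InMemoryConnection()
    await conn.connect()
    await conn.send_message({"type": "user_message", "text": "Hello!"})
    await conn.close()
    return conn.sent


sent_messages = asyncio.run(demo())
```

Because callers depend only on the abstract interface, a conversation class can work with either transport without knowing which one it was given.
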

### Conversation Hierarchy

```
BaseConversation
├── Conversation (sync WebSocket)
├── AsyncConversation (async WebSocket)
└── WebRTCConversation (async WebRTC)
```

### Factory Pattern

The implementation uses factory functions to create appropriate conversation types based on:

- Explicit connection type parameter
- Presence of conversation token (implies WebRTC)
- Audio interface type (sync vs async)
- Callback function types (sync vs async)
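
The selection rules above might look something like the following sketch. It is not the code in `connection_factory.py`: the function names and enum values are hypothetical, and the WebSocket fallback reflects the default stated under Backward Compatibility.

```python
import inspect
from enum import Enum
from typing import Callable, Optional


class ConnectionType(Enum):
    WEBSOCKET = "websocket"
    WEBRTC = "webrtc"


def determine_connection_type(
    connection_type: Optional[ConnectionType] = None,
    conversation_token: Optional[str] = None,
) -> ConnectionType:
    """Explicit type wins; a token implies WebRTC; otherwise WebSocket."""
    if connection_type is not None:
        return connection_type
    if conversation_token:
        return ConnectionType.WEBRTC
    return ConnectionType.WEBSOCKET


def uses_async_callbacks(*callbacks: Optional[Callable]) -> bool:
    """True if any supplied callback is a coroutine function."""
    return any(cb is not None and inspect.iscoroutinefunction(cb) for cb in callbacks)


def sync_cb(text: str) -> None:
    pass


async def async_cb(text: str) -> None:
    pass


default_type = determine_connection_type()
token_type = determine_connection_type(conversation_token="tok")
forced_type = determine_connection_type(ConnectionType.WEBSOCKET, "tok")
```
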

## Benefits of WebRTC Implementation

1. **Lower Latency**: Audio travels over WebRTC's UDP-based media transport through LiveKit instead of a TCP WebSocket stream
2. **Better Audio Quality**: Optimized for real-time audio
3. **Reduced Server Load**: Audio is carried by the LiveKit media infrastructure rather than application servers
4. **Adaptive Bitrate**: Automatic quality adjustment based on network conditions
5. **Better Connectivity**: NAT traversal and firewall handling

## Usage Examples

### Basic WebRTC Conversation

```python
import asyncio
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation_factory import create_webrtc_conversation

async def main():
    client = ElevenLabs(api_key="your-api-key")

    conversation = create_webrtc_conversation(
        client=client,
        agent_id="your-agent-id",
        # Placeholder: substitute your own async audio interface implementation.
        audio_interface=YourAsyncAudioInterface(),
    )

    await conversation.start_session()
    await conversation.send_user_message("Hello!")
    # ... conversation logic
    await conversation.end_session()

asyncio.run(main())
```

### Connection Type Comparison

```python
# WebSocket (existing)
ws_conversation = create_conversation(
    client=client,
    agent_id="agent-id",
    connection_type=ConnectionType.WEBSOCKET,
    audio_interface=SyncAudioInterface()  # Sync interface
)

# WebRTC (new)
webrtc_conversation = create_conversation(
    client=client,
    agent_id="agent-id",
    connection_type=ConnectionType.WEBRTC,
    audio_interface=AsyncAudioInterface()  # Async interface required
)
```

## Testing

The implementation includes comprehensive tests covering:

- Connection type determination logic
- Factory function behavior
- WebRTC conversation lifecycle
- Message handling
- Error conditions
- Token fetching

All tests use mocks, so they run without external dependencies.
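
A lifecycle test in this style could be sketched with `unittest.mock.AsyncMock`, as below. `ToyConversation` is a hypothetical stand-in written for this example; it is not `WebRTCConversation`, and the real tests in `tests/test_webrtc_conversation.py` may be organized differently.

```python
import asyncio
from unittest.mock import AsyncMock


class ToyConversation:
    """Hypothetical stand-in that delegates lifecycle calls to a connection."""

    def __init__(self, connection) -> None:
        self.connection = connection

    async def start_session(self) -> None:
        await self.connection.connect()

    async def send_user_message(self, text: str) -> None:
        await self.connection.send_message({"type": "user_message", "text": text})

    async def end_session(self) -> None:
        await self.connection.close()


async def run_lifecycle(connection) -> None:
    conversation = ToyConversation(connection)
    await conversation.start_session()
    await conversation.send_user_message("Hello!")
    await conversation.end_session()


# The mock replaces the network-facing connection, so nothing leaves the process.
mock_connection = AsyncMock()
asyncio.run(run_lifecycle(mock_connection))

mock_connection.connect.assert_awaited_once()
mock_connection.send_message.assert_awaited_once_with(
    {"type": "user_message", "text": "Hello!"}
)
mock_connection.close.assert_awaited_once()
```
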

## Future Considerations

1. **Audio Interface Implementations**: Additional concrete audio interface implementations for common use cases
2. **Advanced WebRTC Features**: Support for video, screen sharing, or advanced audio processing
3. **Monitoring and Analytics**: Integration with LiveKit's monitoring features
4. **Connection Fallback**: Automatic fallback from WebRTC to WebSocket in case of connection issues
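
The connection fallback idea in item 4 is not part of this commit. One possible shape for it, purely as a sketch with hypothetical names and stand-in connect functions, is to try each transport in order and return the first that succeeds:

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple

Attempt = Tuple[str, Callable[[], Awaitable[object]]]


async def connect_with_fallback(attempts: Sequence[Attempt]) -> Tuple[str, object]:
    """Try each transport in order; return the first that connects."""
    errors = []
    for name, connect in attempts:
        try:
            return name, await connect()
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append((name, exc))
    raise ConnectionError(f"all transports failed: {errors}")


# Stand-ins for real connection attempts.
async def failing_webrtc() -> object:
    raise OSError("simulated WebRTC connection failure")


async def working_websocket() -> object:
    return "websocket-session"


chosen = asyncio.run(
    connect_with_fallback(
        [("webrtc", failing_webrtc), ("websocket", working_websocket)]
    )
)
```
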

## Migration Guide

For users wanting to upgrade from WebSocket to WebRTC:

1. Install the updated SDK, which adds the `livekit` dependency
2. Update audio interface to async (`AsyncAudioInterface`)
3. Update callback functions to async
4. Change connection type to `ConnectionType.WEBRTC`
5. Provide conversation token or agent ID for authentication

The migration is non-breaking: existing code continues to work without changes.
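
For step 3, existing synchronous callbacks do not necessarily have to be rewritten by hand. A small adapter like the one below can wrap them. `ensure_async` is a hypothetical helper rather than an SDK function, and it relies on `asyncio.to_thread`, which requires Python 3.9 or later.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable


def ensure_async(callback: Callable[..., Any]) -> Callable[..., Awaitable[Any]]:
    """Return the callback unchanged if already async, else wrap it."""
    if inspect.iscoroutinefunction(callback):
        return callback

    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Run the sync callback in a worker thread so it cannot block the event loop.
        return await asyncio.to_thread(callback, *args, **kwargs)

    return wrapper


received = []


def on_response(text: str) -> None:
    """An existing sync callback from WebSocket-era code."""
    received.append(text)


async def main() -> None:
    callback_agent_response = ensure_async(on_response)
    await callback_agent_response("hi")


asyncio.run(main())
```

Running the wrapped callback in a thread is one design choice among several. For very cheap callbacks, calling them directly inside the wrapper would also work.
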
