# WebRTC Support Implementation for ElevenLabs Python SDK

This document summarizes the WebRTC support implementation added to the ElevenLabs Python SDK, following the same architecture as the JavaScript SDK.

## Overview

WebRTC support enables real-time, low-latency conversations with ElevenLabs agents over LiveKit's WebRTC infrastructure. It is an alternative to the existing WebSocket-based connections, aimed at real-time audio applications where latency and network resilience matter most.

## Files Added

### Core Implementation

1. **`src/elevenlabs/conversational_ai/base_connection.py`**
   - Abstract base class for all connection types
   - Defines the common interface for WebSocket and WebRTC connections
   - Includes `ConnectionType` enum with `WEBSOCKET` and `WEBRTC` options

2. **`src/elevenlabs/conversational_ai/websocket_connection.py`**
   - WebSocket connection implementation extending `BaseConnection`
   - Maintains existing WebSocket functionality in the new architecture

3. **`src/elevenlabs/conversational_ai/webrtc_connection.py`**
   - WebRTC connection implementation using LiveKit Python SDK
   - Handles LiveKit room management, audio tracks, and data channels
   - Supports automatic conversation token fetching from ElevenLabs API

4. **`src/elevenlabs/conversational_ai/connection_factory.py`**
   - Factory functions for creating connections based on type
   - Includes logic for determining connection type based on parameters

5. **`src/elevenlabs/conversational_ai/webrtc_conversation.py`**
   - WebRTC-specific conversation class extending `BaseConversation`
   - Provides async interface for WebRTC conversations
   - Integrates with LiveKit for real-time audio streaming

6. **`src/elevenlabs/conversational_ai/conversation_factory.py`**
   - High-level factory functions for creating different conversation types
   - Includes convenience functions `create_webrtc_conversation()` and `create_websocket_conversation()`
   - Provides unified `create_conversation()` function with connection type selection

### Testing

7. **`tests/test_webrtc_conversation.py`**
   - Comprehensive test suite for WebRTC functionality
   - Tests connection type determination, factory functions, and conversation lifecycle
   - Includes mocked LiveKit integration tests

### Examples

8. **`examples/webrtc_conversation_example.py`**
   - Complete working examples of WebRTC conversation usage
   - Shows both explicit token and automatic token fetching approaches
   - Demonstrates the differences between WebSocket and WebRTC connections

## Files Modified

### Dependencies

1. **`pyproject.toml`**
   - Added `livekit = ">=0.15.0"` dependency for WebRTC support

### Core Conversation Module

2. **`src/elevenlabs/conversational_ai/conversation.py`**
   - Updated `ConversationInitiationData` to include `connection_type` and `conversation_token` parameters
   - Added imports for the new connection system
   - Added helper methods `_determine_connection_type()` and `_create_connection()` to `BaseConversation`

## Key Features

### Connection Types

- **WebSocket (existing)**: Traditional WebSocket-based connections
- **WebRTC (new)**: Real-time connections using LiveKit infrastructure

### Authentication Methods

- **Agent ID**: For public agents; no additional authentication is required
- **Conversation Token**: For private agents; obtained from the ElevenLabs API
- **Automatic Token Fetching**: When only an agent ID is provided, the SDK can fetch a conversation token automatically

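Automatic token fetching comes down to a single authenticated HTTP request made before the WebRTC connection opens. The sketch below uses only the standard library. It assumes the endpoint `GET /v1/convai/conversation/token?agent_id=...`, authenticated with the `xi-api-key` header and returning JSON with a `token` field. This mirrors what the JavaScript SDK is understood to do, but check it against the current API reference. `build_token_request` and `fetch_conversation_token` are hypothetical helpers, not SDK functions.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Assumed base URL and endpoint; verify against the ElevenLabs API reference.
DEFAULT_BASE_URL = "https://api.elevenlabs.io"


def build_token_request(agent_id: str, api_key: str,
                        base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build the (assumed) conversation-token request for an agent."""
    query = urllib.parse.urlencode({"agent_id": agent_id})
    url = f"{base_url}/v1/convai/conversation/token?{query}"
    return urllib.request.Request(url, headers={"xi-api-key": api_key})


def fetch_conversation_token(agent_id: str, api_key: str,
                             opener: Callable = urllib.request.urlopen) -> str:
    """Fetch a token; `opener` is injectable so tests never touch the network."""
    request = build_token_request(agent_id, api_key)
    with opener(request, timeout=10) as response:
        payload = json.loads(response.read().decode("utf-8"))
    token = payload.get("token")
    if not token:
        raise RuntimeError("Token response did not contain a 'token' field")
    return token
```

The opener is a parameter so that tests can supply a fake response object. This follows the same mocking approach the test suite uses for external services.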
### API Design

The implementation follows the same patterns as the JavaScript SDK:

```python
from elevenlabs.conversational_ai.base_connection import ConnectionType
from elevenlabs.conversational_ai.conversation_factory import (
    create_conversation,
    create_webrtc_conversation,
)

# `client`, `async_audio_interface`, and `on_response` are defined by your application

# WebRTC conversation with explicit token
conversation = create_webrtc_conversation(
    client=client,
    agent_id="your-agent-id",
    conversation_token="your-token",
    audio_interface=async_audio_interface,
    callback_agent_response=on_response,
)

# WebRTC conversation with automatic token fetching
conversation = create_webrtc_conversation(
    client=client,
    agent_id="your-agent-id",  # Token will be fetched automatically
    audio_interface=async_audio_interface,
)

# Generic factory with connection type
conversation = create_conversation(
    client=client,
    agent_id="your-agent-id",
    connection_type=ConnectionType.WEBRTC,
    audio_interface=async_audio_interface,
)
```

### Backward Compatibility

- All existing WebSocket-based conversation code continues to work unchanged
- New connection types are opt-in through explicit parameters
- Default behavior remains WebSocket connections

## Technical Architecture

### Connection Hierarchy

```
BaseConnection (abstract)
├── WebSocketConnection
└── WebRTCConnection (uses LiveKit)
```

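This hierarchy can be expressed as an abstract base class with one subclass per transport, plus a factory that maps `ConnectionType` to a class. The sketch below takes that approach. The method names (`connect`, `send_message`, `close`) and the enum values are assumptions for illustration, not the SDK's actual interface. The subclasses only record calls rather than opening real WebSocket or LiveKit connections.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Dict, List, Type


class ConnectionType(Enum):
    # Members as described for base_connection.py; the values are assumptions.
    WEBSOCKET = "websocket"
    WEBRTC = "webrtc"


class BaseConnection(ABC):
    """Common interface for all transports (method names are hypothetical)."""

    def __init__(self) -> None:
        self.connected = False
        self.sent: List[Dict[str, Any]] = []  # recorded instead of transmitted

    @abstractmethod
    async def connect(self) -> None: ...

    @abstractmethod
    async def send_message(self, message: Dict[str, Any]) -> None: ...

    @abstractmethod
    async def close(self) -> None: ...


class WebSocketConnection(BaseConnection):
    async def connect(self) -> None:
        self.connected = True  # a real implementation would open the WebSocket here

    async def send_message(self, message: Dict[str, Any]) -> None:
        self.sent.append(message)  # a real implementation would send a JSON frame

    async def close(self) -> None:
        self.connected = False


class WebRTCConnection(BaseConnection):
    async def connect(self) -> None:
        self.connected = True  # a real implementation would join a LiveKit room

    async def send_message(self, message: Dict[str, Any]) -> None:
        self.sent.append(message)  # a real implementation would use a data channel

    async def close(self) -> None:
        self.connected = False


_CONNECTION_CLASSES: Dict[ConnectionType, Type[BaseConnection]] = {
    ConnectionType.WEBSOCKET: WebSocketConnection,
    ConnectionType.WEBRTC: WebRTCConnection,
}


def create_connection(connection_type: ConnectionType) -> BaseConnection:
    """Factory: map a ConnectionType to a new connection instance."""
    return _CONNECTION_CLASSES[connection_type]()


async def demo() -> BaseConnection:
    conn = create_connection(ConnectionType.WEBRTC)
    await conn.connect()
    await conn.send_message({"text": "Hello!"})
    return conn


conn = asyncio.run(demo())
```

Because conversation code only depends on `BaseConnection`, a new transport means adding one subclass and one dictionary entry.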
### Conversation Hierarchy

```
BaseConversation
├── Conversation (sync WebSocket)
├── AsyncConversation (async WebSocket)
└── WebRTCConversation (async WebRTC)
```

### Factory Pattern

The implementation uses factory functions to create appropriate conversation types based on:

- Explicit connection type parameter
- Presence of conversation token (implies WebRTC)
- Audio interface type (sync vs async)
- Callback function types (sync vs async)

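The four criteria above can be combined into one selection function. The sketch below makes several assumptions. An explicit type wins. A conversation token implies WebRTC. Async-ness is detected with `inspect.iscoroutinefunction` on the audio interface's `output` method and on each callback. Probing `output` specifically is an assumption. The function returns a class name instead of constructing anything, and `ExampleSyncAudio` and `ExampleAsyncAudio` are hypothetical stand-ins.

```python
import inspect
from typing import Any, Callable, Iterable, Optional


class ExampleSyncAudio:
    """Hypothetical sync audio interface (only `output` is inspected)."""
    def output(self, audio: bytes) -> None: ...


class ExampleAsyncAudio:
    """Hypothetical async audio interface."""
    async def output(self, audio: bytes) -> None: ...


def select_conversation_class(
    connection_type: Optional[str] = None,  # "websocket", "webrtc", or None
    conversation_token: Optional[str] = None,
    audio_interface: Any = None,
    callbacks: Iterable[Optional[Callable[..., Any]]] = (),
) -> str:
    """Return the name of the conversation class a factory might build (sketch)."""
    # Criteria 1 and 2: an explicit type wins; otherwise a token implies WebRTC.
    if connection_type is None:
        connection_type = "webrtc" if conversation_token else "websocket"

    # Criteria 3 and 4: detect async audio interfaces and async callbacks.
    output = getattr(audio_interface, "output", None)
    async_audio = output is not None and inspect.iscoroutinefunction(output)
    async_callbacks = any(
        cb is not None and inspect.iscoroutinefunction(cb) for cb in callbacks
    )

    if connection_type == "webrtc":
        if audio_interface is not None and not async_audio:
            raise ValueError("WebRTC conversations require an async audio interface")
        return "WebRTCConversation"
    if async_audio or async_callbacks:
        return "AsyncConversation"
    return "Conversation"
```

Plain strings stand in for `ConnectionType` to keep the sketch self-contained. With no arguments the function returns `"Conversation"`, which matches the stated WebSocket default.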
## Benefits of WebRTC Implementation

1. **Lower Latency**: Audio is streamed as WebRTC media through LiveKit's media servers, typically over UDP, rather than as messages on a TCP-based WebSocket
2. **Better Audio Quality**: WebRTC's audio pipeline is optimized for real-time audio
3. **Reduced Server Load**: Audio doesn't go through application servers
4. **Adaptive Bitrate**: Automatic quality adjustment based on network conditions
5. **Better Connectivity**: Built-in NAT traversal (ICE) and firewall handling

## Usage Examples

### Basic WebRTC Conversation

```python
import asyncio
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation_factory import create_webrtc_conversation

async def main():
    client = ElevenLabs(api_key="your-api-key")

    conversation = create_webrtc_conversation(
        client=client,
        agent_id="your-agent-id",
        # Placeholder: substitute your own AsyncAudioInterface implementation
        audio_interface=YourAsyncAudioInterface(),
    )

    await conversation.start_session()
    await conversation.send_user_message("Hello!")
    # ... conversation logic
    await conversation.end_session()

asyncio.run(main())
```

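For local testing, `YourAsyncAudioInterface` can be replaced by a stand-in that plays nothing. The sketch below uses the four method names of the SDK's sync `AudioInterface` (`start`, `stop`, `output`, `interrupt`) and assumes they become async in `AsyncAudioInterface`. Check that class's abstract methods before subclassing it. `InMemoryAsyncAudioInterface` is hypothetical. It feeds prepared input chunks to the conversation and collects the agent's audio in memory.

```python
import asyncio
import inspect
from typing import Any, Callable, List, Optional


class InMemoryAsyncAudioInterface:
    """Hypothetical test double: feeds prepared input, records agent output.

    Method names mirror the sync AudioInterface; the async signatures are an assumption.
    """

    def __init__(self, input_chunks: List[bytes]) -> None:
        self.input_chunks = input_chunks
        self.output_chunks: List[bytes] = []
        self.interrupted = False
        self._task: Optional[asyncio.Task] = None

    async def start(self, input_callback: Callable[[bytes], Any]) -> None:
        # Push the prepared chunks to the conversation from a background task.
        async def pump() -> None:
            for chunk in self.input_chunks:
                result = input_callback(chunk)
                if inspect.isawaitable(result):  # accept sync or async callbacks
                    await result
                await asyncio.sleep(0)  # yield to the event loop between chunks

        self._task = asyncio.create_task(pump())

    async def stop(self) -> None:
        # Wait until all queued input has been delivered, then drop the task.
        if self._task is not None:
            await self._task
            self._task = None

    async def output(self, audio: bytes) -> None:
        # A real interface would play this audio; here it is only stored.
        self.output_chunks.append(audio)

    async def interrupt(self) -> None:
        # Discard agent audio that has not been "played" yet.
        self.interrupted = True
        self.output_chunks.clear()


async def demo():
    received: List[bytes] = []
    audio = InMemoryAsyncAudioInterface([b"\x00\x00" * 4, b"\x01\x00" * 4])
    await audio.start(received.append)  # a plain sync callback also works
    await audio.output(b"agent-audio")
    await audio.stop()
    return received, audio.output_chunks


received, played = asyncio.run(demo())
```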
### Connection Type Comparison

```python
from elevenlabs.conversational_ai.base_connection import ConnectionType
from elevenlabs.conversational_ai.conversation_factory import create_conversation

# WebSocket (existing)
ws_conversation = create_conversation(
    client=client,
    agent_id="agent-id",
    connection_type=ConnectionType.WEBSOCKET,
    audio_interface=YourAudioInterface(),  # Placeholder: a sync AudioInterface implementation
)

# WebRTC (new)
webrtc_conversation = create_conversation(
    client=client,
    agent_id="agent-id",
    connection_type=ConnectionType.WEBRTC,
    audio_interface=YourAsyncAudioInterface(),  # Placeholder: an async implementation is required
)
```

## Testing

The implementation includes comprehensive tests covering:

- Connection type determination logic
- Factory function behavior
- WebRTC conversation lifecycle
- Message handling
- Error conditions
- Token fetching

All tests use mocks, so no external services are contacted during a test run.

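The following sketch illustrates the mocking approach with a hypothetical `SketchWebRTCConnection`. Its LiveKit room and token fetcher are injected, so `unittest.mock.AsyncMock` can stand in for both. The class, its constructor, and the `room.connect(url, token)` call shape are assumptions, loosely modelled on LiveKit's async `Room.connect`. They are not the SDK's code, and `wss://example.invalid` is a placeholder URL.

```python
import unittest
from typing import Any, Awaitable, Callable, Optional
from unittest.mock import AsyncMock


class SketchWebRTCConnection:
    """Hypothetical connection with injected dependencies, written to be testable."""

    def __init__(self, room: Any, livekit_url: str, agent_id: str,
                 token_fetcher: Callable[[str], Awaitable[str]],
                 conversation_token: Optional[str] = None) -> None:
        self.room = room
        self.livekit_url = livekit_url
        self.agent_id = agent_id
        self.token_fetcher = token_fetcher
        self.conversation_token = conversation_token

    async def connect(self) -> None:
        # Fetch a token only when the caller did not supply one.
        if self.conversation_token is None:
            self.conversation_token = await self.token_fetcher(self.agent_id)
        await self.room.connect(self.livekit_url, self.conversation_token)


class TokenFetchingTest(unittest.IsolatedAsyncioTestCase):
    async def test_fetches_token_when_missing(self) -> None:
        room, fetcher = AsyncMock(), AsyncMock(return_value="fetched-token")
        conn = SketchWebRTCConnection(room, "wss://example.invalid", "agent-1", fetcher)
        await conn.connect()
        fetcher.assert_awaited_once_with("agent-1")
        room.connect.assert_awaited_once_with("wss://example.invalid", "fetched-token")

    async def test_uses_explicit_token(self) -> None:
        room, fetcher = AsyncMock(), AsyncMock()
        conn = SketchWebRTCConnection(room, "wss://example.invalid", "agent-1",
                                      fetcher, conversation_token="given-token")
        await conn.connect()
        fetcher.assert_not_awaited()
        room.connect.assert_awaited_once_with("wss://example.invalid", "given-token")
```

The tests run under `python -m unittest` or pytest. No network or LiveKit server is needed, because every awaited dependency is an `AsyncMock`.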
## Future Considerations

1. **Audio Interface Implementations**: Additional concrete audio interface implementations for common use cases
2. **Advanced WebRTC Features**: Support for video, screen sharing, or advanced audio processing
3. **Monitoring and Analytics**: Integration with LiveKit's monitoring features
4. **Connection Fallback**: Automatic fallback from WebRTC to WebSocket in case of connection issues

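Connection fallback (item 4) is not implemented. One possible design is sketched below: try WebRTC with a timeout and fall back to WebSocket if it fails. The sketch assumes each transport exposes an async connect step. `connect_with_fallback`, its 5-second default timeout, and the simulated connectors are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")


async def connect_with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], Awaitable[T]],
    timeout: float = 5.0,
) -> Tuple[str, T]:
    """Try the primary transport; on a network error or timeout, use the fallback."""
    try:
        return "primary", await asyncio.wait_for(primary(), timeout)
    except (asyncio.TimeoutError, OSError):
        # Only transport-level failures trigger the fallback; other bugs still raise.
        return "fallback", await fallback()


async def failing_webrtc() -> str:
    raise ConnectionError("simulated WebRTC failure")


async def slow_webrtc() -> str:
    await asyncio.sleep(10)  # never finishes within the short timeout used below
    return "webrtc"


async def working_websocket() -> str:
    return "websocket"


print(asyncio.run(connect_with_fallback(failing_webrtc, working_websocket)))
print(asyncio.run(connect_with_fallback(slow_webrtc, working_websocket, timeout=0.05)))
```

Catching only timeouts and `OSError` (which includes `ConnectionError`) is deliberate. A programming error in the WebRTC path should surface rather than be silently masked by the fallback.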
## Migration Guide

To move an existing conversation from WebSocket to WebRTC:

1. Install the updated SDK, which includes the `livekit` dependency
2. Switch the audio interface to an async implementation (`AsyncAudioInterface`)
3. Make the callback functions async
4. Set the connection type to `ConnectionType.WEBRTC`
5. Provide a conversation token or an agent ID for authentication

Adopting WebRTC is non-breaking: existing WebSocket code continues to work without changes.
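
For step 3, existing synchronous callbacks do not have to be rewritten by hand. The hypothetical adapter `ensure_async` below wraps a sync callback so that it can be awaited. It uses `asyncio.to_thread` (Python 3.9+) so that a slow callback cannot stall the event loop. Whether the SDK's WebRTC conversation accepts callbacks wrapped this way is an assumption.

```python
import asyncio
import functools
import inspect
from typing import Any, Awaitable, Callable


def ensure_async(callback: Callable[..., Any]) -> Callable[..., Awaitable[Any]]:
    """Return `callback` unchanged if it is async; otherwise wrap it (hypothetical helper)."""
    if inspect.iscoroutinefunction(callback):
        return callback

    @functools.wraps(callback)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Run the sync callback in a worker thread so it cannot block the event loop.
        return await asyncio.to_thread(callback, *args, **kwargs)

    return wrapper


# Before: a sync callback written for the WebSocket-based Conversation
def on_agent_response(text: str) -> str:
    return f"Agent: {text}"


# After: an async callback written directly for WebRTC
async def on_agent_response_async(text: str) -> str:
    return f"Agent: {text}"


adapted = ensure_async(on_agent_response)
print(asyncio.run(adapted("Hello!")))
```

Running the callback in a thread matters when it does blocking work such as file or database I/O. If the callback touches state shared with the event loop, a direct async rewrite is the safer option.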