|
1 | | -# Agora Conversational AI Python SDK |
2 | | - |
3 | | -[](https://buildwithfern.com?utm_source=github&utm_medium=github&utm_campaign=readme&utm_source=https%3A%2F%2Fgithub.com%2FAgoraIO-Conversational-AI%2Fagora-agent-python-sdk) |
4 | | -[](https://pypi.python.org/pypi/agora-agent-sdk) |
5 | | - |
6 | | -The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs, enabling you to build voice-powered AI agents with support for both **cascading flows** (ASR → LLM → TTS) and **multimodal flows** (MLLM) for real-time audio processing. |
| 1 | +# Agoraio Python Library |
| 2 | + |
| 3 | +[](https://buildwithfern.com?utm_source=github&utm_medium=github&utm_campaign=readme&utm_source=https%3A%2F%2Fgithub.com%2FAgoraIO-Conversational-AI%2Fagent-server-sdk-python) |
| 4 | +[](https://pypi.python.org/pypi/agora-agent-server-sdk) |
| 5 | + |
| 6 | +The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs, |
| 7 | +enabling you to build voice-powered AI agents with support for both cascading flows (ASR -> LLM -> TTS) |
| 8 | +and multimodal flows (MLLM) for real-time audio processing. |
| 9 | + |
| 10 | + |
| 11 | +## Table of Contents |
| 12 | + |
| 13 | +- [Installation](#installation) |
| 14 | +- [Quick Start](#quick-start) |
| 15 | +- [Documentation](#documentation) |
| 16 | +- [Reference](#reference) |
| 17 | +- [MLLM Flow (Multimodal)](#mllm-flow-multimodal) |
| 18 | +- [Usage](#usage) |
| 19 | +- [Async Client](#async-client) |
| 20 | +- [Exception Handling](#exception-handling) |
| 21 | +- [Pagination](#pagination) |
| 22 | +- [Advanced](#advanced) |
| 23 | + - [Access Raw Response Data](#access-raw-response-data) |
| 24 | + - [Retries](#retries) |
| 25 | + - [Timeouts](#timeouts) |
| 26 | + - [Custom Client](#custom-client) |
| 27 | +- [Contributing](#contributing) |
7 | 28 |
|
8 | 29 | ## Installation |
9 | 30 |
|
10 | 31 | ```sh |
11 | | -pip install agora-agent-sdk |
| 32 | +pip install agora-agent-server-sdk |
12 | 33 | ``` |
13 | 34 |
|
14 | 35 | ## Quick Start |
@@ -122,20 +143,323 @@ session = agent.create_session( |
122 | 143 |
|
123 | 144 | ## Documentation |
124 | 145 |
|
125 | | -| Topic | Link | |
126 | | -| ------------------ | -------------------------------------------------------------------------------- | |
127 | | -| **API docs** | [docs.agora.io](https://docs.agora.io/en/conversational-ai/overview) | |
128 | | -| **Installation** | [docs/getting-started/installation.md](docs/getting-started/installation.md) | |
129 | | -| **Authentication** | [docs/getting-started/authentication.md](docs/getting-started/authentication.md) | |
130 | | -| **Quick Start** | [docs/getting-started/quick-start.md](docs/getting-started/quick-start.md) | |
131 | | -| **Cascading flow** | [docs/guides/cascading-flow.md](docs/guides/cascading-flow.md) | |
132 | | -| **MLLM flow** | [docs/guides/mllm-flow.md](docs/guides/mllm-flow.md) | |
133 | | -| **Low-level API** | [docs/guides/low-level-api.md](docs/guides/low-level-api.md) | |
134 | | -| **Error handling** | [docs/guides/error-handling.md](docs/guides/error-handling.md) | |
135 | | -| **Pagination** | [docs/guides/pagination.md](docs/guides/pagination.md) | |
136 | | -| **Advanced** | [docs/guides/advanced.md](docs/guides/advanced.md) | |
137 | | -| **API reference** | [reference.md](reference.md) | |
| 146 | +API reference documentation is available [here](https://docs.agora.io/en/conversational-ai/overview). |
| 147 | + |
| 148 | +## Reference |
| 149 | + |
| 150 | +A full reference for this library is available [here](https://github.com/AgoraIO-Conversational-AI/agent-server-sdk-python/blob/HEAD/./reference.md). |
| 151 | + |
| 152 | +## MLLM Flow (Multimodal) |
| 153 | + |
| 154 | +For real-time audio processing using OpenAI's Realtime API or Google Gemini Live, use the MLLM (Multimodal Large Language Model) flow instead of the cascading ASR -> LLM -> TTS flow. See the [MLLM Overview](https://docs.agora.io/en/conversational-ai/models/mllm/overview) for more details. |
| 155 | + |
| 156 | +```python |
| 157 | +from agora_agent import Agora
| 158 | +from agora_agent.agents import (
| 159 | +    StartAgentsRequestProperties,
| 160 | +    StartAgentsRequestPropertiesAdvancedFeatures,
| 161 | +    StartAgentsRequestPropertiesMllm,
| 162 | +    StartAgentsRequestPropertiesMllmVendor,
| 163 | +    StartAgentsRequestPropertiesTts,
| 164 | +    StartAgentsRequestPropertiesTtsVendor,
| 165 | +    StartAgentsRequestPropertiesLlm,
| 166 | +    StartAgentsRequestPropertiesTurnDetection,
| 167 | +    StartAgentsRequestPropertiesTurnDetectionType,
| 168 | +)
| 169 | +
| 170 | +client = Agora(
| 171 | +    customer_id="YOUR_CUSTOMER_ID",
| 172 | +    customer_secret="YOUR_CUSTOMER_SECRET",
| 173 | +)
| 174 | +
| 175 | +client.agents.start(
| 176 | +    appid="your_app_id",
| 177 | +    name="mllm_agent",
| 178 | +    properties=StartAgentsRequestProperties(
| 179 | +        channel="channel_name",
| 180 | +        token="your_token",
| 181 | +        agent_rtc_uid="1001",
| 182 | +        remote_rtc_uids=["1002"],
| 183 | +        idle_timeout=120,
| 184 | +        advanced_features=StartAgentsRequestPropertiesAdvancedFeatures(
| 185 | +            enable_mllm=True,
| 186 | +        ),
| 187 | +        mllm=StartAgentsRequestPropertiesMllm(
| 188 | +            url="wss://api.openai.com/v1/realtime",
| 189 | +            api_key="<your_openai_api_key>",
| 190 | +            vendor=StartAgentsRequestPropertiesMllmVendor.OPENAI,
| 191 | +            params={
| 192 | +                "model": "gpt-4o-realtime-preview",
| 193 | +                "voice": "alloy",
| 194 | +            },
| 195 | +            input_modalities=["audio"],
| 196 | +            output_modalities=["text", "audio"],
| 197 | +            greeting_message="Hello! I'm ready to chat in real-time.",
| 198 | +        ),
| 199 | +        turn_detection=StartAgentsRequestPropertiesTurnDetection(
| 200 | +            type=StartAgentsRequestPropertiesTurnDetectionType.SERVER_VAD,
| 201 | +            threshold=0.5,
| 202 | +            silence_duration_ms=500,
| 203 | +        ),
| 204 | +        # TTS and LLM are still required but not used when MLLM is enabled
| 205 | +        tts=StartAgentsRequestPropertiesTts(
| 206 | +            vendor=StartAgentsRequestPropertiesTtsVendor.MICROSOFT,
| 207 | +            params={},
| 208 | +        ),
| 209 | +        llm=StartAgentsRequestPropertiesLlm(
| 210 | +            url="https://api.openai.com/v1/chat/completions",
| 211 | +        ),
| 212 | +    ),
| 213 | +)
| 214 | +``` |
| 215 | + |
| 216 | + |
| 217 | +## Usage |
| 218 | + |
| 219 | +Instantiate and use the client with the following: |
| 220 | + |
| 221 | +```python |
| 222 | +from agora_agent import Agora, MicrosoftTtsParams, Tts_Microsoft |
| 223 | +from agora_agent.agents import ( |
| 224 | +    StartAgentsRequestProperties,
| 225 | +    StartAgentsRequestPropertiesAsr,
| 226 | +    StartAgentsRequestPropertiesLlm,
| 227 | +)
| 228 | +
| 229 | +client = Agora(
| 230 | +    authorization="YOUR_AUTHORIZATION",
| 231 | +    username="YOUR_USERNAME",
| 232 | +    password="YOUR_PASSWORD",
| 233 | +)
| 234 | +client.agents.start(
| 235 | +    appid="appid",
| 236 | +    name="unique_name",
| 237 | +    properties=StartAgentsRequestProperties(
| 238 | +        channel="channel_name",
| 239 | +        token="token",
| 240 | +        agent_rtc_uid="1001",
| 241 | +        remote_rtc_uids=["1002"],
| 242 | +        idle_timeout=120,
| 243 | +        asr=StartAgentsRequestPropertiesAsr(
| 244 | +            language="en-US",
| 245 | +        ),
| 246 | +        tts=Tts_Microsoft(
| 247 | +            params=MicrosoftTtsParams(
| 248 | +                key="key",
| 249 | +                region="region",
| 250 | +                voice_name="voice_name",
| 251 | +            ),
| 252 | +        ),
| 253 | +        llm=StartAgentsRequestPropertiesLlm(
| 254 | +            url="https://api.openai.com/v1/chat/completions",
| 255 | +            api_key="<your_llm_key>",
| 256 | +            system_messages=[
| 257 | +                {"role": "system", "content": "You are a helpful chatbot."}
| 258 | +            ],
| 259 | +            params={"model": "gpt-4o-mini"},
| 260 | +            max_history=32,
| 261 | +            greeting_message="Hello, how can I assist you today?",
| 262 | +            failure_message="Please hold on a second.",
| 263 | +        ),
| 264 | +    ),
| 265 | +)
| 266 | +``` |
| 267 | + |
| 268 | +## Async Client |
| 269 | + |
| 270 | +The SDK also exports an `async` client so that you can make non-blocking calls to our API. Note that if you pass your own httpx client through the `httpx_client` parameter of this client, it must be an `httpx.AsyncClient()` rather than an `httpx.Client()`.
| 271 | + |
| 272 | +```python |
| 273 | +import asyncio |
| 274 | + |
| 275 | +from agora_agent import AsyncAgora, MicrosoftTtsParams, Tts_Microsoft |
| 276 | +from agora_agent.agents import ( |
| 277 | +    StartAgentsRequestProperties,
| 278 | +    StartAgentsRequestPropertiesAsr,
| 279 | +    StartAgentsRequestPropertiesLlm,
| 280 | +)
| 281 | +
| 282 | +client = AsyncAgora(
| 283 | +    authorization="YOUR_AUTHORIZATION",
| 284 | +    username="YOUR_USERNAME",
| 285 | +    password="YOUR_PASSWORD",
| 286 | +)
| 287 | +
| 288 | +
| 289 | +async def main() -> None:
| 290 | +    await client.agents.start(
| 291 | +        appid="appid",
| 292 | +        name="unique_name",
| 293 | +        properties=StartAgentsRequestProperties(
| 294 | +            channel="channel_name",
| 295 | +            token="token",
| 296 | +            agent_rtc_uid="1001",
| 297 | +            remote_rtc_uids=["1002"],
| 298 | +            idle_timeout=120,
| 299 | +            asr=StartAgentsRequestPropertiesAsr(
| 300 | +                language="en-US",
| 301 | +            ),
| 302 | +            tts=Tts_Microsoft(
| 303 | +                params=MicrosoftTtsParams(
| 304 | +                    key="key",
| 305 | +                    region="region",
| 306 | +                    voice_name="voice_name",
| 307 | +                ),
| 308 | +            ),
| 309 | +            llm=StartAgentsRequestPropertiesLlm(
| 310 | +                url="https://api.openai.com/v1/chat/completions",
| 311 | +                api_key="<your_llm_key>",
| 312 | +                system_messages=[
| 313 | +                    {"role": "system", "content": "You are a helpful chatbot."}
| 314 | +                ],
| 315 | +                params={"model": "gpt-4o-mini"},
| 316 | +                max_history=32,
| 317 | +                greeting_message="Hello, how can I assist you today?",
| 318 | +                failure_message="Please hold on a second.",
| 319 | +            ),
| 320 | +        ),
| 321 | +    )
| 322 | + |
| 323 | + |
| 324 | +asyncio.run(main()) |
| 325 | +``` |
| 326 | + |
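One reason to reach for a non-blocking client is to issue several calls concurrently. The sketch below illustrates that pattern with `asyncio.gather`. To stay runnable without the SDK or credentials, it uses a hypothetical `fake_start` coroutine as a stand-in for `await client.agents.start(...)`; the function names and the simulated latency are assumptions, not part of the SDK.

```python
import asyncio
from typing import List


async def fake_start(name: str) -> str:
    """Hypothetical stand-in for `await client.agents.start(...)`."""
    await asyncio.sleep(0.01)  # simulates network latency
    return name


async def start_many(names: List[str]) -> List[str]:
    """Run all start calls concurrently; results keep the input order."""
    results = await asyncio.gather(*(fake_start(n) for n in names))
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(start_many(["agent-1", "agent-2", "agent-3"])))
```

In real code, each `fake_start(n)` would be replaced by a call on the `AsyncAgora` client with its own `name` and `properties`.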
| 327 | +## Exception Handling |
| 328 | + |
| 329 | +When the API returns a non-success status code (4xx or 5xx response), a subclass of the following error
| 330 | +will be raised.
| 331 | + |
| 332 | +```python |
| 333 | +from agora_agent.core.api_error import ApiError |
| 334 | + |
| 335 | +try:
| 336 | +    client.agents.start(...)
| 337 | +except ApiError as e:
| 338 | +    print(e.status_code)
| 339 | +    print(e.body)
| 340 | +``` |
| 341 | + |
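In practice it can help to branch on `status_code` rather than only printing it. The following is a minimal sketch of that idea. So that it runs without the SDK installed, it defines its own stand-in `ApiError` carrying the two attributes used above (`status_code` and `body`); the `classify` and `call_safely` helpers and their labels are hypothetical.

```python
from typing import Any, Callable, Optional


class ApiError(Exception):
    """Stand-in for the SDK's ApiError so the sketch runs on its own."""

    def __init__(self, status_code: Optional[int] = None, body: Any = None) -> None:
        super().__init__(f"status_code: {status_code}, body: {body}")
        self.status_code = status_code
        self.body = body


def classify(error: ApiError) -> str:
    """Map an error to a coarse label based on its HTTP status code."""
    code = error.status_code
    if code is None:
        return "unknown"
    if code in (408, 429) or 500 <= code < 600:
        return "transient"  # may succeed if attempted again later
    if 400 <= code < 500:
        return "client_error"  # the request itself needs fixing
    return "unknown"


def call_safely(call: Callable[[], Any]) -> Any:
    """Run `call`; on ApiError return the label instead of propagating."""
    try:
        return call()
    except ApiError as e:
        return classify(e)
```

With the real SDK, the `ApiError` class would be imported from `agora_agent.core.api_error` as shown above, and `call` would wrap `client.agents.start(...)`.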
| 342 | +## Pagination |
| 343 | + |
| 344 | +Paginated requests return a `SyncPager` or `AsyncPager`, which can be iterated like a generator to yield the underlying objects.
| 345 | + |
| 346 | +```python |
| 347 | +from agora_agent import Agora |
| 348 | + |
| 349 | +client = Agora( |
| 350 | +    authorization="YOUR_AUTHORIZATION",
| 351 | +    username="YOUR_USERNAME",
| 352 | +    password="YOUR_PASSWORD",
| 353 | +)
| 354 | +response = client.agents.list(
| 355 | +    appid="appid",
| 356 | +)
| 357 | +for item in response:
| 358 | +    print(item)
| 359 | +# alternatively, you can paginate page-by-page
| 360 | +for page in response.iter_pages():
| 361 | +    print(page)
| 362 | +``` |
| 363 | + |
| 364 | +```python |
| 365 | +# You can also iterate through pages and access the typed response per page |
| 366 | +pager = client.agents.list(...) |
| 367 | +for page in pager.iter_pages():
| 368 | +    print(page.response)  # access the typed response for each page
| 369 | +    for item in page:
| 370 | +        print(item)
| 371 | +``` |
| 372 | + |
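To make the two iteration styles concrete, here is a small self-contained sketch of how a pager of this shape could work. It is not the SDK's implementation: `Page`, `SimplePager`, the cursor-based paging scheme, and the in-memory `FAKE_PAGES` data are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    items: List[str]
    next_cursor: Optional[str]  # None means there are no more pages

    def __iter__(self) -> Iterator[str]:
        return iter(self.items)


class SimplePager:
    """Hypothetical pager: fetches pages lazily and yields their items."""

    def __init__(self, fetch_page: Callable[[Optional[str]], Page]) -> None:
        self._fetch_page = fetch_page

    def iter_pages(self) -> Iterator[Page]:
        cursor: Optional[str] = None
        while True:
            page = self._fetch_page(cursor)
            yield page
            if page.next_cursor is None:
                return
            cursor = page.next_cursor

    def __iter__(self) -> Iterator[str]:
        # Item-by-item iteration is page-by-page iteration, flattened.
        for page in self.iter_pages():
            yield from page


# Stand-in for the API: two pages of agent names keyed by cursor.
FAKE_PAGES = {
    None: Page(items=["agent-a", "agent-b"], next_cursor="p2"),
    "p2": Page(items=["agent-c"], next_cursor=None),
}

pager = SimplePager(lambda cursor: FAKE_PAGES[cursor])
```

Iterating `pager` directly yields every item across pages, while `pager.iter_pages()` yields one `Page` at a time; in both cases the next page is only fetched when the loop reaches it.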
| 373 | +## Advanced |
| 374 | + |
| 375 | +### Access Raw Response Data |
| 376 | + |
| 377 | +The SDK provides access to raw response data, including headers, through the `.with_raw_response` property. |
| 378 | +The `.with_raw_response` property returns a "raw" client that can be used to access the `.headers` and `.data` attributes. |
| 379 | + |
| 380 | +```python |
| 381 | +from agora_agent import Agora |
| 382 | + |
| 383 | +client = Agora( |
| 384 | +    ...,
| 385 | +)
| 386 | +response = client.agents.with_raw_response.start(...)
| 387 | +print(response.headers)  # access the response headers
| 388 | +print(response.data)  # access the underlying object
| 389 | +pager = client.agents.list(...)
| 390 | +print(pager.response)  # access the typed response for the first page
| 391 | +for item in pager:
| 392 | +    print(item)  # access the underlying object(s)
| 393 | +for page in pager.iter_pages():
| 394 | +    print(page.response)  # access the typed response for each page
| 395 | +    for item in page:
| 396 | +        print(item)  # access the underlying object(s)
| 397 | +``` |
| 398 | + |
| 399 | +### Retries |
| 400 | + |
| 401 | +The SDK is instrumented with automatic retries with exponential backoff. A request will be retried as long |
| 402 | +as the request is deemed retryable and the number of retry attempts has not grown larger than the configured |
| 403 | +retry limit (default: 2). |
| 404 | + |
| 405 | +A request is deemed retryable when any of the following HTTP status codes is returned: |
| 406 | + |
| 407 | +- [408](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408) (Request Timeout)
| 408 | +- [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) (Too Many Requests)
| 409 | +- [5XX](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500) (Server Errors)
| 410 | + |
| 411 | +Use the `max_retries` request option to configure this behavior. |
| 412 | + |
| 413 | +```python |
| 414 | +client.agents.start(..., request_options={ |
| 415 | +    "max_retries": 1
| 416 | +}) |
| 417 | +``` |
| 418 | + |
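The retry rule described above (retry on 408, 429, or 5XX, up to a limit, with exponentially growing waits) can be sketched in a few lines. This is an illustration of the policy, not the SDK's actual code: the helper names, the 0.5-second base delay, and the 8-second cap are assumptions, and `send` is a hypothetical callable that returns an HTTP status code.

```python
import time
from typing import Callable, Tuple


def is_retryable(status_code: int) -> bool:
    """Return True for 408, 429, and any 5xx status code."""
    return status_code in (408, 429) or 500 <= status_code < 600


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Delay before a zero-based retry attempt: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def send_with_retries(
    send: Callable[[], int],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Call `send` until the status is not retryable or retries run out.

    Returns (final_status_code, retries_used).
    """
    retries = 0
    status = send()
    while is_retryable(status) and retries < max_retries:
        sleep(backoff_delay(retries))
        retries += 1
        status = send()
    return status, retries
```

With `max_retries=2`, a request is sent at most three times in total: the initial attempt plus two retries. Passing a no-op `sleep` makes the function easy to exercise in tests without waiting.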
| 419 | +### Timeouts |
| 420 | + |
| 421 | +The SDK defaults to a 60-second timeout. You can configure this with a timeout option at the client or request level.
| 422 | + |
| 423 | +```python |
| 424 | + |
| 425 | +from agora_agent import Agora |
| 426 | + |
| 427 | +client = Agora( |
| 428 | +    ...,
| 429 | +    timeout=20.0,
| 430 | +)
| 431 | +
| 432 | +
| 433 | +# Override timeout for a specific method
| 434 | +client.agents.start(..., request_options={
| 435 | +    "timeout_in_seconds": 1
| 436 | +}) |
| 437 | +``` |
| 438 | + |
| 439 | +### Custom Client |
| 440 | + |
| 441 | +You can override the `httpx` client to customize it for your use-case. Some common use-cases include support for proxies |
| 442 | +and transports. |
| 443 | + |
| 444 | +```python |
| 445 | +import httpx |
| 446 | +from agora_agent import Agora |
| 447 | + |
| 448 | +client = Agora( |
| 449 | +    ...,
| 450 | +    httpx_client=httpx.Client(
| 451 | +        proxy="http://my.test.proxy.example.com",
| 452 | +        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
| 453 | +    ),
| 454 | +) |
| 455 | +``` |
138 | 456 |
|
139 | 457 | ## Contributing |
140 | 458 |
|
141 | | -This library is generated programmatically. Contributions to the README and docs are welcome. For code changes, open an issue first to discuss. |
| 459 | +While we value open-source contributions to this SDK, this library is generated programmatically. |
| 460 | +Additions made directly to this library would have to be moved over to our generation code, |
| 461 | +otherwise they would be overwritten upon the next generated release. Feel free to open a PR as |
| 462 | +a proof of concept, but know that we will not be able to merge it as-is. We suggest opening |
| 463 | +an issue first to discuss with us! |
| 464 | + |
| 465 | +On the other hand, contributions to the README are always very welcome! |