Commit 60f844d

SDK regeneration
1 parent e37d049 commit 60f844d

6 files changed

Lines changed: 78 additions & 11 deletions

README.md

Lines changed: 70 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -1,19 +1,21 @@
1-
# Agora Agent Server SDK for Python
1+
# Agoraio Python Library
22

33
[![fern shield](https://img.shields.io/badge/%F0%9F%8C%BF-Built%20with%20Fern-brightgreen)](https://buildwithfern.com?utm_source=github&utm_medium=github&utm_campaign=readme&utm_source=https%3A%2F%2Fgithub.com%2FAgoraIO-Conversational-AI%2Fagent-server-sdk-python)
44
[![pypi](https://img.shields.io/pypi/v/agora-agent-server-sdk)](https://pypi.python.org/pypi/agora-agent-server-sdk)
55

6-
The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs,
7-
enabling you to build voice-powered AI agents with support for both cascading flows (ASR -> LLM -> TTS)
6+
The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs,
7+
enabling you to build voice-powered AI agents with support for both cascading flows (ASR -> LLM -> TTS)
88
and multimodal flows (MLLM) for real-time audio processing.
99

10+
1011
## Table of Contents
1112

1213
- [Installation](#installation)
1314
- [Quick Start](#quick-start)
1415
- [Documentation](#documentation)
1516
- [Reference](#reference)
1617
- [Mllm Flow Multimodal](#mllm-flow-multimodal)
18+
- [Mllm Flow Multimodal](#mllm-flow-multimodal)
1719
- [Usage](#usage)
1820
- [Async Client](#async-client)
1921
- [Exception Handling](#exception-handling)
@@ -212,6 +214,71 @@ client.agents.start(
212214
)
213215
```
214216

217+
## MLLM Flow (Multimodal)
218+
219+
For real-time audio processing using OpenAI's Realtime API or Google Gemini Live, use the MLLM (Multimodal Large Language Model) flow instead of the cascading ASR -> LLM -> TTS flow. See the [MLLM Overview](https://docs.agora.io/en/conversational-ai/models/mllm/overview) for more details.
220+
221+
```python
222+
from agora_agent import Agora
223+
from agora_agent.agents import (
224+
StartAgentsRequestProperties,
225+
StartAgentsRequestPropertiesAdvancedFeatures,
226+
StartAgentsRequestPropertiesMllm,
227+
StartAgentsRequestPropertiesMllmVendor,
228+
StartAgentsRequestPropertiesTts,
229+
StartAgentsRequestPropertiesTtsVendor,
230+
StartAgentsRequestPropertiesLlm,
231+
StartAgentsRequestPropertiesTurnDetection,
232+
StartAgentsRequestPropertiesTurnDetectionType,
233+
)
234+
235+
client = Agora(
236+
customer_id="YOUR_CUSTOMER_ID",
237+
customer_secret="YOUR_CUSTOMER_SECRET",
238+
)
239+
240+
client.agents.start(
241+
appid="your_app_id",
242+
name="mllm_agent",
243+
properties=StartAgentsRequestProperties(
244+
channel="channel_name",
245+
token="your_token",
246+
agent_rtc_uid="1001",
247+
remote_rtc_uids=["1002"],
248+
idle_timeout=120,
249+
advanced_features=StartAgentsRequestPropertiesAdvancedFeatures(
250+
enable_mllm=True,
251+
),
252+
mllm=StartAgentsRequestPropertiesMllm(
253+
url="wss://api.openai.com/v1/realtime",
254+
api_key="<your_openai_api_key>",
255+
vendor=StartAgentsRequestPropertiesMllmVendor.OPENAI,
256+
params={
257+
"model": "gpt-4o-realtime-preview",
258+
"voice": "alloy",
259+
},
260+
input_modalities=["audio"],
261+
output_modalities=["text", "audio"],
262+
greeting_message="Hello! I'm ready to chat in real-time.",
263+
),
264+
turn_detection=StartAgentsRequestPropertiesTurnDetection(
265+
type=StartAgentsRequestPropertiesTurnDetectionType.SERVER_VAD,
266+
threshold=0.5,
267+
silence_duration_ms=500,
268+
),
269+
# TTS and LLM are still required but not used when MLLM is enabled
270+
tts=StartAgentsRequestPropertiesTts(
271+
vendor=StartAgentsRequestPropertiesTtsVendor.MICROSOFT,
272+
params={},
273+
),
274+
llm=StartAgentsRequestPropertiesLlm(
275+
url="https://api.openai.com/v1/chat/completions",
276+
),
277+
),
278+
)
279+
```
280+
281+
215282
## Usage
216283

217284
Instantiate and use the client with the following:

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ name = "agora-agent-server-sdk"
33

44
[tool.poetry]
55
name = "agora-agent-server-sdk"
6-
version = "1.1.0"
6+
version = "1.1.1"
77
description = ""
88
readme = "README.md"
99
authors = []

src/agora_agent/agents/types/start_agents_request_properties.py

Lines changed: 3 additions & 3 deletions
@@ -71,19 +71,19 @@ class StartAgentsRequestProperties(UncheckedBaseModel):
7171
Automatic Speech Recognition (ASR) configuration.
7272
"""
7373

74-
tts: Tts = pydantic.Field()
74+
tts: typing.Optional[Tts] = pydantic.Field(default=None)
7575
"""
7676
Text-to-speech (TTS) module configuration.
7777
"""
7878

79-
llm: StartAgentsRequestPropertiesLlm = pydantic.Field()
79+
llm: typing.Optional[StartAgentsRequestPropertiesLlm] = pydantic.Field(default=None)
8080
"""
8181
Large language model (LLM) configuration.
8282
"""
8383

8484
mllm: typing.Optional[StartAgentsRequestPropertiesMllm] = pydantic.Field(default=None)
8585
"""
86-
Multimodal Large Language Model (MLLM) configuration for real-time audio and text processing.
86+
Multimodal Large Language Model (MLLM) configuration for real-time audio and text processing. MLLM is an exclusive alternative to the standard `asr` + `llm` + `tts` pipeline.
8787
"""
8888

8989
avatar: typing.Optional[StartAgentsRequestPropertiesAvatar] = pydantic.Field(default=None)
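
The hunk above changes `tts` and `llm` from required fields to `typing.Optional[...]` with `default=None`. A minimal sketch of what that allows at construction time, using a stdlib dataclass as a hypothetical stand-in (`PropertiesSketch` and `uses_mllm_only` are illustrative names, not part of the SDK, and the real class is a pydantic model with many more fields). Whether the Agora service accepts a request without `tts` and `llm` is not confirmed by this diff.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for StartAgentsRequestProperties; not the SDK class.
@dataclass
class PropertiesSketch:
    channel: str
    tts: Optional[dict] = None   # required before this commit, optional after
    llm: Optional[dict] = None   # required before this commit, optional after
    mllm: Optional[dict] = None  # already optional before this commit


def uses_mllm_only(props: PropertiesSketch) -> bool:
    """Return True when only the MLLM pipeline is configured."""
    return props.mllm is not None and props.tts is None and props.llm is None


# With optional fields, an MLLM-only configuration can omit tts and llm.
mllm_only = PropertiesSketch(channel="channel_name", mllm={"vendor": "openai"})

# A cascading configuration still sets tts and llm explicitly.
cascading = PropertiesSketch(
    channel="channel_name",
    tts={"vendor": "microsoft"},
    llm={"url": "https://api.openai.com/v1/chat/completions"},
)
```

This mirrors the updated docstring, which describes MLLM as an exclusive alternative to the `asr` + `llm` + `tts` pipeline.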

src/agora_agent/agents/types/start_agents_request_properties_mllm.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
1010

1111
class StartAgentsRequestPropertiesMllm(UncheckedBaseModel):
1212
"""
13-
Multimodal Large Language Model (MLLM) configuration for real-time audio and text processing.
13+
Multimodal Large Language Model (MLLM) configuration for real-time audio and text processing. MLLM is an exclusive alternative to the standard `asr` + `llm` + `tts` pipeline.
1414
"""
1515

1616
url: typing.Optional[str] = pydantic.Field(default=None)

src/agora_agent/agents/types/start_agents_request_properties_sal.py

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ class StartAgentsRequestPropertiesSal(UncheckedBaseModel):
2929
> - For a registered voiceprint, ensure that:
3030
> - Size: A single voiceprint file must not exceed 2 MB.
3131
> - Duration: 10 to 15 seconds, with at least 8 seconds of effective audio without silent segments.
32-
> - Format: 16kHz sampling rate, 16-bit depth, mono PCM audio file. The file name extension must be ".pcm".
32+
> - Format: 16kHz sampling rate, 16-bit depth, mono PCM audio file. The file name extension must be ".pcm".
3333
"""
3434

3535
if IS_PYDANTIC_V2:
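
The voiceprint constraints quoted in the docstring above (at most 2 MB, 10 to 15 seconds, 16 kHz, 16-bit, mono, `.pcm` extension) can be pre-checked locally. The sketch below is an illustration only: `check_voiceprint` is a hypothetical helper, not an SDK function, and it assumes 2 MB means 2 × 1024 × 1024 bytes. Because raw PCM has no header, the duration is derived from the byte count. The requirement of at least 8 seconds of effective, non-silent audio is not checked here.

```python
import os

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2                 # 16-bit depth
CHANNELS = 1                         # mono
MAX_SIZE_BYTES = 2 * 1024 * 1024     # assumption: 2 MB read as 2 MiB
MIN_SECONDS, MAX_SECONDS = 10.0, 15.0


def check_voiceprint(file_name: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    if os.path.splitext(file_name)[1] != ".pcm":
        problems.append("extension must be .pcm")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file exceeds 2 MB")
    # Headerless PCM: duration = bytes / (rate * bytes per sample * channels).
    seconds = size_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.1f}s is outside 10-15s")
    return problems
```

At this format one second of audio is 32,000 bytes, so a valid 10 to 15 second file is between 320,000 and 480,000 bytes, well under the 2 MB limit.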

src/agora_agent/core/client_wrapper.py

Lines changed: 2 additions & 2 deletions
@@ -26,10 +26,10 @@ def __init__(
2626

2727
def get_headers(self) -> typing.Dict[str, str]:
2828
headers: typing.Dict[str, str] = {
29-
"User-Agent": "agora-agent-server-sdk/1.1.0",
29+
"User-Agent": "agora-agent-server-sdk/1.1.1",
3030
"X-Fern-Language": "Python",
3131
"X-Fern-SDK-Name": "agora-agent-server-sdk",
32-
"X-Fern-SDK-Version": "1.1.0",
32+
"X-Fern-SDK-Version": "1.1.1",
3333
**(self.get_custom_headers() or {}),
3434
}
3535
headers["Authorization"] = httpx.BasicAuth(self._get_username(), self._get_password())._auth_header
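
For readers unfamiliar with the `Authorization` line above, the following sketch shows the shape of the headers after the version bump, computing the HTTP Basic value with the standard library instead of `httpx`. `basic_auth_header` and `build_headers` are hypothetical helpers written for illustration. That the username and password correspond to the customer ID and customer secret is an assumption, since `_get_username` and `_get_password` are not shown in this diff.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic value: 'Basic ' + base64('username:password')."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_headers(username: str, password: str, sdk_version: str = "1.1.1") -> dict:
    """Approximate the header set from get_headers(), without custom headers."""
    return {
        "User-Agent": f"agora-agent-server-sdk/{sdk_version}",
        "X-Fern-Language": "Python",
        "X-Fern-SDK-Name": "agora-agent-server-sdk",
        "X-Fern-SDK-Version": sdk_version,
        "Authorization": basic_auth_header(username, password),
    }


# Placeholder credentials, as in the README example.
headers = build_headers("YOUR_CUSTOMER_ID", "YOUR_CUSTOMER_SECRET")
```

Only the `User-Agent` and `X-Fern-SDK-Version` values change between 1.1.0 and 1.1.1.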
