
Commit b4aaf17

Merge pull request #12 from AgoraIO-Conversational-AI/release/v1_2_0
Release/v1
2 parents 5c5a388 + a428633 commit b4aaf17

13 files changed

Lines changed: 315 additions & 107 deletions

README.md

Lines changed: 70 additions & 2 deletions
@@ -3,17 +3,19 @@
[![fern shield](https://img.shields.io/badge/%F0%9F%8C%BF-Built%20with%20Fern-brightgreen)](https://buildwithfern.com?utm_source=github&utm_medium=github&utm_campaign=readme&utm_source=https%3A%2F%2Fgithub.com%2FAgoraIO-Conversational-AI%2Fagent-server-sdk-python)
[![pypi](https://img.shields.io/pypi/v/agent-server-sdk-python)](https://pypi.python.org/pypi/agent-server-sdk-python)

The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs,
enabling you to build voice-powered AI agents with support for both cascading flows (ASR -> LLM -> TTS)
and multimodal flows (MLLM) for real-time audio processing.

## Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [Documentation](#documentation)
- [Reference](#reference)
- [Mllm Flow Multimodal](#mllm-flow-multimodal)
- [Usage](#usage)
- [Async Client](#async-client)
- [Exception Handling](#exception-handling)
@@ -152,6 +154,71 @@ A full reference for this library is available [here](https://github.com/AgoraIO

For real-time audio processing using OpenAI's Realtime API or Google Gemini Live, use the MLLM (Multimodal Large Language Model) flow instead of the cascading ASR -> LLM -> TTS flow. See the [MLLM Overview](https://docs.agora.io/en/conversational-ai/models/mllm/overview) for more details.

```python
from agora_agent import Agora, Area
from agora_agent.agentkit import (
    AdvancedFeatures,
    TurnDetectionConfig,
    TurnDetectionTypeValues,
)
from agora_agent.agents import (
    StartAgentsRequestProperties,
    StartAgentsRequestPropertiesMllm,
    StartAgentsRequestPropertiesMllmVendor,
    StartAgentsRequestPropertiesTts,
    StartAgentsRequestPropertiesTtsVendor,
    StartAgentsRequestPropertiesLlm,
)

# Replace the placeholder credentials with your own Agora project values.
client = Agora(
    area=Area.US,
    app_id="YOUR_APP_ID",
    app_certificate="YOUR_APP_CERTIFICATE",
)

# Start an agent that uses OpenAI's Realtime API as the multimodal model.
client.agents.start(
    client.app_id,
    name="mllm_agent",
    properties=StartAgentsRequestProperties(
        channel="channel_name",
        token="your_token",
        agent_rtc_uid="1001",
        remote_rtc_uids=["1002"],
        idle_timeout=120,
        advanced_features=AdvancedFeatures(enable_mllm=True),
        mllm=StartAgentsRequestPropertiesMllm(
            url="wss://api.openai.com/v1/realtime",
            api_key="<your_openai_api_key>",
            vendor=StartAgentsRequestPropertiesMllmVendor.OPENAI,
            params={
                "model": "gpt-4o-realtime-preview",
                "voice": "alloy",
            },
            input_modalities=["audio"],
            output_modalities=["text", "audio"],
            greeting_message="Hello! I'm ready to chat in real-time.",
        ),
        turn_detection=TurnDetectionConfig(
            type=TurnDetectionTypeValues.SERVER_VAD,  # deprecated; use config.end_of_speech instead
            threshold=0.5,
            silence_duration_ms=500,
        ),
        # TTS and LLM are still required but not used when MLLM is enabled
        tts=StartAgentsRequestPropertiesTts(
            vendor=StartAgentsRequestPropertiesTtsVendor.MICROSOFT,
            params={},
        ),
        llm=StartAgentsRequestPropertiesLlm(
            url="https://api.openai.com/v1/chat/completions",
        ),
    ),
)
```
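The `threshold` and `silence_duration_ms` values passed to `TurnDetectionConfig` decide when the agent treats the user's turn as finished. As a rough mental model only (a toy sketch, not Agora's or OpenAI's actual VAD), assume a detector emits a speech probability for each audio frame: a frame at or above `threshold` counts as speech, and the turn ends once speech has been followed by `silence_duration_ms` of consecutive below-threshold frames. The 20 ms frame length and the names `ToyVadConfig` and `end_of_turn_frame` are hypothetical and are not part of the SDK.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ToyVadConfig:
    # Hypothetical mirror of TurnDetectionConfig(threshold=..., silence_duration_ms=...)
    threshold: float = 0.5
    silence_duration_ms: int = 500
    frame_ms: int = 20  # assumed audio frame length; not an SDK parameter


def end_of_turn_frame(probs: Iterable[float], cfg: ToyVadConfig) -> Optional[int]:
    """Return the index of the frame where the turn is considered finished,
    or None if the speaker never pauses long enough (or never speaks)."""
    speaking = False
    silent_ms = 0
    for i, p in enumerate(probs):
        if p >= cfg.threshold:
            # Speech frame: (re)start the turn and reset the silence timer.
            speaking = True
            silent_ms = 0
        elif speaking:
            # Below-threshold frame after speech: accumulate silence.
            silent_ms += cfg.frame_ms
            if silent_ms >= cfg.silence_duration_ms:
                return i
    return None


# 200 ms of speech followed by 600 ms of silence, in 20 ms frames.
frames = [0.9] * 10 + [0.1] * 30
print(end_of_turn_frame(frames, ToyVadConfig()))  # 34
print(end_of_turn_frame(frames, ToyVadConfig(silence_duration_ms=200)))  # 19
```

In this toy model, a pause shorter than 500 ms (25 frames of 20 ms) does not end the turn. Raising `silence_duration_ms` makes the agent more patient with hesitant speakers at the cost of slower replies; lowering it makes replies faster but more likely to interrupt short pauses.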

## MLLM Flow (Multimodal)

```python
from agora_agent import Agora
from agora_agent.agents import (
@@ -212,6 +279,7 @@ client.agents.start(
)
```

## Usage

Instantiate and use the client with the following:
