-The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs, enabling you to build voice-powered AI agents with support for both cascading flows (ASR -> LLM -> TTS) and multimodal flows (MLLM) for real-time audio processing.
+The Agora Agent Server SDK for Python lets you build real-time voice agents on Agora Conversational AI with a high-level `Agent` / `AgentSession` API and a generated low-level REST client.
-Minimal builder-based example using supported preset-backed models with no vendor API keys:
+The recommended onboarding path is a server-side builder flow: define the agent once, configure preset-backed providers in the builder, and let AgentKit infer the reseller `preset` values when the session starts.

```python
+import os
+import time
+
 from agora_agent import Agora, Area
-from agora_agent.agentkit import Agent, DeepgramSTT, OpenAI, OpenAITTS
+from agora_agent.agentkit import (
+    Agent,
+    DataChannel,
+    DeepgramSTT,
+    MiniMaxTTS,
+    OpenAI,
+    expires_in_hours,
+)
+
+AGENT_PROMPT = (
+    "You are a concise, technically credible voice assistant. "
+    "Keep replies short unless the user asks for detail."
+)
+
+GREETING = "Hi there! I am your Agora voice assistant. How can I help?"
-        instructions="You are a concise voice assistant.",
-        greeting="Hello! How can I help you today?",
+        name=f"conversation-{int(time.time())}",
+        instructions=AGENT_PROMPT,
+        greeting=GREETING,
+        failure_message="Please wait a moment.",
+        max_history=50,
+        turn_detection={
+            "config": {
+                "speech_threshold": 0.5,
+                "start_of_speech": {
+                    "mode": "vad",
+                    "vad_config": {
+                        "interrupt_duration_ms": 160,
+                        "prefix_padding_ms": 300,
+                    },
+                },
+                "end_of_speech": {
+                    "mode": "vad",
+                    "vad_config": {
+                        "silence_duration_ms": 480,
+                    },
+                },
+            },
+        },
+        advanced_features={
+            "enable_rtm": True,
+            "enable_tools": True,
+        },
+        parameters={
+            "data_channel": DataChannel.RTM,
+            "enable_error_message": True,
+        },
     ).with_stt(
-        DeepgramSTT(model="nova-3")
+        DeepgramSTT(
+            model="nova-3",
+            language="en",
+        )
     ).with_llm(
-        OpenAI(model="gpt-5-mini")
+        OpenAI(
+            model="gpt-4o-mini",
+            greeting_message=GREETING,
+            failure_message="Please wait a moment.",
+            max_history=15,
+            params={
+                "max_tokens": 1024,
+                "temperature": 0.7,
+                "top_p": 0.95,
+            },
+        )
     ).with_tts(
-        OpenAITTS(voice="alloy")
+        MiniMaxTTS(
+            model="speech_2_6_turbo",
+            voice_id="English_captivating_female1",
+        )
     )

     session = agent.create_session(
         client,
-        channel="support-room-123",
-        agent_uid="1",
-        remote_uids=["100"],
+        channel=f"demo-channel-{int(time.time())}",
+        agent_uid="123456",
+        remote_uids=["*"],
+        idle_timeout=30,
+        expires_in=expires_in_hours(1),
+        debug=False,
     )

-    agent_id = session.start()
-    print(agent_id)
-
-
-if __name__ == "__main__":
-    main()
+    return session.start()
```
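
The example derives the agent and channel names from `int(time.time())`, which has one-second resolution, so two sessions created within the same second would receive the same name. Below is a minimal sketch of one way to reduce that risk, assuming a short random suffix is acceptable in your naming scheme. `unique_name` is a hypothetical helper and not part of the SDK; any channel-name length or character limits should be checked against the Agora documentation.

```python
import time
import uuid


def unique_name(prefix: str) -> str:
    """Return '<prefix>-<unix seconds>-<8 random hex chars>'.

    Hypothetical helper: the timestamp keeps names roughly sortable,
    and the random suffix separates names created in the same second.
    """
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}-{int(time.time())}-{suffix}"


if __name__ == "__main__":
    # Two calls in the same second still differ in their random suffix
    # (barring an extremely unlikely 32-bit collision).
    print(unique_name("demo-channel"))
    print(unique_name("conversation"))
```

The resulting string could be passed wherever the example uses `f"demo-channel-{int(time.time())}"` or `f"conversation-{int(time.time())}"`.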
### Why no token or vendor key in the example?

-The SDK-managed path is the recommended path. `Agora` generates the required ConvoAI REST auth and RTC join tokens automatically, and AgentKit infers the matching supported presets from the vendor configs when you omit vendor API keys.
+`Agora` generates the required ConvoAI REST auth and RTC join tokens automatically when you provide `app_id` and `app_certificate`. AgentKit then inspects the builder-provided vendor configs and infers the matching supported `preset` values for reseller-backed models, so you do not pass vendor API keys in this flow.
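
The preset inference described here can be pictured roughly as follows. This is a conceptual sketch only, not AgentKit's actual implementation: `VendorConfig`, `resolve_vendor`, and the preset naming scheme are all hypothetical, invented to illustrate the rule "no vendor key means fall back to a managed preset".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VendorConfig:
    """Hypothetical stand-in for a builder-provided vendor config."""
    vendor: str
    model: str
    api_key: Optional[str] = None


def resolve_vendor(config: VendorConfig) -> dict:
    """Pick BYOK when a key is supplied, otherwise reference a preset."""
    if config.api_key:
        # Caller supplied credentials: forward them with the vendor settings.
        return {
            "mode": "byok",
            "vendor": config.vendor,
            "model": config.model,
            "api_key": config.api_key,
        }
    # No key supplied: refer to a managed preset instead.
    # The "<vendor>_<model>" naming is an assumption for illustration.
    return {"mode": "preset", "preset": f"{config.vendor}_{config.model}"}


if __name__ == "__main__":
    print(resolve_vendor(VendorConfig("deepgram", "nova-3")))
    print(resolve_vendor(VendorConfig("deepgram", "nova-3", api_key="example-key")))
```

The point of the sketch is only the branching: the same config object serves both flows, and the presence or absence of a key decides which one applies.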
+
+### BYOK version of the same builder flow
+
+Use the same `Agent` builder shape, but provide credentials explicitly when you want vendor-managed billing and routing instead of Agora-managed presets.
+
```python
+agent = Agent(
+    instructions=AGENT_PROMPT,
+    greeting=GREETING,
+).with_stt(
+    DeepgramSTT(
+        api_key=os.environ["DEEPGRAM_API_KEY"],
+        model="nova-3",
+        language="en",
+    )
+).with_llm(
+    OpenAI(
+        api_key=os.environ["OPENAI_API_KEY"],
+        model="gpt-4o-mini",
+        max_tokens=1024,
+        temperature=0.7,
+        top_p=0.95,
+    )
+).with_tts(
+    MiniMaxTTS(
+        key=os.environ["MINIMAX_API_KEY"],
+        group_id=os.environ["MINIMAX_GROUP_ID"],
+        model="speech_2_6_turbo",
+        voice_id="English_captivating_female1",
+        url="wss://api-uw.minimax.io/ws/v1/t2a_v2",
+    )
+)
```
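
The BYOK snippet reads four environment variables with `os.environ[...]`, which raises a bare `KeyError` on the first one that is missing. A minimal sketch of a pre-flight check that reports every missing variable at once is shown below; the variable names come from the snippet above, while `missing_env_vars` and `require_byok_credentials` are hypothetical helpers, not SDK functions.

```python
import os

# Names taken from the BYOK snippet above.
REQUIRED_ENV_VARS = (
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "MINIMAX_API_KEY",
    "MINIMAX_GROUP_ID",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def require_byok_credentials(environ=None):
    """Raise one readable error listing all missing credentials."""
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError("Missing BYOK credentials: " + ", ".join(missing))


if __name__ == "__main__":
    try:
        require_byok_credentials()
        print("All BYOK credentials are set.")
    except RuntimeError as exc:
        print(exc)
```

Calling `require_byok_credentials()` before constructing the agent would surface configuration problems in a single message rather than one `KeyError` at a time.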
## BYOK
If you want to bring your own vendor credentials instead of using Agora-managed presets, use the BYOK guide:

 - [BYOK Guide](./docs/guides/byok.md)

+## MLLM (Realtime / Multimodal)
+
+Use `with_mllm()` for OpenAI Realtime or Gemini Live. No STT, LLM, or TTS vendor is needed when MLLM mode is enabled.
+
```python
+from agora_agent.agentkit import Agent, OpenAIRealtime
```