Commit 2865172

Merge pull request #37 from seymourtang/feat/add-cn-prefix
feat:add CN vendor support
2 parents fd79ff0 + 7469650 commit 2865172

45 files changed

Lines changed: 3474 additions & 493 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@
 __pycache__/
 dist/
 poetry.toml
+.venv/

README.md

Lines changed: 41 additions & 20 deletions
@@ -19,7 +19,7 @@ pip install agora-agents
 
 ## Quick Start
 
-Start with the `Agent` builder: create a client with app credentials, choose your ASR, LLM, and TTS providers, then start a session. Omit vendor API keys for supported Agora-managed models, or provide keys when you want BYOK.
+Start with the `Agent` builder: create a client with app credentials, pass it to `Agent(client=client, ...)`, choose your ASR, LLM, and TTS providers, then start a session. Omit vendor API keys for supported Agora-managed global models, or provide keys when you want BYOK.
 Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`. Ares uses only the REST `asr.language` value sourced from `turn_detection.language`.
 
 ```python
@@ -31,8 +31,8 @@ from agora_agent import (
     Agora,
     Area,
     DeepgramSTT,
-    MiniMaxTTS,
     OpenAI,
+    MiniMaxTTS,
     expires_in_hours,
 )
 
@@ -54,7 +54,7 @@ def start_conversation() -> str:
         app_certificate=app_certificate,
     )
 
-    agent = Agent(name=f"conversation-{int(time.time())}", turn_detection={"language": "en-US"}).with_stt(
+    agent = Agent(client=client, turn_detection={"language": "en-US"}).with_stt(
         DeepgramSTT(
             model="nova-3",
             language="en",
@@ -80,10 +80,10 @@ def start_conversation() -> str:
     )
 
     session = agent.create_session(
-        client,
         channel=f"demo-channel-{int(time.time())}",
         agent_uid="123456",
         remote_uids=["*"],
+        name=f"conversation-{int(time.time())}",
         idle_timeout=30,
         expires_in=expires_in_hours(1),
         debug=False,
@@ -94,35 +94,42 @@ def start_conversation() -> str:
 
 ### Why no token or vendor key in the example?
 
-`Agora` generates the required ConvoAI REST auth and RTC join tokens automatically when you provide `app_id` and `app_certificate`. For supported Agora-managed models, leave vendor API keys unset; provide keys when you want BYOK.
+`Agora` generates the required ConvoAI REST auth and RTC join tokens automatically when you provide `app_id` and `app_certificate`. For supported Agora-managed global models, leave vendor API keys unset; provide keys when you want BYOK. CN MiniMax TTS is not Agora-managed in the same way and typically includes `key`.
+
+### Regional agent builders
+
+Bind the client once with `Agent(client=client, ...)` and pass vendor classes directly such as `OpenAI(...)` or `MiniMaxTTS(...)`. The bound client selects the API routing region and provides IDE hints via `CNAgent` / `GlobalAgent`, but does not restrict which vendor classes you can use. See `[docs/guides/regional-routing.md](./docs/guides/regional-routing.md)` for regional examples.
 
 ## AI Studio pipeline IDs
 
 Use `pipeline_id` when you want a published AI Studio pipeline to provide the base agent configuration:
 
 ```python
+import time
+# client = Agora(area=Area.US, app_id="...", app_certificate="...")
 agent = Agent(
-    name="support",
+    client=client,
     pipeline_id="studio-pipeline-id",
 )
 
 session = agent.create_session(
-    client,
-    channel="support-room",
+    channel=f"demo-channel-{int(time.time())}",
     agent_uid="1",
     remote_uids=["100"],
+    name=f"conversation-{int(time.time())}",
 )
 ```
 
 You can override it per session:
 
 ```python
+import time
 session = agent.create_session(
-    client,
-    channel="support-room",
+    channel=f"demo-channel-{int(time.time())}",
     agent_uid="1",
     remote_uids=["100"],
     pipeline_id="session-pipeline-id",
+    name=f"conversation-{int(time.time())}",
 )
 ```
 
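The precedence implied by the two README snippets in this hunk (a session-level `pipeline_id` replaces the agent-level one) can be sketched as follows. `resolve_pipeline_id` is a hypothetical helper written for illustration, not an SDK function, and treating `None` as "not set" is an assumption; the SDK's own resolution logic is not part of the visible diff.

```python
from typing import Optional


def resolve_pipeline_id(
    agent_pipeline_id: Optional[str],
    session_pipeline_id: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the pipeline ID a session would use.

    Assumes None means "not set"; this is an illustration, not SDK code.
    """
    if session_pipeline_id is not None:
        # A value passed at session creation overrides the agent-level value.
        return session_pipeline_id
    # Otherwise fall back to whatever the agent was built with (possibly None).
    return agent_pipeline_id


print(resolve_pipeline_id("studio-pipeline-id"))
print(resolve_pipeline_id("studio-pipeline-id", "session-pipeline-id"))
```

Keeping the fallback in one place makes it easy to see which value would end up in the top-level `pipeline_id` field of the request.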
@@ -133,7 +140,7 @@ AgentKit sends the resolved value as the top-level `/join` field `pipeline_id`,
 Use the same `Agent` builder shape, but provide credentials explicitly when you want vendor-managed billing and routing instead of Agora-managed models.
 
 ```python
-agent = Agent(turn_detection={"language": "en-US"}).with_stt(
+agent = Agent(client=client, turn_detection={"language": "en-US"}).with_stt(
     DeepgramSTT(
         api_key=os.environ["DEEPGRAM_API_KEY"],
         model="nova-3",
@@ -151,12 +158,11 @@ agent = Agent(turn_detection={"language": "en-US"}).with_stt(
         top_p=0.95,
     )
 ).with_tts(
-    MiniMaxTTS(
-        key=os.environ["MINIMAX_API_KEY"],
-        group_id=os.environ["MINIMAX_GROUP_ID"],
-        model="speech_2_6_turbo",
-        voice_id="English_captivating_female1",
-        url="wss://api-uw.minimax.io/ws/v1/t2a_v2",
+    ElevenLabsTTS(
+        key=os.environ["ELEVENLABS_API_KEY"],
+        model_id="eleven_flash_v2_5",
+        voice_id=os.environ["ELEVENLABS_VOICE_ID"],
+        base_url="wss://api.elevenlabs.io/v1",
     )
 )
 ```
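The BYOK snippet reads credentials with `os.environ[...]`, which raises a bare `KeyError` at the first missing variable. As a sketch, a small pre-flight check can report every missing name at once before the agent is built. `require_env` is a hypothetical helper, not part of the SDK; the variable names are the ones that appear in the README example.

```python
import os
from typing import Dict, Iterable, Mapping


def require_env(names: Iterable[str], env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: return the requested variables or fail with one clear error."""
    names = list(names)
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# An explicit mapping is passed here so the sketch runs without real credentials.
fake_env = {"DEEPGRAM_API_KEY": "dg-key", "ELEVENLABS_API_KEY": "el-key"}
try:
    require_env(
        ["DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"],
        env=fake_env,
    )
except RuntimeError as exc:
    print(exc)
```

In real use the `env` argument would be left at its default so the check runs against the process environment.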
@@ -174,15 +180,30 @@ If you want to bring your own vendor credentials instead of using Agora-managed
 Use `with_mllm()` for OpenAI Realtime, Gemini Live, Vertex AI, or xAI Grok. No STT, LLM, or TTS vendor is needed when MLLM mode is enabled.
 
 ```python
-from agora_agent import Agent, OpenAIRealtime
+from agora_agent import Agent, Agora, Area, OpenAIRealtime
+import time
 
-agent = Agent(name="realtime-assistant").with_mllm(
+client = Agora(
+    area=Area.US,
+    app_id=os.environ["AGORA_APP_ID"],
+    app_certificate=os.environ["AGORA_APP_CERTIFICATE"],
+)
+
+agent = Agent(client=client).with_mllm(
     OpenAIRealtime(
         api_key=os.environ["OPENAI_API_KEY"],
         model="gpt-4o-realtime-preview",
         greeting_message="Hello! Ready to chat.",
     )
 )
+
+session = agent.create_session(
+    channel=f"demo-channel-{int(time.time())}",
+    agent_uid="1",
+    remote_uids=["*"],
+    name=f"conversation-{int(time.time())}",
+)
+session.start()
 ```
 
 See the [MLLM Flow guide](./docs/guides/mllm-flow.md) for full examples with Gemini Live and Vertex AI.
@@ -333,4 +354,4 @@ otherwise they would be overwritten upon the next generated release. Feel free t
 a proof of concept, but know that we will not be able to merge it as-is. We suggest opening
 an issue first to discuss with us!
 
-On the other hand, contributions to the README are always very welcome!
+On the other hand, contributions to the README are always very welcome!

docs/concepts/agent.md

Lines changed: 24 additions & 20 deletions
@@ -6,15 +6,17 @@ description: The Agent builder — configure an AI agent with LLM, TTS, STT, and
 
 # Agent
 
-The `Agent` class is a fluent builder for configuring AI agent properties. It collects vendor settings (LLM, TTS, STT, MLLM, avatar) and session parameters, then produces a fully configured `AgentSession` when you call `create_session()`.
+The `Agent` class is a fluent builder for configuring AI agent properties. Pass a bound `Agora` or `AsyncAgora` client with `client=...` — it is required for `create_session()` and `create_async_session()`. The builder collects vendor settings (LLM, TTS, STT, MLLM, avatar) and produces a fully configured `AgentSession` when you call `create_session()`. The agent instance name is set on `create_session(name=...)`, not on the `Agent` constructor.
 
 ## Constructor
 
 <!-- snippet: executable -->
 ```python
-from agora_agent import Agent, OpenAI
+from agora_agent import Agent, Agora, Area, OpenAI
 
-agent = Agent(name='support-assistant').with_llm(
+client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')
+
+agent = Agent(client=client).with_llm(
     OpenAI(
         api_key='your-openai-key',
         base_url='https://api.openai.com/v1/chat/completions',
@@ -29,7 +31,8 @@ agent = Agent(name='support-assistant').with_llm(
 
 | Parameter | Type | Required | Description |
 |---|---|---|---|
-| `name` | `str` | No | Agent display name (used as session name if not overridden) |
+| `client` | `Agora` / `AsyncAgora` | Yes | Authenticated client from `Agora(...)` or `AsyncAgora(...)`. Required for `create_session()` and `create_async_session()`. |
+| `pipeline_id` | `str` | No | Published AI Studio pipeline ID used as this agent's base configuration |
 | `instructions` | `str` | No | Deprecated. Use LLM vendor `system_messages` instead. |
 | `greeting` | `str` | No | Deprecated. Use LLM/MLLM vendor `greeting_message` instead. |
 | `failure_message` | `str` | No | Deprecated. Use LLM/MLLM vendor `failure_message` instead. |
@@ -43,6 +46,8 @@ agent = Agent(name='support-assistant').with_llm(
 | `rtc` | `RtcConfig` | No | RTC media encryption |
 | `filler_words` | `FillerWordsConfig` | No | Filler words while waiting for LLM |
 
+When `client` is provided, `Agent(client=...)` returns `CNAgent` for `Area.CN` and `GlobalAgent` for global areas.
+
 ## Builder Methods
 
 Each `with_*` method returns a **new** `Agent` instance — the original is unchanged. This immutability lets you safely reuse a base configuration for multiple sessions.
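To illustrate the copy-on-change behavior described in the last context line above, here is a minimal sketch that uses a hypothetical stand-in class, `SketchAgent`, rather than the SDK's `Agent`. The field names and the use of `dataclasses.replace` are assumptions made for illustration; the SDK's actual implementation is not part of the visible diff.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchAgent:
    """Hypothetical stand-in for a fluent, immutable builder (not the SDK's Agent)."""

    llm: Optional[str] = None
    tts: Optional[str] = None

    def with_llm(self, llm: str) -> "SketchAgent":
        # Return a modified copy; the instance the method was called on is untouched.
        return replace(self, llm=llm)

    def with_tts(self, tts: str) -> "SketchAgent":
        return replace(self, tts=tts)


base = SketchAgent().with_llm("llm-config")
voice_a = base.with_tts("voice-a")
voice_b = base.with_tts("voice-b")

# base keeps its original state, so it can seed any number of variants.
print(base.tts, voice_a.tts, voice_b.tts)
```

Because `base` never changes, two variants derived from it cannot affect each other, which is the property that makes sharing a base configuration across sessions safe.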
@@ -63,7 +68,6 @@ Each `with_*` method returns a **new** `Agent` instance — the original is unch
 |---|---|---|
 | `with_instructions(text)` | `str` | Deprecated. Use LLM vendor `system_messages` instead. |
 | `with_greeting(text)` | `str` | Deprecated. Use LLM/MLLM vendor `greeting_message` instead. |
-| `with_name(name)` | `str` | Override the agent name |
 | `with_turn_detection(config)` | `TurnDetectionConfig` | Configure `turn_detection.language` and cascading-flow SOS/EOS detection; use `with_interruption()` for interruption behavior |
 | `with_sal(config)` | `SalConfig` | Set SAL configuration |
 | `with_advanced_features(features)` | `Dict[str, Any]` | Set advanced features |
@@ -79,11 +83,12 @@ Each `with_*` method returns a **new** `Agent` instance — the original is unch
 
 <!-- snippet: executable -->
 ```python
-from agora_agent import Agent
-from agora_agent import OpenAI, ElevenLabsTTS, DeepgramSTT
+from agora_agent import Agent, Agora, Area, DeepgramSTT, ElevenLabsTTS, OpenAI
+
+client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')
 
 agent = (
-    Agent(name='my-agent')
+    Agent(client=client)
     .with_llm(OpenAI(
         api_key='your-openai-key',
         base_url='https://api.openai.com/v1/chat/completions',
@@ -101,12 +106,13 @@ Because each `with_*` call returns a new `Agent`, you can build a base configura
 
 <!-- snippet: executable -->
 ```python
-from agora_agent import Agent, Agora, Area, OpenAI, ElevenLabsTTS, DeepgramSTT
+from agora_agent import Agent, Agora, Area, DeepgramSTT, ElevenLabsTTS, OpenAI
+import time
 
 client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')
 
 base = (
-    Agent()
+    Agent(client=client)
     .with_llm(OpenAI(
         api_key='your-openai-key',
         base_url='https://api.openai.com/v1/chat/completions',
@@ -117,23 +123,23 @@ base = (
     .with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
 )
 
-# Same agent config, different channels
-session_a = base.create_session(client, channel='room-a', agent_uid='1', remote_uids=['100'])
-session_b = base.create_session(client, channel='room-b', agent_uid='1', remote_uids=['200'])
+# Same agent config, different channels and session names
+session_a = base.create_session(channel=f"demo-channel-{int(time.time())}", agent_uid='1', remote_uids=['100'], name=f"conversation-{int(time.time())}")
+session_b = base.create_session(channel=f"demo-channel-{int(time.time())}", agent_uid='1', remote_uids=['200'], name=f"conversation-{int(time.time())}")
 ```
 
 ## `create_session()`
 
-Creates a new `AgentSession` bound to a client and channel.
+Creates a new `AgentSession` using the client already bound to the agent. Pass the agent instance name via the `name` parameter here — it is sent to the Start Agent API when the session starts.
 
 <!-- snippet: fragment -->
 ```python
+import time
 session = agent.create_session(
-    client,
-    channel='my-channel',
+    channel=f"demo-channel-{int(time.time())}",
     agent_uid='1',
     remote_uids=['100'],
-    name='optional-session-name',
+    name=f"conversation-{int(time.time())}",
     token='optional-pre-built-token',
     idle_timeout=300,
     enable_string_uid=True,
@@ -142,11 +148,10 @@ session = agent.create_session(
 
 | Parameter | Type | Required | Description |
 |---|---|---|---|
-| `client` | `Agora` or `AsyncAgora` | Yes | The authenticated client |
 | `channel` | `str` | Yes | Agora channel name |
 | `agent_uid` | `str` | Yes | UID for the agent in the channel |
 | `remote_uids` | `List[str]` | Yes | UIDs of remote participants to listen to |
-| `name` | `str` | No | Session name (defaults to agent name or auto-generated) |
+| `name` | `str` | No | Agent instance name for the Start Agent API (defaults to `agent-{timestamp}` if omitted) |
 | `token` | `str` | No | Pre-built RTC token (if not provided, generated from client credentials) |
 | `idle_timeout` | `int` | No | Idle timeout in seconds |
 | `enable_string_uid` | `bool` | No | Enable string UIDs |
@@ -165,7 +170,6 @@ See [Avatar Integration](../guides/avatars.md) for details.
 
 | Property | Type | Description |
 |---|---|---|
-| `agent.name` | `Optional[str]` | Agent name |
 | `agent.instructions` | `Optional[str]` | System prompt |
 | `agent.greeting` | `Optional[str]` | Greeting message |
 | `agent.failure_message` | `Optional[str]` | Message spoken when LLM fails |

docs/concepts/architecture.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ The Python SDK has three layers:
 
 This is the primary developer-facing API. It provides:
 
-- **`Agent`** — a fluent builder for configuring AI agents with LLM, TTS, STT, MLLM, and avatar vendors
+- **`Agent`** — a fluent builder for configuring AI agents with LLM, TTS, STT, MLLM, and avatar vendors. Requires a bound `Agora` / `AsyncAgora` client via `client=...`.
 - **`AgentSession` / `AsyncAgentSession`** — lifecycle management for running agents (start, stop, say, interrupt)
 - **Vendor classes** — typed configuration for 28+ vendor integrations across 5 categories
 - **`generate_rtc_token()`** — helper for building RTC tokens

docs/concepts/session.md

Lines changed: 35 additions & 5 deletions
@@ -8,6 +8,8 @@ description: Manage the full lifecycle of a running Agora Conversational AI agen
 
 `AgentSession` (sync) and `AsyncAgentSession` (async) manage the lifecycle of a running AI agent. They handle starting, stopping, sending speech, interrupting, updating configuration, and retrieving history.
 
+Presets are configured at session creation time when you use them explicitly. Most applications should configure vendors on the `Agent` builder instead — see [Quick Start](../getting-started/quick-start.md).
+
 ## State Machine
 
 An agent session moves through these states:
@@ -31,16 +33,17 @@ You can check the current state with `session.status`.
 
 ## Creating a Session
 
-Use `Agent.create_session()` to create a session:
+Use `Agent.create_session()` to create a session. The parent `Agent` must be constructed with `client=...`. Set the agent instance name with the `name` parameter — this value is sent to the Start Agent API when you call `session.start()`.
 
 <!-- snippet: executable -->
 ```python
-from agora_agent import Agent, Agora, Area, OpenAI, ElevenLabsTTS, DeepgramSTT
+from agora_agent import Agent, Agora, Area, DeepgramSTT, ElevenLabsTTS, OpenAI
+import time
 
 client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')
 
 agent = (
-    Agent(name='my-agent')
+    Agent(client=client)
     .with_llm(OpenAI(
         api_key='your-openai-key',
         base_url='https://api.openai.com/v1/chat/completions',
@@ -51,7 +54,7 @@ agent = (
     .with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
 )
 
-session = agent.create_session(client, channel='my-channel', agent_uid='1', remote_uids=['100'])
+session = agent.create_session(channel=f"demo-channel-{int(time.time())}", agent_uid='1', remote_uids=['100'], name=f"conversation-{int(time.time())}")
 ```
 
 ## Sync Methods
@@ -110,6 +113,32 @@ await session.stop()
 | Update | `session.update(props)` → `None` | `await session.update(props)` → `None` |
 | History | `session.get_history()` → response | `await session.get_history()` → response |
 | Info | `session.get_info()` → response | `await session.get_info()` → response |
+| Think | `session.think(options)` → response | `await session.think(options)` → response |
+| Turns | `session.get_turns(options)` → response | `await session.get_turns(options)` → response |
+| All turns | `session.get_all_turns(options)` → response | `await session.get_all_turns(options)` → response |
+
+## Agora-managed models and BYOK
+
+When you omit credentials for supported Agora-managed global models on the builder, AgentKit sends the matching Agora-managed configuration at session start. Pass your own vendor API keys when you need BYOK. CN MiniMax TTS is not Agora-managed in the same way and typically includes `key`.
+
+<!-- snippet: fragment -->
+```python
+from agora_agent import Agent, Agora, Area, DeepgramSTT, OpenAI, OpenAITTS
+
+client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')
+
+agent = (
+    Agent(client=client)
+    .with_stt(DeepgramSTT(model="nova-3", language="en-US"))
+    .with_llm(OpenAI(
+        model="gpt-4o-mini",
+        system_messages=[{"role": "system", "content": "Be concise."}],
+    ))
+    .with_tts(OpenAITTS(voice="alloy"))
+)
+```
+
+For explicit project-specific preset values and the full list of Agora-managed models, see [AgentSession Reference](../reference/session.md).
 
 ## Events
 
@@ -166,7 +195,8 @@ Session methods raise `RuntimeError` if called in an invalid state:
 
 <!-- snippet: fragment -->
 ```python
-session = agent.create_session(client, channel='my-channel', agent_uid='1', remote_uids=['100'])
+import time
+session = agent.create_session(channel=f"demo-channel-{int(time.time())}", agent_uid='1', remote_uids=['100'], name=f"conversation-{int(time.time())}")
 
 # This raises RuntimeError — session hasn't started yet
 session.say('Hello!') # RuntimeError: Cannot say in idle state