Commit f07a849

Merge pull request #32 from AgoraIO/release/v2.1.0
Release/v2.1.0
2 parents 2010939 + 8e22e6d commit f07a849

100 files changed

Lines changed: 3132 additions & 1072 deletions

.fernignore

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,10 @@ src/agora_agent/agentkit/
 # Documentation - managed manually, not generated by Fern
 docs/
 README.md
+reference.md
+
+# Tests - managed manually, not generated by Fern
+tests/

 # Compatibility shim and CI/release workflows are managed manually
 compat/

README.md

Lines changed: 40 additions & 41 deletions
@@ -20,6 +20,7 @@ pip install agora-agents
 ## Quick Start

 Start with the `Agent` builder: create a client with app credentials, choose your ASR, LLM, and TTS providers, then start a session. Omit vendor API keys for supported Agora-managed models, or provide keys when you want BYOK.
+Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`.

 ```python
 import os
@@ -29,12 +30,9 @@ from agora_agent import (
     Agent,
     Agora,
     Area,
-    DataChannel,
     DeepgramSTT,
-    GenericAvatar,
     MiniMaxTTS,
     OpenAI,
-    XaiGrok,
     expires_in_hours,
 )

@@ -56,49 +54,18 @@ def start_conversation() -> str:
         app_certificate=app_certificate,
     )

-    agent = Agent(
-        name=f"conversation-{int(time.time())}",
-        instructions=AGENT_PROMPT,
-        greeting=GREETING,
-        failure_message="Please wait a moment.",
-        max_history=50,
-        turn_detection={
-            "config": {
-                "speech_threshold": 0.5,
-                "start_of_speech": {
-                    "mode": "vad",
-                    "vad_config": {
-                        "interrupt_duration_ms": 160,
-                        "prefix_padding_ms": 300,
-                    },
-                },
-                "end_of_speech": {
-                    "mode": "vad",
-                    "vad_config": {
-                        "silence_duration_ms": 480,
-                    },
-                },
-            },
-        },
-        advanced_features={
-            "enable_rtm": True,
-            "enable_tools": True,
-        },
-        parameters={
-            "data_channel": DataChannel.RTM,
-            "enable_error_message": True,
-        },
-    ).with_stt(
+    agent = Agent(name=f"conversation-{int(time.time())}", turn_detection={"language": "en-US"}).with_stt(
         DeepgramSTT(
             model="nova-3",
             language="en",
         )
     ).with_llm(
         OpenAI(
             model="gpt-4o-mini",
+            system_messages=[{"role": "system", "content": AGENT_PROMPT}],
             greeting_message=GREETING,
             failure_message="Please wait a moment.",
-            max_history=15,
+            max_history=50,
             params={
                 "max_tokens": 1024,
                 "temperature": 0.7,
@@ -129,15 +96,44 @@ def start_conversation() -> str:

 `Agora` generates the required ConvoAI REST auth and RTC join tokens automatically when you provide `app_id` and `app_certificate`. For supported Agora-managed models, leave vendor API keys unset; provide keys when you want BYOK.

+## AI Studio pipeline IDs
+
+Use `pipeline_id` when you want a published AI Studio pipeline to provide the base agent configuration:
+
+```python
+agent = Agent(
+    name="support",
+    pipeline_id="studio-pipeline-id",
+)
+
+session = agent.create_session(
+    client,
+    channel="support-room",
+    agent_uid="1",
+    remote_uids=["100"],
+)
+```
+
+You can override it per session:
+
+```python
+session = agent.create_session(
+    client,
+    channel="support-room",
+    agent_uid="1",
+    remote_uids=["100"],
+    pipeline_id="session-pipeline-id",
+)
+```
+
+AgentKit sends the resolved value as the top-level `/join` field `pipeline_id`, not inside `properties`. Explicit Agent config such as `with_llm()`, `with_tts()`, `with_stt()`, `with_mllm()`, and `advanced_features` may send `properties` fields that override the saved pipeline settings.
+
 ### BYOK version

 Use the same `Agent` builder shape, but provide credentials explicitly when you want vendor-managed billing and routing instead of Agora-managed models.

 ```python
-agent = Agent(
-    instructions=AGENT_PROMPT,
-    greeting=GREETING,
-).with_stt(
+agent = Agent(turn_detection={"language": "en-US"}).with_stt(
     DeepgramSTT(
         api_key=os.environ["DEEPGRAM_API_KEY"],
         model="nova-3",
@@ -146,7 +142,10 @@ agent = Agent(
 ).with_llm(
     OpenAI(
         api_key=os.environ["OPENAI_API_KEY"],
+        base_url="https://api.openai.com/v1/chat/completions",
         model="gpt-4o-mini",
+        system_messages=[{"role": "system", "content": AGENT_PROMPT}],
+        greeting_message=GREETING,
         max_tokens=1024,
         temperature=0.7,
         top_p=0.95,

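The pipeline ID precedence described in the README hunk above (a per-session value overrides the agent-level one, and the result is sent as a top-level `/join` field rather than inside `properties`) can be sketched in plain Python. This is only an illustration of that rule, not AgentKit's code: the helper `build_join_payload` and the `name` key in the payload are hypothetical.

```python
from typing import Any, Dict, Optional


def build_join_payload(
    name: str,
    agent_pipeline_id: Optional[str] = None,
    session_pipeline_id: Optional[str] = None,
    properties: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a /join-style request body."""
    payload: Dict[str, Any] = {"name": name, "properties": dict(properties or {})}
    # A per-session value takes precedence over the one set on the Agent.
    if session_pipeline_id is not None:
        resolved = session_pipeline_id
    else:
        resolved = agent_pipeline_id
    if resolved is not None:
        # Top-level field, deliberately not nested under "properties".
        payload["pipeline_id"] = resolved
    return payload


agent_only = build_join_payload("support", agent_pipeline_id="studio-pipeline-id")
overridden = build_join_payload(
    "support",
    agent_pipeline_id="studio-pipeline-id",
    session_pipeline_id="session-pipeline-id",
    properties={"llm": {"model": "gpt-4o-mini"}},
)
```

In the second call the explicit `llm` entry still travels under `properties`, which is where the README says explicit Agent config may override the saved pipeline settings.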
changelog.md

Lines changed: 21 additions & 2 deletions
@@ -4,6 +4,25 @@ All notable changes to this project will be documented in this file.

 The format is based on [Keep a Changelog](https://keepachangelog.com/).

+## [v2.1.0] — 2026-06-02
+
+### Added
+
+- **Turn detection language** — AgentKit now manages Agora interaction language through `turn_detection.language`, validates it against the supported BCP-47 language list, and sends the default `en-US` when no language is provided.
+- **Provider parameter parity** — ASR, LLM, MLLM, TTS, and avatar wrappers expose typed provider parameters plus passthrough fields where the generated core supports additional properties.
+
+### Changed
+
+- **Generated core refresh** — Regenerated core types from the v2.1 API schema.
+- **Deepgram TTS passthrough** — `DeepgramTTS` now uses `additional_params` for passthrough fields and flattens them into `tts.params`; the removed nested `params.params` shape is no longer documented or emitted.
+- **OpenAI TTS** — Docs and tests now reflect the generated core shape, including `instructions` and `speed` under `tts.params`.
+- **TTS provider docs** — Updated TTS provider reference tables to match implemented wrapper fields and generated core params.
+
+### Fixed
+
+- **Managed-provider validation** — AgentKit validation now distinguishes preset-backed providers from BYOK providers so required provider fields are only required when credentials are caller-supplied.
+- **Language placement** — Provider-specific STT language values remain under `asr.params`, while Agora interaction language is emitted separately as `turn_detection.language`.
+
 ## [v2.0.0] — 2026-05-21

 ### Added
@@ -52,7 +71,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

 ### Added

-- **`DeepgramTTS`** — New TTS vendor wrapper for Deepgram (Beta). Accepts `api_key`, `model`, `base_url`, `sample_rate`, `params`, and `skip_patterns`.
+- **`DeepgramTTS`** — New TTS vendor wrapper for Deepgram (Beta). Accepts `api_key`, `model`, `base_url`, `sample_rate`, `additional_params`, and `skip_patterns`.
 - **`Agent.with_tools(enabled=True)`** — Dedicated builder method to enable MCP tool invocation (`advanced_features.enable_tools`). Replaces the raw `with_advanced_features(AdvancedFeatures(enable_tools=True))` call.
 - **LLM vendors: `headers` field** — All four LLM vendors (`OpenAI`, `AzureOpenAI`, `Anthropic`, `Gemini`) now accept an optional `headers: Dict[str, str]` parameter. Use this to pass custom HTTP headers to the LLM provider (e.g., tenant identifiers, routing headers).
 - **`AgentSession.think()` / `AsyncAgentSession.think()`** — Send a custom instruction to a running agent through the `agent_management` API.
@@ -107,7 +126,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

 ### Added

-- **`OpenAITTS`** — New optional parameters: `response_format` (str, e.g. `"pcm"`) and `speed` (float).
+- **`OpenAITTS`** — New optional parameters: `instructions` (str) and `speed` (float).
 - **`CartesiaTTS`** — `voice_id` user-facing field is preserved; voice is serialized to the required nested object format automatically.
 - **`RimeTTS`** — New optional parameters: `lang` (str), `sampling_rate` (int, serialized as `samplingRate`), `speed_alpha` (float, serialized as `speedAlpha`).
 - **`OpenAIRealtime`** — New optional parameter: `failure_message` (str).

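The v2.1.0 changelog entries on turn detection language describe three behaviors: a default of `en-US`, validation against a supported list, and keeping the provider's STT language separately under `asr.params`. A minimal sketch of that split follows. The function names are hypothetical and `SUPPORTED_LANGUAGES` is an illustrative subset, since the real list is not shown in this commit.

```python
from typing import Any, Dict, Optional

# Illustrative subset only; the actual supported list is not part of this diff.
SUPPORTED_LANGUAGES = frozenset({"en-US", "ja-JP", "es-ES"})
DEFAULT_LANGUAGE = "en-US"


def resolve_turn_detection_language(language: Optional[str]) -> str:
    """Return the interaction language, defaulting when none is given."""
    if language is None:
        return DEFAULT_LANGUAGE
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported turn_detection.language: {language!r}")
    return language


def build_properties(
    interaction_language: Optional[str],
    stt_language: Optional[str],
) -> Dict[str, Any]:
    """Hypothetical serializer keeping the two language settings apart."""
    asr_params: Dict[str, Any] = {}
    if stt_language is not None:
        # Provider-specific value, passed through in the provider's own format.
        asr_params["language"] = stt_language
    return {
        "turn_detection": {"language": resolve_turn_detection_language(interaction_language)},
        "asr": {"params": asr_params},
    }


props = build_properties(None, "en")
```

Here the Deepgram-style value `"en"` stays under `asr.params`, while the interaction language falls back to `en-US` because none was supplied.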
docs/concepts/agent.md

Lines changed: 37 additions & 23 deletions
@@ -12,24 +12,28 @@ The `Agent` class is a fluent builder for configuring AI agent properties. It co

 <!-- snippet: executable -->
 ```python
-from agora_agent import Agent
-
-agent = Agent(
-    name='support-assistant',
-    instructions='You are a helpful voice assistant.',
-    greeting='Hello! How can I help you?',
-    failure_message='Sorry, something went wrong.',
-    max_history=20,
+from agora_agent import Agent, OpenAI
+
+agent = Agent(name='support-assistant').with_llm(
+    OpenAI(
+        api_key='your-openai-key',
+        base_url='https://api.openai.com/v1/chat/completions',
+        model='gpt-4o-mini',
+        system_messages=[{'role': 'system', 'content': 'You are a helpful voice assistant.'}],
+        greeting_message='Hello! How can I help you?',
+        failure_message='Sorry, something went wrong.',
+        max_history=20,
+    )
 )
 ```

 | Parameter | Type | Required | Description |
 |---|---|---|---|
 | `name` | `str` | No | Agent display name (used as session name if not overridden) |
-| `instructions` | `str` | No | System prompt for the LLM |
-| `greeting` | `str` | No | Message spoken when the agent joins |
-| `failure_message` | `str` | No | Message spoken on error |
-| `max_history` | `int` | No | Maximum conversation history length |
+| `instructions` | `str` | No | Deprecated. Use LLM vendor `system_messages` instead. |
+| `greeting` | `str` | No | Deprecated. Use LLM/MLLM vendor `greeting_message` instead. |
+| `failure_message` | `str` | No | Deprecated. Use LLM/MLLM vendor `failure_message` instead. |
+| `max_history` | `int` | No | Deprecated. Use LLM vendor `max_history` instead. |
 | `turn_detection` | `TurnDetectionConfig` | No | Turn detection settings |
 | `sal` | `SalConfig` | No | SAL (Speech Activity Level) configuration |
 | `advanced_features` | `Dict[str, Any]` | No | Advanced features (e.g., `{'enable_rtm': True}`) |
@@ -57,15 +61,15 @@ Each `with_*` method returns a **new** `Agent` instance — the original is unch

 | Method | Accepts | Purpose |
 |---|---|---|
-| `with_instructions(text)` | `str` | Override the system prompt |
-| `with_greeting(text)` | `str` | Override the greeting message |
+| `with_instructions(text)` | `str` | Deprecated. Use LLM vendor `system_messages` instead. |
+| `with_greeting(text)` | `str` | Deprecated. Use LLM/MLLM vendor `greeting_message` instead. |
 | `with_name(name)` | `str` | Override the agent name |
-| `with_turn_detection(config)` | `TurnDetectionConfig` | Override cascading-flow SOS/EOS detection; use `with_interruption()` for interruption behavior |
+| `with_turn_detection(config)` | `TurnDetectionConfig` | Configure `turn_detection.language` and cascading-flow SOS/EOS detection; use `with_interruption()` for interruption behavior |
 | `with_sal(config)` | `SalConfig` | Set SAL configuration |
 | `with_advanced_features(features)` | `Dict[str, Any]` | Set advanced features |
 | `with_parameters(parameters)` | `SessionParams` | Set session parameters |
-| `with_failure_message(message)` | `str` | Set failure message |
-| `with_max_history(max_history)` | `int` | Set max history length |
+| `with_failure_message(message)` | `str` | Deprecated. Use LLM/MLLM vendor `failure_message` instead. |
+| `with_max_history(max_history)` | `int` | Deprecated. Use LLM vendor `max_history` instead. |
 | `with_geofence(geofence)` | `GeofenceConfig` | Set geofence configuration |
 | `with_labels(labels)` | `Dict[str, str]` | Set custom labels |
 | `with_rtc(rtc)` | `RtcConfig` | Set RTC configuration |
@@ -79,9 +83,14 @@ from agora_agent import Agent
 from agora_agent import OpenAI, ElevenLabsTTS, DeepgramSTT

 agent = (
-    Agent(name='my-agent', instructions='You are a helpful assistant.')
-    .with_llm(OpenAI(api_key='your-openai-key', model='gpt-4o-mini'))
-    .with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id'))
+    Agent(name='my-agent')
+    .with_llm(OpenAI(
+        api_key='your-openai-key',
+        base_url='https://api.openai.com/v1/chat/completions',
+        model='gpt-4o-mini',
+        system_messages=[{'role': 'system', 'content': 'You are a helpful assistant.'}],
+    ))
+    .with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id', base_url='wss://api.elevenlabs.io/v1'))
     .with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
 )
 ```
@@ -97,9 +106,14 @@ from agora_agent import Agent, Agora, Area, OpenAI, ElevenLabsTTS, DeepgramSTT
 client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')

 base = (
-    Agent(instructions='You are a helpful assistant.')
-    .with_llm(OpenAI(api_key='your-openai-key', model='gpt-4o-mini'))
-    .with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id'))
+    Agent()
+    .with_llm(OpenAI(
+        api_key='your-openai-key',
+        base_url='https://api.openai.com/v1/chat/completions',
+        model='gpt-4o-mini',
+        system_messages=[{'role': 'system', 'content': 'You are a helpful assistant.'}],
+    ))
+    .with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id', base_url='wss://api.elevenlabs.io/v1'))
     .with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
 )

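The `docs/concepts/agent.md` changes above deprecate four Agent-level fields in favor of LLM vendor fields. For callers with existing configuration, the mapping can be sketched as a small translation step. The helper `migrate_agent_kwargs` is hypothetical and not part of the SDK; only the field names on both sides come from the tables in this diff.

```python
from typing import Any, Dict, Tuple

# Deprecated Agent-level keyword -> replacement keyword on the LLM vendor,
# following the parameter table in this diff.
DEPRECATED_TO_LLM = {
    "greeting": "greeting_message",
    "failure_message": "failure_message",
    "max_history": "max_history",
}


def migrate_agent_kwargs(
    agent_kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split old-style Agent kwargs into (agent_kwargs, llm_kwargs)."""
    remaining = dict(agent_kwargs)  # leave the caller's dict untouched
    llm_kwargs: Dict[str, Any] = {}
    if "instructions" in remaining:
        # The system prompt becomes a one-element system_messages list.
        llm_kwargs["system_messages"] = [
            {"role": "system", "content": remaining.pop("instructions")}
        ]
    for old_name, new_name in DEPRECATED_TO_LLM.items():
        if old_name in remaining:
            llm_kwargs[new_name] = remaining.pop(old_name)
    return remaining, llm_kwargs


agent_kwargs, llm_kwargs = migrate_agent_kwargs(
    {
        "name": "support-assistant",
        "instructions": "You are a helpful voice assistant.",
        "greeting": "Hello! How can I help you?",
        "max_history": 20,
    }
)
```

Assuming the constructors accept these keywords as the updated examples suggest, the two dictionaries would then feed `Agent(**agent_kwargs)` and the LLM vendor (for example `OpenAI(model=..., **llm_kwargs)`) respectively.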
docs/concepts/session.md

Lines changed: 8 additions & 3 deletions
@@ -40,9 +40,14 @@ from agora_agent import Agent, Agora, Area, OpenAI, ElevenLabsTTS, DeepgramSTT
 client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')

 agent = (
-    Agent(name='my-agent', instructions='You are helpful.')
-    .with_llm(OpenAI(api_key='your-openai-key', model='gpt-4o-mini'))
-    .with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id'))
+    Agent(name='my-agent')
+    .with_llm(OpenAI(
+        api_key='your-openai-key',
+        base_url='https://api.openai.com/v1/chat/completions',
+        model='gpt-4o-mini',
+        system_messages=[{'role': 'system', 'content': 'You are helpful.'}],
+    ))
+    .with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id', base_url='wss://api.elevenlabs.io/v1'))
     .with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
 )

docs/concepts/vendors.md

Lines changed: 25 additions & 22 deletions
@@ -21,21 +21,21 @@ Used with `agent.with_llm()` for the cascading flow (ASR → LLM → TTS).

 | Class | Provider | Required Parameters |
 |---|---|---|
-| `OpenAI` | OpenAI | `api_key` |
-| `AzureOpenAI` | Azure OpenAI | `api_key`, `endpoint`, `deployment_name` |
-| `Anthropic` | Anthropic | `api_key` |
-| `Gemini` | Google Gemini | `api_key` |
-| `Groq` | Groq | `api_key` |
-| `VertexAILLM` | Google Vertex AI | `api_key`, `project_id`, `location` |
-| `AmazonBedrock` | Amazon Bedrock | `api_key`, `url`, `model` |
-| `Dify` | Dify | `api_key`, `url` |
+| `OpenAI` | OpenAI | `model` for Agora-managed models; `api_key`, `base_url`, `model` for BYOK |
+| `AzureOpenAI` | Azure OpenAI | `api_key`, `model`, `endpoint`, `deployment_name` |
+| `Anthropic` | Anthropic | `api_key`, `model`, `url`, `headers`, `max_tokens` |
+| `Gemini` | Google Gemini | `api_key`, `model` |
+| `Groq` | Groq | `api_key`, `model`, `base_url` |
+| `VertexAILLM` | Google Vertex AI | `api_key`, `model`, `project_id`, `location` |
+| `AmazonBedrock` | Amazon Bedrock | `access_key`, `secret_key`, `region`, `model` |
+| `Dify` | Dify | `api_key`, `url`, `model` |
 | `CustomLLM` | OpenAI-compatible LLM | `api_key`, `base_url`, `model` |

 <!-- snippet: executable -->
 ```python
 from agora_agent import OpenAI

-llm = OpenAI(api_key='your-openai-key', model='gpt-4o-mini')
+llm = OpenAI(api_key='your-openai-key', base_url='https://api.openai.com/v1/chat/completions', model='gpt-4o-mini')
 ```

 ## TTS Vendors
@@ -44,17 +44,17 @@ Used with `agent.with_tts()`. Each TTS vendor produces audio at a specific sampl

 | Class | Provider | Required Parameters | Sample Rate |
 |---|---|---|---|
-| `ElevenLabsTTS` | ElevenLabs | `key`, `model_id`, `voice_id` | 16000, 22050, 24000, or 44100 Hz |
+| `ElevenLabsTTS` | ElevenLabs | `key`, `model_id`, `voice_id`, `base_url` | 16000, 22050, 24000, or 44100 Hz |
 | `MicrosoftTTS` | Microsoft Azure | `key`, `region`, `voice_name` | 8000, 16000, 24000, or 48000 Hz |
-| `OpenAITTS` | OpenAI | `key`, `voice` | 24000 Hz (fixed) |
-| `CartesiaTTS` | Cartesia | `key`, `voice_id` | 8000–48000 Hz |
+| `OpenAITTS` | OpenAI | `voice` for Agora-managed `tts-1`; `api_key`, `model`, `base_url`, `voice` for BYOK | 24000 Hz (fixed) |
+| `CartesiaTTS` | Cartesia | `api_key`, `voice_id`, `model_id` | 8000–48000 Hz |
 | `GoogleTTS` | Google Cloud | `key`, `voice_name` ||
-| `AmazonTTS` | Amazon Polly | `access_key`, `secret_key`, `region`, `voice_id` ||
-| `HumeAITTS` | Hume AI | `key` ||
-| `RimeTTS` | Rime | `key`, `speaker` ||
-| `FishAudioTTS` | Fish Audio | `key`, `reference_id` ||
+| `AmazonTTS` | Amazon Polly | `access_key`, `secret_key`, `region`, `voice_id`, `engine` ||
+| `HumeAITTS` | Hume AI | `key`, `voice_id`, `provider` ||
+| `RimeTTS` | Rime | `key`, `speaker`, `model_id` ||
+| `FishAudioTTS` | Fish Audio | `key`, `reference_id`, `backend` ||
 | `GroqTTS` | Groq | `key` ||
-| `MiniMaxTTS` | MiniMax | `key` ||
+| `MiniMaxTTS` | MiniMax | `model` for supported Agora-managed models; `key`, `group_id`, `model`, `voice_id`, `url` for BYOK ||
 | `DeepgramTTS` | Deepgram | `api_key`, `model` | Configurable |
 | `SarvamTTS` | Sarvam | `api_key` ||

@@ -66,6 +66,7 @@ tts = ElevenLabsTTS(
     key='your-elevenlabs-key',
     model_id='eleven_flash_v2_5',
     voice_id='your-voice-id',
+    base_url='wss://api.elevenlabs.io/v1',
     sample_rate=24000,
 )
 ```
@@ -74,15 +75,17 @@ tts = ElevenLabsTTS(

 Used with `agent.with_stt()`.

+Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format.
+
 | Class | Provider | Required Parameters |
 |---|---|---|
 | `SpeechmaticsSTT` | Speechmatics | `api_key`, `language` |
-| `DeepgramSTT` | Deepgram | — (all optional) |
-| `MicrosoftSTT` | Microsoft Azure | `key`, `region` |
+| `DeepgramSTT` | Deepgram | `model` for Agora-managed `nova-2`/`nova-3`; `api_key` for BYOK |
+| `MicrosoftSTT` | Microsoft Azure | `key`, `region`, `language` |
 | `OpenAISTT` | OpenAI | `api_key` |
-| `GoogleSTT` | Google Cloud | `api_key` |
-| `AmazonSTT` | Amazon Transcribe | `access_key`, `secret_key`, `region` |
-| `AssemblyAISTT` | AssemblyAI | `api_key` |
+| `GoogleSTT` | Google Cloud | `project_id`, `location`, `adc_credentials_string`, `language` |
+| `AmazonSTT` | Amazon Transcribe | `access_key`, `secret_key`, `region`, `language` |
+| `AssemblyAISTT` | AssemblyAI | `api_key`, `language` |
 | `AresSTT` | Ares | — (all optional) |
 | `SarvamSTT` | Sarvam | `api_key`, `language` |

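Several rows in the vendor tables above list different required parameters for Agora-managed models and for BYOK, matching the changelog note that required provider fields only apply when credentials are caller-supplied. A rough sketch of such a check is shown here. The function and the `REQUIRED_FIELDS` table are hypothetical, and treating a supplied `api_key` as the BYOK signal is an assumption, not confirmed SDK behavior; the field lists for the two vendors are copied from the tables.

```python
from typing import Any, Dict, List

# Required fields per mode, copied from the OpenAI and DeepgramSTT table rows.
REQUIRED_FIELDS: Dict[str, Dict[str, List[str]]] = {
    "OpenAI": {"managed": ["model"], "byok": ["api_key", "base_url", "model"]},
    "DeepgramSTT": {"managed": ["model"], "byok": ["api_key"]},
}


def missing_required_fields(vendor: str, config: Dict[str, Any]) -> List[str]:
    """Hypothetical validator: list required fields absent from config."""
    # Assumption: a caller-supplied api_key is what marks a config as BYOK.
    mode = "byok" if config.get("api_key") else "managed"
    return [name for name in REQUIRED_FIELDS[vendor][mode] if not config.get(name)]


managed_ok = missing_required_fields("OpenAI", {"model": "gpt-4o-mini"})
byok_incomplete = missing_required_fields(
    "OpenAI", {"api_key": "your-openai-key", "model": "gpt-4o-mini"}
)
```

Under these assumptions the managed configuration passes with only `model`, while the BYOK configuration is reported as missing `base_url`.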