Commit 583eccc

Move AgentKit language to turn detection
Move the Agora interaction language setting from the unlaunched top-level AgentKit API into turn_detection.language across TypeScript, Python, and Go. Remove the top-level interaction language helpers and STT vendor override fields, keep provider-specific STT language under asr.params, and default turn_detection.language to en-US when omitted. Update tests, READMEs, docs, and changelogs to reflect the final v2.1.0 API surface.
1 parent 299e4bd commit 583eccc

11 files changed

Lines changed: 96 additions & 111 deletions
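As a rough illustration of the layout the commit message describes, the sketch below builds the language-related slice of a start request from plain dicts. The helper name `build_language_fields` and the `"deepgram"` vendor value are hypothetical and exist only for this example; the point is that the Agora interaction language sits under `turn_detection.language`, while the provider's own language code stays under `asr.params`.

```python
from typing import Any, Dict, Optional

DEFAULT_LANGUAGE = "en-US"  # default named in the commit message


def build_language_fields(
    provider_language: str,
    interaction_language: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: shows where each language value lands."""
    return {
        # Provider-specific STT language stays in the vendor's own format.
        "asr": {"vendor": "deepgram", "params": {"language": provider_language}},
        # Agora interaction language lives here; en-US is used when omitted.
        "turn_detection": {"language": interaction_language or DEFAULT_LANGUAGE},
    }


payload = build_language_fields("en")
print(payload["turn_detection"]["language"])
print(payload["asr"]["params"]["language"])
```

Note that there is no top-level `asr.language` key in this shape, which matches the removal described above.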

README.md

Lines changed: 3 additions & 3 deletions
@@ -20,7 +20,7 @@ pip install agora-agents
 ## Quick Start
 
 Start with the `Agent` builder: create a client with app credentials, choose your ASR, LLM, and TTS providers, then start a session. Omit vendor API keys for supported Agora-managed models, or provide keys when you want BYOK.
-Use `with_interaction_language()` for Agora `asr.language`; provider-specific STT language values remain under `asr.params`.
+Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`.
 
 ```python
 import os
@@ -54,7 +54,7 @@ def start_conversation() -> str:
         app_certificate=app_certificate,
     )
 
-    agent = Agent(name=f"conversation-{int(time.time())}").with_interaction_language("en-US").with_stt(
+    agent = Agent(name=f"conversation-{int(time.time())}", turn_detection={"language": "en-US"}).with_stt(
         DeepgramSTT(
             model="nova-3",
             language="en",
@@ -101,7 +101,7 @@ def start_conversation() -> str:
 Use the same `Agent` builder shape, but provide credentials explicitly when you want vendor-managed billing and routing instead of Agora-managed models.
 
 ```python
-agent = Agent().with_interaction_language("en-US").with_stt(
+agent = Agent(turn_detection={"language": "en-US"}).with_stt(
     DeepgramSTT(
         api_key=os.environ["DEEPGRAM_API_KEY"],
         model="nova-3",

changelog.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
 
 ### Added
 
-- **ASR interaction language** — AgentKit now manages Agora `asr.language` through `interaction_language` / `Agent.with_interaction_language()`, validates it against the supported BCP-47 interaction language list, and sends the default `en-US` when no language is provided.
+- **Turn detection language** — AgentKit now manages Agora interaction language through `turn_detection.language`, validates it against the supported BCP-47 language list, and sends the default `en-US` when no language is provided.
 - **Provider parameter parity** — ASR, LLM, MLLM, TTS, and avatar wrappers expose typed provider parameters plus passthrough fields where the generated core supports additional properties.
 
 ### Changed
@@ -21,7 +21,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
 ### Fixed
 
 - **Managed-provider validation** — AgentKit validation now distinguishes preset-backed providers from BYOK providers so required provider fields are only required when credentials are caller-supplied.
-- **ASR language separation** — Provider-specific STT language values remain under `asr.params`, while Agora interaction language is emitted separately as `asr.language`.
+- **Language placement** — Provider-specific STT language values remain under `asr.params`, while Agora interaction language is emitted separately as `turn_detection.language`.
 
 ## [v2.0.0] — 2026-05-21

docs/concepts/agent.md

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ Each `with_*` method returns a **new** `Agent` instance — the original is unch
 | `with_instructions(text)` | `str` | Deprecated. Use LLM vendor `system_messages` instead. |
 | `with_greeting(text)` | `str` | Deprecated. Use LLM/MLLM vendor `greeting_message` instead. |
 | `with_name(name)` | `str` | Override the agent name |
-| `with_turn_detection(config)` | `TurnDetectionConfig` | Override cascading-flow SOS/EOS detection; use `with_interruption()` for interruption behavior |
+| `with_turn_detection(config)` | `TurnDetectionConfig` | Configure `turn_detection.language` and cascading-flow SOS/EOS detection; use `with_interruption()` for interruption behavior |
 | `with_sal(config)` | `SalConfig` | Set SAL configuration |
 | `with_advanced_features(features)` | `Dict[str, Any]` | Set advanced features |
 | `with_parameters(parameters)` | `SessionParams` | Set session parameters |

docs/concepts/vendors.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ tts = ElevenLabsTTS(
 
 Used with `agent.with_stt()`.
 
-Use `agent.with_interaction_language()` for Agora `asr.language`; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format.
+Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format.
 
 | Class | Provider | Required Parameters |
 |---|---|---|

docs/reference/agent.md

Lines changed: 3 additions & 3 deletions
@@ -34,7 +34,7 @@ Agent(
 |---|---|---|---|
 | `name` | `Optional[str]` | `None` | Agent name, used as default session name |
 | `instructions` | `Optional[str]` | `None` | Deprecated. Use LLM vendor `system_messages` instead. |
-| `turn_detection` | `Optional[TurnDetectionConfig]` | `None` | Turn detection configuration |
+| `turn_detection` | `Optional[TurnDetectionConfig]` | `None` | Interaction language and turn detection configuration |
 | `interruption` | `Optional[InterruptionConfig]` | `None` | Unified interruption control configuration |
 | `sal` | `Optional[SalConfig]` | `None` | Speech Activity Level configuration |
 | `advanced_features` | `Optional[Dict[str, Any]]` | `None` | Advanced features dict (e.g., `{'enable_rtm': True}`) |
@@ -109,7 +109,7 @@ agent = agent.with_avatar(HeyGenAvatar(api_key='your-key', quality='medium', ago
 
 ### `with_turn_detection(config: TurnDetectionConfig) -> Agent`
 
-Override cascading-flow turn detection settings. Use `config.start_of_speech` and `config.end_of_speech` for SOS/EOS detection. Use `with_interruption()` for interruption behavior and MLLM vendor `turn_detection` for MLLM turn detection.
+Override cascading-flow turn detection settings. Use `language` for the Agora interaction language, `config.start_of_speech` and `config.end_of_speech` for SOS/EOS detection, `with_interruption()` for interruption behavior, and MLLM vendor `turn_detection` for MLLM turn detection.
 
 Pause-state detection is configured under semantic end-of-speech:
 
@@ -257,7 +257,7 @@ to_properties(
 | `stt` | `Optional[Dict[str, Any]]` | STT config dict |
 | `mllm` | `Optional[Dict[str, Any]]` | MLLM config dict |
 | `avatar` | `Optional[Dict[str, Any]]` | Avatar config dict |
-| `turn_detection` | `Optional[TurnDetectionConfig]` | Turn detection settings |
+| `turn_detection` | `Optional[TurnDetectionConfig]` | Interaction language and turn detection settings |
 | `sal` | `Optional[SalConfig]` | SAL configuration |
 | `advanced_features` | `Optional[Dict[str, Any]]` | Advanced features |
 | `parameters` | `Optional[SessionParams]` | Session parameters |

docs/reference/vendors.md

Lines changed: 1 addition & 8 deletions
@@ -318,15 +318,14 @@ The SDK also includes named helpers for the remaining Agora-supported LLM provid
 
 ## STT Vendors
 
-Use `agent.with_interaction_language()` for Agora `asr.language`; it defaults to `en-US`. Provider-specific language values remain under `asr.params` and may use a different format.
+Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. Provider-specific language values remain under `asr.params` and may use a different format.
 
 ### `SpeechmaticsSTT`
 
 | Parameter | Type | Required | Default | Description |
 |---|---|---|---|---|
 | `api_key` | `str` | Yes || Speechmatics API key |
 | `language` | `str` | Yes || Language code (e.g., `en`) |
-| `interaction_language` | `str` | No | `None` | Agora `asr.language` override |
 | `uri` | `str` | No | `None` | Speechmatics streaming WebSocket URL |
 | `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
 
@@ -337,7 +336,6 @@ Use `agent.with_interaction_language()` for Agora `asr.language`; it defaults to
 | `api_key` | `str` | BYOK only | `None` | Deepgram API key. Optional only for Agora-managed `nova-2` and `nova-3`. |
 | `model` | `str` | No | `None` | Model (e.g., `nova-2`) |
 | `language` | `str` | No | `None` | Language code (e.g., `en-US`) |
-| `interaction_language` | `str` | No | `None` | Agora `asr.language` override |
 | `smart_format` | `bool` | No | `None` | Enable smart formatting |
 | `punctuation` | `bool` | No | `None` | Enable punctuation |
 | `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
@@ -351,7 +349,6 @@ For `nova-2` and `nova-3`, omit `api_key` to use Agora-managed credentials. For
 | `key` | `str` | Yes || Azure subscription key |
 | `region` | `str` | Yes || Azure region (e.g., `eastus`) |
 | `language` | `str` | Yes || Language code (e.g., `en-US`) |
-| `interaction_language` | `str` | No | `None` | Agora `asr.language` override |
 | `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
 
 ### `OpenAISTT`
@@ -363,7 +360,6 @@ For `nova-2` and `nova-3`, omit `api_key` to use Agora-managed credentials. For
 | `language` | `str` | No | `None` | Language code |
 | `prompt` | `str` | No | `None` | Prompt for OpenAI transcription |
 | `input_audio_transcription` | `Dict[str, Any]` | No | `None` | OpenAI transcription settings |
-| `interaction_language` | `str` | No | `None` | Agora `asr.language` override |
 | `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
 
 ### `GoogleSTT`
@@ -374,7 +370,6 @@ For `nova-2` and `nova-3`, omit `api_key` to use Agora-managed credentials. For
 | `location` | `str` | Yes || Google Cloud region |
 | `adc_credentials_string` | `str` | Yes || Google service account credentials JSON string |
 | `language` | `str` | Yes || Language code (e.g., `en-US`) |
-| `interaction_language` | `str` | No | `None` | Agora `asr.language` override |
 | `model` | `str` | No | `None` | Recognition model |
 | `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
 
@@ -386,7 +381,6 @@ For `nova-2` and `nova-3`, omit `api_key` to use Agora-managed credentials. For
 | `secret_key` | `str` | Yes || AWS Secret Access Key |
 | `region` | `str` | Yes || AWS region (e.g., `us-east-1`) |
 | `language` | `str` | Yes || Amazon `language_code` |
-| `interaction_language` | `str` | No | `None` | Agora `asr.language` override |
 | `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
 
 ### `AssemblyAISTT`
@@ -395,7 +389,6 @@ For `nova-2` and `nova-3`, omit `api_key` to use Agora-managed credentials. For
 |---|---|---|---|---|
 | `api_key` | `str` | Yes || AssemblyAI API key |
 | `language` | `str` | Yes || Language code |
-| `interaction_language` | `str` | No | `None` | Agora `asr.language` override |
 | `uri` | `str` | No | `None` | AssemblyAI streaming WebSocket URL |
 | `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |

src/agora_agent/agentkit/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
     AgentConfig,
     AgentConfigUpdate,
     AsrConfig,
-    InteractionLanguage,
+    TurnDetectionLanguage,
     ConversationHistory,
     ConversationRole,
     ConversationSessionTurn,
@@ -205,7 +205,7 @@
     "LlmStyle",
     "SttConfig",
     "AsrConfig",
-    "InteractionLanguage",
+    "TurnDetectionLanguage",
     "SttVendor",
     "TtsConfig",
     "MllmConfig",

src/agora_agent/agentkit/agent.py

Lines changed: 29 additions & 35 deletions
@@ -210,7 +210,7 @@ class SessionOptions(typing_extensions.TypedDict, total=False):
 
 from .token import generate_convo_ai_token, _parse_numeric_uid, _validate_expires_in
 
-InteractionLanguage = typing_extensions.Literal[
+TurnDetectionLanguage = typing_extensions.Literal[
     "ar-EG",
     "ar-JO",
     "ar-SA",
@@ -245,8 +245,8 @@ class SessionOptions(typing_extensions.TypedDict, total=False):
     "vi-VN",
 ]
 
-DEFAULT_INTERACTION_LANGUAGE: InteractionLanguage = "en-US"
-INTERACTION_LANGUAGE_VALUES: typing.Tuple[InteractionLanguage, ...] = (
+DEFAULT_TURN_DETECTION_LANGUAGE: TurnDetectionLanguage = "en-US"
+TURN_DETECTION_LANGUAGE_VALUES: typing.Tuple[TurnDetectionLanguage, ...] = (
     "ar-EG",
     "ar-JO",
     "ar-SA",
@@ -280,7 +280,7 @@ class SessionOptions(typing_extensions.TypedDict, total=False):
     "tr-TR",
     "vi-VN",
 )
-_INTERACTION_LANGUAGES = set(INTERACTION_LANGUAGE_VALUES)
+_TURN_DETECTION_LANGUAGES = set(TURN_DETECTION_LANGUAGE_VALUES)
 
 
 def _dump_optional_model(value: typing.Any) -> typing.Any:
@@ -291,12 +291,12 @@ def _dump_optional_model(value: typing.Any) -> typing.Any:
     return value
 
 
-def _is_interaction_language(value: typing.Any) -> bool:
-    return isinstance(value, str) and value in _INTERACTION_LANGUAGES
+def _is_turn_detection_language(value: typing.Any) -> bool:
+    return isinstance(value, str) and value in _TURN_DETECTION_LANGUAGES
 
 
-def _validate_interaction_language(value: typing.Any) -> InteractionLanguage:
-    if not _is_interaction_language(value):
+def _validate_turn_detection_language(value: typing.Any) -> TurnDetectionLanguage:
+    if not _is_turn_detection_language(value):
         raise ValueError(f"Invalid interaction language: {value}")
     return value  # type: ignore[return-value]
 
@@ -335,7 +335,6 @@ def __init__(
         sal: typing.Optional[SalConfig] = None,
         advanced_features: typing.Optional[AdvancedFeatures] = None,
         parameters: typing.Optional[typing.Union[SessionParams, SessionParamsInput]] = None,
-        interaction_language: typing.Optional[InteractionLanguage] = None,
         greeting: typing.Optional[str] = None,
         failure_message: typing.Optional[str] = None,
         max_history: typing.Optional[int] = None,
@@ -362,11 +361,6 @@ def __init__(
         self._sal = sal
         self._advanced_features = advanced_features
         self._parameters = parameters
-        self._interaction_language = (
-            _validate_interaction_language(interaction_language)
-            if interaction_language is not None
-            else None
-        )
         self._geofence = geofence
         self._labels = labels
         self._rtc = rtc
@@ -400,16 +394,6 @@ def with_stt(self, vendor: BaseSTT) -> "Agent":
         new_agent._stt = vendor.to_config()
         return new_agent
 
-    def with_interaction_language(self, language: InteractionLanguage) -> "Agent":
-        """Returns a new Agent with the Agora interaction language.
-
-        This serializes to ``asr.language``. Vendor-specific language values
-        remain under ``asr.params``, for example ``asr.params.language``.
-        """
-        new_agent = self._clone()
-        new_agent._interaction_language = _validate_interaction_language(language)
-        return new_agent
-
     def with_mllm(self, vendor: BaseMLLM) -> "Agent":
         # Note: avatars are not supported with MLLM. The combination is rejected
         # at ``to_properties`` / ``AgentSession.start`` so callers can still
@@ -705,10 +689,6 @@ def rtc(self) -> typing.Optional[RtcConfig]:
     def filler_words(self) -> typing.Optional[FillerWordsConfig]:
         return self._filler_words
 
-    @property
-    def interaction_language(self) -> typing.Optional[InteractionLanguage]:
-        return self._interaction_language
-
     @property
     def config(self) -> typing.Dict[str, typing.Any]:
         return {
@@ -727,7 +707,6 @@ def config(self) -> typing.Dict[str, typing.Any]:
             "avatar": self._avatar,
             "advanced_features": self._advanced_features,
             "parameters": self._parameters,
-            "interaction_language": self._interaction_language,
             "geofence": self._geofence,
             "labels": self._labels,
             "rtc": self._rtc,
@@ -909,6 +888,7 @@ def to_properties(
             return StartAgentsRequestProperties(**base_kwargs)
 
         base_kwargs["asr"] = self._resolve_asr_config()
+        base_kwargs["turn_detection"] = self._resolve_turn_detection_config()
 
         if skip_vendor_validation:
             return StartAgentsRequestProperties(**base_kwargs)
@@ -940,13 +920,28 @@ def to_properties(
 
     def _resolve_asr_config(self) -> typing.Dict[str, typing.Any]:
         asr_config = dict(self._stt or {})
-        existing_language = asr_config.get("language")
-        language = self._interaction_language
-        if language is None:
-            language = existing_language if _is_interaction_language(existing_language) else DEFAULT_INTERACTION_LANGUAGE
-        asr_config["language"] = language
+        asr_config.pop("language", None)
+        if not asr_config:
+            asr_config["vendor"] = "ares"
         return asr_config
 
+    def _resolve_turn_detection_config(self) -> TurnDetectionConfig:
+        existing_stt_language = self._stt.get("language") if self._stt is not None else None
+        existing_turn_detection_language = self._field_value(self._turn_detection, "language")
+        language = (
+            existing_turn_detection_language
+            if existing_turn_detection_language is not None
+            else existing_stt_language
+            if _is_turn_detection_language(existing_stt_language)
+            else DEFAULT_TURN_DETECTION_LANGUAGE
+        )
+        language = _validate_turn_detection_language(language)
+        if self._turn_detection is None:
+            return StartAgentsRequestPropertiesTurnDetection(language=language)
+        if isinstance(self._turn_detection, dict):
+            return typing.cast(TurnDetectionConfig, {**self._turn_detection, "language": language})
+        return self._copy_model_update(self._turn_detection, {"language": language})
+
     def _clone(self) -> "Agent":
         new_agent = Agent.__new__(Agent)
         new_agent._name = self._name
@@ -962,7 +957,6 @@ def _clone(self) -> "Agent":
         new_agent._sal = self._sal
         new_agent._advanced_features = self._advanced_features
         new_agent._parameters = self._parameters
-        new_agent._interaction_language = self._interaction_language
         new_agent._instructions = self._instructions
         new_agent._greeting = self._greeting
         new_agent._failure_message = self._failure_message
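The precedence in `_resolve_turn_detection_config` can be hard to read from the chained conditional, so here is a standalone sketch of the same ordering using plain dicts. The function name `resolve_language` and the `SUPPORTED` set are assumptions for illustration; `SUPPORTED` holds only the few codes visible in this diff plus the `en-US` default, not the full list in `agent.py`.

```python
from typing import Any, Dict, Optional

DEFAULT_LANGUAGE = "en-US"
# Illustrative subset only; the real tuple in agent.py is longer.
SUPPORTED = {"ar-EG", "ar-JO", "ar-SA", "en-US", "tr-TR", "vi-VN"}


def resolve_language(
    turn_detection: Optional[Dict[str, Any]],
    stt: Optional[Dict[str, Any]],
) -> str:
    """Hypothetical stand-in for the resolution order shown in the diff."""
    explicit = (turn_detection or {}).get("language")
    stt_language = (stt or {}).get("language")

    if explicit is not None:
        # 1. An explicit turn_detection.language always wins.
        candidate = explicit
    elif isinstance(stt_language, str) and stt_language in SUPPORTED:
        # 2. Otherwise reuse the STT language, but only if it is a supported code.
        candidate = stt_language
    else:
        # 3. Otherwise fall back to the default.
        candidate = DEFAULT_LANGUAGE

    # Validation runs on whichever value was chosen, so a bad explicit value raises.
    if not (isinstance(candidate, str) and candidate in SUPPORTED):
        raise ValueError(f"Invalid interaction language: {candidate}")
    return candidate
```

Under these assumptions, a provider-style code such as `"en"` on the STT config does not raise; it is simply ignored in favor of `en-US`. Only an unsupported value passed explicitly under `turn_detection` is rejected.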
