
Commit fdfa4ac

fix(agentkit): source ASR language from turn detection
Keep provider STT language settings inside `asr.params`, populate REST `asr.language` from `turn_detection.language`, and treat Ares as provider-only.
1 parent 1ca34d3 commit fdfa4ac

7 files changed

Lines changed: 47 additions & 127 deletions


README.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ pip install agora-agents
 ## Quick Start
 
 Start with the `Agent` builder: create a client with app credentials, choose your ASR, LLM, and TTS providers, then start a session. Omit vendor API keys for supported Agora-managed models, or provide keys when you want BYOK.
-Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`.
+Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`. Ares uses only the REST `asr.language` value sourced from `turn_detection.language`.
 
 ```python
 import os

changelog.md

Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
 
 ### Added
 
-- **Turn detection language** — AgentKit now manages Agora interaction language through `turn_detection.language`, validates it against the supported BCP-47 language list, and sends the default `en` when no language is provided.
+- **Turn detection language** — AgentKit now manages Agora interaction language through `turn_detection.language`, validates it against the supported BCP-47 language list, and sends the default `en-US` when no language is provided.
 - **Provider parameter parity** — ASR, LLM, MLLM, TTS, and avatar wrappers expose typed provider parameters plus passthrough fields where the generated core supports additional properties.
 
 ### Changed
@@ -21,7 +21,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
 ### Fixed
 
 - **Managed-provider validation** — AgentKit validation now distinguishes preset-backed providers from BYOK providers so required provider fields are only required when credentials are caller-supplied.
-- **Language placement** — Provider-specific STT language values remain under `asr.params`, while Agora interaction language is emitted separately as `turn_detection.language`.
+- **Language placement** — Provider-specific STT language values remain under `asr.params`; the REST `asr.language` field is populated from `turn_detection.language`.
 
 ## [v2.0.0] — 2026-05-21
 
@@ -114,7 +114,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
 
 ### Fixed
 
-- **`AresSTT`** — Removed redundant `language` key from the `params` dict. Language is now emitted only at the top level. `params` is only included when `additional_params` is provided.
+- **`AresSTT`** — Removed redundant `language` key from the `params` dict. Ares only selects the provider; AgentKit populates REST `asr.language` from `turn_detection.language`. `params` is only included when `additional_params` is provided.
 - **`OpenAIRealtime` / `VertexAI` (MLLM)** — Agent-level `greeting` and `failure_message` defaults are now correctly applied when missing in MLLM mode. Previously these values were silently dropped.
 - **`VertexAI` (MLLM)**`messages` is emitted at the MLLM top level, matching the generated core SDK contract.
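The validate-then-default behavior described in the "Turn detection language" changelog entry can be sketched roughly as follows. This is an illustration, not the SDK's code: the function name `resolve_turn_detection_language`, the trimmed `SUPPORTED_LANGUAGES` tuple, and the use of `ValueError` are assumptions. The commit only shows that invalid values are rejected, not how.

```python
from typing import Optional

# Assumption: a small subset of the values listed in this commit;
# the full tuple is TURN_DETECTION_LANGUAGE_VALUES in agent.py.
SUPPORTED_LANGUAGES = ("en-US", "en-IN", "fr-FR", "de-DE", "ja-JP", "zh-CN")
DEFAULT_LANGUAGE = "en-US"  # default introduced by this commit


def resolve_turn_detection_language(language: Optional[str] = None) -> str:
    """Hypothetical helper: return a supported language or the default."""
    if language is None:
        return DEFAULT_LANGUAGE
    if language not in SUPPORTED_LANGUAGES:
        # Assumption: the real SDK may raise a different error type.
        raise ValueError(f"unsupported turn_detection.language: {language!r}")
    return language


default_language = resolve_turn_detection_language()
french = resolve_turn_detection_language("fr-FR")
```

A bare `en` does not appear among the supported values shown in this commit, which is consistent with the default moving from `en` to `en-US`.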

docs/concepts/vendors.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ tts = ElevenLabsTTS(
 
 Used with `agent.with_stt()`.
 
-Use `turn_detection.language` for Agora interaction language; it defaults to `en`. STT vendor `language` options are serialized under `asr.params` using each provider's own format.
+Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format. Ares does not take a provider language option; AgentKit uses `turn_detection.language` for REST `asr.language`.
 
 | Class | Provider | Required Parameters |
 |---|---|---|

docs/reference/vendors.md

Lines changed: 1 addition & 2 deletions
@@ -318,7 +318,7 @@ The SDK also includes named helpers for the remaining Agora-supported LLM provid
 
 ## STT Vendors
 
-Use `turn_detection.language` for Agora interaction language; it defaults to `en`. Provider-specific language values remain under `asr.params` and may use a different format.
+Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. Provider-specific language values remain under `asr.params` and may use a different format. AgentKit populates REST `asr.language` from `turn_detection.language`.
 
 ### `SpeechmaticsSTT`
 
@@ -396,7 +396,6 @@ For `nova-2` and `nova-3`, omit `api_key` to use Agora-managed credentials. For
 
 | Parameter | Type | Required | Default | Description |
 |---|---|---|---|---|
-| `language` | `str` | No | `None` | Language code |
 | `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
 
 ### `SarvamSTT`

src/agora_agent/agentkit/agent.py

Lines changed: 6 additions & 5 deletions
@@ -247,7 +247,7 @@ class SessionOptions(typing_extensions.TypedDict, total=False):
     "vi-VN",
 ]
 
-DEFAULT_TURN_DETECTION_LANGUAGE: TurnDetectionLanguage = "en"
+DEFAULT_TURN_DETECTION_LANGUAGE: TurnDetectionLanguage = "en-US"
 TURN_DETECTION_LANGUAGE_VALUES: typing.Tuple[TurnDetectionLanguage, ...] = (
     "ar-EG",
     "ar-JO",
@@ -921,9 +921,10 @@ def to_properties(
         allow_missing_llm = "llm" in allow_missing_categories
         allow_missing_tts = "tts" in allow_missing_categories
 
+        turn_detection_config = self._resolve_turn_detection_config()
         if not skip_asr_validation and (self._stt is not None or not allow_missing_asr):
-            base_kwargs["asr"] = self._resolve_asr_config()
-        base_kwargs["turn_detection"] = self._resolve_turn_detection_config()
+            base_kwargs["asr"] = self._resolve_asr_config(turn_detection_config)
+        base_kwargs["turn_detection"] = turn_detection_config
 
         if skip_vendor_validation:
             return StartAgentsRequestProperties(**base_kwargs)
@@ -957,11 +958,11 @@ def _resolve_llm_config(self) -> typing.Dict[str, typing.Any]:
             llm_config["max_history"] = self._max_history
         return llm_config
 
-    def _resolve_asr_config(self) -> typing.Dict[str, typing.Any]:
+    def _resolve_asr_config(self, turn_detection_config: TurnDetectionConfig) -> typing.Dict[str, typing.Any]:
         asr_config = dict(self._stt or {})
-        asr_config.pop("language", None)
         if not asr_config:
             asr_config["vendor"] = "ares"
+        asr_config["language"] = self._field_value(turn_detection_config, "language")
         return asr_config
 
     def _resolve_turn_detection_config(self) -> TurnDetectionConfig:
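
A minimal standalone sketch of the resolution step in the `_resolve_asr_config` change above, assuming plain dictionaries in place of the SDK's config types. The function name `resolve_asr_config` and the direct key lookup are illustrative stand-ins (the real method reads the value through `self._field_value`); only the order of the steps is taken from the diff.

```python
from typing import Any, Dict, Optional


def resolve_asr_config(
    stt: Optional[Dict[str, Any]],
    turn_detection: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical stand-in for the method changed in this commit."""
    asr = dict(stt or {})  # shallow copy so the caller's config is not mutated
    if not asr:
        asr["vendor"] = "ares"  # no STT configured: fall back to Ares
    # The top-level language always comes from turn detection,
    # regardless of any provider language nested under "params".
    asr["language"] = turn_detection["language"]
    return asr


speechmatics = {"vendor": "speechmatics", "params": {"api_key": "stt-key", "language": "en"}}
with_provider = resolve_asr_config(speechmatics, {"language": "fr-FR"})
without_stt = resolve_asr_config(None, {"language": "en-US"})
```

Here `with_provider` keeps `params.language` as `en` while its top-level `language` is `fr-FR`, which mirrors the assertions in `tests/custom/test_stt_language.py`.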

src/agora_agent/agentkit/vendors/stt.py

Lines changed: 1 addition & 105 deletions
@@ -1,89 +1,12 @@
-from typing import Any, Dict, Optional, Tuple
+from typing import Any, Dict, Optional
 
 from pydantic import BaseModel, ConfigDict, Field, model_validator
-from typing_extensions import Literal
 
 from .base import BaseSTT
 
-TurnDetectionLanguage = Literal[
-    "ar-EG",
-    "ar-JO",
-    "ar-SA",
-    "ar-AE",
-    "bn-IN",
-    "zh-CN",
-    "zh-HK",
-    "zh-TW",
-    "nl-NL",
-    "en-IN",
-    "en-US",
-    "fil-PH",
-    "fr-FR",
-    "de-DE",
-    "gu-IN",
-    "he-IL",
-    "hi-IN",
-    "id-ID",
-    "it-IT",
-    "ja-JP",
-    "kn-IN",
-    "ko-KR",
-    "ms-MY",
-    "fa-IR",
-    "pt-PT",
-    "ru-RU",
-    "es-ES",
-    "ta-IN",
-    "te-IN",
-    "th-TH",
-    "tr-TR",
-    "vi-VN",
-]
-
-TURN_DETECTION_LANGUAGE_VALUES: Tuple[TurnDetectionLanguage, ...] = (
-    "ar-EG",
-    "ar-JO",
-    "ar-SA",
-    "ar-AE",
-    "bn-IN",
-    "zh-CN",
-    "zh-HK",
-    "zh-TW",
-    "nl-NL",
-    "en-IN",
-    "en-US",
-    "fil-PH",
-    "fr-FR",
-    "de-DE",
-    "gu-IN",
-    "he-IL",
-    "hi-IN",
-    "id-ID",
-    "it-IT",
-    "ja-JP",
-    "kn-IN",
-    "ko-KR",
-    "ms-MY",
-    "fa-IR",
-    "pt-PT",
-    "ru-RU",
-    "es-ES",
-    "ta-IN",
-    "te-IN",
-    "th-TH",
-    "tr-TR",
-    "vi-VN",
-)
-_TURN_DETECTION_LANGUAGES = set(TURN_DETECTION_LANGUAGE_VALUES)
 _DEEPGRAM_MANAGED_MODELS = {"nova-2", "nova-3"}
 
 
-def _turn_detection_language(language: Optional[str]) -> Optional[TurnDetectionLanguage]:
-    if language in _TURN_DETECTION_LANGUAGES:
-        return language # type: ignore[return-value]
-    return None
-
-
 class SpeechmaticsSTTOptions(BaseModel):
     model_config = ConfigDict(extra="forbid")
 
@@ -112,9 +35,6 @@ def to_config(self) -> Dict[str, Any]:
             "vendor": "speechmatics",
             "params": params,
         }
-        turn_detection_language = _turn_detection_language(self.options.language)
-        if turn_detection_language is not None:
-            config["language"] = turn_detection_language
         return config
 
 
@@ -155,9 +75,6 @@ def to_config(self) -> Dict[str, Any]:
             "vendor": "deepgram",
             "params": params,
         }
-        turn_detection_language = _turn_detection_language(self.options.language)
-        if turn_detection_language is not None:
-            config["language"] = turn_detection_language
         return config
 
 
@@ -186,9 +103,6 @@ def to_config(self) -> Dict[str, Any]:
             "vendor": "microsoft",
             "params": params,
         }
-        turn_detection_language = _turn_detection_language(self.options.language)
-        if turn_detection_language is not None:
-            config["language"] = turn_detection_language
         return config
 
 
@@ -223,9 +137,6 @@ def to_config(self) -> Dict[str, Any]:
             "vendor": "openai",
             "params": params,
         }
-        turn_detection_language = _turn_detection_language(self.options.language)
-        if turn_detection_language is not None:
-            config["language"] = turn_detection_language
         return config
 
 
@@ -260,9 +171,6 @@ def to_config(self) -> Dict[str, Any]:
             "vendor": "google",
             "params": params,
         }
-        turn_detection_language = _turn_detection_language(self.options.language)
-        if turn_detection_language is not None:
-            config["language"] = turn_detection_language
         return config
 
 
@@ -293,9 +201,6 @@ def to_config(self) -> Dict[str, Any]:
             "vendor": "amazon",
             "params": params,
         }
-        turn_detection_language = _turn_detection_language(self.options.language)
-        if turn_detection_language is not None:
-            config["language"] = turn_detection_language
         return config
 
 
@@ -323,16 +228,12 @@ def to_config(self) -> Dict[str, Any]:
             "vendor": "assemblyai",
             "params": params,
         }
-        turn_detection_language = _turn_detection_language(self.options.language)
-        if turn_detection_language is not None:
-            config["language"] = turn_detection_language
         return config
 
 
 class AresSTTOptions(BaseModel):
     model_config = ConfigDict(extra="forbid")
 
-    language: Optional[TurnDetectionLanguage] = Field(default=None, description="Language code")
     additional_params: Optional[Dict[str, Any]] = Field(default=None)
 
 class AresSTT(BaseSTT):
@@ -341,8 +242,6 @@ def __init__(self, **kwargs: Any):
 
     def to_config(self) -> Dict[str, Any]:
         config: Dict[str, Any] = {"vendor": "ares"}
-        if self.options.language is not None:
-            config["language"] = self.options.language
         if self.options.additional_params:
             config["params"] = self.options.additional_params
         return config
@@ -373,7 +272,4 @@ def to_config(self) -> Dict[str, Any]:
             "vendor": "sarvam",
             "params": params,
         }
-        turn_detection_language = _turn_detection_language(self.options.language)
-        if turn_detection_language is not None:
-            config["language"] = turn_detection_language
         return config
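
After these removals, a provider wrapper's `to_config` no longer emits a top-level `language` key and leaves that field for the agent to fill in. The sketch below shows the resulting shape with a made-up wrapper: `ExampleSTT` and the `"example"` vendor string are hypothetical and not part of the SDK, and a dataclass stands in for the pydantic options models used in the real file.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExampleSTT:
    """Hypothetical provider wrapper, not one of the SDK's classes."""

    api_key: str
    language: Optional[str] = None

    def to_config(self) -> Dict[str, Any]:
        params: Dict[str, Any] = {"api_key": self.api_key}
        if self.language is not None:
            # The provider's own language format stays nested under params.
            params["language"] = self.language
        # No top-level "language" key is written by the wrapper.
        return {"vendor": "example", "params": params}


config = ExampleSTT(api_key="stt-key", language="en").to_config()
```

Keeping the wrapper unaware of the top-level field means a provider value such as `en` never has to be mapped onto the Agora language list.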

tests/custom/test_stt_language.py

Lines changed: 34 additions & 10 deletions
@@ -43,17 +43,17 @@ def test_bcp47_stt_language_stays_in_asr_params_and_defaults_turn_detection() ->
     props = properties(base_agent().with_stt(SpeechmaticsSTT(api_key="stt-key", language="en")))
 
     assert props["asr"]["vendor"] == "speechmatics"
-    assert "language" not in props["asr"]
-    assert props["turn_detection"]["language"] == "en"
+    assert props["asr"]["language"] == "en-US"
+    assert props["turn_detection"]["language"] == "en-US"
     assert props["asr"]["params"]["language"] == "en"
 
 
-def test_provider_language_defaults_turn_detection_language_when_not_supported_by_ares() -> None:
+def test_provider_language_does_not_set_turn_detection_language() -> None:
     props = properties(base_agent().with_stt(SpeechmaticsSTT(api_key="stt-key", language="en")))
 
     assert props["asr"]["vendor"] == "speechmatics"
-    assert "language" not in props["asr"]
-    assert props["turn_detection"]["language"] == "en"
+    assert props["asr"]["language"] == "en-US"
+    assert props["turn_detection"]["language"] == "en-US"
     assert props["asr"]["params"]["language"] == "en"
 
 
@@ -66,7 +66,7 @@ def test_turn_detection_language_can_differ_from_provider_language() -> None:
     )
 
     assert props["turn_detection"]["language"] == "fr-FR"
-    assert "language" not in props["asr"]
+    assert props["asr"]["language"] == "fr-FR"
     assert props["asr"]["params"]["language"] == "en"
 
 
@@ -78,12 +78,14 @@ def test_invalid_turn_detection_language_is_rejected() -> None:
 def test_default_turn_detection_language_is_sent_without_stt() -> None:
     props = properties(base_agent())
 
-    assert props["asr"] == {"vendor": "ares"}
-    assert props["turn_detection"] == {"language": "en"}
+    assert props["asr"] == {"vendor": "ares", "language": "en-US"}
+    assert props["turn_detection"] == {"language": "en-US"}
 
 
 def test_stt_vendor_params_match_documented_shapes() -> None:
-    assert DeepgramSTT(model="nova-3", language="en-US").to_config()["params"] == {
+    deepgram_managed = DeepgramSTT(model="nova-3", language="en-US").to_config()
+    assert "language" not in deepgram_managed
+    assert deepgram_managed["params"] == {
         "model": "nova-3",
         "language": "en-US",
     }
@@ -132,8 +134,30 @@ def test_stt_vendor_params_match_documented_shapes() -> None:
         "language_code": "en-US",
     }
 
-    assert AssemblyAISTT(api_key="assembly-key", language="en-US", uri="wss://example.test/ws").to_config()["params"] == {
+    assemblyai_config = AssemblyAISTT(api_key="assembly-key", language="en-US", uri="wss://example.test/ws").to_config()
+    assert "language" not in assemblyai_config
+    assert assemblyai_config["params"] == {
         "api_key": "assembly-key",
         "language": "en-US",
         "uri": "wss://example.test/ws",
     }
+
+
+def test_assemblyai_params_stay_nested_and_asr_language_comes_from_turn_detection() -> None:
+    props = properties(
+        Agent(turn_detection=TurnDetectionConfig(language="fr-FR"))
+        .with_llm(OpenAI(api_key="llm-key", model="gpt-4o-mini", base_url="https://api.openai.com/v1/chat/completions"))
+        .with_tts(ElevenLabsTTS(key="tts-key", voice_id="voice", model_id="eleven_flash_v2_5", base_url="wss://api.elevenlabs.io/v1"))
+        .with_stt(AssemblyAISTT(api_key="assembly-key", language="en-US", uri="wss://example.test/ws"))
+    )
+
+    assert props["asr"] == {
+        "vendor": "assemblyai",
+        "language": "fr-FR",
+        "params": {
+            "api_key": "assembly-key",
+            "language": "en-US",
+            "uri": "wss://example.test/ws",
+        },
+    }
+    assert props["turn_detection"] == {"language": "fr-FR"}
