Commit 9bf4404

royalfig, claude, and chenghao-mou authored
Update starter to use new turn detection model (#85)
* Update starter to use new turn detection model

* rm redundant para.

* Untrack uv.lock and loosen livekit-agents version constraint

uv.lock should not be tracked in the template (enforced by template-check.yml CI). Also change livekit-agents from ==1.6.0 to >=1.6.0 so the starter picks up newer compatible releases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Update livekit-agents dependency to version 1.6.1

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Chenghao Mou <chenghao.mou@livekit.io>
1 parent a7575e8 commit 9bf4404

4 files changed

Lines changed: 12 additions & 24 deletions

README.md

Lines changed: 2 additions & 8 deletions
@@ -12,7 +12,7 @@ The starter project includes:
 - A voice AI pipeline built on [LiveKit Inference](https://docs.livekit.io/agents/models/inference)
   with [models](https://docs.livekit.io/agents/models) from OpenAI, Cartesia, and Deepgram. More than 50 other model providers are supported, including [Realtime models](https://docs.livekit.io/agents/models/realtime)
 - Eval suite based on the LiveKit Agents [testing & evaluation framework](https://docs.livekit.io/agents/start/testing/)
-- [LiveKit Turn Detector](https://docs.livekit.io/agents/logic/turns/turn-detector/) for contextually-aware speaker detection, with multilingual support
+- [LiveKit Turn Detector](https://docs.livekit.io/agents/logic/turns/turn-detector/), an end-of-turn model that listens to the user's audio directly, combining semantic understanding with acoustic cues for state-of-the-art accuracy across 14 languages
 - [Background voice cancellation](https://docs.livekit.io/transport/media/noise-cancellation/)
 - Deep session insights from LiveKit [Agent Observability](https://docs.livekit.io/deploy/observability/)
 - A Dockerfile ready for [production deployment to LiveKit Cloud](https://docs.livekit.io/deploy/agents/)
@@ -92,13 +92,7 @@ lk app env --write --destination .env.local

 ## Run the agent

-Before your first run, you must download certain models such as [Silero VAD](https://docs.livekit.io/agents/logic/turns/vad/) and the [LiveKit turn detector](https://docs.livekit.io/agents/logic/turns/turn-detector/):
-
-```console
-uv run python src/agent.py download-files
-```
-
-Next, run this command to speak to your agent directly in your terminal:
+Run this command to speak to your agent directly in your terminal:

 ```console
 uv run python src/agent.py console

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ description = "Simple voice AI assistant built with LiveKit Agents for Python"
 requires-python = ">=3.10, <3.15"

 dependencies = [
-    "livekit-agents[silero,turn-detector]>=1.6.0",
+    "livekit-agents>=1.6.1",
     "livekit-plugins-ai-coustics~=0.2",
     "python-dotenv",
 ]
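
The new constraint `livekit-agents>=1.6.1` sets a floor rather than a pin. As a rough illustration of what that floor accepts, here is a minimal sketch that compares plain dotted release numbers. The helper names `parse_release` and `satisfies_minimum` are hypothetical, and the sketch deliberately ignores pre-releases, local versions, and the other PEP 440 rules that a real resolver such as uv applies.

```python
import re


def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as '1.6.1' into a tuple of ints.

    Hypothetical helper: anything beyond digits and dots is rejected.
    """
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (a '>=' check)."""
    a, b = parse_release(installed), parse_release(minimum)
    # Pad the shorter tuple with zeros so that '1.7' compares as '1.7.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for candidate in ("1.6.0", "1.6.1", "1.7"):
        print(candidate, satisfies_minimum(candidate, "1.6.1"))
```

To check a real environment, the installed version string could be read with `importlib.metadata.version("livekit-agents")` from the standard library and passed in as `installed`, assuming the package reports a plain release number.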

src/agent.py

Lines changed: 9 additions & 13 deletions
@@ -7,13 +7,12 @@
     AgentServer,
     AgentSession,
     JobContext,
-    JobProcess,
+    TurnHandlingOptions,
     cli,
     inference,
     room_io,
 )
-from livekit.plugins import ai_coustics, silero
-from livekit.plugins.turn_detector.multilingual import MultilingualModel
+from livekit.plugins import ai_coustics

 logger = logging.getLogger("agent")

@@ -92,13 +91,6 @@ def __init__(self) -> None:
 server = AgentServer()


-def prewarm(proc: JobProcess):
-    proc.userdata["vad"] = silero.VAD.load()
-
-
-server.setup_fnc = prewarm
-
-
 @server.rtc_session(agent_name="my-agent")
 async def my_agent(ctx: JobContext):
     # Logging setup
@@ -117,10 +109,14 @@ async def my_agent(ctx: JobContext):
         tts=inference.TTS(
             model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
         ),
-        # VAD and turn detection are used to determine when the user is speaking and when the agent should respond
+        # The LiveKit turn detector determines when the user is done speaking and the agent should respond.
+        # TurnDetector is an end-of-turn model that listens to the user's audio directly, combining
+        # semantic understanding with acoustic cues (intonation, pitch, rhythm) for state-of-the-art accuracy.
+        # AgentSession supplies the required VAD automatically.
         # See more at https://docs.livekit.io/agents/build/turns
-        turn_detection=MultilingualModel(),
-        vad=ctx.proc.userdata["vad"],
+        turn_handling=TurnHandlingOptions(
+            turn_detection=inference.TurnDetector(),
+        ),
         # allow the LLM to generate a response while waiting for the end of turn
         # See more at https://docs.livekit.io/agents/build/audio/#preemptive-generation
         preemptive_generation=True,
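
For agents derived from the earlier template, the lines removed above hint at what to look for when migrating. The following is a minimal sketch of a source scanner, not part of this commit: the name `find_legacy_usage` and the pattern table are hypothetical, and the patterns are taken only from the removed lines in this diff, so they may miss other styles of the same setup.

```python
import re

# Hypothetical pattern table, built only from lines this commit removes
# from src/agent.py. It is illustrative, not an exhaustive migration guide.
LEGACY_PATTERNS = {
    "silero VAD prewarm": re.compile(r"silero\.VAD\.load\("),
    "MultilingualModel turn detector": re.compile(r"\bMultilingualModel\b"),
    "explicit vad argument": re.compile(r"\bvad\s*=\s*ctx\.proc\.userdata"),
}


def find_legacy_usage(source: str) -> list[tuple[int, str]]:
    """Return (line number, label) pairs for lines matching a legacy pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


if __name__ == "__main__":
    old_snippet = (
        'proc.userdata["vad"] = silero.VAD.load()\n'
        "turn_detection=MultilingualModel(),\n"
        'vad=ctx.proc.userdata["vad"],\n'
    )
    for lineno, label in find_legacy_usage(old_snippet):
        print(f"line {lineno}: {label}")
```

Running it against the text of an existing `src/agent.py` (for example, read with `pathlib.Path.read_text`) would list the lines that still follow the pre-commit setup. A source that already uses `turn_handling=TurnHandlingOptions(turn_detection=inference.TurnDetector())` produces no hits.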

taskfile.yaml

Lines changed: 0 additions & 2 deletions
@@ -43,7 +43,6 @@ tasks:
       - echo ''
       - echo '{{ indent .INDENT "cd" }} {{ .REL_PATH }}'
       - echo '{{ indent .INDENT "uv sync" }}'
-      - echo '{{ indent .INDENT "uv run" }} {{ .PYTHON_MAIN }} download-files'
       - echo '{{ indent .INDENT "uv run" }} {{ .PYTHON_MAIN }} console'

   help_open_web_console:
@@ -57,7 +56,6 @@ tasks:
       - echo ''
       - echo '{{ indent .INDENT "cd" }} {{ .REL_PATH }}'
       - echo '{{ indent .INDENT "uv sync" }}'
-      - echo '{{ indent .INDENT "uv run" }} {{ .PYTHON_MAIN }} download-files'
       - echo '{{ indent .INDENT "uv run" }} {{ .PYTHON_MAIN }} dev'
       - echo ''
       - echo 'Then visit:'
