Update starter to use new turn detection model (#85)

royalfig · claude · chenghao-mou · web-flow · commit 9bf440410184 · 2026-06-17T17:53:16.000+01:00
* Update starter to use new turn detection model

* rm redundant para.

* Untrack uv.lock and loosen livekit-agents version constraint

uv.lock should not be tracked in the template (enforced by
template-check.yml CI). Also change livekit-agents from ==1.6.0 to
>=1.6.0 so the starter picks up newer compatible releases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Update livekit-agents dependency to version 1.6.1

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Chenghao Mou <chenghao.mou@livekit.io>
diff --git a/README.md b/README.md
@@ -12,7 +12,7 @@ The starter project includes:
 - A voice AI pipeline built on [LiveKit Inference](https://docs.livekit.io/agents/models/inference)
   with [models](https://docs.livekit.io/agents/models) from OpenAI, Cartesia, and Deepgram. More than 50 other model providers are supported, including [Realtime models](https://docs.livekit.io/agents/models/realtime)
 - Eval suite based on the LiveKit Agents [testing & evaluation framework](https://docs.livekit.io/agents/start/testing/)
-- [LiveKit Turn Detector](https://docs.livekit.io/agents/logic/turns/turn-detector/) for contextually-aware speaker detection, with multilingual support
+- [LiveKit Turn Detector](https://docs.livekit.io/agents/logic/turns/turn-detector/), an end-of-turn model that listens to the user's audio directly, combining semantic understanding with acoustic cues for state-of-the-art accuracy across 14 languages
 - [Background voice cancellation](https://docs.livekit.io/transport/media/noise-cancellation/)
 - Deep session insights from LiveKit [Agent Observability](https://docs.livekit.io/deploy/observability/)
 - A Dockerfile ready for [production deployment to LiveKit Cloud](https://docs.livekit.io/deploy/agents/)
@@ -92,13 +92,7 @@ lk app env --write --destination .env.local
 
 ## Run the agent
 
-Before your first run, you must download certain models such as [Silero VAD](https://docs.livekit.io/agents/logic/turns/vad/) and the [LiveKit turn detector](https://docs.livekit.io/agents/logic/turns/turn-detector/):
-
-```console
-uv run python src/agent.py download-files
-```
-
-Next, run this command to speak to your agent directly in your terminal:
+Run this command to speak to your agent directly in your terminal:
 
 ```console
 uv run python src/agent.py console
diff --git a/pyproject.toml b/pyproject.toml
@@ -9,7 +9,7 @@ description = "Simple voice AI assistant built with LiveKit Agents for Python"
 requires-python = ">=3.10, <3.15"
 
 dependencies = [
-    "livekit-agents[silero,turn-detector]>=1.6.0",
+    "livekit-agents>=1.6.1",
     "livekit-plugins-ai-coustics~=0.2",
     "python-dotenv",
 ]
diff --git a/src/agent.py b/src/agent.py
@@ -7,13 +7,12 @@
     AgentServer,
     AgentSession,
     JobContext,
-    JobProcess,
+    TurnHandlingOptions,
     cli,
     inference,
     room_io,
 )
-from livekit.plugins import ai_coustics, silero
-from livekit.plugins.turn_detector.multilingual import MultilingualModel
+from livekit.plugins import ai_coustics
 
 logger = logging.getLogger("agent")
 
@@ -92,13 +91,6 @@ def __init__(self) -> None:
 server = AgentServer()
 
 
-def prewarm(proc: JobProcess):
-    proc.userdata["vad"] = silero.VAD.load()
-
-
-server.setup_fnc = prewarm
-
-
 @server.rtc_session(agent_name="my-agent")
 async def my_agent(ctx: JobContext):
     # Logging setup
@@ -117,10 +109,14 @@ async def my_agent(ctx: JobContext):
         tts=inference.TTS(
             model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
         ),
-        # VAD and turn detection are used to determine when the user is speaking and when the agent should respond
+        # The LiveKit turn detector determines when the user is done speaking and the agent should respond.
+        # TurnDetector is an end-of-turn model that listens to the user's audio directly, combining
+        # semantic understanding with acoustic cues (intonation, pitch, rhythm) for state-of-the-art accuracy.
+        # AgentSession supplies the required VAD automatically.
         # See more at https://docs.livekit.io/agents/build/turns
-        turn_detection=MultilingualModel(),
-        vad=ctx.proc.userdata["vad"],
+        turn_handling=TurnHandlingOptions(
+            turn_detection=inference.TurnDetector(),
+        ),
         # allow the LLM to generate a response while waiting for the end of turn
         # See more at https://docs.livekit.io/agents/build/audio/#preemptive-generation
         preemptive_generation=True,
diff --git a/taskfile.yaml b/taskfile.yaml
@@ -43,7 +43,6 @@ tasks:
       - echo ''
       - echo '{{ indent .INDENT "cd" }} {{ .REL_PATH }}'
       - echo '{{ indent .INDENT "uv sync" }}'
-      - echo '{{ indent .INDENT "uv run" }} {{ .PYTHON_MAIN }} download-files'
       - echo '{{ indent .INDENT "uv run" }} {{ .PYTHON_MAIN }} console'
 
   help_open_web_console:
@@ -57,7 +56,6 @@ tasks:
       - echo ''
       - echo '{{ indent .INDENT "cd" }} {{ .REL_PATH }}'
       - echo '{{ indent .INDENT "uv sync" }}'
-      - echo '{{ indent .INDENT "uv run" }} {{ .PYTHON_MAIN }} download-files'
       - echo '{{ indent .INDENT "uv run" }} {{ .PYTHON_MAIN }} dev'
       - echo ''
       - echo 'Then visit:'
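
The commit message describes loosening the `livekit-agents` pin from `==1.6.0` to `>=1.6.0`, and the `pyproject.toml` hunk ends at `>=1.6.1`. As a rough illustration of why that matters, here is a minimal sketch of how an exact pin and a lower bound treat newer releases. The helpers `parse_version` and `satisfies` are hypothetical names invented for this example; they handle only plain dotted releases and the two operators shown, and are not how uv or pip actually resolve specifiers.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted release such as '1.6.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, specifier: str) -> bool:
    """Check a version against a single '==X' or '>=X' specifier.

    Illustrative only: real resolvers implement the full version
    specifier rules (pre-releases, wildcards, zero padding, and so on).
    """
    for operator in ("==", ">="):
        if specifier.startswith(operator):
            target = parse_version(specifier[len(operator):])
            candidate = parse_version(version)
            if operator == "==":
                return candidate == target
            return candidate >= target
    raise ValueError(f"unsupported specifier: {specifier}")


if __name__ == "__main__":
    for release in ("1.6.0", "1.6.1", "1.7.0"):
        print(
            release,
            "pin:", satisfies(release, "==1.6.0"),
            "lower bound:", satisfies(release, ">=1.6.1"),
        )
```

Under these assumptions, the exact pin accepts only `1.6.0`, while the `>=1.6.1` bound rejects `1.6.0` and accepts any later release, which is the behavior the commit message is after.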
