
fix(voice): clear stale paused-speech state across generation steps (ports livekit/agents#5594)#1349

Open
toubatbrian wants to merge 1 commit into main from claude/quirky-galileo-xFWXo


@toubatbrian
Contributor

Summary

Automated Claude Code port of Python PR livekit/agents#5594, "fix: clear stale paused speech state across generation steps" (merged 2026-04-30).

cc @toubatbrian @livekit/agent-devs

This is a core runtime fix to AgentActivity's scheduling loop. The base resume_false_interruption feature it builds on was already ported in #1320, so this fix can be applied 1:1.

What the Python PR fixes

When a SpeechHandle advances through multiple generation steps within a single turn — for example a silent tool-call step (LLM produces only a tool call, no spoken preamble or audio) followed by the tool-reply step that speaks — the false-interruption _paused_speech state captured during the earlier silent step could leak into the next step.

Concretely, if user audio activity overlapped with the silent tool-call step, the activity captured _paused_speech = (handle, agent_state="thinking", timeout=…). Because the silent step never produces audio, the paused state is never cleared. When the tool reply then starts on the same SpeechHandle, the leaked entry causes the wrong agent state to be restored on resume (e.g. "thinking" instead of "speaking") and lets the false-interruption timer fire against stale state.
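To make the failure mode concrete, here is a toy TypeScript model of the leak without the fix. Everything in it (ToyActivity, PausedEntry, onUserActivity) is hypothetical, and it assumes, for illustration only, that a paused entry is recorded only when none is already held; the real AgentActivity bookkeeping is more involved.

```typescript
// Toy model of the leak, WITHOUT the fix. ToyActivity, PausedEntry and
// onUserActivity are hypothetical; they only loosely mirror AgentActivity.

type AgentState = "thinking" | "speaking" | "listening";

interface SpeechHandleLike {
  id: string;
}

interface PausedEntry {
  handle: SpeechHandleLike;
  agentState: AgentState; // state restored if the interruption turns out false
}

class ToyActivity {
  pausedSpeech?: PausedEntry;

  // Assumption for illustration: a paused entry is recorded only when none is
  // held, so an entry left over from an earlier step is never refreshed.
  onUserActivity(handle: SpeechHandleLike, agentState: AgentState): void {
    if (!this.pausedSpeech) {
      this.pausedSpeech = { handle, agentState };
    }
  }

  stateToRestore(): AgentState | undefined {
    return this.pausedSpeech?.agentState;
  }
}

const activity = new ToyActivity();
const handle: SpeechHandleLike = { id: "speech_1" };

// Step 1: silent tool-call step; user audio overlaps while the agent is thinking.
activity.onUserActivity(handle, "thinking");
// The step emits no audio, so nothing clears pausedSpeech before step 2.

// Step 2: tool-reply step on the same handle; user audio overlaps while speaking.
activity.onUserActivity(handle, "speaking");

console.log(activity.stateToRestore()); // prints: thinking (stale)
```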

What this PR ports

In agents/src/voice/agent_activity.ts, the scheduling loop already mirrors the Python _scheduling_task:

```ts
this._currentSpeech = speechHandle;
speechHandle._authorizeGeneration();
await speechHandle.waitIfNotInterrupted([speechHandle._waitForGeneration()]);
this._currentSpeech = undefined;
```

This PR adds the cleanup block right after _waitForGeneration() resolves and before _currentSpeech is cleared, mirroring the Python diff (lines 1365–1373):

```ts
if (this.pausedSpeech && this.pausedSpeech.handle === this._currentSpeech) {
  this.pausedSpeech = undefined;
  if (this.falseInterruptionTimer) {
    clearTimeout(this.falseInterruptionTimer);
    this.falseInterruptionTimer = undefined;
  }
  const audioOutput = this.agentSession.output.audio;
  if (audioOutput && audioOutput.canPause) {
    audioOutput.resume();
  }
}
```

Behavior parity with the Python fix:

  • Resets pausedSpeech so the next generation step on the same SpeechHandle records fresh paused state with the correct agentState.
  • Cancels the false-interruption timer (if any) so it can't fire against stale state from the previous step.
  • Calls resume() on the audio output when its canPause flag is set, so any pause taken during the prior step is undone before the next step starts emitting audio.
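The three behaviors listed above can be exercised in isolation. The following is a minimal sketch that lifts the cleanup into a free function over hypothetical stub types (ActivityState, AudioOutputLike); in the PR itself the same logic operates on this.pausedSpeech, this.falseInterruptionTimer and this.agentSession.output.audio inside AgentActivity.

```typescript
// Standalone sketch of the cleanup block. ActivityState and AudioOutputLike are
// hypothetical stand-ins that model only the members the cleanup touches.

interface SpeechHandleLike {
  id: string;
}

interface AudioOutputLike {
  canPause: boolean;
  resume(): void;
}

interface ActivityState {
  currentSpeech?: SpeechHandleLike;
  pausedSpeech?: { handle: SpeechHandleLike };
  falseInterruptionTimer?: ReturnType<typeof setTimeout>;
}

function clearStalePausedSpeech(state: ActivityState, audioOutput?: AudioOutputLike): void {
  // Only touch paused state owned by the speech whose step just finished.
  if (!state.pausedSpeech || state.pausedSpeech.handle !== state.currentSpeech) {
    return;
  }
  // 1. Drop the paused entry so the next step records fresh state.
  state.pausedSpeech = undefined;
  // 2. Cancel the pending false-interruption timer so it cannot fire later.
  if (state.falseInterruptionTimer) {
    clearTimeout(state.falseInterruptionTimer);
    state.falseInterruptionTimer = undefined;
  }
  // 3. Undo any pause taken during the previous step.
  if (audioOutput && audioOutput.canPause) {
    audioOutput.resume();
  }
}

// Usage: a step finished while its speech was paused.
let resumeCalls = 0;
let timerFired = false;
const handle: SpeechHandleLike = { id: "speech_1" };
const state: ActivityState = {
  currentSpeech: handle,
  pausedSpeech: { handle },
  falseInterruptionTimer: setTimeout(() => {
    timerFired = true;
  }, 20),
};
clearStalePausedSpeech(state, {
  canPause: true,
  resume: () => {
    resumeCalls += 1;
  },
});
```

Returning early when the handles differ keeps the cleanup scoped to the speech that just finished, matching the `this.pausedSpeech.handle === this._currentSpeech` guard in the PR.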

Implementation nuances (Python ↔ TS mapping)

| Python (agent_activity.py) | TS (agent_activity.ts) |
| --- | --- |
| `self._paused_speech` | `this.pausedSpeech` |
| `self._false_interruption_timer` | `this.falseInterruptionTimer` |
| `self._false_interruption_timer.cancel()` | `clearTimeout(this.falseInterruptionTimer)` (Node.js timer handle) |
| `self._session.output.audio` | `this.agentSession.output.audio` |
| `audio_output.can_pause` / `audio_output.resume()` | `audioOutput.canPause` / `audioOutput.resume()` |
| `self._current_speech` is `SpeechHandle \| None` | `this._currentSpeech` is `SpeechHandle \| undefined` (compared with `===` instead of `is`) |

A `// Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 1365-1373 lines` comment marks the cross-reference, per the CLAUDE.md porting guidelines.
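The last row of the mapping table relies on the fact that `===` on objects compares references, which matches Python's `is` rather than `==`. A tiny illustration, with FakeHandle as a hypothetical stand-in for SpeechHandle:

```typescript
// FakeHandle is a hypothetical stand-in for SpeechHandle.
class FakeHandle {
  readonly id: string;
  constructor(id: string) {
    this.id = id;
  }
}

// Mirrors `self._current_speech is handle`: for objects, `===` compares
// references, not field values.
function isCurrent(current: FakeHandle | undefined, handle: FakeHandle): boolean {
  return current === handle;
}

const a = new FakeHandle("speech_1");
const b = new FakeHandle("speech_1"); // same fields, different object

console.log(isCurrent(a, a)); // true
console.log(isCurrent(a, b)); // false
console.log(isCurrent(undefined, a)); // false (undefined plays the role of None)
```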

Tests

The Python PR adds test_silent_tool_call_pause_state_does_not_leak_into_tool_reply (uses FakeActions, FakeAudioOutput(can_pause=True), an LLM step with empty content + a FunctionToolCall, then a follow-up LLM/TTS step). Equivalent JS test infrastructure is more limited — the JS FakeAudioOutput was just extended with canPause support in #1320, but there is no direct counterpart to Python's FakeActions builder for scripting the silent-tool-call → tool-reply timeline used by this regression test.

Rather than expand the test surface in this automated port, this PR ships the production fix only and relies on the existing agent_activity.test.ts suite (8/8 passing) to guard against regressions in the surrounding scheduling/pause logic. A follow-up can add a JS-side regression test once richer FakeActions-style scripting lands.
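Until a FakeActions-style builder exists on the JS side, the shape of the missing regression test can at least be sketched. The following is a self-contained toy, not a vitest test against AgentActivity: ToyScheduler, Step, and the rule that paused state is recorded only when none is held are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the silent-tool-call -> tool-reply timeline that the
// Python regression test scripts with FakeActions. ToyScheduler, Step and the
// "record only if none is held" rule are assumptions made for illustration.

type AgentState = "thinking" | "speaking";

interface SpeechHandleLike {
  id: string;
}

interface Step {
  agentState: AgentState; // agent state while this step runs
  userAudioOverlaps: boolean; // does user audio arrive during the step?
}

class ToyScheduler {
  currentSpeech?: SpeechHandleLike;
  pausedSpeech?: { handle: SpeechHandleLike; agentState: AgentState };
  restoredStates: AgentState[] = [];

  private onUserActivity(agentState: AgentState): void {
    if (this.currentSpeech && !this.pausedSpeech) {
      this.pausedSpeech = { handle: this.currentSpeech, agentState };
    }
  }

  // Each step is one pass through the loop, standing in for one generation
  // step of the real scheduling loop.
  runSpeech(handle: SpeechHandleLike, steps: Step[]): void {
    for (const step of steps) {
      this.currentSpeech = handle;
      if (step.userAudioOverlaps) {
        this.onUserActivity(step.agentState);
        // Record the state a false-interruption resume would restore.
        if (this.pausedSpeech) {
          this.restoredStates.push(this.pausedSpeech.agentState);
        }
      }
      // The fix: clear paused state owned by this handle before moving on.
      if (this.pausedSpeech && this.pausedSpeech.handle === this.currentSpeech) {
        this.pausedSpeech = undefined;
      }
      this.currentSpeech = undefined;
    }
  }
}

const scheduler = new ToyScheduler();
scheduler.runSpeech({ id: "speech_1" }, [
  { agentState: "thinking", userAudioOverlaps: true }, // silent tool-call step
  { agentState: "speaking", userAudioOverlaps: true }, // tool-reply step
]);
console.log(scheduler.restoredStates); // [ 'thinking', 'speaking' ]
```

Deleting the block marked "The fix" makes the second entry "thinking", which is the leak the Python test guards against.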

Verification

  • pnpm install --prefer-offline — completed.
  • pnpm --filter @livekit/agents build — passes.
  • pnpm --filter @livekit/agents lint — no new errors (pre-existing warnings only on unrelated files).
  • pnpm format:check — clean.
  • pnpm --filter @livekit/agents exec vitest run src/voice/agent_activity.test.ts — 8/8 passed.

Changeset

Adds .changeset/clear-paused-speech-leak.md as a patch change against @livekit/agents.

Provenance


Generated by Claude Code

Ports livekit/agents#5594. Resets pausedSpeech, the false-interruption
timer, and the paused audio output at the scheduling-loop boundary in
AgentActivity after each generation step finishes, so paused state
captured during an earlier silent step (e.g. a silent tool call) does
not leak into the next step on the same SpeechHandle (e.g. the tool
reply).

https://claude.ai/code/session_01Vc9BFUveAn3hMEfN3m1FNs
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@changeset-bot

changeset-bot Bot commented Apr 30, 2026

🦋 Changeset detected

Latest commit: 0d951ec

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 28 packages
| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-liveavatar | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-minimax | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-runway | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |


@devin-ai-integration (bot, Contributor) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

