fix(voice): clear stale paused-speech state across generation steps (ports livekit/agents#5594) #1349
Open
toubatbrian wants to merge 1 commit into main
Ports livekit/agents#5594. Resets pausedSpeech, the false-interruption timer, and the paused audio output at the scheduling-loop boundary in AgentActivity after each generation step finishes, so paused state captured during an earlier silent step (e.g. a silent tool call) does not leak into the next step on the same SpeechHandle (e.g. the tool reply). https://claude.ai/code/session_01Vc9BFUveAn3hMEfN3m1FNs
🦋 Changeset detected

Latest commit: 0d951ec

The changes in this PR will be included in the next version bump. This PR includes changesets to release 28 packages.
Summary
Automated Claude Code port of Python PR livekit/agents#5594 — "fix: clear stale paused speech state across generation steps" (merged 2026-04-30).
cc @toubatbrian @livekit/agent-devs
This is a core runtime fix to `AgentActivity`'s scheduling loop. The base `resume_false_interruption` feature it builds on was already ported in #1320, so this fix can be applied 1:1.

What the Python PR fixes
When a `SpeechHandle` advances through multiple generation steps within a single turn, for example a silent tool-call step (LLM produces only a tool call, no spoken preamble or audio) followed by the tool-reply step that speaks, the false-interruption `_paused_speech` state captured during the earlier silent step could leak into the next step.

Concretely, if user audio activity overlapped with the silent tool-call step, the activity captured `_paused_speech = (handle, agent_state="thinking", timeout=…)`. Because the silent step never produces audio, the paused state never gets cleared. When the tool reply starts on the same `SpeechHandle`, the leaked entry causes the wrong agent state to be restored on resume (e.g. "thinking" instead of "speaking") and the false-interruption timer to fire against stale state.

What this PR ports
In `agents/src/voice/agent_activity.ts` the scheduling loop already mirrors the Python `_scheduling_task`. This PR adds the cleanup block right after `_waitForGeneration()` resolves and before `_currentSpeech` is cleared, mirroring the Python diff (lines 1365–1373).

Behavior parity with the Python fix:

- Resets `pausedSpeech` so the next generation step on the same `SpeechHandle` records fresh paused state with the correct `agentState`.
- Calls `resume()` on the paused audio output if it supports `canPause`, so any pause taken on the prior step is undone before the next step starts emitting audio.

Implementation nuances (Python ↔ TS mapping)
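As a rough illustration of the cleanup this PR describes, the following self-contained sketch models the three resets on a toy class. It is not the actual diff: `ActivitySketch`, `AudioOutputSketch`, `onUserActivity`, and `cleanupAfterStep` are hypothetical names invented for this example, while `pausedSpeech`, `falseInterruptionTimer`, `canPause`, and `resume()` follow the names used in this PR. The real `AgentActivity` carries considerably more state.

```typescript
// Toy model only: hypothetical stand-ins for the real types in agents/src/voice.
type AgentState = "thinking" | "speaking" | "listening";

interface AudioOutputSketch {
  canPause: boolean;
  paused: boolean;
  resume(): void;
}

interface PausedSpeech {
  handleId: string;
  agentState: AgentState;
}

class ActivitySketch {
  pausedSpeech: PausedSpeech | undefined;
  falseInterruptionTimer: ReturnType<typeof setTimeout> | undefined;
  private readonly audioOutput: AudioOutputSketch;

  constructor(audioOutput: AudioOutputSketch) {
    this.audioOutput = audioOutput;
  }

  // Simulates user audio overlapping a generation step:
  // record the paused state, pause the output, and arm the timer.
  onUserActivity(handleId: string, agentState: AgentState, timeoutMs: number): void {
    this.pausedSpeech = { handleId, agentState };
    if (this.audioOutput.canPause) {
      this.audioOutput.paused = true;
    }
    clearTimeout(this.falseInterruptionTimer);
    this.falseInterruptionTimer = setTimeout(() => {
      this.falseInterruptionTimer = undefined;
    }, timeoutMs);
  }

  // Assumed shape of the cleanup at the scheduling-loop boundary,
  // run after each generation step finishes.
  cleanupAfterStep(): void {
    this.pausedSpeech = undefined;
    if (this.falseInterruptionTimer !== undefined) {
      clearTimeout(this.falseInterruptionTimer);
      this.falseInterruptionTimer = undefined;
    }
    if (this.audioOutput.canPause) {
      this.audioOutput.resume();
    }
  }
}

const audioOutput: AudioOutputSketch = {
  canPause: true,
  paused: false,
  resume() {
    this.paused = false;
  },
};
const activity = new ActivitySketch(audioOutput);

// Step 1: silent tool call; user audio overlaps while the agent is "thinking".
activity.onUserActivity("speech_1", "thinking", 2000);

// Step boundary: without this call, the "thinking" entry, the armed timer,
// and the paused output would all carry over into the tool-reply step.
activity.cleanupAfterStep();
```

In this model, skipping `cleanupAfterStep()` leaves `pausedSpeech.agentState` at `"thinking"` when the tool reply begins, which is the stale state the fix is meant to prevent.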
| Python (`agent_activity.py`) | TypeScript (`agent_activity.ts`) |
| --- | --- |
| `self._paused_speech` | `this.pausedSpeech` |
| `self._false_interruption_timer` | `this.falseInterruptionTimer` |
| `self._false_interruption_timer.cancel()` | `clearTimeout(this.falseInterruptionTimer)` (NodeJS timer handle) |
| `self._session.output.audio` | `this.agentSession.output.audio` |
| `audio_output.can_pause` / `audio_output.resume()` | `audioOutput.canPause` / `audioOutput.resume()` |
| `self._current_speech` is `SpeechHandle \| None` | `this._currentSpeech` is `SpeechHandle \| undefined` (uses `===` instead of `is`) |

A `// Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 1365-1373 lines` comment marks the cross-reference per CLAUDE.md porting guidelines.

Tests
The Python PR adds `test_silent_tool_call_pause_state_does_not_leak_into_tool_reply` (uses `FakeActions`, `FakeAudioOutput(can_pause=True)`, an LLM step with empty `content` + a `FunctionToolCall`, then a follow-up LLM/TTS step). Equivalent JS test infrastructure is more limited: the JS `FakeAudioOutput` was just extended with `canPause` support in #1320, but there is no direct counterpart to Python's `FakeActions` builder for scripting the silent-tool-call → tool-reply timeline used by this regression test.

Rather than expand the test surface in this automated port, this PR ships the production fix only and relies on the existing `agent_activity.test.ts` suite (8/8 passing) to guard against regressions in the surrounding scheduling/pause logic. A follow-up can add a JS-side regression test once richer `FakeActions`-style scripting lands.

Verification
- `pnpm install --prefer-offline`: completed.
- `pnpm --filter @livekit/agents build`: passes.
- `pnpm --filter @livekit/agents lint`: no new errors (pre-existing warnings only on unrelated files).
- `pnpm format:check`: clean.
- `pnpm --filter @livekit/agents exec vitest run src/voice/agent_activity.test.ts`: 8/8 passed.

Changeset
Adds `.changeset/clear-paused-speech-leak.md` as a patch change against `@livekit/agents`.

Provenance
feat: port resume false interruption feature from Python to JS).

Generated by Claude Code