feat(agents): expose interrupted speech data on SpeechHandle (#1328)
pouya-commos wants to merge 5 commits into livekit:main
Conversation
Add `generatedText` and `spokenText` getters on `SpeechHandle` so that application code can detect what the agent intended to say versus what the listener actually heard when a speech is interrupted. This enables recovery flows (e.g. re-stating an undelivered tool result on the next turn) that were previously impossible because `chatItems` only retained the spoken portion and the full generated text was discarded. Both the traditional STT→LLM→TTS pipeline and the realtime pipeline populate the new fields at the same point where `forwardedText` is finalized, so they remain `null` for non-interrupted speeches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🦋 Changeset detected. Latest commit: 771aeac. The changes in this PR will be included in the next version bump. This PR includes changesets to release 26 packages.
…utput
If a realtime speech is interrupted before any message output is produced,
the existing call to _setInterruptionData was skipped because it lived
inside `if (messageOutputs.length > 0)`. That left `generatedText` and
`spokenText` as null on an interrupted SpeechHandle, contradicting the
JSDoc contract on `spokenText` ("Null if not interrupted") and breaking
parity with the traditional pipeline path, which always populates
the fields when interrupted.
Add an else branch that calls _setInterruptionData('', '') so the
contract holds whenever speechHandle.interrupted is true.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anch

Lift the realtime-path call to a single unconditional invocation after the `messageOutputs` check by tracking `generatedText` / `forwardedText` in the outer scope. Behavior is unchanged: when `messageOutputs` is empty both values are still `''`, so the call is equivalent to the prior explicit else branch, but there is now only one call site to keep in sync if this block evolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
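The lifted control flow described in this commit can be sketched roughly as follows. All names here are stand-ins modeled on the commit messages, not code copied from `agent_activity.ts`:

```typescript
// Hedged sketch of the single-call-site pattern for the realtime path.
// SpeechHandleLike and MessageOutput are illustrative stand-ins.
interface SpeechHandleLike {
  interrupted: boolean;
  _setInterruptionData(generatedText: string, spokenText: string): void;
}

interface MessageOutput {
  generatedText: string; // full raw model output
  forwardedText: string; // portion forwarded (spoken) before any abort
}

function finalizeRealtimeInterruption(
  handle: SpeechHandleLike,
  messageOutputs: MessageOutput[],
): void {
  // Track both values in the outer scope so there is one call site below.
  let generatedText = '';
  let forwardedText = '';
  if (messageOutputs.length > 0) {
    const last = messageOutputs[messageOutputs.length - 1];
    generatedText = last.generatedText;
    forwardedText = last.forwardedText;
  }
  // Unconditional: the JSDoc contract holds even when the speech was
  // interrupted before any message output was produced.
  if (handle.interrupted) {
    handle._setInterruptionData(generatedText, forwardedText);
  }
}
```

Keeping a single call site means a future change to this block (e.g. a new field) only has one place to update.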
The JSDoc on `generatedText` documents `null` for "interrupted but no text was generated", but the field was being assigned the empty string, since all call sites already pass `textOut?.text || ''` (and the no-message-output branch passes literal `''`). That violated the contract — `handle.generatedText !== null` returned a false positive whenever the model produced no text before interruption.

Coerce empty strings to null inside `_setInterruptionData` for both fields so truthy / nullish checks behave consistently. Consumers that need to disambiguate "not interrupted" from "interrupted with nothing heard" can still check `handle.interrupted`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
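The coercion itself is small; a minimal sketch, with field names assumed from the PR summary rather than copied from `speech_handle.ts`:

```typescript
// Hedged sketch of the empty-string-to-null coercion on the handle.
class SpeechHandleSketch {
  private _generatedText: string | null = null;
  private _spokenText: string | null = null;

  get generatedText(): string | null { return this._generatedText; }
  get spokenText(): string | null { return this._spokenText; }

  /** @internal Coerce '' to null so `handle.generatedText !== null`
   *  reliably means "some text was generated before interruption". */
  _setInterruptionData(generatedText: string, spokenText: string): void {
    this._generatedText = generatedText || null;
    this._spokenText = spokenText || null;
  }
}
```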
The pipeline path was passing `textOut?.text` (the post-tee
text-forwarded transcript) as `generatedText`. That can lag the actual
LLM output at interruption time — the LLM task can append to the raw
accumulator in the same loop iteration that the abort fires, before the
downstream forwarder reads the chunk — and `textOut` may even be null.
Use `llmGenData.generatedText` instead so consumers receive the full
raw model output regardless of forwarding state.
Also align the `spokenText` JSDoc with `generatedText`: since
`_setInterruptionData` coerces empty strings to null, the doc now
reflects the actual contract ("Null if not interrupted or nothing was
spoken"). Callers needing to disambiguate can still check
`handle.interrupted`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
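The preference for the raw accumulator can be sketched concretely. The shapes below are stand-ins taken from the commit message (`llmGenData`, `textOut`), not verified against the source:

```typescript
// Hedged sketch: at interruption time the post-tee forwarded transcript
// can lag the raw LLM accumulator, or be null entirely, so the raw
// accumulator is the authoritative source for generatedText.
interface LlmGenData { generatedText: string } // raw accumulator
interface TextOutput { text: string }          // post-tee forwarded text

function generatedTextAtInterruption(
  llmGenData: LlmGenData,
  textOut: TextOutput | null,
): string {
  // Before this commit: textOut?.text || '' could under-report the tail
  // chunk appended in the same loop iteration that the abort fires.
  // After: always the full raw model output, regardless of forwarding state.
  return llmGenData.generatedText;
}
```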
Summary
Expose two new getters on `SpeechHandle` so application code can recover from agent interruptions:

- `generatedText` — full text the model produced before the speech was interrupted (`null` if not interrupted or no text was generated).
- `spokenText` — text that was actually heard by the listener before interruption (`null` if not interrupted).

Motivation
When an agent is interrupted mid-response (especially while delivering a tool result), the caller may have heard nothing or only a fragment. Today the SDK correctly truncates the server-side conversation, but `SpeechHandle.chatItems` only retains the spoken portion — the full generated text is discarded, so application code has no way to detect what was lost and re-state it on the next turn.

This data already exists internally in `agent_activity.ts` at the moment of interruption (`textOut.text` = full model output, `playbackEv.synchronizedTranscript` = what was spoken). This PR just plumbs it onto the handle.

Changes
- `agents/src/voice/speech_handle.ts` — add `_generatedText` / `_spokenText` private fields, `generatedText` / `spokenText` public getters, and an internal `_setInterruptionData(generatedText, spokenText)` setter (marked `@internal`, matching `_markDone`, `_itemAdded`, etc.).
- `agents/src/voice/agent_activity.ts` — populate the fields in both `_pipelineReplyTaskImpl` (traditional STT→LLM→TTS) and `_realtimeGenerationTaskImpl` (realtime / multimodal) at the point where `forwardedText` is finalized, so non-interrupted speeches incur no overhead and both pipelines behave the same.

Usage
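The original usage snippet did not survive the page export. A minimal sketch of the kind of recovery flow the getters enable; the `InterruptedSpeech` shape below is an illustrative stand-in for the relevant `SpeechHandle` surface, not the real class:

```typescript
// Hedged sketch of a recovery flow built on the new getters.
interface InterruptedSpeech {
  interrupted: boolean;
  generatedText: string | null; // null if not interrupted or nothing generated
  spokenText: string | null;    // null if not interrupted or nothing spoken
}

// Compute the portion the listener never heard, to re-state next turn.
function undeliveredText(handle: InterruptedSpeech): string | null {
  if (!handle.interrupted || handle.generatedText === null) return null;
  const heard = handle.spokenText ?? '';
  // Assumes the synchronized transcript is a prefix of the full output.
  return handle.generatedText.startsWith(heard)
    ? handle.generatedText.slice(heard.length)
    : handle.generatedText;
}

const example: InterruptedSpeech = {
  interrupted: true,
  generatedText: 'Your appointment is confirmed for 3pm tomorrow.',
  spokenText: 'Your appointment is',
};
console.log(undeliveredText(example)); // ' confirmed for 3pm tomorrow.'
```

An agent could feed the undelivered remainder back into the next turn's instructions so the model knows what the caller actually heard.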
Test plan

- `pnpm build` succeeds
- `pnpm api:check` succeeds (or `pnpm api:update` is run if the public API report needs regenerating)
- `pnpm test` passes existing voice/agent_activity tests
- Run the basic agent example (`examples/src/basic_agent.ts`), interrupt it mid-utterance, and verify `generatedText` / `spokenText` reflect the model's full output and the truncated heard portion respectively
- Confirm realtime sessions (`openai/realtime`) populate the fields

🤖 Generated with Claude Code