feat(agents): expose interrupted speech data on SpeechHandle#1328

Open
pouya-commos wants to merge 5 commits into livekit:main from pouya-commos:feat/expose-interrupted-speech-data

Conversation

@pouya-commos

Summary

Expose two new getters on SpeechHandle so application code can recover from agent interruptions:

  • generatedText — full text the model produced before the speech was interrupted (null if not interrupted or no text was generated).
  • spokenText — text that was actually heard by the listener before interruption (null if not interrupted).

Motivation

When an agent is interrupted mid-response (especially while delivering a tool result), the caller may have heard nothing or only a fragment. Today the SDK correctly truncates the server-side conversation, but SpeechHandle.chatItems only retains the spoken portion — the full generated text is discarded, so application code has no way to detect what was lost and re-state it on the next turn.

This data already exists internally in agent_activity.ts at the moment of interruption (textOut.text = full model output, playbackEv.synchronizedTranscript = what was spoken). This PR just plumbs it onto the handle.

Changes

  • agents/src/voice/speech_handle.ts — add _generatedText / _spokenText private fields, generatedText / spokenText public getters, and an internal _setInterruptionData(generatedText, spokenText) setter (marked @internal, matching _markDone, _itemAdded, etc.).
  • agents/src/voice/agent_activity.ts — populate the fields in both _pipelineReplyTaskImpl (traditional STT→LLM→TTS) and _realtimeGenerationTaskImpl (realtime / multimodal) at the point where forwardedText is finalized, so non-interrupted speeches incur no overhead and both pipelines behave the same.
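The `SpeechHandle` additions described above can be sketched roughly as follows. This is illustrative only, not the actual implementation — the real class in `agents/src/voice/speech_handle.ts` has many more members, and a later commit in this PR additionally coerces empty strings to null inside `_setInterruptionData`:

```typescript
// Minimal sketch of the new SpeechHandle surface (names from the PR
// description; the class body is hypothetical).
class SpeechHandle {
  private _generatedText: string | null = null;
  private _spokenText: string | null = null;

  /** Full text the model produced before the speech was interrupted. */
  get generatedText(): string | null {
    return this._generatedText;
  }

  /** Text the listener actually heard before interruption. */
  get spokenText(): string | null {
    return this._spokenText;
  }

  /** @internal — called by agent_activity at interruption time. */
  _setInterruptionData(generatedText: string, spokenText: string): void {
    this._generatedText = generatedText;
    this._spokenText = spokenText;
  }
}
```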

Usage

session.on(AgentSessionEventTypes.SpeechCreated, (ev) => {
  ev.speechHandle.addDoneCallback(() => {
    if (ev.speechHandle.interrupted) {
      const generated = ev.speechHandle.generatedText; // "Yes, we rent skid steers. We have mini, medium, and large units."
      const spoken = ev.speechHandle.spokenText;       // "Yes," (or "" if nothing was heard)
      // Application can now implement recovery logic
    }
  });
});
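One possible shape for the recovery logic — purely illustrative, and it assumes `spokenText` is a prefix of `generatedText`, which holds when the synchronized transcript is a truncation of the generated text:

```typescript
// Hypothetical helper: compute the portion of the generated text the
// listener never heard, so the agent can re-state it on the next turn.
function undeliveredText(
  generated: string | null,
  spoken: string | null,
): string | null {
  if (generated === null) return null;
  const heard = spoken ?? '';
  // Guard: only slice when the spoken transcript is a real prefix.
  if (!generated.startsWith(heard)) return generated;
  const rest = generated.slice(heard.length).trim();
  return rest.length > 0 ? rest : null;
}
```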

Test plan

  • pnpm build succeeds
  • pnpm api:check succeeds (or pnpm api:update is run if the public API report needs regenerating)
  • pnpm test passes existing voice/agent_activity tests
  • Manual: run an agent (e.g. examples/src/basic_agent.ts), interrupt it mid-utterance, and verify generatedText / spokenText reflect the model's full output and the truncated heard portion respectively
  • Verify both traditional pipeline and a realtime plugin (e.g. openai/realtime) populate the fields

🤖 Generated with Claude Code

Add `generatedText` and `spokenText` getters on `SpeechHandle` so that
application code can detect what the agent intended to say versus what
the listener actually heard when a speech is interrupted. This enables
recovery flows (e.g. re-stating an undelivered tool result on the next
turn) that were previously impossible because `chatItems` only retained
the spoken portion and the full generated text was discarded.

Both the traditional STT->LLM->TTS pipeline and the realtime pipeline
populate the new fields at the same point where `forwardedText` is
finalized, so they remain `null` for non-interrupted speeches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@changeset-bot

changeset-bot Bot commented Apr 28, 2026

🦋 Changeset detected

Latest commit: 771aeac

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 26 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch


@CLAassistant

CLAassistant commented Apr 28, 2026

CLA assistant check
All committers have signed the CLA.

pouya-commos and others added 2 commits April 28, 2026 10:17
…utput

If a realtime speech is interrupted before any message output is produced,
the existing call to _setInterruptionData was skipped because it lived
inside `if (messageOutputs.length > 0)`. That left `generatedText` and
`spokenText` as null on an interrupted SpeechHandle, contradicting the
JSDoc contract on `spokenText` ("Null if not interrupted") and breaking
the parity with the traditional pipeline path, which always populates
the fields when interrupted.

Add an else branch that calls _setInterruptionData('', '') so the
contract holds whenever speechHandle.interrupted is true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anch

Lift the realtime-path call to a single unconditional invocation after
the messageOutputs check by tracking generatedText / forwardedText in
the outer scope. Behavior is unchanged: when messageOutputs is empty
both values are still '', so the call is equivalent to the prior
explicit else branch — but there is now only one call site to keep in
sync if this block evolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
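The lifted single-call-site shape described in this commit might look like the following (a sketch under the same assumptions as above, not the actual code):

```typescript
// Track generatedText / forwardedText in the outer scope so there is a
// single unconditional _setInterruptionData call site. When
// messageOutputs is empty both values stay '', matching the explicit
// else branch this refactor replaces.
function applyInterruptionDataLifted(
  handle: { _setInterruptionData(g: string, s: string): void },
  interrupted: boolean,
  messageOutputs: { generatedText: string; forwardedText: string }[],
): void {
  if (!interrupted) return;
  let generatedText = '';
  let forwardedText = '';
  if (messageOutputs.length > 0) {
    const out = messageOutputs[messageOutputs.length - 1];
    generatedText = out.generatedText;
    forwardedText = out.forwardedText;
  }
  // Only one call site to keep in sync if this block evolves.
  handle._setInterruptionData(generatedText, forwardedText);
}
```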
The JSDoc on `generatedText` documents `null` for "interrupted but no
text was generated", but the field was being assigned the empty string
since all call sites already pass `textOut?.text || ''` (and the
no-message-output branch passes literal `''`). That violated the
contract — `handle.generatedText !== null` returned a false positive
whenever the model produced no text before interruption.

Coerce empty strings to null inside `_setInterruptionData` for both
fields so truthy / nullish checks behave consistently. Consumers that
need to disambiguate "not interrupted" from "interrupted with nothing
heard" can still check `handle.interrupted`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
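The coercion this commit describes can be sketched as a pure function (illustrative; only the empty-string-to-null rule comes from the commit message):

```typescript
// Coerce '' to null so `handle.generatedText !== null` is never a false
// positive when the model produced no text before interruption.
function normalizeInterruptionData(
  generatedText: string,
  spokenText: string,
): { generatedText: string | null; spokenText: string | null } {
  return {
    generatedText: generatedText === '' ? null : generatedText,
    spokenText: spokenText === '' ? null : spokenText,
  };
}
```

With this rule, truthy and nullish checks behave consistently, and consumers that need to distinguish "not interrupted" from "interrupted with nothing heard" check `handle.interrupted` as the commit notes.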
Contributor

@devin-ai-integration Bot left a comment

Devin Review found 2 new potential issues.

View 7 additional findings in Devin Review.

Comment thread: agents/src/voice/agent_activity.ts (Outdated)
Comment thread: agents/src/voice/speech_handle.ts
The pipeline path was passing `textOut?.text` (the post-tee
text-forwarded transcript) as `generatedText`. That can lag the actual
LLM output at interruption time — the LLM task can append to the raw
accumulator in the same loop iteration that the abort fires, before the
downstream forwarder reads the chunk — and `textOut` may even be null.
Use `llmGenData.generatedText` instead so consumers receive the full
raw model output regardless of forwarding state.

Also align the `spokenText` JSDoc with `generatedText`: since
`_setInterruptionData` coerces empty strings to null, the doc now
reflects the actual contract ("Null if not interrupted or nothing was
spoken"). Callers needing to disambiguate can still check
`handle.interrupted`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
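A toy illustration of why the raw accumulator is the safer source at interruption time. The `llmGenData` / `textOut` names follow the commit message; the data and both functions are hypothetical:

```typescript
// Before the fix: read the post-tee forwarded transcript, which can lag
// the raw LLM output at interruption time, or be null entirely.
function generatedTextBefore(textOut: { text: string } | null): string {
  return textOut?.text || '';
}

// After the fix: read the raw LLM accumulator, which always holds the
// full model output regardless of forwarding state.
function generatedTextAfter(llmGenData: { generatedText: string }): string {
  return llmGenData.generatedText;
}
```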