feat(voice): expose item_id on UserInputTranscribedEvent (closes #6109)#6127
tsushanth wants to merge 1 commit into
…kit#6109)

`AgentSession`'s `user_input_transcribed` event fires once per interim transcript on realtime models (every streamed delta produces a new `is_final=False` event), so a single user utterance produces many events. The internal `llm.InputTranscriptionCompleted` already carries an `item_id` that uniquely identifies the utterance, but when `AgentActivity._on_input_audio_transcription_completed` re-emits upward as `UserInputTranscribedEvent`, the id is dropped:

    self._session._user_input_transcribed(
        UserInputTranscribedEvent(transcript=ev.transcript, is_final=ev.is_final)
    )  # ev.item_id is dropped

Consequence today: consumers that need to react exactly once per utterance (e.g. notify the frontend "user speech received" so it renders a placeholder before the agent responds) must either keep manual `last_item_id` state the event can't actually provide, or bypass the provider-agnostic event entirely and read item_id from the raw provider stream (e.g. `openai_server_event_received`). The raw-event escape hatch isn't portable: Gemini's realtime plugin doesn't emit `openai_server_event_received` at all, so the same consumer code can't work across realtime backends.

This commit adds `item_id: str | None = None` to `UserInputTranscribedEvent` and threads it through the realtime emission site. STT paths leave it at the default `None` because there's no corresponding upstream item id there. Pydantic's optional default keeps the field fully backwards-compatible: existing event subscribers reading `transcript` / `is_final` / `language` / `speaker_id` / `created_at` see no change.

- livekit-agents/livekit/agents/voice/events.py: add `item_id` field with a docstring documenting the realtime-stable / STT-none semantics
- livekit-agents/livekit/agents/voice/agent_activity.py: thread `ev.item_id` through `_on_input_audio_transcription_completed`
- tests/test_user_input_transcribed_event.py: new unit test module pinning (1) the field round-trips on the schema, (2) it defaults to None on omission, (3) it survives `model_dump` (relevant for the cross-process transport in `test_session_host.py`), and (4) the realtime-path data flow can thread `InputTranscriptionCompleted.item_id` through without modification. All four fail on unpatched source because Pydantic rejects the unknown `item_id` kwarg.
        UserInputTranscribedEvent(
            transcript=ev.transcript, is_final=ev.is_final, item_id=ev.item_id
        )
    )
Remote session transport does not forward item_id
The `_on_user_input_transcribed` handler in `livekit-agents/livekit/agents/voice/remote_session.py:466-474` constructs the protobuf `UserInputTranscribed` message with only `transcript` and `is_final`; the new `item_id` field is not forwarded. This means remote session consumers won't receive the `item_id` for dedup purposes. This is not a regression (the protobuf schema would need a separate update), but it limits the utility of the feature for remote/distributed deployments.
Thanks @devin-ai-integration, good catch, and you're right that this is a separate concern from the local-event change. As you noted, propagating Happy to:
Let me know which fits the release plan better.
Why
Closes #6109.
`AgentSession`'s `user_input_transcribed` event fires once per interim transcript on realtime models: every streamed delta produces a new `is_final=False` event, so a single user utterance produces many events with no stable correlation key. The internal `llm.InputTranscriptionCompleted` already carries an `item_id` that uniquely identifies the utterance, but when `AgentActivity._on_input_audio_transcription_completed` re-emits it upward as `UserInputTranscribedEvent`, the id is dropped on the floor:
```python
# livekit-agents/livekit/agents/voice/agent_activity.py (before)
def _on_input_audio_transcription_completed(self, ev: llm.InputTranscriptionCompleted) -> None:
    self._session._user_input_transcribed(
        UserInputTranscribedEvent(transcript=ev.transcript, is_final=ev.is_final)
    )  # ev.item_id is dropped
```
Consequence: consumers that need per-utterance dedup (e.g. notifying the frontend "user speech received" so it renders a placeholder exactly once) must either keep manual `last_item_id` state that the event can't actually provide, or bypass the provider-agnostic event entirely and read `item_id` from `openai_server_event_received`. That escape hatch isn't portable: Gemini's realtime plugin doesn't emit `openai_server_event_received` at all.
Fix
Add `item_id: str | None = None` to `UserInputTranscribedEvent` and thread it through the realtime emission site:
```python
# livekit-agents/livekit/agents/voice/events.py (added field)
class UserInputTranscribedEvent(BaseModel):
    ...
    item_id: str | None = None
    """Stable id identifying the user utterance this transcript belongs to. On
    realtime models, every interim and final UserInputTranscribedEvent for a
    single utterance shares the same item_id, so consumers can dedup interim
    transcripts and react exactly once per utterance using the provider-agnostic
    event surface. None on STT paths where no upstream item id exists."""
```
```python
# livekit-agents/livekit/agents/voice/agent_activity.py (threaded through)
def _on_input_audio_transcription_completed(self, ev: llm.InputTranscriptionCompleted) -> None:
    self._session._user_input_transcribed(
        UserInputTranscribedEvent(
            transcript=ev.transcript, is_final=ev.is_final, item_id=ev.item_id
        )
    )
```
STT paths (`on_interim_transcript` / `on_final_transcript` in the same file) leave `item_id` at the default `None` because the STT layer has no corresponding upstream id concept. Existing subscribers reading `transcript` / `is_final` / `language` / `speaker_id` / `created_at` see no behavioral change.
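For illustration, here is a minimal consumer-side sketch of the "react exactly once per utterance" pattern that the new field enables. It is not part of this PR: `TranscribedEvent` is a hypothetical stand-in that mirrors only the three fields used here, `OncePerUtterance` is an invented helper name, and the fallback of reacting to final transcripts when `item_id` is `None` is an assumption about what an STT-path consumer might want.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranscribedEvent:
    """Hypothetical stand-in for UserInputTranscribedEvent (only the fields used here)."""

    transcript: str
    is_final: bool
    item_id: Optional[str] = None


class OncePerUtterance:
    """Invoke a callback the first time each utterance (item_id) is seen."""

    def __init__(self, on_new_utterance: Callable[[TranscribedEvent], None]) -> None:
        self._on_new_utterance = on_new_utterance
        self._seen: set[str] = set()

    def handle(self, ev: TranscribedEvent) -> None:
        if ev.item_id is None:
            # STT path: no correlation key is available. Assumption: react to
            # final transcripts only, since interims cannot be grouped.
            if ev.is_final:
                self._on_new_utterance(ev)
            return
        if ev.item_id in self._seen:
            return  # later interim/final event for an utterance already handled
        self._seen.add(ev.item_id)
        self._on_new_utterance(ev)


# Usage: three events for one realtime utterance, one for a second utterance,
# then an interim and a final STT-style event without an item_id.
notified: list[Optional[str]] = []
dedup = OncePerUtterance(lambda ev: notified.append(ev.item_id))
for ev in [
    TranscribedEvent("hel", False, "item_1"),
    TranscribedEvent("hello", False, "item_1"),
    TranscribedEvent("hello there", True, "item_1"),
    TranscribedEvent("next", False, "item_2"),
    TranscribedEvent("stt interim", False),
    TranscribedEvent("stt final", True),
]:
    dedup.handle(ev)
```

In a real agent, `handle` would be registered as the subscriber for the `user_input_transcribed` event. The set of seen ids grows for the lifetime of the helper, so a long-running session would probably want to bound or periodically clear it.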
Test
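New `tests/test_user_input_transcribed_event.py` (4 unit tests):
- the `item_id` field round-trips on the schema
- it defaults to `None` when omitted
- it survives `model_dump` (relevant for the cross-process transport in `test_session_host.py`)
- the realtime-path data flow can thread `InputTranscriptionCompleted.item_id` through without modification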
All four fail on unpatched source because Pydantic rejects the unknown `item_id` kwarg.
```
$ uv run pytest tests/test_user_input_transcribed_event.py --unit -v
PASSED tests/test_user_input_transcribed_event.py::test_user_input_transcribed_event_carries_item_id
PASSED tests/test_user_input_transcribed_event.py::test_user_input_transcribed_event_item_id_defaults_to_none
PASSED tests/test_user_input_transcribed_event.py::test_user_input_transcribed_event_serialises_item_id
PASSED tests/test_user_input_transcribed_event.py::test_input_transcription_completed_item_id_can_thread_to_event
4 passed in 0.05s
```
Backwards compatibility
Strictly additive change. `item_id` is optional and defaults to `None`, so: