
Commit c24ea70

fix(inworld-tts): don't poison receive stream with stale-context errors (#539)
Two related issues caused TTS calls to fail with `Context not found during sendText` after more than 60 seconds of inactivity, leaving the agent unable to speak its proactive responses.

1. **`_keepalive_loop` sends `send_text` without a `contextId`.** When `_active_context_id` is `None`, the keepalive payload is still sent as `{"send_text": {"text": ""}}` with no `contextId`. The server cannot route it to a context and replies with an error. That error sits in the WebSocket receive buffer and surfaces on the next valid TTS call, breaking it.

   Fix: skip the keepalive iteration when no context is active. The websockets library keeps the connection alive independently with WebSocket PING/PONG frames.

2. **`_receive_audio` did not filter errors by `contextId`.** Audio chunks were filtered with `msg_context_id != context_id`, but the status/error check ran first and raised regardless of which context the message belonged to.

   Fix: move the `contextId` mismatch check above the status/error checks so that messages addressed to a different (or stale) context are dropped early. Server-wide errors with no `contextId` (e.g. "max contexts limit reached") still pass through.
1 parent 57d0923 commit c24ea70

1 file changed

Lines changed: 22 additions & 8 deletions


plugins/inworld/vision_agents/plugins/inworld/tts.py

```diff
@@ -208,8 +208,18 @@ async def _receive_audio(
                     continue
 
             result = data.get("result", {})
+            msg_context_id = result.get("contextId") or result.get("context_id")
             status = result.get("status", {})
-            if status.get("code", 0) != 0:
+            status_code = status.get("code", 0)
+
+            # Drop messages addressed to a different context: they belong
+            # to a stale or already-closed call (or a keepalive whose
+            # context the server doesn't know about). Server-wide errors
+            # with no contextId still pass through.
+            if msg_context_id and msg_context_id != context_id:
+                continue
+
+            if status_code != 0:
                 error_message = status.get("message", "Unknown Inworld error")
                 if "max contexts limit reached" in error_message.lower():
                     logger.warning(
```

```diff
@@ -221,10 +231,6 @@ async def _receive_audio(
             if "error" in data:
                 raise RuntimeError(f"Inworld TTS websocket error: {data['error']}")
 
-            msg_context_id = result.get("contextId") or result.get("context_id")
-            if msg_context_id and msg_context_id != context_id:
-                continue
-
             audio_chunk = result.get("audioChunk", {})
             audio_b64 = audio_chunk.get("audioContent")
             if audio_b64:
```
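Why the order of checks matters can be shown with a small standalone sketch. It is not the plugin's code: `classify_message` and its `filter_first` flag are hypothetical, the message shape (`result.contextId`, `result.status`, `result.audioChunk.audioContent`) is inferred only from the fields the diff reads, and base64-decoding the audio is an assumption. Unlike the plugin, which special-cases "max contexts limit reached" with a warning, the sketch raises on every non-zero status. The status code in the demo is a placeholder.

```python
import base64
from typing import Any, Optional


def classify_message(
    data: dict[str, Any], context_id: str, *, filter_first: bool = True
) -> Optional[bytes]:
    """Hypothetical helper mirroring the check order in _receive_audio.

    Returns decoded audio for ``context_id``, returns None when the message
    should be skipped, and raises RuntimeError for errors that reach this call.
    ``filter_first=False`` reproduces the pre-fix ordering.
    """
    result = data.get("result", {})
    msg_context_id = result.get("contextId") or result.get("context_id")
    # A message without a contextId is never "foreign", so server-wide errors pass.
    is_foreign = bool(msg_context_id) and msg_context_id != context_id

    # Post-fix position: drop other contexts before looking at status/errors.
    if filter_first and is_foreign:
        return None

    status = result.get("status", {})
    if status.get("code", 0) != 0:
        raise RuntimeError(status.get("message", "Unknown Inworld error"))
    if "error" in data:
        raise RuntimeError(f"Inworld TTS websocket error: {data['error']}")

    # Pre-fix position: too late, because a stale error has already raised above.
    if is_foreign:
        return None

    audio_b64 = result.get("audioChunk", {}).get("audioContent")
    return base64.b64decode(audio_b64) if audio_b64 else None


if __name__ == "__main__":
    stale_error = {
        "result": {"contextId": "ctx-old", "status": {"code": 1, "message": "Context not found"}}
    }
    print(classify_message(stale_error, "ctx-new"))  # None: silently dropped
    try:
        classify_message(stale_error, "ctx-new", filter_first=False)
    except RuntimeError as exc:
        print(f"pre-fix ordering raises: {exc}")
```

The same stale message is harmless with the new ordering but aborts the current call with the old one, while an error carrying no `contextId` still raises under both orderings.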

```diff
@@ -362,9 +368,17 @@ async def _keepalive_loop(self) -> None:
                 if self._websocket is websocket:
                     self._websocket = None
                 return
-            payload: dict[str, object] = {"send_text": {"text": ""}}
-            if self._active_context_id:
-                payload["contextId"] = self._active_context_id
+            # Without an active context the server has nothing to attach a
+            # `send_text` to and responds with "Context not found", which then
+            # corrupts the next valid TTS call's receive stream. The websockets
+            # library handles TCP-level keepalive via PING/PONG on its own, so
+            # skipping iterations here is safe.
+            if self._active_context_id is None:
+                continue
+            payload: dict[str, object] = {
+                "send_text": {"text": ""},
+                "contextId": self._active_context_id,
+            }
             try:
                 await websocket.send(json.dumps(payload))
             except (websockets.exceptions.WebSocketException, OSError):
```

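The new keepalive rule can be exercised without a server using a minimal sketch. `FakeWebSocket`, `KeepaliveSketch`, and the `interval`/`iterations` parameters are hypothetical stand-ins; only the payload shape and the "skip when `_active_context_id` is `None`" rule come from the diff. The loop is bounded so the sketch terminates, whereas the plugin's loop runs until the connection closes.

```python
import asyncio
import json
from typing import Optional


class FakeWebSocket:
    """Hypothetical stand-in that records every payload it is asked to send."""

    def __init__(self) -> None:
        self.sent: list[dict[str, object]] = []

    async def send(self, message: str) -> None:
        self.sent.append(json.loads(message))


class KeepaliveSketch:
    """Hypothetical holder for the two attributes the keepalive loop reads."""

    def __init__(self, websocket: FakeWebSocket) -> None:
        self._websocket = websocket
        self._active_context_id: Optional[str] = None

    async def keepalive_loop(self, interval: float, iterations: int) -> None:
        # Bounded for the demo; the real loop runs until the socket goes away.
        for _ in range(iterations):
            await asyncio.sleep(interval)
            # Post-fix rule: with no active context, send nothing at all.
            if self._active_context_id is None:
                continue
            payload: dict[str, object] = {
                "send_text": {"text": ""},
                "contextId": self._active_context_id,
            }
            await self._websocket.send(json.dumps(payload))


async def main() -> None:
    ws = FakeWebSocket()
    sketch = KeepaliveSketch(ws)
    await sketch.keepalive_loop(interval=0, iterations=3)  # idle: nothing is sent
    sketch._active_context_id = "ctx-1"
    await sketch.keepalive_loop(interval=0, iterations=2)  # active: two keepalives
    print(ws.sent)


if __name__ == "__main__":
    asyncio.run(main())
```

Skipping idle iterations is safe only because liveness is already covered elsewhere: in the `websockets` library, protocol-level pings are controlled by the `ping_interval` and `ping_timeout` arguments of `connect()`, both of which default to 20 seconds.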