fix(google): surface context exhaustion errors#1841

Open
rosetta-livekit-bot[bot] wants to merge 2 commits into main from glens-paragon-hippies

Conversation

@rosetta-livekit-bot

@rosetta-livekit-bot rosetta-livekit-bot Bot commented Jun 19, 2026

Summary

  • Surface Gemini Live close code 1007 as an unrecoverable context exhaustion error.
  • Propagate realtime callback/task errors through the session main loop so retry handling can classify them.
  • Add a Google plugin patch changeset.

Testing

  • pnpm exec prettier --check plugins/google/src/realtime/realtime_api.ts .changeset/google-context-exhaustion.md
  • pnpm build:agents
  • pnpm --filter @livekit/agents-plugins-test build
  • pnpm --filter @livekit/agents-plugin-silero build
  • pnpm --filter @livekit/agents-plugin-openai build:types
  • pnpm --filter @livekit/agents-plugin-google build:types
  • pnpm --filter @livekit/agents-plugin-google lint (warnings only: pre-existing no-explicit-any warnings in realtime logging helpers)

Ported from livekit/agents#6144

Original PR description

No description.

@changeset-bot

changeset-bot Bot commented Jun 19, 2026

🦋 Changeset detected

Latest commit: bf0905d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 35 packages
Name Type
@livekit/agents-plugin-google Patch
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-did Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-perplexity Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-soniox Patch
@livekit/agents-plugin-tavus Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch
@livekit/agents-plugins-test Patch

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

@devin-ai-integration devin-ai-integration Bot Jun 19, 2026

🟡 Stale sessionError not cleared between retry iterations can terminate a healthy session

The new sessionError field is set by the onclose, sendTask, and onReceiveMessage callbacks, and is intended to be consumed (and cleared) at plugins/google/src/realtime/realtime_api.ts:1097-1101. However, if cancelAndWait at line 1095 throws (e.g., due to a 2-second timeout), execution jumps directly to the catch block, skipping the sessionError check. The stale sessionError is never cleared, neither by closeActiveSession() (plugins/google/src/realtime/realtime_api.ts:523-544) nor at the top of the next while iteration (lines 993-997). On the next iteration, after a potentially successful new session, the stale sessionError is discovered at line 1097, thrown, and processed. If the stale error was a 1007 context-exhaustion error, isContextExhaustedError would return true at line 1110, causing the perfectly healthy new session to be terminated.
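
One way to avoid the stale value is to make reading the error and clearing it a single operation, and to clear it on the throwing path as well. The sketch below is illustrative only: `SessionErrorSlot` and `runIteration` are hypothetical names, not code from this PR, and `runSession` stands in for the session body plus the cancelAndWait call.

```typescript
// Hypothetical holder for an error reported by session callbacks.
class SessionErrorSlot {
  private pending: Error | undefined = undefined;

  // Called from callbacks such as onclose or a send task (assumed call sites).
  report(error: Error): void {
    // Keep the first error; later ones are usually consequences of it.
    if (this.pending === undefined) {
      this.pending = error;
    }
  }

  // Returns the pending error and clears it in the same step.
  take(): Error | undefined {
    const error = this.pending;
    this.pending = undefined;
    return error;
  }
}

// One iteration of a retry loop. The slot is emptied on every exit path,
// so an error reported during this iteration cannot surface in the next one.
async function runIteration(
  slot: SessionErrorSlot,
  runSession: () => Promise<void>,
): Promise<void> {
  try {
    await runSession();
  } catch (e) {
    // Throwing path (for example a cancel timeout): discard the reported
    // error so it does not leak forward, then propagate the thrown one.
    slot.take();
    throw e;
  }
  const reported = slot.take();
  if (reported) {
    throw reported;
  }
}
```

In this variant the thrown error wins over the reported one. Preferring the reported error instead (so that a 1007 is not masked by a timeout) would be an equally plausible choice; either way the slot is empty when the next iteration starts.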


// WebSocket close codes (RFC 6455)
const WS_CLOSE_NORMAL = 1000;
const WS_CLOSE_CONTEXT_EXHAUSTED = 1007;

🚩 WebSocket code 1007 is RFC 6455 'Invalid frame payload data', not a standard context exhaustion code

The constant WS_CLOSE_CONTEXT_EXHAUSTED = 1007 is placed under the comment // WebSocket close codes (RFC 6455). However, RFC 6455 defines code 1007 as 'Invalid frame payload data' (non-UTF-8 data in a text frame). Google's use of 1007 for context exhaustion is a non-standard application-specific meaning. The naming and comment could mislead future maintainers into thinking this is a standard code. Consider adding a note like // Google-specific: Gemini uses 1007 for context exhaustion (RFC 6455 defines 1007 as 'Invalid frame payload data').
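
A possible way to keep the standard and provider-specific meanings apart is to name the RFC code after its registered meaning and alias it for the Gemini case. This is a sketch, not the PR's code: `WS_CLOSE_INVALID_PAYLOAD`, `GEMINI_CLOSE_CONTEXT_EXHAUSTED`, and `describeGeminiClose` are hypothetical names.

```typescript
// Close codes named after their meaning in RFC 6455.
const WS_CLOSE_NORMAL = 1000; // "Normal Closure"
const WS_CLOSE_INVALID_PAYLOAD = 1007; // "Invalid frame payload data"

// Provider-specific alias (hypothetical name). According to this PR, Gemini
// Live closes with 1007 on context exhaustion; that is not a standard meaning.
const GEMINI_CLOSE_CONTEXT_EXHAUSTED = WS_CLOSE_INVALID_PAYLOAD;

// Maps a close code to a log-friendly description for a Gemini Live session.
function describeGeminiClose(code: number): string {
  switch (code) {
    case WS_CLOSE_NORMAL:
      return 'normal closure';
    case GEMINI_CLOSE_CONTEXT_EXHAUSTED:
      return 'context exhausted (Gemini-specific use of 1007)';
    default:
      return `unclassified close code ${code}`;
  }
}
```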

@rosetta-livekit-bot rosetta-livekit-bot Bot changed the base branch from 1.5.0 to main June 23, 2026 17:44
@rosetta-livekit-bot rosetta-livekit-bot Bot force-pushed the glens-paragon-hippies branch from 502a2b8 to b8d5af0 Compare June 23, 2026 17:45

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 new potential issues.

Comment on lines +566 to +574
  private isContextExhaustedError(error: unknown): boolean {
    return (
      (typeof error === 'object' &&
        error !== null &&
        'statusCode' in error &&
        error.statusCode === WS_CLOSE_CONTEXT_EXHAUSTED) ||
      String(error).includes(String(WS_CLOSE_CONTEXT_EXHAUSTED))
    );
  }

🔴 Overly broad string-based check in isContextExhaustedError can cause false-positive session termination

The fallback branch String(error).includes(String(WS_CLOSE_CONTEXT_EXHAUSTED)) expands to String(error).includes('1007'), which matches ANY error whose string representation contains the substring "1007" — not just context exhaustion errors. This is problematic because sessionError can be set from sendTask (line 1234) or onReceiveMessage (line 1348), where the error could be an arbitrary SDK/network error. If such an error's message or stringified form incidentally contains "1007" (e.g., token counts like 10071, request IDs, byte counts like "frame size 10070"), the session will be incorrectly terminated as "context exhausted" instead of retried.

How the false positive leads to session termination
  1. sendTask catches a non-context-exhaustion error (e.g., from session.sendRealtimeInput)
  2. Sets this.sessionError = this.toError(e) (a plain Error without statusCode)
  3. In #mainTask, the error is thrown and caught
  4. isContextExhaustedError(err) — first branch fails (no statusCode), but String(err).includes('1007') matches
  5. Session terminates with "context exhausted" instead of retrying

The primary onclose code path already handles the real 1007 case via the statusCode property check, making the string fallback unnecessary for the intended flow but dangerous for other errors.

Suggested change
-  private isContextExhaustedError(error: unknown): boolean {
-    return (
-      (typeof error === 'object' &&
-        error !== null &&
-        'statusCode' in error &&
-        error.statusCode === WS_CLOSE_CONTEXT_EXHAUSTED) ||
-      String(error).includes(String(WS_CLOSE_CONTEXT_EXHAUSTED))
-    );
-  }
+  private isContextExhaustedError(error: unknown): boolean {
+    return (
+      typeof error === 'object' &&
+      error !== null &&
+      'statusCode' in error &&
+      error.statusCode === WS_CLOSE_CONTEXT_EXHAUSTED
+    );
+  }
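
The false positive can be reproduced in isolation. The sketch below rewrites both variants as standalone functions (`looseCheck` and `strictCheck` are hypothetical names) and runs them against an unrelated error whose message happens to contain "1007", and against an error carrying a `statusCode` of 1007, as the onclose path is described as producing.

```typescript
const WS_CLOSE_CONTEXT_EXHAUSTED = 1007;

// True when the value is an object whose statusCode equals 1007.
function hasExhaustedStatus(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    'statusCode' in error &&
    (error as { statusCode?: unknown }).statusCode === WS_CLOSE_CONTEXT_EXHAUSTED
  );
}

// Variant with the substring fallback, as in the PR.
function looseCheck(error: unknown): boolean {
  return (
    hasExhaustedStatus(error) ||
    String(error).includes(String(WS_CLOSE_CONTEXT_EXHAUSTED))
  );
}

// Variant from the suggested change: statusCode only.
function strictCheck(error: unknown): boolean {
  return hasExhaustedStatus(error);
}

// An unrelated error: its text contains "1007" only as part of "10070".
const unrelated = new Error('frame size 10070 exceeds limit');

// An error shaped like a real close event (shape assumed for illustration).
const closeError = Object.assign(new Error('connection closed'), {
  statusCode: WS_CLOSE_CONTEXT_EXHAUSTED,
});

console.log(looseCheck(unrelated)); // true  (false positive)
console.log(strictCheck(unrelated)); // false
console.log(strictCheck(closeError)); // true
```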

Comment on lines +1234 to 1236
this.sessionError = this.toError(e);
this.markRestartNeeded();
}

🚩 Behavioral change: sendTask/onReceiveMessage errors now count against retry budget

Prior to this PR, errors in sendTask and onReceiveMessage were logged and caused a silent restart (via markRestartNeeded()) without entering the catch block or incrementing numRetries. With the new sessionError mechanism (lines 1234, 1348), these errors are now thrown into the catch block (plugins/google/src/realtime/realtime_api.ts:1108-1111) and go through the full retry logic including numRetries++ at line 1159. This means repeated transient send/receive failures (e.g., network blips mid-session) will now accumulate toward maxRetries and eventually terminate the session. Previously they would retry indefinitely. The numRetries counter is reset at plugins/google/src/realtime/realtime_api.ts:1342-1343 only when a message is successfully received, so a session that repeatedly fails mid-send before receiving any response from the new connection will now terminate. This may be intentional (preventing infinite retry loops, as the existing TODO(brian): handle error from tasks comment at line 1100 suggests), but it's a significant behavioral change beyond the stated scope of handling 1007 errors.
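
The effect on the retry budget can be illustrated with a small model of the counter. This is a simplification under stated assumptions, not the plugin's loop: `attemptsUntilTermination` is a hypothetical helper, every failure is assumed to increment the counter, and termination is assumed to happen once the counter exceeds `maxRetries` (the exact comparison in the real code is not shown in this review).

```typescript
// What happened during one connection attempt in the model.
type AttemptOutcome = 'failed_before_message' | 'failed_after_message';

// Returns how many attempts were made before the budget ran out,
// or undefined if the sequence ends without exhausting it.
function attemptsUntilTermination(
  outcomes: AttemptOutcome[],
  maxRetries: number,
): number | undefined {
  let numRetries = 0;
  for (let i = 0; i < outcomes.length; i++) {
    if (outcomes[i] === 'failed_after_message') {
      // A successfully received message resets the counter
      // before the later failure is counted.
      numRetries = 0;
    }
    numRetries++;
    if (numRetries > maxRetries) {
      return i + 1;
    }
  }
  return undefined;
}

// Four consecutive failures with no message in between exhaust a budget of 3.
console.log(
  attemptsUntilTermination(
    [
      'failed_before_message',
      'failed_before_message',
      'failed_before_message',
      'failed_before_message',
    ],
    3,
  ),
); // 4
```

In this model a session that receives at least one message on each reconnect never terminates, while one that keeps failing before any message arrives stops after `maxRetries + 1` attempts.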

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

});
}

this.emitError(err, true);

🚩 New emitError(err, true) call emits recoverable errors on every retry attempt

Line 1147 adds this.emitError(err, true) which is new behavior — previously, errors were only emitted as non-recoverable when retries were exhausted. Now every retry attempt emits a recoverable error event to consumers. This is likely intentional for better observability, but callers subscribed to the 'error' event will now receive more frequent error notifications during transient network issues. If downstream consumers take action on every error event (e.g., logging prominently, alerting), this could increase noise.
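
A consumer that wants to limit this noise could branch on the recoverable flag and alert on recoverable errors only after several in a row. The sketch below assumes an event shape of `{ error, recoverable }`; that shape, `RealtimeErrorEvent`, and `createErrorHandler` are all hypothetical and should be checked against the actual 'error' event payload.

```typescript
// Assumed payload of the 'error' event (hypothetical shape).
interface RealtimeErrorEvent {
  error: Error;
  recoverable: boolean;
}

// Builds a handler that alerts immediately on unrecoverable errors and
// once when `threshold` recoverable errors have been seen in a row.
function createErrorHandler(
  alert: (message: string) => void,
  threshold = 3,
): (event: RealtimeErrorEvent) => void {
  let recoverableSeen = 0;
  return (event: RealtimeErrorEvent): void => {
    if (!event.recoverable) {
      recoverableSeen = 0;
      alert(`fatal: ${event.error.message}`);
      return;
    }
    recoverableSeen++;
    // Alert exactly once, when the streak reaches the threshold.
    if (recoverableSeen === threshold) {
      alert(`${threshold} recoverable errors in a row: ${event.error.message}`);
    }
  };
}
```

The handler has no success signal, so the streak resets only on an unrecoverable error; a real consumer would probably also reset it when the session becomes healthy again.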
