
feat(amd): port tunable params and postpone-termination tool from python #1368

Open

toubatbrian wants to merge 1 commit into main from claude/quirky-galileo-51AGi

Conversation

@toubatbrian
Contributor

Summary

Automated port of livekit/agents#5584 (fix(amd): amd improvement (AGT-2777)) into agents-js.

Note

This is an automated Claude Code Routine created by @toubatbrian. It is currently in the experimentation stage.

cc @toubatbrian @livekit/agent-devs for review.

Ported features

Everything listed below lands in agents/src/voice/amd.ts and is wired through the existing two-gate (verdict + silence) AMD architecture.

1. Expose all tunable parameters

New optional fields on AMDOptions:

| Option | Default | Notes |
| --- | --- | --- |
| `humanSpeechThresholdMs` | `2_500` | Speech longer than this is treated as machine-like and skips the short-greeting heuristic. |
| `humanSilenceThresholdMs` | `500` | Silence after a short greeting before settling on HUMAN. |
| `machineSilenceThresholdMs` | `1_500` | Silence after machine-like speech before opening the silence gate. |
| `prompt` | bundled `AMD_PROMPT` | Overrides the AMD classification system prompt. |
| `participantIdentity` | `undefined` | Currently informational (used for span attribution / logs). |
| `suppressCompatibilityWarning` | `false` | Silences the "model not evaluated" warning. |

noSpeechTimeoutMs, detectionTimeoutMs, and maxTranscriptTurns were already exposed.
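As a minimal sketch of how the new options compose with defaults (only the field names and default values come from the table above; the `resolveOptions` helper and the shape of the surrounding API are illustrative, not the real `AMDOptions` plumbing):

```typescript
// Illustrative option-resolution sketch. Field names and defaults mirror
// the table above; everything else is a simplified stand-in.
interface AMDOptions {
  humanSpeechThresholdMs?: number;
  humanSilenceThresholdMs?: number;
  machineSilenceThresholdMs?: number;
  prompt?: string;
  participantIdentity?: string;
  suppressCompatibilityWarning?: boolean;
}

const DEFAULTS = {
  humanSpeechThresholdMs: 2_500,
  humanSilenceThresholdMs: 500,
  machineSilenceThresholdMs: 1_500,
  suppressCompatibilityWarning: false,
};

// User-supplied fields override the defaults; unspecified fields fall back.
function resolveOptions(opts: AMDOptions = {}) {
  return { ...DEFAULTS, ...opts };
}

const resolved = resolveOptions({ machineSilenceThresholdMs: 2_000 });
```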

2. Use LLM when a transcript is available

Mirrors the python change in classifier.py::on_user_speech_ended. If the user just spoke for ≤ humanSpeechThresholdMs and a transcript is already on the record, AMD now waits machineSilenceThresholdMs (instead of the shorter humanSilenceThresholdMs + automatic HUMAN verdict) so the LLM gets the final word.
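The window-selection logic described above can be sketched roughly as follows (function and constant names are illustrative, not the actual amd.ts internals):

```typescript
// Hedged sketch of the silence-window selection: short greeting with a
// transcript on record defers to the LLM by waiting the longer machine
// window instead of settling as HUMAN after the short human window.
const HUMAN_SPEECH_THRESHOLD_MS = 2_500;
const HUMAN_SILENCE_THRESHOLD_MS = 500;
const MACHINE_SILENCE_THRESHOLD_MS = 1_500;

function silenceWindowMs(speechDurationMs: number, hasTranscript: boolean): number {
  if (speechDurationMs > HUMAN_SPEECH_THRESHOLD_MS) {
    // Machine-like long speech: always wait the longer machine window.
    return MACHINE_SILENCE_THRESHOLD_MS;
  }
  // Short greeting: if a transcript is available, let the LLM get the
  // final word; otherwise settle quickly on HUMAN.
  return hasTranscript ? MACHINE_SILENCE_THRESHOLD_MS : HUMAN_SILENCE_THRESHOLD_MS;
}
```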

3. save_prediction + postpone_termination tools

detect() now exposes two tools to the LLM via toolCtx and toolChoice: 'required':

  • save_prediction({ label }) — commits the verdict (mirrors python save_prediction).
  • postpone_termination({ seconds }) — extends the silence window, capped at MAX_EXTENSIONS = 3 extensions of at most MAX_EXTENSION_MS = 10_000 ms each. On expiration, opens the silence gate and re-runs classification with the latest transcript; once extensions are exhausted, the tool is no longer offered, forcing the LLM to commit.
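The extension budget can be sketched as follows (the `ExtensionBudget` class is illustrative; only the MAX_EXTENSIONS / MAX_EXTENSION_MS cap comes from the PR):

```typescript
// Sketch of the postpone_termination budget: up to MAX_EXTENSIONS
// postponements, each granted extension clamped to MAX_EXTENSION_MS.
const MAX_EXTENSIONS = 3;
const MAX_EXTENSION_MS = 10_000;

class ExtensionBudget {
  private count = 0;

  /** Returns the granted extension in ms, or null once the budget is spent. */
  request(requestedSeconds: number): number | null {
    if (this.count >= MAX_EXTENSIONS) return null; // tool no longer offered
    this.count += 1;
    return Math.min(requestedSeconds * 1_000, MAX_EXTENSION_MS);
  }

  get exhausted(): boolean {
    return this.count >= MAX_EXTENSIONS;
  }
}
```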

If the LLM doesn't emit tool calls (for example, the in-tree StaticLLM test mock or providers that ignore toolChoice='required'), AMD falls back to the previous JSON-content parsing path so the existing 4 unit tests remain green.
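The fallback path might look roughly like this (the real parseDetection is not shown in the PR; this sketch only illustrates the idea of recovering a verdict from plain content when no tool calls arrive):

```typescript
// Illustrative JSON-content fallback: when the LLM ignores
// toolChoice='required', try to pull a verdict object out of its text.
function parseFallbackVerdict(content: string): string | null {
  try {
    const match = content.match(/\{[\s\S]*\}/); // first { to last }
    if (!match) return null;
    const parsed = JSON.parse(match[0]);
    return typeof parsed.label === 'string' ? parsed.label : null;
  } catch {
    return null; // malformed JSON: no verdict recovered
  }
}
```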

4. Compatibility warning for evaluated LLM models

EVALUATED_LLM_MODELS (the same 12 inference IDs from python) is checked against LLM.model once at construction; a warning is logged when the resolved model isn't in the list, suppressible via suppressCompatibilityWarning: true.
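A simplified sketch of the construction-time check (the two model IDs are placeholders, not the actual 12-entry EVALUATED_LLM_MODELS list, and the real matching is fuzzier than the exact-match shown here):

```typescript
// Illustrative compatibility check run once at construction; returns the
// warning text, or null when the model is evaluated or warnings are
// suppressed. Placeholder model IDs; exact-match for simplicity.
const EVALUATED_LLM_MODELS = new Set(['openai/gpt-4.1-mini', 'openai/gpt-4o']);

function compatibilityWarning(model: string, suppress: boolean): string | null {
  if (suppress || EVALUATED_LLM_MODELS.has(model)) return null;
  return `AMD has not been evaluated against model "${model}"; ` +
    `set suppressCompatibilityWarning: true to silence this warning`;
}
```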

What was intentionally not ported

These pieces of agents#5584 are tightly coupled to the python AudioRecognition/RoomIO pipeline and don't have direct counterparts in agents-js today. Skipping them avoids a much larger architectural change and keeps the JS AMD compatible with its current session-event model.

| Python change | JS status | Reason |
| --- | --- | --- |
| Dedicated `stt` parameter on AMD | Skipped | The JS AMD listens to AgentSession `UserInputTranscribed` events; it has no audio-frame channel comparable to python's `audio_recognition.push_audio` → AMD path. Adding a parallel STT pipeline is a larger redesign and out of scope for this porting PR. |
| `wait_for_track_publication(wait_for_subscription=True)` + `start()` / `start_timers()` split | Skipped | The JS AMD starts timers inside `execute()`, which the user calls after `session.start({ agent, room })`. The "start before SIP participant joins, then wait for subscription" pattern requires async lifecycle changes to AMD that have no analogue in JS. |
| `EVALUATED_STT_MODELS` warning | Skipped | Paired with the dedicated STT pipeline above. |
| Python-only `examples/telephony/amd.py` rewrite | Adapted | `examples/src/telephony_amd.ts` gets a comment block showing the new tunable options; the SIP-participant-creation choreography is not duplicated since JS doesn't have the same `room_io.set_participant` API surface. |
| `NO_SPEECH_THRESHOLD = 10.0` / `TIMEOUT = 20.0` defaults | Already matched | JS already defaulted to `10_000` / `20_000` ms. |

If a follow-up needs the dedicated AMD STT pipeline or the participant-track lifecycle, that can be tracked as a separate issue — please flag in review if you'd like me to file one.

Implementation nuances

  • Python signals "more audio expected" by sending "" into a channel; JS uses scheduleLLMClassification() to re-trigger classification with the joined transcript.
  • Python's tool_choice='required' is passed through; if a provider ignores it, the JSON-content fallback in parseDetection() keeps behavior reasonable.
  • Time units follow CLAUDE.md: all new fields are milliseconds. MAX_EXTENSION_MS is the JS analogue of MAX_EXTENSION_SECS (10s → 10_000 ms).
  • All ported sections carry // Ref: python <path> - <line range> comments per CLAUDE.md guidance.

Test plan

  • pnpm --filter @livekit/agents build — passes
  • pnpm --filter @livekit/agents lint — amd.ts / amd.test.ts clean (0 errors, 0 warnings)
  • pnpm exec prettier --check on changed files — passes
  • pnpm exec vitest run agents/src/voice/amd.test.ts — 6/6 pass (4 existing + 2 new)
  • Manual smoke test against a real SIP call (left to reviewer with phone-number infra)

Changeset

patch for @livekit/agents (per the routine's standing instructions).


Generated by Claude Code

Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`,
  `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is
  already available after a short greeting.
- Add `postpone_termination` LLM tool (capped at 3 extensions × 10s)
  alongside `save_prediction`; fall back to JSON-content parsing when
  the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.

Skipped (architectural divergence — see PR description): dedicated AMD
STT pipeline, track-subscription wait, and the `start()` /
`start_timers()` lifecycle split.
@changeset-bot

changeset-bot Bot commented May 1, 2026

🦋 Changeset detected

Latest commit: 12540e3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 28 packages
| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-liveavatar | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-minimax | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-runway | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12540e3329


Comment thread agents/src/voice/amd.ts
Comment on lines +543 to +545
    this.extensionCount += 1;
    this.clearTimer('silence');
    this.silenceTimer = setTimeout(() => {

P1 Badge Prevent stale tool calls from mutating active detection state

detect() now executes postpone_termination side effects (extensionCount increment and silenceTimer reset) immediately, but staleness is only checked later in classifyCurrentTranscript via the generation guard. When multiple final transcripts arrive close together, an older in-flight classification can be discarded for verdict purposes yet still overwrite timers/budget for the newer classification, causing unnecessary delays and inconsistent settling behavior. Tool side effects should be gated by the same generation token so stale classifications are no-ops.
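The generation-token gating Codex recommends could be sketched like this (class and method names are illustrative, not the actual amd.ts state machine):

```typescript
// Sketch: side effects from an in-flight classification run are no-ops
// unless its token still matches the current generation, so a stale run
// cannot overwrite timers/budget for a newer one.
class DetectionState {
  private generation = 0;
  extensionCount = 0;

  /** Called whenever a newer final transcript supersedes in-flight work. */
  beginClassification(): number {
    return ++this.generation;
  }

  /** Tool side effect, gated by the same generation token as the verdict. */
  postpone(token: number): boolean {
    if (token !== this.generation) return false; // stale run: no-op
    this.extensionCount += 1;
    return true;
  }
}
```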


Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.


Comment thread agents/src/voice/amd.ts
    const lower = modelName.toLowerCase();
    for (const candidate of evaluated) {
      const c = candidate.toLowerCase();
      if (lower === c || c.includes(lower)) return;
Contributor


🟡 Reversed includes check in warnIfNotEvaluated fails for model names with date suffixes

The check c.includes(lower) at agents/src/voice/amd.ts:133 tests whether the evaluated candidate contains the user's model name as a substring. This works when the user provides a shorter name (e.g., 'gpt-4.1-mini' matches 'openai/gpt-4.1-mini'), but fails when the user's model name is longer than the evaluated entry — which is common with date-suffixed models like 'openai/gpt-4.1-mini-2025-04-14'. In that case, 'openai/gpt-4.1-mini'.includes('openai/gpt-4.1-mini-2025-04-14') is false, producing a spurious warning even though the base model is evaluated. The check should also include lower.includes(c) to handle this direction.

Suggested change

    - if (lower === c || c.includes(lower)) return;
    + if (lower === c || c.includes(lower) || lower.includes(c)) return;
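The suggested bidirectional check behaves as follows (the two entries stand in for the real EVALUATED_LLM_MODELS list; the function is a self-contained demo, not the actual warnIfNotEvaluated):

```typescript
// Demo of the bidirectional substring check: matches both shorter
// user-supplied names and longer date-suffixed variants.
const evaluated = ['openai/gpt-4.1-mini', 'openai/gpt-4o'];

function isEvaluated(modelName: string): boolean {
  const lower = modelName.toLowerCase();
  for (const candidate of evaluated) {
    const c = candidate.toLowerCase();
    // c.includes(lower): user gave a shorter name, e.g. 'gpt-4.1-mini'.
    // lower.includes(c): user gave a date-suffixed name, e.g.
    // 'openai/gpt-4.1-mini-2025-04-14'.
    if (lower === c || c.includes(lower) || lower.includes(c)) return true;
  }
  return false;
}
```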

Comment thread agents/src/voice/amd.ts
Comment on lines +591 to +603
    let parsedArgs: unknown = {};
    try {
      parsedArgs = JSON.parse(tc.args);
    } catch {
      // ignore malformed args; the tool execute() will receive {}
    }
    try {
      // eslint-disable-next-line @typescript-eslint/no-explicit-any -- AMD tools are loosely typed
      await fnTool.execute(parsedArgs as any, {
        ctx: undefined as never,
        toolCallId: tc.callId,
        abortSignal: undefined as unknown as AbortSignal,
      });
Contributor


🟡 Tool arguments bypass Zod validation allowing invalid category labels in save_prediction

In detect(), tool call arguments are JSON-parsed and passed directly to the tool's execute function (agents/src/voice/amd.ts:591-603) without running them through the Zod schema defined in the tool's parameters. If the LLM provides an invalid label value (not one of the AMDCategory enum values), save_prediction's execute would accept it without error because label !== AMDCategory.UNCERTAIN would be true, producing an AMDResult with an invalid category string. While unlikely with well-behaved LLMs, this bypasses the runtime validation that the Zod z.enum() schema was intended to provide.

Affected code path

The Zod schema at line 512 defines z.enum([...AMDCategory values...]) but the args are parsed at line 593 as raw JSON and cast via as any at line 599. The normal tool execution pipeline in the framework validates arguments against the schema, but this manual invocation skips that step.

Prompt for agents
In the detect() method of agents/src/voice/amd.ts, tool call arguments (line 591-603) are JSON-parsed and passed directly to execute() without being validated against the Zod schema defined in the tool's parameters. For the save_prediction tool, this means an invalid label from the LLM would be accepted, creating an AMDResult with an invalid category string.

To fix: after JSON-parsing the args at line 593, validate them against the tool's Zod schema before calling execute(). You can use the tool's parameters schema (which is a Zod object) to parse/validate:

  const schema = fnTool.parameters;
  if (isZodSchema(schema)) {
    parsedArgs = schema.parse(parsedArgs);
  }

Alternatively, add a validation check inside the save_prediction execute function itself using parseCategory() to normalize the label.
