feat(amd): port tunable params and postpone-termination tool from python #1368
toubatbrian wants to merge 1 commit into main
Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`, `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is already available after a short greeting.
- Add `postpone_termination` LLM tool (capped at 3 extensions × 10s) alongside `save_prediction`; fall back to JSON-content parsing when the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.

Skipped (architectural divergence; see PR description): dedicated AMD STT pipeline, track-subscription wait, and the `start()` / `start_timers()` lifecycle split.
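The "3 extensions × 10s" cap on `postpone_termination` could look roughly like the sketch below. Only the two limits come from this PR; `ExtensionBudget` and `grantExtension` are hypothetical names used for illustration, and treating a malformed request as a zero-length extension that still consumes budget is an assumption, not something the PR states.

```typescript
// Limits quoted in the PR; everything else here is an illustrative assumption.
const MAX_EXTENSIONS = 3;
const MAX_EXTENSION_MS = 10_000;

// Hypothetical holder for the number of extensions already granted.
interface ExtensionBudget {
  used: number;
}

/**
 * Returns the granted extension in milliseconds, or null once the budget is
 * exhausted (the point at which the tool would stop being offered to the LLM).
 */
function grantExtension(budget: ExtensionBudget, requestedSeconds: number): number | null {
  if (budget.used >= MAX_EXTENSIONS) return null;
  // Tool arguments are loosely typed, so guard against NaN and negative values.
  const requestedMs = Number.isFinite(requestedSeconds)
    ? Math.max(0, requestedSeconds * 1000)
    : 0;
  budget.used += 1;
  // Clamp each individual extension to the per-call ceiling.
  return Math.min(requestedMs, MAX_EXTENSION_MS);
}
```

Keeping the clamp in a pure function like this makes the cap easy to unit-test separately from any timer handling.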
🦋 Changeset detected

Latest commit: 12540e3

The changes in this PR will be included in the next version bump. This PR includes changesets to release 28 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 12540e3329
    this.extensionCount += 1;
    this.clearTimer('silence');
    this.silenceTimer = setTimeout(() => {
Prevent stale tool calls from mutating active detection state
detect() now executes postpone_termination side effects (extensionCount increment and silenceTimer reset) immediately, but staleness is only checked later in classifyCurrentTranscript via the generation guard. When multiple final transcripts arrive close together, an older in-flight classification can be discarded for verdict purposes yet still overwrite timers/budget for the newer classification, causing unnecessary delays and inconsistent settling behavior. Tool side effects should be gated by the same generation token so stale classifications are no-ops.
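One way to read this suggestion is sketched below: each classification captures a generation token when it starts, and tool side effects check that token before mutating anything. `DetectionState`, `beginClassification`, and `postpone` are hypothetical names, not the actual members of `amd.ts`, and the timer reset is omitted to keep the sketch deterministic.

```typescript
// Hypothetical stand-in for the detector's mutable state.
class DetectionState {
  generation = 0;
  extensionCount = 0;

  /** Called when a new final transcript arrives; invalidates older classifications. */
  beginClassification(): number {
    this.generation += 1;
    return this.generation;
  }

  /** Applies the postpone side effect only if the caller's token is still current. */
  postpone(token: number): boolean {
    if (token !== this.generation) return false; // stale classification: no-op
    this.extensionCount += 1;
    return true;
  }
}

// Usage: the older in-flight classification can no longer touch the budget.
const state = new DetectionState();
const olderToken = state.beginClassification();
const newerToken = state.beginClassification();
state.postpone(olderToken); // ignored, token is stale
state.postpone(newerToken); // applied
```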
    const lower = modelName.toLowerCase();
    for (const candidate of evaluated) {
      const c = candidate.toLowerCase();
      if (lower === c || c.includes(lower)) return;
🟡 Reversed includes check in warnIfNotEvaluated fails for model names with date suffixes
The check c.includes(lower) at agents/src/voice/amd.ts:133 tests whether the evaluated candidate contains the user's model name as a substring. This works when the user provides a shorter name (e.g., 'gpt-4.1-mini' matches 'openai/gpt-4.1-mini'), but fails when the user's model name is longer than the evaluated entry — which is common with date-suffixed models like 'openai/gpt-4.1-mini-2025-04-14'. In that case, 'openai/gpt-4.1-mini'.includes('openai/gpt-4.1-mini-2025-04-14') is false, producing a spurious warning even though the base model is evaluated. The check should also include lower.includes(c) to handle this direction.
    - if (lower === c || c.includes(lower)) return;
    + if (lower === c || c.includes(lower) || lower.includes(c)) return;
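The suggested condition can be exercised on its own with a small helper. `isEvaluatedModel` is a hypothetical name for illustration; the PR's function is `warnIfNotEvaluated`, which returns early instead of returning a boolean.

```typescript
// Hypothetical helper that isolates the matching rule from the warning logic.
function isEvaluatedModel(modelName: string, evaluated: readonly string[]): boolean {
  const lower = modelName.toLowerCase();
  return evaluated.some((candidate) => {
    const c = candidate.toLowerCase();
    // Match exact names, shorter user-supplied names, and date-suffixed variants.
    return lower === c || c.includes(lower) || lower.includes(c);
  });
}

const evaluatedList = ['openai/gpt-4.1-mini'];
isEvaluatedModel('gpt-4.1-mini', evaluatedList); // shorter name: matches
isEvaluatedModel('openai/gpt-4.1-mini-2025-04-14', evaluatedList); // date suffix: matches
```

Substring matching in both directions is permissive: an evaluated entry that is a prefix of a different model's name would also match, and an empty model name matches everything. A stricter rule, such as stripping a trailing date suffix before comparing, may be preferable.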
    let parsedArgs: unknown = {};
    try {
      parsedArgs = JSON.parse(tc.args);
    } catch {
      // ignore malformed args; the tool execute() will receive {}
    }
    try {
      // eslint-disable-next-line @typescript-eslint/no-explicit-any -- AMD tools are loosely typed
      await fnTool.execute(parsedArgs as any, {
        ctx: undefined as never,
        toolCallId: tc.callId,
        abortSignal: undefined as unknown as AbortSignal,
      });
🟡 Tool arguments bypass Zod validation allowing invalid category labels in save_prediction
In detect(), tool call arguments are JSON-parsed and passed directly to the tool's execute function (agents/src/voice/amd.ts:591-603) without running them through the Zod schema defined in the tool's parameters. If the LLM provides an invalid label value (not one of the AMDCategory enum values), save_prediction's execute would accept it without error because label !== AMDCategory.UNCERTAIN would be true, producing an AMDResult with an invalid category string. While unlikely with well-behaved LLMs, this bypasses the runtime validation that the Zod z.enum() schema was intended to provide.
Affected code path
The Zod schema at line 512 defines z.enum([...AMDCategory values...]) but the args are parsed at line 593 as raw JSON and cast via as any at line 599. The normal tool execution pipeline in the framework validates arguments against the schema, but this manual invocation skips that step.
Prompt for agents
In the detect() method of agents/src/voice/amd.ts, tool call arguments (line 591-603) are JSON-parsed and passed directly to execute() without being validated against the Zod schema defined in the tool's parameters. For the save_prediction tool, this means an invalid label from the LLM would be accepted, creating an AMDResult with an invalid category string.
To fix: after JSON-parsing the args at line 593, validate them against the tool's Zod schema before calling execute(). You can use the tool's parameters schema (which is a Zod object) to parse/validate:
    const schema = fnTool.parameters;
    if (isZodSchema(schema)) {
      parsedArgs = schema.parse(parsedArgs);
    }
Alternatively, add a validation check inside the save_prediction execute function itself using parseCategory() to normalize the label.
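The second option, normalizing the label inside `execute`, might look like the dependency-free sketch below. The category strings and the `normalizeLabel` helper are assumptions made for illustration: the real `AMDCategory` values and the `parseCategory()` signature are not shown in this review, and falling back to an "uncertain" category for anything unrecognized is a design choice rather than documented behavior.

```typescript
// Hypothetical category set; the real AMDCategory enum may differ.
const CATEGORIES = ['human', 'machine', 'uncertain'] as const;
type Category = (typeof CATEGORIES)[number];

/** Maps loosely typed tool arguments to a known category, defaulting to 'uncertain'. */
function normalizeLabel(args: unknown): Category {
  if (typeof args !== 'object' || args === null) return 'uncertain';
  const label = (args as { label?: unknown }).label;
  if (typeof label !== 'string') return 'uncertain';
  const lower = label.trim().toLowerCase();
  // Only accept strings that are members of the known category list.
  return (CATEGORIES as readonly string[]).includes(lower) ? (lower as Category) : 'uncertain';
}
```

With a guard like this, an invalid label from the LLM can never reach the result object as an arbitrary string, even when the schema validation step is skipped.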
Summary
Automated port of livekit/agents#5584 (`fix(amd): amd improvement (AGT-2777)`) into `agents-js`.

Note: This is an automated Claude Code Routine created by @toubatbrian. It is currently at an experimental stage.
cc @toubatbrian @livekit/agent-devs for review.
Ported features
All listed below land in `agents/src/voice/amd.ts` and are wired through the existing two-gate (verdict + silence) AMD architecture.

1. Expose all tunable parameters

New optional fields on `AMDOptions`:

| Field | Default | Description |
| --- | --- | --- |
| `humanSpeechThresholdMs` | `2_500` | … |
| `humanSilenceThresholdMs` | `500` | … `HUMAN`. |
| `machineSilenceThresholdMs` | `1_500` | … |
| `prompt` | `AMD_PROMPT` | … |
| `participantIdentity` | `undefined` | … |
| `suppressCompatibilityWarning` | `false` | … |

`noSpeechTimeoutMs`, `detectionTimeoutMs`, and `maxTranscriptTurns` were already exposed.

2. Use LLM when a transcript is available
Mirrors the python change in `classifier.py::on_user_speech_ended`. If the user just spoke for ≤ `humanSpeechThresholdMs` and a transcript is already on the record, AMD now waits `machineSilenceThresholdMs` (instead of the shorter `humanSilenceThresholdMs` + automatic `HUMAN` verdict) so the LLM gets the final word.

3. `save_prediction` + `postpone_termination` tools

`detect()` now exposes two tools to the LLM via `toolCtx` and `toolChoice: 'required'`:

- `save_prediction({ label })`: commits the verdict (mirrors python `save_prediction`).
- `postpone_termination({ seconds })`: extends the silence window; capped at `MAX_EXTENSIONS = 3` × `MAX_EXTENSION_MS = 10_000`. On expiration, opens the silence gate and re-runs classification with the latest transcript; with extensions exhausted, the tool is no longer offered, forcing the LLM to commit.

If the LLM doesn't emit tool calls (for example, the in-tree `StaticLLM` test mock or providers that ignore `toolChoice='required'`), AMD falls back to the previous JSON-content parsing path so the existing 4 unit tests remain green.

4. Compatibility warning for evaluated LLM models
StaticLLMtest mock or providers that ignoretoolChoice='required'), AMD falls back to the previous JSON-content parsing path so the existing 4 unit tests remain green.4. Compatibility warning for evaluated LLM models
EVALUATED_LLM_MODELS(the same 12 inference IDs from python) is checked againstLLM.modelonce at construction; a warning is logged when the resolved model isn't in the list, suppressible viasuppressCompatibilityWarning: true.What was intentionally not ported
These pieces of
agents#5584are tightly coupled to the pythonAudioRecognition/RoomIOpipeline and don't have direct counterparts in agents-js today. Skipping them avoids a much larger architectural change and keeps the JS AMD compatible with its current session-event model.sttparameter onAMDAgentSessionUserInputTranscribedevents; it has no audio-frame channel comparable to python'saudio_recognition.push_audio→ AMD path. Adding a parallel STT pipeline is a larger redesign and out of scope for this porting PR.wait_for_track_publication(wait_for_subscription=True)+start()/start_timers()splitexecute(), which the user calls aftersession.start({ agent, room }). The "start before SIP participant joins, then wait for subscription" pattern requires async lifecycle changes toAMDthat don't have an analogue in JS.EVALUATED_STT_MODELSwarningexamples/telephony/amd.pyrewriteexamples/src/telephony_amd.tsgets a comment block showing the new tunable options; the SIP-participant-creation choreography is not duplicated since JS doesn't have the sameroom_io.set_participantAPI surface.NO_SPEECH_THRESHOLD = 10.0/TIMEOUT = 20.0defaults10_000/20_000ms.If a follow-up needs the dedicated AMD STT pipeline or the participant-track lifecycle, that can be tracked as a separate issue — please flag in review if you'd like me to file one.
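For reviewers, the short-greeting rule from ported feature 2 can be sketched as a pure decision function. The three default values come from the `AMDOptions` defaults listed in this description. `planAfterSpeech`, `Plan`, and the handling of speech longer than `humanSpeechThresholdMs` are assumptions for illustration and do not reproduce the actual control flow in `amd.ts`.

```typescript
interface Thresholds {
  humanSpeechThresholdMs: number;
  humanSilenceThresholdMs: number;
  machineSilenceThresholdMs: number;
}

// Defaults as listed for AMDOptions in this PR description.
const DEFAULT_THRESHOLDS: Thresholds = {
  humanSpeechThresholdMs: 2_500,
  humanSilenceThresholdMs: 500,
  machineSilenceThresholdMs: 1_500,
};

// Hypothetical result type: how long to wait in silence, and who decides afterwards.
type Plan = { waitMs: number; decideWith: 'auto-human' | 'llm' };

function planAfterSpeech(
  speechMs: number,
  hasTranscript: boolean,
  t: Thresholds = DEFAULT_THRESHOLDS,
): Plan {
  const shortGreeting = speechMs <= t.humanSpeechThresholdMs;
  if (shortGreeting && !hasTranscript) {
    // Previous behavior: a short utterance followed by brief silence is treated as HUMAN.
    return { waitMs: t.humanSilenceThresholdMs, decideWith: 'auto-human' };
  }
  // New behavior for short greetings with a transcript; longer speech is assumed
  // to take the same LLM path.
  return { waitMs: t.machineSilenceThresholdMs, decideWith: 'llm' };
}
```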
Implementation nuances

- … `""` into a channel; JS uses `scheduleLLMClassification()` to re-trigger classification with the joined transcript.
- `tool_choice='required'` is passed through; if a provider ignores it, the JSON-content fallback in `parseDetection()` keeps behavior reasonable.
- … `CLAUDE.md`: all new fields are milliseconds. `MAX_EXTENSION_MS` is the JS analogue of `MAX_EXTENSION_SECS` (10s → 10_000 ms).
- … `// Ref: python <path> - <line range>` comments per `CLAUDE.md` guidance.

Test plan

- `pnpm --filter @livekit/agents build`: passes
- `pnpm --filter @livekit/agents lint`: `amd.ts` / `amd.test.ts` clean (0 errors, 0 warnings)
- `pnpm exec prettier --check` on changed files: passes
- `pnpm exec vitest run agents/src/voice/amd.test.ts`: 6/6 pass (4 existing + 2 new)

Changeset

`patch` for `@livekit/agents` (per the routine's standing instructions).

Generated by Claude Code