fix(toolcalling): synthesize tool_calls finish_reason when stream lacks one#11045
Conversation
This comment has been minimized.
This comment has been minimized.
WalkthroughThree stream converters (Anthropic, OpenAI chat-completions jail/tool_parser_v2, Responses) are refactored to buffer tool-call blocks and emit them only when a terminal ChangesTool-call finish-reason buffering across protocol converters
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Comment |
|
Addressed the non-threaded CodeRabbit findings on the current head:
The PR description now follows the repository template and identifies the stacked #10988 scope. All five inline review threads are resolved. |
e984721 to
bb41f0f
Compare
bb41f0f to
bb093c0
Compare
bb093c0 to
2404e76
Compare
…ks one When the engine emits a complete tool call but the SSE stream ends without any finish_reason chunk (e.g. speculative decoding folded EOS into the content chunks, or the terminal signal was dropped), strict OpenAI clients wait for a non-null finish_reason before considering the tool call complete and hang until their client-side timeout (Codex's 30s limit). This is DGH-967. Both tool-call streaming paths had the gap: - v2 parser path (tool_parser_v2::apply_stream): the end-of-stream backstop flushed pending tool calls with finish_reason: None. Now synthesizes ToolCalls for a choice that emitted tool calls; text-only output stays None. - v1 jail path (JailedStream::fix_finish_reason): only rewrote Stop->ToolCalls when a finish_reason chunk arrived; if none arrived, finalize() emitted the tool call with the (absent) upstream reason and no ToolCalls was ever set. Added an end-of-stream backstop that emits a trailing ToolCalls chunk for choices that emitted tool calls but never received a finish_reason. Adds regression tests on both paths: a tool-call stream ending without a finish_reason must yield a terminal ToolCalls; a text-only stream must not get a synthesized finish_reason. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
`filter(|(_, &has)| has)` is rejected under Rust 2024 binding modes (cannot explicitly dereference within an implicitly-borrowing pattern). Rewrite as `filter(|(_, has)| *has)`. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
DGH-967 requires the synthesized terminal finish_reason chunk to precede the usage-only chunk (OpenAI stream ordering — the terminal finish_reason comes before usage). The prior fix emitted it at stream end, after the usage chunk had already passed through, violating that ordering on the jail path. Restructure fix_finish_reason to detect the empty-choices/usage chunk as it arrives and synthesize the missing ToolCalls chunks *before* yielding it; keep an end-of-stream backstop for streams that never emit a usage chunk. Extract a synthesize_tool_calls_chunk helper. Add a regression test mirroring the ticket repro (tool call + usage-only chunk, no finish_reason) asserting the ToolCalls chunk precedes the usage chunk. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Compile the jail-path synthesis helper and use valid Hermes payloads in its regression tests. Emit a terminal ToolCalls chunk on the v2 path when the parser already streamed the call and finish() has no additional output. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Update jail integration tests that omit an upstream finish_reason to assert the additional empty ToolCalls terminal chunk required by DGH-967. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Clear usage and LLM metrics copied from usage-only templates, stop replay fixtures before decoding [DONE], and describe the regression without internal ticket references. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Clear copied usage and metrics from v2 synthesized finish chunks. Cover the legacy FunctionCall finish reason and derive replay call IDs from the emitted response. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Do not consume the one-time synthesized-finish path when a heartbeat or comment arrives without response data. Cover a leading heartbeat before a tool-call and usage stream. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Share v2 unfinished-choice finalization between the usage-only path and EOF fallback. Assert one accounting-free ToolCalls terminal chunk appears before usage. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
2404e76 to
ced9bb8
Compare
…IS-2296) dynamo-parsers 3.1.0 (carrying the moved jail + #11045 finish_reason synthesis) is published, so pin it from the registry and drop the local path patch. This lifts the release gate on removing the jail from lib/llm. Built + all jail-consuming tests pass against the published crate (registry source): tool_choice (12), test_streaming_tool_parsers (37), test_reasoning_parser (36), tool_choice_finish_reasons (6).
Overview:
A streamed OpenAI chat completion can contain a complete tool call and then end without any
finish_reasonchunk. Strict clients wait for a non-null finish reason before treating the call as complete, which produced the 30-second timeout reported in DGH-967. This PR makes both chat-completions tool-call paths emitfinish_reason: "tool_calls"when tool calls were emitted but the upstream stream omitted its terminal reason. Text-only streams do not receive an invented finish reason.This PR follows merged PR #10988. Its diff now contains only the DGH-967 chat-completions follow-up.
Details:
JailedStream::fix_finish_reasonnow emits a syntheticToolCallschunk before an empty-choices usage chunk. An end-of-stream path covers streams without a usage chunk. Synthetic terminal chunks do not copy usage or internal LLM metrics from their template.tool_parser_v2::apply_streamnow emits a finish-only terminal chunk before a usage-only response for every unterminated choice that previously emitted tool calls. End of stream remains the fallback when no usage response arrives, and synthetic chunks clear copied usage and metrics.ToolCallsand legacyFunctionCallfinish reasons, and the affected HTTP replay paths.Where should the reviewer start?
Start with
JailedStream::fix_finish_reasoninlib/llm/src/protocols/openai/chat_completions/jail.rs, then compare the v2 terminal-finalization path inlib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs. The focused regressions are adjacent to those implementations and inlib/llm/tests/test_jail.rs.Related Issues
🚫 This PR is NOT linked to a GitHub issue:
Validation
The required GitHub checks on the latest commit are the authoritative validation record. The affected coverage is GPU-free and runs within the existing
dynamo-llmRust test jobs.