Skip to content

fix(toolcalling): synthesize tool_calls finish_reason when stream lacks one#11045

Merged
MatejKosec merged 11 commits into
mainfrom
codex/dgh-967-tool-call-finish-reason
Jul 1, 2026
Merged

fix(toolcalling): synthesize tool_calls finish_reason when stream lacks one#11045
MatejKosec merged 11 commits into
mainfrom
codex/dgh-967-tool-call-finish-reason

Conversation

@MatejKosec

@MatejKosec MatejKosec commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Overview:

A streamed OpenAI chat completion can contain a complete tool call and then end without any finish_reason chunk. Strict clients wait for a non-null finish reason before treating the call as complete, which produced the 30-second timeout reported in DGH-967. This PR makes both chat-completions tool-call paths emit finish_reason: "tool_calls" when tool calls were emitted but the upstream stream omitted its terminal reason. Text-only streams do not receive an invented finish reason.

This PR follows merged PR #10988. Its diff now contains only the DGH-967 chat-completions follow-up.

Details:

  1. JailedStream::fix_finish_reason now emits a synthetic ToolCalls chunk before an empty-choices usage chunk. An end-of-stream path covers streams without a usage chunk. Synthetic terminal chunks do not copy usage or internal LLM metrics from their template.
  2. tool_parser_v2::apply_stream now emits a finish-only terminal chunk before a usage-only response for every unterminated choice that previously emitted tool calls. End of stream remains the fallback when no usage response arrives, and synthetic chunks clear copied usage and metrics.
  3. Regression coverage exercises missing finish reasons, text-only streams, usage-chunk ordering, both ToolCalls and legacy FunctionCall finish reasons, and the affected HTTP replay paths.

Where should the reviewer start?

Start with JailedStream::fix_finish_reason in lib/llm/src/protocols/openai/chat_completions/jail.rs, then compare the v2 terminal-finalization path in lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs. The focused regressions are adjacent to those implementations and in lib/llm/tests/test_jail.rs.

Related Issues

🚫 This PR is NOT linked to a GitHub issue:

Validation

The required GitHub checks on the latest commit are the authoritative validation record. The affected coverage is GPU-free and runs within the existing dynamo-llm Rust test jobs.

@MatejKosec MatejKosec requested a review from a team as a code owner June 29, 2026 16:09
@MatejKosec MatejKosec temporarily deployed to external_collaborator June 29, 2026 16:09 — with GitHub Actions Inactive
@github-actions github-actions Bot added fix documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Jun 29, 2026
@datadog-official

This comment has been minimized.

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Three stream converters (Anthropic, OpenAI chat-completions jail/tool_parser_v2, Responses) are refactored to buffer tool-call blocks and emit them only when a terminal finish_reason is observed, with EOF fallbacks. Synthetic finish_reason=ToolCalls backstops are added in jail.rs and tool_parser_v2.rs. A shared HTTP test harness, scripted chat engine, SSE fixtures, and end-to-end HTTP replay tests are introduced.

Changes

Tool-call finish-reason buffering across protocol converters

Layer / File(s) Summary
AnthropicStreamConverter: ToolCallState and buffering helpers
lib/llm/src/protocols/anthropic/stream_converter.rs
ToolCallState is restructured with argument_fragments and is_emit_ready(); record_tool_call, drain_buffered_tool_events, and append_buffered_tool_events are added to accumulate and flush tool blocks. tool_blocks_flushed field added to converter state.
AnthropicStreamConverter: finish-reason flush and EOF fallback
lib/llm/src/protocols/anthropic/stream_converter.rs
append_chunk_events detects terminal finish_reason to set should_flush_tool_blocks and records tool calls; append_end_events uses drain_buffered_tool_events as EOF fallback; test-only paths mirror production logic.
AnthropicStreamConverter: updated unit tests
lib/llm/src/protocols/anthropic/stream_converter.rs
finish_chunk helper nulls tool_calls in delta; all test assertions updated for buffered-flush semantics covering text/tool ordering, fragmented args, incomplete identity, deduplication, and parallel flushing.
jail.rs: synthetic ToolCalls finish-reason backstop
lib/llm/src/protocols/openai/chat_completions/jail.rs
fix_finish_reason tracks terminated choices and stores a template response; at stream end it synthesizes a trailing chunk with finish_reason=ToolCalls for choices that emitted tool calls without a terminal finish reason. Tests cover synthesis and text-only absence.
tool_parser_v2.rs: per-choice finish_reason in backstop flush
lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs
The end-of-stream backstop now sets finish_reason=ToolCalls per choice when tool_emitted contains that index. Two new tests cover synthesized and absent finish_reason.
ResponseStreamConverter: FunctionCallState and identity/delta queuing
lib/llm/src/protocols/openai/responses/stream_converter.rs
FunctionCallState gains pending_arg_deltas and optional output_index; has_identity() gates output_item.added and deferred delta emission; should_finish_function_calls controls when done events fire.
ResponseStreamConverter: done-event helpers and end-of-stream path
lib/llm/src/protocols/openai/responses/stream_converter.rs
append_pending_function_call_done_events finalizes tool calls sorted by output_index; completed_output builds the ordered output list; append_end_events delegates to the shared helper and emits response.completed.
ResponseStreamConverter: updated unit tests
lib/llm/src/protocols/openai/responses/stream_converter.rs
finish_chunk helper added; tests assert done events appear only after finish_reason, identity-only and deferred-delta cases, EOF fallback, multiple tool calls, ordering, and no duplicate done events.
Shared HTTP test harness: HarnessService, ScriptedChatEngine, SSE utils
lib/llm/tests/common/http_harness.rs, lib/llm/tests/common/scripted_chat_engine.rs
HarnessService starts/stops a local HttpService with a ScriptedChatEngine; ScriptedChatEngine queues immediate and gated-tail scripts; IncrementalSseParser, parse_json_sse, and canonicalize utilities support HTTP replay tests.
SSE replay fixtures and README
lib/llm/tests/data/replays/agent-harness/*
Four agent-harness SSE fixtures (text, fragmented-tool, parallel-tools, thinking-tool) and a README documenting their construction and coverage are added.
Anthropic HTTP replay integration tests
lib/llm/tests/anthropic_http_replay.rs
End-to-end tests covering unary and streaming text, fragmented tool arguments, finish-signal ordering (gated tail), parallel tools, tool-result round trip, and count_tokens.
Responses API HTTP replay integration tests
lib/llm/tests/responses_http_replay.rs
End-to-end tests covering unary and streaming text, fragmented tool arguments with early-completion guard, gated-tail finish-signal ordering, parallel function calls, and round-trip tool output.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main change: synthesizing ToolCalls finish_reason when streams end without one.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description includes all required sections, a clear summary, reviewer guidance, and a completed Related Issues choice.

Comment @coderabbitai help to get the list of available commands.

coderabbitai[bot]

This comment was marked as resolved.

@MatejKosec

Copy link
Copy Markdown
Contributor Author

Addressed the non-threaded CodeRabbit findings on the current head:

  1. The v2 EOF path now emits a finish-only ToolCalls chunk when a choice emitted tool calls earlier and parser.finish() has no additional output. It determines the finish reason after recording calls emitted by the EOF flush. Commit 34850187f6b also clears usage and LLM metrics copied from a usage-only template.
  2. ResponseStreamConverter now has mirrored coverage for FinishReason::FunctionCall as well as FinishReason::ToolCalls.
  3. The Responses round-trip replay now derives the follow-up call_id from the first response.
  4. I did not add duplicate Anthropic HTTP fixtures for EOF without a finish reason. The exact converter cases are already covered by test_tool_only_response_flushes_at_eof_without_finish_reason and test_text_only_response_stop_in_end_events; the DGH-967 chat-completions behavior is covered separately in the v2 and jail regressions. The existing HTTP replays retain their narrower role of checking cross-protocol conversion and finish-signal ordering.

The PR description now follows the repository template and identifies the stacked #10988 scope. All five inline review threads are resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@MatejKosec MatejKosec force-pushed the codex/dgh-967-tool-call-finish-reason branch from e984721 to bb41f0f Compare June 30, 2026 21:32
@MatejKosec MatejKosec force-pushed the codex/dgh-967-tool-call-finish-reason branch from bb41f0f to bb093c0 Compare June 30, 2026 21:34
@MatejKosec MatejKosec force-pushed the codex/dgh-967-tool-call-finish-reason branch from bb093c0 to 2404e76 Compare June 30, 2026 22:39
MatejKosec added 11 commits July 1, 2026 09:16
…ks one

When the engine emits a complete tool call but the SSE stream ends without any
finish_reason chunk (e.g. speculative decoding folded EOS into the content
chunks, or the terminal signal was dropped), strict OpenAI clients wait for a
non-null finish_reason before considering the tool call complete and hang
until their client-side timeout (Codex's 30s limit). This is DGH-967.

Both tool-call streaming paths had the gap:

- v2 parser path (tool_parser_v2::apply_stream): the end-of-stream backstop
  flushed pending tool calls with finish_reason: None. Now synthesizes
  ToolCalls for a choice that emitted tool calls; text-only output stays None.

- v1 jail path (JailedStream::fix_finish_reason): only rewrote Stop->ToolCalls
  when a finish_reason chunk arrived; if none arrived, finalize() emitted the
  tool call with the (absent) upstream reason and no ToolCalls was ever set.
  Added an end-of-stream backstop that emits a trailing ToolCalls chunk for
  choices that emitted tool calls but never received a finish_reason.

Adds regression tests on both paths: a tool-call stream ending without a
finish_reason must yield a terminal ToolCalls; a text-only stream must not get
a synthesized finish_reason.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
`filter(|(_, &has)| has)` is rejected under Rust 2024 binding modes
(cannot explicitly dereference within an implicitly-borrowing pattern).
Rewrite as `filter(|(_, has)| *has)`.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
DGH-967 requires the synthesized terminal finish_reason chunk to precede the
usage-only chunk (OpenAI stream ordering — the terminal finish_reason comes
before usage). The prior fix emitted it at stream end, after the usage chunk
had already passed through, violating that ordering on the jail path.

Restructure fix_finish_reason to detect the empty-choices/usage chunk as it
arrives and synthesize the missing ToolCalls chunks *before* yielding it; keep
an end-of-stream backstop for streams that never emit a usage chunk. Extract a
synthesize_tool_calls_chunk helper. Add a regression test mirroring the ticket
repro (tool call + usage-only chunk, no finish_reason) asserting the ToolCalls
chunk precedes the usage chunk.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Compile the jail-path synthesis helper and use valid Hermes payloads in its regression tests. Emit a terminal ToolCalls chunk on the v2 path when the parser already streamed the call and finish() has no additional output.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Update jail integration tests that omit an upstream finish_reason to assert the additional empty ToolCalls terminal chunk required by DGH-967.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Clear usage and LLM metrics copied from usage-only templates, stop replay fixtures before decoding [DONE], and describe the regression without internal ticket references.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Clear copied usage and metrics from v2 synthesized finish chunks. Cover the legacy FunctionCall finish reason and derive replay call IDs from the emitted response.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Do not consume the one-time synthesized-finish path when a heartbeat or comment arrives without response data. Cover a leading heartbeat before a tool-call and usage stream.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Share v2 unfinished-choice finalization between the usage-only path and EOF fallback. Assert one accounting-free ToolCalls terminal chunk appears before usage.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
@MatejKosec MatejKosec force-pushed the codex/dgh-967-tool-call-finish-reason branch from 2404e76 to ced9bb8 Compare July 1, 2026 07:16
@MatejKosec MatejKosec merged commit 076ce9d into main Jul 1, 2026
165 of 167 checks passed
@MatejKosec MatejKosec deleted the codex/dgh-967-tool-call-finish-reason branch July 1, 2026 09:22
keivenchang added a commit that referenced this pull request Jul 2, 2026
…IS-2296)

dynamo-parsers 3.1.0 (carrying the moved jail + #11045 finish_reason synthesis)
is published, so pin it from the registry and drop the local path patch. This
lifts the release gate on removing the jail from lib/llm.

Built + all jail-consuming tests pass against the published crate (registry
source): tool_choice (12), test_streaming_tool_parsers (37), test_reasoning_parser
(36), tool_choice_finish_reasons (6).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants