fix(toolcalling): synthesize tool_calls finish_reason when stream lacks one by MatejKosec · Pull Request #11045 · ai-dynamo/dynamo

MatejKosec · 2026-06-29T16:09:36Z

Overview:

A streamed OpenAI chat completion can contain a complete tool call and then end without any finish_reason chunk. Strict clients wait for a non-null finish reason before treating the call as complete, which produced the 30-second timeout reported in DGH-967. This PR makes both chat-completions tool-call paths emit finish_reason: "tool_calls" when tool calls were emitted but the upstream stream omitted its terminal reason. Text-only streams do not receive an invented finish reason.

This PR follows merged PR #10988. Its diff now contains only the DGH-967 chat-completions follow-up.

Details:

JailedStream::fix_finish_reason now emits a synthetic ToolCalls chunk before an empty-choices usage chunk. An end-of-stream path covers streams without a usage chunk. Synthetic terminal chunks do not copy usage or internal LLM metrics from their template.
tool_parser_v2::apply_stream now emits a finish-only terminal chunk before a usage-only response for every unterminated choice that previously emitted tool calls. End of stream remains the fallback when no usage response arrives, and synthetic chunks clear copied usage and metrics.
Regression coverage exercises missing finish reasons, text-only streams, usage-chunk ordering, both ToolCalls and legacy FunctionCall finish reasons, and the affected HTTP replay paths.

Where should the reviewer start?

Start with JailedStream::fix_finish_reason in lib/llm/src/protocols/openai/chat_completions/jail.rs, then compare the v2 terminal-finalization path in lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs. The focused regressions are adjacent to those implementations and in lib/llm/tests/test_jail.rs.

Related Issues

🚫 This PR is NOT linked to a GitHub issue:

Confirmed — no related issue
Internal tracker: DGH-967
Follow-on to fix(toolcalling): publish fragmented tool calls on finish instead of end-of-stream #10988

Validation

The required GitHub checks on the latest commit are the authoritative validation record. The affected coverage is GPU-free and runs within the existing dynamo-llm Rust test jobs.

github-actions · 2026-06-29T16:12:23Z

🌿 Fern Docs Preview: https://nvidia-preview-69b37bb1-5afa-4e2a-bad0-aa9e6072d69a.docs.buildwithfern.com/dynamo/dev

coderabbitai · 2026-06-29T16:20:16Z

Walkthrough

Three stream converters (Anthropic, OpenAI chat-completions jail/tool_parser_v2, Responses) are refactored to buffer tool-call blocks and emit them only when a terminal finish_reason is observed, with EOF fallbacks. Synthetic finish_reason=ToolCalls backstops are added in jail.rs and tool_parser_v2.rs. A shared HTTP test harness, scripted chat engine, SSE fixtures, and end-to-end HTTP replay tests are introduced.

Changes

Tool-call finish-reason buffering across protocol converters

Layer / File(s)	Summary
AnthropicStreamConverter: ToolCallState and buffering helpers `lib/llm/src/protocols/anthropic/stream_converter.rs`	`ToolCallState` is restructured with `argument_fragments` and `is_emit_ready()`; `record_tool_call`, `drain_buffered_tool_events`, and `append_buffered_tool_events` are added to accumulate and flush tool blocks. `tool_blocks_flushed` field added to converter state.
AnthropicStreamConverter: finish-reason flush and EOF fallback `lib/llm/src/protocols/anthropic/stream_converter.rs`	`append_chunk_events` detects terminal `finish_reason` to set `should_flush_tool_blocks` and records tool calls; `append_end_events` uses `drain_buffered_tool_events` as EOF fallback; test-only paths mirror production logic.
AnthropicStreamConverter: updated unit tests `lib/llm/src/protocols/anthropic/stream_converter.rs`	`finish_chunk` helper nulls `tool_calls` in delta; all test assertions updated for buffered-flush semantics covering text/tool ordering, fragmented args, incomplete identity, deduplication, and parallel flushing.
jail.rs: synthetic ToolCalls finish-reason backstop `lib/llm/src/protocols/openai/chat_completions/jail.rs`	`fix_finish_reason` tracks `terminated` choices and stores a `template` response; at stream end it synthesizes a trailing chunk with `finish_reason=ToolCalls` for choices that emitted tool calls without a terminal finish reason. Tests cover synthesis and text-only absence.
tool_parser_v2.rs: per-choice finish_reason in backstop flush `lib/llm/src/protocols/openai/chat_completions/tool_parser_v2.rs`	The end-of-stream backstop now sets `finish_reason=ToolCalls` per choice when `tool_emitted` contains that index. Two new tests cover synthesized and absent finish_reason.
ResponseStreamConverter: FunctionCallState and identity/delta queuing `lib/llm/src/protocols/openai/responses/stream_converter.rs`	`FunctionCallState` gains `pending_arg_deltas` and optional `output_index`; `has_identity()` gates `output_item.added` and deferred delta emission; `should_finish_function_calls` controls when done events fire.
ResponseStreamConverter: done-event helpers and end-of-stream path `lib/llm/src/protocols/openai/responses/stream_converter.rs`	`append_pending_function_call_done_events` finalizes tool calls sorted by `output_index`; `completed_output` builds the ordered output list; `append_end_events` delegates to the shared helper and emits `response.completed`.
ResponseStreamConverter: updated unit tests `lib/llm/src/protocols/openai/responses/stream_converter.rs`	`finish_chunk` helper added; tests assert done events appear only after `finish_reason`, identity-only and deferred-delta cases, EOF fallback, multiple tool calls, ordering, and no duplicate done events.
Shared HTTP test harness: HarnessService, ScriptedChatEngine, SSE utils `lib/llm/tests/common/http_harness.rs`, `lib/llm/tests/common/scripted_chat_engine.rs`	`HarnessService` starts/stops a local `HttpService` with a `ScriptedChatEngine`; `ScriptedChatEngine` queues immediate and gated-tail scripts; `IncrementalSseParser`, `parse_json_sse`, and `canonicalize` utilities support HTTP replay tests.
SSE replay fixtures and README `lib/llm/tests/data/replays/agent-harness/*`	Four agent-harness SSE fixtures (`text`, `fragmented-tool`, `parallel-tools`, `thinking-tool`) and a README documenting their construction and coverage are added.
Anthropic HTTP replay integration tests `lib/llm/tests/anthropic_http_replay.rs`	End-to-end tests covering unary and streaming text, fragmented tool arguments, finish-signal ordering (gated tail), parallel tools, tool-result round trip, and `count_tokens`.
Responses API HTTP replay integration tests `lib/llm/tests/responses_http_replay.rs`	End-to-end tests covering unary and streaming text, fragmented tool arguments with early-completion guard, gated-tail finish-signal ordering, parallel function calls, and round-trip tool output.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly matches the main change: synthesizing ToolCalls finish_reason when streams end without one.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The description includes all required sections, a clear summary, reviewer guidance, and a completed Related Issues choice.

_{Comment @coderabbitai help to get the list of available commands.}

MatejKosec · 2026-06-30T07:28:46Z

Addressed the non-threaded CodeRabbit findings on the current head:

The v2 EOF path now emits a finish-only ToolCalls chunk when a choice emitted tool calls earlier and parser.finish() has no additional output. It determines the finish reason after recording calls emitted by the EOF flush. Commit 34850187f6b also clears usage and LLM metrics copied from a usage-only template.
ResponseStreamConverter now has mirrored coverage for FinishReason::FunctionCall as well as FinishReason::ToolCalls.
The Responses round-trip replay now derives the follow-up call_id from the first response.
I did not add duplicate Anthropic HTTP fixtures for EOF without a finish reason. The exact converter cases are already covered by test_tool_only_response_flushes_at_eof_without_finish_reason and test_text_only_response_stop_in_end_events; the DGH-967 chat-completions behavior is covered separately in the v2 and jail regressions. The existing HTTP replays retain their narrower role of checking cross-protocol conversion and finish-signal ordering.

The PR description now follows the repository template and identifies the stacked #10988 scope. All five inline review threads are resolved.

…ks one When the engine emits a complete tool call but the SSE stream ends without any finish_reason chunk (e.g. speculative decoding folded EOS into the content chunks, or the terminal signal was dropped), strict OpenAI clients wait for a non-null finish_reason before considering the tool call complete and hang until their client-side timeout (Codex's 30s limit). This is DGH-967. Both tool-call streaming paths had the gap: - v2 parser path (tool_parser_v2::apply_stream): the end-of-stream backstop flushed pending tool calls with finish_reason: None. Now synthesizes ToolCalls for a choice that emitted tool calls; text-only output stays None. - v1 jail path (JailedStream::fix_finish_reason): only rewrote Stop->ToolCalls when a finish_reason chunk arrived; if none arrived, finalize() emitted the tool call with the (absent) upstream reason and no ToolCalls was ever set. Added an end-of-stream backstop that emits a trailing ToolCalls chunk for choices that emitted tool calls but never received a finish_reason. Adds regression tests on both paths: a tool-call stream ending without a finish_reason must yield a terminal ToolCalls; a text-only stream must not get a synthesized finish_reason. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

`filter(|(_, &has)| has)` is rejected under Rust 2024 binding modes (cannot explicitly dereference within an implicitly-borrowing pattern). Rewrite as `filter(|(_, has)| *has)`. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

DGH-967 requires the synthesized terminal finish_reason chunk to precede the usage-only chunk (OpenAI stream ordering — the terminal finish_reason comes before usage). The prior fix emitted it at stream end, after the usage chunk had already passed through, violating that ordering on the jail path. Restructure fix_finish_reason to detect the empty-choices/usage chunk as it arrives and synthesize the missing ToolCalls chunks *before* yielding it; keep an end-of-stream backstop for streams that never emit a usage chunk. Extract a synthesize_tool_calls_chunk helper. Add a regression test mirroring the ticket repro (tool call + usage-only chunk, no finish_reason) asserting the ToolCalls chunk precedes the usage chunk. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

Compile the jail-path synthesis helper and use valid Hermes payloads in its regression tests. Emit a terminal ToolCalls chunk on the v2 path when the parser already streamed the call and finish() has no additional output. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

Update jail integration tests that omit an upstream finish_reason to assert the additional empty ToolCalls terminal chunk required by DGH-967. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

Clear usage and LLM metrics copied from usage-only templates, stop replay fixtures before decoding [DONE], and describe the regression without internal ticket references. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

Clear copied usage and metrics from v2 synthesized finish chunks. Cover the legacy FunctionCall finish reason and derive replay call IDs from the emitted response. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

Do not consume the one-time synthesized-finish path when a heartbeat or comment arrives without response data. Cover a leading heartbeat before a tool-call and usage stream. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

Share v2 unfinished-choice finalization between the usage-only path and EOF fallback. Assert one accounting-free ToolCalls terminal chunk appears before usage. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

…IS-2296) dynamo-parsers 3.1.0 (carrying the moved jail + #11045 finish_reason synthesis) is published, so pin it from the registry and drop the local path patch. This lifts the release gate on removing the jail from lib/llm. Built + all jail-consuming tests pass against the published crate (registry source): tool_choice (12), test_streaming_tool_parsers (37), test_reasoning_parser (36), tool_choice_finish_reasons (6).

MatejKosec requested a review from a team as a code owner June 29, 2026 16:09

MatejKosec temporarily deployed to external_collaborator June 29, 2026 16:09 — with GitHub Actions Inactive

pull-request-size Bot added the size/XXL label Jun 29, 2026

github-actions Bot added fix documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Jun 29, 2026

This comment has been minimized.

Sign in to view

This comment was marked as resolved.

Sign in to view

MatejKosec temporarily deployed to external_collaborator June 29, 2026 16:28 — with GitHub Actions Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 29, 2026 16:28 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 29, 2026 16:33 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 29, 2026 16:54 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 29, 2026 20:03 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 29, 2026 20:21 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 06:52 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 06:53 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 07:03 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 07:12 Inactive

MatejKosec requested review from ishandhanani, keivenchang and rmccorm4 June 30, 2026 07:15

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 07:27 Inactive

This comment was marked as resolved.

Sign in to view

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 07:40 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 07:46 Inactive

This comment was marked as resolved.

Sign in to view

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 07:59 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 19:01 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 20:10 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 20:16 Inactive

MatejKosec force-pushed the codex/dgh-967-tool-call-finish-reason branch from e984721 to bb41f0f Compare June 30, 2026 21:32

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 21:32 Inactive

MatejKosec force-pushed the codex/dgh-967-tool-call-finish-reason branch from bb41f0f to bb093c0 Compare June 30, 2026 21:34

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 21:34 Inactive

MatejKosec force-pushed the codex/dgh-967-tool-call-finish-reason branch from bb093c0 to 2404e76 Compare June 30, 2026 22:39

copy-pr-bot Bot temporarily deployed to GITLAB June 30, 2026 22:39 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB July 1, 2026 00:07 Inactive

MatejKosec added 11 commits July 1, 2026 09:16

test(toolcalling): expect synthesized finish chunks

9cb6f6a

Update jail integration tests that omit an upstream finish_reason to assert the additional empty ToolCalls terminal chunk required by DGH-967. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

fix(toolcalling): finalize terminal stream handling

b1db135

Clear usage and LLM metrics copied from usage-only templates, stop replay fixtures before decoding [DONE], and describe the regression without internal ticket references. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

fix(toolcalling): keep terminal chunks accounting-free

f26a813

Clear copied usage and metrics from v2 synthesized finish chunks. Cover the legacy FunctionCall finish reason and derive replay call IDs from the emitted response. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

fix(toolcalling): ignore non-data events for finish synthesis

29307dd

Do not consume the one-time synthesized-finish path when a heartbeat or comment arrives without response data. Cover a leading heartbeat before a tool-call and usage stream. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

style(toolcalling): collapse finish synthesis guard

b543730

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

fix(toolcalling): emit v2 finish before usage

0caa602

Share v2 unfinished-choice finalization between the usage-only path and EOF fallback. Assert one accounting-free ToolCalls terminal chunk appears before usage. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

fix(toolcalling): address terminal stream review findings

ced9bb8

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

MatejKosec force-pushed the codex/dgh-967-tool-call-finish-reason branch from 2404e76 to ced9bb8 Compare July 1, 2026 07:16

copy-pr-bot Bot temporarily deployed to GITLAB July 1, 2026 07:17 Inactive

MatejKosec merged commit 076ce9d into main Jul 1, 2026
165 of 167 checks passed

MatejKosec deleted the codex/dgh-967-tool-call-finish-reason branch July 1, 2026 09:22

keivenchang mentioned this pull request Jul 2, 2026

feat(parsers): move streaming tool-call jail into dynamo-parsers (DIS-2296, part 1) ai-dynamo/frontend-crates#92

Merged

keivenchang mentioned this pull request Jul 2, 2026

refactor(llm): remove tool-call jail, depend on dynamo-parsers #11112

Merged

rmccorm4 mentioned this pull request Jul 3, 2026

refactor(llm): remove tool-call jail, depend on dynamo-parsers #11204

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(toolcalling): synthesize tool_calls finish_reason when stream lacks one#11045

fix(toolcalling): synthesize tool_calls finish_reason when stream lacks one#11045
MatejKosec merged 11 commits into
mainfrom
codex/dgh-967-tool-call-finish-reason

MatejKosec commented Jun 29, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

github-actions Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

MatejKosec commented Jun 30, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

MatejKosec commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues

Validation

Uh oh!

This comment has been minimized.

github-actions Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

This comment was marked as resolved.

Uh oh!

MatejKosec commented Jun 30, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MatejKosec commented Jun 29, 2026 •

edited

Loading

github-actions Bot commented Jun 29, 2026 •

edited

Loading

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading