Commit e1a782b

fix(openai): stop streaming tool-call double-emission when autoparser is active (#10055)
Streaming /v1/chat/completions could emit the same logical tool call at multiple `index` values.

In processStreamWithTools the Go-side iterative parser (ParseXMLIterative / ParseJSONIterative) runs on every token and emits tool-call deltas, while the C++ chat-template autoparser delivers its own tool calls via ChatDeltas that are flushed at end-of-stream by ToolCallsFromChatDeltas -> buildDeferredToolCallChunks. With both paths active the same call is emitted twice at different indices, so OpenAI clients that accumulate tool calls by `index` dispatch the tool N times.

Skip the Go-side iterative parser once the autoparser is producing tool calls (hasChatDeltaToolCalls). The deferred flush stays guarded by lastEmittedCount, so the race where the Go parser emitted before the flag flipped also remains single-emission. Backends without an autoparser (e.g. vLLM) keep hasChatDeltaToolCalls=false and are unaffected.

Refs #9722

Signed-off-by: bozhouDev <259759010+bozhouDev@users.noreply.github.com>
Co-authored-by: bozhouDev <259759010+bozhouDev@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 73cfedc commit e1a782b
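
The client-side effect described in the commit message can be illustrated with a minimal sketch. It assumes a simplified delta shape: `ToolCallDelta`, `ToolCall`, and `accumulate` are hypothetical names invented for this illustration, not types from this repository or from any OpenAI SDK. It shows why one logical call that surfaces at two `index` values ends up as two dispatchable calls.

```go
package main

import "fmt"

// ToolCallDelta is a hypothetical, simplified stand-in for one streamed
// tool-call fragment. Real streaming chunks carry more fields.
type ToolCallDelta struct {
	Index     int
	Name      string // usually present only on the first fragment of an index
	Arguments string // argument text, appended across fragments
}

// ToolCall is the assembled call a client would dispatch.
type ToolCall struct {
	Name      string
	Arguments string
}

// accumulate merges fragments by Index, which is how clients typically
// rebuild tool calls from a stream. Each distinct index becomes one call.
func accumulate(deltas []ToolCallDelta) []ToolCall {
	byIndex := map[int]*ToolCall{}
	maxIndex := -1
	for _, d := range deltas {
		tc, ok := byIndex[d.Index]
		if !ok {
			tc = &ToolCall{}
			byIndex[d.Index] = tc
		}
		if d.Name != "" {
			tc.Name = d.Name
		}
		tc.Arguments += d.Arguments
		if d.Index > maxIndex {
			maxIndex = d.Index
		}
	}
	// Return calls in index order so the result is deterministic.
	calls := make([]ToolCall, 0, len(byIndex))
	for i := 0; i <= maxIndex; i++ {
		if tc, ok := byIndex[i]; ok {
			calls = append(calls, *tc)
		}
	}
	return calls
}

func main() {
	// One logical call, streamed incrementally at index 0 and then
	// flushed again as a whole at index 1 (the double emission).
	stream := []ToolCallDelta{
		{Index: 0, Name: "get_weather", Arguments: `{"city":`},
		{Index: 0, Arguments: `"Paris"}`},
		{Index: 1, Name: "get_weather", Arguments: `{"city":"Paris"}`},
	}
	fmt.Println("dispatched calls:", len(accumulate(stream))) // prints "dispatched calls: 2"
}
```

Because the client has no way to tell that index 1 repeats index 0, deduplication has to happen on the server side, which is what this commit does.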

1 file changed

Lines changed: 13 additions & 0 deletions

core/http/endpoints/openai/chat_stream_workers.go

@@ -341,6 +341,19 @@ func processStreamWithTools(
 			}
 		}
 
+		// Issue #9722: when the C++ autoparser is already producing tool
+		// calls (it delivers them via ChatDeltas, which are flushed at
+		// end-of-stream by ToolCallsFromChatDeltas -> buildDeferredToolCallChunks),
+		// skip the Go-side iterative parser below. Running both parsers makes
+		// the same logical tool call surface at multiple `index` values.
+		// The deferred flush is guarded by lastEmittedCount, so the race where
+		// the Go parser already emitted before this flag flipped also stays
+		// single-emission. Backends without an autoparser (e.g. vLLM) keep
+		// hasChatDeltaToolCalls=false and are unaffected.
+		if hasChatDeltaToolCalls {
+			return true
+		}
+
 		// Try incremental XML parsing for streaming support using iterative parser
 		// This allows emitting partial tool calls as they're being generated
 		cleanedResult := functions.CleanupLLMResult(result, cfg.FunctionsConfig)
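
The interaction between the early return and the lastEmittedCount guard can be modelled in a few lines. This is a sketch under stated assumptions, not the actual implementation: `streamState`, `onToken`, `flushDeferred`, and `simulate` are hypothetical, and the flush is assumed to emit only calls at positions at or beyond lastEmittedCount, which is one plausible reading of the comment in the diff.

```go
package main

import "fmt"

// streamState is a hypothetical model of the bookkeeping named in the diff.
// Field names follow the commit text; the logic around them is assumed.
type streamState struct {
	hasChatDeltaToolCalls bool     // autoparser has started producing tool calls
	lastEmittedCount      int      // number of calls already sent to the client
	emitted               []string // what the client received, in order
}

// onToken models the per-token step. With the flag set, the iterative
// parser is skipped entirely (the early return added by this commit).
func (s *streamState) onToken(parsed []string) {
	if s.hasChatDeltaToolCalls {
		return
	}
	for i := s.lastEmittedCount; i < len(parsed); i++ {
		s.emitted = append(s.emitted, parsed[i])
		s.lastEmittedCount++
	}
}

// flushDeferred models the end-of-stream flush. Assumption: it only sends
// calls beyond lastEmittedCount, so already-streamed calls are not repeated.
func (s *streamState) flushDeferred(autoparsed []string) {
	for i := s.lastEmittedCount; i < len(autoparsed); i++ {
		s.emitted = append(s.emitted, autoparsed[i])
		s.lastEmittedCount++
	}
}

// simulate runs one stream containing a single logical tool call.
// activeFromStart=false reproduces the race where the Go parser emits
// before the flag flips.
func simulate(activeFromStart bool) []string {
	s := &streamState{hasChatDeltaToolCalls: activeFromStart}
	call := []string{"get_weather"}
	s.onToken(call)                // emits only if the flag is not yet set
	s.hasChatDeltaToolCalls = true // autoparser starts producing
	s.onToken(call)                // skipped by the early return
	s.flushDeferred(call)          // emits only what was not sent before
	return s.emitted
}

func main() {
	fmt.Println(len(simulate(true)), len(simulate(false))) // prints "1 1"
}
```

In both orderings the client receives the call once: either the flush delivers it (flag set from the start) or the iterative parser does and the flush finds nothing left to send.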
