[bot] OpenAI Chat Completions streaming aggregation corrupts output when n > 1 (multiple choices) #121

@braintrust-bot

Summary

The OpenAI Chat Completions streaming aggregator does not track choice indices from streamed chunks. When n > 1 is set (requesting multiple completions), deltas from all choices are concatenated into a single output entry, producing garbled content. The non-streaming path correctly preserves all choices separately.

What is missing

1. Streaming aggregation uses a single accumulator for all choice indices

In trace/contrib/openai/chatcompletions.go, postprocessStreamingResults() (lines 170–275) maintains one set of accumulators for all chunks:

var role *string
var content string
var toolCalls []interface{}
var finishReason interface{}

Each streaming chunk contains a choices array, typically with a single element, and each element's index field identifies which logical choice its delta belongs to. The code processes choices[0] from every chunk (line 183) without reading that index field, so deltas for every choice land in the same accumulators:

if choiceMap, ok := choices[0].(map[string]any); ok {
    delta, ok := choiceMap["delta"].(map[string]any)
    // ...
    if deltaContent, ok := delta["content"].(string); ok {
        content += deltaContent  // all choice indices mixed together
    }
    // ...
}

When n=2, chunks for the two choices can arrive interleaved, for example:

choices: [{"index": 0, "delta": {"content": "Hello"}}]
choices: [{"index": 1, "delta": {"content": "Goodbye"}}]
choices: [{"index": 0, "delta": {"content": " world"}}]
choices: [{"index": 1, "delta": {"content": " world"}}]

The aggregated content becomes "HelloGoodbye world world" — garbled output from both choices concatenated.
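One possible direction for a fix is to key the accumulators by each choice's index. The following is a minimal sketch, not the repo's code: aggregateContentByIndex, choiceIndex, and chunk are hypothetical names, and it assumes chunks are decoded into map[string]any with choices as []any, as the quoted type assertions suggest. It covers only the content field.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// chunk is a hypothetical fixture helper that builds one streamed chunk in
// the shape shown above. "index" is a float64 because that is what
// encoding/json produces when decoding into map[string]any.
func chunk(index int, content string) map[string]any {
	return map[string]any{
		"choices": []any{
			map[string]any{
				"index": float64(index),
				"delta": map[string]any{"content": content},
			},
		},
	}
}

// choiceIndex reads the "index" field of a choice. It accepts float64 (JSON
// decoding) as well as int and int64 (hand-built maps) and falls back to 0
// when the field is absent. The fallback is an assumption of this sketch.
func choiceIndex(choice map[string]any) int {
	switch v := choice["index"].(type) {
	case float64:
		return int(v)
	case int:
		return v
	case int64:
		return int(v)
	}
	return 0
}

// aggregateContentByIndex (hypothetical) concatenates content deltas per
// choice index instead of appending everything to a single string.
func aggregateContentByIndex(chunks []map[string]any) map[int]string {
	builders := map[int]*strings.Builder{}
	for _, ch := range chunks {
		choices, ok := ch["choices"].([]any)
		if !ok {
			continue
		}
		// Visit every element rather than assuming choices[0].
		for _, c := range choices {
			choiceMap, ok := c.(map[string]any)
			if !ok {
				continue
			}
			delta, ok := choiceMap["delta"].(map[string]any)
			if !ok {
				continue
			}
			text, ok := delta["content"].(string)
			if !ok {
				continue
			}
			idx := choiceIndex(choiceMap)
			if builders[idx] == nil {
				builders[idx] = &strings.Builder{}
			}
			builders[idx].WriteString(text)
		}
	}
	out := make(map[int]string, len(builders))
	for idx, b := range builders {
		out[idx] = b.String()
	}
	return out
}

func main() {
	got := aggregateContentByIndex([]map[string]any{
		chunk(0, "Hello"),
		chunk(1, "Goodbye"),
		chunk(0, " world"),
		chunk(1, " world"),
	})
	indices := make([]int, 0, len(got))
	for idx := range got {
		indices = append(indices, idx)
	}
	sort.Ints(indices)
	for _, idx := range indices {
		fmt.Printf("choice %d: %q\n", idx, got[idx])
	}
}
```

With the interleaved chunks from the example, this keeps "Hello world" under index 0 and "Goodbye world" under index 1. Role, tool_calls, and finish_reason would need the same per-index treatment.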

2. Output hardcodes index 0

The final output (lines 263–274) always returns a single choice with hardcoded "index": 0:

return []map[string]interface{}{
    {
        "index": 0,
        "message": map[string]interface{}{
            "role":       finalRole,
            "content":    content,        // garbled when n > 1
            "tool_calls": finalToolCalls, // also garbled when n > 1
        },
        "logprobs":      nil,
        "finish_reason": finishReason,
    },
}
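If the accumulators were tracked per choice, the return value could be built by emitting one entry per index in ascending order. The sketch below assumes a hypothetical choiceState type and buildChoices function. Only the output keys are taken from the snippet above. How the repo derives finalRole and finalToolCalls is not shown in this issue, so a missing role or an empty tool-call list is simply emitted as nil here.

```go
package main

import (
	"fmt"
	"sort"
)

// choiceState is a hypothetical per-choice accumulator. Its fields mirror the
// shared variables quoted earlier (role, content, toolCalls, finishReason).
type choiceState struct {
	role         *string
	content      string
	toolCalls    []interface{}
	finishReason interface{}
}

// buildChoices (hypothetical) converts per-index state into the output shape
// shown above, with one entry per choice and the real index preserved.
func buildChoices(states map[int]*choiceState) []map[string]interface{} {
	// Map iteration order is random in Go, so sort the indices first.
	indices := make([]int, 0, len(states))
	for idx := range states {
		indices = append(indices, idx)
	}
	sort.Ints(indices)

	out := make([]map[string]interface{}, 0, len(indices))
	for _, idx := range indices {
		st := states[idx]

		// Assumption: emit nil when no role delta was seen for this choice.
		var role interface{}
		if st.role != nil {
			role = *st.role
		}
		// Assumption: emit nil rather than an empty slice when there are no tool calls.
		var toolCalls interface{}
		if len(st.toolCalls) > 0 {
			toolCalls = st.toolCalls
		}

		out = append(out, map[string]interface{}{
			"index": idx,
			"message": map[string]interface{}{
				"role":       role,
				"content":    st.content,
				"tool_calls": toolCalls,
			},
			"logprobs":      nil,
			"finish_reason": st.finishReason,
		})
	}
	return out
}

func main() {
	role := "assistant"
	choices := buildChoices(map[int]*choiceState{
		1: {role: &role, content: "Goodbye world", finishReason: "stop"},
		0: {role: &role, content: "Hello world", finishReason: "stop"},
	})
	for _, c := range choices {
		msg := c["message"].(map[string]interface{})
		fmt.Printf("index=%v content=%q finish_reason=%v\n", c["index"], msg["content"], c["finish_reason"])
	}
}
```

Sorting by index keeps the streaming output in the same order as the choices array that the non-streaming path passes through.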

3. Non-streaming path works correctly

handleChatCompletionResponse() (lines 290–328) passes the full choices array from the API response as-is, correctly preserving all n choices with their separate content, tool_calls, and finish_reason.

4. The n parameter is captured in request metadata

The metadata fields list (line 58) includes "n", confirming this is a recognized request parameter. The streaming output should faithfully represent the response for all supported request configurations.

Braintrust docs status

Braintrust docs describe capturing "inputs, outputs, model parameters, token usage, and costs" for LLM calls but do not specifically address multi-choice (n > 1) output handling. Status: unclear.

Upstream sources

  • OpenAI Chat Completions API reference: https://platform.openai.com/docs/api-reference/chat/create — documents n parameter: "How many chat completion choices to generate for each input message"
  • OpenAI streaming format: each SSE chunk contains choices[].index identifying which completion the delta belongs to
  • openai-go SDK v1.12.0 (used by this repo) supports N parameter in ChatCompletionNewParams

Braintrust docs sources

Local repo files inspected

  • trace/contrib/openai/chatcompletions.go, postprocessStreamingResults() lines 170–275: single accumulator set for all choices, choices[0] processed without checking index field; output hardcodes "index": 0 at line 265
  • trace/contrib/openai/chatcompletions.go, handleChatCompletionResponse() lines 290–328: non-streaming path correctly passes all choices as-is
  • trace/contrib/openai/chatcompletions.go — metadata fields line 58: "n" is captured in request metadata
  • trace/contrib/openai/traceopenai_test.go — no streaming test with n > 1
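
A regression check for the missing n > 1 streaming case could look roughly like the sketch below. It is not wired to the repo's test helpers. streamChunk, interleavedFixture, checkChoices, and singleAccumulator are all hypothetical names, and singleAccumulator only imitates the behavior described in this issue so that the check can be seen to fail.

```go
package main

import "fmt"

// streamChunk is a hypothetical fixture helper that builds one chunk in the
// shape shown in this issue.
func streamChunk(index int, content string) map[string]any {
	return map[string]any{
		"choices": []any{
			map[string]any{
				"index": float64(index), // encoding/json decodes numbers as float64
				"delta": map[string]any{"content": content},
			},
		},
	}
}

// interleavedFixture returns the n=2 chunk sequence from the example above.
func interleavedFixture() []map[string]any {
	return []map[string]any{
		streamChunk(0, "Hello"),
		streamChunk(1, "Goodbye"),
		streamChunk(0, " world"),
		streamChunk(1, " world"),
	}
}

// checkChoices verifies that the aggregated output has one entry per choice
// and that each entry's content was not mixed with the other choice.
func checkChoices(got []map[string]interface{}) error {
	want := []string{"Hello world", "Goodbye world"}
	if len(got) != len(want) {
		return fmt.Errorf("got %d choices, want %d", len(got), len(want))
	}
	for i, w := range want {
		msg, _ := got[i]["message"].(map[string]interface{})
		if c, _ := msg["content"].(string); c != w {
			return fmt.Errorf("choice %d content = %q, want %q", i, c, w)
		}
	}
	return nil
}

// singleAccumulator imitates the reported behavior: it reads choices[0] from
// every chunk and appends to one shared string, ignoring the index field.
func singleAccumulator(chunks []map[string]any) []map[string]interface{} {
	var content string
	for _, chunk := range chunks {
		choices, ok := chunk["choices"].([]any)
		if !ok || len(choices) == 0 {
			continue
		}
		if choiceMap, ok := choices[0].(map[string]any); ok {
			if delta, ok := choiceMap["delta"].(map[string]any); ok {
				if s, ok := delta["content"].(string); ok {
					content += s
				}
			}
		}
	}
	return []map[string]interface{}{{
		"index":   0,
		"message": map[string]interface{}{"content": content},
	}}
}

func main() {
	// The single-accumulator imitation fails the check, reproducing the bug.
	fmt.Println(checkChoices(singleAccumulator(interleavedFixture())))
}
```

In the repo, the same fixture and assertions would presumably be driven through the real streaming path in traceopenai_test.go rather than through a stand-in aggregator.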
