[bot] OpenAI Chat Completions tracer does not capture audio modality parameters or aggregate streaming audio data #117

@braintrust-bot

Summary

The OpenAI Chat Completions tracer does not capture the modalities and audio request parameters in metadata, and does not aggregate delta.audio from streaming chunks. When using GPT-4o audio models (e.g. gpt-4o-audio-preview) with audio output enabled, the trace is missing both the audio configuration and the audio response data in streaming mode.

The non-streaming path captures audio response data correctly because it passes through the full choices array as-is. The streaming aggregation drops it.

What is missing

1. Request parameters modalities and audio not captured in metadata

In trace/contrib/openai/chatcompletions.go (lines 51–76), the metadata fields list does not include modalities or audio:

metadataFields := []string{
    "model",
    "frequency_penalty",
    "logit_bias",
    "logprobs",
    "max_tokens",
    "max_completion_tokens",
    "n",
    "presence_penalty",
    "reasoning_effort",
    "response_format",
    "seed",
    "service_tier",
    "stop",
    "stream",
    "stream_options",
    "temperature",
    "top_p",
    "top_logprobs",
    "tools",
    "tool_choice",
    "parallel_tool_calls",
    "user",
    "functions",
    "function_call",
}

When a user enables audio output with modalities: ["text", "audio"] and configures audio: { voice: "alloy", format: "wav" }, these parameters are silently dropped from the span metadata. Users cannot see which voice or audio format was requested in their traces.
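A minimal sketch of the fix for this gap, assuming the tracer copies each listed field from the decoded request body into span metadata. The helper name extractMetadata and the abbreviated field list are illustrative assumptions, not the actual code in chatcompletions.go; the only proposed change is the two extra entries.

```go
package main

import "fmt"

// metadataFields is an abbreviated copy of the list quoted above, with the
// two proposed additions at the end. The real list has more entries.
var metadataFields = []string{
	"model",
	"stream",
	"temperature",
	"modalities", // proposed addition
	"audio",      // proposed addition
}

// extractMetadata is a hypothetical helper: it copies every listed field
// that is present in the decoded request body into a metadata map.
func extractMetadata(req map[string]any) map[string]any {
	meta := make(map[string]any)
	for _, field := range metadataFields {
		if v, ok := req[field]; ok {
			meta[field] = v
		}
	}
	return meta
}

func main() {
	req := map[string]any{
		"model":      "gpt-4o-audio-preview",
		"modalities": []string{"text", "audio"},
		"audio":      map[string]any{"voice": "alloy", "format": "wav"},
		"messages":   []any{}, // not a metadata field, so it is not copied
	}
	meta := extractMetadata(req)
	fmt.Println(meta["modalities"], meta["audio"])
}
```

With the two entries present, the requested voice and format would appear in span metadata alongside model and the other parameters.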

2. Streaming aggregation does not handle delta.audio

In postprocessStreamingResults() (lines 170–275), the delta processing handles three fields:

  • delta.role (line 191)
  • delta.content (line 202)
  • delta.tool_calls (lines 207–248)

It does not handle delta.audio, which contains audio response chunks (id, transcript, data, expires_at). When streaming audio responses, the audio data and transcript are silently dropped from the aggregated output.
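One possible shape for the missing aggregation is sketched below. The audioAccumulator type and its methods are hypothetical and do not exist in the repo. The sketch assumes that transcript fragments concatenate in arrival order and that each data fragment is independently valid base64; if upstream instead splits a single base64 stream at arbitrary offsets, the raw strings would need to be concatenated before decoding.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// audioAccumulator is a hypothetical type that collects delta.audio
// fragments for a single choice across streaming chunks.
type audioAccumulator struct {
	id         string
	expiresAt  any
	transcript strings.Builder
	data       []byte
	seen       bool
}

// add merges one delta.audio object (already decoded from JSON).
func (a *audioAccumulator) add(audio map[string]any) error {
	a.seen = true
	if id, ok := audio["id"].(string); ok && id != "" {
		a.id = id
	}
	if exp, ok := audio["expires_at"]; ok && exp != nil {
		a.expiresAt = exp
	}
	if t, ok := audio["transcript"].(string); ok {
		a.transcript.WriteString(t)
	}
	if d, ok := audio["data"].(string); ok && d != "" {
		// Assumption: each chunk decodes on its own as standard base64.
		raw, err := base64.StdEncoding.DecodeString(d)
		if err != nil {
			return fmt.Errorf("decode audio chunk: %w", err)
		}
		a.data = append(a.data, raw...)
	}
	return nil
}

// result returns a map shaped like the non-streaming message.audio object,
// or nil when no audio delta was seen.
func (a *audioAccumulator) result() map[string]any {
	if !a.seen {
		return nil
	}
	return map[string]any{
		"id":         a.id,
		"expires_at": a.expiresAt,
		"transcript": a.transcript.String(),
		"data":       base64.StdEncoding.EncodeToString(a.data),
	}
}

func main() {
	deltas := []map[string]any{
		{"audio": map[string]any{"id": "audio_123"}},
		{"audio": map[string]any{"transcript": "Hel", "data": "AQ=="}},
		{"audio": map[string]any{"transcript": "lo", "data": "Ag=="}},
		{"content": "a text-only delta is ignored by the accumulator"},
	}
	var acc audioAccumulator
	for _, delta := range deltas {
		if audio, ok := delta["audio"].(map[string]any); ok {
			if err := acc.add(audio); err != nil {
				fmt.Println("skipping chunk:", err)
			}
		}
	}
	fmt.Println(acc.result())
}
```

Emitting the same id, data, transcript, and expires_at keys as the non-streaming message.audio object would keep streamed and non-streamed traces comparable.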

3. Non-streaming path works correctly

handleChatCompletionResponse() (lines 290–328) passes the full choices array to braintrust.output_json, preserving message.audio data including id, data (base64 audio), transcript, and expires_at.

4. Audio token metrics are partially captured

The parseUsageTokens function handles *_tokens_details generically, so prompt_tokens_details.audio_tokens and completion_tokens_details.audio_tokens are already captured as prompt_audio_tokens and completion_audio_tokens respectively. The gap is limited to the request parameters and streaming content aggregation.
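To illustrate why audio token counts already come through, here is a sketch of generic *_tokens_details flattening. The function flattenTokenDetails is a hypothetical stand-in written for this issue; the real parseUsageTokens may be structured differently. It assumes the usage object was decoded with encoding/json, so numbers arrive as float64.

```go
package main

import (
	"fmt"
	"strings"
)

// flattenTokenDetails is a hypothetical stand-in for the generic handling
// described above: every numeric entry under "<prefix>_tokens_details"
// becomes a metric named "<prefix>_<entry>".
func flattenTokenDetails(usage map[string]any) map[string]float64 {
	metrics := make(map[string]float64)
	for key, value := range usage {
		if !strings.HasSuffix(key, "_tokens_details") {
			continue
		}
		details, ok := value.(map[string]any)
		if !ok {
			continue
		}
		prefix := strings.TrimSuffix(key, "_tokens_details")
		for name, n := range details {
			if f, ok := n.(float64); ok {
				metrics[prefix+"_"+name] = f
			}
		}
	}
	return metrics
}

func main() {
	usage := map[string]any{
		"prompt_tokens":             float64(20),
		"prompt_tokens_details":     map[string]any{"audio_tokens": float64(5)},
		"completion_tokens_details": map[string]any{"audio_tokens": float64(12)},
	}
	metrics := flattenTokenDetails(usage)
	fmt.Println(metrics["prompt_audio_tokens"], metrics["completion_audio_tokens"])
}
```

Because the flattening does not enumerate field names, audio_tokens needs no dedicated handling, which matches the behavior described above.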

5. Comparable integrations capture modality config

The Responses API tracer in this repo captures the full tools array (which includes built-in tools like web_search) and the reasoning config. Audio config for Chat Completions is an analogous modality configuration that controls output format but is not captured.

The Braintrust Gemini integration docs mention audio as a supported input modality. The OpenAI audio feature is a comparable generative surface.

Impact

  • Audio generation traces don't show which voice or format was requested
  • Streaming audio Chat Completions produce output with empty content and no audio data (the audio field with transcript and data is lost)
  • Users of GPT-4o-audio-preview and GPT-4o-mini-audio-preview lose audio observability in streaming mode
  • The openai-go SDK exposes ChatCompletionAudioParam (with Format and Voice) and ChatCompletionAudio (response) types, confirming this is a stable upstream feature

Braintrust docs status

Braintrust docs mention audio support for Gemini (multimodal content including audio transcription) but do not mention OpenAI audio modality specifically. Status: not_found for OpenAI Chat Completions audio instrumentation.

Local repo files inspected

  • trace/contrib/openai/chatcompletions.go: metadata fields (lines 51–76) omit modalities and audio; postprocessStreamingResults() (lines 170–275) handles delta.role, delta.content, and delta.tool_calls but not delta.audio
  • trace/contrib/openai/chatcompletions.go: handleChatCompletionResponse() (lines 290–328), the non-streaming path, passes the full choices array through correctly
  • trace/contrib/openai/traceopenai.go: parseUsageTokens() captures audio tokens through its generic *_tokens_details handling
  • trace/contrib/openai/traceopenai_test.go: no tests for audio modality
  • examples/internal/openai-v2/main.go: no audio example
