Summary
The OpenAI Chat Completions tracer does not capture the modalities and audio request parameters in metadata, and does not aggregate delta.audio from streaming chunks. When using GPT-4o audio models (e.g. gpt-4o-audio-preview) with audio output enabled, the trace is missing both the audio configuration and the audio response data in streaming mode.
The non-streaming path captures audio response data correctly because it passes through the full choices array as-is. The streaming aggregation drops it.
What is missing
1. Request parameters modalities and audio not captured in metadata
In trace/contrib/openai/chatcompletions.go (lines 51–76), the metadata fields list does not include modalities or audio:
metadataFields := []string{
	"model",
	"frequency_penalty",
	"logit_bias",
	"logprobs",
	"max_tokens",
	"max_completion_tokens",
	"n",
	"presence_penalty",
	"reasoning_effort",
	"response_format",
	"seed",
	"service_tier",
	"stop",
	"stream",
	"stream_options",
	"temperature",
	"top_p",
	"top_logprobs",
	"tools",
	"tool_choice",
	"parallel_tool_calls",
	"user",
	"functions",
	"function_call",
}
When a user enables audio output with modalities: ["text", "audio"] and configures audio: { voice: "alloy", format: "wav" }, these parameters are silently dropped from the span metadata. Users cannot see which voice or audio format was requested in their traces.
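A minimal sketch of what capturing these two parameters could look like, assuming the tracer copies allowlisted top-level keys out of the request body. The helper extractMetadata and the shortened allowlist are hypothetical and only illustrate the idea; they are not the tracer's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadataFields is a shortened, hypothetical allowlist. "modalities" and
// "audio" are the two entries this issue proposes adding.
var metadataFields = []string{"model", "stream", "modalities", "audio"}

// extractMetadata is a hypothetical helper: it copies allowlisted top-level
// keys from a JSON request body into a metadata map and ignores the rest.
func extractMetadata(body []byte, fields []string) (map[string]any, error) {
	var req map[string]any
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	meta := make(map[string]any)
	for _, f := range fields {
		if v, ok := req[f]; ok {
			meta[f] = v
		}
	}
	return meta, nil
}

func main() {
	body := []byte(`{"model":"gpt-4o-audio-preview","modalities":["text","audio"],"audio":{"voice":"alloy","format":"wav"},"messages":[]}`)
	meta, err := extractMetadata(body, metadataFields)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(meta)
	fmt.Println(string(out))
}
```

With both keys in the allowlist, the voice and format end up in the metadata map alongside the model, while messages is still excluded.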
2. Streaming aggregation does not handle delta.audio
In postprocessStreamingResults() (lines 170–275), the delta processing handles three fields:
- delta.role (line 191)
- delta.content (line 202)
- delta.tool_calls (lines 207–248)
It does not handle delta.audio, which contains audio response chunks (id, transcript, data, expires_at). When streaming audio responses, the audio data and transcript are silently dropped from the aggregated output.
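A rough sketch of how delta.audio chunks could be merged per choice. The types and the aggregateAudio function are hypothetical, not taken from the tracer or the SDK. It also assumes each data fragment is independently valid base64, which should be verified against real streaming responses before relying on it.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// audioDelta mirrors the delta.audio fields named above. The struct is
// hypothetical and only serves this sketch.
type audioDelta struct {
	ID         string
	Transcript string
	Data       string // base64-encoded audio fragment (assumed self-contained)
	ExpiresAt  int64
}

// aggregatedAudio is the hypothetical merged result for one choice.
type aggregatedAudio struct {
	ID         string
	Transcript string
	Data       string // base64 of all decoded fragments, in arrival order
	ExpiresAt  int64
}

// aggregateAudio keeps the last non-empty id and expires_at, concatenates
// transcript pieces, and joins audio bytes after decoding each fragment.
func aggregateAudio(deltas []audioDelta) (aggregatedAudio, error) {
	var agg aggregatedAudio
	var transcript strings.Builder
	var raw []byte
	for _, d := range deltas {
		if d.ID != "" {
			agg.ID = d.ID
		}
		if d.ExpiresAt != 0 {
			agg.ExpiresAt = d.ExpiresAt
		}
		transcript.WriteString(d.Transcript)
		if d.Data != "" {
			b, err := base64.StdEncoding.DecodeString(d.Data)
			if err != nil {
				return aggregatedAudio{}, fmt.Errorf("decode audio chunk: %w", err)
			}
			raw = append(raw, b...)
		}
	}
	agg.Transcript = transcript.String()
	agg.Data = base64.StdEncoding.EncodeToString(raw)
	return agg, nil
}

func main() {
	// Sample values are made up for illustration.
	agg, err := aggregateAudio([]audioDelta{
		{ID: "audio_123"},
		{Transcript: "Hi ", Data: "aGVs"},
		{Transcript: "there", Data: "bG8=", ExpiresAt: 1700000000},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", agg)
}
```

Decoding each fragment before joining avoids producing invalid base64 when individual chunks carry their own padding. If the API instead splits one continuous base64 stream, plain string concatenation would be the right merge.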
3. Non-streaming path works correctly
handleChatCompletionResponse() (lines 290–328) passes the full choices array to braintrust.output_json, preserving message.audio data including id, data (base64 audio), transcript, and expires_at.
4. Audio token metrics are partially captured
The parseUsageTokens function handles *_tokens_details generically, so prompt_tokens_details.audio_tokens and completion_tokens_details.audio_tokens are captured as prompt_audio_tokens and completion_audio_tokens respectively. The gap is only in the request parameters and streaming content aggregation.
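The naming described above can be sketched as follows. flattenUsage is a hypothetical stand-in, not the real parseUsageTokens, which may rename or filter keys differently. It only shows how a *_tokens_details object could map to prefixed metric names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// flattenUsage is a hypothetical illustration of generic *_tokens_details
// handling: nested numeric fields become "<prefix>_<field>" metrics, e.g.
// prompt_tokens_details.audio_tokens -> prompt_audio_tokens.
func flattenUsage(usage map[string]any) map[string]float64 {
	metrics := make(map[string]float64)
	for key, val := range usage {
		switch v := val.(type) {
		case float64:
			// Top-level counters are kept under their original key here.
			metrics[key] = v
		case map[string]any:
			if !strings.HasSuffix(key, "_tokens_details") {
				continue
			}
			prefix := strings.TrimSuffix(key, "_tokens_details")
			for sub, subVal := range v {
				if n, ok := subVal.(float64); ok {
					metrics[prefix+"_"+sub] = n
				}
			}
		}
	}
	return metrics
}

func main() {
	// Sample usage payload with made-up numbers.
	raw := []byte(`{"prompt_tokens":20,"completion_tokens":50,"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0},"completion_tokens_details":{"audio_tokens":42,"text_tokens":8}}`)
	var usage map[string]any
	if err := json.Unmarshal(raw, &usage); err != nil {
		panic(err)
	}
	metrics := flattenUsage(usage)
	fmt.Println(metrics["completion_audio_tokens"])
}
```

Because the mapping is driven by the key suffix rather than a fixed list, audio token counts come through without any audio-specific code, which is consistent with the behavior described above.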
5. Comparable integrations capture modality config
The Responses API tracer in this repo captures the full tools array (which includes built-in tools like web_search) and the reasoning config. Audio config for Chat Completions is an analogous modality configuration that controls output format but is not captured.
The Braintrust Gemini integration docs mention audio as a supported input modality. The OpenAI audio feature is a comparable generative surface.
Impact
- Audio generation traces don't show which voice or format was requested
- Streaming audio Chat Completions produce output with empty content and no audio data (the audio field with transcript and data is lost)
- Users of gpt-4o-audio-preview and gpt-4o-mini-audio-preview lose audio observability in streaming mode
- The openai-go SDK exposes ChatCompletionAudioParam (with Format and Voice) and ChatCompletionAudio (response) types, confirming this is a stable upstream feature
Braintrust docs status
Braintrust docs mention audio support for Gemini (multimodal content including audio transcription) but do not mention OpenAI audio modality specifically. Status: not_found for OpenAI Chat Completions audio instrumentation.
Upstream sources
- openai-go SDK defines ChatCompletionAudioParam (request: Format, Voice) and ChatCompletionAudio (response: ID, Data, Transcript, ExpiresAt)
- gpt-4o-audio-preview, gpt-4o-mini-audio-preview
Braintrust docs sources
Local repo files inspected
trace/contrib/openai/chatcompletions.go — metadata fields (lines 51–76): modalities and audio absent; postprocessStreamingResults() (lines 170–275): handles delta.content, delta.tool_calls, delta.role but not delta.audio
trace/contrib/openai/chatcompletions.go — handleChatCompletionResponse() (lines 290–328): non-streaming path passes full choices correctly
trace/contrib/openai/traceopenai.go — parseUsageTokens(): generic _tokens_details handling does capture audio tokens
trace/contrib/openai/traceopenai_test.go — no tests for audio modality
examples/internal/openai-v2/main.go — no audio example