[bot] Groq streaming drops `reasoning` content from DeepSeek R1 and Qwen 3 reasoning models #1911

## Summary

The Groq API returns a `reasoning` field in chat completion responses when using reasoning-capable models (DeepSeek R1, Qwen 3 with `reasoning_format: "parsed"`), but the Groq plugin's streaming aggregation silently drops all reasoning content. The `aggregateGroqChatCompletionChunks` function delegates to `aggregateChatCompletionChunks` from the OpenAI plugin, which only accumulates `delta.content`, `delta.refusal`, and `delta.tool_calls`; the `delta.reasoning` field falls through and is lost. This is a direct parity gap with Anthropic, Google GenAI, AI SDK, and Cohere reasoning instrumentation in this repo, and the same class of bug as OpenRouter #1883.

## What instrumentation is missing

### Streaming aggregation: reasoning content silently dropped

In `js/src/instrumentation/plugins/groq-plugin.ts`, `aggregateGroqChatCompletionChunks` (line 110) delegates to `aggregateChatCompletionChunks` from `openai-plugin.ts` (lines 475–570). That function only captures three delta fields:

```typescript
if (delta.content) {
  content = (content || "") + delta.content;
}
if (delta.refusal) {
  refusal = (refusal || "") + delta.refusal;
}
if (delta.tool_calls) { /* ... */ }
// delta.reasoning — silently dropped
```

The final aggregated output (lines 554–569) only includes `role`, `content`, `refusal`, and `tool_calls` — no `reasoning` field.
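To make the gap concrete, here is a minimal sketch of an aggregation loop that also accumulates `delta.reasoning`. It is not the repo's implementation: the `Sketch*` types and `aggregateWithReasoning` are hypothetical names invented for illustration, and the real function also handles `tool_calls`, usage, and multiple choices.

```typescript
// Hypothetical, simplified shapes; the real types live in vendor-sdk-types.
interface SketchDelta {
  role?: string;
  content?: string | null;
  refusal?: string | null;
  reasoning?: string | null; // Groq-specific field, present in "parsed" mode
}

interface SketchChunk {
  choices: Array<{ delta?: SketchDelta; finish_reason?: string | null }>;
}

interface SketchAggregate {
  role?: string;
  content?: string;
  refusal?: string;
  reasoning?: string;
}

// Concatenates streamed deltas of the first choice, keeping reasoning text
// in its own field instead of letting it fall through.
function aggregateWithReasoning(chunks: SketchChunk[]): SketchAggregate {
  const out: SketchAggregate = {};
  for (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta;
    if (!delta) continue;
    if (delta.role && !out.role) out.role = delta.role;
    if (delta.content) out.content = (out.content ?? "") + delta.content;
    if (delta.refusal) out.refusal = (out.refusal ?? "") + delta.refusal;
    if (typeof delta.reasoning === "string" && delta.reasoning) {
      out.reasoning = (out.reasoning ?? "") + delta.reasoning;
    }
  }
  return out;
}

// Illustrative chunks, not captured from a real Groq stream.
const aggregated = aggregateWithReasoning([
  { choices: [{ delta: { role: "assistant", reasoning: "2 + 2 " } }] },
  { choices: [{ delta: { reasoning: "is 4." } }] },
  { choices: [{ delta: { content: "4" }, finish_reason: "stop" }] },
]);
console.log(aggregated);
```

Whether the fix belongs in the shared OpenAI aggregator or in a Groq-specific wrapper is a design choice this sketch does not settle.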

### Vendor types: no reasoning field on delta

The Groq chunk type (`js/src/vendor-sdk-types/groq.ts`, line 34) extends `OpenAIChatCompletionChunk`, whose `OpenAIChatDelta` (in `openai-common.ts`, lines 110–117) declares `role`, `content`, `refusal`, `tool_calls`, and `finish_reason`, but no `reasoning` field. The `[key: string]: unknown` index signature lets extra fields such as `reasoning` type-check, but the aggregation code never reads `delta.reasoning`.
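One possible way to type the field is sketched below. `BaseDeltaSketch`, `GroqDeltaSketch`, and `readReasoning` are hypothetical stand-ins, not the repo's declarations; the sketch only shows that an index signature yields `unknown`, so the value has to be narrowed before use, while an explicit optional field makes the intent visible.

```typescript
// Simplified stand-in for the shared delta type (hypothetical).
interface BaseDeltaSketch {
  role?: string;
  content?: string | null;
  [key: string]: unknown; // extra provider fields type-check as unknown
}

// Groq-specific refinement that declares `reasoning` explicitly.
interface GroqDeltaSketch extends BaseDeltaSketch {
  reasoning?: string | null;
}

// Reads `reasoning` through the index signature and narrows it to a
// non-empty string; any other value yields undefined.
function readReasoning(delta: BaseDeltaSketch): string | undefined {
  const value = delta["reasoning"];
  return typeof value === "string" && value.length > 0 ? value : undefined;
}

const typedDelta: GroqDeltaSketch = { reasoning: "thinking..." };
console.log(readReasoning(typedDelta));
```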

### Non-streaming: reasoning passes through in raw output

For non-streaming responses, the Groq plugin returns `result.choices` directly, so `message.reasoning` would be present in the raw output. The gap is specifically in streaming aggregation.
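For contrast, a small sketch of the non-streaming path, assuming the logged output is simply the `choices` array. `ChoiceSketch` and `reasoningFromChoices` are hypothetical names; the point is only that nothing needs to be aggregated, so the field survives as-is.

```typescript
// Hypothetical, simplified non-streaming shapes.
interface MessageSketch {
  role: string;
  content: string | null;
  reasoning?: string | null;
}

interface ChoiceSketch {
  index: number;
  message: MessageSketch;
}

// Returns the reasoning text of each choice, or undefined when absent.
function reasoningFromChoices(
  choices: ChoiceSketch[],
): Array<string | undefined> {
  return choices.map((choice) => choice.message.reasoning ?? undefined);
}

// Illustrative payload, not captured from a real response.
const loggedOutput: ChoiceSketch[] = [
  {
    index: 0,
    message: { role: "assistant", content: "4", reasoning: "2 + 2 is 4." },
  },
];
console.log(reasoningFromChoices(loggedOutput));
```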

## Upstream API format

Groq surfaces reasoning content through the `reasoning` field when `reasoning_format: "parsed"`:

- **Non-GPT-OSS models** (Qwen 3 32B, DeepSeek R1 distills): `reasoning_format` parameter with `"raw"` (think tags in content), `"parsed"` (separate `reasoning` field), or `"hidden"` modes
- **GPT-OSS models** (`openai/gpt-oss-20b`, `openai/gpt-oss-120b`): `include_reasoning: true` returns a `reasoning` field in messages

In streaming, `delta.reasoning` contains the reasoning text chunks. In `"raw"` mode, reasoning comes through `delta.content` with `<think>` tags (captured correctly). The gap only affects `"parsed"` mode where `delta.reasoning` is a separate field.
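The difference between the two modes can be illustrated with a short sketch. In `"raw"` mode the reasoning is already part of the aggregated `content`, wrapped in `<think>` tags, so it is captured today. The helper below, `splitRawReasoning`, is a hypothetical function (not something the plugin contains) that separates the tagged span from the answer to show what `"parsed"` mode delivers as two separate fields. It assumes at most one `<think>` block.

```typescript
// Splits raw-mode content into reasoning and answer.
// Assumption: the content contains at most one <think>...</think> block.
function splitRawReasoning(content: string): {
  reasoning?: string;
  answer: string;
} {
  const match = content.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) return { answer: content };
  const start = match.index ?? 0;
  const reasoning = (match[1] ?? "").trim();
  // Remove the tagged span and keep whatever surrounds it as the answer.
  const answer = (
    content.slice(0, start) + content.slice(start + match[0].length)
  ).trim();
  return { reasoning, answer };
}

// Illustrative raw-mode content, as it would look after aggregation.
const rawContent = "<think>2 + 2 is 4.</think>\n4";
const split = splitRawReasoning(rawContent);
console.log(split);
```

In `"parsed"` mode the same two pieces arrive as `delta.reasoning` and `delta.content`, and only the second is currently kept.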

## Comparison with other providers in this repo

| Provider | Reasoning content captured in streaming | Status |
|----------|----------------------------------------|-------|
| **Anthropic** | `thinking_delta` aggregated | ✅ Working |
| **Google GenAI** | `thought` parts handled | ✅ Working |
| **AI SDK** | `reasoning-delta` chunks aggregated | ✅ Working |
| **Cohere** | `thinking` content blocks aggregated | ✅ Working |
| **Mistral** | `thinking` chunks aggregated | ✅ Fixed in #1857 |
| **OpenRouter** | **Silently dropped** | ❌ Open #1883 |
| **Groq** | **Silently dropped** | ❌ **This issue** |

## Braintrust docs status

`not_found` — The Braintrust Groq integration page at https://www.braintrust.dev/docs/integrations/ai-providers/groq does not mention reasoning content handling.

## Upstream references

- Groq Reasoning docs: https://console.groq.com/docs/reasoning
- Groq API reference: https://console.groq.com/docs/api-reference
- Affected models: DeepSeek-R1-Distill-Llama-70B, DeepSeek-R1-Distill-Qwen-32B, Qwen 3 32B (with `reasoning_format: "parsed"`)
- `reasoning_format` parameter: `"raw"` | `"parsed"` | `"hidden"`
- `include_reasoning` parameter: boolean (for GPT-OSS models)

## Local files inspected

- `js/src/instrumentation/plugins/groq-plugin.ts` (line 110: `aggregateGroqChatCompletionChunks` delegates to OpenAI's aggregation)
- `js/src/instrumentation/plugins/openai-plugin.ts` (lines 475–570: `aggregateChatCompletionChunks` only captures `content`, `refusal`, `tool_calls`)
- `js/src/vendor-sdk-types/groq.ts` (line 34: `GroqChatCompletionChunk` extends `OpenAIChatCompletionChunk`)
- `js/src/vendor-sdk-types/openai-common.ts` (lines 110–117: `OpenAIChatDelta` has no `reasoning` field)
- `e2e/scenarios/groq-instrumentation/` (no reasoning test scenarios)
