[bot] Mistral: Conversations API server-side tool executions not decomposed into child TOOL spans

## Summary

The Mistral Conversations API (`client.beta.conversations.start()`, `.append()`, etc.) was recently instrumented (per #273), but server-side tool executions within conversation responses are not decomposed into child `SpanTypeAttribute.TOOL` spans. Tool execution entries (code interpreter, web search, image generation, document library) appear as opaque entries in the span's output array, without individual tool spans that would allow users to drill into each execution.

This is an asymmetry within the Mistral integration itself: the chat and agents paths now create child TOOL spans via `_log_completion_tool_spans()` (per #378), but the conversations finalization path (`_finalize_conversation_response()`) does not call any tool span logic.

## What is missing

`_finalize_conversation_response()` at line 1034 of `py/src/braintrust/integrations/mistral/tracing.py` logs the conversation output and ends the span, but never calls `_log_completion_tool_spans()` or an equivalent:

```python
def _finalize_conversation_response(span, request_metadata, response, start_time):
    response_data = _normalized_mistral_dict(response)
    response_metadata = _conversation_response_data_to_metadata(response_data)
    usage = response_data.get("usage") if response_data else None
    _log_and_end_span(
        span,
        output=_conversation_outputs_data(response_data),  # Full outputs array — tool executions are opaque
        metrics=_merge_metrics(start_time, usage),
        metadata={**request_metadata, **response_metadata},
    )
```

Contrast this with the chat/agents paths (lines 1025 and 1097), which call `_log_completion_tool_spans(response_data, parent_span=span)` before `_log_and_end_span()`.

### Conversation outputs include tool executions

The Mistral Conversations API returns an `outputs` array containing mixed entry types:
- Message entries (assistant text responses)
- Tool execution entries (code interpreter output, web search results, image generation results)
- Function call entries (custom tool invocations)
- Agent handoff entries

Each tool execution entry in `outputs` has a type (e.g., `tool_execution`), the tool name, input/output, and status. These should be decomposed into child TOOL spans matching how the chat/agents paths now work.
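
To make the shape concrete, here is a rough sketch of a mixed `outputs` array and a filter that picks out the tool execution entries. The field names (`type`, `name`, `input`, `output`, `status`), every type string other than the `tool_execution` example above, and the helper `select_tool_executions` are illustrative assumptions, not the confirmed Mistral schema.

```python
from typing import Any, Dict, List, Optional

# Hypothetical outputs array: keys and type strings are assumptions for illustration only.
EXAMPLE_OUTPUTS: List[Dict[str, Any]] = [
    {
        "type": "tool_execution",
        "name": "web_search",
        "input": {"query": "braintrust tracing"},
        "output": {"results": ["result-1", "result-2"]},
        "status": "completed",
    },
    {"type": "message", "content": "Here is what I found."},
    {"type": "function_call", "name": "lookup_order", "arguments": "{}"},
]


def select_tool_executions(
    outputs: Optional[List[Any]], tool_type: str = "tool_execution"
) -> List[Dict[str, Any]]:
    """Return only the entries whose `type` marks a server-side tool execution."""
    return [
        entry
        for entry in outputs or []
        if isinstance(entry, dict) and entry.get("type") == tool_type
    ]
```

Under these assumptions, only the `web_search` entry would become a child TOOL span; message and function call entries would stay in the parent span's output.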

### Comparison within the Mistral integration

| Mistral API surface | Tool calls in output? | Child TOOL spans? |
|---|---|---|
| `client.chat.complete()` / `.stream()` | Yes | **Yes** (via `_log_completion_tool_spans`) |
| `client.agents.complete()` / `.stream()` | Yes | **Yes** (via `_log_completion_tool_spans`) |
| `client.beta.conversations.start()` / `.append()` | Yes (in `outputs` array) | **No** |

### Comparison with other providers' agentic surfaces

| Provider | Agentic surface | Tool span decomposition? |
|---|---|---|
| **OpenAI** (Responses API) | `responses.create()` | Yes |
| **Anthropic** (Managed Agents) | `beta.sessions.events.stream()` | Yes |
| **Google GenAI** (Interactions) | `interactions.create()` | Yes |
| **Mistral** (Conversations) | `beta.conversations.start()` | **No** |

## Minimum fix

1. Add a `_log_conversation_tool_spans()` function (or adapt `_log_completion_tool_spans()`) that iterates over the conversation `outputs` entries and creates a child TOOL span for each tool execution entry.
2. Call it from `_finalize_conversation_response()` before `_log_and_end_span()`.
3. Apply the same logic to the streaming conversation aggregation path.
4. Add a VCR-backed test for a conversation with server-side tool execution.

## Braintrust docs status

**not_found** — The [Mistral integration page](https://www.braintrust.dev/docs/integrations/ai-providers/mistral) does not mention the Conversations API or tool span decomposition for conversations.

## Upstream sources

- Mistral Conversations API: https://docs.mistral.ai/api/endpoint/beta/conversations
- Mistral Agents & Conversations guide: https://docs.mistral.ai/agents/agents
- Supported server-side tools in conversations: web search, web search premium, code interpreter, image generation, document library, custom function tools
- Tool execution entries appear in the `outputs` array with execution details

## Local files inspected

- `py/src/braintrust/integrations/mistral/tracing.py`:
  - `_finalize_conversation_response()` (line 1034) — does NOT call `_log_completion_tool_spans()` or equivalent
  - `_log_completion_tool_spans()` (line 990) — exists and works for chat/agents; not called for conversations
  - `_conversation_outputs_data()` (line 459) — returns the raw `outputs` array without tool span extraction
  - `_aggregate_conversation_events()` (line 919) — streaming aggregation; no tool span logic
- `py/src/braintrust/integrations/mistral/test_mistral.py`:
  - `test_wrap_mistral_chat_complete_tool_spans` (line 249) — validates chat tool spans exist
  - No equivalent test for conversations tool spans
