[bot] Mistral: Conversations API server-side tool executions not decomposed into child TOOL spans #387

@braintrust-bot

Summary

The Mistral Conversations API (client.beta.conversations.start(), .append(), etc.) was recently instrumented (per #273), but server-side tool executions within conversation responses are not decomposed into child SpanTypeAttribute.TOOL spans. Tool execution entries (code interpreter, web search, image generation, document library) appear as opaque entries in the span's output array, without individual tool spans that would allow users to drill into each execution.

This is an asymmetry within the Mistral integration itself: the chat and agents paths now create child TOOL spans via _log_completion_tool_spans() (per #378), but the conversations finalization path (_finalize_conversation_response()) does not call any tool span logic.

What is missing

_finalize_conversation_response() at line 1034 of py/src/braintrust/integrations/mistral/tracing.py logs the conversation output and ends the span, but never calls _log_completion_tool_spans() or an equivalent:

```python
def _finalize_conversation_response(span, request_metadata, response, start_time):
    response_data = _normalized_mistral_dict(response)
    response_metadata = _conversation_response_data_to_metadata(response_data)
    usage = response_data.get("usage") if response_data else None
    _log_and_end_span(
        span,
        output=_conversation_outputs_data(response_data),  # Full outputs array; tool executions are opaque
        metrics=_merge_metrics(start_time, usage),
        metadata={**request_metadata, **response_metadata},
    )
```

Contrast this with the chat/agents paths (lines 1025 and 1097), which call _log_completion_tool_spans(response_data, parent_span=span) before _log_and_end_span().
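The ordering matters because child spans need to be created while the parent span is still open. The sketch below illustrates only that call order, using a stand-in span class rather than the real Braintrust span. `RecordingSpan`, `finalize_conversation`, and the injected `log_tool_spans` callback are hypothetical names for illustration, not functions from tracing.py.

```python
from typing import Any, Callable, Optional


class RecordingSpan:
    """Stand-in for a tracing span (hypothetical); records calls in order."""

    def __init__(self, events: list) -> None:
        self.events = events

    def log(self, **fields: Any) -> None:
        # Record only which fields were logged, not their values.
        self.events.append(("log", sorted(fields)))

    def end(self) -> None:
        self.events.append(("end",))


def finalize_conversation(
    span: RecordingSpan,
    response_data: Optional[dict],
    log_tool_spans: Callable[..., Any],
) -> None:
    """Emit child tool spans first, then log the output and end the parent."""
    log_tool_spans(response_data, parent_span=span)
    outputs = (response_data or {}).get("outputs")
    span.log(output=outputs)
    span.end()


# Usage: the callback here only records that it ran.
events: list = []
parent = RecordingSpan(events)
finalize_conversation(
    parent,
    {"outputs": []},
    lambda data, parent_span: events.append(("tool_spans",)),
)
```

In the real integration, the callback slot would presumably be filled by _log_completion_tool_spans() or a conversation-specific equivalent.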

Conversation outputs include tool executions

The Mistral Conversations API returns an outputs array containing mixed entry types:

  • Message entries (assistant text responses)
  • Tool execution entries (code interpreter output, web search results, image generation results)
  • Function call entries (custom tool invocations)
  • Agent handoff entries

Each tool execution entry in outputs has a type (e.g., tool_execution), the tool name, input/output, and status. These should be decomposed into child TOOL spans matching how the chat/agents paths now work.
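As a rough illustration of separating tool execution entries from the other entry types, the sketch below filters a mixed outputs list by its type field. The exact type strings and the field names (`name`, `input`, `output`, `status`) are assumptions based on the description above, so they should be checked against real Conversations API payloads. Both helper names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Optional

# Assumed spellings of the tool execution type; verify against real payloads.
TOOL_EXECUTION_TYPES = {"tool_execution", "tool.execution"}


def group_outputs_by_type(outputs: Optional[list]) -> dict[str, list[dict[str, Any]]]:
    """Group conversation output entries by their "type" field."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for entry in outputs or []:
        if isinstance(entry, dict):
            grouped[entry.get("type", "unknown")].append(entry)
    return dict(grouped)


def tool_execution_entries(outputs: Optional[list]) -> list[dict[str, Any]]:
    """Return only entries that look like tool executions, preserving order."""
    return [
        entry
        for entry in outputs or []
        if isinstance(entry, dict) and entry.get("type") in TOOL_EXECUTION_TYPES
    ]


# Illustrative data only; not a recorded API response.
sample_outputs = [
    {"type": "tool_execution", "name": "web_search",
     "input": {"query": "example"}, "output": "result text", "status": "completed"},
    {"type": "message", "content": "Here is what I found."},
    {"type": "tool_execution", "name": "code_interpreter",
     "input": {"code": "1 + 1"}, "output": "2", "status": "completed"},
]

tool_names = [entry["name"] for entry in tool_execution_entries(sample_outputs)]
```

Message, function call, and agent handoff entries fall through the filter untouched, so the parent span's output array can stay as it is today.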

Comparison within the Mistral integration

| Mistral API surface | Tool calls in output? | Child TOOL spans? |
| --- | --- | --- |
| client.chat.complete() / .stream() | Yes | Yes (via _log_completion_tool_spans) |
| client.agents.complete() / .stream() | Yes | Yes (via _log_completion_tool_spans) |
| client.beta.conversations.start() / .append() | Yes (in outputs array) | No |

Comparison with other providers' agentic surfaces

| Provider | Agentic surface | Tool span decomposition? |
| --- | --- | --- |
| OpenAI (Responses API) | responses.create() | Yes |
| Anthropic (Managed Agents) | beta.sessions.events.stream() | Yes |
| Google GenAI (Interactions) | interactions.create() | Yes |
| Mistral (Conversations) | beta.conversations.start() | No |

Minimum fix

  1. Add a _log_conversation_tool_spans() function (or adapt _log_completion_tool_spans()) that iterates over conversation outputs entries and creates child TOOL spans for tool execution entries
  2. Call it from _finalize_conversation_response() before _log_and_end_span()
  3. Apply the same logic to the streaming conversation aggregation path
  4. Add a VCR-backed test for a conversation with server-side tool execution
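A minimal sketch of what steps 1 and 4 might look like is shown below. It uses a fake span in place of the Braintrust span and a hand-written response in place of a VCR cassette. `FakeSpan`, `log_conversation_tool_spans`, the entry field names, and the accepted type strings are all assumptions for illustration. The real helper would use SpanTypeAttribute.TOOL and the integration's own span plumbing.

```python
from __future__ import annotations

from typing import Any, Optional

TOOL_SPAN_TYPE = "tool"  # stand-in for SpanTypeAttribute.TOOL
# Assumed spellings of the tool execution type; verify against real payloads.
TOOL_EXECUTION_TYPES = {"tool_execution", "tool.execution"}


class FakeSpan:
    """Hypothetical test double that mimics start_span / log / end."""

    def __init__(self, name: str = "root", span_type: Optional[str] = None) -> None:
        self.name = name
        self.span_type = span_type
        self.children: list[FakeSpan] = []
        self.logged: dict[str, Any] = {}
        self.ended = False

    def start_span(self, name: str, type: Optional[str] = None) -> "FakeSpan":
        child = FakeSpan(name, type)
        self.children.append(child)
        return child

    def log(self, **fields: Any) -> None:
        self.logged.update(fields)

    def end(self) -> None:
        self.ended = True


def log_conversation_tool_spans(response_data: Optional[dict], parent_span: FakeSpan) -> int:
    """Create one child tool span per tool execution entry; return the count."""
    outputs = (response_data or {}).get("outputs") or []
    created = 0
    for entry in outputs:
        if not isinstance(entry, dict) or entry.get("type") not in TOOL_EXECUTION_TYPES:
            continue  # messages, function calls, and handoffs are left alone
        child = parent_span.start_span(name=entry.get("name") or "tool", type=TOOL_SPAN_TYPE)
        child.log(
            input=entry.get("input"),
            output=entry.get("output"),
            metadata={"status": entry.get("status")},
        )
        child.end()
        created += 1
    return created


# Hand-written response standing in for a recorded cassette.
CANNED_RESPONSE = {
    "outputs": [
        {"type": "tool_execution", "name": "web_search",
         "input": {"query": "example"}, "output": "result text", "status": "completed"},
        {"type": "message", "content": "Summary of the search."},
    ]
}

root = FakeSpan()
created = log_conversation_tool_spans(CANNED_RESPONSE, parent_span=root)
```

For step 3, the streaming path could plausibly pass its aggregated result through the same helper once aggregation completes, so that both paths share one implementation.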

Braintrust docs status

not_found — The Mistral integration page does not mention the Conversations API or tool span decomposition for conversations.

Upstream sources

Local files inspected

  • py/src/braintrust/integrations/mistral/tracing.py:
    • _finalize_conversation_response() (line 1034) — does NOT call _log_completion_tool_spans() or equivalent
    • _log_completion_tool_spans() (line 990) — exists and works for chat/agents; not called for conversations
    • _conversation_outputs_data() (line 459) — returns the raw outputs array without tool span extraction
    • _aggregate_conversation_events() (line 919) — streaming aggregation; no tool span logic
  • py/src/braintrust/integrations/mistral/test_mistral.py:
    • test_wrap_mistral_chat_complete_tool_spans (line 249) — validates chat tool spans exist
    • No equivalent test for conversations tool spans
