
Python: [Bug]: Foundry Toolbox does not propagate W3C trace context (traceparent/tracestate) to the underlying MCP server, breaking distributed tracing #5547

@cristofima

Description


What happened?

When a Foundry-hosted agent invokes an MCP tool through a Foundry Toolbox connector (FoundryChatClient.get_toolbox(...)), the W3C trace context of the active OpenTelemetry span is not forwarded to the MCP server reached by the toolbox proxy. As a result, the span produced by the downstream MCP service (in our case a Logic Apps "OAuth Identity Passthrough" MCP server) starts a brand-new trace instead of continuing the agent's trace.

In Datadog APM and Datadog LLM Observability we see two disconnected traces:

  • one trace for the Hosted Agent (POST /responses, gen_ai.* spans, etc.),
  • a separate, unlinked trace on the Logic Apps side for the actual tool execution.

The same code path with a direct MCPStreamableHTTPTool (no toolbox proxy) propagates traceparent/tracestate correctly via params._meta of tools/call, as documented in python/samples/02-agents/observability/README.md and implemented in python/packages/core/agent_framework/_mcp.py (_inject_otel_into_mcp_meta). Switching the same agent from tools=[MCPStreamableHTTPTool(...)] to tools=toolbox is enough to lose the linkage.
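The direct-path injection can be illustrated with a stdlib-only sketch. The function names below are illustrative, not the actual _inject_otel_into_mcp_meta implementation; the real code uses the configured OpenTelemetry propagator rather than formatting the header by hand:

```python
def build_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Format a W3C traceparent value: version-traceid-spanid-flags."""
    return f"00-{trace_id:032x}-{span_id:016x}-{'01' if sampled else '00'}"

def inject_trace_context(params: dict, trace_id: int, span_id: int) -> dict:
    """Attach the active span's trace context to an MCP tools/call params
    dict via the _meta field, mirroring what the framework does for direct
    MCP tools."""
    meta = dict(params.get("_meta", {}))
    meta["traceparent"] = build_traceparent(trace_id, span_id)
    return {**params, "_meta": meta}
```

With a toolbox connector this injection never runs, because the tools/call is issued by the Foundry proxy rather than by the agent process.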

The toolbox path is server-side: the agent process never opens the MCP connection itself. The Foundry platform proxy at FOUNDRY_AGENT_TOOLSET_ENDPOINT (the same component handled by the .NET FoundryToolboxBearerTokenHandler, see dotnet/src/Microsoft.Agents.AI.Foundry.Hosting/FoundryToolboxBearerTokenHandler.cs) only injects Authorization and Foundry-Features and does not appear to forward traceparent/tracestate headers (or inject them in the MCP params._meta) when calling the remote MCP server.

What did you expect to happen?

The Foundry Toolbox proxy should propagate the active W3C trace context to the underlying MCP server, either:

  • by forwarding the inbound traceparent/tracestate HTTP headers received from the agent process to the MCP server, and/or
  • by injecting them into the MCP params._meta of tools/call, equivalent to what _inject_otel_into_mcp_meta does for direct MCP tools.

That way Datadog APM, Datadog LLM Observability, and Application Insights can show a single end-to-end trace: agent → toolbox proxy → MCP server, instead of two disconnected ones.
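The first option, header forwarding at the proxy, amounts to very little code. This is a hypothetical sketch assuming only that the proxy composes its outbound headers itself; the function and constant names are not from the framework:

```python
TRACE_CONTEXT_HEADERS = ("traceparent", "tracestate")

def build_outbound_headers(inbound: dict, bearer_token: str) -> dict:
    """Compose outbound headers for the downstream MCP call: the bearer
    token the proxy already injects, plus any inbound W3C trace-context
    headers, matched case-insensitively."""
    headers = {"Authorization": f"Bearer {bearer_token}"}
    inbound_lower = {k.lower(): v for k, v in inbound.items()}
    for name in TRACE_CONTEXT_HEADERS:
        if name in inbound_lower:
            headers[name] = inbound_lower[name]
    return headers
```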

Steps to reproduce the issue

  1. Build a Foundry hosted agent with agent-framework-foundry-hosting==1.0.0a260428, agent-framework-core==1.2.1, agent-framework-foundry==1.2.1.
  2. Configure OpenTelemetry with the W3C trace context propagator (the default) and an OTLP/HTTP exporter to Datadog (or any backend with distributed tracing).
  3. Wire two variants of the same agent:
    • Variant A (works): tools=[MCPStreamableHTTPTool(name="...", url=MCP_URL)] pointing directly to the MCP server.
    • Variant B (broken): toolbox = await client.get_toolbox(TOOLBOX_NAME); tools=toolbox against a Foundry Toolbox connector that points to the same MCP server.
  4. Send a prompt that triggers the tool in both variants.
  5. In Datadog APM, observe:
    • Variant A: a single trace with the agent span as parent and the MCP server span as child.
    • Variant B: the agent span and the MCP server span end up in two unrelated traces; the MCP server's traceparent is freshly generated.
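For step 2, the OpenTelemetry bootstrap we use looks roughly like the following. The OTLP endpoint and service name are placeholders; no propagator setup is needed because TraceContextTextMapPropagator is the SDK default:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Placeholder endpoint: point this at your Datadog agentless intake or
# any OTLP/HTTP-compatible collector.
provider = TracerProvider(
    resource=Resource.create({"service.name": "foundry-hosted-agent"})
)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="https://<otlp-intake>/v1/traces"))
)
trace.set_tracer_provider(provider)
```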

Code Sample

Same agent, two tool wirings. Only the tools= value differs.

from agent_framework import MCPStreamableHTTPTool
from agent_framework_foundry import FoundryChatClient
from agent_framework_foundry_hosting import InMemoryResponseProvider, ResponsesHostServer
from azure.identity.aio import ManagedIdentityCredential

# Variant A — direct MCP tool, W3C context flows end-to-end.
mcp_tool = MCPStreamableHTTPTool(name="research_tools", url=MCP_URL)

async def main() -> None:
    async with ManagedIdentityCredential() as credential:
        chat_client = FoundryChatClient(
            project_endpoint=PROJECT_ENDPOINT,
            model=MODEL_DEPLOYMENT_NAME,
            credential=credential,
            allow_preview=True,
        )
        # Variant B — Foundry Toolbox proxy, W3C context is dropped at the proxy.
        toolbox = await chat_client.get_toolbox(TOOLBOX_NAME, version=TOOLBOX_VERSION)
        async with chat_client.as_agent(
            name="ea-ai-hosted-agent-baseline-python",
            instructions=SYSTEM_PROMPT,
            tools=toolbox,            # swap to [mcp_tool] to compare
            default_options={
                "store": False,
                "tool_choice": "required",
                "temperature": 0.2,
                "max_tokens": 2000,
            },
        ) as agent:
            server = ResponsesHostServer(agent, store=InMemoryResponseProvider())
            await server.run_async()

OpenTelemetry is configured at process start with the default W3C TraceContextTextMapPropagator and an OTLP/HTTP exporter to Datadog agentless intake.

Error Messages / Stack Traces

There is no exception. The symptom is a missing parent/child link in the backend.

In Datadog APM, the agent's run shows a gen_ai.* span tree ending at the mcp.tool.call (or equivalent) span. A separate trace, with no parent, contains the Logic Apps spans for the actual MCP execution. The two carry the same wall-clock window and the same caller identity but no shared trace_id.

Inbound HTTP headers received by the MCP server confirm the issue: traceparent is either absent or contains a brand-new trace-id that does not match the agent's outbound traceparent.
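The header comparison we ran can be automated with a small helper that checks whether two traceparent values share a trace-id. This is a stdlib-only sketch, not part of the framework:

```python
import re

# W3C traceparent: version-traceid-spanid-flags (hex, fixed widths).
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def same_trace(inbound_traceparent: str, agent_traceparent: str) -> bool:
    """True when both headers parse and carry the same 32-hex-digit trace-id."""
    a = TRACEPARENT_RE.match(inbound_traceparent)
    b = TRACEPARENT_RE.match(agent_traceparent)
    return bool(a and b) and a.group(1) == b.group(1)
```

In Variant A this check passes on every tool call; in Variant B it fails because the MCP server sees a freshly generated trace-id.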

Package Versions

  • agent-framework-core: 1.2.1
  • agent-framework-foundry: 1.2.1
  • agent-framework-openai: 1.2.1
  • agent-framework-foundry-hosting: 1.0.0a260428
  • mcp: >=1.24,<2
  • azure-ai-agentserver-core: 2.0.0b3
  • azure-ai-agentserver-responses: 1.0.0b5
  • opentelemetry-api: >=1.40,<1.41
  • opentelemetry-sdk: >=1.40,<1.41
  • opentelemetry-exporter-otlp-proto-http: >=1.40,<1.41
  • azure-monitor-opentelemetry-exporter: >=1.0.0b51

Python Version

Python 3.12

Additional Context

  • Backend used to detect the missing link: Datadog APM and Datadog LLM Observability. The same gap is observable in Application Insights end-to-end transaction view.
  • The MCP server behind the toolbox is a Logic Apps "OAuth Identity Passthrough" workflow protected by Entra ID (auth code + PKCE). It logs the inbound traceparent header on every request, which is how we confirmed the value is not the one emitted by the agent.
  • Direct MCP path works as advertised in python/samples/02-agents/observability/README.md: "Whenever there is an active OpenTelemetry span context, Agent Framework automatically propagates trace context to MCP servers via the params._meta field of tools/call requests." The toolbox path bypasses that injection because the tools/call is issued by the Foundry proxy, not by the agent process.
  • Possible fix surfaces:
    • Have the toolbox proxy forward traceparent/tracestate HTTP headers received from the agent to the downstream MCP server.
    • Or have it re-inject the trace context into the MCP params._meta from the inbound HTTP headers, mirroring what _inject_otel_into_mcp_meta does on the client side.
  • Workaround we are using meanwhile: when full distributed tracing is required for a given environment, switch to the direct MCPStreamableHTTPTool path and forgo the toolbox-managed OAuth identity passthrough. This is not viable in production for OAuth-protected MCP servers.
  • Related: ADR docs/decisions/0025-foundry-toolbox-support.md covers toolbox span enrichment but does not address span linking across the proxy boundary.
  • Happy to share a sanitized HAR or OTel export from both variants privately if useful.
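The second fix surface can be sketched the same way: a hypothetical proxy-side mirror of _inject_otel_into_mcp_meta that lifts the context out of the inbound HTTP headers and re-injects it into params._meta of the downstream tools/call. The function name is illustrative:

```python
def reinject_trace_context(inbound_headers: dict, params: dict) -> dict:
    """Copy W3C trace-context values from inbound HTTP headers into the
    _meta of an MCP tools/call params dict, leaving params untouched when
    no trace context was received."""
    headers = {k.lower(): v for k, v in inbound_headers.items()}
    meta = dict(params.get("_meta", {}))
    for name in ("traceparent", "tracestate"):
        if name in headers:
            meta[name] = headers[name]
    return {**params, "_meta": meta} if meta else params
```

Either surface alone would be enough to restore the parent/child link across the proxy boundary.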
