feat(mcp): add MCPOperation, resource/prompt instrumentation, transport context (GAPs 2-4) #272

Open
adityamehra wants to merge 17 commits into main from fix/mcp-gaps-2-3-4

@adityamehra
Contributor

Summary

Builds on PR #268 to close the remaining gaps between SDOT's FastMCP instrumentation and the OTel MCP semantic conventions.

  • GAP 2 (missing semconv attributes): Add an MCPOperation dataclass with all required/recommended attributes (jsonrpc.request.id, mcp.resource.uri, gen_ai.prompt.name, rpc.response.status_code, server.address/port, client.address/port, network.protocol.*). Rename mcp_server_name to sdot_mcp_server_name.
  • GAP 3 (transport detection and context bridge): Detect network.transport dynamically (pipe vs tcp) instead of hardcoding it. Add an MCPRequestContext ContextVar to propagate server-side metadata (JSON-RPC request id, transport) from the transport instrumentor to the server instrumentor.
  • GAP 4 (resources/read and prompts/get instrumentation): Instrument Client.read_resource, Client.get_prompt, FastMCP.read_resource, and FastMCP.get_prompt with the MCPOperation start/stop/fail lifecycle.

Additionally:

  • Convert tools/list from Step to MCPOperation for consistent span naming
  • DRY client instrumentor via _traced_mcp_operation() helper
  • Consolidate transport detection into shared detect_transport() in utils
  • Trim MCPRequestContext to only populated fields
  • Add e2e examples for prompt and resource operations
  • Update CHANGELOGs for both packages

Sample Telemetry

All telemetry below was captured by running the e2e examples under examples/e2e/.

Spans — Tool Operations

tools/list                     (SpanKind.CLIENT)
  mcp.method.name:      "tools/list"
  network.transport:    "pipe"
  gen_ai.system:        "mcp"
  gen_ai.agent.name:    "mcp.client"

tools/call add                 (SpanKind.CLIENT)
  mcp.method.name:      "tools/call"
  gen_ai.operation.name: "execute_tool"
  gen_ai.tool.name:     "add"
  gen_ai.tool.call.arguments: "{\"a\": 15, \"b\": 27}"
  network.transport:    "pipe"
  gen_ai.system:        "mcp"

tools/call divide              (SpanKind.CLIENT, error)
  mcp.method.name:      "tools/call"
  gen_ai.operation.name: "execute_tool"
  gen_ai.tool.name:     "divide"
  error.type:           "ToolError"
  gen_ai.finish_reason: "failed"
  gen_ai.finish_reason_description: "Error calling tool 'divide': Cannot divide by zero"

Spans — Prompt Operations

prompts/get weather_forecast   (SpanKind.CLIENT)
  mcp.method.name:      "prompts/get"
  gen_ai.prompt.name:   "weather_forecast"
  network.transport:    "pipe"
  gen_ai.system:        "mcp"
  gen_ai.agent.name:    "mcp.client"

prompts/get travel_packing_advice (SpanKind.CLIENT)
  mcp.method.name:      "prompts/get"
  gen_ai.prompt.name:   "travel_packing_advice"
  network.transport:    "pipe"
  gen_ai.system:        "mcp"

Spans — Resource Operations

resources/read system://info   (SpanKind.CLIENT)
  mcp.method.name:      "resources/read"
  mcp.resource.uri:     "system://info"
  network.transport:    "pipe"
  gen_ai.system:        "mcp"
  gen_ai.agent.name:    "mcp.client"

resources/read system://env/HOME (SpanKind.CLIENT)
  mcp.method.name:      "resources/read"
  mcp.resource.uri:     "system://env/HOME"
  network.transport:    "pipe"
  gen_ai.system:        "mcp"

Spans — Session

invoke_agent mcp.client        (SpanKind.CLIENT)
  gen_ai.agent.name:    "mcp.client"
  gen_ai.agent.type:    "mcp_client"
  gen_ai.framework:     "fastmcp"
  gen_ai.system:        "mcp"
  gen_ai.operation.name: "mcp.client_session"

Metrics

mcp.client.operation.duration  (Histogram, unit: s)
  Description: Duration of MCP request or notification as observed
               on the sender from the time it was sent until response
               or ack is received
  Attributes:  mcp.method.name, gen_ai.tool.name, gen_ai.operation.name,
               network.transport, error.type

mcp.tool.output.size           (Histogram, unit: {byte})
  Description: Size of the tool call output in bytes. This output
               typically becomes part of the LLM input context.
  Attributes:  mcp.method.name, gen_ai.tool.name, gen_ai.operation.name,
               network.transport
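One plausible way to obtain a byte count for mcp.tool.output.size is to measure the UTF-8 encoding of the serialized output. The helper name and the choice of JSON serialization are assumptions made for illustration; the PR does not show how the size is actually computed.

```python
import json
from typing import Any


def output_size_bytes(output: Any) -> int:
    """Return the size in bytes of a tool output (hypothetical helper).

    bytes are measured as-is, str is measured as UTF-8, and anything else
    is serialized to JSON first (an assumption, not the PR's documented rule).
    """
    if isinstance(output, bytes):
        return len(output)
    if isinstance(output, str):
        return len(output.encode("utf-8"))
    return len(json.dumps(output).encode("utf-8"))
```

Measuring encoded bytes rather than characters matters for non-ASCII output, where the two counts differ.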

Key Files Changed

File                               Change
util/.../types.py                  Add MCPOperation dataclass, refactor MCPToolCall MRO
util/.../handler.py                Add start/stop/fail_mcp_operation lifecycle methods
util/.../emitters/span.py          MCPOperation dispatch in on_start/on_end/on_error
util/.../emitters/metrics.py       MCPOperation dispatch, unified _record_mcp_operation_metrics
fastmcp/_mcp_context.py            New: MCPRequestContext ContextVar (3 fields)
fastmcp/client_instrumentor.py     DRY refactor, read_resource/get_prompt hooks, shared detect_transport
fastmcp/server_instrumentor.py     read_resource/get_prompt hooks, _enrich_from_request_context
fastmcp/transport_instrumentor.py  Context population, transport detection, baggage injection
fastmcp/utils.py                   Add detect_transport()
examples/e2e/prompt/               New: weather prompt e2e example (wttr.in)
examples/e2e/resource/             New: system dashboard resource e2e example

Breaking Changes

  • MCPToolCall.mcp_server_name is renamed to MCPToolCall.sdot_mcp_server_name (semconv field: sdot.mcp.server_name)
  • The MCPOperation name field is now called target (to avoid an MRO conflict with ToolCall.name)

Test Plan

  • pytest util/opentelemetry-util-genai/tests/ — 197 passed
  • pytest instrumentation-genai/opentelemetry-instrumentation-fastmcp/tests/ — 74 passed
  • ruff check + ruff format — clean
  • E2E: examples/e2e/run_demo.py — tools/list, tools/call spans verified
  • E2E: examples/e2e/prompt/client.py — prompts/get spans verified
  • E2E: examples/e2e/resource/client.py — resources/read spans verified
  • Telemetry sent to otel-tui at localhost:4317

Made with Cursor

…conventions

MCPToolCall spans now use {mcp.method.name} {tool_name} format (e.g.
"tools/call add") with CLIENT/SERVER SpanKind, matching the OTel MCP
semconv spec. Previously used "execute_tool {tool_name}" with INTERNAL.

Added explicit bucket boundaries [0.01..300] to all MCP duration
histograms per semconv specification.
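The span-name format described in this commit can be sketched as a one-line helper. mcp_span_name is a hypothetical name used only for illustration and is not taken from the PR.

```python
from typing import Optional


def mcp_span_name(method: str, target: Optional[str] = None) -> str:
    """Build "{mcp.method.name} {target}", or just the method when there is no target."""
    return f"{method} {target}" if target else method
```

With this rule, a tool call to add is named "tools/call add", while a targetless operation such as tools/list keeps the bare method name, matching the sample spans in the PR description.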

Made-with: Cursor
…nsport context (GAPs 2-4)

- Add MCPOperation dataclass for non-tool MCP operations (resources/read, prompts/get, tools/list)
- Add new semconv attributes: jsonrpc.request.id, mcp.resource.uri, gen_ai.prompt.name,
  rpc.response.status_code, server.address/port, client.address/port, network.protocol.*
- Rename mcp_server_name to sdot_mcp_server_name for clarity
- Add MCPRequestContext ContextVar for transport-to-server metadata propagation
- Detect network.transport dynamically (pipe vs tcp) instead of hardcoding
- Instrument Client.read_resource, Client.get_prompt, FastMCP.read_resource, FastMCP.get_prompt
- Convert tools/list from Step to MCPOperation with proper span naming
- Add e2e examples for prompt (wttr.in weather) and resource (system dashboard) operations
- Update TelemetryHandler, SpanEmitter, MetricsEmitter with MCPOperation dispatch

Made-with: Cursor
…ction, trim MCPRequestContext

- Extract _traced_mcp_operation() helper in client_instrumentor to DRY the
  identical start/stop/fail + duration pattern across list_tools, read_resource,
  and get_prompt wrappers
- Consolidate _detect_client_transport and _detect_server_transport into a
  single detect_transport() in utils.py, used by both client and transport
  instrumentors
- Remove 7 unused fields from MCPRequestContext (network_protocol_name/version,
  client_address/port, server_address/port, baggage) that were never populated
  by the transport instrumentor
- Trim _enrich_from_request_context to only copy the 2 fields actually set
  (jsonrpc_request_id, network_transport)
- Net reduction: -65 lines across 5 files
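The shared start/stop/fail plus duration pattern that this commit extracts might look roughly like the sketch below. RecordingHandler and traced_operation are hypothetical stand-ins; the real _traced_mcp_operation() and telemetry handler signatures are not shown in this PR page.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List


class RecordingHandler:
    """Hypothetical stand-in for the telemetry handler; it only records calls."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def start(self, method: str, target: str) -> None:
        self.events.append({"event": "start", "method": method, "target": target})

    def stop(self, method: str, duration_s: float) -> None:
        self.events.append({"event": "stop", "method": method, "duration_s": duration_s})

    def fail(self, method: str, duration_s: float, error_type: str) -> None:
        self.events.append(
            {"event": "fail", "method": method, "duration_s": duration_s, "error.type": error_type}
        )


async def traced_operation(
    handler: RecordingHandler,
    method: str,
    target: str,
    call: Callable[[], Awaitable[Any]],
) -> Any:
    """Run `call` between start and stop/fail, timing it in seconds."""
    handler.start(method, target)
    started = time.perf_counter()
    try:
        result = await call()
    except Exception as exc:
        # Record the failure with its exception type, then re-raise unchanged.
        handler.fail(method, time.perf_counter() - started, type(exc).__name__)
        raise
    handler.stop(method, time.perf_counter() - started)
    return result
```

Each wrapper (list_tools, read_resource, get_prompt) would then shrink to a single call that passes its method name, target, and the wrapped coroutine.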

Made-with: Cursor
@adityamehra requested review from a team as code owners on April 14, 2026 23:42
HIGH-1: Hook render_prompt (not get_prompt) for prompts/get on server.
  FastMCP 3.x routes MCP prompts/get to render_prompt; get_prompt is
  only the internal lookup. Now hooks both with graceful fallback for
  2.x compatibility via _try_wrap.

HIGH-2: Skip SDOT server spans when FastMCP >= 3.x has native telemetry.
  FastMCP 3.x ships server_span() in fastmcp.server.telemetry that
  already creates SERVER spans for call_tool, read_resource, and
  render_prompt. _has_native_telemetry() detects this and our wrappers
  pass through to avoid duplicate instrumentation.

MED-1: Fix server-side transport detection.
  detect_transport() now falls back to fastmcp.settings.transport when
  the instance has no transport attribute (which is the case for the
  low-level mcp.server.lowlevel.Server in _handle_request).
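The fallback described in MED-1 can be sketched as follows. This is an illustration under several assumptions: the transport is exposed as a plain string name, "stdio" maps to pipe while every other name is treated as HTTP-based and maps to tcp, and an unknown transport defaults to pipe. The real detect_transport() may differ on all three points.

```python
from types import SimpleNamespace
from typing import Any, Optional


def detect_transport(instance: Any, settings: Optional[Any] = None, default: str = "pipe") -> str:
    """Return a network.transport value ("pipe" or "tcp") for an instance.

    Looks at instance.transport first, then falls back to settings.transport,
    mirroring the fallback order in the commit message.
    """
    name = getattr(instance, "transport", None)
    if name is None and settings is not None:
        name = getattr(settings, "transport", None)
    if name is None:
        return default  # assumed default; not stated in the PR
    return "pipe" if str(name).lower() == "stdio" else "tcp"


# A low-level server object without a transport attribute falls back to settings.
low_level_server = object()
global_settings = SimpleNamespace(transport="streamable-http")
```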

MED-2: Fix MCPToolCall error path missing generic duration metric.
  on_error for MCPToolCall now records both mcp.*.operation.duration
  and gen_ai.client.operation.duration, using duration_s (consistent
  with _record_mcp_operation_metrics) instead of relying on end_time.

Also: add upper version bound (fastmcp >= 2.0.0, < 4) to _instruments
and pyproject.toml. New tests for render_prompt, read_resource,
native-telemetry dedupe, transport detection, and metrics error path.

Tests: 199 util passed, 88 fastmcp passed (up from 74).
Made-with: Cursor
duration, attributes=metric_attrs, context=context
)
return

Contributor

handles() missing MCPOperation → non-tool MCP metrics silently dropped

Contributor Author

Good catch! Fixed.

The transport_instrumentor is a temporary bridge for mcp SDK v1.x which
lacks native OTel context propagation. Native support has landed on the
upstream main branch (PRs #2298, #2381) targeting mcp v2.x.

Move detailed upstream tracking and migration plan to README.rst, keep
the module docstring minimal with a pointer to the README.

Made-with: Cursor
MCPOperation was missing from the handles() type check, causing the
CompositeEmitter to silently skip MetricsEmitter for plain MCP operations
(tools/list, resources/read, prompts/get). MCPToolCall was unaffected
(inherits from ToolCall) but non-tool operations lost mcp.client/server
.operation.duration metrics.

Add tests for handles(), on_end metrics for client/server MCPOperation,
and mcp.method.name attribute correctness across all MCP operation types.
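The bug and its fix come down to an isinstance check. The class hierarchy below is a simplified assumption (MCPToolCall inheriting from both MCPOperation and ToolCall), and handles_before / handles_after are hypothetical names for the emitter's handles() before and after the change.

```python
class ToolCall:
    """Simplified stand-in for the generic tool call type."""


class MCPOperation:
    """Simplified stand-in for non-tool MCP operations (tools/list, resources/read, prompts/get)."""


class MCPToolCall(MCPOperation, ToolCall):
    """Assumed hierarchy: an MCP tool call is both an MCPOperation and a ToolCall."""


def handles_before(obj: object) -> bool:
    # Before the fix: plain MCPOperation instances are not recognized.
    return isinstance(obj, ToolCall)


def handles_after(obj: object) -> bool:
    # After the fix: MCPOperation is part of the type check.
    return isinstance(obj, (ToolCall, MCPOperation))
```

Because MCPToolCall still passes the old check through ToolCall, tool-call metrics kept flowing and only the plain MCPOperation path was skipped, which is why the drop was silent.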

Made-with: Cursor
Base automatically changed from fix/mcp-semconv-span-naming to main April 16, 2026 16:48
Merge origin/main into fix/mcp-gaps-2-3-4, resolving conflicts in:
- CHANGELOGs: combine our MCP entries with main's eval cost bucket entry
- instruments.py: accept main's _GEN_AI_EVALUATION_COST_BUCKETS
- span.py: keep MCPOperation import and our on_start dispatch (main's
  _start_tool_call MCPToolCall dispatch is superseded by MCPOperation path)
- test_tool_call_span_attributes.py: keep our full test suite with
  MCPOperation and metrics tests

Made-with: Cursor
Brings in session duration metrics (#273) and evals error-resilience
test updates (#276). Resolved conflict in server_instrumentor.py by
keeping both our GAP 2-3-4 instrumentation (read_resource,
render_prompt, native telemetry detection) and main's new
_server_run_wrapper for mcp.server.session.duration tracking.

Made-with: Cursor
…kHandler change

langgraph 1.1.7 introduced GraphCallbackHandler as a required base class
for all callback handlers passed to graphs. This breaks our
LangchainCallbackHandler which inherits from BaseCallbackHandler.
Pin to <= 1.1.6 until we add proper GraphCallbackHandler support.

Made-with: Cursor
…her agent example

- Client instrumentor: explicitly attach/detach span context around wrapped
  calls so TransportInstrumentor.propagate.inject() propagates our trace
  instead of FastMCP's native telemetry spans (which use start_as_current_span)
- Server instrumentor: use _try_wrap for FastMCP 2.x/3.x compat, remove
  _has_native_telemetry check (always instrument)
- Span emitter: remove redundant remote_parent_context handling (transport
  instrumentor already attaches context via context.attach)
- E2E examples: use PythonStdioTransport with explicit env to work around
  MCP SDK's default env allowlist stripping OTEL_* vars from subprocesses
- Add weather_agent example with --manual/zero-code instrumentation modes
- Pin langgraph <=1.1.6 in CI to avoid breaking GraphCallbackHandler change
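Building the explicit environment for the server subprocess could be sketched as below. build_subprocess_env is a hypothetical helper, and keeping PATH is an assumption; the resulting dict is the kind of value one would hand to the transport's env argument.

```python
import os
from typing import Dict, Mapping, Optional, Tuple


def build_subprocess_env(
    base: Optional[Mapping[str, str]] = None,
    prefixes: Tuple[str, ...] = ("OTEL_",),
    always: Tuple[str, ...] = ("PATH",),
) -> Dict[str, str]:
    """Copy only the variables the server subprocess needs.

    Variables are kept when their name starts with one of `prefixes`
    or appears in `always`; everything else is dropped.
    """
    source = os.environ if base is None else base
    return {
        key: value
        for key, value in source.items()
        if key.startswith(prefixes) or key in always
    }


# Example with an explicit base mapping instead of the real environment.
example_env = build_subprocess_env(
    {
        "PATH": "/usr/bin",
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
        "FASTMCP_TELEMETRY_OPT_OUT": "1",
        "SECRET_TOKEN": "do-not-forward",
    },
    prefixes=("OTEL_", "FASTMCP_"),
)
```

Passing an allowlist by prefix keeps the forwarded set small while still covering OTEL_* exporter settings and, when requested, FASTMCP_* variables.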

Made-with: Cursor
…urface

Add version compatibility matrix (0.1.x for FastMCP 2.x, 0.2.0 for
FastMCP 3.x), document the expanded server-side 3.x API surface, and
update references to both FastMCP repos and OTel MCP semconv.

Made-with: Cursor
- Bump version to 0.2.0, pin fastmcp >= 3.0.0, < 4
- server_instrumentor: wrap FastMCP.call_tool/read_resource/render_prompt
  (3.x API surface), remove 2.x ToolManager fallbacks
- Add ContextVar reentrancy guard for FastMCP 3.x call_tool recursion
- Fix compatibility matrix: util-genai <= 0.1.9 for 0.1.1 (no 0.1.10 on PyPI)
- Forward FASTMCP_* env vars to server subprocess in weather_agent

Made-with: Cursor
The handler's _push_current_span (PR #235) skips context_api.attach()
in async contexts to avoid cross-task ValueError.  Since FastMCP is
async, propagate.inject() in the transport instrumentor never sees
the client span, producing an empty traceparent — server spans are
emitted as separate traces instead of children of the client span.

Add _activate_span() context manager that does a scoped attach/detach
around the wrapped call.  This is safe because both operations happen
in the same asyncio.Task (the exact cross-task scenario PR #235
protects against cannot occur here).

Applied to all client operations: call_tool, list_tools, read_resource,
get_prompt.

Also pass FASTMCP_* env vars to the server subprocess in client.py
example so FASTMCP_TELEMETRY_OPT_OUT reaches the server.

Made-with: Cursor
…r tool calls

Tool name (e.g. "add") was incorrectly passed as request_model to
_get_metric_attributes, populating gen_ai.request.model on the
gen_ai.client.operation.duration metric. Use gen_ai.tool.name instead.

Also gate MCPToolCall duration recording to client-side only — the
server already emits mcp.server.operation.duration via
_record_mcp_operation_metrics.

Made-with: Cursor