| 1 | +# Virtual MCP Server Observability |
| 2 | + |
| 3 | +This document describes the observability features of the Virtual MCP |
| 4 | +Server (vMCP), which aggregates multiple backend MCP servers into a unified |
| 5 | +interface. The vMCP provides OpenTelemetry-based instrumentation for monitoring |
| 6 | +backend operations and composite tool workflow executions. |
| 7 | + |
| 8 | +For general ToolHive observability concepts and proxy runner telemetry, see the |
| 9 | +main [Observability and Telemetry](../observability.md) documentation. |
| 10 | + |
| 11 | +## Overview |
| 12 | + |
| 13 | +The vMCP telemetry provides visibility into: |
| 14 | + |
| 15 | +1. **Backend operations**: Track requests to individual backend MCP servers |
| 16 | + including tool calls, resource reads, prompt retrieval, and capability listing |
| 17 | +2. **Workflow executions**: Monitor composite tool workflow performance and errors |
| 18 | +3. **Distributed tracing**: Correlate requests across the vMCP and its backends |
| 19 | + |
| 20 | +The vMCP uses a decorator pattern to wrap backend clients and workflow executors |
| 21 | +with telemetry instrumentation. This approach provides consistent metrics and |
| 22 | +tracing without modifying the core business logic. |
| 23 | + |
| 24 | +The implementation of both metrics and traces can be found in `pkg/vmcp/server/telemetry.go`. |
| 25 | + |
| 26 | +## Metrics |
| 27 | + |
| 28 | +The vMCP emits metrics for backend operations and workflow executions. All |
| 29 | +metrics use the `toolhive_vmcp_` prefix. |
| 30 | + |
| 31 | +**Backend metrics** track requests to individual backend MCP servers, including |
| 32 | +request counts, error counts, and request duration histograms. These metrics |
| 33 | +include attributes identifying the target backend (workload ID, name, URL, |
| 34 | +transport type) and the action being performed (tool call, resource read, etc.). |
| 35 | + |
| 36 | +**Workflow metrics** track composite tool workflow executions, including |
| 37 | +execution counts, error counts, and duration histograms. These metrics include |
| 38 | +the workflow name as an attribute. |
| 39 | + |
| 40 | +## Distributed Tracing |
| 41 | + |
| 42 | +The vMCP creates a span for each backend operation and each workflow execution, so that workflow execution errors or latency can be attributed to specific tool calls. |
| 43 | + |
| 44 | + |
| 45 | +## Configuration |
| 46 | + |
| 47 | +Configure telemetry in the `VirtualMCPServer` resource using the `spec.telemetry` |
| 48 | +field: |
| 49 | + |
| 50 | +```yaml |
| 51 | +apiVersion: toolhive.stacklok.dev/v1alpha1 |
| 52 | +kind: VirtualMCPServer |
| 53 | +metadata: |
| 54 | + name: my-vmcp |
| 55 | +spec: |
| 56 | + groupRef: |
| 57 | + name: my-group |
| 58 | + telemetry: |
| 59 | + endpoint: "otel-collector:4317" |
| 60 | + serviceName: "my-vmcp" |
| 61 | + tracingEnabled: true |
| 62 | + metricsEnabled: true |
| 63 | + samplingRate: 0.1 |
| 64 | + insecure: true |
| 65 | + enablePrometheusMetricsPath: true |
| 66 | +``` |
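As a rough guide to what `samplingRate: 0.1` implies, the sketch below assumes the value is the fraction of traces to keep (about 10% here), in the style of ratio-based samplers that decide from the trace ID. `ShouldSample` is a hypothetical helper written for this illustration; it is not ToolHive or OpenTelemetry API, and the actual sampler used by the vMCP may behave differently.

```go
package main

import "encoding/binary"

// ShouldSample reports whether a trace with the given ID would be kept
// at the given rate (0.0 to 1.0). The decision depends only on the trace ID,
// so every service that sees the same ID makes the same decision.
func ShouldSample(traceID [16]byte, rate float64) bool {
	if rate >= 1 {
		return true
	}
	if rate <= 0 {
		return false
	}
	// Map the rate onto the range [0, 2^63).
	bound := uint64(rate * (1 << 63))
	// Use the low 8 bytes of the ID as a 63-bit number.
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < bound
}
```

With uniformly random trace IDs, a rate of `0.1` keeps roughly one trace in ten, which trades completeness for lower overhead and storage cost.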
| 67 | + |
| 68 | +See the [VirtualMCPServer API reference](./virtualmcpserver-api.md) for complete |
| 69 | +CRD documentation. |
| 70 | + |
| 71 | +## Related Documentation |
| 72 | + |
| 73 | +- [Observability and Telemetry](../observability.md) - Main ToolHive observability documentation |
| 74 | +- [VirtualMCPServer API Reference](./virtualmcpserver-api.md) - Complete CRD specification |