Instrument vMCP and Document o11y Configuration #2906

Merged
jerm-dro merged 23 commits into main from jerm/vmcp-o11y
Dec 8, 2025

@jerm-dro jerm-dro commented Dec 4, 2025

Summary

This PR adds metrics and distributed tracing to the Virtual MCP Server (vMCP), providing visibility into backend calls and composite workflow executions.

Fixes #2849

Large PR Justification

This is a fairly large PR, but most of it is documentation, which I thought would be easier to review alongside the code changes.

Changes

Implementation

  • Add telemetry.go with decorator pattern for instrumenting backend clients and workflow executors
  • Instrument the vMCP server with the existing telemetry middleware
  • Emit metrics for backend requests (request count, errors, duration) and workflow executions
  • Create distributed tracing spans for backend operations and workflow executions
  • Plumb telemetry configuration from VirtualMCPServer CRD to the vMCP server

Metrics Emitted

  • toolhive_vmcp_backends_discovered - Number of backends discovered
  • toolhive_vmcp_backend_requests - Request count per backend
  • toolhive_vmcp_backend_errors - Error count per backend
  • toolhive_vmcp_backend_requests_duration - Request duration histogram
  • toolhive_vmcp_workflow_executions - Workflow execution count
  • toolhive_vmcp_workflow_errors - Workflow error count
  • toolhive_vmcp_workflow_duration - Workflow duration histogram
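
One way to check that these metrics show up in a scrape is sketched below. It is an illustration only and not the integration test from this PR: `missingMetrics` and `baseName` are hypothetical helpers, the sample payload and its `backend` label are made up, and it assumes the workflow execution counter is named toolhive_vmcp_workflow_executions. It also assumes the exporter may append `_total`, `_bucket`, `_sum`, or `_count` to the base names; unit suffixes such as `_seconds` are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedMetrics holds the base metric names listed in the PR description.
var expectedMetrics = []string{
	"toolhive_vmcp_backends_discovered",
	"toolhive_vmcp_backend_requests",
	"toolhive_vmcp_backend_errors",
	"toolhive_vmcp_backend_requests_duration",
	"toolhive_vmcp_workflow_executions",
	"toolhive_vmcp_workflow_errors",
	"toolhive_vmcp_workflow_duration",
}

// baseName extracts the metric name from a sample line and strips one
// common exposition suffix (an assumption about how names are exported).
func baseName(line string) string {
	name := line
	if i := strings.IndexAny(line, "{ "); i >= 0 {
		name = line[:i]
	}
	for _, s := range []string{"_bucket", "_sum", "_count", "_total"} {
		if strings.HasSuffix(name, s) {
			return strings.TrimSuffix(name, s)
		}
	}
	return name
}

// missingMetrics returns the expected names that have no sample in payload.
func missingMetrics(payload string, expected []string) []string {
	seen := map[string]bool{}
	for _, line := range strings.Split(payload, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		seen[baseName(line)] = true
	}
	var missing []string
	for _, name := range expected {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Hypothetical payload; the label names and values are illustrative.
	payload := `# TYPE toolhive_vmcp_backend_requests_total counter
toolhive_vmcp_backend_requests_total{backend="github"} 3
toolhive_vmcp_backend_requests_duration_bucket{backend="github",le="0.5"} 3
`
	for _, name := range missingMetrics(payload, expectedMetrics) {
		fmt.Println("missing:", name)
	}
}
```

Matching on the exact base name rather than a prefix keeps toolhive_vmcp_backend_requests_duration samples from being counted as toolhive_vmcp_backend_requests.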

CRD Updates

  • Add spec.telemetry field to VirtualMCPServer for configuring OpenTelemetry and Prometheus
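
The pass-through from the CRD to the vMCP config could look something like the sketch below. Everything here is an assumption made for illustration: the struct shapes are simplified stand-ins, the `Endpoint` and `PrometheusEnabled` fields are hypothetical, and `convertTelemetry` is not the converter in this PR.

```go
package main

import "fmt"

// TelemetryConfig is a hypothetical, simplified telemetry block.
// The real field set of spec.telemetry is not shown in this PR description.
type TelemetryConfig struct {
	Endpoint          string // hypothetical OTLP endpoint
	PrometheusEnabled bool   // hypothetical toggle for a Prometheus endpoint
}

// CRDSpec is a simplified stand-in for the VirtualMCPServer spec.
type CRDSpec struct {
	Telemetry *TelemetryConfig
}

// VMCPConfig is a simplified stand-in for the vMCP server configuration.
type VMCPConfig struct {
	Telemetry *TelemetryConfig
}

// convertTelemetry copies the telemetry block from the CRD spec into the
// vMCP config. A nil block stays nil, which this sketch treats as
// "telemetry not configured". The value is copied so later changes to the
// spec do not leak into the generated config.
func convertTelemetry(spec CRDSpec) VMCPConfig {
	cfg := VMCPConfig{}
	if spec.Telemetry != nil {
		t := *spec.Telemetry
		cfg.Telemetry = &t
	}
	return cfg
}

func main() {
	spec := CRDSpec{Telemetry: &TelemetryConfig{Endpoint: "otel-collector:4318", PrometheusEnabled: true}}
	cfg := convertTelemetry(spec)
	fmt.Printf("%+v\n", *cfg.Telemetry)
}
```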

Documentation

  • Add dedicated vMCP observability documentation (docs/operator/virtualmcpserver-observability.md)
  • Document spec.telemetry field in VirtualMCPServer API reference
  • Update docs/observability.md with link to vMCP observability
  • Add telemetry to examples/vmcp-config.yaml and cmd/vmcp/README.md

Testing

  • Add integration tests for metrics emission

@jerm-dro jerm-dro requested a review from Copilot December 4, 2025 19:26
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label Dec 4, 2025
codecov Bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 1.08696% with 182 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.00%. Comparing base (92922c6) to head (105cd92).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/vmcp/server/telemetry.go | 0.00% | 122 Missing ⚠️ |
| pkg/vmcp/server/server.go | 0.00% | 24 Missing and 4 partials ⚠️ |
| cmd/vmcp/app/commands.go | 0.00% | 18 Missing ⚠️ |
| test/integration/vmcp/helpers/vmcp_server.go | 0.00% | 14 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2906      +/-   ##
==========================================
- Coverage   56.30%   56.00%   -0.30%     
==========================================
  Files         324      325       +1     
  Lines       31921    32091     +170     
==========================================
+ Hits        17972    17974       +2     
- Misses      12412    12576     +164     
- Partials     1537     1541       +4     

Copilot AI left a comment

Pull request overview

This PR adds comprehensive observability to the Virtual MCP Server (vMCP) by integrating OpenTelemetry-based metrics and distributed tracing. The implementation uses a decorator pattern to instrument both backend client operations and composite workflow executions, providing visibility into the vMCP's operations without modifying core business logic.

Key Changes:

  • Implemented telemetry instrumentation using decorator pattern for backend clients and workflow executors
  • Added OpenTelemetry configuration to VirtualMCPServer CRD with pass-through to vMCP config
  • Created dedicated observability documentation for vMCP with configuration examples

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| pkg/vmcp/server/telemetry.go | New file implementing telemetry decorators for backend clients and workflow executors using OpenTelemetry metrics and traces |
| pkg/vmcp/server/server.go | Integrates telemetry provider, decorates backend client and workflow executors, adds /metrics endpoint and telemetry middleware |
| pkg/vmcp/config/config.go | Adds Telemetry field to vMCP configuration structure |
| test/integration/vmcp/vmcp_integration_test.go | Adds integration test verifying metrics emission for workflows and backend calls |
| test/integration/vmcp/helpers/vmcp_server.go | Extends test helper with workflow definitions and telemetry provider support |
| pkg/telemetry/middleware.go | Documentation update for transport types |
| pkg/telemetry/config.go | Documentation update for transport types |
| examples/vmcp-config.yaml | Adds telemetry configuration example |
| docs/operator/virtualmcpserver-observability.md | New documentation describing vMCP observability features and configuration |
| docs/operator/virtualmcpserver-api.md | Documents spec.telemetry field in VirtualMCPServer CRD |
| docs/observability.md | Links to vMCP-specific observability documentation |
| cmd/vmcp/app/commands.go | Creates telemetry provider from config and passes to server |
| cmd/vmcp/README.md | Updates feature list to include observability |
| cmd/thv-operator/pkg/vmcpconfig/converter.go | Passes through telemetry config from CRD to vMCP config |
| cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go | Adds Telemetry field to VirtualMCPServerSpec |

Comment thread pkg/vmcp/server/telemetry.go Outdated
Comment thread pkg/vmcp/server/telemetry.go Outdated
Comment thread pkg/vmcp/server/telemetry.go Outdated
Comment thread pkg/vmcp/server/telemetry.go Outdated
Comment thread docs/operator/virtualmcpserver-observability.md Outdated
Comment thread pkg/vmcp/server/server.go Outdated

jerm-dro commented Dec 4, 2025

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot AI commented Dec 4, 2025

@jerm-dro I've opened a new pull request, #2907, to work on those changes. Once the pull request is ready, I'll request review from you.

Comment thread cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go Outdated
Comment thread cmd/vmcp/app/commands.go
Comment thread pkg/vmcp/server/server.go Outdated
Comment thread pkg/vmcp/server/telemetry.go
Comment thread pkg/vmcp/server/telemetry.go
Comment thread pkg/vmcp/server/telemetry.go
Comment thread docs/observability.md
@jhrozek jhrozek left a comment

I would recommend pinging @ChrisJBurns about the metrics, as he has solid experience implementing metrics for MCPServer and also as an SRE. I added some code questions inline, chiefly the one about the CRD types.

@jerm-dro jerm-dro changed the base branch from main to jerm/jerm/refactor-telemetry-conversion December 4, 2025 23:02
Base automatically changed from jerm/jerm/refactor-telemetry-conversion to main December 4, 2025 23:09
@github-actions github-actions Bot removed the size/L Large PR: 600-999 lines changed label Dec 4, 2025
Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Dec 8, 2025
Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
jhrozek previously approved these changes Dec 8, 2025
Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
@jerm-dro jerm-dro merged commit 155ba42 into main Dec 8, 2025
36 checks passed
@jerm-dro jerm-dro deleted the jerm/vmcp-o11y branch December 8, 2025 20:22
Labels

size/L Large PR: 600-999 lines changed

Development

Successfully merging this pull request may close these issues.

Add OpenTelemetry metrics to vMCP server

4 participants