
title	Collect telemetry for MCP workloads
description	Learn how to collect metrics and traces for MCP workloads using either the ToolHive CLI or Kubernetes operator with OpenTelemetry, Jaeger, and Prometheus.
toc_max_heading_level	2

import useBaseUrl from '@docusaurus/useBaseUrl'; import ThemedImage from '@theme/ThemedImage';

In this tutorial, you'll set up comprehensive observability for your MCP workloads using OpenTelemetry with Jaeger for distributed tracing, Prometheus for metrics collection, and Grafana for visualization.

By the end, you'll have a complete, industry-standard observability solution that captures detailed traces and metrics, giving you visibility into your MCP server performance and usage patterns.

Choose your deployment path

This tutorial offers two paths for MCP observability:

ToolHive CLI + Docker observability stack

Use the ToolHive CLI to run MCP servers locally, with Jaeger and Prometheus running in Docker containers. This approach is perfect for:

Local development and testing
Quick setup and experimentation
Individual developer workflows
Learning OpenTelemetry concepts

ToolHive Kubernetes Operator + in-cluster observability

Use the ToolHive Kubernetes Operator to manage MCP servers in a cluster, with Jaeger and Prometheus deployed inside Kubernetes. This approach is ideal for:

Production-like environments
Team collaboration and shared infrastructure
Container orchestration workflows
Scalable observability deployments

:::tip[Choose one path]

Select your preferred deployment method using the tabs above. All subsequent steps will show instructions for your chosen path.

:::

What you'll learn

How to deploy Jaeger and Prometheus for your chosen environment
How to configure OpenTelemetry collection for ToolHive MCP servers
How to analyze traces in Jaeger and metrics in Prometheus
How to set up queries and monitoring for MCP workloads
Best practices for observability in your deployment environment

Prerequisites

Before starting this tutorial, make sure you have:

For the ToolHive CLI path:

Completed the ToolHive CLI quickstart
A supported container runtime installed and running (Docker or Podman is recommended for this tutorial)
Docker Compose or Podman Compose available
A supported MCP client for testing

For the Kubernetes Operator path:

Completed the ToolHive Kubernetes quickstart with a local kind cluster
kubectl configured to access your cluster
Helm (v3.10 minimum) installed
A supported MCP client for testing
The ToolHive CLI (optional, for client configuration)
Basic familiarity with Kubernetes concepts

Overview

The architecture for each deployment method:

graph TB
    A[AI client]
    THV[ToolHive CLI]
    Proxy[Proxy process]
    subgraph Docker[**Docker**]
      MCP[MCP server<br>container]
      OTEL[OTel Collector]
      J[Jaeger]
      P[Prometheus]
      G[Grafana]
    end

    THV -. manages .-> MCP & Proxy
    Proxy -- HTTP or stdio --> MCP
    Proxy -- OpenTelemetry data --> OTEL
    OTEL -- traces --> J
    OTEL -- metrics --> P
    G -- visualization --> P
    A -- HTTP --> Proxy

Your setup will include:

ToolHive CLI managing MCP servers in containers
Jaeger for distributed tracing with built-in UI
Prometheus for metrics collection with web UI
Grafana for metrics visualization
OpenTelemetry Collector forwarding data to both backends

graph TB
    A[AI client]
    subgraph K8s[**K8s Cluster**]
      THV[ToolHive Operator]
      Proxy[Proxyrunner pod]
      MCP[MCP server pod]
      OTEL[OTel Collector]
      J[Jaeger]
      P[Prometheus]
      G[Grafana]
    end

    A -- HTTP<br>via ingress --> Proxy
    THV -. manages .-> Proxy
    Proxy -. manages .-> MCP
    Proxy -- OpenTelemetry data --> OTEL
    OTEL -- traces --> J
    OTEL -- metrics --> P
    G -- visualization --> P

Your setup will include:

ToolHive Operator managing MCP servers as Kubernetes pods
Jaeger for distributed tracing
Prometheus for metrics collection
Grafana for metrics visualization
OpenTelemetry Collector running as a Kubernetes service

Step 1: Deploy the observability stack

First, set up the observability infrastructure for your chosen environment.

Create Docker Compose configuration

Create a Docker Compose file named observability-stack.yml for the observability stack:

services:
  jaeger:
    image: jaegertracing/jaeger:latest
    container_name: jaeger
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - '16686:16686' # Jaeger UI
    networks:
      - observability

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
      - '--enable-feature=native-histograms'
    ports:
      - '9090:9090'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    networks:
      - observability

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - '3000:3000'
    volumes:
      - ./grafana-prometheus.yml:/etc/grafana/provisioning/datasources/prometheus.yml
      - grafana-data:/var/lib/grafana
    networks:
      - observability

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    container_name: otel-collector
    command: ['--config=/etc/otel-collector-config.yml']
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - '4318:4318' # OTLP HTTP receiver (ToolHive sends here)
      - '8889:8889' # Prometheus exporter metrics
    depends_on:
      - jaeger
      - prometheus
    networks:
      - observability

volumes:
  prometheus-data:
  grafana-data:

networks:
  observability:
    driver: bridge

Configure the OpenTelemetry Collector

Create the collector configuration file, otel-collector-config.yml, to export traces to Jaeger and expose metrics for Prometheus:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  # Export traces to Jaeger
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  # Expose metrics for Prometheus
  prometheus:
    endpoint: 0.0.0.0:8889
    const_labels:
      service: 'toolhive-mcp-proxy'

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]

Configure Prometheus and Grafana

Create a Prometheus configuration file, prometheus.yml, to scrape the OpenTelemetry Collector:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']

Create the Prometheus data source configuration for Grafana in a file named grafana-prometheus.yml:

apiVersion: 1

datasources:
  - name: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
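
Before starting the stack, it can be worth confirming that every file the Compose configuration mounts is present, since a missing bind-mount source is a common cause of startup failures. The following is a minimal sketch: the `check_config_files` helper is hypothetical, and the file names are the ones referenced by the volume mounts and commands in this tutorial.

```shell
# Hypothetical helper: report any expected configuration file missing from a directory.
# Usage: check_config_files [directory]   (defaults to the current directory)
check_config_files() (
  dir="${1:-.}"
  missing=0
  for f in observability-stack.yml otel-collector-config.yml prometheus.yml grafana-prometheus.yml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Succeed only when nothing was reported as missing.
  [ "$missing" -eq 0 ]
)
```

Running `check_config_files` in the directory that holds the files prints nothing and succeeds when all four are present.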

Start the observability stack

Deploy the stack and verify it's running:

# Start the stack
docker compose -f observability-stack.yml up -d

# Verify Jaeger is running
curl http://localhost:16686/api/services

# Verify Prometheus is running
curl http://localhost:9090/-/healthy

# Verify the OpenTelemetry Collector is ready
curl -I http://localhost:8889/metrics
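
The containers can take a few seconds to become ready, so a single curl right after startup may fail even when the stack is healthy. As a minimal sketch, a small retry wrapper can poll each endpoint; `wait_until` is a hypothetical helper, and the attempt counts and delays in the usage comments are arbitrary.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() (
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command succeeds; its output is discarded.
    "$@" >/dev/null 2>&1 && exit 0
    # Pause between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  exit 1
)

# Example usage against the endpoints checked above (requires the stack to be running):
#   wait_until 30 2 curl -fsS http://localhost:16686/api/services
#   wait_until 30 2 curl -fsS http://localhost:9090/-/healthy
#   wait_until 30 2 curl -fsS http://localhost:8889/metrics
```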

Access the interfaces:

Jaeger UI: http://localhost:16686
Prometheus Web UI: http://localhost:9090
Grafana: http://localhost:3000 (login: admin/admin)

Prerequisite

If you've completed the Kubernetes quickstart, skip to the next step.

Otherwise, set up a local kind cluster and install the ToolHive operator:

kind create cluster --name toolhive
helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds
helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system --create-namespace

Verify the operator is running:

kubectl get pods -n toolhive-system

Create the monitoring namespace

Create a dedicated namespace for your observability stack:

kubectl create namespace monitoring

Deploy Jaeger

Install Jaeger using Helm with a configuration suited for ToolHive:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
helm upgrade -i jaeger-all-in-one jaegertracing/jaeger -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.3.6/examples/otel/jaeger-values.yaml -n monitoring

Deploy Prometheus and Grafana

Install Prometheus and Grafana using the kube-prometheus-stack Helm chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade -i kube-prometheus-stack prometheus-community/kube-prometheus-stack -f https://raw.githubusercontent.com/stacklok/toolhive/v0.3.6/examples/otel/prometheus-stack-values.yaml -n monitoring

Deploy OpenTelemetry Collector

Install the OpenTelemetry Collector using Helm:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm upgrade -i otel-collector open-telemetry/opentelemetry-collector -f https://raw.githubusercontent.com/stacklok/toolhive/v0.3.6/examples/otel/otel-values.yaml -n monitoring

Verify all components

Verify all components are running:

kubectl get pods -n monitoring

Wait for all pods to be in Running status before proceeding. The output should look similar to:

NAME                                                        READY   STATUS    RESTARTS   AGE
jaeger-all-in-one-6bf667c984-p5455                          1/1     Running   0          2m12s
kube-prometheus-stack-grafana-69c88f77c5-b9f7m              3/3     Running   0          37s
kube-prometheus-stack-kube-state-metrics-55cb9c8889-cnlkt   1/1     Running   0          37s
kube-prometheus-stack-operator-85655fb7cd-rxms9             1/1     Running   0          37s
kube-prometheus-stack-prometheus-node-exporter-zzcvh        1/1     Running   0          37s
otel-collector-opentelemetry-collector-agent-hqtnq          1/1     Running   0          11s
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          36s
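
Rather than checking the table by eye, the STATUS column can be filtered. The following is a minimal sketch: `pods_not_running` is a hypothetical helper, and it assumes the default `kubectl get pods` column layout shown above (NAME, READY, STATUS).

```shell
# Hypothetical helper: print the names of pods whose STATUS column is not "Running".
# Reads the default `kubectl get pods` table from standard input.
pods_not_running() {
  # Skip the header row, then compare the third column.
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example usage (requires access to the cluster):
#   kubectl get pods -n monitoring | pods_not_running
```

An empty result means every pod reports Running. Alternatively, `kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=300s` blocks until the pods are ready.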

Step 2: Configure MCP server telemetry

Now configure your MCP servers to send telemetry data to the observability stack.

Set global telemetry configuration

Configure ToolHive CLI with default telemetry settings to send data to the OpenTelemetry Collector:

# Configure the OpenTelemetry endpoint (collector, not directly to Jaeger)
thv config otel set-endpoint localhost:4318

# Enable both metrics and tracing
thv config otel set-metrics-enabled true
thv config otel set-tracing-enabled true

# Set 100% sampling for development
thv config otel set-sampling-rate 1.0

# Use insecure connection for local development
thv config otel set-insecure true

Run an MCP server with telemetry

Start an MCP server with enhanced telemetry configuration:

thv run \
  --otel-service-name "mcp-fetch-server" \
  --otel-env-vars "USER,HOST" \
  --otel-enable-prometheus-metrics-path \
  fetch

Verify the server started and is exporting telemetry:

# Check server status
thv list

# Check Prometheus metrics are available on the MCP server
PORT=$(thv list | grep fetch | awk '{print $5}')
curl http://localhost:$PORT/metrics
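
The `/metrics` endpoint returns Prometheus text exposition format, which can run to many lines. As a minimal sketch, the output can be reduced to a list of series names; `metric_names` is a hypothetical helper, and it assumes nothing about which metrics ToolHive exports.

```shell
# Hypothetical helper: list the unique series names found in Prometheus
# text exposition output read from standard input.
metric_names() {
  # Drop comment (# HELP / # TYPE) and blank lines, then strip labels and values.
  grep -v -e '^#' -e '^$' | sed 's/[{ ].*//' | sort -u
}

# Example usage (requires the MCP server started above and the PORT variable set):
#   curl -s "http://localhost:$PORT/metrics" | metric_names
```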

Create an MCP server with telemetry

Create an MCPServer resource with comprehensive telemetry configuration in a file named fetch-with-telemetry.yml:

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: fetch-telemetry
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/gofetch/server
  transport: streamable-http
  proxyPort: 8080
  mcpPort: 8080
  resources:
    limits:
      cpu: '100m'
      memory: '128Mi'
    requests:
      cpu: '50m'
      memory: '64Mi'
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      insecure: true # Using HTTP collector endpoint
      metrics:
        enabled: true
      tracing:
        enabled: true
        samplingRate: '1.0'
    prometheus:
      enabled: true

Deploy the MCP server:

kubectl apply -f fetch-with-telemetry.yml

Verify the MCP server is running and healthy:

# Verify the server is running
kubectl get mcpserver -n toolhive-system

# Check the pods are healthy
kubectl get pods -n toolhive-system -l app.kubernetes.io/instance=fetch-telemetry

Step 3: Generate telemetry data

Create some MCP interactions to generate traces and metrics for analysis.

Connect your AI client

Your MCP server is already configured to work with your AI client from the CLI quickstart. Simply use your client to make requests that will generate telemetry data.

Port-forward to access the MCP server

In a separate terminal window, create a port-forward to connect your AI client:

kubectl port-forward service/mcp-fetch-telemetry-proxy -n toolhive-system 8080:8080

Leave this running for the duration of this tutorial.

Configure your AI client

Use the ToolHive CLI to add the MCP server to your client configuration:

thv run http://localhost:8080/mcp --name fetch-k8s --transport streamable-http

Generate sample data

Make several requests using your AI client to create diverse telemetry:

Basic fetch request: "Fetch the content from https://toolhive.dev and summarize it"
Multiple requests: Make 3-4 more fetch requests with different URLs
Error generation: Try an invalid URL to generate error traces

Each interaction creates rich telemetry data including:

Request traces with timing information sent to Jaeger
Tool call details with sanitized arguments
Performance metrics sent to Prometheus

The CLI and Kubernetes deployments will both generate similar telemetry data, with the Kubernetes setup including additional Kubernetes-specific attributes.

Step 4: Access and analyze telemetry data

Now examine your telemetry data using Jaeger and Prometheus to understand MCP server performance.

Access Jaeger for traces

Open Jaeger in your browser at http://localhost:16686.

Explore traces in Jaeger

In the Service dropdown, select mcp-fetch-server
Click Find Traces to see recent traces
Click on individual traces to see detailed spans

Look for traces with protocol and MCP-specific attributes like:

{
  "serviceName": "mcp-fetch-server",
  "http.duration_ms": "307.8",
  "http.status_code": 200,
  "mcp.method": "tools/call",
  "mcp.tool.name": "fetch",
  "mcp.tool.arguments": "url=https://toolhive.dev",
  "mcp.transport": "streamable-http",
  "service.version": "v0.3.6"
}
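
Whether Jaeger has registered the service can also be checked from the command line through the same `/api/services` endpoint used to verify Jaeger earlier. The following is a minimal sketch: `has_service` is a hypothetical helper, and it assumes the response lists service names as quoted JSON strings, using a plain text match instead of a JSON parser.

```shell
# Hypothetical helper: succeed if the named service appears in a Jaeger
# /api/services response read from standard input.
# This is a plain text match on the quoted name, not a full JSON parse.
has_service() {
  grep -q -F "\"$1\""
}

# Example usage (requires Jaeger to be reachable on localhost:16686):
#   curl -s http://localhost:16686/api/services | has_service mcp-fetch-server \
#     && echo "service registered" || echo "service not registered yet"
```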

Access Grafana for visualization

Open http://localhost:3000 in your browser and log in using the default credentials (admin / admin).

Import the ToolHive dashboard

Click the + icon in the top-right of the Grafana interface and select Import dashboard
In the Import via dashboard JSON model input box, paste the contents of this example dashboard file
Click Load, then Import

Make some requests to your MCP server again and watch the dashboard update in real time.

You can also explore other metrics in Grafana by creating custom panels and queries. See the Observability guide for examples.

Port-forward to Jaeger

Access Jaeger through a port-forward:

kubectl port-forward service/jaeger-all-in-one-query -n monitoring 16686:16686

Open http://localhost:16686 in your browser.

Explore traces in Jaeger

In the Service dropdown, select mcp-fetch-server
Click Find Traces to see recent traces
Click on individual traces to see detailed spans

Review the available information including MCP and Kubernetes-specific attributes like:

{
  "serviceName": "mcp-fetch-server",
  "http.duration_ms": "307.8",
  "http.status_code": 200,
  "mcp.method": "tools/call",
  "mcp.tool.name": "fetch",
  "mcp.tool.arguments": "url=https://toolhive.dev",
  "mcp.transport": "streamable-http",
  "k8s.deployment.name": "fetch-telemetry",
  "k8s.namespace.name": "toolhive-system",
  "k8s.node.name": "toolhive-control-plane",
  "k8s.pod.name": "fetch-telemetry-7d7d55687c-glvpz",
  "service.namespace": "toolhive-system",
  "service.version": "v0.3.6"
}

Port-forward to Grafana

Access Grafana through a port-forward:

kubectl port-forward service/kube-prometheus-stack-grafana -n monitoring 3000:80

Open http://localhost:3000 in your browser and log in using the default credentials (admin / admin).

Import the ToolHive dashboard

Click the + icon in the top-right of the Grafana interface and select Import dashboard
In the Import via dashboard JSON model input box, paste the contents of this example dashboard file
Click Load, then Import

Make some requests to your MCP server again and watch the dashboard update in real time.

You can also explore other metrics in Grafana by creating custom panels and queries. See the Observability guide for examples.

Step 5: Cleanup

When you're finished exploring, clean up your resources.

Stop MCP servers

# Stop and remove the MCP server
thv rm fetch

# Clear telemetry configuration (optional)
thv config otel unset-endpoint
thv config otel unset-metrics-enabled
thv config otel unset-tracing-enabled
thv config otel unset-sampling-rate
thv config otel unset-insecure

Stop observability stack

# Stop all containers
docker compose -f observability-stack.yml down

# Remove all data (optional)
docker compose -f observability-stack.yml down -v

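# Remove the configuration files created in this tutorial (optional)
rm -f observability-stack.yml otel-collector-config.yml prometheus.yml grafana-prometheus.yml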

Remove MCP servers

# Delete the MCP server
kubectl delete mcpserver fetch-telemetry -n toolhive-system

Remove observability stack

# Delete observability components
helm uninstall otel-collector -n monitoring
helm uninstall kube-prometheus-stack -n monitoring
helm uninstall jaeger-all-in-one -n monitoring

# Remove the monitoring namespace
kubectl delete namespace monitoring

Optional: Remove the kind cluster

If you're completely done:

kind delete cluster --name toolhive

What's next?

Congratulations! You've successfully set up comprehensive observability for ToolHive MCP workloads using OpenTelemetry, Jaeger, Prometheus, and Grafana.

To learn more about ToolHive's telemetry capabilities and best practices, see the Observability concepts guide.

Here are some next steps to explore:

Custom dashboards: Create Grafana dashboards that query both Jaeger and Prometheus
Alerting: Set up Prometheus Alertmanager for performance and error alerts
Performance optimization: Use telemetry data to optimize MCP server performance
Distributed tracing: Understand request flows across multiple MCP servers

CLI-specific next steps

Review the CLI telemetry guide: Explore detailed configuration options
Scale to multiple servers: Run multiple MCP servers with different configurations
Production CLI setup: Learn about secrets management and custom permissions
Alternative backends: Try other observability platforms mentioned in the CLI telemetry guide

Kubernetes-specific next steps

Review the Kubernetes telemetry guide: Explore detailed configuration options
Production deployment: Set up production-grade Jaeger and Prometheus with persistent storage, or configure an OpenTelemetry Collector to work with your existing observability tools
Advanced MCP configurations: Explore Kubernetes MCP deployment patterns
Secrets integration: Learn about HashiCorp Vault integration
Service mesh observability: Integrate with Istio or Linkerd for enhanced tracing

Related information

Observability concepts - Understanding ToolHive's telemetry architecture
CLI telemetry guide - Detailed CLI configuration options
Kubernetes telemetry guide - Kubernetes operator telemetry features
OpenTelemetry Collector documentation - Official OpenTelemetry Collector documentation
Jaeger documentation - Official Jaeger documentation
Prometheus documentation - Official Prometheus documentation

Troubleshooting

Docker containers won't start

Check Docker daemon and container logs:

# Verify Docker is running
docker info

# Check container logs
docker compose -f observability-stack.yml logs jaeger
docker compose -f observability-stack.yml logs prometheus
docker compose -f observability-stack.yml logs otel-collector

Common issues:

Port conflicts with existing services
Insufficient Docker memory allocation
Missing configuration files

ToolHive CLI not sending telemetry

Verify telemetry configuration:

# Check current config
thv config otel get-endpoint
thv config otel get-metrics-enabled

Check the ToolHive proxy logs for telemetry export errors:

thv logs fetch --proxy [--follow]

Alternatively, you can check the log file directly at:

macOS: ~/Library/Application Support/toolhive/logs/fetch.log
Windows: %LOCALAPPDATA%\toolhive\logs\fetch.log
Linux: ~/.local/share/toolhive/logs/fetch.log

No traces in Jaeger

Check the telemetry pipeline:

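Verify the collector is receiving data: curl http://localhost:8889/metrics (traces and metrics share the same OTLP receiver, so ToolHive metrics appearing here indicate that telemetry is reaching the collector)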
Check collector logs: docker logs otel-collector
Verify Jaeger connectivity: curl http://localhost:16686/api/services
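
To tell whether the problem lies in ToolHive or in the collector-to-Jaeger pipeline, one option is to post a synthetic span straight to the collector's OTLP/HTTP receiver. The following is a minimal sketch: `otlp_test_span` is a hypothetical helper, the trace and span IDs are arbitrary fixed values, and the service name in the usage comment is made up for the test. The `/v1/traces` path is the standard OTLP/HTTP path for traces.

```shell
# Hypothetical helper: print a minimal OTLP/JSON payload containing one span.
# The trace and span IDs are arbitrary fixed hex values used only for testing.
# Usage: otlp_test_span <service-name>
otlp_test_span() (
  # Use the current time so the span falls inside Jaeger's default lookback window.
  now="$(date +%s)"
  cat <<EOF
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"$1"}}]},
"scopeSpans":[{"scope":{"name":"manual-smoke-test"},"spans":[{
"traceId":"5b8efff798038103d269b633813fc60c","spanId":"eee19b7ec3c1b174",
"name":"smoke-test-span","kind":1,
"startTimeUnixNano":"${now}000000000","endTimeUnixNano":"${now}500000000"}]}]}]}
EOF
)

# Example usage (requires the collector to be listening on localhost:4318):
#   otlp_test_span pipeline-smoke-test | curl -sS -X POST \
#     -H 'Content-Type: application/json' --data-binary @- \
#     http://localhost:4318/v1/traces
```

If the span shows up in Jaeger under the service name passed to the helper, the collector and Jaeger are working, and the issue is more likely in the ToolHive telemetry settings.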

Pods stuck in pending state

Check cluster resources and pod events:

# Check pod status
kubectl get pods -n monitoring

# Describe problematic pods
kubectl describe pod <pod-name> -n monitoring

# Check node resources
kubectl top nodes

Common issues:

Insufficient cluster resources
Image pull failures
Network policies blocking communication

MCP server not sending telemetry

Verify the telemetry configuration and connectivity:

# Check MCPServer status
kubectl describe mcpserver fetch-telemetry -n toolhive-system

# Check OpenTelemetry Collector logs (selects the collector pods by Helm release label)
kubectl logs -n monitoring -l app.kubernetes.io/instance=otel-collector --tail=100

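# Verify service connectivity to Jaeger from inside the cluster using a temporary curl pod
kubectl run jaeger-check --rm -it --restart=Never --image=curlimages/curl -n monitoring -- curl -s http://jaeger-all-in-one-query:16686/api/services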

No metrics in Prometheus

Common troubleshooting steps:

Verify Prometheus targets: Check http://localhost:9090/targets to ensure otel-collector target is UP
Check collector metrics endpoint: curl http://localhost:8889/metrics (CLI) or port-forward and check in K8s
Review collector configuration: Ensure the Prometheus exporter is properly configured
Check Prometheus config: Verify the scrape configuration includes the collector endpoint
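
The target health check can also be scripted with the Prometheus HTTP API by querying the built-in `up` metric for the `otel-collector` job. The following is a minimal sketch: `target_is_up` is a hypothetical helper, and it assumes the API returns compact JSON, using a plain text match instead of a JSON parser.

```shell
# Hypothetical helper: succeed if a Prometheus instant-query response for `up`
# (read from standard input) contains at least one series with the value "1".
# This is a plain text match that assumes compact JSON, not a full JSON parse.
target_is_up() {
  grep -q '"value":\[[0-9.]*,"1"]'
}

# Example usage (requires Prometheus to be reachable on localhost:9090):
#   curl -s -G http://localhost:9090/api/v1/query \
#     --data-urlencode 'query=up{job="otel-collector"}' | target_is_up \
#     && echo "collector target is up" || echo "collector target is down or missing"
```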
