Reference issue #3598
Bug Summary
When querying /servers/{server_id}/tools?include_metrics=true, the API returns incorrect metrics that are aggregated across ALL servers using the tool, instead of metrics specific to the requested server.
Steps to Reproduce
- Create a tool (e.g., weather-tool)
- Associate the tool with two virtual servers: server1 and server2
- Execute the tool:
  - 2 times via server1
  - 3 times via server2
  - 1 time via Admin UI (no server context)
- Query metrics for server1:
curl -X GET "http://localhost:4444/servers/server1/tools?include_metrics=true"
Expected Behavior
The API should return metrics scoped to server1 only:
{
  "name": "weather-tool",
  "metrics": {
    "total_executions": 2,
    "successful_executions": 2,
    "failed_executions": 0
  }
}
Actual Behavior
The API returns metrics aggregated across all servers and the Admin UI. Both execution counts are wrong: 2 + 3 + 1 = 6 instead of 2.
{
  "name": "weather-tool",
  "metrics": {
    "total_executions": 6,
    "successful_executions": 6,
    "failed_executions": 0
  }
}
Impact
- Severity: High
- Affected Endpoints:
  - GET /servers/{server_id}/tools?include_metrics=true
  - GET /servers/{server_id}/resources?include_metrics=true
  - GET /servers/{server_id}/prompts?include_metrics=true
- User Impact:
  - Cannot track per-server SLAs or performance
  - Multi-tenant deployments cannot isolate server-specific issues
  - Misleading data for capacity planning and troubleshooting
Root Cause
The metric tables are missing a server_id column:
- tool_metrics table only has tool_id, no server_id
- resource_metrics table only has resource_id, no server_id
- prompt_metrics table only has prompt_id, no server_id
- Hourly rollup tables (*_metrics_hourly) have the same issue
When metrics are recorded during tool/resource/prompt execution, the system does not capture which server was used. The metrics_summary property in mcpgateway/db.py aggregates ALL metrics for the entity without filtering by server.
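To make the root cause concrete, the following is a minimal sketch that replays the reproduction steps against an in-memory SQLite table shaped roughly like the current tool_metrics. The is_success column and the simplified table layout are assumptions for illustration, not the gateway's actual schema. Because there is nowhere to store the server, the only aggregation available is per tool, so every caller sees the combined total.

```python
import sqlite3

# Hypothetical, simplified stand-in for the current tool_metrics table:
# it records which tool ran, but not which server it ran through.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_metrics ("
    "id INTEGER PRIMARY KEY, tool_id TEXT NOT NULL, is_success INTEGER NOT NULL)"
)

# Replay the reproduction steps: 2 runs via server1, 3 via server2, 1 via Admin UI.
# The caller knows the server context, but the table has no column for it.
executions = ["server1"] * 2 + ["server2"] * 3 + [None]
for _server in executions:
    conn.execute(
        "INSERT INTO tool_metrics (tool_id, is_success) VALUES (?, ?)",
        ("weather-tool", 1),
    )

# The only aggregation the schema allows is per tool, across every server.
total, successful = conn.execute(
    "SELECT COUNT(*), SUM(is_success) FROM tool_metrics WHERE tool_id = ?",
    ("weather-tool",),
).fetchone()

print(total, successful)  # 6 6
```

Any request for server1 ends up reading the same six rows, which matches the Actual Behavior reported above.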
Environment
Proposed Fix
1. Database Schema Changes (Migration Required)
Add server_id column to all metric tables:
-- Raw metrics tables (columns are nullable unless declared NOT NULL)
ALTER TABLE tool_metrics ADD COLUMN server_id VARCHAR(36);
ALTER TABLE resource_metrics ADD COLUMN server_id VARCHAR(36);
ALTER TABLE prompt_metrics ADD COLUMN server_id VARCHAR(36);
-- Hourly rollup tables
ALTER TABLE tool_metrics_hourly ADD COLUMN server_id VARCHAR(36);
ALTER TABLE resource_metrics_hourly ADD COLUMN server_id VARCHAR(36);
ALTER TABLE prompt_metrics_hourly ADD COLUMN server_id VARCHAR(36);
-- Add indexes for query performance
CREATE INDEX idx_tool_metrics_server_id ON tool_metrics(server_id);
CREATE INDEX idx_resource_metrics_server_id ON resource_metrics(server_id);
CREATE INDEX idx_prompt_metrics_server_id ON prompt_metrics(server_id);
Note: server_id is NULLABLE to support:
- Legacy metrics (existing data before fix)
- Admin UI executions (no server context)
- Direct invocations outside server context
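As a rough check of how the nullable column behaves, here is a small sketch using Python's built-in sqlite3 module. The table layout is a simplified assumption and this is not the Alembic migration itself. It shows that rows written before the column existed read back as NULL, and that new rows can carry a server or omit it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-migration table holding two legacy rows.
conn.execute("CREATE TABLE tool_metrics (id INTEGER PRIMARY KEY, tool_id TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tool_metrics (tool_id) VALUES (?)",
    [("weather-tool",), ("weather-tool",)],
)

# Apply the proposed change: a nullable column plus an index on it.
conn.execute("ALTER TABLE tool_metrics ADD COLUMN server_id VARCHAR(36)")
conn.execute("CREATE INDEX idx_tool_metrics_server_id ON tool_metrics(server_id)")

# New executions record their server; an Admin UI run stores NULL (None).
conn.execute(
    "INSERT INTO tool_metrics (tool_id, server_id) VALUES (?, ?)",
    ("weather-tool", "server1"),
)
conn.execute(
    "INSERT INTO tool_metrics (tool_id, server_id) VALUES (?, ?)",
    ("weather-tool", None),
)

# Legacy rows and the Admin UI row all have server_id IS NULL.
unscoped = conn.execute(
    "SELECT COUNT(*) FROM tool_metrics WHERE server_id IS NULL"
).fetchone()[0]
scoped = conn.execute(
    "SELECT COUNT(*) FROM tool_metrics WHERE server_id = ?", ("server1",)
).fetchone()[0]

print(unscoped, scoped)  # 3 1
```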
2. Code Changes Required
A. Update ORM Models (mcpgateway/db.py)
from typing import Optional

from sqlalchemy import String
from sqlalchemy.orm import Mapped, mapped_column


# Base is the existing declarative base in mcpgateway/db.py
class ToolMetric(Base):
    # ... existing fields ...
    server_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True, index=True)


class ResourceMetric(Base):
    # ... existing fields ...
    server_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True, index=True)


class PromptMetric(Base):
    # ... existing fields ...
    server_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True, index=True)
B. Update Metric Recording
Capture server_id in all execution paths:
- JSON-RPC handler (/rpc)
- SSE transport (/sse)
- WebSocket transport
- Streamable HTTP transport (/mcp)
- Direct tool execution endpoints
C. Update Metrics Aggregation
Modify metrics_summary to filter by server_id. A Python property cannot accept arguments, so this means turning the property into a method:
# In Tool/Resource/Prompt models (mcpgateway/db.py)
def metrics_summary(self, server_id: Optional[str] = None) -> Dict[str, Any]:
    """Aggregated metrics, optionally filtered by server_id."""
    # Add WHERE server_id = ? to SQL queries
    # Filter in-memory metrics by server_id
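The in-memory half of that change could look roughly like the sketch below. MetricRow and its is_success field are hypothetical stand-ins for the real metric rows, and the function is written standalone rather than as a model method, so treat it as an illustration of the filtering logic only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MetricRow:
    """Hypothetical stand-in for one ToolMetric row."""

    is_success: bool
    server_id: Optional[str] = None


def metrics_summary(metrics: List[MetricRow], server_id: Optional[str] = None) -> Dict[str, Any]:
    """Aggregate metrics, restricted to one server when server_id is given."""
    if server_id is not None:
        # Rows with server_id None (legacy or Admin UI) never match a real server.
        metrics = [m for m in metrics if m.server_id == server_id]
    total = len(metrics)
    successful = sum(1 for m in metrics if m.is_success)
    return {
        "total_executions": total,
        "successful_executions": successful,
        "failed_executions": total - successful,
    }


# Same scenario as the reproduction steps: 2 + 3 + 1 executions.
rows = (
    [MetricRow(True, "server1")] * 2
    + [MetricRow(True, "server2")] * 3
    + [MetricRow(True, None)]
)
scoped = metrics_summary(rows, server_id="server1")
unscoped = metrics_summary(rows)
print(scoped["total_executions"], unscoped["total_executions"])  # 2 6
```

Keeping server_id optional means existing callers that pass nothing still get the all-servers aggregate.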
D. Update Service Methods
Pass server_id to conversion methods:
# In mcpgateway/services/tool_service.py - list_server_tools()
for tool in tools:
    result.append(
        self.convert_tool_to_read(
            tool,
            include_metrics=include_metrics,
            server_id=server_id,  # NEW: Pass server context for filtering
            # ... other params ...
        )
    )
3. Backward Compatibility
- Existing metrics will have server_id = NULL
- Queries without a server_id filter will aggregate all metrics (preserves current behavior for non-server contexts)
- Queries with a server_id filter will only include matching metrics (fixes the bug)
- Historical data remains queryable but not server-scoped
Files to Modify
- Database Migration:
  - mcpgateway/alembic/versions/XXXX_add_server_id_to_metrics.py
- ORM Models:
  - mcpgateway/db.py (ToolMetric, ResourceMetric, PromptMetric, *MetricsHourly classes)
- Metric Recording:
  - mcpgateway/services/tool_service.py (record_tool_metric method)
  - mcpgateway/services/resource_service.py (record_resource_metric method)
  - mcpgateway/services/prompt_service.py (record_prompt_metric method)
- Metric Aggregation:
  - mcpgateway/db.py (metrics_summary properties in Tool/Resource/Prompt classes)
- Service Conversions:
  - mcpgateway/services/tool_service.py (convert_tool_to_read, list_server_tools)
  - mcpgateway/services/resource_service.py (convert_resource_to_read, list_server_resources)
  - mcpgateway/services/prompt_service.py (convert_prompt_to_read, list_server_prompts)
- Rollup Service:
  - mcpgateway/services/metrics_rollup_service.py (aggregate by server_id)
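For the rollup service, "aggregate by server_id" presumably means adding the server to the grouping key. The sketch below illustrates that idea in plain Python. The tuple layout and the rollup_hourly function are assumptions for illustration and do not reflect the actual metrics_rollup_service implementation.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Hypothetical raw metric shape: (tool_id, server_id, timestamp)
RawMetric = Tuple[str, Optional[str], datetime]


def rollup_hourly(rows: Iterable[RawMetric]) -> Counter:
    """Count executions per (tool_id, server_id, hour bucket)."""
    buckets: Counter = Counter()
    for tool_id, server_id, ts in rows:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(tool_id, server_id, hour)] += 1
    return buckets


raw = [
    ("weather-tool", "server1", datetime(2025, 1, 1, 10, 5)),
    ("weather-tool", "server1", datetime(2025, 1, 1, 10, 40)),
    ("weather-tool", "server2", datetime(2025, 1, 1, 10, 15)),
    ("weather-tool", None, datetime(2025, 1, 1, 11, 0)),  # Admin UI run
]
hourly = rollup_hourly(raw)
print(hourly[("weather-tool", "server1", datetime(2025, 1, 1, 10))])  # 2
```

Executions without a server context land in their own bucket with server_id None, so they stay visible in unscoped totals without leaking into any one server's numbers.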
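Testing Checklist
- /servers/{server_id}/tools?include_metrics=true returns server-scoped metrics
- /servers/{server_id}/resources?include_metrics=true returns server-scoped metrics
- /servers/{server_id}/prompts?include_metrics=true returns server-scoped metrics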
Workaround
None available. Metrics are currently incorrect for multi-server deployments.
Related Issues
- Affects all multi-server deployments
- Impacts observability and monitoring accuracy
- Prevents proper SLA tracking per virtual server