[BUG][OBSERVABILITY]: Metrics not scoped to server_id — showing aggregated totals

Reference issue: https://github.com/IBM/mcp-context-forge/issues/3598

### Bug Summary
When querying `/servers/{server_id}/tools?include_metrics=true`, the API returns **incorrect metrics**: the totals are aggregated across every invocation of the tool (all servers, plus calls made with no server context) instead of being limited to the requested server.

### Steps to Reproduce
1. Create a tool (e.g., `weather-tool`)
2. Associate the tool with two virtual servers: `server1` and `server2`
3. Execute the tool:
   - 2 times via `server1`
   - 3 times via `server2`
   - 1 time via Admin UI (no server context)
4. Query metrics for `server1`:
   ```bash
   curl -X GET "http://localhost:4444/servers/server1/tools?include_metrics=true"
   ```

### Expected Behavior
The API should return metrics **scoped to server1 only**:
```json
{
  "name": "weather-tool",
  "metrics": {
    "total_executions": 2,
    "successful_executions": 2,
    "failed_executions": 0
  }
}
```

### Actual Behavior
The API returns **aggregated metrics across all servers**, including the Admin UI call (2 + 3 + 1 = 6, where 2 was expected):
```json
{
  "name": "weather-tool",
  "metrics": {
    "total_executions": 6,
    "successful_executions": 6,
    "failed_executions": 0
  }
}
```

### Impact
- **Severity**: High
- **Affected Endpoints**:
  - `GET /servers/{server_id}/tools?include_metrics=true`
  - `GET /servers/{server_id}/resources?include_metrics=true`
  - `GET /servers/{server_id}/prompts?include_metrics=true`
- **User Impact**: 
  - Cannot track per-server SLAs or performance
  - Multi-tenant deployments cannot isolate server-specific issues
  - Misleading data for capacity planning and troubleshooting

### Root Cause
The metric tables are missing a `server_id` column:
- `tool_metrics` table only has `tool_id`, no `server_id`
- `resource_metrics` table only has `resource_id`, no `server_id`
- `prompt_metrics` table only has `prompt_id`, no `server_id`
- Hourly rollup tables (`*_metrics_hourly`) have the same issue

When metrics are recorded during tool/resource/prompt execution, the system does not capture which server was used. The `metrics_summary` property in `mcpgateway/db.py` aggregates ALL metrics for the entity without filtering by server.

### Environment
- **Version**: [v1.0.0-RC2]

## Proposed Fix

### 1. Database Schema Changes (Migration Required)
Add `server_id` column to all metric tables:

```sql
-- Raw metrics tables (NULL marks the column as nullable; NULLABLE is not valid SQL)
ALTER TABLE tool_metrics ADD COLUMN server_id VARCHAR(36) NULL;
ALTER TABLE resource_metrics ADD COLUMN server_id VARCHAR(36) NULL;
ALTER TABLE prompt_metrics ADD COLUMN server_id VARCHAR(36) NULL;

-- Hourly rollup tables
ALTER TABLE tool_metrics_hourly ADD COLUMN server_id VARCHAR(36) NULL;
ALTER TABLE resource_metrics_hourly ADD COLUMN server_id VARCHAR(36) NULL;
ALTER TABLE prompt_metrics_hourly ADD COLUMN server_id VARCHAR(36) NULL;

-- Add indexes for query performance
CREATE INDEX idx_tool_metrics_server_id ON tool_metrics(server_id);
CREATE INDEX idx_resource_metrics_server_id ON resource_metrics(server_id);
CREATE INDEX idx_prompt_metrics_server_id ON prompt_metrics(server_id);
```

**Note:** `server_id` is nullable to support:
- Legacy metrics (existing data before fix)
- Admin UI executions (no server context)
- Direct invocations outside server context
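
To illustrate how a nullable column keeps legacy rows intact, here is a minimal sketch using an in-memory SQLite database. The table layout (`id`, `tool_id`, `is_success`) is a simplified assumption and not the project's real schema, and the helper `count_rows` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for tool_metrics; the real table has more columns (assumption).
conn.execute(
    "CREATE TABLE tool_metrics ("
    "id INTEGER PRIMARY KEY, tool_id TEXT NOT NULL, is_success INTEGER NOT NULL)"
)
# A legacy row recorded before the migration.
conn.execute("INSERT INTO tool_metrics (tool_id, is_success) VALUES ('weather-tool', 1)")

# Migration step: the new column is nullable, so existing rows stay valid.
conn.execute("ALTER TABLE tool_metrics ADD COLUMN server_id VARCHAR(36) NULL")
conn.execute("CREATE INDEX idx_tool_metrics_server_id ON tool_metrics(server_id)")

# Rows recorded after the fix carry the server context.
conn.executemany(
    "INSERT INTO tool_metrics (tool_id, is_success, server_id) VALUES (?, ?, ?)",
    [
        ("weather-tool", 1, "server1"),
        ("weather-tool", 1, "server1"),
        ("weather-tool", 1, "server2"),
    ],
)


def count_rows(where: str = "", params: tuple = ()) -> int:
    """Count metric rows, optionally restricted by a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM tool_metrics {where}", params).fetchone()[0]


legacy = count_rows("WHERE server_id IS NULL")          # pre-migration rows
server1 = count_rows("WHERE server_id = ?", ("server1",))  # scoped to one server
total = count_rows()                                    # unfiltered view
```

Note that `WHERE server_id = ?` never matches `NULL` rows, so legacy and Admin UI metrics are excluded from server-scoped results without any extra handling.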

### 2. Code Changes Required

#### A. Update ORM Models (`mcpgateway/db.py`)
```python
class ToolMetric(Base):
    # ... existing fields ...
    server_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True, index=True)

class ResourceMetric(Base):
    # ... existing fields ...
    server_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True, index=True)

class PromptMetric(Base):
    # ... existing fields ...
    server_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True, index=True)
```

#### B. Update Metric Recording
Capture `server_id` in all execution paths:
- JSON-RPC handler (`/rpc`)
- SSE transport (`/sse`)
- WebSocket transport
- Streamable HTTP transport (`/mcp`)
- Direct tool execution endpoints
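
A rough sketch of what threading `server_id` through a recording call could look like. `MetricRow` and the signature of `record_tool_metric` shown here are hypothetical illustrations and do not reflect the actual method in `tool_service.py`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class MetricRow:
    """Hypothetical stand-in for a ToolMetric row."""

    tool_id: str
    is_success: bool
    response_time: float
    server_id: Optional[str] = None  # None when there is no server context
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_tool_metric(
    store: List[MetricRow],
    tool_id: str,
    is_success: bool,
    response_time: float,
    server_id: Optional[str] = None,
) -> MetricRow:
    """Append one metric row, keeping whatever server context the caller had."""
    row = MetricRow(tool_id, is_success, response_time, server_id)
    store.append(row)
    return row


store: List[MetricRow] = []
# Call routed through a virtual server: the transport passes its server_id along.
record_tool_metric(store, "weather-tool", True, 0.12, server_id="server1")
# Admin UI call: no server context, so server_id stays None.
record_tool_metric(store, "weather-tool", True, 0.20)
```

Keeping `server_id` optional with a default of `None` means existing call sites keep working until each transport is updated to pass it.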

#### C. Update Metrics Aggregation
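Modify `metrics_summary` to filter by `server_id`. A property cannot accept arguments, so this means converting it to a method (or adding a companion method) that takes an optional `server_id`: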

```python
# In Tool/Resource/Prompt models (mcpgateway/db.py)
def metrics_summary(self, server_id: Optional[str] = None) -> Dict[str, Any]:
    """Aggregated metrics, optionally filtered by server_id."""
    # Add WHERE server_id = ? to SQL queries
    # Filter in-memory metrics by server_id
```
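
The in-memory half of that filter might look like the following sketch, which replays the reproduction scenario (2 calls via `server1`, 3 via `server2`, 1 with no server). `Metric` and `summarize` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class Metric:
    """Hypothetical minimal metric record."""

    is_success: bool
    server_id: Optional[str] = None


def summarize(metrics: Iterable[Metric], server_id: Optional[str] = None) -> Dict[str, Any]:
    """Aggregate metrics; when server_id is given, keep only matching rows."""
    rows = [m for m in metrics if server_id is None or m.server_id == server_id]
    total = len(rows)
    successful = sum(1 for m in rows if m.is_success)
    return {
        "total_executions": total,
        "successful_executions": successful,
        "failed_executions": total - successful,
    }


metrics = (
    [Metric(True, "server1")] * 2
    + [Metric(True, "server2")] * 3
    + [Metric(True, None)]  # Admin UI execution
)
scoped = summarize(metrics, "server1")  # server-scoped view
overall = summarize(metrics)            # unfiltered view, current behavior
```

With no `server_id` the function reproduces today's aggregated totals, which is what keeps non-server contexts backward compatible.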

#### D. Update Service Methods
Pass `server_id` to conversion methods:

```python
# In mcpgateway/services/tool_service.py - list_server_tools()
for tool in tools:
    result.append(
        self.convert_tool_to_read(
            tool,
            include_metrics=include_metrics,
            server_id=server_id,  # NEW: Pass server context for filtering
            # ... other params ...
        )
    )
```

### 3. Backward Compatibility
- Existing metrics will have `server_id = NULL`
- Queries without `server_id` filter will aggregate all metrics (preserves current behavior for non-server contexts)
- Queries with `server_id` filter will only include matching metrics (fixes the bug)
- Historical data remains queryable but not server-scoped

## Files to Modify

1. **Database Migration**: `mcpgateway/alembic/versions/XXXX_add_server_id_to_metrics.py`
2. **ORM Models**: `mcpgateway/db.py` (`ToolMetric`, `ResourceMetric`, `PromptMetric`, and the `*MetricsHourly` classes)
3. **Metric Recording**: 
   - `mcpgateway/services/tool_service.py` (record_tool_metric method)
   - `mcpgateway/services/resource_service.py` (record_resource_metric method)
   - `mcpgateway/services/prompt_service.py` (record_prompt_metric method)
4. **Metric Aggregation**: `mcpgateway/db.py` (metrics_summary properties in Tool/Resource/Prompt classes)
5. **Service Conversions**:
   - `mcpgateway/services/tool_service.py` (convert_tool_to_read, list_server_tools)
   - `mcpgateway/services/resource_service.py` (convert_resource_to_read, list_server_resources)
   - `mcpgateway/services/prompt_service.py` (convert_prompt_to_read, list_server_prompts)
6. **Rollup Service**: `mcpgateway/services/metrics_rollup_service.py` (aggregate by server_id)
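
For item 6, grouping the hourly rollup by `server_id` could be sketched as below. The tuple layout and the `rollup_hourly` helper are assumptions for illustration; the real rollup service works against the database and tracks more fields.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# (tool_id, server_id, timestamp, is_success): hypothetical raw metric shape
RawMetric = Tuple[str, Optional[str], datetime, bool]
RollupKey = Tuple[str, Optional[str], datetime]


def rollup_hourly(rows: List[RawMetric]) -> Dict[RollupKey, Dict[str, int]]:
    """Aggregate raw metrics into one bucket per (tool, server, hour)."""
    buckets: Dict[RollupKey, Dict[str, int]] = defaultdict(lambda: {"total": 0, "success": 0})
    for tool_id, server_id, ts, ok in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        bucket = buckets[(tool_id, server_id, hour)]
        bucket["total"] += 1
        bucket["success"] += int(ok)
    return dict(buckets)


t = datetime(2025, 1, 1, 10, 15, tzinfo=timezone.utc)
rows = [
    ("weather-tool", "server1", t, True),
    ("weather-tool", "server1", t.replace(minute=40), True),
    ("weather-tool", "server2", t, True),
    ("weather-tool", None, t, False),  # no server context
]
hourly = rollup_hourly(rows)
```

Including `server_id` in the grouping key keeps `NULL`-context rows in their own bucket rather than merging them into a server's totals.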

## Testing Checklist

- [ ] Verify metrics are recorded with correct `server_id`
- [ ] Verify `/servers/{server_id}/tools?include_metrics=true` returns server-scoped metrics
- [ ] Verify `/servers/{server_id}/resources?include_metrics=true` returns server-scoped metrics
- [ ] Verify `/servers/{server_id}/prompts?include_metrics=true` returns server-scoped metrics
- [ ] Verify Admin UI executions (no server) still work with `server_id = NULL`
- [ ] Verify backward compatibility with existing NULL `server_id` metrics
- [ ] Verify multi-server scenarios (same tool on 2+ servers)
- [ ] Verify hourly rollup aggregation includes `server_id` grouping

## Workaround
None available. Metrics are currently incorrect for multi-server deployments.

## Related Issues
- Affects all multi-server deployments
- Impacts observability and monitoring accuracy
- Prevents proper SLA tracking per virtual server


