
[FEATURE][API]: Enhancing Multi-Instance Monitoring with Leader Status Indicators #3838

@bogdanmariusc10

Description


🧭 Type of Feature

Please select the most appropriate category:

  • Enhancement to existing functionality
  • New feature or capability
  • New MCP-compliant server
  • New component or integration
  • Developer tooling or test improvement
  • Packaging, automation and deployment (ex: pypi, docker, quay.io, kubernetes, terraform)
  • Other (please describe below)

🧭 Epic

Title: Leader Instance Identification and Operational Metadata Mapping

Goal: To provide a transparent way for operators and monitoring tools to identify which specific physical instance (mapped to Port, PID, or Hostname) currently holds the cluster leadership.

Why now: Currently, the system uses random UUIDs in Redis for leader election. These UUIDs are not mapped to any identifiable process information. In a production environment with multiple instances, it is impossible for an SRE or developer to know which instance is the leader without destructive testing (killing processes) or temporary code patching.


🧑🏻‍💻 User Story 1

As an: SRE / System Administrator

I want: The /health endpoint of each instance to return a boolean is_leader field.

So that: I can easily monitor the cluster state and health through automated monitoring tools (like Prometheus or Grafana) without guessing the leader's identity.

✅ Acceptance Criteria

Scenario: Checking leader status via API
  Given a cluster of 3 MCP Gateway instances (Ports 8001, 8002, 8003)
  When I perform a GET request to http://localhost:8001/health
  Then the response should include a field "is_leader": true (if 8001 holds the lock)
  And other instances should return "is_leader": false

Scenario: Health check consistency
  Given an instance loses leadership due to a Redis timeout
  When the health check is performed
  Then the "is_leader" field must immediately update to false
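A minimal sketch of what this could look like, assuming FastAPI, the redis-py async client, and the `gateway_service_leader` key described in User Story 2 below; the helper and variable names are illustrative, not the current mcpgateway implementation:

```python
# Illustrative sketch only -- names and wiring are assumptions,
# not the existing mcpgateway code.
import json
import uuid

from fastapi import FastAPI
from redis.asyncio import Redis

app = FastAPI()
redis = Redis(host="localhost", port=6379, decode_responses=True)

LEADER_KEY = "gateway_service_leader"
INSTANCE_ID = str(uuid.uuid4())  # generated once per process at startup


async def is_leader() -> bool:
    """Return True only if the leader key in Redis still points at this process."""
    raw = await redis.get(LEADER_KEY)
    if raw is None:
        return False
    try:
        return json.loads(raw).get("instance_id") == INSTANCE_ID
    except (ValueError, AttributeError):
        return raw == INSTANCE_ID  # tolerate the legacy raw-UUID value


@app.get("/health")
async def health() -> dict:
    # Re-reading Redis on every call keeps the flag consistent with the lock,
    # so it flips to false as soon as leadership is lost (second scenario above).
    return {"status": "ok", "is_leader": await is_leader()}
```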

🧑🏻‍💻 User Story 2

As a: Developer

I want: Redis to store a JSON object containing instance metadata (Port, PID, Hostname) instead of just a raw UUID.

So that: I can run a single Redis command to identify the exact physical process currently acting as the leader.

✅ Acceptance Criteria

Scenario: Inspecting Redis for leader metadata
  Given Instance A on Port 8001 with PID 46807 wins the election
  When I run 'redis-cli GET gateway_service_leader'
  Then the value returned should be a JSON string like:
    {"instance_id": "88d64ed4...", "port": 8001, "pid": 46807, "hostname": "node-01"}

Scenario: Startup Logging
  Given a new instance is starting up
  When the instance_id is generated
  Then it should be immediately logged alongside its Port and PID
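A rough sketch of how the election payload and startup log line could be built, assuming a synchronous redis-py client; the lock TTL, the PORT environment variable as the port source, and the function name are assumptions made for illustration:

```python
# Illustrative sketch only -- lock TTL, port source and function names
# are assumptions, not the existing mcpgateway code.
import json
import logging
import os
import socket
import uuid

import redis

logger = logging.getLogger(__name__)

LEADER_KEY = "gateway_service_leader"
LEADER_TTL_SECONDS = 15  # assumed lock TTL

# Built once at startup, replacing the bare str(uuid.uuid4()) value.
INSTANCE_METADATA = {
    "instance_id": str(uuid.uuid4()),
    "port": int(os.environ.get("PORT", "8001")),  # assumed source of the port
    "pid": os.getpid(),
    "hostname": socket.gethostname(),
}

# Startup Logging scenario: the id is logged alongside its port and PID.
logger.info(
    "Generated instance_id=%s (port=%s, pid=%s, hostname=%s)",
    INSTANCE_METADATA["instance_id"],
    INSTANCE_METADATA["port"],
    INSTANCE_METADATA["pid"],
    INSTANCE_METADATA["hostname"],
)


def try_acquire_leadership(client: redis.Redis) -> bool:
    """Store the JSON metadata under the leader key instead of a raw UUID,
    so `redis-cli GET gateway_service_leader` identifies the process directly."""
    return bool(
        client.set(
            LEADER_KEY,
            json.dumps(INSTANCE_METADATA),
            nx=True,                # only succeed if no leader currently exists
            ex=LEADER_TTL_SECONDS,  # lock expires if the leader dies silently
        )
    )
```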

🔗 MCP Standards Check

  • Change adheres to current MCP specifications
  • No breaking changes to existing MCP-compliant integrations
  • If deviations exist, please describe them below:

🔄 Alternatives Considered

Temporary Logging Patch: I had to use a manual logger patch (adding logger.info after UUID generation). While this works for debugging, it is not scalable for automated operations.

Process Elimination: Killing instances one by one to see which one causes a Redis key change. This is rejected as it causes unnecessary downtime and is not viable in production.


📓 Additional Context

This request follows the identified limitation in mcpgateway/services/gateway_service.py (Line 474), where str(uuid.uuid4()) is generated but not linked to any operational context. During recent HA testing (Failover and Graceful Shutdown scenarios), it was observed that tracing the leader across multiple terminal sessions was a significant bottleneck.
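For completeness, a small operator-side sketch of how the proposed is_leader field would remove that bottleneck: poll each instance's /health and report whoever claims leadership. The ports follow the 8001–8003 example above; the script assumes httpx is installed and is illustrative only:

```python
# Illustrative sketch only -- assumes the is_leader field proposed above.
import httpx

for port in (8001, 8002, 8003):
    try:
        body = httpx.get(f"http://localhost:{port}/health", timeout=2.0).json()
    except httpx.HTTPError:
        print(f":{port} unreachable")
        continue
    print(f":{port} -> {'LEADER' if body.get('is_leader') else 'follower'}")
```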

Metadata

Labels

  • SHOULD (P2: Important but not vital; high-value items that are not crucial for the immediate release)
  • api (REST API related item)
  • enhancement (New feature or request)
  • python (Python / backend development with FastAPI)
