🧭 Type of Feature
🧭 Epic
Title: Leader Instance Identification and Operational Metadata Mapping
Goal: To provide a transparent way for operators and monitoring tools to identify which specific physical instance (mapped to Port, PID, or Hostname) currently holds the cluster leadership.
Why now: Currently, the system uses random UUIDs in Redis for leader election. These UUIDs are not mapped to any identifiable process information. In a production environment with multiple instances, it is impossible for an SRE or developer to know which instance is the leader without destructive testing (killing processes) or temporary code patching.
🧑🏻💻 User Story 1
As an: SRE / System Administrator
I want: The /health endpoint of each instance to return a boolean is_leader field.
So that: I can easily monitor the cluster state and health through automated monitoring tools (like Prometheus or Grafana) without guessing the leader's identity.
✅ Acceptance Criteria
Scenario: Checking leader status via API
Given a cluster of 3 MCP Gateway instances (Ports 8001, 8002, 8003)
When I perform a GET request to http://localhost:8001/health
Then the response should include a field "is_leader": true (if 8001 holds the lock)
And other instances should return "is_leader": false
Scenario: Health check consistency
Given an instance loses leadership due to a Redis timeout
When the health check is performed
Then the "is_leader" field must immediately update to false
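A minimal sketch of what the extended /health payload could look like. The endpoint and field name come from this story; the helper name `build_health_payload` is illustrative, not the gateway's actual internals, and the caller is assumed to pass the instance's current view of the Redis lock rather than a cached value:

```python
import json

# Sketch only: build_health_payload is a hypothetical helper, not the
# gateway's real API. The is_leader argument should reflect the live
# Redis lock state so the field flips to false as soon as leadership
# is lost (per the "Health check consistency" scenario above).
def build_health_payload(is_leader: bool) -> str:
    """Serialize a /health response body including the leadership flag."""
    return json.dumps({
        "status": "ok",
        "is_leader": is_leader,
    })
```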
🧑🏻💻 User Story 2
As a: Developer
I want: Redis to store a JSON object containing instance metadata (Port, PID, Hostname) instead of just a raw UUID.
So that: I can run a single Redis command to identify the exact physical process currently acting as the leader.
✅ Acceptance Criteria
Scenario: Inspecting Redis for leader metadata
Given Instance A on Port 8001 with PID 46807 wins the election
When I run 'redis-cli GET gateway_service_leader'
Then the value returned should be a JSON string like:
{"instance_id": "88d64ed4...", "port": 8001, "pid": 46807, "hostname": "node-01"}
Scenario: Startup Logging
Given a new instance is starting up
When the instance_id is generated
Then it should be immediately logged alongside its Port and PID
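A sketch of how the lock value could be built. The key name `gateway_service_leader` and the JSON fields come from this ticket; the helper `make_leader_value` and the redis-py call shown in the trailing comment are assumptions, not the existing implementation:

```python
import json
import os
import socket
import uuid

def make_leader_value(port: int) -> str:
    """Serialize instance metadata for use as the leader-lock value (sketch)."""
    return json.dumps({
        "instance_id": str(uuid.uuid4()),  # keep the UUID for uniqueness
        "port": port,
        "pid": os.getpid(),
        "hostname": socket.gethostname(),
    })

# Hypothetical election call (redis-py style), preserving the atomic
# set-if-not-exists semantics of the current UUID-only lock:
# acquired = redis_client.set("gateway_service_leader",
#                             make_leader_value(8001), nx=True, ex=30)
```

Inspecting the key with `redis-cli GET gateway_service_leader` would then return the JSON string shown in the scenario above instead of a bare UUID.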
🔗 MCP Standards Check
🔄 Alternatives Considered
Temporary Logging Patch: manually patching the code to add a logger.info call after UUID generation. While this works for ad-hoc debugging, it is not scalable for automated operations.
Process Elimination: Killing instances one by one to see which one causes a Redis key change. This is rejected as it causes unnecessary downtime and is not viable in production.
📓 Additional Context
This request follows the identified limitation in mcpgateway/services/gateway_service.py (Line 474), where str(uuid.uuid4()) is generated but not linked to any operational context. During recent HA testing (Failover and Graceful Shutdown scenarios), it was observed that tracing the leader across multiple terminal sessions was a significant bottleneck.