feat(api): add leader instance identification with operational metadata in /health endpoint #3975
Open
bogdanmariusc10 wants to merge 4 commits into main from
added 3 commits
April 1, 2026 16:28
- Add instance metadata (port, PID, hostname) to Redis leader election
- Expose is_leader boolean field in /health endpoint for cluster monitoring
- Store JSON metadata in Redis instead of plain UUID for operational visibility
- Add startup logging of instance metadata
- Make /health endpoint async to support Redis leader status checks
- Update tests to async for health check endpoints
- Maintain backward compatibility with legacy UUID-only format
Closes #3838
Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
- Store instance metadata (port, PID, hostname, instance_id) in Redis as JSON
- Add is_leader boolean field to /health endpoint for monitoring
- Make /health endpoint async to properly check Redis leadership status
- Update tests to verify JSON metadata format and async health checks
- Maintain backward compatibility with legacy UUID-only format
- Remove unused is_leader_sync() method to avoid event loop conflicts
Closes #3838
Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
- Add test for health check exception handling when is_leader() fails
- Add tests for GatewayService initialization with invalid/None port values
- Add comprehensive tests for is_leader() method covering JSON metadata, legacy UUID, and edge cases
- Fix is_leader() to return False when Redis is unavailable (was incorrectly returning True)
- All tests passing: 15,833 passed (9 new tests added)
Addresses coverage gaps identified in diff-cover report for lines:
- mcpgateway/main.py: 10346, 10348 (exception handling)
- mcpgateway/services/gateway_service.py: 478-479 (port conversion), 4133-4152 (is_leader logic)
Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
d2bd1b6 to
d38d030
🔗 Related Issue
Closes #3838
📝 Summary
This PR implements leader instance identification and operational metadata mapping for MCP Gateway clusters, enabling operators and monitoring tools to easily identify which specific physical instance currently holds the cluster leadership.
Problem: Previously, the system used random UUIDs in Redis for leader election without mapping to any identifiable process information. In production environments with multiple instances, it was impossible to know which instance was the leader without destructive testing or temporary code patching.
Solution:
- Add is_leader boolean field to /health endpoint for automated monitoring
- Make /health endpoint async to properly check Redis leadership status
Impact: SREs can now identify the leader with a single Redis command or health check, and monitoring tools (Prometheus/Grafana) can track leadership changes automatically.
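As a rough illustration of the metadata this PR describes (port, PID, hostname, instance_id), the sketch below assembles such a record and serializes it to JSON. The JSON key names, the helper names, and the port value 4444 are assumptions made for the example; they are not taken from the actual implementation in gateway_service.py.

```python
import json
import os
import socket
import uuid


def build_instance_metadata(port: int) -> dict:
    """Collect identifying details for this process (key names are assumptions)."""
    return {
        "instance_id": str(uuid.uuid4()),  # random ID, as in the legacy UUID scheme
        "port": port,
        "pid": os.getpid(),
        "hostname": socket.gethostname(),
    }


def encode_leader_value(metadata: dict) -> str:
    """Serialize the metadata to the JSON string that would be stored in Redis."""
    return json.dumps(metadata, sort_keys=True)


# Hypothetical port, chosen only for the example.
metadata = build_instance_metadata(port=4444)
leader_value = encode_leader_value(metadata)
```

Storing a record like this instead of a bare UUID is what lets an operator map the leader key back to a concrete process.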
🏷️ Type of Change
🧪 Verification
- make lint
- make test
- make coverage
Functional Testing:
- /health endpoint returns correct is_leader status
✅ Checklist
- make black isort pre-commit)
📓 Notes
Implementation Details
Files Modified:
- mcpgateway/services/gateway_service.py - Core implementation
  - Add _instance_metadata with port, PID, hostname, instance_id
  - Add is_leader() method for checking leadership status
- mcpgateway/main.py - Health endpoint enhancement
  - Make /health endpoint async
  - Add is_leader boolean field to response
- tests/unit/mcpgateway/services/test_gateway_service_redis_leadership.py - Test updates
- tests/unit/mcpgateway/test_main.py & tests/unit/mcpgateway/test_main_extended.py
  - is_leader field
Example Usage
Check leader via health endpoint:
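A minimal Python sketch of such a check, assuming the /health response is a JSON object carrying the is_leader boolean described in this PR. The function names and the base URL passed by the caller are hypothetical, and any other fields in the response are ignored.

```python
import json
from urllib.request import urlopen


def parse_is_leader(body: str) -> bool:
    """Return the is_leader flag from a /health JSON body, defaulting to False."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return False
    return payload.get("is_leader") is True


def fetch_is_leader(base_url: str, timeout: float = 2.0) -> bool:
    """Fetch /health from one instance and report whether it claims leadership."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return parse_is_leader(response.read().decode("utf-8"))
```

Calling fetch_is_leader against each instance in turn would show which one currently reports leadership.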
Check leader via Redis:
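The Redis key name is not shown here, so the sketch below only covers interpreting a value once it has been read from the leader key. It assumes the value is either the JSON metadata object or a legacy plain UUID string, as described in this PR; the helper name and the "instance_id" key are hypothetical.

```python
import json


def describe_leader(raw_value):
    """Turn the value stored under the leader key into a dict for display."""
    if raw_value is None:
        return None  # no leader currently recorded
    if isinstance(raw_value, bytes):
        raw_value = raw_value.decode("utf-8")
    try:
        parsed = json.loads(raw_value)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict):
        return parsed  # new format: structured metadata
    return {"instance_id": raw_value}  # legacy format: bare UUID
```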
Design Decisions
- Async-only approach: Removed the sync is_leader_sync() method to avoid event loop conflicts in FastAPI context
- JSON metadata format: Provides structured data for better observability and debugging
- Backward compatibility: Gracefully handles both JSON and legacy UUID formats during transition
- Fail-safe defaults: is_leader() returns False when Redis is unavailable, so leadership checks do not block operations
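The fail-safe and backward-compatible behaviour can be sketched as a standalone coroutine. This is an illustration under assumptions, not the method in gateway_service.py: the function signature, the "instance_id" key, and the two stand-in client classes are hypothetical, and the client is only assumed to expose an async get(key).

```python
import asyncio
import json


class UnavailableRedis:
    """Stand-in client whose reads always fail (demonstration only)."""

    async def get(self, key):
        raise ConnectionError("redis unreachable")


class FakeRedis:
    """Stand-in client that returns one fixed value (demonstration only)."""

    def __init__(self, value):
        self._value = value

    async def get(self, key):
        return self._value


async def is_leader(redis_client, leader_key: str, instance_id: str) -> bool:
    """Report leadership, returning False whenever Redis cannot confirm it."""
    if redis_client is None:
        return False
    try:
        raw = await redis_client.get(leader_key)
    except Exception:
        return False  # fail safe: never claim leadership on a Redis error
    if raw is None:
        return False  # no leader recorded
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    try:
        stored = json.loads(raw)
    except json.JSONDecodeError:
        return raw == instance_id  # legacy UUID-only value
    if isinstance(stored, dict):
        return stored.get("instance_id") == instance_id
    return raw == instance_id
```

Returning False on every uncertain path means at worst no instance reports leadership in /health during an outage, rather than several instances reporting it at once.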
Tests Added (9 total)
tests/unit/mcpgateway/test_main.py
1. test_health_check_leader_exception - Health check handles is_leader() exception, returns is_leader: false
tests/unit/mcpgateway/services/test_gateway_service_redis_leadership.py
TestGatewayServiceInitialization (3 tests):
2. test_init_with_invalid_port - Invalid port string defaults to 0
3. test_init_with_none_port - None port defaults to 0
4. test_init_with_valid_port - Valid port string converts to int
TestIsLeaderMethod (5 tests):
5. test_is_leader_with_json_metadata - Parses JSON metadata from Redis
6. test_is_leader_with_legacy_uuid - Handles legacy UUID-only format
7. test_is_leader_returns_false_when_not_leader - Returns False for non-leader
8. test_is_leader_returns_false_when_no_leader - Returns False when Redis has no leader
9. test_is_leader_returns_false_when_redis_unavailable - Returns False when Redis unavailable (bug fix)
Result: 15,833 tests passing, all coverage gaps addressed
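The port handling exercised by the initialization tests above can be sketched as a small helper. The name normalize_port is hypothetical; the behaviour (valid strings convert to int, invalid or None values fall back to 0) follows the test descriptions.

```python
def normalize_port(raw_port) -> int:
    """Convert a configured port to int, falling back to 0 when it is unusable."""
    try:
        return int(raw_port)
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers non-numeric strings.
        return 0
```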