fix(gateways): resolve authorization_code OAuth gateway offline issues #5244
Fix bug #5237 where authorization_code OAuth gateways would become permanently offline due to health check assumptions and destructive token handling.

Changes:
- Decouple health checks from token ownership for authorization_code flow
- Treat 401/403 responses as 'gateway reachable' instead of failures
- Make client_secret decryption failures explicit (raise OAuthError)
- Respect omit_resource flag during token refresh
- Only delete tokens on invalid_grant errors (RFC 6749 §5.2)
- Add gateway reactivation on 401/403 when previously unreachable
- Update last_seen timestamp even on auth failures

Tests:
- Added test_oauth_authorization_code_health.py (8 tests)
- Added test_gateway_service_401_reactivation.py (5 tests)
- Updated existing OAuth tests to match new behavior
- All 172 OAuth-related tests passing with no regressions

Closes #5237

Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
…ecome-permanently-offline-due-to-health-check-token-assumptions-and-destructive-refresh-token-handling
Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
@ja8zyjits @bogdanmariusc10 Our production onboarding is blocked by this critical issue. Assuming it blocks every OAuth-based MCP server, is there any plan to release the fix as a hotfix tag? Note: if you have any workaround, or an approach IBM uses for OAuth-based MCP servers without this bug fix, please let us know.
🔗 Related Issue
Closes #5237
📝 Summary
This PR fixes critical issues where OAuth authorization_code gateways would become permanently offline due to health check assumptions and destructive token handling behavior.

Problem:
authorization_code gateways (where users authorize individually) were failing health checks when no platform admin token existed, causing cascading failures.

Solution:
- Only delete tokens on invalid_grant errors (permanent failure), not transient errors

🏷️ Type of Change
🧪 Verification
- make lint
- make test
- make coverage

Test Results:
✅ Checklist
make black isort pre-commit)

📓 Notes
Files Changed
Core Implementation (2 files):
- mcpgateway/services/gateway_service.py - Health check logic for authorization_code flow
- mcpgateway/services/token_storage_service.py - Token refresh improvements

Test Updates (4 files):
- tests/unit/mcpgateway/services/test_gateway_service.py - Added httpx import, updated expectations
- tests/unit/mcpgateway/services/test_gateway_service_health_oauth.py - OAuth health check test updates
- tests/unit/mcpgateway/services/test_token_storage_service.py - Token refresh test updates
- tests/unit/mcpgateway/test_oauth_manager.py - Changed to use OAuthError with invalid_grant

New Test Files (2 files):
- tests/unit/mcpgateway/services/test_oauth_authorization_code_health.py (8 tests)
- tests/unit/mcpgateway/services/test_gateway_service_401_reactivation.py (5 tests)

Technical Details
Health Check Behavior:
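As a rough illustration of the behavior described in the commit message (401/403 counts as "gateway reachable", a previously unreachable gateway is reactivated, and last_seen is updated even on auth failures), here is a minimal sketch. `GatewayState` and `apply_health_response` are hypothetical names used only for this example; they are not taken from gateway_service.py, and the real logic may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class GatewayState:
    """Hypothetical stand-in for a gateway record (not the project's model)."""
    reachable: bool = False
    last_seen: Optional[datetime] = None


def apply_health_response(state: GatewayState, status_code: int) -> GatewayState:
    """Update reachability from the HTTP status of a health probe.

    Assumption: a 401/403 means the server answered and only rejected the
    credentials, so the gateway is treated as reachable.
    """
    answered = 200 <= status_code < 300 or status_code in (401, 403)
    if answered:
        # Reactivates a gateway that was previously marked unreachable.
        state.reachable = True
        # last_seen is refreshed even when the response was an auth failure.
        state.last_seen = datetime.now(timezone.utc)
    else:
        state.reachable = False
    return state
```

With this sketch, a gateway marked unreachable that answers a probe with 401 becomes reachable again, while a 503 still marks it unreachable.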
Token Deletion Logic (RFC 6749 Compliance):
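The rule stated in the commit message (delete tokens only on invalid_grant, per RFC 6749 §5.2) could look roughly like the sketch below. The `OAuthError` class here is a local stand-in defined for the example, with an assumed `error` attribute; the project's own OAuthError may have a different constructor. `refresh_or_keep` and the dict-based store are likewise hypothetical.

```python
from typing import Callable, Dict, Optional


class OAuthError(Exception):
    """Stand-in error type carrying an RFC 6749 section 5.2 error code (assumed shape)."""

    def __init__(self, error: str, description: str = "") -> None:
        super().__init__(description or error)
        self.error = error


def should_delete_token(exc: Exception) -> bool:
    """Only invalid_grant is treated as a permanent failure."""
    return isinstance(exc, OAuthError) and exc.error == "invalid_grant"


def refresh_or_keep(
    store: Dict[str, str], key: str, refresh: Callable[[str], str]
) -> Optional[str]:
    """Try to refresh a stored token; keep it on transient errors."""
    try:
        store[key] = refresh(store[key])
        return store[key]
    except Exception as exc:
        if should_delete_token(exc):
            # Permanent failure: the refresh token can never succeed again.
            store.pop(key, None)
        # Transient failure (timeout, 5xx, ...): leave the token in place.
        return None


def fail_transient(_token: str) -> str:
    raise TimeoutError("token endpoint timed out")


def fail_invalid_grant(_token: str) -> str:
    raise OAuthError("invalid_grant", "refresh token revoked")
```

Calling `refresh_or_keep` with `fail_transient` leaves the stored token untouched, whereas `fail_invalid_grant` removes it, so the user is asked to re-authorize only when the grant is really gone.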
client_secret Handling:
omit_resource Flag:
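The PR only states that the omit_resource flag is now respected during token refresh. One plausible reading, sketched below purely as an assumption, is that the flag controls whether a `resource` parameter (the RFC 8707 resource indicator) is included in the refresh request. `build_refresh_params` and its arguments are hypothetical and not taken from token_storage_service.py.

```python
from typing import Dict, Optional


def build_refresh_params(
    refresh_token: str,
    client_id: str,
    resource: Optional[str],
    omit_resource: bool,
) -> Dict[str, str]:
    """Build the form body for a refresh_token grant (illustrative only)."""
    params = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }
    # Assumed semantics: when omit_resource is set, the resource indicator
    # is left out of the refresh request, mirroring the initial token request.
    if resource and not omit_resource:
        params["resource"] = resource
    return params
```

Under that assumption, a provider that rejects unknown parameters would receive the same parameter set on refresh as it did on the original authorization.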
Impact
Before (broken):
After (fixed):
Testing Strategy