feat(adk): Add SSE High-Availability Subsystem for Distributed MCP Deployments#911
Open
jizhuozhi wants to merge 3 commits into
Codecov Report
@@            Coverage Diff             @@
##             main     #911      +/-   ##
==========================================
- Coverage   82.13%   79.15%   -2.99%
==========================================
  Files         146      154       +8
  Lines       16063    17383    +1320
==========================================
+ Hits        13194    13759     +565
- Misses       1922     2585     +663
- Partials      947     1039      +92
force-pushed from 056f9d3 to d326808
…ployments
Reorganize the SSE High-Availability package under the mcp/ namespace
to clearly indicate this is MCP protocol-specific infrastructure.
- Move adk/transport/sseha/ -> adk/transport/mcp/sseha/
- Move adk/transport/sseha/redis/ -> adk/transport/mcp/sseha/redis/
- Update all import paths accordingly
- Future transport extensions (streamable HTTP, A2A, etc.) can be
  organized under their respective protocol namespaces

Package structure:
  adk/transport/mcp/sseha/        - Core interfaces and session management
  adk/transport/mcp/sseha/redis/  - Redis backend implementations
fix(adk/transport/mcp/sseha): fix Go 1.18 compatibility and lint errors
- Use int64 + atomic.AddInt64 instead of atomic.Int64 (Go 1.19+)
- Fix gci import ordering
- Fix gofmt formatting
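The Go 1.18 fix above replaces the typed atomics introduced in Go 1.19 with the function-based `sync/atomic` API. A minimal sketch of that pattern follows; the `eventCounter` type and `countConcurrently` helper are hypothetical names for illustration and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// eventCounter is a hypothetical type, not from the PR. Its int64 field is
// only touched through sync/atomic functions, which are available in Go 1.18,
// unlike the atomic.Int64 type that was added in Go 1.19.
type eventCounter struct {
	n int64 // kept as the first field so it stays 64-bit aligned on 32-bit platforms
}

// Inc atomically adds one and returns the new value.
func (c *eventCounter) Inc() int64 { return atomic.AddInt64(&c.n, 1) }

// Load atomically reads the current value.
func (c *eventCounter) Load() int64 { return atomic.LoadInt64(&c.n) }

// countConcurrently increments one counter from several goroutines and
// returns the final value once all of them have finished.
func countConcurrently(workers, perWorker int) int64 {
	var c eventCounter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Load()
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```

The trade-off is that nothing stops a caller from reading `n` directly without an atomic load, which is the mistake the typed `atomic.Int64` was designed to prevent.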
fix(adk/transport/mcp/sseha): replace interface{} with any for gofmt compatibility
golangci-lint gofmt has rewrite-rules to replace interface{} with any.
Update all interface{} usages to any to pass CI lint checks.
test(adk/transport/mcp/sseha): add comprehensive tests for middleware and session management
- Add middleware_test.go with tests for:
  - HAMiddleware.Wrap (new session and reconnection scenarios)
  - HAResponseWriter.SendEvent
  - extractMetadata helper
  - generateSessionID helper
  - writeSSEEvent helper
- Extend redis_test.go with tests for:
  - Session migration between nodes
  - Cross-node session reconnection
  - Session manager close
  - CorrectSession operation
  - HandleReconnection already local session
  - EventBus pattern subscribe
- Add MCPMiddleware implementing MCP protocol over SSE transport
- Add HAWriter interface for type-safe SSE event writing
- Add MCP protocol compliance tests (handshake, tools, reconnection)
- Add e2e HA scenario tests (migration, failover, multi-client)

- Add StreamableMiddleware for MCP 2025-03-26 Streamable HTTP transport
  - Single POST /mcp endpoint with Mcp-Session-Id header
  - Supports JSON response and SSE streaming upgrade
  - Supports stateless mode and notification handling
- Rename middleware.go -> sse_middleware.go for clarity
- Rename middleware_test.go -> sse_middleware_test.go
- Delete redundant mcp_middleware.go (merged into sse_middleware.go)
- Rewrite e2e_test.go to cover both SSE and Streamable HTTP transports
  - Fix SSE client to use proper MCP protocol (GET /sse + POST /messages)
  - Add Streamable HTTP e2e tests (basic, session continuation, streaming)
  - HA scenarios (migration, failover) remain transport-agnostic
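
The single-endpoint behaviour described for the Streamable HTTP transport can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's `StreamableMiddleware`: `newStreamableHandler`, `post`, the `stream` flag, and the placeholder session ID are all hypothetical, and stateless mode and notification handling are left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

const sessionHeader = "Mcp-Session-Id"

// newStreamableHandler is a hypothetical stand-in for the PR's
// StreamableMiddleware. The stream flag models the server's own decision to
// answer with an SSE stream instead of a single JSON body.
func newStreamableHandler(stream bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		// Continue the caller's session if it sent an ID, otherwise start one.
		sid := r.Header.Get(sessionHeader)
		if sid == "" {
			sid = "new-session" // placeholder; a real server would generate a random ID
		}
		w.Header().Set(sessionHeader, sid)

		const reply = `{"jsonrpc":"2.0","id":1,"result":{}}`
		if stream {
			// SSE framing: optional event name, a data line, then a blank line.
			w.Header().Set("Content-Type", "text/event-stream")
			fmt.Fprintf(w, "event: message\ndata: %s\n\n", reply)
			if f, ok := w.(http.Flusher); ok {
				f.Flush()
			}
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, reply)
	}
}

// post sends one request through the handler and reports the session ID and
// content type of the response.
func post(h http.Handler, sessionID string) (sid, contentType string) {
	body := strings.NewReader(`{"jsonrpc":"2.0","id":1,"method":"ping"}`)
	req := httptest.NewRequest(http.MethodPost, "/mcp", body)
	if sessionID != "" {
		req.Header.Set(sessionHeader, sessionID)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	res := rec.Result()
	return res.Header.Get(sessionHeader), res.Header.Get("Content-Type")
}

func main() {
	sid, ct := post(newStreamableHandler(false), "")
	fmt.Println(sid, ct)
	sid, ct = post(newStreamableHandler(true), sid)
	fmt.Println(sid, ct)
}
```

Because the session ID travels in a header rather than in a long-lived connection, any node that can look the session up can serve the next POST, which is what lets the HA scenarios stay transport-agnostic.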
Summary
This PR introduces a High-Availability (HA) subsystem for SSE (Server-Sent Events) sessions in distributed MCP (Model Context Protocol) deployments (modelcontextprotocol/modelcontextprotocol#2001). It provides the infrastructure needed to maintain stateful streaming sessions across multiple nodes with automatic failover and session correction capabilities.
Why This Is Needed
SSE Long-Connection Is Still Needed
MCP Specification PR #206 introduces "Streamable HTTP" to support stateless server deployments. However, SSE long-connection sessions remain essential for many real-world scenarios:
Stateful Tool Execution: Some MCP tools maintain server-side state during execution:
Streaming Large Outputs: For tools that generate large amounts of data progressively (e.g., log streaming, large file generation), SSE provides natural backpressure and flow control that stateless HTTP cannot match.
Server-Initiated Notifications: MCP's notification model (notifications/resources/updated, notifications/prompts/list_changed) requires a persistent channel from server to client.
Existing Client Compatibility: Many MCP clients already implement SSE-based transport; migration to Streamable HTTP takes time.
For these stateful scenarios, session affinity and cross-node coordination are required, not optional.
The Multi-Instance Session Drift Problem
When deploying MCP servers with multiple replicas behind a load balancer:
Problem 1: Client Reconnect Lands on Wrong Node
Problem 2: Cross-Node Event Delivery
A tool executing on Pod A needs to notify a client connected to Pod B:
Problem 3: Session Ownership Confusion
Without a shared registry, multiple pods could inadvertently serve the same session:
What This PR Enables
This subsystem provides the infrastructure for multi-node coordination:
Solution Overview
This implementation is inspired by MCP Specification PR #206 and provides:
Core Components
Key Features
Architecture
Two event forwarding modes are provided as alternatives (choose one):
Option A: Pub/Sub Mode (Recommended for most cases)
When a session is established on Node B, Node B subscribes to that session's channel on the EventBus. When Node A receives a misrouted event for that session (Session 2 in this example), Node A publishes it to the EventBus, which delivers it to Node B.
Subscription Flow (at session initialization):
Event Routing Flow (when misrouted):
Pros: Nodes don't need to know each other's addresses; handles misrouting gracefully.
Cons: Slightly higher latency (one extra hop through Redis); the Redis EventBus is on the critical path.
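
The Pub/Sub flow can be sketched with an in-memory bus standing in for Redis. Everything here is an assumption made for illustration: `memBus`, `node`, `sessionChannel`, and the `session:<id>` channel naming are hypothetical and do not reflect the PR's actual EventBus interface.

```go
package main

import (
	"fmt"
	"sync"
)

// memBus is a hypothetical in-memory stand-in for the Redis-backed EventBus.
type memBus struct {
	mu   sync.RWMutex
	subs map[string][]func(event string) // channel name -> handlers
}

func newMemBus() *memBus {
	return &memBus{subs: make(map[string][]func(string))}
}

// Subscribe registers a handler for one channel.
func (b *memBus) Subscribe(channel string, h func(string)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[channel] = append(b.subs[channel], h)
}

// Publish calls every handler on the channel synchronously and returns how
// many handlers received the event.
func (b *memBus) Publish(channel, event string) int {
	b.mu.RLock()
	handlers := append([]func(string){}, b.subs[channel]...)
	b.mu.RUnlock()
	for _, h := range handlers {
		h(event)
	}
	return len(handlers)
}

// node models one server replica. The local map records events written to
// clients connected to this node; it is not goroutine-safe, which is fine for
// this single-goroutine sketch.
type node struct {
	name  string
	bus   *memBus
	local map[string][]string // sessionID -> delivered events
}

func newNode(name string, bus *memBus) *node {
	return &node{name: name, bus: bus, local: make(map[string][]string)}
}

func sessionChannel(sessionID string) string { return "session:" + sessionID }

// Accept is called when a session is established on this node: the node
// subscribes to the session's channel so misrouted events can reach it.
func (n *node) Accept(sessionID string) {
	n.local[sessionID] = nil
	n.bus.Subscribe(sessionChannel(sessionID), func(ev string) {
		n.local[sessionID] = append(n.local[sessionID], ev)
	})
}

// Send delivers locally when this node owns the session and publishes to the
// bus otherwise. It never needs to know which node owns the session.
func (n *node) Send(sessionID, ev string) {
	if _, ok := n.local[sessionID]; ok {
		n.local[sessionID] = append(n.local[sessionID], ev)
		return
	}
	n.bus.Publish(sessionChannel(sessionID), ev)
}

// demo routes one misrouted event from node A to the session owner, node B.
func demo() []string {
	bus := newMemBus()
	a, b := newNode("A", bus), newNode("B", bus)
	b.Accept("session-2")
	a.Send("session-2", "tool finished")
	return b.local["session-2"]
}

func main() {
	fmt.Println(demo())
}
```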
Option B: P2P Mode
Nodes depend on MetadataStore (Registry) to discover session ownership, then forward events directly via P2P.
Event flow:
Pros: Lower latency (direct); EventBus not needed for routing.
Cons: Nodes must be network-reachable from each other; requires P2PForwarder implementation.
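
A rough sketch of the P2P path: look up the owner in the registry, then either handle the event locally or forward it directly. The `registry` map, the `forwarder` interface, and `route` are hypothetical shapes chosen for illustration; the PR's MetadataStore and P2PForwarder interfaces may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// registry is a hypothetical in-memory MetadataStore: sessionID -> owner address.
type registry map[string]string

// forwarder is an assumed shape for a P2P forwarder, not the PR's interface.
type forwarder interface {
	Forward(addr, sessionID, event string) error
}

// recordingForwarder records what would have been sent over the network.
type recordingForwarder struct{ sent []string }

func (f *recordingForwarder) Forward(addr, sessionID, event string) error {
	f.sent = append(f.sent, addr+"|"+sessionID+"|"+event)
	return nil
}

var errUnknownSession = errors.New("unknown session")

// route decides where an event goes. It returns "local" when this node owns
// the session and "forwarded" when the event was sent to the owning node.
func route(self string, reg registry, fwd forwarder, sessionID, event string) (string, error) {
	owner, ok := reg[sessionID]
	if !ok {
		return "", errUnknownSession
	}
	if owner == self {
		return "local", nil
	}
	if err := fwd.Forward(owner, sessionID, event); err != nil {
		return "", err
	}
	return "forwarded", nil
}

func main() {
	reg := registry{"session-1": "node-a:8080", "session-2": "node-b:8080"}
	fwd := &recordingForwarder{}

	fmt.Println(route("node-a:8080", reg, fwd, "session-1", "progress"))
	fmt.Println(route("node-a:8080", reg, fwd, "session-2", "progress"))
	fmt.Println(fwd.sent)
}
```

In this shape the registry lookup sits on every send, so a stale ownership record forwards events to the wrong node until the registry is corrected.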
API Example
Testing
All components include comprehensive unit tests:
Run tests:
go test -v ./adk/transport/mcp/sseha/...
Breaking Changes
None. This is a new optional subsystem. Existing MCP transport code continues to work unchanged.
Checklist
golangci-lint run ./...)
Future Work
Related