Problem Statement
When loading multiple MCP servers (especially from a config file), if any single MCP server fails to start, the entire agent initialization fails. This creates a poor developer experience where one misconfigured or unavailable MCP server prevents the agent from using any other tools.
Currently, if you have 5 MCP servers configured and the 3rd one fails to connect (wrong path, server down, auth issue), you get only 2 MCP server connections instead of the 4 that would have worked.
Proposed Solution
Implement a "fail open" strategy for MCP client initialization:
- When starting multiple MCP clients, catch startup failures per-client
- Log a warning for failed clients but continue loading others
- Return successfully loaded tools from healthy clients
- Optionally provide a callback or return value indicating which servers failed
Example behavior:
# Current behavior (fail_fast=True, the default) - unchanged
clients = [mcp1, mcp2_broken, mcp3] # Throws exception, agent unusable
# New opt-in behavior (fail_fast=False) - graceful degradation
clients = load_mcp_clients(config, fail_fast=False)
# Logs: "WARNING: Failed to start MCP server 'mcp2_broken', skipping: Connection refused"
# Agent works with tools from mcp1 and mcp3
To maintain backwards compatibility, add a fail_fast parameter that defaults to True, which keeps the current strict behavior. Users can opt in to graceful degradation with fail_fast=False.
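A minimal sketch of how the opt-in behavior could look, assuming each client exposes start() and stop() as in the wrapper snippet under Additional Context. The load_mcp_clients signature shown here, the LoadResult type, and the on_failure callback are hypothetical names for illustration, not an existing SDK API.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class LoadResult:
    """Hypothetical return type: started clients plus per-server failures."""
    clients: List[Any] = field(default_factory=list)
    failures: Dict[str, Exception] = field(default_factory=dict)


def _stop_quietly(client: Any) -> None:
    """Best-effort cleanup; errors raised while stopping are ignored."""
    try:
        client.stop()
    except Exception:
        pass


def load_mcp_clients(
    clients: Dict[str, Any],
    fail_fast: bool = True,
    on_failure: Optional[Callable[[str, Exception], None]] = None,
) -> LoadResult:
    """Start each named client (assumed to expose start() and stop())."""
    result = LoadResult()
    for name, client in clients.items():
        try:
            client.start()
        except Exception as exc:
            if fail_fast:
                # Strict default: the first failure propagates to the caller.
                raise
            logger.warning(
                "Failed to start MCP server '%s', skipping: %s", name, exc
            )
            _stop_quietly(client)
            result.failures[name] = exc
            if on_failure is not None:
                on_failure(name, exc)
        else:
            result.clients.append(client)
    return result
```

With fail_fast=False, the caller can inspect result.failures (or pass on_failure) to report which servers were skipped, which covers the optional "which servers failed" item in the list above.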
Use Case
- Config-driven agents: Loading MCP servers from mcp.json where some servers may be optional or environment-specific
- Development workflows: Testing with partial MCP availability without needing all servers running
- Production resilience: Agent continues functioning even if one MCP server has temporary issues
- Multi-tenant setups: Different users may have access to different MCP servers
Alternative Solutions
- Wrap each MCPClient in try/except manually - works but verbose and error-prone
- Pre-validate MCP configs before loading - doesn't help with runtime failures
- Lazy loading of MCP clients - more complex, changes tool discovery timing
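To illustrate why pre-validation alone falls short, here is a rough sketch of a static check. It assumes each server entry is a dict with a "command" key, following the common mcp.json convention for stdio servers; that shape and the find_config_problems name are assumptions, not something this issue specifies.

```python
import shutil
from typing import Any, Dict, List


def find_config_problems(servers: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return static problems per server; an empty list means none were found."""
    problems = []
    for name, entry in servers.items():
        command = entry.get("command")
        if not command:
            problems.append(f"{name}: no 'command' configured")
        elif shutil.which(command) is None:
            # The executable is not on PATH (or the given path is not executable).
            problems.append(f"{name}: command not found: {command}")
    return problems
```

A check like this can catch a wrong path up front, but a server that is down or rejects credentials only fails once the client actually starts, so per-client error handling is still needed.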
Additional Context
I implemented this pattern in a wrapper project and it significantly improved DX:
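# Fail open approach
successful_servers = []
for name, server_config in servers.items():
    # build_client is a hypothetical helper standing in for however
    # the wrapper constructs an MCPClient from server_config
    client = build_client(server_config)
    try:
        client.start()
        tools = client.list_tools_sync()
        # append, not extend: we are adding a single client object
        successful_servers.append(client)
        logger.info(f"Loaded {len(tools)} tools from MCP server: {name}")
    except Exception as e:
        logger.warning(f"Failed to start MCP server {name}, skipping: {e}")
        # Best-effort cleanup of the partially started client
        try:
            client.stop()
        except Exception:
            pass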
This becomes especially important with #482 (config file loading) since users will likely have multiple servers defined.