The Inference Gateway CLI supports direct integration with MCP (Model Context Protocol) servers, allowing you to extend the LLM's capabilities with custom tools from external services.
- Overview
- Quick Start
- Configuration
- Tool Discovery
- Liveness Probes
- Tool Execution
- Examples
- Auto-Starting MCP Servers
- Troubleshooting
- Security Considerations
- MCP vs A2A
- Advanced Topics
Model Context Protocol (MCP) is a standardized protocol for connecting AI models to external tools and data sources. It enables:
- Stateless tool execution: Each request is independent
- HTTP SSE transport: Server-Sent Events for real-time communication
- Dynamic tool discovery: Tools are discovered at runtime
- Schema-based validation: JSON Schema for tool parameters
┌──────────────────────────────────────────────────────────────────┐
│                          Inference CLI                           │
│                                                                  │
│  ┌────────────────┐              ┌─────────────────┐             │
│  │   MCP Client   │              │  Tool Registry  │             │
│  │    Manager     │──register──▶ │                 │             │
│  │                │    tools     │  • Bash         │             │
│  │  • Discovery   │              │  • Read         │             │
│  │  • Execution   │              │  • MCP_*        │             │
│  └────────┬───────┘              └─────────────────┘             │
│           │                                                      │
└───────────┼──────────────────────────────────────────────────────┘
            │
            │ HTTP SSE (stateless)
            │
            ├────────────────────┬────────────────────┐
            │                    │                    │
            ▼                    ▼                    ▼
   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
   │   MCP Server    │  │   MCP Server    │  │   MCP Server    │
   │  (Filesystem)   │  │   (Database)    │  │  (Custom API)   │
   │                 │  │                 │  │                 │
   │  Tools:         │  │  Tools:         │  │  Tools:         │
   │  • read_file    │  │  • query        │  │  • fetch_data   │
   │  • write_file   │  │  • list_tables  │  │  • process      │
   │  • list_dir     │  │  • describe     │  │  • transform    │
   └─────────────────┘  └─────────────────┘  └─────────────────┘
- Direct connections: CLI connects directly to MCP servers (no gateway intermediary)
- Stateless design: Each tool execution creates a new HTTP connection
- Auto-start servers: Automatically start and manage MCP servers in OCI/Docker containers
- Automatic port assignment: No need to manually configure ports for auto-started servers
- Per-server configuration: Enable/disable servers independently
- Tool filtering: Include/exclude specific tools per server
- Concurrent discovery: Servers are queried in parallel
- Resilient: Failed servers don't prevent CLI startup
- Mode-aware: MCP tools automatically excluded from Plan mode
cd your-project
infer init
This creates .infer/mcp.yaml with example configuration.
Edit .infer/mcp.yaml:
enabled: true
connection_timeout: 30
discovery_timeout: 30
servers:
  - name: "filesystem"
    host: "localhost"
    port: 3000
    path: "/sse"
    enabled: true
    description: "File system operations"
Configure the server to start automatically when the CLI launches:
servers:
  - name: "demo-server"
    enabled: true
    run: true # Auto-start in container
    oci: "mcp-demo-server:latest"
    description: "Demo MCP server"
The CLI will automatically:
- Pull the OCI image if needed
- Start the container in the background
- Assign an available port (starting from 3000)
- Configure healthchecks
- Connect to the server
Run the included demo MCP server manually:
cd examples/mcp
docker compose up -d
Then configure the server URL:
servers:
  - name: "demo-server"
    host: "localhost"
    port: 3000
    path: "/sse"
    enabled: true
The demo server provides four example tools: get_time, calculate, list_files, and get_env.
infer chat
Type /help to see available tools. MCP tools appear as MCP_<server>_<tool>.
Located in .infer/mcp.yaml:
| Setting | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Global MCP enable/disable toggle |
| connection_timeout | integer | 30 | Default connection timeout (seconds) |
| discovery_timeout | integer | 30 | Tool discovery timeout (seconds) |
| liveness_probe_enabled | boolean | false | Enable health monitoring |
| liveness_probe_interval | integer | 10 | Health check interval (seconds) |
| max_retries | integer | 10 | Maximum retry attempts before marking server as permanently failed |
| servers | array | [] | List of MCP server configurations |
Each server in the servers array supports:
| Field | Required | Type | Description |
|---|---|---|---|
| name | ✅ | string | Unique server identifier |
| enabled | ✅ | boolean | Enable/disable this server |
| timeout | ❌ | integer | Override global timeout |
| description | ❌ | string | Human-readable description |
| include_tools | ❌ | array | Include specific tools |
| exclude_tools | ❌ | array | Exclude specific tools |
| run | ❌ | boolean | Auto-start server in OCI container (default: false) |
| host | ❌ | string | Container host (default: localhost) |
| scheme | ❌ | string | URL scheme (default: http) |
| port | ❌ | integer | Simple port mapping (auto-assigned if omitted) |
| ports | ❌ | array | Advanced Docker-compose style port mappings |
| path | ❌ | string | HTTP path (default: /mcp) |
| oci | ❌* | string | OCI/Docker image (*required if run=true) |
| args | ❌ | array | Container startup arguments |
| env | ❌ | object | Environment variables for container |
| volumes | ❌ | array | Docker volume mounts |
| startup_timeout | ❌ | integer | Container startup timeout in seconds (default: 30) |
| health_cmd | ❌ | string | Custom Docker healthcheck command |
Manual servers (run: false): set host/port/path to point at the running server's SSE endpoint. The URL is built as {scheme}://{host}:{port}{path} (defaults: http, localhost, /mcp).
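The URL rule above can be sketched in a few lines of Python. This is an illustration only, not the CLI's actual code: the `ServerConfig` class and `build_url` function are hypothetical names, and omitting the `:port` segment when no port is configured is an assumption, since the source does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerConfig:
    """Hypothetical mirror of the URL-related fields of one servers[] entry."""
    name: str
    scheme: str = "http"        # documented default
    host: str = "localhost"     # documented default
    port: Optional[int] = None  # no documented default for manual servers
    path: str = "/mcp"          # documented default


def build_url(server: ServerConfig) -> str:
    """Assemble {scheme}://{host}:{port}{path} from the component fields."""
    # Assumption: when no port is set, the ":port" segment is left out.
    authority = server.host if server.port is None else f"{server.host}:{server.port}"
    return f"{server.scheme}://{authority}{server.path}"


if __name__ == "__main__":
    print(build_url(ServerConfig(name="filesystem", port=3000, path="/sse")))
```

With the Quick Start values (`port: 3000`, `path: "/sse"`) this yields the same URL that appears in the discovery log example, `http://localhost:3000/sse`.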
When include_tools is specified, only these tools are exposed:
servers:
  - name: "database"
    host: "localhost"
    port: 3001
    path: "/sse"
    enabled: true
    include_tools:
      - "query"
      - "describe_table"
      - "list_tables"
When exclude_tools is specified, these tools are hidden:
servers:
  - name: "filesystem"
    host: "localhost"
    port: 3000
    path: "/sse"
    enabled: true
    exclude_tools:
      - "delete_file" # Exclude dangerous operations
      - "format_disk"
- If include_tools is set, it takes precedence (strict allowlist)
- If only exclude_tools is set, all tools except the excluded ones are available
- If neither is set, all tools from the server are available
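The three precedence rules can be expressed as a small function. The sketch below is a hypothetical illustration (`filter_tools` is not a CLI function); it assumes exact, case-sensitive name matching, which is consistent with the "exact match required" note in the troubleshooting section.

```python
from typing import Iterable, List, Optional


def filter_tools(
    discovered: Iterable[str],
    include_tools: Optional[Iterable[str]] = None,
    exclude_tools: Optional[Iterable[str]] = None,
) -> List[str]:
    """Apply the include/exclude rules to the tool names a server reports."""
    discovered = list(discovered)
    if include_tools:
        # Rule 1: an include list wins and acts as a strict allowlist.
        allowed = set(include_tools)
        return [tool for tool in discovered if tool in allowed]
    if exclude_tools:
        # Rule 2: with only an exclude list, everything else stays available.
        blocked = set(exclude_tools)
        return [tool for tool in discovered if tool not in blocked]
    # Rule 3: no filters configured, so every discovered tool is exposed.
    return discovered


if __name__ == "__main__":
    tools = ["read_file", "write_file", "delete_file"]
    print(filter_tools(tools, exclude_tools=["delete_file"]))
```

Note that when both lists are present, the exclude list is ignored entirely in this sketch, which is one reading of "takes precedence".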
All configuration values support environment variable expansion:
servers:
  - name: "filesystem"
    host: "${MCP_FILESYSTEM_HOST}"
    port: 3000
    path: "/sse"
    enabled: true
Override via environment:
export INFER_MCP_ENABLED=true
export INFER_MCP_CONNECTION_TIMEOUT=60
export MCP_FILESYSTEM_HOST=localhost
- CLI startup: MCP client manager initializes
- Concurrent discovery: Each enabled server is queried in parallel
- Tool registration: Discovered tools are registered in the tool registry
- Filtering applied: Include/exclude rules are enforced
- Naming: Tools are named MCP_<server>_<tool>
Configure discovery timeout to prevent slow servers from delaying startup:
discovery_timeout: 30 # seconds
If a server fails during discovery:
- A warning is logged
- Other servers continue normally
- CLI starts successfully
- Failed server's tools are not available
Example log output:
WARN Failed to discover tools from MCP server server=filesystem url=http://localhost:3000/sse error="connection refused"
INFO Discovered tools from MCP server server=database tool_count=5
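The discovery behaviour described in this section (parallel queries, prefixed names, and failures that do not block startup) can be approximated as follows. This is a sketch under stated assumptions: `discover_all` and the per-server fetcher callables are hypothetical stand-ins for the real network calls, and the CLI itself is not implemented in Python.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A fetcher stands in for "ask one MCP server for its tool names".
Fetcher = Callable[[], List[str]]


def discover_all(
    servers: Dict[str, Fetcher], timeout: float = 30.0
) -> Tuple[List[str], Dict[str, str]]:
    """Query every server in parallel; collect tools and per-server failures."""
    registered: List[str] = []
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {name: pool.submit(fetch) for name, fetch in servers.items()}
        for name, future in futures.items():
            try:
                # Waits up to `timeout` seconds for this server's result.
                tools = future.result(timeout=timeout)
            except Exception as exc:
                # A failed server is recorded and skipped; the others continue.
                failures[name] = str(exc)
                continue
            registered.extend(f"MCP_{name}_{tool}" for tool in tools)
    return registered, failures


if __name__ == "__main__":
    def unreachable() -> List[str]:
        raise ConnectionError("connection refused")

    tools, failed = discover_all(
        {"database": lambda: ["query", "list_tables"], "filesystem": unreachable}
    )
    print(tools, failed)
```

In the usage above, the `filesystem` failure ends up in the failure map while the `database` tools are still registered, matching the warning-then-continue behaviour shown in the log excerpt.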
The CLI includes health monitoring for MCP servers to detect disconnections and display real-time connection status in the UI.
- Background Monitoring: Goroutines periodically ping each enabled MCP server
- Status Updates: Connection changes trigger UI updates via event channels
- Real-time Display: Status bar shows "MCP: X/Y" (connected/total)
- Auto-reconnection: When server reconnects, tools become available immediately
liveness_probe_enabled: true # Enable health monitoring
liveness_probe_interval: 10 # Seconds between health checks
max_retries: 10 # Maximum retry attempts before permanent failure
The status bar displays current MCP server health:
MCP: 0/1 # 0 connected, 1 total (server down)
MCP: 1/1 # 1 connected, 1 total (server healthy)
MCP: 2/3 # 2 connected, 3 total (1 server down)
- Initial State: Shows total servers, all marked disconnected
- First Connect: When server responds to ping, status updates to connected
- Disconnection: Failed ping marks server as disconnected
- Reconnection: Successful ping after failure marks server as connected
- Event-Driven: UI updates only when status actually changes (no polling)
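The state transitions above can be modelled with a small tracker that records an event only when a server's status actually changes. This is a hypothetical sketch (`LivenessTracker` is not a CLI type); it leaves out `max_retries` handling and the background goroutines, and only shows the change-detection and the "MCP: X/Y" summary.

```python
from typing import Dict, Iterable, List, Tuple


class LivenessTracker:
    """Track per-server connectivity and emit an event only on changes."""

    def __init__(self, server_names: Iterable[str]) -> None:
        # Initial state: every server is known but marked disconnected.
        self._connected: Dict[str, bool] = {name: False for name in server_names}
        self.events: List[Tuple[str, bool]] = []

    def record_ping(self, name: str, ok: bool) -> bool:
        """Store a ping result; return True if the status changed."""
        if self._connected[name] == ok:
            return False  # unchanged status, so no UI update is needed
        self._connected[name] = ok
        self.events.append((name, ok))
        return True

    def status_line(self) -> str:
        """Render the status bar text as connected/total."""
        connected = sum(self._connected.values())
        return f"MCP: {connected}/{len(self._connected)}"


if __name__ == "__main__":
    tracker = LivenessTracker(["filesystem", "database", "weather-api"])
    tracker.record_ping("filesystem", True)
    tracker.record_ping("database", True)
    print(tracker.status_line())
```

Repeated successful pings on an already-connected server return `False` and add no event, which is the property the "no polling" bullet describes for the UI.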
To disable health monitoring:
liveness_probe_enabled: false
With probes disabled, the MCP status will not appear in the status bar. Servers are still checked during initial tool discovery at startup.
- LLM requests tool: MCP_filesystem_read_file
- Tool lookup: Registry finds MCP tool wrapper
- Client creation: New HTTP SSE client created (stateless)
- Server call: Tool executed on MCP server
- Result formatting: Response formatted for LLM/UI
- Connection closed: HTTP connection terminated
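To make the lookup and server-call steps concrete, the sketch below resolves a registry name back to its server and tool, then builds the JSON-RPC `tools/call` request defined by the MCP specification. The helper names are hypothetical, and matching the longest server name first is an assumption made here to cope with server names that contain underscores; the source does not describe how the CLI resolves names.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def split_tool_name(registry_name: str, server_names: Iterable[str]) -> Tuple[str, str]:
    """Map MCP_<server>_<tool> back to (server, tool)."""
    # Assumption: try longer server names first so "db" cannot shadow "db_replica".
    for server in sorted(server_names, key=len, reverse=True):
        prefix = f"MCP_{server}_"
        if registry_name.startswith(prefix) and len(registry_name) > len(prefix):
            return server, registry_name[len(prefix):]
    raise KeyError(f"no configured server matches {registry_name!r}")


def build_call_request(tool: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for one tool invocation."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


if __name__ == "__main__":
    server, tool = split_tool_name("MCP_filesystem_read_file", ["filesystem", "database"])
    print(server, build_call_request(tool, {"path": "README.md"}))
```

The `path` argument in the usage line is illustrative; the real parameter names come from each tool's JSON Schema.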
Per-server timeout configuration:
servers:
  - name: "slow-server"
    host: "slow-service"
    port: 8080
    path: "/sse"
    timeout: 120 # Override global timeout
Timeout precedence:
- Server-specific timeout
- Global connection_timeout
- Default: 30 seconds
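The precedence order reduces to a short fallback chain. In this hypothetical sketch (`resolve_timeout` is an illustrative name), an unset value is represented as `None`, and a value of 0 is also treated as unset; that second point is an assumption, not documented behaviour.

```python
from typing import Optional

DEFAULT_TIMEOUT_SECONDS = 30  # documented fallback


def resolve_timeout(
    server_timeout: Optional[int] = None,
    connection_timeout: Optional[int] = None,
) -> int:
    """Pick the effective timeout: server override, then global, then default."""
    if server_timeout:
        return server_timeout
    if connection_timeout:
        return connection_timeout
    return DEFAULT_TIMEOUT_SECONDS


if __name__ == "__main__":
    print(resolve_timeout(server_timeout=120, connection_timeout=60))
    print(resolve_timeout(connection_timeout=60))
    print(resolve_timeout())
```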
Common errors and handling:
| Error | Behavior |
|---|---|
| Server unreachable | Tool execution fails, error returned to LLM |
| Timeout exceeded | Connection closed, timeout error returned |
| Invalid arguments | Validation error before server call |
| Server error | Error response passed to LLM |
# .infer/mcp.yaml
enabled: true
connection_timeout: 30
discovery_timeout: 30
servers:
  - name: "filesystem"
    host: "localhost"
    port: 3000
    path: "/sse"
    enabled: true
    description: "Sandboxed file system operations"
    exclude_tools:
      - "delete_file"
      - "delete_directory"
Available tools:
- MCP_filesystem_read_file
- MCP_filesystem_write_file
- MCP_filesystem_list_directory
- MCP_filesystem_create_directory
- MCP_filesystem_get_file_info
enabled: true
servers:
  # Filesystem access
  - name: "filesystem"
    host: "localhost"
    port: 3000
    path: "/sse"
    enabled: true
    timeout: 60
  # Database queries
  - name: "postgres"
    host: "localhost"
    port: 3001
    path: "/sse"
    enabled: true
    include_tools:
      - "query"
      - "describe_table"
  # External API (disabled)
  - name: "weather-api"
    host: "localhost"
    port: 3002
    path: "/sse"
    enabled: false
# .infer/mcp.yaml
enabled: ${MCP_ENABLED:-false}
connection_timeout: ${MCP_TIMEOUT:-30}
servers:
  - name: "production-db"
    scheme: "https"
    host: "${PROD_DB_MCP_HOST}"
    path: "/sse"
    enabled: ${PROD_DB_ENABLED:-false}
# .env
MCP_ENABLED=true
MCP_TIMEOUT=60
PROD_DB_MCP_HOST=mcp.production.example.com
PROD_DB_ENABLED=true
Check 1: Is MCP enabled?
enabled: true # Must be true
Check 2: Are servers enabled?
servers:
  - name: "my-server"
    enabled: true # Must be true
Check 3: Check CLI logs
infer chat --log-level debug
Look for discovery messages:
INFO Discovered tools from MCP server server=filesystem tool_count=8
Error: connection refused
Causes:
- MCP server not running
- Wrong URL/port
- Firewall blocking connection
Solutions:
- Verify server is running: curl http://localhost:3000/sse
- Check URL in config
- Check network connectivity
Error: context deadline exceeded
Causes:
- Server response too slow
- Network latency
- Timeout too short
Solutions:
- Increase timeout:
  servers:
    - name: "slow-server"
      timeout: 120 # Increase from default 30
- Check server performance
- Verify network connection
Issue: Excluded tools still appear
Check: Verify exact tool names in logs:
infer chat --log-level debug 2>&1 | grep "Registered MCP tool"
Fix: Match exact tool names:
exclude_tools:
  - "delete_file" # Exact match required
If MCP tools appear in Plan mode, this is a bug: MCP tools should be filtered out of Plan mode automatically. Please file an issue.
Always use exclude_tools to block dangerous operations:
servers:
  - name: "filesystem"
    exclude_tools:
      - "delete_file"
      - "delete_directory"
      - "format_disk"
      - "execute_command"
- Use HTTPS for production MCP servers
- Configure firewall rules
- Use VPN for remote servers
- Limit server access with authentication (if supported by MCP server)
Set reasonable timeouts to prevent hanging:
connection_timeout: 30
discovery_timeout: 30
MCP tool wrappers validate arguments before sending them to the server. Invalid arguments are rejected before any network call is made.
Run MCP servers in sandboxed environments:
- Docker containers
- Virtual machines
- Restricted user accounts
- Stateless operations: Read data, query APIs, simple transformations
- Fast execution: Operations complete in seconds
- Direct tool calls: Single request/response
- External services: Databases, APIs, file systems
- Long-running tasks: Operations taking minutes to hours
- Complex workflows: Multi-step processes
- Background processing: Async task execution
- Agent delegation: Specialized AI agents for specific domains
| Feature | MCP | A2A |
|---|---|---|
| Connection | Stateless HTTP SSE | Persistent |
| Duration | Seconds | Minutes to hours |
| Use case | Tool execution | Task delegation |
| Polling | N/A | Background monitoring |
| Mode availability | Standard, Auto-Accept | All modes |
The CLI can automatically start and manage MCP servers as OCI/Docker containers. This provides:
- Zero-configuration startup: Servers start automatically when CLI launches
- Automatic port assignment: No need to manually configure ports
- Container lifecycle management: Start, stop, and health monitoring
- Background execution: Non-blocking startup, CLI ready immediately
- Automatic healthchecks: Docker healthchecks with MCP ping method
Add a server with auto-start:
infer mcp add my-server \
  --description="My MCP server" \
  --run \
  --oci=my-mcp-server:latest
This automatically:
- Creates server configuration with run: true
- Assigns next available port (e.g., 3000, 3001, ...)
- Configures container with defaults (localhost, http, /mcp path)
- Adds Docker healthcheck using MCP ping method
Auto-start servers use component-based URL configuration instead of a single URL field:
servers:
  - name: "my-server"
    enabled: true
    run: true # Enable auto-start
    oci: "my-mcp-server:latest" # OCI/Docker image
    host: localhost # Default: localhost
    scheme: http # Default: http
    port: 3000 # Auto-assigned if omitted
    path: /mcp # Default: /mcp
    startup_timeout: 60 # Default: 30 seconds
The URL is constructed as: {scheme}://{host}:{port}{path}
Automatic (recommended):
When adding servers without specifying --port, the CLI automatically:
- Finds the highest port currently used by MCP servers
- Assigns basePort + 1 (starting from 3000)
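The two-step rule above can be sketched as follows. `next_port` is a hypothetical name, and the behaviour is inferred from the description only: the first server gets 3000, and every later one gets the highest configured port plus one. Ports below 3000 and host-level port conflicts are not treated specially in this sketch.

```python
from typing import Iterable

BASE_PORT = 3000  # documented starting port


def next_port(used_ports: Iterable[int]) -> int:
    """Return the port to assign to a newly added auto-start server."""
    used = list(used_ports)
    if not used:
        # No MCP server has a port yet, so start at the base port.
        return BASE_PORT
    # Otherwise continue after the highest port already in use.
    return max(used) + 1


if __name__ == "__main__":
    ports = []
    for _ in range(3):
        ports.append(next_port(ports))
    print(ports)
```

One consequence of "highest port plus one" is that a manually chosen port such as 8080 shifts later automatic assignments to 8081 and up, rather than filling gaps near 3000.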
infer mcp add server-1 --run --oci=image:latest # Gets port 3000
infer mcp add server-2 --run --oci=image:latest # Gets port 3001
infer mcp add server-3 --run --oci=image:latest # Gets port 3002
Manual:
Specify a custom port:
infer mcp add my-server --run --oci=image:latest --port=8080
Advanced port mappings:
For complex scenarios, use ports array (Docker-compose style):
servers:
  - name: "multi-port-server"
    run: true
    oci: "my-server:latest"
    ports:
      - "3000:8080" # Host:Container
      - "3001:8081"
Environment variables:
servers:
  - name: "api-server"
    run: true
    oci: "api-mcp:latest"
    env:
      API_KEY: "${MY_API_KEY}" # Supports variable expansion
      LOG_LEVEL: "debug"
      DATABASE_URL: "postgres://..."
Volume mounts:
servers:
  - name: "filesystem-server"
    run: true
    oci: "fs-mcp:latest"
    volumes:
      - "/host/path:/container/path"
      - "/data:/mnt/data:ro" # Read-only
Startup arguments:
servers:
  - name: "custom-server"
    run: true
    oci: "custom-mcp:latest"
    args:
      - "--verbose"
      - "--config=/etc/config.yaml"
Custom healthcheck:
servers:
  - name: "api-server"
    run: true
    oci: "api-mcp:latest"
    health_cmd: 'sh -c "curl -f http://localhost:8080/health || exit 1"'
Default healthcheck (MCP ping):
sh -c 'curl -f -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d "{\"jsonrpc\":\"2.0\",\"method\":\"ping\",\"id\":1}" || exit 1'
Container naming: inference-mcp-{server-name}
Network: All containers join the infer-network Docker network
Restart policy: unless-stopped (containers restart on Docker daemon restart)
Startup behavior:
- CLI checks if container already running (reuses if exists)
- Pulls image if not cached locally
- Starts container in background goroutine
- Waits for healthcheck to pass (with timeout)
- Logs success or failure (non-fatal)
Shutdown: Containers are stopped and removed when CLI exits
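For orientation, the container settings described in this section correspond roughly to a `docker run` invocation like the one assembled below. This is a sketch, not the CLI's implementation: `docker_run_args` is a hypothetical helper, and the CLI may talk to the Docker API directly instead of shelling out. Only standard `docker run` flags are used.

```python
from typing import Dict, List, Optional, Sequence


def docker_run_args(
    name: str,
    image: str,
    ports: Sequence[str] = (),
    env: Optional[Dict[str, str]] = None,
    volumes: Sequence[str] = (),
    args: Sequence[str] = (),
    health_cmd: Optional[str] = None,
) -> List[str]:
    """Build an argument list equivalent to the documented container setup."""
    cmd = [
        "docker", "run", "-d",                # detached, i.e. in the background
        "--name", f"inference-mcp-{name}",    # documented naming scheme
        "--network", "infer-network",         # documented shared network
        "--restart", "unless-stopped",        # documented restart policy
    ]
    for mapping in ports:                     # "host:container" strings
        cmd += ["-p", mapping]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]
    for volume in volumes:
        cmd += ["-v", volume]
    if health_cmd:
        cmd += ["--health-cmd", health_cmd]
    cmd.append(image)
    cmd += list(args)                         # startup arguments follow the image
    return cmd


if __name__ == "__main__":
    print(" ".join(docker_run_args("demo", "mcp-demo:latest", ports=["3000:8080"])))
```

Printing the list, as in the usage line, is a convenient way to compare it against what `docker inspect inference-mcp-<name>` reports for a container the CLI started.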
Add server with auto-start:
infer mcp add <name> [flags]
--run # Enable auto-start
--oci <image> # OCI/Docker image (required if --run)
--port <port> # Optional: specific port
--startup-timeout <sec> # Optional: startup timeout (default: 60)
--description <text> # Optional: description
--enabled # Optional: enable immediately (default: true)
Examples:
# Minimal - automatic port assignment
infer mcp add demo --run --oci=mcp-demo:latest
# With custom port
infer mcp add api --run --oci=api-mcp:latest --port=8080
# With startup timeout
infer mcp add slow --run --oci=slow-mcp:latest --startup-timeout=120
# Complete configuration
infer mcp add custom \
  --run \
  --oci=custom-mcp:latest \
  --port=3000 \
  --startup-timeout=60 \
  --description="Custom MCP server" \
  --enabled
Managing servers:
# List servers
infer mcp list
# Remove server (stops container if running)
infer mcp remove <name>
# Toggle server
infer mcp toggle <name>
Container won't start:
Check container logs:
docker logs inference-mcp-<name>
Check if port is already in use:
lsof -i :<port>
Healthcheck failing:
Verify the server's healthcheck endpoint:
curl -v http://localhost:<port>/health
Test MCP ping method:
curl -X POST http://localhost:<port>/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"ping","id":1}'
Image not found:
Build or pull the image manually:
docker pull <image>
# or
docker build -t <image> .
Startup timeout:
Increase timeout for slow-starting servers:
servers:
  - name: "slow-server"
    startup_timeout: 120 # 2 minutes
Create a custom MCP server using @modelcontextprotocol/sdk (Node.js):
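// Import paths and transport classes depend on the installed SDK version
import { createServer } from 'node:http';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { z } from 'zod';
// Build a fresh server per request so every call stays stateless
function buildServer() {
  const mcp = new McpServer({
    name: 'my-custom-server',
    version: '1.0.0'
  });
  // Register tools (parameters are declared as a zod shape)
  mcp.tool('my_tool', 'My custom tool', { input: z.string() }, async ({ input }) => ({
    content: [{ type: 'text', text: `Processed: ${input}` }]
  }));
  return mcp;
}
// Start the HTTP server; each request gets its own stateless transport
const server = createServer(async (req, res) => {
  const mcp = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  // Release the transport and server once the response is finished
  res.on('close', () => {
    transport.close();
    mcp.close();
  });
  await mcp.connect(transport);
  await transport.handleRequest(req, res);
});
server.listen(3000);
Monitor MCP tool usage: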
# Count MCP tool calls
cat .infer/logs/*.log | grep "MCP_" | wc -l
# Failed MCP calls
cat .infer/logs/*.log | grep "MCP.*failed"
For issues or questions: