11---
22title: Scaling and Performance
33description:
4-   How to scale Virtual MCP Server deployments vertically and horizontally.
4+   How to scale MCPServer and Virtual MCP Server deployments vertically and
5+   horizontally.
56---
67
7- This guide explains how to scale Virtual MCP Server (vMCP) deployments.
8+ This guide explains how to scale MCPServer and Virtual MCP Server (vMCP)
9+ deployments.
810
911## Vertical scaling
1012
1113Vertical scaling (increasing CPU/memory per instance) is the simplest approach
12- and works for all use cases, including stateful backends.
14+ and works for all use cases, including stateful backends. Both VirtualMCPServer
15+ and MCPServer support `podTemplateSpec` for configuring resource requests and
16+ limits.
1317
1418To increase resources, configure `podTemplateSpec` in your VirtualMCPServer:
1519
@@ -37,24 +41,91 @@ higher request volumes.
3741
3842### How to scale horizontally
3943
40- The VirtualMCPServer CRD does not have a `replicas` field. The operator creates
41- a Deployment named `vmcp-<NAME>` (where `<NAME>` is your VirtualMCPServer name)
42- with 1 replica and preserves the replicas count, allowing you to manage scaling
43- separately.
44+ Set the `replicas` field in your VirtualMCPServer spec to control the number of
45+ vMCP pods:
46+
47+ ```yaml title="VirtualMCPServer resource"
48+ spec:
49+   replicas: 3
50+ ```
51+
52+ When `replicas` is not set, the operator does not manage the replica count,
53+ leaving it to an HPA or other external controller. You can also scale manually
54+ or with an HPA:
4455
4556**Option 1: Manual scaling**
4657
4758```bash
48- kubectl scale deployment vmcp-<vmcp-name> -n <NAMESPACE> --replicas=3
59+ kubectl scale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> --replicas=3
4960```
5061
5162**Option 2: Autoscaling with HPA**
5263
5364```bash
54- kubectl autoscale deployment vmcp-<vmcp-name> -n <NAMESPACE> \
65+ kubectl autoscale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> \
5566  --min=2 --max=5 --cpu-percent=70
5667```
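
The `kubectl autoscale` command above can also be expressed declaratively. The following is a minimal sketch of an equivalent HorizontalPodAutoscaler manifest using the standard `autoscaling/v2` API. The HPA name is a hypothetical choice, and `<VMCP_NAME>` and `<NAMESPACE>` are placeholders to fill in. It assumes the operator-created Deployment is named `vmcp-<VMCP_NAME>` as described above and that `replicas` is left unset in the VirtualMCPServer spec so the operator does not manage the replica count.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vmcp-<VMCP_NAME>-hpa # hypothetical name, any valid name works
  namespace: <NAMESPACE>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmcp-<VMCP_NAME> # Deployment created by the operator
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # same threshold as --cpu-percent=70
```

CPU utilization targets are calculated against the pods' CPU requests, so the HPA can only act on this metric if the pods define CPU requests, for example through `podTemplateSpec`.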
5768
69+ ### Session storage for multi-replica deployments
70+
71+ When running multiple replicas, configure Redis session storage so that sessions
72+ are shared across pods. Without session storage, a request routed to a different
73+ replica than the one that established the session will fail.
74+
75+ ```yaml title="VirtualMCPServer resource"
76+ spec:
77+   replicas: 3
78+   sessionStorage:
79+     provider: redis
80+     address: redis-master.toolhive-system.svc.cluster.local:6379
81+     db: 0
82+     keyPrefix: vmcp-sessions
83+     passwordRef:
84+       name: redis-secret
85+       key: password
86+ ```
87+
88+ See [Redis Sentinel session storage](../guides-k8s/redis-session-storage.mdx)
89+ for a complete Redis deployment guide.
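
The `passwordRef` in the example above points at a Secret named `redis-secret` with a `password` key. As a rough sketch, such a Secret could look like the manifest below. The `toolhive-system` namespace is an assumption taken from the Redis address in the example; the source does not state which namespace the operator reads the Secret from, so adjust it to match your deployment. `<REDIS_PASSWORD>` is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: redis-secret # matches passwordRef.name
  namespace: toolhive-system # assumption: adjust to your deployment
type: Opaque
stringData:
  password: <REDIS_PASSWORD> # matches passwordRef.key
```

Using `stringData` lets you supply the value as plain text; Kubernetes stores it base64-encoded under `data`.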
90+
91+ :::warning
92+
93+ If you configure multiple replicas without session storage, the operator sets a
94+ `SessionStorageMissingForReplicas` status condition on the resource. Ensure
95+ Redis is available before scaling beyond a single replica.
96+
97+ :::
98+
99+ ### MCPServer horizontal scaling
100+
101+ MCPServer creates two separate Deployments: one for the proxy runner and one for
102+ the MCP server backend. You can scale each independently:
103+
104+ - `spec.replicas` controls the proxy runner pod count
105+ - `spec.backendReplicas` controls the backend MCP server pod count
106+
107+ ```yaml title="MCPServer resource"
108+ spec:
109+   replicas: 2
110+   backendReplicas: 3
111+   sessionStorage:
112+     provider: redis
113+     address: redis-master.toolhive-system.svc.cluster.local:6379
114+     db: 0
115+     keyPrefix: mcp-sessions
116+     passwordRef:
117+       name: redis-secret
118+       key: password
119+ ```
120+
121+ :::warning[Stdio transport limitation]
122+
123+ Backends using the `stdio` transport are limited to a single replica. The
124+ operator rejects configurations with `backendReplicas` greater than 1 for stdio
125+ backends.
126+
127+ :::
128+
58129### When horizontal scaling is challenging
59130
60131Horizontal scaling works well for **stateless backends** (fetch, search,
@@ -63,22 +134,22 @@ read-only operations) where sessions can be resumed on any instance.
63134However, **stateful backends** make horizontal scaling difficult:
64135
65136- **Stateful backends** (Playwright browser sessions, database connections, file
66-   system operations) require requests to be routed to the same vMCP instance
67-   that established the session
137+   system operations) require requests to be routed to the same instance that
138+   established the session
68139- Session resumption may not work reliably for stateful backends
69140
70- The `VirtualMCPServer` CRD includes a `sessionAffinity` field that controls how
71- the Kubernetes Service routes repeated client connections. By default, it uses
72- `ClientIP` affinity, which routes connections from the same client IP to the
73- same pod. You can configure this using the `sessionAffinity` field:
141+ The `VirtualMCPServer` and `MCPServer` CRDs include a `sessionAffinity` field
142+ that controls how the Kubernetes Service routes repeated client connections. By
143+ default, it uses `ClientIP` affinity, which routes connections from the same
144+ client IP to the same pod:
74145
75146```yaml
76147spec:
77148  sessionAffinity: ClientIP # default
78149```
79150
80- For stateful backends, vertical scaling or dedicated vMCP instances per team/use
81- case are recommended instead of horizontal scaling.
151+ For stateful backends, vertical scaling or dedicated instances per team/use case
152+ are recommended instead of horizontal scaling.
82153
83154## Next steps
84155