
Commit 202f5f2

rdimitrov, claude, yrobla, and danbarr authored
Update docs for ToolHive v0.12.3–v0.13.0 (#641)
* Update docs for ToolHive v0.12.3–v0.13.0

  Catch up documentation with features shipped in v0.12.3 through v0.13.0. Auto-generated CLI/CRD reference docs were already current; these changes cover manual doc updates verified against source code at each release tag.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Mention 1password support

  Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>

* Revert "Mention 1password support"

  This reverts commit ef1ea56.

* Address scalability review feedback in operator and vMCP guides

  - Clarify SessionStorageWarning is advisory-only and the operator still applies the requested replica count
  - Correct condition type (SessionStorageWarning) vs reason (SessionStorageMissingForReplicas) distinction
  - Add warning that ClientIP session affinity fails silently behind NAT or shared egress IPs, with guidance to use None for stateless backends
  - Fix MCPServer horizontal scaling section: backend is a StatefulSet, not a Deployment; add architecture overview and common scaling configs
  - Note that SessionStorageWarning only fires for spec.replicas > 1, not backendReplicas
  - Add connection draining note: 30s grace/drain period, no preStop hook, override via podTemplateSpec
  - Add Redis address example comment to prompt users to update the value
  - Clarify maxParallel fan-out is per-pod, not distributed across replicas
  - Add tip on sizing workflow timeouts relative to maxIterations/maxParallel

* Update docs/toolhive/guides-k8s/run-mcp-k8s.mdx

  Co-authored-by: Dan Barr <danbarr@users.noreply.github.com>

* Fix em dashes and title case per review feedback

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix review findings: StatefulSet→Deployment, redundant paragraph, nits

  - Align MCPServer backend workload type with CRD reference (Deployment, not StatefulSet)
  - Remove redundant closing paragraph in scaling guide
  - Add Redis address comment in vMCP scaling example
  - Use precise CRD description for forEach collection field

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Yolanda Robla <info@ysoft.biz>
Co-authored-by: Yolanda Robla Mota <yolanda@stacklok.com>
Co-authored-by: Dan Barr <danbarr@users.noreply.github.com>
1 parent 8cd92ea commit 202f5f2

6 files changed

Lines changed: 296 additions & 29 deletions


docs/toolhive/guides-k8s/auth-k8s.mdx

Lines changed: 7 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -599,13 +599,13 @@ kubectl apply -f embedded-auth-config.yaml
599599

600600
**Configuration reference:**
601601

602-
| Field | Description |
603-
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------- |
604-
| `issuer` | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs. |
605-
| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation. |
606-
| `hmacSecretRefs` | References to Secrets with symmetric keys for signing authorization codes and refresh tokens. |
607-
| `tokenLifespans` | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
608-
| `upstreamProviders` | Configuration for the upstream identity provider. Currently supports one provider. |
602+
| Field | Description |
603+
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
604+
| `issuer` | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs. |
605+
| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation. |
606+
| `hmacSecretRefs` | References to Secrets with symmetric keys for signing authorization codes and refresh tokens. |
607+
| `tokenLifespans` | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
608+
| `upstreamProviders` | Configuration for upstream identity providers. MCPServer and MCPRemoteProxy support one provider; VirtualMCPServer supports multiple providers for sequential authentication. |
609609

610610
**Step 5: Create the MCPServer resource**
611611

docs/toolhive/guides-k8s/redis-session-storage.mdx

Lines changed: 6 additions & 1 deletion
@@ -2,7 +2,7 @@
22
title: Redis Sentinel session storage
33
description:
44
How to deploy Redis Sentinel and configure persistent session storage for the
5-
ToolHive embedded authorization server.
5+
ToolHive embedded authorization server and horizontal scaling.
66
---
77

88
Deploy Redis Sentinel and configure it as the session storage backend for the
@@ -12,6 +12,11 @@ when pods restart and users must re-authenticate. Redis Sentinel provides
1212
persistent storage with automatic master discovery, ACL-based access control,
1313
and optional failover when replicas are configured.
1414

15+
Redis session storage is also required for horizontal scaling when running
16+
multiple [MCPServer](./run-mcp-k8s.mdx#horizontal-scaling) or
17+
[VirtualMCPServer](../guides-vmcp/scaling-and-performance.mdx#session-storage-for-multi-replica-deployments)
18+
replicas, so that sessions are shared across pods.
19+
1520
:::info[Prerequisites]
1621

1722
Before you begin, ensure you have:

docs/toolhive/guides-k8s/run-mcp-k8s.mdx

Lines changed: 85 additions & 0 deletions
@@ -441,6 +441,89 @@ For more details about a specific MCP server:
441441
kubectl -n <NAMESPACE> describe mcpserver <NAME>
442442
```
443443

444+
## Horizontal scaling
445+
446+
MCPServer creates two separate Deployments: a proxy runner and a backend MCP
447+
server. You can scale each independently:
448+
449+
- `spec.replicas` controls the proxy runner pod count
450+
- `spec.backendReplicas` controls the backend MCP server pod count
451+
452+
The proxy runner handles authentication, MCP protocol framing, and session
453+
management; it is stateless with respect to tool execution. The backend runs the
454+
actual MCP server and executes tools.
455+
456+
Common configurations:
457+
458+
- **Scale only the proxy** (`replicas: N`, omit `backendReplicas`): useful when
459+
auth and connection overhead is the bottleneck with a single backend.
460+
- **Scale only the backend** (omit `replicas`, `backendReplicas: M`): useful
461+
when tool execution is CPU/memory-bound and the proxy is not a bottleneck. The
462+
backend Deployment uses client-IP session affinity to route repeated
463+
connections to the same pod - subject to the same NAT limitations as
464+
proxy-level affinity.
465+
- **Scale both** (`replicas: N`, `backendReplicas: M`): full horizontal scale.
466+
Redis session storage is required when `replicas > 1`.
467+
468+
```yaml title="MCPServer resource"
469+
spec:
470+
  replicas: 2
471+
  backendReplicas: 3
472+
  sessionStorage:
473+
    provider: redis
474+
    address: redis-master.toolhive-system.svc.cluster.local:6379 # Update to match your Redis Service location
475+
    db: 0
476+
    keyPrefix: mcp-sessions
477+
    passwordRef:
478+
      name: redis-secret
479+
      key: password
480+
```
481+
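A minimal sketch of the "scale only the backend" configuration from the list above, using only the fields already shown in this section. It is illustrative and not part of the changed file. `spec.replicas` is omitted, so proxy runner replica management is deferred to an HPA or other external controller, and no `sessionStorage` block is shown because the `SessionStorageWarning` condition only fires for `spec.replicas > 1`.

```yaml
spec:
  # replicas omitted: the operator defers proxy runner replica management
  backendReplicas: 3 # rejected by the operator for stdio backends
```

Backend client-IP affinity in this setup remains best-effort behind NAT or shared egress IPs, as the guide notes.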
482+
When running multiple replicas, configure
483+
[Redis session storage](./redis-session-storage.mdx) so that sessions are shared
484+
across pods. If you omit `replicas` or `backendReplicas`, the operator defers
485+
replica management to an HPA or other external controller.
486+
487+
:::note
488+
489+
The `SessionStorageWarning` condition fires only when `spec.replicas > 1`.
490+
Scaling only the backend (`backendReplicas > 1`) does not trigger a warning, but
491+
backend client-IP affinity is still unreliable behind NAT or shared egress IPs.
492+
493+
:::
494+
495+
:::note[Connection draining on scale-down]
496+
497+
When a proxy runner pod is terminated (scale-in, rolling update, or node
498+
eviction), Kubernetes sends SIGTERM and the proxy drains in-flight requests for
499+
up to 30 seconds before force-closing connections. The grace period and drain
500+
timeout are both 30 seconds with no headroom, so long-lived SSE or streaming
501+
connections may be dropped if they exceed the drain window.
502+
503+
No preStop hook is injected by the operator. If your workload requires
504+
additional time - for example, to let kube-proxy propagate endpoint removal
505+
before the pod stops accepting traffic - override
506+
`terminationGracePeriodSeconds` via `podTemplateSpec`:
507+
508+
```yaml
509+
spec:
510+
  podTemplateSpec:
511+
    spec:
512+
      terminationGracePeriodSeconds: 60
513+
```
514+
515+
The same 30-second default applies to the backend Deployment.
516+
517+
:::
518+
519+
:::warning[Stdio transport limitation]
520+
521+
Backends using the `stdio` transport are limited to a single replica. The
522+
operator rejects configurations with `backendReplicas` greater than 1 for stdio
523+
backends.
524+
525+
:::
526+
444527
## Next steps
445528

446529
- [Connect clients to your MCP servers](./connect-clients.mdx) from outside the
@@ -457,6 +540,8 @@ kubectl -n <NAMESPACE> describe mcpserver <NAME>
457540

458541
- [Kubernetes CRD reference](../reference/crd-spec.md#apiv1alpha1mcpserver) -
459542
Reference for the `MCPServer` Custom Resource Definition (CRD)
543+
- [vMCP scaling and performance](../guides-vmcp/scaling-and-performance.mdx) -
544+
Scale Virtual MCP Server deployments
460545
- [Deploy the operator](./deploy-operator.mdx) - Install the ToolHive operator
461546
- [Build MCP containers](../guides-cli/build-containers.mdx) - Create custom MCP
462547
server container images

docs/toolhive/guides-vmcp/composite-tools.mdx

Lines changed: 95 additions & 8 deletions
@@ -19,6 +19,7 @@ backend MCP servers, handling dependencies and collecting results.
1919
wait for their prerequisites
2020
- **Template expansion**: Dynamic arguments using step outputs
2121
- **Elicitation**: Request user input mid-workflow (approval gates, choices)
22+
- **Iteration**: Loop over collections with forEach steps
2223
- **Error handling**: Configurable abort, continue, or retry behavior
2324
- **Timeouts**: Workflow and per-step timeout configuration
2425

@@ -290,7 +291,7 @@ spec:
290291

291292
### Steps
292293

293-
Each step can be a tool call or an elicitation:
294+
Each step can be a tool call, an elicitation, or a forEach loop:
294295

295296
```yaml title="VirtualMCPServer resource"
296297
spec:
@@ -344,6 +345,89 @@ spec:
344345
timeout: '5m'
345346
```
346347

348+
### forEach steps
349+
350+
Iterate over a collection from a previous step's output and execute a tool call
351+
for each item:
352+
353+
```yaml title="VirtualMCPServer resource"
354+
spec:
355+
  config:
356+
    compositeTools:
357+
      - name: scan_repositories
358+
        description: Check each repository for security advisories
359+
        parameters:
360+
          type: object
361+
          properties:
362+
            org:
363+
              type: string
364+
          required:
365+
            - org
366+
        steps:
367+
          - id: list_repos
368+
            tool: github_list_repos
369+
            arguments:
370+
              org: '{{.params.org}}'
371+
          # highlight-start
372+
          - id: check_advisories
373+
            type: forEach
374+
            collection: '{{json .steps.list_repos.output.repositories}}'
375+
            itemVar: repo
376+
            maxParallel: 5
377+
            step:
378+
              type: tool
379+
              tool: github_list_security_advisories
380+
              arguments:
381+
                repo: '{{.forEach.repo.name}}'
382+
            onError:
383+
              action: continue
384+
            dependsOn: [list_repos]
385+
          # highlight-end
386+
```
387+
388+
**forEach fields:**
389+
390+
| Field | Description | Default |
391+
| --------------- | ----------------------------------------------------- | ------- |
392+
| `collection` | Template expression that resolves to a JSON array | - |
393+
| `itemVar` | Variable name for the current item | item |
394+
| `maxParallel` | Maximum concurrent iterations (max 50) | 10 |
395+
| `maxIterations` | Maximum total iterations (max 1000) | 100 |
396+
| `step` | Inner step definition (tool call to execute per item) | - |
397+
| `onError` | Error handling: `abort` (stop) or `continue` (skip) | abort |
398+
399+
:::note
400+
401+
`forEach` does not support `onError.action: retry`. Use `retry` on regular tool
402+
steps. The `maxParallel` cap of 50 is enforced at runtime regardless of the
403+
configured value.
404+
405+
:::
406+
407+
Access the current item inside the inner step using
408+
`{{.forEach.<itemVar>.<field>}}`. In the example above, `{{.forEach.repo.name}}`
409+
accesses the `name` field of the current repository. You can also use
410+
`{{.forEach.index}}` to access the zero-based iteration index.
411+
412+
`maxParallel` controls how many iterations run concurrently **on the pod that
413+
received the composite tool request**. Iterations are not distributed across
414+
vMCP replicas - all parallel backend calls originate from a single pod
415+
regardless of `spec.replicas`. When sizing your deployment, account for the
416+
per-pod fan-out: a `maxParallel: 50` forEach step can open up to 50 simultaneous
417+
connections to backend MCP servers from one pod. Ensure both the vMCP pod
418+
resources and the backend MCP servers can handle that per-pod concurrency.
419+
420+
:::tip[Plan your workflow timeouts]
421+
422+
With `maxIterations: 1000` (the maximum) and `maxParallel: 10` (the default), a forEach loop
423+
runs up to 100 serial batches. If each backend call takes a few seconds, the
424+
total duration can easily exceed a workflow-level timeout. Set the workflow
425+
`timeout` to at least
426+
`ceil(maxIterations / maxParallel) × expected step duration` to avoid silent
427+
truncation.
428+
429+
:::
430+
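As an illustrative sketch of the sizing rule in the tip above (not part of the changed file), consider a loop capped at 200 iterations with the default `maxParallel` of 10. That gives `ceil(200 / 10) = 20` serial batches, so at an assumed 3 seconds per backend call the loop needs roughly 60 seconds. The per-call duration is a hypothetical figure, and placing `timeout` at the composite tool level is an assumption based on the workflow-level timeout the guide describes. Tool names are reused from the forEach example, and `description` and `parameters` are omitted for brevity.

```yaml
spec:
  config:
    compositeTools:
      - name: scan_repositories
        # Assumed workflow-level timeout: 20 batches x ~3s = ~60s, doubled for headroom
        timeout: '2m'
        steps:
          - id: list_repos
            tool: github_list_repos
            arguments:
              org: '{{.params.org}}'
          - id: check_advisories
            type: forEach
            collection: '{{json .steps.list_repos.output.repositories}}'
            itemVar: repo
            maxIterations: 200 # hypothetical cap; default is 100, maximum 1000
            maxParallel: 10 # default
            step:
              type: tool
              tool: github_list_security_advisories
              arguments:
                repo: '{{.forEach.repo.name}}'
            dependsOn: [list_repos]
```

Raising `maxParallel` shortens the batch count but increases the per-pod fan-out described earlier, so the two settings are best tuned together.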
347431
### Error handling
348432

349433
Configure behavior when steps fail:
@@ -507,13 +591,16 @@ without defaultResults defined
507591

508592
Access workflow context in arguments:
509593

510-
| Template | Description |
511-
| --------------------------- | ------------------------------------------ |
512-
| `{{.params.name}}` | Input parameter |
513-
| `{{.steps.id.output}}` | Step output (map) |
514-
| `{{.steps.id.output.text}}` | Text content from step output |
515-
| `{{.steps.id.content}}` | Elicitation response content |
516-
| `{{.steps.id.action}}` | Elicitation action (accept/decline/cancel) |
594+
| Template | Description |
595+
| -------------------------------- | ------------------------------------------ |
596+
| `{{.params.name}}` | Input parameter |
597+
| `{{.steps.id.output}}` | Step output (map) |
598+
| `{{.steps.id.output.text}}` | Text content from step output |
599+
| `{{.steps.id.content}}` | Elicitation response content |
600+
| `{{.steps.id.action}}` | Elicitation action (accept/decline/cancel) |
601+
| `{{.forEach.<itemVar>}}` | Current forEach item |
602+
| `{{.forEach.<itemVar>.<field>}}` | Field on current forEach item |
603+
| `{{.forEach.index}}` | Zero-based iteration index |
517604

518605
### Template functions
519606

docs/toolhive/guides-vmcp/scaling-and-performance.mdx

Lines changed: 67 additions & 13 deletions
@@ -1,10 +1,13 @@
11
---
2-
title: Scaling and Performance
2+
title: Scaling and performance
33
description:
44
How to scale Virtual MCP Server deployments vertically and horizontally.
55
---
66

7-
This guide explains how to scale Virtual MCP Server (vMCP) deployments.
7+
This guide explains how to scale Virtual MCP Server (vMCP) deployments. For
8+
MCPServer scaling, see
9+
[Horizontal scaling](../guides-k8s/run-mcp-k8s.mdx#horizontal-scaling) in the
10+
Kubernetes operator guide.
811

912
## Vertical scaling
1013

@@ -37,24 +40,62 @@ higher request volumes.
3740
3841
### How to scale horizontally
3942
40-
The VirtualMCPServer CRD does not have a `replicas` field. The operator creates
41-
a Deployment named `vmcp-<NAME>` (where `<NAME>` is your VirtualMCPServer name)
42-
with 1 replica and preserves the replicas count, allowing you to manage scaling
43-
separately.
43+
Set the `replicas` field in your VirtualMCPServer spec to control the number of
44+
vMCP pods:
45+
46+
```yaml title="VirtualMCPServer resource"
47+
spec:
48+
  replicas: 3
49+
```
50+
51+
If you omit `replicas`, the operator defers replica management to an HPA or
52+
other external controller. You can also scale manually or with an HPA:
4453

4554
**Option 1: Manual scaling**
4655

4756
```bash
48-
kubectl scale deployment vmcp-<vmcp-name> -n <NAMESPACE> --replicas=3
57+
kubectl scale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> --replicas=3
4958
```
5059

5160
**Option 2: Autoscaling with HPA**
5261

5362
```bash
54-
kubectl autoscale deployment vmcp-<vmcp-name> -n <NAMESPACE> \
63+
kubectl autoscale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> \
5564
  --min=2 --max=5 --cpu-percent=70
5665
```
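The `kubectl autoscale` command in Option 2 can also be expressed declaratively. The sketch below is a roughly equivalent `autoscaling/v2` HorizontalPodAutoscaler manifest, assuming the operator-created Deployment is named `vmcp-<VMCP_NAME>` as in the commands above. The HPA's own `metadata.name` is an arbitrary choice, and the manifest is not part of the changed file.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vmcp-<VMCP_NAME> # arbitrary; any valid name works
  namespace: <NAMESPACE>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmcp-<VMCP_NAME> # Deployment created by the operator
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU utilization targets only work when the pods declare CPU requests and a metrics source such as metrics-server is installed. Omit `spec.replicas` on the VirtualMCPServer when using an HPA so that the operator defers replica management to it.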
5766

67+
### Session storage for multi-replica deployments
68+
69+
When running multiple replicas, configure Redis session storage so that sessions
70+
are shared across pods. Without session storage, a request routed to a different
71+
replica than the one that established the session will fail.
72+
73+
```yaml title="VirtualMCPServer resource"
74+
spec:
75+
  replicas: 3
76+
  sessionStorage:
77+
    provider: redis
78+
    address: redis-master.toolhive-system.svc.cluster.local:6379 # Update to match your Redis Service location
79+
    db: 0
80+
    keyPrefix: vmcp-sessions
81+
    passwordRef:
82+
      name: redis-secret
83+
      key: password
84+
```
85+
86+
See [Redis Sentinel session storage](../guides-k8s/redis-session-storage.mdx)
87+
for a complete Redis deployment guide.
88+
89+
:::warning
90+
91+
If you configure multiple replicas without session storage, the operator sets a
92+
`SessionStorageWarning` status condition on the resource but **still applies the
93+
replica count**. Pods will start, but requests routed to a replica that did not
94+
establish the session will fail. Ensure Redis is available before scaling beyond
95+
a single replica.
96+
97+
:::
98+
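For orientation, the warning would surface in the resource's `status.conditions` roughly as sketched below. The `type` and `reason` values come from the commit message above; the `status` value and the `message` placeholder are assumptions following the usual Kubernetes condition layout, not output captured from the operator.

```yaml
status:
  conditions:
    - type: SessionStorageWarning
      status: 'True' # assumed value while the warning is active
      reason: SessionStorageMissingForReplicas
      message: <operator-provided explanation> # placeholder; actual text is not shown in the source
```

Because the condition is advisory only, checking for it after changing `replicas` is a reasonable habit; the pods will start either way.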
5899
### When horizontal scaling is challenging
59100

60101
Horizontal scaling works well for **stateless backends** (fetch, search,
@@ -63,22 +104,35 @@ read-only operations) where sessions can be resumed on any instance.
63104
However, **stateful backends** make horizontal scaling difficult:
64105

65106
- **Stateful backends** (Playwright browser sessions, database connections, file
66-
system operations) require requests to be routed to the same vMCP instance
67-
that established the session
107+
system operations) require requests to be routed to the same instance that
108+
established the session
68109
- Session resumption may not work reliably for stateful backends
69110

70111
The `VirtualMCPServer` CRD includes a `sessionAffinity` field that controls how
71112
the Kubernetes Service routes repeated client connections. By default, it uses
72113
`ClientIP` affinity, which routes connections from the same client IP to the
73-
same pod. You can configure this using the `sessionAffinity` field:
114+
same pod:
74115

75116
```yaml
76117
spec:
77118
  sessionAffinity: ClientIP # default
78119
```
79120

80-
For stateful backends, vertical scaling or dedicated vMCP instances per team/use
81-
case are recommended instead of horizontal scaling.
121+
:::warning[ClientIP affinity is unreliable behind NAT or shared egress IPs]
122+
123+
`ClientIP` affinity relies on the source IP reaching kube-proxy. When clients
124+
sit behind a NAT gateway, corporate proxy, or cloud load balancer (common in
125+
EKS, GKE, and AKS), all traffic appears to originate from the same IP - routing
126+
every client to the same pod and eliminating the benefit of horizontal scaling.
127+
This fails silently: the deployment appears healthy but only one pod handles all
128+
load.
129+
130+
For stateless backends, set `sessionAffinity: None` so the Service load-balances
131+
freely. For stateful backends where true per-session routing is required,
132+
`ClientIP` affinity is a best-effort mechanism only. Prefer vertical scaling or
133+
a dedicated vMCP instance per team instead.
134+
135+
:::
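A minimal sketch of the stateless case described in the warning above, combining only fields already shown in this guide. It is illustrative and not part of the changed file; the Redis address and Secret name are the same example values used earlier and need to match your environment.

```yaml
spec:
  replicas: 3
  sessionAffinity: None # let the Service load-balance across all pods
  sessionStorage:
    provider: redis
    address: redis-master.toolhive-system.svc.cluster.local:6379 # Update to match your Redis Service location
    db: 0
    keyPrefix: vmcp-sessions
    passwordRef:
      name: redis-secret
      key: password
```

With affinity disabled, any pod may receive a given request, which is why shared session storage is paired with it here.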
82136

83137
## Next steps
84138
