Update docs for ToolHive v0.12.3–v0.13.0 (#641)

rdimitrov · claude · yrobla · web-flow · commit 202f5f24d892 · 2026-04-09T08:59:30.000-04:00
* Update docs for ToolHive v0.12.3–v0.13.0

Catch up documentation with features shipped in v0.12.3 through v0.13.0.
Auto-generated CLI/CRD reference docs were already current; these changes
cover manual doc updates verified against source code at each release tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Mention 1password support

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>

* Revert "Mention 1password support"

This reverts commit ef1ea56.

* Address scalability review feedback in operator and vMCP guides

- Clarify SessionStorageWarning is advisory-only and the operator still
  applies the requested replica count
- Correct condition type (SessionStorageWarning) vs reason
  (SessionStorageMissingForReplicas) distinction
- Add warning that ClientIP session affinity fails silently behind NAT or
  shared egress IPs, with guidance to use None for stateless backends
- Fix MCPServer horizontal scaling section: backend is a StatefulSet, not a
  Deployment; add architecture overview and common scaling configs
- Note that SessionStorageWarning only fires for spec.replicas > 1, not
  backendReplicas
- Add connection draining note: 30s grace/drain period, no preStop hook,
  override via podTemplateSpec
- Add Redis address example comment to prompt users to update the value
- Clarify maxParallel fan-out is per-pod, not distributed across replicas
- Add tip on sizing workflow timeouts relative to maxIterations/maxParallel

* Update docs/toolhive/guides-k8s/run-mcp-k8s.mdx

Co-authored-by: Dan Barr <danbarr@users.noreply.github.com>

* Fix em dashes and title case per review feedback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix review findings: StatefulSet→Deployment, redundant paragraph, nits

- Align MCPServer backend workload type with CRD reference (Deployment, not
  StatefulSet)
- Remove redundant closing paragraph in scaling guide
- Add Redis address comment in vMCP scaling example
- Use precise CRD description for forEach collection field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Yolanda Robla <info@ysoft.biz>
Co-authored-by: Yolanda Robla Mota <yolanda@stacklok.com>
Co-authored-by: Dan Barr <danbarr@users.noreply.github.com>
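
Reviewer aid for the timeout-sizing tip this commit adds to `composite-tools.mdx`: the rule there is `ceil(maxIterations / maxParallel) × expected step duration`. The sketch below only works through that arithmetic. The function name and the per-call durations are hypothetical and are not part of ToolHive; the iteration and parallelism values are taken from the forEach field table in the diff.

```python
import math


def min_workflow_timeout_seconds(
    max_iterations: int, max_parallel: int, step_seconds: float
) -> float:
    """Lower bound on a workflow timeout for a forEach step.

    Hypothetical helper: applies ceil(maxIterations / maxParallel) x step
    duration, assuming every batch takes the full expected step duration.
    """
    if max_iterations < 0 or max_parallel < 1:
        raise ValueError("max_iterations must be >= 0 and max_parallel >= 1")
    serial_batches = math.ceil(max_iterations / max_parallel)
    return serial_batches * step_seconds


# 1000 iterations (the documented maximum) at maxParallel 10, assuming 3 s per call:
# 100 serial batches x 3 s
print(min_workflow_timeout_seconds(1000, 10, 3.0))  # 300.0

# 100 iterations at maxParallel 10, assuming 3 s per call: 10 serial batches x 3 s
print(min_workflow_timeout_seconds(100, 10, 3.0))  # 30.0
```

This is a floor rather than a recommendation: it ignores scheduling overhead and slow outliers, so a real `timeout` would likely need headroom on top of it.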
diff --git a/docs/toolhive/guides-k8s/auth-k8s.mdx b/docs/toolhive/guides-k8s/auth-k8s.mdx
@@ -599,13 +599,13 @@ kubectl apply -f embedded-auth-config.yaml
 
 **Configuration reference:**
 
-| Field                  | Description                                                                                                            |
-| ---------------------- | ---------------------------------------------------------------------------------------------------------------------- |
-| `issuer`               | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs.                            |
-| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation.              |
-| `hmacSecretRefs`       | References to Secrets with symmetric keys for signing authorization codes and refresh tokens.                          |
-| `tokenLifespans`       | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
-| `upstreamProviders`    | Configuration for the upstream identity provider. Currently supports one provider.                                     |
+| Field                  | Description                                                                                                                                                                   |
+| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `issuer`               | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs.                                                                                   |
+| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation.                                                                     |
+| `hmacSecretRefs`       | References to Secrets with symmetric keys for signing authorization codes and refresh tokens.                                                                                 |
+| `tokenLifespans`       | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m).                                                        |
+| `upstreamProviders`    | Configuration for upstream identity providers. MCPServer and MCPRemoteProxy support one provider; VirtualMCPServer supports multiple providers for sequential authentication. |
 
 **Step 5: Create the MCPServer resource**
 
diff --git a/docs/toolhive/guides-k8s/redis-session-storage.mdx b/docs/toolhive/guides-k8s/redis-session-storage.mdx
@@ -2,7 +2,7 @@
 title: Redis Sentinel session storage
 description:
   How to deploy Redis Sentinel and configure persistent session storage for the
-  ToolHive embedded authorization server.
+  ToolHive embedded authorization server and horizontal scaling.
 ---
 
 Deploy Redis Sentinel and configure it as the session storage backend for the
@@ -12,6 +12,11 @@ when pods restart and users must re-authenticate. Redis Sentinel provides
 persistent storage with automatic master discovery, ACL-based access control,
 and optional failover when replicas are configured.
 
+Redis session storage is also required for horizontal scaling when running
+multiple [MCPServer](./run-mcp-k8s.mdx#horizontal-scaling) or
+[VirtualMCPServer](../guides-vmcp/scaling-and-performance.mdx#session-storage-for-multi-replica-deployments)
+replicas, so that sessions are shared across pods.
+
 :::info[Prerequisites]
 
 Before you begin, ensure you have:
diff --git a/docs/toolhive/guides-k8s/run-mcp-k8s.mdx b/docs/toolhive/guides-k8s/run-mcp-k8s.mdx
@@ -441,6 +441,89 @@ For more details about a specific MCP server:
 kubectl -n <NAMESPACE> describe mcpserver <NAME>
 ```
 
+## Horizontal scaling
+
+MCPServer creates two separate Deployments: a proxy runner and a backend MCP
+server. You can scale each independently:
+
+- `spec.replicas` controls the proxy runner pod count
+- `spec.backendReplicas` controls the backend MCP server pod count
+
+The proxy runner handles authentication, MCP protocol framing, and session
+management; it is stateless with respect to tool execution. The backend runs the
+actual MCP server and executes tools.
+
+Common configurations:
+
+- **Scale only the proxy** (`replicas: N`, omit `backendReplicas`): useful when
+  auth and connection overhead is the bottleneck with a single backend.
+- **Scale only the backend** (omit `replicas`, `backendReplicas: M`): useful
+  when tool execution is CPU/memory-bound and the proxy is not a bottleneck. The
+  Service in front of the backend Deployment uses client-IP session affinity to
+  route repeated connections to the same pod, subject to the same NAT
+  limitations as proxy-level affinity.
+- **Scale both** (`replicas: N`, `backendReplicas: M`): full horizontal scale.
+  Redis session storage is required when `replicas > 1`.
+
+```yaml title="MCPServer resource"
+spec:
+  replicas: 2
+  backendReplicas: 3
+  sessionStorage:
+    provider: redis
+    address: redis-master.toolhive-system.svc.cluster.local:6379 # Update to match your Redis Service location
+    db: 0
+    keyPrefix: mcp-sessions
+    passwordRef:
+      name: redis-secret
+      key: password
+```
+
+When running multiple replicas, configure
+[Redis session storage](./redis-session-storage.mdx) so that sessions are shared
+across pods. If you omit `replicas` or `backendReplicas`, the operator defers
+replica management to an HPA or other external controller.
+
+:::note
+
+The `SessionStorageWarning` condition fires only when `spec.replicas > 1`.
+Scaling only the backend (`backendReplicas > 1`) does not trigger a warning, but
+backend client-IP affinity is still unreliable behind NAT or shared egress IPs.
+
+:::
+
+:::note[Connection draining on scale-down]
+
+When a proxy runner pod is terminated (scale-in, rolling update, or node
+eviction), Kubernetes sends SIGTERM and the proxy drains in-flight requests for
+up to 30 seconds before force-closing connections. The grace period and drain
+timeout are both 30 seconds with no headroom, so long-lived SSE or streaming
+connections may be dropped if they exceed the drain window.
+
+No preStop hook is injected by the operator. If your workload requires
+additional time - for example, to let kube-proxy propagate endpoint removal
+before the pod stops accepting traffic - override
+`terminationGracePeriodSeconds` via `podTemplateSpec`:
+
+```yaml
+spec:
+  podTemplateSpec:
+    spec:
+      terminationGracePeriodSeconds: 60
+```
+
+The same 30-second default applies to the backend Deployment.
+
+:::
+
+:::warning[Stdio transport limitation]
+
+Backends using the `stdio` transport are limited to a single replica. The
+operator rejects configurations with `backendReplicas` greater than 1 for stdio
+backends.
+
+:::
+
 ## Next steps
 
 - [Connect clients to your MCP servers](./connect-clients.mdx) from outside the
@@ -457,6 +540,8 @@ kubectl -n <NAMESPACE> describe mcpserver <NAME>
 
 - [Kubernetes CRD reference](../reference/crd-spec.md#apiv1alpha1mcpserver) -
   Reference for the `MCPServer` Custom Resource Definition (CRD)
+- [vMCP scaling and performance](../guides-vmcp/scaling-and-performance.mdx) -
+  Scale Virtual MCP Server deployments
 - [Deploy the operator](./deploy-operator.mdx) - Install the ToolHive operator
 - [Build MCP containers](../guides-cli/build-containers.mdx) - Create custom MCP
   server container images
diff --git a/docs/toolhive/guides-vmcp/composite-tools.mdx b/docs/toolhive/guides-vmcp/composite-tools.mdx
@@ -19,6 +19,7 @@ backend MCP servers, handling dependencies and collecting results.
   wait for their prerequisites
 - **Template expansion**: Dynamic arguments using step outputs
 - **Elicitation**: Request user input mid-workflow (approval gates, choices)
+- **Iteration**: Loop over collections with forEach steps
 - **Error handling**: Configurable abort, continue, or retry behavior
 - **Timeouts**: Workflow and per-step timeout configuration
 
@@ -290,7 +291,7 @@ spec:
 
 ### Steps
 
-Each step can be a tool call or an elicitation:
+Each step can be a tool call, an elicitation, or a forEach loop:
 
 ```yaml title="VirtualMCPServer resource"
 spec:
@@ -344,6 +345,89 @@ spec:
             timeout: '5m'
 ```
 
+### forEach steps
+
+Iterate over a collection from a previous step's output and execute a tool call
+for each item:
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  config:
+    compositeTools:
+      - name: scan_repositories
+        description: Check each repository for security advisories
+        parameters:
+          type: object
+          properties:
+            org:
+              type: string
+          required:
+            - org
+        steps:
+          - id: list_repos
+            tool: github_list_repos
+            arguments:
+              org: '{{.params.org}}'
+          # highlight-start
+          - id: check_advisories
+            type: forEach
+            collection: '{{json .steps.list_repos.output.repositories}}'
+            itemVar: repo
+            maxParallel: 5
+            step:
+              type: tool
+              tool: github_list_security_advisories
+              arguments:
+                repo: '{{.forEach.repo.name}}'
+            onError:
+              action: continue
+            dependsOn: [list_repos]
+          # highlight-end
+```
+
+**forEach fields:**
+
+| Field           | Description                                           | Default |
+| --------------- | ----------------------------------------------------- | ------- |
+| `collection`    | Template expression that resolves to a JSON array     | -       |
+| `itemVar`       | Variable name for the current item                    | item    |
+| `maxParallel`   | Maximum concurrent iterations (max 50)                | 10      |
+| `maxIterations` | Maximum total iterations (max 1000)                   | 100     |
+| `step`          | Inner step definition (tool call to execute per item) | -       |
+| `onError`       | Error handling: `abort` (stop) or `continue` (skip)   | abort   |
+
+:::note
+
+`forEach` does not support `onError.action: retry`. Use `retry` on regular tool
+steps. The `maxParallel` cap of 50 is enforced at runtime regardless of the
+configured value.
+
+:::
+
+Access the current item inside the inner step using
+`{{.forEach.<itemVar>.<field>}}`. In the example above, `{{.forEach.repo.name}}`
+accesses the `name` field of the current repository. You can also use
+`{{.forEach.index}}` to access the zero-based iteration index.
+
+`maxParallel` controls how many iterations run concurrently **on the pod that
+received the composite tool request**. Iterations are not distributed across
+vMCP replicas - all parallel backend calls originate from a single pod
+regardless of `spec.replicas`. When sizing your deployment, account for the
+per-pod fan-out: a `maxParallel: 50` forEach step can open up to 50 simultaneous
+connections to backend MCP servers from one pod. Ensure both the vMCP pod
+resources and the backend MCP servers can handle that per-pod concurrency.
+
+:::tip[Plan your workflow timeouts]
+
+With `maxIterations: 1000` (the maximum) and `maxParallel: 10` (the default), a
+forEach loop runs up to 100 serial batches. If each backend call takes a few
+seconds, the total duration can easily exceed the workflow-level timeout. Set
+the workflow `timeout` to at least
+`ceil(maxIterations / maxParallel) × expected step duration` to avoid silent
+truncation.
+
+:::
+
 ### Error handling
 
 Configure behavior when steps fail:
@@ -507,13 +591,16 @@ without defaultResults defined
 
 Access workflow context in arguments:
 
-| Template                    | Description                                |
-| --------------------------- | ------------------------------------------ |
-| `{{.params.name}}`          | Input parameter                            |
-| `{{.steps.id.output}}`      | Step output (map)                          |
-| `{{.steps.id.output.text}}` | Text content from step output              |
-| `{{.steps.id.content}}`     | Elicitation response content               |
-| `{{.steps.id.action}}`      | Elicitation action (accept/decline/cancel) |
+| Template                         | Description                                |
+| -------------------------------- | ------------------------------------------ |
+| `{{.params.name}}`               | Input parameter                            |
+| `{{.steps.id.output}}`           | Step output (map)                          |
+| `{{.steps.id.output.text}}`      | Text content from step output              |
+| `{{.steps.id.content}}`          | Elicitation response content               |
+| `{{.steps.id.action}}`           | Elicitation action (accept/decline/cancel) |
+| `{{.forEach.<itemVar>}}`         | Current forEach item                       |
+| `{{.forEach.<itemVar>.<field>}}` | Field on current forEach item              |
+| `{{.forEach.index}}`             | Zero-based iteration index                 |
 
 ### Template functions
 
diff --git a/docs/toolhive/guides-vmcp/scaling-and-performance.mdx b/docs/toolhive/guides-vmcp/scaling-and-performance.mdx
@@ -1,10 +1,13 @@
 ---
-title: Scaling and Performance
+title: Scaling and performance
 description:
   How to scale Virtual MCP Server deployments vertically and horizontally.
 ---
 
-This guide explains how to scale Virtual MCP Server (vMCP) deployments.
+This guide explains how to scale Virtual MCP Server (vMCP) deployments. For
+MCPServer scaling, see
+[Horizontal scaling](../guides-k8s/run-mcp-k8s.mdx#horizontal-scaling) in the
+Kubernetes operator guide.
 
 ## Vertical scaling
 
@@ -37,24 +40,62 @@ higher request volumes.
 
 ### How to scale horizontally
 
-The VirtualMCPServer CRD does not have a `replicas` field. The operator creates
-a Deployment named `vmcp-<NAME>` (where `<NAME>` is your VirtualMCPServer name)
-with 1 replica and preserves the replicas count, allowing you to manage scaling
-separately.
+Set the `replicas` field in your VirtualMCPServer spec to control the number of
+vMCP pods:
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  replicas: 3
+```
+
+If you omit `replicas`, the operator defers replica management to an HPA or
+other external controller. In that case, scale manually or with an HPA:
 
 **Option 1: Manual scaling**
 
 ```bash
-kubectl scale deployment vmcp-<vmcp-name> -n <NAMESPACE> --replicas=3
+kubectl scale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> --replicas=3
 ```
 
 **Option 2: Autoscaling with HPA**
 
 ```bash
-kubectl autoscale deployment vmcp-<vmcp-name> -n <NAMESPACE> \
+kubectl autoscale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> \
   --min=2 --max=5 --cpu-percent=70
 ```
 
+### Session storage for multi-replica deployments
+
+When running multiple replicas, configure Redis session storage so that sessions
+are shared across pods. Without session storage, a request routed to a different
+replica than the one that established the session will fail.
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  replicas: 3
+  sessionStorage:
+    provider: redis
+    address: redis-master.toolhive-system.svc.cluster.local:6379 # Update to match your Redis Service location
+    db: 0
+    keyPrefix: vmcp-sessions
+    passwordRef:
+      name: redis-secret
+      key: password
+```
+
+See [Redis Sentinel session storage](../guides-k8s/redis-session-storage.mdx)
+for a complete Redis deployment guide.
+
+:::warning
+
+If you configure multiple replicas without session storage, the operator sets a
+`SessionStorageWarning` status condition on the resource but **still applies the
+replica count**. Pods will start, but requests routed to a replica that did not
+establish the session will fail. Ensure Redis is available before scaling beyond
+a single replica.
+
+:::
+
 ### When horizontal scaling is challenging
 
 Horizontal scaling works well for **stateless backends** (fetch, search,
@@ -63,22 +104,35 @@ read-only operations) where sessions can be resumed on any instance.
 However, **stateful backends** make horizontal scaling difficult:
 
 - **Stateful backends** (Playwright browser sessions, database connections, file
-  system operations) require requests to be routed to the same vMCP instance
-  that established the session
+  system operations) require requests to be routed to the same instance that
+  established the session
 - Session resumption may not work reliably for stateful backends
 
 The `VirtualMCPServer` CRD includes a `sessionAffinity` field that controls how
 the Kubernetes Service routes repeated client connections. By default, it uses
 `ClientIP` affinity, which routes connections from the same client IP to the
-same pod. You can configure this using the `sessionAffinity` field:
+same pod:
 
 ```yaml
 spec:
   sessionAffinity: ClientIP # default
 ```
 
-For stateful backends, vertical scaling or dedicated vMCP instances per team/use
-case are recommended instead of horizontal scaling.
+:::warning[ClientIP affinity is unreliable behind NAT or shared egress IPs]
+
+`ClientIP` affinity relies on the source IP reaching kube-proxy. When clients
+sit behind a NAT gateway, corporate proxy, or cloud load balancer (common in
+EKS, GKE, and AKS), all traffic appears to originate from the same IP - routing
+every client to the same pod and eliminating the benefit of horizontal scaling.
+This fails silently: the deployment appears healthy but only one pod handles all
+load.
+
+For stateless backends, set `sessionAffinity: None` so the Service load-balances
+freely. For stateful backends where true per-session routing is required,
+`ClientIP` affinity is a best-effort mechanism only. Prefer vertical scaling or
+a dedicated vMCP instance per team instead.
+
+:::
 
 ## Next steps
 
diff --git a/docs/toolhive/guides-vmcp/tool-aggregation.mdx b/docs/toolhive/guides-vmcp/tool-aggregation.mdx