Commit 395bfdc

Authored by renovate[bot], github-actions[bot], and claude
Update stacklok/toolhive to v0.24.1 (#799)
* Update stacklok/toolhive to v0.24.1

  Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Refresh reference assets for toolhive v0.24.1

* Document vMCP capacity limits for v0.24.1

  Add a Capacity limits section to the vMCP scaling guide covering the
  per-pod 1,000-session LRU cache, 30-minute inactivity TTL with Redis
  sliding-window refresh, file-descriptor planning, Redis sizing and
  default timeouts, and stateful-backend data loss on pod restart.

  Derived from upstream PR stacklok/toolhive#5025 and verified against
  source at the v0.24.1 tag.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Editorial fixes to vMCP capacity limits section

  - Remove inaccurate claim that Redis client timeouts are tunable via
    sessionStorage CRD fields; the CRD does not expose them.
  - Rephrase two mid-sentence spaced hyphens per the project style guide.
  - Expand TTL acronym on first use.
  - Flip one passive-voice sentence to active voice.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 874a696 commit 395bfdc

3 files changed

Lines changed: 135 additions & 5 deletions

.github/upstream-projects.yaml

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ projects:

 - id: toolhive
 repo: stacklok/toolhive
-version: v0.24.0
+version: v0.24.1
 # toolhive is a monorepo covering the CLI, the Kubernetes
 # operator, and the vMCP gateway. It also introduces cross-
 # cutting features that land in concepts/, integrations/,

docs/toolhive/guides-vmcp/scaling-and-performance.mdx

Lines changed: 75 additions & 0 deletions
@@ -138,6 +138,81 @@ a dedicated vMCP instance per team instead.

 :::

+## Capacity limits
+
+Review these limits before planning capacity for a vMCP deployment.
+
+### Per-pod session cache
+
+Each vMCP pod holds a node-local LRU cache capped at **1,000 concurrent
+sessions**. When the cache is full, the least-recently-used session is evicted
+and its backend connections are closed. Any request in flight at eviction time
+fails, and the next request for that session ID triggers a cache miss.
+
+When Redis session storage is configured, the session manager transparently
+rebuilds the session from stored metadata and reconnects to backends, so clients
+do not need to reinitialize. Without Redis, an evicted session is lost and the
+client must reinitialize.
+
+To serve more than 1,000 concurrent sessions per replica, add vMCP replicas and
+configure Redis session storage. Total capacity scales as `replicas × 1,000`.
+
+### Session time-to-live (TTL)
+
+The vMCP server applies a **30-minute inactivity TTL** to session metadata. A
+session that receives no activity for 30 minutes expires, and the client must
+reinitialize it.
+
+With Redis session storage, the TTL is a sliding window: every request
+atomically refreshes the key's expiry. Active sessions remain valid indefinitely
+as long as they receive at least one request per TTL window. There is no
+absolute maximum session lifetime.
+
+### File descriptors
+
+Each open backend connection consumes one file descriptor on the vMCP pod. A pod
+aggregating many MCP backends at high session concurrency can exhaust the
+container's `nofile` limit before hitting the 1,000-session cache cap.
+
+Estimate the requirement as `concurrent_sessions × backends_per_session`, plus
+overhead for incoming client connections. The default Linux soft `nofile` limit
+is typically 1,024; raise it in the container spec or at the node level if you
+expect to serve hundreds of sessions aggregating multiple backends.
+
+### Redis sizing
+
+When you enable Redis session storage, size the Redis instance for the full
+fleet. Session payloads include routing tables and tool metadata. A rough
+estimate is 10-50 KB per session depending on backend count and tool count, with
+a fleet-wide maximum of `replicas × 1,000` concurrent sessions.
+
+Configure Redis with the `allkeys-lru` eviction policy so Redis sheds stale
+sessions under memory pressure rather than returning errors on new writes. Redis
+persistence is not required for session storage; if the Redis instance restarts,
+all sessions are lost and clients must reinitialize.
+
+The Redis client uses these default timeouts. They are hardcoded defaults and
+are not currently exposed through the VirtualMCPServer CRD.
+
+| Setting       | Default   |
+| ------------- | --------- |
+| Dial timeout  | 5 seconds |
+| Read timeout  | 3 seconds |
+| Write timeout | 3 seconds |
+
+### Stateful backend data loss on pod restart
+
+vMCP is a stateless proxy: it holds routing tables and tool aggregation state,
+but backend MCP servers own their own state (browser sessions, database cursors,
+open files). When a vMCP pod restarts or is evicted, backend connections are
+torn down without a graceful MCP shutdown sequence.
+
+With Redis session storage, the routing table survives and clients can
+reconnect. However, the new connection does not recover any backend-side state;
+it starts fresh. In-flight tool calls are lost without a response. Implement
+retry logic with idempotency guards for tool invocations that modify external
+state.
+
 ## Next steps

 - [Explore Kubernetes operator guides](../guides-k8s/index.mdx) for managing MCP
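
Reviewer note: the sizing formulas in the Capacity limits section of this file can be combined into one rough planning calculation. The sketch below is not part of the commit. `plan_capacity` and its parameters are hypothetical names, and it assumes sessions spread evenly across replicas, one incoming client connection per session, and the upper 50 KB per-session Redis estimate from the guide.

```python
import math

SESSIONS_PER_POD = 1_000     # per-pod LRU cache cap stated in the guide
DEFAULT_SOFT_NOFILE = 1_024  # typical Linux soft nofile limit cited in the guide


def plan_capacity(target_sessions, backends_per_session, kb_per_session=50):
    """Rough vMCP sizing from the guide's formulas (illustrative only)."""
    if target_sessions <= 0:
        raise ValueError("target_sessions must be positive")

    # Total capacity scales as replicas x 1,000, so round up.
    replicas = math.ceil(target_sessions / SESSIONS_PER_POD)

    # Assumption: sessions are spread evenly across replicas.
    sessions_per_pod = math.ceil(target_sessions / replicas)

    # concurrent_sessions x backends_per_session, plus an assumed
    # one descriptor per session for the incoming client connection.
    fds_per_pod = sessions_per_pod * backends_per_session + sessions_per_pod

    # Size Redis for the fleet-wide maximum, not the current target.
    redis_kb_max = replicas * SESSIONS_PER_POD * kb_per_session

    return {
        "replicas": replicas,
        "sessions_per_pod": sessions_per_pod,
        "fds_per_pod": fds_per_pod,
        "exceeds_default_nofile": fds_per_pod > DEFAULT_SOFT_NOFILE,
        "redis_kb_max": redis_kb_max,
    }


if __name__ == "__main__":
    print(plan_capacity(target_sessions=2_500, backends_per_session=4))
```

For 2,500 sessions with 4 backends each, this yields 3 replicas and a per-pod descriptor estimate well above the typical 1,024 soft limit, which is the situation the File descriptors section warns about.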

static/api-specs/toolhive-api.yaml

Lines changed: 59 additions & 4 deletions
@@ -372,8 +372,8 @@ components:
 subject_token_type:
 description: |-
 SubjectTokenType specifies the type of the subject token being exchanged.
-Common values: tokenTypeAccessToken (default), tokenTypeIDToken, tokenTypeJWT.
-If empty, defaults to tokenTypeAccessToken.
+Common values: oauth.TokenTypeAccessToken (default), oauth.TokenTypeIDToken, oauth.TokenTypeJWT.
+If empty, defaults to oauth.TokenTypeAccessToken.
 type: string
 token_url:
 description: TokenURL is the OAuth 2.0 token endpoint URL
@@ -1176,6 +1176,13 @@ components:
 K8sPodTemplatePatch is a JSON string to patch the Kubernetes pod template
 Only applicable when using Kubernetes runtime
 type: string
+mcpserver_generation:
+description: |-
+MCPServerGeneration is the K8s .metadata.generation of the MCPServer CR that rendered
+this RunConfig. The Kubernetes runtime uses it as a monotonic version to prevent stale
+rolling-update pods from overwriting a newer RunConfig's StatefulSet apply. Zero value
+means unversioned (backward-compat with older operators, or non-operator callers).
+type: integer
 middleware_configs:
 description: |-
 MiddlewareConfigs contains the list of middleware to apply to the transport
@@ -4324,12 +4331,30 @@ paths:
 schema:
 type: string
 description: Bad Request
+"401":
+content:
+application/json:
+schema:
+type: string
+description: Unauthorized (registry refused credentials)
+"404":
+content:
+application/json:
+schema:
+type: string
+description: Not Found (artifact not present in registry)
 "409":
 content:
 application/json:
 schema:
 type: string
 description: Conflict
+"429":
+content:
+application/json:
+schema:
+type: string
+description: Too Many Requests (registry rate limit)
 "500":
 content:
 application/json:
@@ -4341,7 +4366,13 @@ paths:
 application/json:
 schema:
 type: string
-description: Bad Gateway
+description: Bad Gateway (upstream registry failure)
+"504":
+content:
+application/json:
+schema:
+type: string
+description: Gateway Timeout (upstream pull timed out)
 summary: Install a skill
 tags:
 - skills
@@ -4560,6 +4591,24 @@ paths:
 schema:
 type: string
 description: Bad Request
+"401":
+content:
+application/json:
+schema:
+type: string
+description: Unauthorized (registry refused credentials)
+"404":
+content:
+application/json:
+schema:
+type: string
+description: Not Found (artifact not present in registry)
+"429":
+content:
+application/json:
+schema:
+type: string
+description: Too Many Requests (registry rate limit)
 "500":
 content:
 application/json:
@@ -4571,7 +4620,13 @@ paths:
 application/json:
 schema:
 type: string
-description: Bad Gateway
+description: Bad Gateway (upstream registry or git resolver failure)
+"504":
+content:
+application/json:
+schema:
+type: string
+description: Gateway Timeout (upstream pull timed out)
 summary: Get skill content
 tags:
 - skills
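
Reviewer note: the status codes added to the skills operations in this file separate caller errors from upstream registry trouble, so a client could use them to decide when to retry. The sketch below is not part of the commit or the ToolHive API. `should_retry` and `backoff_delays` are hypothetical helpers, and treating 429, 502, and 504 as retryable is an assumption based only on the response descriptions shown in the diff.

```python
# Descriptions copied from the response entries added in this diff.
NEW_RESPONSES = {
    401: "Unauthorized (registry refused credentials)",
    404: "Not Found (artifact not present in registry)",
    429: "Too Many Requests (registry rate limit)",
    504: "Gateway Timeout (upstream pull timed out)",
}

# Assumption: rate limits and upstream failures are transient, while
# bad credentials (401) or a missing artifact (404) will not fix themselves.
RETRYABLE_STATUSES = frozenset({429, 502, 504})


def should_retry(status):
    """Return True when a later attempt might plausibly succeed."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


if __name__ == "__main__":
    for status in (401, 404, 429, 502, 504):
        label = NEW_RESPONSES.get(status, "Bad Gateway")
        print(status, label, "retry" if should_retry(status) else "fail fast")
    print(backoff_delays(5))
```

A real client would also want to cap the total retry time and surface the 401 and 404 descriptions to the user rather than retrying.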
