Commit fd77210

Add rate limiting guide for MCPServer (#691)

- Document the rate limiting capabilities added to MCPServer in toolhive v0.18.0
- Explain the token bucket algorithm and how shared vs per-user scopes work
- Show YAML examples for shared limits, per-user limits, combined configs, and
  per-tool overrides
- Add to the Kubernetes Operator sidebar after auth-k8s

1 parent 9d62419 commit fd77210

4 files changed

Lines changed: 209 additions & 2 deletions

docs/toolhive/guides-k8s/auth-k8s.mdx

Lines changed: 2 additions & 0 deletions
@@ -901,6 +901,8 @@ kubectl logs -n toolhive-system -l app.kubernetes.io/name=weather-server-k8s

 ## Next steps

+- [Configure rate limiting](./rate-limiting.mdx) to set per-user and shared
+  request limits on MCP servers
 - [Configure token exchange](./token-exchange-k8s.mdx) to let MCP servers
   authenticate to backend services
 - [Set up audit logging](./logging.mdx) to track authentication decisions and

docs/toolhive/guides-k8s/rate-limiting.mdx

Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
---
title: Rate limiting
description:
  Configure per-user and shared rate limits on MCPServer resources to prevent
  noisy neighbors and protect downstream services.
---

Configure token bucket rate limits on MCPServer resources to control how many
tool invocations users can make. Rate limiting prevents individual users from
monopolizing shared servers and protects downstream services from traffic
spikes.

ToolHive supports two scopes of rate limiting:

- **Shared** limits cap total requests across all users.
- **Per-user** limits cap requests independently for each authenticated user.

Both scopes can be applied at the server level, and individual tools can add
tighter limits on top. A request must pass all applicable limits to proceed.

:::info[Prerequisites]

Before you begin, ensure you have:

- A Kubernetes cluster with the ToolHive Operator installed
- Redis deployed in your cluster, because rate limiting stores token bucket
  counters in Redis (see
  [Redis Sentinel session storage](./redis-session-storage.mdx) for deployment
  instructions)
- For per-user limits: authentication enabled on the MCPServer (`oidcConfig`,
  `oidcConfigRef`, or `externalAuthConfigRef`)

If you need help with these prerequisites, see:

- [Kubernetes quickstart](./quickstart.mdx)
- [Authentication and authorization](./auth-k8s.mdx)

:::

## How rate limiting works

Rate limits use a **token bucket** algorithm. Each bucket has a capacity
(`maxTokens`) and a refill period (`refillPeriod`). The bucket starts full and
each `tools/call` request consumes one token. When the bucket is empty, requests
are rejected until tokens refill. The refill rate is `maxTokens / refillPeriod`
tokens per second, with `refillPeriod` expressed in seconds.
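
The token bucket described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not ToolHive's implementation (which keeps its counters in Redis): the `TokenBucket` class, its method name, and the explicit `now` argument are assumptions made for the example, and it assumes tokens refill continuously at `maxTokens / refillPeriod` per second.

```python
class TokenBucket:
    """Illustrative in-memory token bucket (hypothetical; not ToolHive code)."""

    def __init__(self, max_tokens: int, refill_period: float) -> None:
        self.max_tokens = max_tokens        # capacity, like `maxTokens`
        self.refill_period = refill_period  # `refillPeriod`, in seconds
        self.tokens = float(max_tokens)     # the bucket starts full
        self.last_refill = 0.0              # time of the last refill, in seconds

    def try_consume(self, now: float) -> bool:
        """Refill for the elapsed time, then take one token if one is available."""
        rate = self.max_tokens / self.refill_period  # tokens per second
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(float(self.max_tokens), self.tokens + elapsed * rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # bucket empty: the request would be rejected


# Capacity 4, refilled over 2 seconds, so the refill rate is 2 tokens per second.
bucket = TokenBucket(max_tokens=4, refill_period=2.0)
burst = [bucket.try_consume(now=0.0) for _ in range(5)]  # fifth call finds the bucket empty
after_wait = bucket.try_consume(now=0.5)                 # half a second refills one token
```

Passing the clock in as `now` keeps the sketch deterministic. A real limiter would read the time itself and would need to update the counter atomically across replicas.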

Only `tools/call` requests are rate-limited. Lifecycle methods (`initialize`,
`ping`) and discovery methods (`tools/list`, `prompts/list`) pass through
unconditionally.

When a request is rejected, the proxy returns:

- **HTTP 429** with a `Retry-After` header (seconds until a token is available)
- A **JSON-RPC error** with code `-32029` and `retryAfterSeconds` in the error
  data
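
A client can use either signal to decide how long to back off. The sketch below is one possible way to read them, under stated assumptions: the helper name `retry_delay` is hypothetical, the `message` text in the sample body is invented, and `retryAfterSeconds` is assumed to sit directly inside the JSON-RPC error's `data` object. Only the status code, the header name, the error code, and the field name come from the guide.

```python
import json
from typing import Mapping, Optional

RATE_LIMITED_CODE = -32029  # JSON-RPC error code named in the guide


def retry_delay(status: int, headers: Mapping[str, str], body: str) -> Optional[float]:
    """Return the suggested wait in seconds, or None if none can be determined."""
    if status != 429:
        return None
    # Prefer the HTTP header; header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return float(value)
            except ValueError:
                break  # not a plain number of seconds; try the JSON-RPC data
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict) and error.get("code") == RATE_LIMITED_CODE:
        data = error.get("data")
        seconds = data.get("retryAfterSeconds") if isinstance(data, dict) else None
        if isinstance(seconds, (int, float)):
            return float(seconds)
    return None


# Hypothetical rejection body; only `code` and `retryAfterSeconds` come from the guide.
example_body = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "error": {
        "code": RATE_LIMITED_CODE,
        "message": "rate limit exceeded",
        "data": {"retryAfterSeconds": 2},
    },
})
from_header = retry_delay(429, {"Retry-After": "3"}, example_body)
from_body = retry_delay(429, {}, example_body)
```

Here `from_header` takes its value from the `Retry-After` header, while `from_body` falls back to the JSON-RPC error data when the header is absent.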

If Redis is unreachable, rate limiting **fails open** and all requests are
allowed through.
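
Fail-open behavior can be pictured as a small wrapper around the limit check. This is only a sketch of the pattern, not ToolHive's code: `allow_request` and `store_is_down` are hypothetical names, and the built-in `ConnectionError` stands in for whatever error a real Redis client would raise.

```python
from typing import Callable


def allow_request(check_limit: Callable[[], bool]) -> bool:
    """Return the limiter's decision, or allow the request if the store is down."""
    try:
        return check_limit()
    except ConnectionError:
        # Counter store unreachable: fail open rather than block traffic.
        return True


def store_is_down() -> bool:
    raise ConnectionError("counter store unreachable")  # simulated outage


allowed_during_outage = allow_request(store_is_down)  # failed open
allowed_when_empty = allow_request(lambda: False)     # limiter reachable and said no
```

Failing open favors availability over enforcement: while the counter store is unreachable, no limits are applied.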

## Configure shared rate limits

Shared limits apply a single token bucket across all users. Use them to cap
total throughput to protect downstream services.

```yaml title="mcpserver-shared-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  # highlight-start
  rateLimiting:
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
  # highlight-end
```

This allows 1,000 total `tools/call` requests per minute across all users.

## Configure per-user rate limits

Per-user limits give each authenticated user their own independent token bucket.
This prevents a single user from consuming the entire server capacity.

Per-user limits **require authentication** to be enabled. The proxy identifies
users by the `sub` claim from their JWT.

```yaml title="mcpserver-peruser-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  # highlight-start
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
  # highlight-end
```

This allows each user 100 `tools/call` requests per minute independently.

## Combine shared and per-user limits

You can configure both scopes together. A request must pass **all** applicable
limits. This lets you set a per-user ceiling while also capping total server
throughput.

```yaml title="mcpserver-combined-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    # highlight-start
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
    # highlight-end
```

## Add per-tool overrides

Individual tools can have tighter limits than the server default. Per-tool
limits are enforced **in addition to** server-level limits.

```yaml title="mcpserver-pertool-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
    # highlight-start
    tools:
      - name: expensive_search
        perUser:
          maxTokens: 10
          refillPeriod: 1m0s
      - name: shared_resource
        shared:
          maxTokens: 50
          refillPeriod: 1m0s
    # highlight-end
```

In this example:

- Each user can make 100 total tool calls per minute.
- Each user can make at most 10 `expensive_search` calls per minute (and those
  also count toward the 100 server-level limit).
- All users combined can make 50 `shared_resource` calls per minute.
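
The way these limits stack can be checked with a small simulation. It is a sketch under stated assumptions, not ToolHive's logic: the `Limiter` class, the bucket key strings, and the `get_forecast` tool are hypothetical, refill is ignored so that everything happens within one window, and it assumes a rejected call consumes no tokens from any bucket.

```python
from collections import defaultdict

# Limits copied from the example above; refill is ignored to keep the sketch short.
SERVER_PER_USER = 100
TOOL_LIMITS = {
    "expensive_search": {"perUser": 10},
    "shared_resource": {"shared": 50},
}


class Limiter:
    def __init__(self) -> None:
        self.used = defaultdict(int)  # bucket key -> tokens consumed so far

    def _buckets(self, user: str, tool: str):
        """Yield (key, capacity) for every bucket that applies to this call."""
        yield (f"server:user:{user}", SERVER_PER_USER)
        limits = TOOL_LIMITS.get(tool, {})
        if "perUser" in limits:
            yield (f"tool:{tool}:user:{user}", limits["perUser"])
        if "shared" in limits:
            yield (f"tool:{tool}:shared", limits["shared"])

    def call(self, user: str, tool: str) -> bool:
        buckets = list(self._buckets(user, tool))
        # Assumption: tokens are consumed only if every bucket has one left.
        if any(self.used[key] >= cap for key, cap in buckets):
            return False
        for key, _ in buckets:
            self.used[key] += 1
        return True


limiter = Limiter()
search = [limiter.call("alice", "expensive_search") for _ in range(11)]
other = sum(limiter.call("alice", "get_forecast") for _ in range(95))
bob_search = limiter.call("bob", "expensive_search")
shared = sum(limiter.call(user, "shared_resource") for user in ["bob", "carol"] * 30)
```

In this run, alice's eleventh `expensive_search` call is rejected by the per-tool bucket, her ten successful searches leave 90 of her 100 server-level tokens for other tools, bob's own buckets are unaffected, and only 50 of the 60 `shared_resource` calls succeed regardless of which user makes them.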

## Next steps

- [Token exchange](./token-exchange-k8s.mdx) to configure token exchange for
  upstream service authentication
- [CRD reference](../reference/crd-spec.md) for complete field definitions

docs/toolhive/guides-k8s/redis-session-storage.mdx

Lines changed: 4 additions & 2 deletions
@@ -12,8 +12,10 @@ when pods restart and users must re-authenticate. Redis Sentinel provides
 persistent storage with automatic master discovery, ACL-based access control,
 and optional failover when replicas are configured.

-Redis session storage is also required for horizontal scaling when running
-multiple [MCPServer](./run-mcp-k8s.mdx#horizontal-scaling) or
+Redis is also required as the backend for [rate limiting](./rate-limiting.mdx),
+which stores token bucket counters in Redis independently of session data. It is
+also required for horizontal scaling when running multiple
+[MCPServer](./run-mcp-k8s.mdx#horizontal-scaling) or
 [VirtualMCPServer](../guides-vmcp/scaling-and-performance.mdx#session-storage-for-multi-replica-deployments)
 replicas, so that sessions are shared across pods.

sidebars.ts

Lines changed: 1 addition & 0 deletions
@@ -152,6 +152,7 @@ const sidebars: SidebarsConfig = {
 'toolhive/guides-k8s/customize-tools',
 'toolhive/guides-k8s/auth-k8s',
 'toolhive/guides-k8s/redis-session-storage',
+'toolhive/guides-k8s/rate-limiting',
 'toolhive/guides-k8s/token-exchange-k8s',
 'toolhive/guides-k8s/telemetry-and-metrics',
 'toolhive/guides-k8s/logging',
