---
title: Rate limiting
description:
  Configure per-user and shared rate limits on MCPServer resources to prevent
  noisy neighbors and protect downstream services.
---

Configure token bucket rate limits on MCPServer resources to control how many
tool invocations users can make. Rate limiting prevents individual users from
monopolizing shared servers and protects downstream services from traffic
spikes.

ToolHive supports two scopes of rate limiting:

- **Shared** limits cap total requests across all users.
- **Per-user** limits cap requests independently for each authenticated user.

Both scopes can be applied at the server level and overridden per tool. A
request must pass all applicable limits to proceed.

:::info[Prerequisites]

Before you begin, ensure you have:

- A Kubernetes cluster with the ToolHive Operator installed
- Redis deployed in your cluster — rate limiting stores token bucket counters in
  Redis (see [Redis Sentinel session storage](./redis-session-storage.mdx) for
  deployment instructions)
- For per-user limits: authentication enabled on the MCPServer (`oidcConfig`,
  `oidcConfigRef`, or `externalAuthConfigRef`)

If you need help with these prerequisites, see:

- [Kubernetes quickstart](./quickstart.mdx)
- [Authentication and authorization](./auth-k8s.mdx)

:::

## How rate limiting works

Rate limits use a **token bucket** algorithm. Each bucket has a capacity
(`maxTokens`) and a refill period (`refillPeriod`). The bucket starts full, and
each `tools/call` request consumes one token. When the bucket is empty, requests
are rejected until tokens refill. The refill rate is `maxTokens / refillPeriod`
tokens per second.
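
As a rough illustration of the algorithm (not ToolHive's implementation, which
stores its counters in Redis), a continuously refilling token bucket can be
sketched in a few lines of Python:

```python
from time import monotonic


class TokenBucket:
    """Minimal continuous-refill token bucket (illustration only)."""

    def __init__(self, max_tokens: float, refill_period_s: float):
        self.max_tokens = max_tokens
        self.refill_rate = max_tokens / refill_period_s  # tokens per second
        self.tokens = max_tokens  # the bucket starts full
        self.last = monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.max_tokens, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Capacity of 3 tokens, refilled over 60 seconds.
bucket = TokenBucket(max_tokens=3, refill_period_s=60)
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
```

The fourth back-to-back call is rejected because the bucket is empty and only a
negligible fraction of a token has refilled in the meantime.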

Only `tools/call` requests are rate-limited. Lifecycle methods (`initialize`,
`ping`) and discovery methods (`tools/list`, `prompts/list`) pass through
unconditionally.

When a request is rejected, the proxy returns:

- **HTTP 429** with a `Retry-After` header (seconds until a token is available)
- A **JSON-RPC error** with code `-32029` and `retryAfterSeconds` in the error
  data
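
For example, a rejected `tools/call` produces an error body with roughly this
shape (the `id`, message text, and retry value below are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "error": {
    "code": -32029,
    "message": "rate limit exceeded",
    "data": { "retryAfterSeconds": 12 }
  }
}
```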

If Redis is unreachable, rate limiting **fails open** and all requests are
allowed through.

## Configure shared rate limits

Shared limits apply a single token bucket across all users. Use them to cap
total throughput and protect downstream services.

```yaml title="mcpserver-shared-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  # highlight-start
  rateLimiting:
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
  # highlight-end
```

This allows 1,000 total `tools/call` requests per minute across all users (a
refill rate of roughly 16.7 tokens per second).

## Configure per-user rate limits

Per-user limits give each authenticated user their own independent token bucket.
This prevents a single user from consuming the entire server capacity.

Per-user limits **require authentication** to be enabled. The proxy identifies
users by the `sub` claim in their JWT.

```yaml title="mcpserver-peruser-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  # highlight-start
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
  # highlight-end
```

This allows each user to make 100 `tools/call` requests per minute,
independently of other users.

## Combine shared and per-user limits

You can configure both scopes together. A request must pass **all** applicable
limits. This lets you set a per-user ceiling while also capping total server
throughput.

```yaml title="mcpserver-combined-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    # highlight-start
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
    # highlight-end
```

## Add per-tool overrides

Individual tools can have tighter limits than the server default. Per-tool
limits are enforced **in addition to** server-level limits.

```yaml title="mcpserver-pertool-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
    # highlight-start
    tools:
      - name: expensive_search
        perUser:
          maxTokens: 10
          refillPeriod: 1m0s
      - name: shared_resource
        shared:
          maxTokens: 50
          refillPeriod: 1m0s
    # highlight-end
```

In this example:

- Each user can make 100 total tool calls per minute.
- Each user can make at most 10 `expensive_search` calls per minute (and those
  also count toward the 100-per-minute server-level limit).
- All users combined can make 50 `shared_resource` calls per minute.
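
The layered behavior can be sketched as an all-or-nothing check: a call is
admitted only if every applicable bucket still has a token. This Python sketch
is an illustration, not ToolHive's implementation, and it assumes a rejected
call debits no buckets; refill is omitted for brevity:

```python
def admit(tool: str, remaining: dict) -> bool:
    """Admit a call only if every applicable limit has at least one token left.

    `remaining` maps illustrative bucket names to their remaining token counts;
    a tool without its own entry is governed only by the server-level bucket.
    """
    keys = ["peruser:server"]          # server-level bucket always applies
    if f"pertool:{tool}" in remaining:  # per-tool bucket applies if configured
        keys.append(f"pertool:{tool}")
    if any(remaining[k] < 1 for k in keys):
        return False                   # one empty bucket rejects the call
    for k in keys:
        remaining[k] -= 1              # debit every applicable bucket
    return True


remaining = {"peruser:server": 100, "pertool:expensive_search": 10}
for _ in range(10):
    assert admit("expensive_search", remaining)
print(admit("expensive_search", remaining))  # False: the per-tool cap is hit
print(admit("other_tool", remaining))        # True: server bucket has tokens left
```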

## Next steps

- [Token exchange](./token-exchange-k8s.mdx) to configure token exchange for
  upstream service authentication
- [CRD reference](../reference/crd-spec.md) for complete field definitions