Add rate limiting guide for MCPServer #691
Merged
Commits (4):

- 7277156 Add rate limiting guide for MCPServer (jerm-dro)
- 6353afb Merge branch 'main' into jerm-dro/rate-limiting-docs (jerm-dro)
- d4c47ed Add inbound links and polish rate limiting guide (jerm-dro)
- 489df8f Address review feedback on rate limiting guide (jerm-dro)
---
title: Rate limiting
description:
  Configure per-user and shared rate limits on MCPServer resources to prevent
  noisy neighbors and protect downstream services.
---

Configure token bucket rate limits on MCPServer resources to control how many
tool invocations users can make. Rate limiting prevents individual users from
monopolizing shared servers and protects downstream services from traffic
spikes.

ToolHive supports two scopes of rate limiting:

- **Shared** limits cap total requests across all users.
- **Per-user** limits cap requests independently for each authenticated user.

Both scopes can be applied at the server level and overridden per tool. A
request must pass all applicable limits to proceed.

:::info[Prerequisites]

Before you begin, ensure you have:

- A Kubernetes cluster with the ToolHive Operator installed
- Redis deployed in your cluster — rate limiting stores token bucket counters in
  Redis (see [Redis Sentinel session storage](./redis-session-storage.mdx) for
  deployment instructions)
- For per-user limits: authentication enabled on the MCPServer (`oidcConfig`,
  `oidcConfigRef`, or `externalAuthConfigRef`)

If you need help with these prerequisites, see:

- [Kubernetes quickstart](./quickstart.mdx)
- [Authentication and authorization](./auth-k8s.mdx)

:::

## How rate limiting works

Rate limits use a **token bucket** algorithm. Each bucket has a capacity
(`maxTokens`) and a refill period (`refillPeriod`). The bucket starts full and
each `tools/call` request consumes one token. When the bucket is empty, requests
are rejected until tokens refill. The refill rate is `maxTokens / refillPeriod`
tokens per second.

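The bucket behavior described above can be sketched as a minimal model (an
illustrative Python sketch, not ToolHive's implementation):

```python
class TokenBucket:
    """Illustrative token bucket: starts full and refills continuously
    at max_tokens / refill_period tokens per second."""

    def __init__(self, max_tokens: float, refill_period_s: float):
        self.max_tokens = max_tokens
        self.refill_rate = max_tokens / refill_period_s
        self.tokens = max_tokens  # bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for elapsed time, capped at capacity, then try to take one token.
        self.tokens = min(
            self.max_tokens, self.tokens + (now - self.last) * self.refill_rate
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each tools/call consumes one token
            return True
        return False

# maxTokens: 100, refillPeriod: 1m0s
bucket = TokenBucket(max_tokens=100, refill_period_s=60)
burst = sum(bucket.allow(now=0.0) for _ in range(101))
print(burst)                  # 100 (the 101st immediate call is rejected)
print(bucket.allow(now=0.0))  # False: bucket is empty
print(bucket.allow(now=1.0))  # True: ~1.67 tokens refilled after one second
```

Note the two properties this gives you: a full bucket absorbs a burst of up to
`maxTokens` requests, and the long-run rate converges to the refill rate.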
Only `tools/call` requests are rate-limited. Lifecycle methods (`initialize`,
`ping`) and discovery methods (`tools/list`, `prompts/list`) pass through
unconditionally.

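In other words, the proxy filters on the JSON-RPC method name before consulting
any bucket; conceptually (a hypothetical helper, not ToolHive code):

```python
# Only tools/call consumes a token; everything else passes through.
RATE_LIMITED_METHODS = {"tools/call"}

def is_rate_limited(method: str) -> bool:
    return method in RATE_LIMITED_METHODS

print(is_rate_limited("tools/call"))  # True
print(is_rate_limited("tools/list"))  # False
print(is_rate_limited("initialize"))  # False
```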
When a request is rejected, the proxy returns:

- **HTTP 429** with a `Retry-After` header (seconds until a token is available)
- A **JSON-RPC error** with code `-32029` and `retryAfterSeconds` in the error
  data

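A client can honor these signals with a little parsing. The sketch below is
hypothetical client-side code (the example payload is made up; only the error
code `-32029` and the `retryAfterSeconds` field come from this guide):

```python
import json
from typing import Optional

RATE_LIMITED = -32029  # ToolHive's JSON-RPC rate-limit error code

def retry_delay_seconds(body: str) -> Optional[float]:
    """Return how long to wait before retrying, or None if the response
    is not a rate-limit rejection."""
    error = json.loads(body).get("error")
    if not error or error.get("code") != RATE_LIMITED:
        return None  # some other failure; don't retry blindly
    return float(error.get("data", {}).get("retryAfterSeconds", 1.0))

rejected = (
    '{"jsonrpc": "2.0", "id": 1, "error": {"code": -32029,'
    ' "message": "rate limit exceeded", "data": {"retryAfterSeconds": 12}}}'
)
print(retry_delay_seconds(rejected))  # 12.0
```

HTTP clients can apply the same logic to the `Retry-After` response header
instead of the JSON-RPC error data.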
If Redis is unreachable, rate limiting **fails open** and all requests are
allowed through.

## Configure shared rate limits

Shared limits apply a single token bucket across all users. Use them to cap
total throughput to protect downstream services.

```yaml title="mcpserver-shared-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  # highlight-start
  rateLimiting:
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
  # highlight-end
```

This allows 1,000 total `tools/call` requests per minute across all users.

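As a quick sanity check on what those numbers imply (plain arithmetic, not
ToolHive output):

```python
# maxTokens: 1000, refillPeriod: 1m0s
max_tokens = 1000
refill_period_s = 60

# Sustained throughput once the initial burst is spent:
sustained = max_tokens / refill_period_s
print(round(sustained, 1))  # 16.7 tools/call requests per second

# A full bucket additionally absorbs a burst of up to maxTokens requests.
```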
## Configure per-user rate limits

Per-user limits give each authenticated user their own independent token bucket.
This prevents a single user from consuming the entire server capacity.

Per-user limits **require authentication** to be enabled. The proxy identifies
users by the `sub` claim from their JWT token.

```yaml title="mcpserver-peruser-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  # highlight-start
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
  # highlight-end
```

This allows each user 100 `tools/call` requests per minute independently.

## Combine shared and per-user limits

You can configure both scopes together. A request must pass **all** applicable
limits. This lets you set a per-user ceiling while also capping total server
throughput.

```yaml title="mcpserver-combined-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    # highlight-start
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
    # highlight-end
```

## Add per-tool overrides

Individual tools can have tighter limits than the server default. Per-tool
limits are enforced **in addition to** server-level limits.

```yaml title="mcpserver-pertool-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
    # highlight-start
    tools:
      - name: expensive_search
        perUser:
          maxTokens: 10
          refillPeriod: 1m0s
      - name: shared_resource
        shared:
          maxTokens: 50
          refillPeriod: 1m0s
    # highlight-end
```

In this example:

- Each user can make 100 total tool calls per minute.
- Each user can make at most 10 `expensive_search` calls per minute (and those
  also count toward the 100 server-level limit).
- All users combined can make 50 `shared_resource` calls per minute.

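The "pass all applicable limits" rule behind this example can be sketched as
follows (an illustrative model with made-up bucket states; whether ToolHive
consumes tokens atomically across buckets is an implementation detail this
sketch does not claim to reproduce):

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    tokens: int  # simplified: current token count only, refill omitted

def allowed(buckets: list) -> bool:
    """A tools/call proceeds only if every applicable bucket has a token;
    tokens are consumed only when the request is admitted."""
    if all(b.tokens >= 1 for b in buckets):
        for b in buckets:
            b.tokens -= 1
        return True
    return False

per_user = Bucket(tokens=100)         # server-level per-user bucket
expensive_search = Bucket(tokens=10)  # per-tool per-user override
shared_resource = Bucket(tokens=0)    # per-tool shared bucket, exhausted

print(allowed([per_user, expensive_search]))     # True: both have tokens
print(per_user.tokens, expensive_search.tokens)  # 99 9
print(allowed([per_user, shared_resource]))      # False: shared bucket empty
```

The second call is rejected even though the user's own bucket still has
tokens, because every applicable bucket must admit the request.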
## Next steps

- [Token exchange](./token-exchange-k8s.mdx) to configure token exchange for
  upstream service authentication
- [CRD reference](../reference/crd-spec.md) for complete field definitions