RFC: Rate limiting for MCP servers #57
# RFC-0057: Rate Limiting for MCP Servers

- **Status**: Draft
- **Author(s)**: Jeremy Drouillard (@jerm-dro)
- **Created**: 2026-03-18
- **Last Updated**: 2026-03-19
- **Target Repository**: toolhive
- **Related Issues**: None

## Summary

Enable rate limiting for `MCPServer` and `VirtualMCPServer`, supporting per-user and global limits at both the server and individual-operation level. Rate limits are configured declaratively on the resource spec and enforced by a new middleware in the middleware chain, with Redis as the shared counter backend.
## Problem Statement

ToolHive currently has no mechanism to limit the rate of requests flowing through its proxy layer. This creates several risks for administrators:

1. **Noisy-neighbor problem**: A single authenticated user can consume unbounded resources, degrading the experience for all other users of a shared MCP server.
2. **Downstream overload**: Aggregate traffic spikes, even when no single user is misbehaving, can overwhelm the upstream MCP server or the external services it depends on (APIs with their own rate limits, databases, etc.).
3. **Agent data exfiltration**: AI agents can invoke tools in tight loops to export large volumes of data. Without per-tool or per-user limits, there is no mechanism to cap this behavior.

These risks grow as ToolHive deployments move to shared, multi-user Kubernetes environments. Without rate limiting, cluster administrators have no knob between "fully open" and "take the server offline."

**Scope**: This RFC targets Kubernetes-based deployments of ToolHive.
## Goals

- Allow administrators to configure **per-user** rate limits so that no single user can monopolize a server.
- Allow administrators to configure **global** rate limits so that total throughput stays within safe bounds for downstream services.
- Allow administrators to configure rate limits **per tool**, **per prompt**, or **per resource**, so that expensive or externally constrained operations can have tighter limits than the server default.
- Provide a consistent configuration experience across `MCPServer` and `VirtualMCPServer` resources.
- Enforce rate limits in the existing proxy middleware chain with minimal latency overhead.
- Support correct enforcement across multiple replicas using Redis as the shared counter backend.

## Non-Goals

- **Adaptive / auto-tuning rate limits**: Automatically adjusting limits based on observed load or downstream health signals. Limits are static and administrator-configured.
- **Cost or billing integration**: Tracking usage for billing purposes. This is purely a protective mechanism.
- **Request queuing or throttling**: Requests that exceed the limit are rejected, not queued.
## Proposed Solution

### High-Level Design

Rate limiting is implemented as a new middleware in ToolHive's middleware chain. When a request arrives, the middleware checks the applicable limits (global, per-user, per-operation) and either allows the request to proceed or returns an appropriate error response.
```mermaid
flowchart LR
    Client -->|request| Auth
    Auth -->|identified user| RateLimit
    RateLimit -->|allowed| Downstream[Remaining Middleware + MCP Server]
    RateLimit -->|rejected| Error[Error Response]
```

The rate limit middleware sits after authentication (so user identity is available) and before the rest of the middleware chain.

**Request unit**: One token corresponds to one incoming `tools/call`, `prompts/get`, or `resources/read` invocation at the proxy. Other MCP methods (`tools/list`, `prompts/list`, `resources/list`, etc.) are not rate-limited; the motivation is to cap excessive use of operations that do real work, not to restrict discovery or listing. For `VirtualMCPServer`, rate limiting applies to incoming traffic only; outgoing calls to backends are not independently limited.

Rate limit counters are stored in Redis, which ToolHive already depends on. This ensures accurate enforcement across multiple replicas in horizontally scaled deployments.
> **Contributor** (Dual Middleware Chain Integration Undefined) [ToolHive Expert]: `MCPServer` and `VirtualMCPServer` have structurally different middleware chains (…). The RFC must describe integration into both chains. Also, the rate limiter needs access to the parsed MCP request (to know …). A full middleware chain diagram (like THV-0017 provides), showing exactly where rate limiting sits relative to auth, mcp-parser, tool-filter, webhooks, telemetry, authz, and audit, would be very helpful here.
>
> **Author**: Good points on the ordering dependency: the rate limiter depends on the parsed MCP request from …. The specific integration mechanics for each middleware chain (MiddlewareFactory vs. manual handler wrapping) are implementation details; the architectural constraint is that rate limiting must run after both.
### Configuration

Rate limits are configured via a `rateLimiting` field on the server spec. The same structure applies to both `MCPServer` and `VirtualMCPServer`.

#### Server-Level Limits
> **Contributor** (Method Coverage Gaps) [MCP Protocol Expert]: The selected methods (…). However, two issues: (…). Also: how does this interact with batch JSON-RPC requests? A batch containing both ….
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: my-server
spec:
  # ... existing fields ...
  rateLimiting:
    # Global limit: total requests across all users
    global:
      capacity: 1000
      refillPeriodSeconds: 60

    # Per-user limit: applied independently to each authenticated user
    perUser:
      capacity: 100
      refillPeriodSeconds: 60
```

> **Contributor** (Redis Connection Strategy Unspecified) [ToolHive Expert, Kubernetes Expert]: The RFC doesn't specify whether rate limiting creates its own Redis client, shares the session storage connection, or shares the auth server connection (THV-0035). ToolHive currently has two Redis integration points (…). Creating a separate connection pool would be wasteful. Recommendation: reuse the existing …. Also: rate limiting is a write-heavy workload (every request does a read-modify-write via Lua), while auth storage is read-heavy. If sharing Redis, this workload difference should be acknowledged. Should a dedicated Redis instance be recommended for high-throughput deployments?
>
> **Author**: Good question. The intent is to reuse the existing session storage Redis connection; rate limiting will validate at reconciliation time that Redis-backed session storage is configured. The write-heavy workload concern is valid, but the actual volume is modest: one Lua EVAL per rate-limited request on a tiny hash. For most deployments this is negligible alongside session storage. We'll revisit a dedicated Redis option if workload separation is justified by real usage patterns. Updated the RFC to state that rate limiting reuses the session storage Redis connection and that Redis-backed session storage is a prerequisite.
**Validation**: Per-user rate limits require authentication to be enabled. If `perUser` limits are configured with anonymous inbound auth, the server will raise an error at startup.

#### Per-Operation Limits
> **Contributor** (Use Duration Type Instead of Integer Seconds) [Kubernetes Expert]: Replace `refillPeriodSeconds` with a duration-typed field, per Kubernetes API conventions (…). Also: ….
>
> **Author**: Agree on the duration type; ….
Individual tools, prompts, or resources can have their own limits that supplement the server-level defaults. Per-operation limits can be either global or per-user:

```yaml
spec:
  rateLimiting:
    perUser:
      capacity: 100
      refillPeriodSeconds: 60

    tools:
      - name: "expensive_search"
        perUser:
          capacity: 10
          refillPeriodSeconds: 60
      - name: "shared_resource"
        global:
          capacity: 50
          refillPeriodSeconds: 60

    prompts:
      - name: "generate_report"
        perUser:
          capacity: 5
          refillPeriodSeconds: 60

    resources:
      - name: "large_dataset"
        global:
          capacity: 20
          refillPeriodSeconds: 60
```

> **Contributor** (Validation Should Be at Admission Time, Not Startup) [Kubernetes Expert]: In the Kubernetes model, this should be caught at admission time via CEL validation rules on the CRD, not at container startup. This gives immediate feedback via …. Additional validation needed: (…). Also missing: status conditions, following established patterns (…).
>
> **Author**: Agree that admission-time validation is better UX than startup errors. Updated the validation section to specify: (1) validation happens at admission time via CRD schema validation, (2) required rules listed at a high level (perUser requires auth, at least one of global/perUser, minimum values), and (3) a ….
> **Contributor** (Atomic Multi-Limit Check Required) [Security Advisor]: If global, per-user, and per-operation limits are checked in separate Redis calls, a race condition exists where a request passes the per-operation check but the global counter has been exhausted by a concurrent request. This is bypass vector BV-3. The Lua script must atomically check ALL applicable limits (global + per-user + per-operation) in a single ….
>
> **Author**: Agreed this is a real concern in theory. In practice, phantom token consumption from sequential bucket checks is minimal since tokens refill continuously. We'll keep the implementation non-atomic for simplicity and revisit based on experience if the trade-off proves problematic under real workloads.
When an operation-level limit is defined, it is enforced **in addition to** any server-level limits. A request must pass all applicable limits.

The `tools`, `prompts`, and `resources` lists all follow the same structure: each entry specifies an operation name and either a `global` or `perUser` limit (or both).
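The all-limits-must-pass rule can be sketched as follows. This is an illustrative model, not ToolHive code: each entry stands for one bucket check (global, per-user, or per-operation).

```python
def request_allowed(bucket_checks) -> bool:
    """A request passes only if every applicable limit admits it.
    `bucket_checks` is a list of callables, each consuming one token from
    one bucket (global, per-user, per-operation) and returning whether
    that succeeded. Note: `all` short-circuits, so buckets after the
    first rejection are not charged."""
    return all(check() for check in bucket_checks)
```

This sequential formulation matches the non-atomic approach discussed in review: a token may be consumed from an earlier bucket even if a later bucket rejects the request.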
#### VirtualMCPServer

The same `rateLimiting` configuration is available on `VirtualMCPServer`. Server-level `perUser` limits are based on the user identity at ingress (e.g. the `sub` claim of an OIDC token) and are shared across all backends; they cap the user's total usage of the vMCP, not per-backend usage. Per-operation limits target specific backend operations by name (e.g. `backend_a/costly_tool`), so those are inherently scoped to a single backend.
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: VirtualMCPServer
metadata:
  name: my-vmcp
spec:
  config:
    rateLimiting:
      perUser:
        capacity: 200
        refillPeriodSeconds: 60

      tools:
        - name: "backend_a/costly_tool"
          perUser:
            capacity: 5
            refillPeriodSeconds: 60

      prompts:
        - name: "backend_b/heavy_prompt"
          global:
            capacity: 30
            refillPeriodSeconds: 60
```

> **Contributor** (VirtualMCPServer Field Placement Is Inconsistent) [Kubernetes Expert, ToolHive Expert]: This places `rateLimiting` under `config` (…). The established pattern is that K8s-native operational concerns live on the spec directly (e.g., …). Recommendation: place ….
>
> **Author**: You're right; placing it inside ….

> **Contributor** (Tool Naming Convention Conflict + Two-Phase Check Problem) [ToolHive Expert, MCP Protocol Expert]: Two issues with …. Additionally: what happens when a tool exists on multiple backends (name collision after aggregation)? And what about interaction with the vMCP optimizer: should limits apply to …?
### Algorithm: Token Bucket

Rate limits are enforced using a **token bucket** algorithm. The configuration maps directly onto it:

- `capacity` = bucket capacity (maximum tokens)
- `refillPeriodSeconds` = time in seconds to fully refill the bucket from zero
- Refill rate = `capacity / refillPeriodSeconds` tokens per second

Each token represents a single allowed request. The bucket starts full, refills at a steady rate, and each incoming request consumes one token. When the bucket is empty, requests are rejected.
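The mapping above works out to simple arithmetic. As a quick illustration, using the per-user values from the server-level example:

```python
# Illustrative only: how `capacity` and `refillPeriodSeconds` map onto
# a token bucket's refill rate (per-user example: 100 requests / 60 s).
capacity = 100
refill_period_seconds = 60

refill_rate = capacity / refill_period_seconds  # tokens added per second (~1.67)
seconds_per_token = 1 / refill_rate             # time to earn back one token (0.6 s)
```

So a user who has drained this bucket regains roughly one request every 0.6 seconds, and a fully idle user re-accumulates the full 100-token burst after 60 seconds.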
Per-user limits work identically: each user gets their own bucket, keyed by identity. Redis keys follow a structure like:

- Global: `thv:rl:{server}:global`
- Global per-tool: `thv:rl:{server}:global:tool:{toolName}`
- Global per-prompt: `thv:rl:{server}:global:prompt:{promptName}`
- Global per-resource: `thv:rl:{server}:global:resource:{resourceName}`
- Per-user: `thv:rl:{server}:user:{userId}`
- Per-user per-tool: `thv:rl:{server}:user:{userId}:tool:{toolName}`
- Per-user per-prompt: `thv:rl:{server}:user:{userId}:prompt:{promptName}`
- Per-user per-resource: `thv:rl:{server}:user:{userId}:resource:{resourceName}`

> **Contributor** (Namespace Missing from Redis Key Schema) [Security Advisor, Kubernetes Expert]: THV-0035 established the convention …. If an MCPServer named "foo" in namespace "team-a" and a VirtualMCPServer named "foo" in namespace "team-b" share the same Redis, their rate limit counters would collide. Must be: …. Also: ….
>
> **Author**: Good catch: the namespace must be included to prevent cross-namespace collisions. Updated all key patterns to …. On TTL: keys auto-expire at …. On key derivation: confirmed; ….
Each bucket is stored as a Redis hash with two fields: token count and last-refill timestamp. Refill is lazy; there is no background process. When a request arrives, an atomic Lua script calculates how many tokens should have accumulated since the last access based on elapsed time, adds them (capped at capacity), and then attempts to consume one. This ensures no race conditions across replicas. Keys auto-expire when inactive, so no garbage collection is needed.
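The lazy-refill check described above can be modeled as a pure function. This is an illustrative Python sketch of the algorithm, not the actual Lua script; the dict stands in for the Redis hash, and the field names are assumptions.

```python
def take_token(bucket: dict, capacity: float, refill_period_s: float, now: float) -> bool:
    """Lazy-refill token bucket: model of the atomic check the Lua script
    would perform on a Redis hash with fields `tokens` and `last_refill`.
    An empty dict means the key does not exist yet (bucket starts full)."""
    refill_rate = capacity / refill_period_s
    if not bucket:
        bucket["tokens"] = capacity
        bucket["last_refill"] = now
    # Credit tokens for elapsed time since the last access, capped at capacity.
    elapsed = max(0.0, now - bucket["last_refill"])
    bucket["tokens"] = min(capacity, bucket["tokens"] + elapsed * refill_rate)
    bucket["last_refill"] = now
    # Consume one token if available; otherwise reject.
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False
```

With capacity 2 and a 2-second refill period (one token per second), two immediate requests pass, a third is rejected, and a token is available again one second later.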
Storage is **O(1) per counter** (two fields per hash). For example, 500 users with 10 per-operation limits each is 5,000 hashes, which is negligible for Redis.
**Burst behavior**: An idle user accumulates tokens up to the bucket capacity. This means they can send a full burst of `capacity` requests at once after a period of inactivity. This is intentional (it accommodates legitimate traffic spikes), but administrators should understand it when setting capacity.
### Redis Unavailability

If Redis is unreachable, the middleware **fails open**: all requests are allowed through. This ensures a Redis hiccup does not become a full MCP outage. When this occurs, the middleware emits a structured log entry and increments a `rate_limit_redis_unavailable` metric so that operators can detect and alert on Redis health independently.
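A minimal sketch of the fail-open behavior, assuming a hypothetical `check_limits` callable and an `on_redis_unavailable` log/metric hook (neither is a real ToolHive API):

```python
class RateLimitCheck:
    """Fail-open wrapper: a Redis connectivity error lets the request
    through and records the event, rather than failing the request.
    Both constructor arguments are hypothetical hooks."""

    def __init__(self, check_limits, on_redis_unavailable):
        self._check_limits = check_limits                   # callable(request) -> bool
        self._on_redis_unavailable = on_redis_unavailable   # structured log + metric

    def allow(self, request) -> bool:
        try:
            return self._check_limits(request)
        except ConnectionError:
            # Fail open: a Redis hiccup must not become a full MCP outage.
            self._on_redis_unavailable()
            return True
```

The design choice here is that only connectivity errors trigger fail-open; a normal rejection (the bucket is empty) still returns `False`.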
### Rejection Behavior
> **Contributor** (Lua Script: Clock Skew and Atomicity Requirements) [Security Advisor]: Two concerns with the Lua script: (…). Also: should the script be loaded via …? The RFC should include the actual Lua script (or pseudocode), since the correctness of this atomic operation is critical to the entire design.
>
> **Author**: Good points on clock skew and key initialization. Agree at a high level: the Lua script should use ….
When a request is rate-limited, the middleware returns an MCP-appropriate error response. The middleware includes a `Retry-After` value computed as `1 / refill_rate` from the most restrictive bucket that caused the rejection. This value is a **best-effort lower bound**, not a guarantee; particularly for global limits, other users may consume the next available token before this client retries.
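The `Retry-After` computation above can be sketched as follows. This is an illustrative helper, not ToolHive code; `limits` stands for the (capacity, refill period) pairs of the buckets involved in the rejection.

```python
def retry_after_seconds(limits: list[tuple[float, float]]) -> float:
    """Best-effort lower bound on when one token will next be available:
    1 / refill_rate of the most restrictive (slowest-refilling) bucket.
    `limits` is a list of (capacity, refill_period_seconds) pairs."""
    slowest_rate = min(capacity / period for capacity, period in limits)
    return 1 / slowest_rate

# A per-user limit of 100/60s alongside a per-tool limit of 10/60s:
# the per-tool bucket refills slowest, so Retry-After is 60/10 = 6 seconds.
```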
## Security Considerations

### Threat Model

| Threat | Likelihood | Impact | Mitigation |
|--------|------------|--------|------------|
| Redis key injection via crafted operation names | Medium | High | Validate and sanitize operation names before constructing Redis keys. Reject names containing key-separator characters. |
| Redis as single point of compromise | Low | High | Require Redis authentication and TLS. Reuse the existing Redis security posture established in [THV-0035](./THV-0035-auth-server-redis-storage.md). |
| Rate limit bypass via identity spoofing | Low | Medium | Rate limiting relies on upstream auth middleware. It is only as strong as the authentication layer. |
| Denial of service via Redis exhaustion | Low | Medium | Keys auto-expire when inactive. Storage is O(1) per counter, bounding memory usage. |
| Unauthenticated DoS | Medium | High | Rate limiting sits after the auth middleware, so unauthenticated requests are rejected before reaching rate limit evaluation. Global limits only count authenticated traffic. |

> **Contributor** (Fail-Open Needs More Design Work) [Security Advisor, SRE]: Three concerns: (…). Also: what happens when Redis recovers? Log that connectivity is restored (INFO level).
>
> **Author**: Keeping it simple for now: fail-open only. Adding a configurable …. Updated the RFC to specify state-transition logging (log the first failure and the recovery, not every request) to avoid log noise during sustained outages. Also added a note that Sentinel failover resets bucket state: all users receive a fresh burst allowance. This is expected behavior since bucket state is ephemeral.

> **Contributor** (Observability Is Severely Underspecified) [SRE]: A single metric is not sufficient to operate a rate limiting system in production. Required additions: counters (must-have): (…); histograms (must-have): (…). Naming: the current …. Tracing: add span attributes on the existing request span: (…). Alerting: the RFC should document recommended alerts, at minimum ….
>
> **Author**: Fair feedback; the observability section was too thin. Added a high-level Observability subsection listing the key metric categories (decision counters, Redis error counters, fail-open counters, check latency histogram) and span attributes. Also fixed the metric naming to follow the …. Specific metric names, label cardinality, and alert definitions will be sorted out at implementation time.
> **Contributor** (JSON-RPC Error Response Must Be Specified) [MCP Protocol Expert]: "MCP-appropriate error response" is undefined. MCP uses JSON-RPC 2.0, so the RFC must specify: (…). For example:
>
> ```json
> {
>   "jsonrpc": "2.0",
>   "id": 2,
>   "error": {
>     "code": -32029,
>     "message": "Rate limit exceeded",
>     "data": {
>       "retryAfterSeconds": 0.6,
>       "limitType": "perUser"
>     }
>   }
> }
> ```
>
> The IETF RFC 6585 reference is only relevant for the HTTP transport layer; consider also referencing the JSON-RPC 2.0 spec for error handling.
>
> **Author**: Agreed; this defines the client-facing contract. Replaced the vague "MCP-appropriate error response" with a concrete Rejection Response Format section specifying: (…). Also added the JSON-RPC 2.0 spec to the references.
### Data Security

No sensitive data is stored in Redis: only token counts and last-refill timestamps. User IDs appear in Redis key names but carry no additional PII.
### Input Validation
> **Contributor** (Threat Model Has Gaps) [Security Advisor]: Missing threats to add: (…). Also: the "Redis key injection" row should distinguish inherent vs. residual risk (the post-mitigation likelihood should be Low, given the allowlist).
>
> **Author**: Several of these are already addressed by other changes in this push: (…). For the remaining items (slowloris, Redis DoS, user ID aliasing): these are outside the scope of rate limiting and addressed by other layers. Added a "Scope and Limitations" note to Security Considerations acknowledging that rate limiting operates at the MCP method level; transport-level attacks, Redis availability, and IDP identity consistency are outside its scope.
Operation names used to construct Redis keys must be validated and sanitized to prevent key injection. Names should be checked against a strict allowlist of characters (alphanumeric, hyphens, underscores, and forward slashes for vMCP backend-prefixed names).
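A sketch of the allowlist check and key construction, using the `thv:rl:` key layout from the Algorithm section. The helper name is illustrative, and a real implementation would also need to consider separator characters in the user ID itself.

```python
import re

# Allowlist from above: alphanumerics, hyphens, underscores, and
# forward slashes (for vMCP backend-prefixed names like "backend_a/tool").
_VALID_NAME = re.compile(r"^[A-Za-z0-9_/-]+$")

def per_user_tool_key(server: str, user_id: str, tool: str) -> str:
    """Build a per-user per-tool counter key, rejecting operation names
    that could inject key-separator characters. Follows the RFC's
    `thv:rl:{server}:user:{userId}:tool:{toolName}` pattern."""
    if not _VALID_NAME.fullmatch(tool):
        raise ValueError(f"invalid operation name: {tool!r}")
    return f"thv:rl:{server}:user:{user_id}:tool:{tool}"
```

A name such as `evil:name` fails the allowlist and is rejected before any Redis key is built.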
### Audit and Logging

Rate-limit rejections are logged with user identity, operation name, and which limit was hit (global, per-user, per-operation). Token counts and refill timestamps are not included in log output.
> **Contributor** (TLS Contradiction with THV-0035) [Security Advisor]: The threat table row says "Require Redis authentication and TLS," but THV-0035 explicitly lists TLS as a non-goal: "TLS/mTLS configuration for Redis connections — future enhancement if needed." THV-0053 resolved this with "service mesh only" for TLS. Either align with THV-0035's position ("no TLS, rely on service mesh") and state that explicitly, or require TLS, but don't contradict the existing architectural decision. Also: the RFC should specify that Redis ACLs block ….
>
> **Author**: Good catch on the TLS contradiction. Updated to align with the existing Redis security posture: rate limiting shares the session storage Redis connection and follows the same security approach (authentication, network-level isolation).

## Alternatives Considered
### Sliding Window Log

The original proposal used a sliding window log algorithm, which tracks each request timestamp in a sorted set and counts entries within the window. This was replaced with a **token bucket** for the following reasons:

- **Burst handling**: Token bucket naturally allows legitimate bursts after idle periods, while a sliding window enforces a strict ceiling regardless of usage pattern.
- **Simpler storage**: Token bucket requires a single Redis hash (two fields) per counter, compared to a sorted set with one entry per request for the sliding window, which means significantly less memory under high throughput.
- **Atomic operations**: The token bucket check-and-decrement is a single Lua script operating on two fields, reducing Redis round-trip complexity.
## References

- [THV-0017: Dynamic Webhook Middleware](./THV-0017-dynamic-webhook-middleware.md), which mentions rate limiting as an external webhook use case
- [THV-0047: Horizontal Scaling for vMCP and Proxy Runner](./THV-0047-vmcp-proxyrunner-horizontal-scaling.md), relevant to distributed rate limiting concerns
- [THV-0035: Auth Server Redis Storage](./THV-0035-auth-server-redis-storage.md), the existing Redis dependency
- [IETF RFC 6585](https://tools.ietf.org/html/rfc6585), the HTTP 429 Too Many Requests status code
---

> **Contributor** (Only One Alternative Considered) [Tech Lead]: The RFC should evaluate at least one or two more alternatives: (…). Documenting why these were rejected strengthens the case for the chosen approach.
>
> **Author**: Fair point. Added three more alternatives: Envoy/Istio (can't do MCP-aware enforcement), webhook-based via THV-0017 (adds latency on the hot path), and fixed-window counters (boundary burst problem).
## RFC Lifecycle

<!-- This section is maintained by RFC reviewers -->

### Review History

| Date | Reviewer | Decision | Notes |
|------|----------|----------|-------|
| 2026-03-18 | @jerm-dro | Draft | Initial submission |

### Implementation Tracking
> **Contributor** (Missing RFC Sections) [Tech Lead]: Compared to peer RFCs (THV-0035, THV-0017, THV-0053) and the RFC template, these mandatory or expected sections are absent: (…). Recommend using THV-0035 and THV-0017 as reference points for the expected level of detail.
>
> **Author**: Acknowledged. This is a smaller-scope RFC; ….
| Repository | PR | Status |
|------------|-----|--------|
| toolhive | TBD | Not started |

> **Contributor** (MCPRemoteProxy Not Addressed) [Kubernetes Expert]: `MCPRemoteProxy` also proxies MCP traffic through the middleware chain and would benefit from the same rate limiting. The RFC should either explicitly include it or document why it's excluded. Also: should in-memory rate limiting be available for non-K8s / single-replica deployments? The session storage system already has this dual-mode pattern (memory vs. Redis). Scoping exclusively to K8s means CLI-mode users get no rate limiting at all.
>
> **Author**: Good catch on `MCPRemoteProxy`: it shares the same `MiddlewareFactory` registry and transparent proxy chain as `MCPServer`, so integrating it should be straightforward. That said, the priorities for this RFC are `VirtualMCPServer` and `MCPServer`. I've added a Plan section noting that `MCPRemoteProxy` support is deferred but can be integrated as a follow-up, since the middleware architecture is the same. On in-memory rate limiting for CLI mode: same answer, deferred. The primary motivation is multi-user K8s environments, where noisy-neighbor and exfiltration risks are most acute.