RFC: Rate limiting for MCP servers#57

Merged
jerm-dro merged 7 commits into main from jerm/2026-03-18-rate-limiting on Apr 1, 2026

Conversation

@jerm-dro
Contributor

@jerm-dro jerm-dro commented Mar 18, 2026

Summary

Adds an RFC for configurable rate limiting in ToolHive's MCPServer and VirtualMCPServer.

Scope:

  • Per-user and global rate limits at the server level
  • Per-operation limits for individual tools, prompts, and resources
  • Consistent config across MCPServer and VirtualMCPServer
  • Redis-backed shared counters for multi-replica correctness

This is an early draft focused on problem definition, user-facing configuration, and open questions — not implementation details.

🤖 Generated with Claude Code

jerm-dro and others added 4 commits March 18, 2026 15:05
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jerm-dro jerm-dro requested a review from Derek2Tu March 18, 2026 22:53
Contributor

@JAORMX JAORMX left a comment


So, the windowing section is the one part of this RFC that I think needs to be pinned down before we move forward. Right now it reads as "we'll figure it out during implementation"... but the algorithm choice affects burst behavior, storage, and how operators should reason about the limits they configure. Worth deciding here.

I'd suggest going with a token bucket. Here's why:

Your config schema (requestsPerWindow + windowSeconds) already maps to it naturally. A token bucket has two parameters: a refill rate (requestsPerWindow / windowSeconds) and a burst capacity (requestsPerWindow). Each token represents a single allowed request. The bucket starts full, refills at a steady rate, and when a request comes in it takes one token out. Empty bucket? Request rejected.

So "100 requests per 60 seconds" means: bucket holds 100 tokens, refills at ~1.67 tokens/second. The operator doesn't need to think about buckets or tokens at all... the config they already write just works.

Per-user limiting works the same way. Each user just gets their own bucket, keyed by identity. The algorithm is identical, only the Redis key changes:

  • Global limit: thv:rl:{server}:global
  • Per-user limit: thv:rl:{server}:user:{userId}
  • Per-user per-tool: thv:rl:{server}:user:{userId}:tool:{toolName}
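Those key patterns, together with the input-validation concern on operation names that comes up later in the thread, can be sketched as a pair of helpers. Illustrative only — the helper names and the allowlist regex are assumptions, not ToolHive source:

```go
package main

import (
	"fmt"
	"regexp"
)

// safeSegment guards against Redis key injection: user IDs and tool
// names are interpolated into keys, so they must match a strict
// allowlist before use. The exact pattern here is an assumption.
var safeSegment = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)

func mustSafe(s string) string {
	if !safeSegment.MatchString(s) {
		panic("unsafe key segment: " + s)
	}
	return s
}

func globalKey(server string) string {
	return fmt.Sprintf("thv:rl:%s:global", mustSafe(server))
}

func userKey(server, userID string) string {
	return fmt.Sprintf("thv:rl:%s:user:%s", mustSafe(server), mustSafe(userID))
}

func userToolKey(server, userID, tool string) string {
	return fmt.Sprintf("thv:rl:%s:user:%s:tool:%s",
		mustSafe(server), mustSafe(userID), mustSafe(tool))
}

func main() {
	fmt.Println(globalKey("github"))                           // thv:rl:github:global
	fmt.Println(userKey("github", "alice"))                    // thv:rl:github:user:alice
	fmt.Println(userToolKey("github", "alice", "search_code")) // thv:rl:github:user:alice:tool:search_code
}
```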

In Redis, each bucket is a hash with two fields: token count and last refill timestamp. The check-and-decrement runs as a single atomic operation in Redis, so no race conditions across replicas.

Storage is O(1) per counter (two fields per hash). Total footprint is O(users × limits). So 500 users with 10 per-operation limits = 5,000 tiny hashes... basically nothing for Redis. Keys auto-expire when inactive too, so no garbage collection needed.

Compare that to a sliding window log, which stores a sorted set entry per request and is O(requests) per counter. That's the one to avoid.

Fixed window counters are simpler (single integer, also O(1)), but they have the classic boundary burst problem where a user can fire 2x their limit by timing requests across a window edge. Token bucket avoids that without adding real complexity.

A few things in the RFC that would need updating if you go with this:

  • The "Windowing" section should be reframed. Token bucket is continuous refill, not windowing. The current language about "approximate windowing" wouldn't be accurate anymore.
  • Burst behavior should be documented. An idle user can send a full burst of requestsPerWindow requests at once, since their bucket is full. That's a feature (it handles legitimate traffic spikes), but operators should understand it so they set capacity accordingly.
  • Retry-After becomes easy to compute. With token bucket you know exactly when the next token arrives (1 / refill_rate). The rejection behavior section is vague on this right now... worth noting that the algorithm gives you precise Retry-After values for free.

jerm-dro and others added 2 commits March 19, 2026 08:27
- Adopt token bucket algorithm per Ozz's review (replaces vague windowing section)
- Document burst behavior, Redis key structure, and precise Retry-After
- Clarify scope is Kubernetes-based deployments (Derek's question)
- Replace "operators" with "administrators" throughout (Derek's question)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jerm-dro
Contributor Author

@JAORMX Thank you for the suggestion. I've updated the algorithm / solution section to use the proposed algorithm.

@JAORMX
Contributor

JAORMX commented Mar 20, 2026

Hey @jerm-dro, two things I'd like to see before we move this forward:

The security considerations section is missing entirely, and it's a required one per the template. Rate limiting sits in the request hot-path and makes accept/reject decisions on every request... we need a threat model here. Think Redis as an attack surface, input validation on operation names used in Redis keys, audit logging for rejections (especially given agent exfiltration is a motivating threat), and Redis unavailability semantics (probably should be configurable like THV-0017's failure_policy).

Also, there's no alternatives considered section. At minimum, let's capture the algorithm discussion we already had. You originally proposed a sliding window approach and we went with token bucket instead. That decision and the reasoning behind it should live in the RFC so future readers have context.

Everything else is looking good. The token bucket write-up and additive limit semantics are solid.

Contributor

@yrobla yrobla left a comment


Reviewed RFC-0057. Two blockers, two suggestions, two questions, and one nitpick — comments inline below.

@jerm-dro
Contributor Author

@JAORMX Added both sections:

Security Considerations — includes a threat model table (Redis key injection, Redis compromise, identity spoofing, Redis exhaustion, unauthenticated DoS), plus Data Security, Input Validation, and Audit and Logging subsections. Cross-references THV-0035 for existing Redis security posture. Global rate limits are enforced post-authentication so unauthenticated requests never reach rate limit evaluation.

Alternatives Considered — documents the sliding window log → token bucket decision with rationale (burst handling, simpler storage, atomic operations).

Also addressed all of @yrobla's review comments in this push — renamed config fields to match the token bucket algorithm, clarified vMCP semantics, added missing Redis key patterns, documented Retry-After as best-effort, and resolved the Redis unavailability open question (fail open with observability).

@jerm-dro jerm-dro marked this pull request as ready for review March 30, 2026 20:12
@jerm-dro jerm-dro requested review from JAORMX and yrobla March 30, 2026 20:12
- Define request unit (tools/call, prompts/get, resources/read only)
- Rename requestsPerWindow/windowSeconds to capacity/refillPeriodSeconds
- Clarify vMCP perUser limits are per ingress identity across all backends
- Add missing global per-operation and per-user per-operation Redis key patterns
- Document Retry-After as best-effort lower bound
- Resolve Redis unavailability: fail open with structured log + metric
- Add Security Considerations section with threat model
- Add Alternatives Considered section (sliding window vs token bucket)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jerm-dro jerm-dro merged commit b664fb4 into main Apr 1, 2026
1 check passed
Contributor

@ChrisJBurns ChrisJBurns left a comment


Multi-Agent RFC Review

Reviewed from 6 specialist perspectives: Kubernetes Expert, Security Advisor, MCP Protocol Expert, SRE, ToolHive Expert, and Tech Lead Orchestrator — using the agent roles defined in toolhive/.claude/agents.

Overall Verdict: Needs Major Revision

The RFC identifies a real, important problem and makes sound top-level architectural choices (token bucket + Redis + middleware). However, compared to peer RFCs in this repo (THV-0035, THV-0017, THV-0053), it reads as an outline rather than a complete design. Multiple mandatory sections are missing, and several architectural details need resolution before implementation can begin.

The core idea is needed — the MCP spec itself mandates tool call rate limiting. The RFC needs a second pass to fill in the detailed design, CRD types, error format, observability plan, and the missing sections called out in the inline comments below.

**Scope**: This RFC targets Kubernetes-based deployments of ToolHive.

## Goals

Contributor


MCPRemoteProxy Not Addressed

[Kubernetes Expert] MCPRemoteProxy also proxies MCP traffic through the middleware chain and would benefit from the same rate limiting. The RFC should either explicitly include it or document why it's excluded.

Also — should in-memory rate limiting be available for non-K8s / single-replica deployments? The session storage system already has this dual-mode pattern (memory vs Redis). Scoping exclusively to K8s means CLI-mode users get no rate limiting at all.

Contributor Author


Good catch on MCPRemoteProxy — it shares the same MiddlewareFactory registry and transparent proxy chain as MCPServer, so integrating it should be straightforward. That said, the priorities for this RFC are VirtualMCPServer and MCPServer. I've added a Plan section noting that MCPRemoteProxy support is deferred but can be integrated as a follow-up since the middleware architecture is the same.

On in-memory rate limiting for CLI mode — same answer: deferred. The primary motivation is multi-user K8s environments where noisy-neighbor and exfiltration risks are most acute.

**Request unit**: One token corresponds to one incoming `tools/call`, `prompts/get`, or `resources/read` invocation at the proxy. Other MCP methods (`tools/list`, `prompts/list`, `resources/list`, etc.) are not rate-limited — the motivation is to cap excessive use of operations that do real work, not to restrict discovery or listing. For `VirtualMCPServer`, rate limiting applies to incoming traffic only; outgoing calls to backends are not independently limited.
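The request-unit rule above reduces to a simple predicate over the MCP method name — a sketch, with a hypothetical helper name:

```go
package main

import "fmt"

// isRateLimited reports whether an MCP method consumes a token under
// the request-unit definition above. Illustrative only — the method
// set matches the RFC text, but the function is not ToolHive code.
func isRateLimited(method string) bool {
	switch method {
	case "tools/call", "prompts/get", "resources/read":
		return true // operations that do real work
	default:
		return false // discovery, lifecycle, and notifications pass through
	}
}

func main() {
	fmt.Println(isRateLimited("tools/call")) // true
	fmt.Println(isRateLimited("tools/list")) // false
	fmt.Println(isRateLimited("initialize")) // false
}
```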

Rate limit counters are stored in Redis, which ToolHive already depends on. This ensures accurate enforcement across multiple replicas in horizontally-scaled deployments.

Contributor


Dual Middleware Chain Integration Undefined

[ToolHive Expert] MCPServer and VirtualMCPServer have structurally different middleware chains:

  • MCPServer: Uses the MiddlewareFactory registry in pkg/runner/middleware.go (PopulateMiddlewareConfigs() → GetSupportedMiddlewareFactories()). The chain is applied in reverse in pkg/transport/proxy/transparent/transparent_proxy.go.
  • VirtualMCPServer: Manually wraps http.Handler in pkg/vmcp/server/server.go's Handler() method. Does NOT use the MiddlewareFactory pattern.

The RFC must describe integration into both chains.

Also, the rate limiter needs access to the parsed MCP request (to know tools/call vs tools/list). The mcp-parser middleware stores ParsedMCPRequest in context, so it must run before the rate limiter. This ordering dependency should be explicit.

A full middleware chain diagram (like THV-0017 provides) showing exactly where rate limiting sits relative to auth, mcp-parser, tool-filter, webhooks, telemetry, authz, and audit would be very helpful here.

Contributor Author


Good points on the ordering dependency — the rate limiter depends on the parsed MCP request from mcp-parser to distinguish tools/call from tools/list, and on auth for user identity. I've updated the RFC to make this ordering constraint explicit and updated the diagram to show Auth → MCP Parser → RateLimit → [remaining chain].

The specific integration mechanics for each middleware chain (MiddlewareFactory vs manual handler wrapping) are implementation details — the architectural constraint is that rate limiting must run after both auth and mcp-parser regardless of chain structure, and I've captured that.


Rate limits are configured via a `rateLimiting` field on the server spec. The same structure applies to both `MCPServer` and `VirtualMCPServer`.

#### Server-Level Limits
Contributor


Method Coverage Gaps

[MCP Protocol Expert] The selected methods (tools/call, prompts/get, resources/read) are correct — the MCP spec's own Security Considerations explicitly states "Servers MUST rate limit tool invocations." Good.

However, two issues:

  1. completion/complete is missing. The MCP spec says "Servers SHOULD rate limit completion requests" and "Clients SHOULD debounce rapid completion requests." Either include it or explicitly document why it's excluded.

  2. The RFC should enumerate methods that are explicitly NEVER rate-limited, with rationale:

    • initialize / initialized — blocking these prevents session establishment entirely
    • ping — spec says receivers MUST respond promptly; rate-limiting pings causes false connection-death detection
    • All notifications/* — fire-and-forget, no response to reject
    • All server-to-client methods (sampling/createMessage, elicitation/create)
    • resources/subscribe / resources/unsubscribe — worth calling out explicitly since unbounded subscriptions could be a resource exhaustion vector even if not rate-limited

Also: how does this interact with batch JSON-RPC requests? A batch containing both tools/call and tools/list — is the rate limit applied per-call-in-batch or per-batch?

Contributor Author


  1. completion/complete: Acknowledged, but keeping this out of scope for now. The primary risk vectors are tools/call, prompts/get, and resources/read. Completions can be added in a follow-up if needed. Added to Non-Goals.

  2. Explicit exclusion list: Lifecycle and discovery methods (initialize, ping, notifications, list methods) are not rate-limited as they don't perform substantive work. Added a one-liner to the RFC.

  3. Batch JSON-RPC: MCP transports don't use JSON-RPC batching in practice, but we should verify this isn't a bypass vector. Added a note to include an acceptance test confirming batch requests cannot circumvent rate limits.

#### Server-Level Limits

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
```
Contributor


Redis Connection Strategy Unspecified

[ToolHive Expert, Kubernetes Expert] The RFC doesn't specify whether rate limiting creates its own Redis client, shares the session storage connection, or shares the auth server connection (THV-0035).

ToolHive currently has two Redis integration points:

  • Auth Server (THV-0035): redis.NewFailoverClient with Sentinel-only, ACL user auth
  • Session Storage: pkg/transport/session/redis_config.go, supports both standalone and Sentinel

Creating a separate connection pool would be wasteful. Recommendation: reuse the existing SessionStorageConfig Redis connection and validate at reconciliation time that Redis-backed session storage is configured when rateLimiting is enabled.

Also: rate limiting is a write-heavy workload (every request does read-modify-write via Lua). Auth storage is read-heavy. If sharing Redis, this workload difference should be acknowledged. Should a dedicated Redis instance be recommended for high-throughput deployments?

Contributor Author


Good question. The intent is to reuse the existing session storage Redis connection — rate limiting will validate at reconciliation time that Redis-backed session storage is configured. The write-heavy workload concern is valid but the actual volume is modest — one Lua EVAL per rate-limited request on a tiny hash. For most deployments this is negligible alongside session storage. We'll revisit a dedicated Redis option if workload separation is justified by real usage patterns.

Updated the RFC to state that rate limiting reuses the session storage Redis connection and that Redis-backed session storage is a prerequisite.


**Validation**: Per-user rate limits require authentication to be enabled. If `perUser` limits are configured with anonymous inbound auth, the server will raise an error at startup.

#### Per-Operation Limits
Contributor


Use Duration Type Instead of Integer Seconds

[Kubernetes Expert] refillPeriodSeconds: 3600 is less readable than refillPeriod: "1h". The project already has a config.Duration type in pkg/vmcp/config/config.go that serializes as Go duration strings, and Kubernetes conventions prefer metav1.Duration for API types.

Replace refillPeriodSeconds (int) with refillPeriod (Duration) for consistency.

Also: capacity is ambiguous in Kubernetes API context (could be confused with storage/resource capacity). Consider renaming to maxTokens or bucketSize. Either way, add +kubebuilder:validation:Minimum=1 markers.

Contributor Author


Agree on the duration type — refillPeriod: "1h" is more readable and consistent with existing ToolHive conventions. Renamed throughout: refillPeriodSeconds → refillPeriod (Duration), capacity → maxTokens. All YAML examples updated to use duration strings. Added a note that both fields require minimum-value validation.
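For illustration, the renamed fields would read roughly like this (field names per this thread; the surrounding structure is schematic, not the final CRD):

```yaml
rateLimiting:
  global:
    maxTokens: 100       # burst capacity (was: capacity)
    refillPeriod: "1m"   # was: refillPeriodSeconds: 60
  perUser:
    maxTokens: 20
    refillPeriod: "1m"
```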

| Rate limit bypass via identity spoofing | Low | Medium | Rate limiting relies on upstream auth middleware. It is only as strong as the authentication layer. |
| Denial of service via Redis exhaustion | Low | Medium | Keys auto-expire when inactive. Storage is O(1) per counter, bounding memory usage. |
| Unauthenticated DoS | Medium | High | Rate limiting sits after the auth middleware, so unauthenticated requests are rejected before reaching rate limit evaluation. Global limits only count authenticated traffic. |

Contributor


JSON-RPC Error Response Must Be Specified

[MCP Protocol Expert] "MCP-appropriate error response" is undefined. MCP uses JSON-RPC 2.0 — the RFC must specify:

  1. Error code: Use a code in the -32000 to -32099 range (reserved for implementation-defined server errors). MCP already uses -32002 and -32042. Suggest -32029. Do NOT use -32603 (internal error) — a rate limit is a deliberate policy rejection, not an internal error.

  2. Response structure:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "error": {
    "code": -32029,
    "message": "Rate limit exceeded",
    "data": {
      "retryAfterSeconds": 0.6,
      "limitType": "perUser"
    }
  }
}
```

  3. Retry-After location: Must go in error.data, NOT HTTP headers — stdio and SSE transports have no HTTP layer. The error.data field is the canonical location regardless of transport.

  4. Protocol error vs tool error: This is a protocol-level rejection (the tool was never invoked), so it must be a JSON-RPC error response, NOT a tool result with isError: true. This distinction is critical for MCP clients.

  5. HTTP 429: For Streamable HTTP transport, also return HTTP 429 with a Retry-After header as a supplementary transport-level signal (helps proxies/LBs), but the JSON-RPC error is authoritative.

The IETF RFC 6585 reference is only relevant for the HTTP transport layer — consider also referencing JSON-RPC 2.0 spec for error handling.

Contributor Author


Agreed — this defines the client-facing contract. Replaced the vague "MCP-appropriate error response" with a concrete Rejection Response Format section specifying:

  • JSON-RPC error code -32029 (implementation-defined server error range)
  • Retry-After in error.data (transport-agnostic, works across stdio/SSE/Streamable HTTP)
  • Explicit distinction: this is a protocol-level rejection (error), not a tool result with isError: true
  • HTTP 429 with Retry-After header as supplementary signal for Streamable HTTP
  • Example JSON-RPC error response included

Also added JSON-RPC 2.0 spec to references.


Rate-limit rejections are logged with user identity, operation name, and which limit was hit (global, per-user, per-operation). Token counts and refill timestamps are not included in log output.

## Alternatives Considered
Contributor


TLS Contradiction with THV-0035

[Security Advisor] This row says "Require Redis authentication and TLS" but THV-0035 explicitly lists TLS as a non-goal: "TLS/mTLS configuration for Redis connections — future enhancement if needed." THV-0053 resolved this with "service mesh only" for TLS.

Either align with THV-0035's position ("no TLS, rely on service mesh") and state that explicitly, or require TLS — but don't contradict the existing architectural decision.

Also: the RFC should specify that Redis ACLs block MONITOR, SLOWLOG, CLIENT LIST, and DEBUG commands — user IDs appear in key names and could be exposed via these commands. Consider a separate ACL user for rate limiting with ~thv:rl:* key pattern restriction to limit blast radius if credentials are compromised.

Contributor Author


Good catch on the TLS contradiction. Updated to align with the existing Redis security posture — rate limiting shares the session storage Redis connection and follows the same security approach (authentication, network-level isolation).

No sensitive data is stored in Redis — only token counts and last-refill timestamps. User IDs appear in Redis key names but carry no additional PII.

### Input Validation

Contributor


Threat Model Has Gaps

[Security Advisor] Missing threats to add:

  1. TOCTOU race in Lua script (High): If implementation uses separate GET then DECR instead of single EVAL, concurrent requests bypass limits. Must mandate single atomic EVAL.

  2. Distributed clock skew (Medium): Replicas using local time.Now() instead of Redis server time get inconsistent refill rates.

  3. Cross-namespace key collision (Medium): Key schema omits namespace — servers with same name in different namespaces share counters.

  4. Slowloris / connection exhaustion (Medium): Rate limiting counts completed operations. An attacker holding open many slow connections ties up resources without triggering rate limits.

  5. Deliberate Redis DoS to disable rate limiting (Medium): If an attacker can make Redis unavailable, fail-open policy disables all rate limiting. The threat model mentions Redis exhaustion but frames it as storage, not as a strategy to disable rate limiting itself.

  6. User ID aliasing (Medium): If the IDP issues different sub claims for the same human (re-provisioned account, multiple registrations), one person consumes N× the per-user limit.

Also: the "Redis key injection" row should distinguish inherent vs residual risk (post-mitigation likelihood should be Low given the allowlist).

Contributor Author


Several of these are already addressed by other changes in this push:

  • TOCTOU race: Covered by the atomic Lua script design.
  • Clock skew: Covered by using redis.call('TIME').
  • Cross-namespace collision: Covered by adding namespace to key schema.

For the remaining items (slowloris, Redis DoS, user ID aliasing) — these are outside the scope of rate limiting and addressed by other layers. Added a "Scope and Limitations" note to Security Considerations acknowledging that rate limiting operates at the MCP method level; transport-level attacks, Redis availability, and IDP identity consistency are outside its scope.

- [IETF RFC 6585](https://tools.ietf.org/html/rfc6585) — HTTP 429 Too Many Requests status code

---

Contributor


Only One Alternative Considered

[Tech Lead] The RFC should evaluate at least one or two more alternatives:

  1. External rate limiting via Envoy/Istio — infrastructure-level, no application code change. Why is application-level preferred?
  2. Webhook-based rate limiting via THV-0017 — the dynamic webhook middleware RFC already mentions rate limiting as an external webhook use case. Why build it into core instead of leveraging the extension mechanism?
  3. Fixed window counters — simpler than token bucket, less precise. Why was this not considered?

Documenting why these were rejected strengthens the case for the chosen approach.

Contributor Author


Fair point. Added three more alternatives: Envoy/Istio (can't do MCP-aware enforcement), webhook-based via THV-0017 (adds latency on hot path), and fixed window counters (boundary burst problem).

|------|----------|----------|-------|
| 2026-03-18 | @jerm-dro | Draft | Initial submission |

### Implementation Tracking
Contributor


Missing RFC Sections

[Tech Lead] Compared to peer RFCs (THV-0035, THV-0017, THV-0053) and the RFC template, these mandatory or expected sections are absent:

  • Detailed Design: Go type definitions (CRD types with kubebuilder tags), middleware interface, Lua script, sequence diagrams
  • Testing Strategy: Unit (miniredis), integration (testcontainers), E2E (Chainsaw), performance benchmarks
  • Migration / Rollback Plan: What happens during upgrade? How to disable if problems arise?
  • Performance Impact: Latency budget for the Lua script on the hot path. Target? Fallback if Redis latency exceeds threshold?
  • Compatibility: Backward (existing CRDs without rateLimiting), forward (extensibility for future algorithms)
  • Implementation Plan: Phased task breakdown — suggested 5-6 PRs within the 400-line/10-file limit:
    1. Core rate limiter package + Lua script (~300 lines)
    2. Rate limit middleware + MCP parser integration (~250 lines)
    3. CRD type definitions + validation (~350 lines)
    4. Operator controller wiring (~200 lines)
    5. Integration/E2E tests (~300 lines)
  • Open Questions: Several remain unresolved (see other comments)
  • Config hot-reload semantics: When rateLimiting is updated on the CRD, do pods restart or hot-reload? What happens to existing bucket state in Redis if limits change?
  • Documentation plan: User docs, configuration examples, operational runbook

Recommend using THV-0035 and THV-0017 as reference points for the expected level of detail.

Contributor Author


Acknowledged. This is a smaller-scope RFC — rateLimiting is a new optional field with no migration concerns, and we have prior art for Redis testing patterns. The additional sections (testing strategy, Go types, sequence diagrams, benchmarks, hot-reload semantics, PR breakdown) are implementation details that will be addressed in the PRs.
