Configure rate limits on VirtualMCPServer PR B#5276
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5276 +/- ##
==========================================
+ Coverage 68.81% 68.84% +0.03%
==========================================
Files 627 628 +1
Lines 63594 63682 +88
==========================================
+ Hits 43762 43845 +83
- Misses 16583 16585 +2
- Partials 3249 3252 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
695a96c to
480152f
Compare
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
480152f to
619264d
Compare
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
619264d to
f3cf1e4
Compare
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
Summary
This PR wires the
VirtualMCPServerrate-limiting config added in PR A into the vMCP runtime so configured limits are actually enforced before requests reach aggregated backends.spec.config.rateLimitingfrom the vMCP runtime config intovmcpserver.Config.call_toolrequests are rate-limited using the innerarguments.tool_name.call_toolbehavior.perUser.maxTokens=1, then verifies the second authenticatedtools/callis rejected.Fixes: #4552
Type of change
Test plan
Changes
pkg/vmcp/cli/serve.goRateLimitingintovmcpserver.Config.pkg/vmcp/server/server.goNew(), and refactors MCP middleware composition.pkg/vmcp/server/ratelimit.gopkg/ratelimit/middleware.gopkg/vmcp/server/ratelimit_test.gotest/e2e/thv-operator/virtualmcp/virtualmcp_rate_limiting_test.gomanual-testing-ratelimit-vmcp.mdDoes this introduce a user-facing change?
Yes. Cluster admins can now configure
VirtualMCPServer.spec.config.rateLimitingand vMCP will enforce the configured shared, per-user, and per-tool limits fortools/callrequests at runtime.Implementation plan
Approved implementation plan
sessionStorageandrateLimitingconfig.tools/callforcall_toolto the innerarguments.tool_namevalue.Special notes for reviewers
tools/callparameters.kubectl port-forwardfor vMCP/OIDC access instead of assuming NodePorts are reachable onlocalhost, which makes it work reliably with local kind clusters and CI environments.Manual testing
If the cluster is not already running with local images:
1. PR A: Validate CRD Admission Rules
1.1. Reject rate limiting without Redis session storage
Expected: the API server rejects the object with:
1.2. Reject per-user rate limiting without OIDC auth
Expected: the API server rejects the object with:
2. PR A: Verify Runtime ConfigMap Serialization
Create Redis first and wait for it to accept traffic. This avoids a transient vMCP restart while Redis is still starting.
Create a valid shared-limit vMCP. This uses anonymous auth so it can be inspected without setting up OIDC manually.
Wait for resources:
Expected: the
VirtualMCPServerreachesReady. It may briefly showDegradedwhile the backend health check warms up.Inspect the generated vMCP ConfigMap:
Expected:
config.yamlcontains:This confirms PR A’s field survives CRD -> converter -> ConfigMap serialization.
3. PR B: Verify Runtime Enforcement
The most reliable manual runtime check is the focused E2E test. It deploys Redis, a parameterized OIDC server, a backend MCPServer, and a VirtualMCPServer with
perUser.maxTokens=1. Then it sends two authenticatedtools/callrequests from the same user and expects the second request to be rejected.Expected:
This confirms PR B:
config.rateLimiting.sessionStorage.tools/callis rejected with rate-limit error behavior.4. Optional: Inspect Runtime Logs
Expected log line:
If Redis cannot be reached at startup, vMCP should fail to build the handler and logs should show:
5. Cleanup
The focused E2E test creates timestamped resources and cleans them up automatically.
Large PR Justification
This is a new feature package with a large test suite, and it needs to land as one coherent phase.