Commit 9ddce6a

jerm-dro and claude authored
RFC: Horizontal Scaling for vMCP and Proxy Runner (#47)
* Add draft RFC for vMCP and proxyrunner horizontal scaling

  Introduces THV-XXXX covering background, problems, scope, high-level solution, and requirements for enabling safe horizontal scale-out of the vmcp and thv-proxyrunner components via externalized Redis session storage and session-aware routing.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address review feedback on vMCP horizontal scaling RFC

  - Fix Mermaid \n → <br/> in both diagrams
  - Update metadata layer description to include session IDs
  - Strengthen re-initialization language ("destructive" not "may not be safe")
  - Add current proxyrunner state context to §2.2
  - Fix stdio scaling description: about concurrency, not exclusivity
  - Add fungibility constraint note to §1.4 and §5.3 R-OP-1
  - Fix §3.1: single MCPServer backed by multiple proxyrunner replicas
  - Add vMCP scale-in to §3.1 in-scope
  - Update §3.2: proxyrunner scale-in only; proxyrunner:StatefulSet N:1 ratio
  - Add §3.3 Scaling Summary table
  - Update §4.1 diagram to show one:many proxyrunner→backend pods
  - Update vMCP session record to backends[] array with per-backend URLs/session IDs
  - Simplify proxyrunner session record to session→backend-pod mapping
  - Update §4.3 routing to reflect multi-backend session model
  - Add §4.6 proxyrunner value proposition note
  - Remove redundant R-PR-7

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address second round of review feedback

  - §1.1 diagram: use subgraphs to show logical MCPServer boundary (one MCPServer = one proxyrunner Deployment + its StatefulSet)
  - §1.4: replace vague "This constraint" with specific statement that a stdio backend couples itself to a specific proxyrunner process
  - §2.2: correct current-state description: controller already supports multiple proxyrunner replicas for sse/streamable-http transports; the problem is lack of session-aware routing, not lack of replica support
  - §3.2: correct proxyrunner:StatefulSet ratio: each replica manages its own StatefulSet (1:1), not a shared StatefulSet (N:1)
  - §3.3: update Scaling Summary table to reflect 1:1 replica:StatefulSet
  - §4.1: update architecture diagram to show per-replica StatefulSets
  - §4.2: proxyrunner session record now includes identity subject for session hijacking prevention (per session-scoped work THV-0038)
  - §5.5: add Security Requirements (R-SEC-1, R-SEC-2) for session hijacking prevention

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Correct proxyrunner StatefulSet model based on code review

  All replicas of a proxyrunner Deployment share a single StatefulSet: they converge on the same desired state via Kubernetes server-side apply (field manager: toolhive-container-manager), with no leader election. The previous edit assumed a 1:1 replica:StatefulSet ratio, which is incorrect.

  Updated sections:

  - §1.1: add explanation of shared StatefulSet and server-side apply mechanics; note stdio replica cap vs sse/streamable-http
  - §2.2: correct current-state description: replicas share one StatefulSet; the problem is missing session-to-pod routing
  - §3.2: correct ratio back to N:1 (N replicas, 1 StatefulSet)
  - §3.3: update Scaling Summary table accordingly
  - §4.1: revert architecture diagram to single shared StatefulSet subgraph with multiple pods

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Scope CRD replica fields and correct current-state description

  Neither MCPServer nor VirtualMCPServer CRDs have a replicas field; both Deployments and the StatefulSet are hardcoded to 1. Add this as a core deliverable: spec.replicas (proxyrunner/vMCP pod count) and spec.backendReplicas (StatefulSet pod count) for declarative scaling. Explicitly document the one-StatefulSet-per-MCPServer invariant.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address second round of review comments

  - §1.1 diagram: remove replica count labels from nodes
  - §3.1: add proxyrunner scale-in (non-stdio) to in scope
  - §3.2: note 1:1 StatefulSet ratio as future stdio scaling path
  - §3.2: clarify inter-proxyrunner routing is best-effort
  - §3.2: replace proxyrunner scale-in out-of-scope bullet with graceful drain and backend StatefulSet scale-in bullets
  - §3.3: update table to reflect proxyrunner scale-in is in scope
  - §4.1: simplify diagram (no individual pod nodes)
  - §5.1: remove R-VMCP-6 (vMCP pod DNS exposure)
  - §5.4: fix R-DEP-4 to focus on backend scale-in as disruptive

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* tweaks

* address comments

* self-review

* Add Required Changes section (§6) to RFC-51

  Catalogs 16 concrete code changes needed to implement horizontal scaling for vMCP and proxyrunner, organized by component:

  - CRD/operator changes (RC-1 through RC-5)
  - transport session layer (RC-6, RC-7)
  - vMCP session management (RC-8 through RC-10, RC-16)
  - proxyrunner routing (RC-11 through RC-13)
  - operational concerns (RC-14)
  - security (RC-15)

  Each change is mapped to requirements from §5 and documents the current state of the code.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Rename RFC to match PR number (THV-0051 → THV-0047) and set status to In Review

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 7c46970 commit 9ddce6a

1 file changed

Lines changed: 649 additions & 0 deletions