Skip to content

Commit ddf4273

Browse files
committed
fixes from review
1 parent 4f970e6 commit ddf4273

1 file changed

Lines changed: 135 additions & 72 deletions

File tree

rfcs/THV-0034-session-scoped-client-lifecycle.md renamed to rfcs/THV-0038-session-scoped-client-lifecycle.md

Lines changed: 135 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# RFC: Session-Scoped MCP Client Lifecycle Management
1+
# THV-0038: Session-Scoped MCP Client Lifecycle Management
22

33
- **Status**: Draft
44
- **Author(s)**: @yrobla, @jerm-dro
@@ -9,7 +9,7 @@
99

1010
## Summary
1111

12-
Move MCP backend client lifecycle from per-request creation/destruction to session-scoped management. Clients will be created once during session initialization, reused throughout the session lifetime, and closed during session cleanup. This simplifies the client pooling architecture and ensures consistent backend state preservation.
12+
Move MCP backend client lifecycle from per-request creation/destruction to session-scoped management. Clients will be created once during session initialization, reused throughout the session lifetime, and closed during session cleanup. This simplifies the architecture and ensures consistent backend state preservation.
1313

1414
## Problem Statement
1515

@@ -130,14 +130,17 @@ Sessions already exist and have defined lifetimes (TTL-based expiration). Aligni
130130

131131
**Extend AfterInitialize Hook**:
132132

133-
The `AfterInitialize` hook in the discovery middleware already performs capability discovery for each backend. Extend this to also create and initialize MCP clients:
133+
The `AfterInitialize` hook in the discovery middleware already performs capability discovery for each backend. Restructure this to create clients once and reuse them:
134134

135-
1. After capability discovery completes, iterate through all backends in the group
136-
2. For each backend, create a client using the backend target configuration
135+
1. Iterate through all backends in the group
136+
2. For each backend, create a client using the backend target configuration via `createAndInitializeClient()`
137137
3. Initialize the client immediately (MCP handshake)
138-
4. Store successful clients in a map keyed by workload ID
139-
5. For failed initializations: log warning and continue (partial initialization is acceptable)
140-
6. Store the complete client map in the session via `SetBackendClients()`
138+
4. Use this same client to perform capability discovery (call `ListCapabilities`)
139+
5. Store the initialized client in a map keyed by workload ID
140+
6. For failed initializations: log warning and continue (partial initialization is acceptable)
141+
7. Store the complete client map in the session via `SetBackendClients()`
142+
143+
**Key Improvement**: By reusing the discovery client as the session-scoped client, we avoid redundant connection setup and better meet the stated goal of reducing overhead.
141144

142145
**Error Handling**:
143146
- Individual client initialization failures don't fail session creation
@@ -164,18 +167,25 @@ The `AfterInitialize` hook in the discovery middleware already performs capabili
164167
- Called only during session setup, not per-request
165168

166169
**Refactor Request Methods**:
167-
- `CallTool`, `ReadResource`, `GetPrompt`: Replace client creation pattern with:
170+
- `CallTool`, `ReadResource`, `GetPrompt`, `ListCapabilities`: Replace client creation pattern with:
168171
- Call `getSessionClient()` to retrieve pre-initialized client
169172
- Remove `defer c.Close()` (client managed by session)
170173
- Remove `initializeClient()` call (already initialized)
171174
- Use client directly for the operation
172-
- `ListCapabilities`: Keep existing per-request pattern (only called during session init)
175+
- **Note**: During session initialization in the discovery middleware, `createAndInitializeClient()` is called once per backend, and that same client is used for capability discovery and registered as the session-scoped client, avoiding redundant connection setup
173176

174177
**Location**: `pkg/vmcp/client/client.go`
175178

176179
#### 4. Session Cleanup Integration
177180

178-
The transport layer's `LocalStorage` implementation already calls `Close()` on sessions before deletion (implemented in a previous PR). We leverage this existing mechanism to close backend clients.
181+
**Current State**: The `Session` interface does not currently have a `Close()` method. This RFC will add it.
182+
183+
**Required Changes**:
184+
1. Add `Close() error` method to the `Session` interface in `pkg/transport/session/manager.go`
185+
2. Implement `Close()` in `VMCPSession` to close all backend clients
186+
3. Update `LocalStorage` to call `session.Close()` before deletion in `Delete()` and `DeleteExpired()` methods
187+
188+
**Reference**: The storage layer was refactored to use a pluggable interface in [PR #1989](https://github.com/stacklok/toolhive/pull/1989), but session cleanup hooks were not added at that time.
179189

180190
**VMCPSession.Close() Implementation**:
181191
- Iterate through all clients in the `backendClients` map
@@ -185,12 +195,12 @@ The transport layer's `LocalStorage` implementation already calls `Close()` on s
185195

186196
**Cleanup Triggers**:
187197

188-
The existing session cleanup infrastructure ensures clients are closed when:
189-
- **TTL Expiration**: Session manager's cleanup worker calls `DeleteExpired()`, which calls `Close()` on expired sessions
190-
- **Explicit Deletion**: Manual session deletion via `Delete()` calls `Close()` before removing the session
191-
- **Manager Shutdown**: `Stop()` calls `Close()` on all remaining sessions
198+
By adding `Close()` to the `Session` interface and calling it from `LocalStorage`, clients will be closed when:
199+
- **TTL Expiration**: Session manager's cleanup worker calls `DeleteExpired()`, which will call `session.Close()` on expired sessions
200+
- **Explicit Deletion**: Manual session deletion via `Delete()` will call `session.Close()` before removing the session
201+
- **Manager Shutdown**: `Manager.Stop()` will call `Close()` on all remaining sessions
192202

193-
No changes needed to the transport layer - it already has the hooks in place.
203+
The session cleanup infrastructure already exists (from [PR #1989](https://github.com/stacklok/toolhive/pull/1989)); we're adding the `Close()` hook to integrate with it.
194204

195205
#### 5. Error Handling
196206

@@ -201,14 +211,14 @@ No changes needed to the transport layer - it already has the hooks in place.
201211

202212
**Connection Failures During Tool Calls**:
203213
- Return error to client (existing behavior)
204-
- Health monitor marks backend unhealthy
205-
- Subsequent sessions won't attempt to initialize unhealthy backends
214+
- Health monitor marks backend unhealthy (existing behavior)
215+
- Client initialization attempts continue for unhealthy backends (health-based filtering is future work, see Phase 4)
206216

207217
**Client Already Closed**:
208218
- If session expired and client is closed, return clear error
209219
- Client should refresh session via `/initialize` endpoint
210220

211-
## Security
221+
## Security Considerations
212222

213223
### Threat Model
214224

@@ -249,60 +259,64 @@ The only difference is **when** clients are created (session init vs first use),
249259

250260
## Alternatives Considered
251261

252-
### Alternative 1: Keep Pooling but Decouple from httpBackendClient
262+
### Alternative 1: Keep Per-Request Pattern with Connection Pooling
253263

254-
**Approach**: Make `pooledBackendClient` accept an interface instead of embedding `*httpBackendClient`.
264+
**Approach**: Continue creating/closing clients per request but add a connection pool underneath to reuse TCP connections.
255265

256266
**Pros**:
257-
- Less refactoring required
258-
- Maintains lazy initialization pattern
267+
- Minimal changes to existing code
268+
- Reduces TCP handshake overhead
259269

260270
**Cons**:
261-
- Still has pooling complexity
262-
- Doesn't address first-call latency
263-
- Doesn't eliminate redundant initialization logic
271+
- Doesn't address MCP protocol initialization overhead (still happens per request)
272+
- Doesn't solve state preservation problem (each request still gets fresh MCP client)
273+
- Authentication still resolved and validated per request
274+
- Adds complexity at wrong layer
264275

265-
**Decision**: Rejected. Decoupling addresses tight coupling but not the fundamental architectural issue.
276+
**Decision**: Rejected. This addresses symptoms but not the root cause.
266277

267-
### Alternative 2: Lazy Pool Initialization with Eager Client Creation
278+
### Alternative 2: Lazy Client Creation on First Use
268279

269-
**Approach**: Keep pool but create all clients immediately when pool is first accessed.
280+
**Approach**: Create clients on first tool call to a backend, then store in session for reuse.
270281

271282
**Pros**:
272-
- Minimal code changes
273-
- Backward compatible with existing pool interface
283+
- Avoids creating clients for unused backends
284+
- Delays initialization until needed
274285

275286
**Cons**:
276-
- Pool abstraction still adds unnecessary indirection
277-
- First tool call per session still incurs initialization cost
278-
- Doesn't simplify architecture
287+
- First tool call to each backend still has initialization latency
288+
- More complex state management (need to track which clients exist)
289+
- Doesn't leverage existing capability discovery phase
290+
- Inconsistent performance (first vs subsequent calls)
279291

280-
**Decision**: Rejected. This is a middle ground that retains most of the complexity.
292+
**Decision**: Rejected. Complexity outweighs benefits. Capability discovery already knows which backends exist.
281293

282-
### Alternative 3: Global Client Pool Across Sessions
294+
### Alternative 3: Global Client Cache Across Sessions
283295

284-
**Approach**: Share clients across sessions to amortize initialization cost.
296+
**Approach**: Share clients across all sessions using a global cache, keyed by backend + auth identity.
285297

286298
**Pros**:
287299
- Maximum connection reuse
288-
- Lowest per-session overhead
300+
- Lowest initialization overhead
289301

290302
**Cons**:
291303
- **State Pollution**: Backend state (Playwright contexts, DB transactions) would leak across sessions
292-
- **Security Risk**: Potential for cross-session data exposure
293-
- **Complexity**: Requires complex client reset/sanitization logic
304+
- **Security Risk**: Potential for cross-session data exposure if keying is incorrect
305+
- **Complexity**: Requires complex cache eviction, client sanitization, and identity tracking
306+
- **Violates Session Isolation**: Sessions should be independent
294307

295-
**Decision**: Rejected. Session isolation is a core requirement for vMCP.
308+
**Decision**: Rejected. Session isolation is a core requirement for vMCP. The security and complexity risks outweigh performance gains.
296309

297-
## Backward Compatibility
310+
## Compatibility
298311

299-
### Breaking Changes
312+
### Backward Compatibility
300313

301-
**None for External APIs**: The `/vmcp/v1/*` HTTP API remains unchanged. Clients see identical behavior.
314+
**External APIs**: No breaking changes. The `/vmcp/v1/*` HTTP API remains unchanged. Clients see identical behavior.
302315

303-
**Internal API Changes**:
316+
**Internal APIs**:
304317
- `BackendClient` interface: No signature changes, but behavior changes from "create on demand" to "retrieve from session"
305-
- Components depending on `pooledBackendClient` or `BackendClientPool` types will need updates
318+
- `httpBackendClient` constructor: Requires session manager parameter (added field)
319+
- Internal refactoring only - no downstream impact on other packages
306320

307321
### Forward Compatibility
308322

@@ -314,31 +328,43 @@ Future enhancements are easier with this design:
314328

315329
## Implementation
316330

317-
### Phase 1: Session Client Storage
331+
### Phase 1: Session Client Storage and Cleanup Hooks
332+
333+
Add client storage capabilities and cleanup infrastructure:
318334

319-
Add client storage capabilities to VMCPSession:
335+
**Session Interface Changes**:
336+
- Add `Close() error` method to `Session` interface in `pkg/transport/session/manager.go`
337+
- Update `LocalStorage.Delete()` to call `session.Close()` before deletion
338+
- Update `LocalStorage.DeleteExpired()` to call `session.Close()` on each expired session before deletion
339+
- Update `Manager.Stop()` to call `Close()` on all remaining sessions
320340

341+
**VMCPSession Implementation**:
321342
- Add `backendClients map[string]*client.Client` field to `VMCPSession` struct
322343
- Implement `GetBackendClient(workloadID string)` method for retrieving clients
323344
- Implement `SetBackendClients(clients map[string]*client.Client)` for storing client map
324-
- Implement `Close()` method to iterate and close all clients
345+
- Implement `Close()` method to iterate through all clients, close them, and collect errors
325346

326347
**Files to modify**:
327-
- `pkg/vmcp/session/vmcp_session.go`
348+
- `pkg/transport/session/manager.go` (add Close() to interface)
349+
- `pkg/transport/session/storage_local.go` (call session.Close() before deletion)
350+
- `pkg/vmcp/session/vmcp_session.go` (implement Close() and client storage)
328351

329352
**Testing**:
330353
- Unit tests for client storage/retrieval
331354
- Unit tests for Close() error handling
355+
- Tests verifying LocalStorage calls session.Close() before deletion
356+
- Tests for cleanup on TTL expiration and explicit deletion
332357

333358
### Phase 2: Client Initialization at Session Creation
334359

335-
Extend the discovery middleware to initialize clients during session setup:
360+
Restructure the discovery middleware to create clients once and reuse them:
336361

337362
- Modify `AfterInitialize` hook in discovery middleware
338-
- After capability discovery, create clients for each backend
339-
- Initialize each client immediately (MCP handshake)
340-
- Store successful clients in session via `SetBackendClients()`
363+
- For each backend, create and initialize client via `createAndInitializeClient()`
364+
- Use the same client to perform capability discovery (call `ListCapabilities`)
365+
- Store the initialized client in session via `SetBackendClients()`
341366
- Log warnings for failed initializations but continue (partial initialization acceptable)
367+
- **Key**: Avoids redundant connection setup by reusing discovery client as session client
342368

343369
**Files to modify**:
344370
- `pkg/vmcp/server/middleware/discovery/discovery.go`
@@ -355,12 +381,10 @@ Modify httpBackendClient to retrieve clients from session instead of creating pe
355381
- Add `sessionManager` field to `httpBackendClient` struct
356382
- Pass session manager to `NewHTTPBackendClient()` constructor
357383
- Create `getSessionClient(ctx, workloadID)` helper method
358-
- Create `createAndInitializeClient(ctx, target)` method for session init use
359-
- Refactor `CallTool` to use `getSessionClient()` instead of `clientFactory()`
360-
- Refactor `ReadResource` to use `getSessionClient()`
361-
- Refactor `GetPrompt` to use `getSessionClient()`
384+
- Create `createAndInitializeClient(ctx, target)` method for session init use (called by discovery middleware)
385+
- Refactor `CallTool`, `ReadResource`, `GetPrompt`, `ListCapabilities` to use `getSessionClient()` instead of `clientFactory()`
362386
- Remove `defer c.Close()` calls from all methods
363-
- Remove `initializeClient()` calls from all methods (except `ListCapabilities`)
387+
- Remove `initializeClient()` calls from all methods
364388

365389
**Files to modify**:
366390
- `pkg/vmcp/client/client.go`
@@ -391,7 +415,7 @@ Add observability for client lifecycle:
391415

392416
### Dependencies
393417

394-
- Existing transport session cleanup infrastructure (sessions already call `Close()` on expiration)
418+
- Existing transport session cleanup infrastructure from [PR #1989](https://github.com/stacklok/toolhive/pull/1989) (provides deletion hooks where we'll add `Close()` calls)
395419
- Existing discovery middleware (`AfterInitialize` hook)
396420
- Existing health monitoring system
397421

@@ -415,18 +439,57 @@ Add observability for client lifecycle:
415439
- Client resource cleanup on TTL expiration
416440
- High-throughput scenarios verify no connection leaks
417441

418-
## Review History
442+
## Documentation
443+
444+
**Architecture Documentation**:
445+
- Update `docs/arch/vmcp-architecture.md` with session-scoped client lifecycle diagrams
446+
- Document client initialization during session setup phase
447+
- Add sequence diagrams showing client creation timing
448+
449+
**Code Documentation**:
450+
- Add package-level comments explaining session-client relationship
451+
- Document `VMCPSession.GetBackendClient()` and `SetBackendClients()` methods
452+
- Add examples of client retrieval in method documentation
453+
454+
**Operational Guides**:
455+
- Update troubleshooting guide with client lifecycle debugging tips
456+
- Document metrics and logs for monitoring client initialization
457+
- Add runbook for investigating client initialization failures
458+
459+
## Open Questions
460+
461+
All major design questions have been resolved during RFC development:
462+
463+
1. ~~Should clients be created eagerly or lazily?~~**Resolved: Eager initialization during session setup alongside capability discovery**
464+
2. ~~What happens if backend client initialization fails during session creation?~~**Resolved: Log warning, mark backend unhealthy, continue with partial initialization**
465+
3. ~~Should we share clients across sessions for efficiency?~~**Resolved: No, session isolation is more important than marginal efficiency gains**
466+
4. ~~How should client closure be triggered?~~**Resolved: Leverage existing session cleanup infrastructure (TTL expiration, explicit deletion, server shutdown)**
467+
468+
## References
469+
470+
- [MCP Specification](https://modelcontextprotocol.io/specification/2025-06-18)
471+
- [ToolHive Issue #3062](https://github.com/stacklok/toolhive/issues/3062)
472+
- [mark3labs/mcp-go SDK](https://github.com/mark3labs/mcp-go) - Backend client implementation
473+
- [vMCP Architecture Documentation](../docs/arch/vmcp-architecture.md)
474+
- Related RFCs:
475+
- Session management infrastructure
476+
- Health monitoring system
477+
- Backend discovery mechanism
478+
479+
---
480+
481+
## RFC Lifecycle
482+
483+
<!-- This section is maintained by RFC reviewers -->
484+
485+
### Review History
419486

420-
| Date | Reviewer | Status | Comments |
421-
|------|----------|--------|----------|
422-
| 2026-02-04 | @yrobla | Author | Initial draft |
423-
| 2026-02-04 | @jerm-dro | Contributor | Original proposal |
487+
| Date | Reviewer | Decision | Notes |
488+
|------|----------|----------|-------|
489+
| 2026-02-04 | @yrobla, @jerm-dro | Draft | Initial submission |
424490

425-
## Implementation Tracking
491+
### Implementation Tracking
426492

427-
| Repository | PR | Status | Notes |
428-
|------------|-------|--------|-------|
429-
| toolhive | #TBD | Not Started | Phase 1: Session client storage |
430-
| toolhive | #TBD | Not Started | Phase 2: Client initialization at session creation |
431-
| toolhive | #TBD | Not Started | Phase 3: Refactor backend client methods |
432-
| toolhive | #TBD | Not Started | Phase 4: Observability & monitoring |
493+
| Repository | PR | Status |
494+
|------------|-----|--------|
495+
| toolhive | - | Pending |

0 commit comments

Comments
 (0)