RFC: vMCP session binding by identity tuple#73
Conversation
The current HMAC-of-bearer-token binding rejects legitimate OAuth refreshes — the access-token bytes change on each refresh, so users get logged out once per access-token TTL. Pin the binding to the (iss, sub) identity tuple instead, so the invariant matches what is actually stable across a session. Drops the per-deployment HMAC secret and the operator-side plumbing that distributed it. Closes a prior LocalUserMiddleware gap where every local user fell into the anonymous equivalence class. Supersedes the session-binding portion of THV-0038; preserves the cross-pod persistence design of THV-0047 with the new key. Tracks toolhive#5306. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
560164b to
1539721
Compare
tgrunnagle
left a comment
There was a problem hiding this comment.
LGTM, one non-blocking question
|
|
||
| - **Refresh invariance**: OIDC Core §2 requires `sub` to be stable per End-User within an issuer; RFC 6749 refresh does not change the principal. Bindings survive refresh. | ||
| - **Global uniqueness**: `(iss, sub)` is globally unique under OIDC Core §2 — `sub` is only locally unique within `iss`, but the `iss\x00sub` concatenation lifts that to global. | ||
| - **Token exchange (RFC 8693)**: The binding is at the front-door token, not at any upstream-exchanged token. RFC 8693 §4.1 impersonation preserves `sub`; delegation adds `act` but `sub` still describes the principal the token is *about*. No interaction with the binding. |
There was a problem hiding this comment.
Question: for future proofing, should we also bind the session to an optional act claim? If different agents are accessing the vMCP with a delegated token for the same user we should differentiate them.
If so, do we need to consider nested delegation?
tgrunnagle
left a comment
There was a problem hiding this comment.
Multi-Agent Consensus Review — HIGH-severity findings (non-blocking)
Agents consulted: security & threat model, OAuth/OIDC, MCP protocol, migration & ops, API design, RFC structure, general code quality.
The RFC's approach is sound: it correctly identifies the bug in toolhive#5306 as binding the wrong invariant, correctly chooses (iss, sub) over the rejected two-tier tsid + iss/sub design, correctly rejects the hybrid migration that would re-enable the hijack attack, and correctly preserves THV-0047's cross-pod persistence model with only the persisted value changed. The threat-model framing ("Both defend / New defends, old did not / Old defended, new does not") is unusually clear, and the RFC is faithful to the panel decisions documented in the linked issue.
A multi-agent review surfaced two HIGH-severity items worth resolving before implementation begins. These are not merge blockers — both are design questions that can be answered in this RFC iteration, and the rest of the design stands. The detailed inline comments below describe each issue and a concrete recommendation. A larger set of MEDIUM-severity items (global-uniqueness assumption, restore-branch-3 error class, UnauthenticatedSentinel rationale, refresh-invariance citation cleanup, session-affinity portability, anonymous-mode production guardrail, missing template sections, etc.) was identified locally but is not posted here — happy to share that fuller set if useful.
Generated with Claude Code
|
|
||
| ### AnonymousMiddleware scoping | ||
|
|
||
| `AnonymousMiddleware` emits identical `(iss="toolhive-local", sub="anonymous")` for every request. Consequently every anonymous user collapses into one binding equivalence class — per-identity hijack prevention is impossible *by construction* in anonymous deployments. This is not a regression: the old token-hash scheme produced the empty-string sentinel for the same case and exhibited identical behaviour. We surface the limitation at startup with a `WARN` and document the mode as dev-only. The session decorator still defends against the *upgrade attack* — a token presented to an anonymous session is rejected. |
There was a problem hiding this comment.
HIGH — Local user named "anonymous" collides exactly with the anonymous-mode sentinel binding (Consensus 9/10) — raised by: oauth-oidc, security, mcp-protocol, migration-ops
AnonymousMiddleware emits (iss="toolhive-local", sub="anonymous") (this line). The LocalUserMiddleware fix at line 161 says the middleware "populates with the username as sub" — so a local user literally named anonymous produces an identical (iss="toolhive-local", sub="anonymous") binding. A Redis dump containing toolhive-local\x00anonymous becomes ambiguous (anonymous-mode session vs local user "anonymous"), and the LocalUserMiddleware fix the RFC claims as a "strict improvement" silently regresses for that one username.
Recommendation: Either reserve anonymous as a forbidden local username in LocalUserMiddleware's validation (with a clear startup error), or use distinct iss values for the two modes — e.g. iss="toolhive-anonymous" for anonymous mode vs iss="toolhive-local" for LocalUserMiddleware. Option 2 removes the collision by construction. Document the chosen approach in the Fail-closed defaults section.
|
|
||
| ## Compatibility | ||
|
|
||
| ### Backward compatibility |
There was a problem hiding this comment.
HIGH — Rollback behaviour for new-format sessions hitting the old image is undefined (Consensus 8/10) — raised by: migration-ops
The Backward Compatibility section documents forward migration (legacy sessions hitting new code → invalidate-on-read), but is silent on the reverse: sessions written with only MetadataKeyIdentityBinding and no legacy keys are read by the old image, which looks for MetadataKeyTokenHash / MetadataKeyTokenSalt. Depending on how legacy RestoreHijackPrevention handles "metadata present but expected keys absent", this may surface as HTTP 500, or — worse — as a session restored without any hijack-prevention decorator applied (silent security regression). Operators considering Phase-1 rollback need an explicit behavioural guarantee.
Recommendation: Add a Rollback subsection that (a) states exactly how the prior image behaves when it encounters a session with only the new key, (b) either commits to a defensive write of a sentinel legacy key during Phase 1 or explicitly documents that rollback requires waiting out the session TTL, and (c) adds a rollback test mirroring the rolling-deploy race test (new-image-written session read by old factory).
Summary
(iss, sub)identity tuple so legitimate OAuth refreshes no longer terminate sessions (users currently get logged out once per access-token TTL).LocalUserMiddlewaregap where every local user fell into the anonymous equivalence class.Tracks toolhive#5306.
Review focus
(iss, sub)global uniqueness, RFC 8693 interactions).🤖 Generated with Claude Code