Skip to content

RFC: vMCP session binding by identity tuple#73

Open
jhrozek wants to merge 1 commit into
mainfrom
session-identity-binding
Open

RFC: vMCP session binding by identity tuple#73
jhrozek wants to merge 1 commit into
mainfrom
session-identity-binding

Conversation

@jhrozek
Copy link
Copy Markdown
Contributor

@jhrozek jhrozek commented May 21, 2026

Summary

  • Replaces the vMCP HMAC-of-bearer-token session binding with a stable (iss, sub) identity tuple so legitimate OAuth refreshes no longer terminate sessions (users currently get logged out once per access-token TTL).
  • Drops the per-deployment HMAC secret and the operator-side plumbing that distributed it.
  • Closes a prior LocalUserMiddleware gap where every local user fell into the anonymous equivalence class.
  • Supersedes the session-binding portion of THV-0038; preserves the cross-pod persistence design of THV-0047 with the new key.

Tracks toolhive#5306.

Review focus

  • Threat model in §Security Considerations — particularly the HMAC → plaintext-PII-at-rest trade-off and the operator mitigations.
  • OAuth/OIDC correctness (refresh invariance, (iss, sub) global uniqueness, RFC 8693 interactions).
  • Migration story (invalidate-on-read) and the rolling-deploy race note.

🤖 Generated with Claude Code

The current HMAC-of-bearer-token binding rejects legitimate OAuth
refreshes — the access-token bytes change on each refresh, so users
get logged out once per access-token TTL. Pin the binding to the
(iss, sub) identity tuple instead, so the invariant matches what
is actually stable across a session.

Drops the per-deployment HMAC secret and the operator-side plumbing
that distributed it. Closes a prior LocalUserMiddleware gap where
every local user fell into the anonymous equivalence class.

Supersedes the session-binding portion of THV-0038; preserves the
cross-pod persistence design of THV-0047 with the new key. Tracks
toolhive#5306.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jhrozek jhrozek force-pushed the session-identity-binding branch from 560164b to 1539721 Compare May 21, 2026 20:22
@jhrozek jhrozek requested review from jerm-dro and tgrunnagle May 21, 2026 20:33
Copy link
Copy Markdown
Contributor

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one non-blocking question


- **Refresh invariance**: OIDC Core §2 requires `sub` to be stable per End-User within an issuer; RFC 6749 refresh does not change the principal. Bindings survive refresh.
- **Global uniqueness**: `(iss, sub)` is globally unique under OIDC Core §2 — `sub` is only locally unique within `iss`, but the `iss\x00sub` concatenation lifts that to global.
- **Token exchange (RFC 8693)**: The binding is at the front-door token, not at any upstream-exchanged token. RFC 8693 §4.1 impersonation preserves `sub`; delegation adds `act` but `sub` still describes the principal the token is *about*. No interaction with the binding.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: for future proofing, should we also bind the session to an optional act claim? If different agents are accessing the vMCP with a delegated token for the same user we should differentiate them.

If so, do we need to consider nested delegation?

Copy link
Copy Markdown
Contributor

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Agent Consensus Review — HIGH-severity findings (non-blocking)

Agents consulted: security & threat model, OAuth/OIDC, MCP protocol, migration & ops, API design, RFC structure, general code quality.

The RFC's approach is sound: it correctly identifies the bug in toolhive#5306 as binding the wrong invariant, correctly chooses (iss, sub) over the rejected two-tier tsid + iss/sub design, correctly rejects the hybrid migration that would re-enable the hijack attack, and correctly preserves THV-0047's cross-pod persistence model with only the persisted value changed. The threat-model framing ("Both defend / New defends, old did not / Old defended, new does not") is unusually clear, and the RFC is faithful to the panel decisions documented in the linked issue.

A multi-agent review surfaced two HIGH-severity items worth resolving before implementation begins. These are not merge blockers — both are design questions that can be answered in this RFC iteration, and the rest of the design stands. The detailed inline comments below describe each issue and a concrete recommendation. A larger set of MEDIUM-severity items (global-uniqueness assumption, restore-branch-3 error class, UnauthenticatedSentinel rationale, refresh-invariance citation cleanup, session-affinity portability, anonymous-mode production guardrail, missing template sections, etc.) was identified locally but is not posted here — happy to share that fuller set if useful.


Generated with Claude Code


### AnonymousMiddleware scoping

`AnonymousMiddleware` emits identical `(iss="toolhive-local", sub="anonymous")` for every request. Consequently every anonymous user collapses into one binding equivalence class — per-identity hijack prevention is impossible *by construction* in anonymous deployments. This is not a regression: the old token-hash scheme produced the empty-string sentinel for the same case and exhibited identical behaviour. We surface the limitation at startup with a `WARN` and document the mode as dev-only. The session decorator still defends against the *upgrade attack* — a token presented to an anonymous session is rejected.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH — Local user named "anonymous" collides exactly with the anonymous-mode sentinel binding (Consensus 9/10) — raised by: oauth-oidc, security, mcp-protocol, migration-ops

AnonymousMiddleware emits (iss="toolhive-local", sub="anonymous") (this line). The LocalUserMiddleware fix at line 161 says the middleware "populates with the username as sub" — so a local user literally named anonymous produces an identical (iss="toolhive-local", sub="anonymous") binding. A Redis dump containing toolhive-local\x00anonymous becomes ambiguous (anonymous-mode session vs local user "anonymous"), and the LocalUserMiddleware fix the RFC claims as a "strict improvement" silently regresses for that one username.

Recommendation: Either reserve anonymous as a forbidden local username in LocalUserMiddleware's validation (with a clear startup error), or use distinct iss values for the two modes — e.g. iss="toolhive-anonymous" for anonymous mode vs iss="toolhive-local" for LocalUserMiddleware. Option 2 removes the collision by construction. Document the chosen approach in the Fail-closed defaults section.


## Compatibility

### Backward compatibility
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH — Rollback behaviour for new-format sessions hitting the old image is undefined (Consensus 8/10) — raised by: migration-ops

The Backward Compatibility section documents forward migration (legacy sessions hitting new code → invalidate-on-read), but is silent on the reverse: sessions written with only MetadataKeyIdentityBinding and no legacy keys are read by the old image, which looks for MetadataKeyTokenHash / MetadataKeyTokenSalt. Depending on how legacy RestoreHijackPrevention handles "metadata present but expected keys absent", this may surface as HTTP 500, or — worse — as a session restored without any hijack-prevention decorator applied (silent security regression). Operators considering Phase-1 rollback need an explicit behavioural guarantee.

Recommendation: Add a Rollback subsection that (a) states exactly how the prior image behaves when it encounters a session with only the new key, (b) either commits to a defensive write of a sentinel legacy key during Phase 1 or explicitly documents that rollback requires waiting out the session TTL, and (c) adds a rollback test mirroring the rolling-deploy race test (new-image-written session read by old factory).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants