2 changes: 1 addition & 1 deletion .github/upstream-projects.yaml
@@ -35,7 +35,7 @@ projects:

- id: toolhive
repo: stacklok/toolhive
version: v0.29.3
version: v0.30.0
# toolhive is a monorepo covering the CLI, the Kubernetes
# operator, and the vMCP gateway. It also introduces cross-
# cutting features that land in concepts/, integrations/,
7 changes: 7 additions & 0 deletions docs/toolhive/concepts/embedded-auth-server.mdx
@@ -122,6 +122,13 @@ MCP servers receive the upstream access token in the `Authorization: Bearer`
header. They don't need to implement custom authentication logic or manage
secrets.

If the backend MCP server is public and you only want client-side
authentication, set `disableUpstreamTokenInjection: true` on the embedded auth
server config. In that mode, the JWT is still validated, but the proxy strips the
client's credential headers (`Authorization`, `Cookie`, and
`Proxy-Authorization`) instead of swapping them for an upstream token, so the
backend receives an unauthenticated request.
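
The header stripping described above can be sketched roughly as follows. This is a minimal illustration, not ToolHive's implementation: `strip_credential_headers` and the plain-dictionary request model are assumptions made for the example; only the three header names come from the text.

```python
# Hypothetical illustration; only the header names come from the docs above.
CREDENTIAL_HEADERS = {"authorization", "cookie", "proxy-authorization"}


def strip_credential_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers without the client's credential headers.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CREDENTIAL_HEADERS
    }


# Example request as it might arrive from an authenticated client.
incoming = {
    "Authorization": "Bearer client-jwt",
    "Cookie": "session=abc",
    "Content-Type": "application/json",
}
# What the public backend would receive: no credentials at all.
forwarded = strip_credential_headers(incoming)
```

In this sketch the backend sees only `Content-Type`; nothing replaces the removed headers, which is the difference from the default token-swap behavior.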

## Automatic token refresh

Upstream access tokens expire independently of the ToolHive JWT lifespan. When
19 changes: 10 additions & 9 deletions docs/toolhive/guides-k8s/auth-k8s.mdx
@@ -535,15 +535,16 @@ kubectl apply -f embedded-auth-config.yaml

**Configuration reference:**

| Field | Description |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `issuer` | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs. |
| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation. |
| `hmacSecretRefs` | References to Secrets with symmetric keys for signing authorization codes and refresh tokens. |
| `tokenLifespans` | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
| `upstreamProviders` | Configuration for upstream identity providers. MCPServer and MCPRemoteProxy support one provider; VirtualMCPServer supports multiple providers for sequential authentication. |
| `baselineClientScopes` | Optional list of OAuth 2.0 scopes merged into every DCR-registered client's scope set. Use this when MCP clients register with a narrowed `scope` field but then request wider scopes at `/oauth/authorize`. See [Baseline scopes for DCR clients](../concepts/embedded-auth-server.mdx#baseline-scopes-for-dcr-clients). |
| `cimd` | Optional Client ID Metadata Document (CIMD) configuration. When `cimd.enabled` is `true`, the auth server accepts HTTPS URLs as `client_id` values and resolves them via CIMD, letting clients (for example, VS Code) authenticate without prior Dynamic Client Registration. See [Client ID Metadata Document (CIMD)](../concepts/embedded-auth-server.mdx#client-id-metadata-document-cimd). |
| Field | Description |
| ------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `issuer` | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs. |
| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation. |
| `hmacSecretRefs` | References to Secrets with symmetric keys for signing authorization codes and refresh tokens. |
| `tokenLifespans` | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
| `upstreamProviders` | Configuration for upstream identity providers. MCPServer and MCPRemoteProxy support one provider; VirtualMCPServer supports multiple providers for sequential authentication. |
| `baselineClientScopes` | Optional list of OAuth 2.0 scopes merged into every DCR-registered client's scope set. Use this when MCP clients register with a narrowed `scope` field but then request wider scopes at `/oauth/authorize`. See [Baseline scopes for DCR clients](../concepts/embedded-auth-server.mdx#baseline-scopes-for-dcr-clients). |
| `cimd` | Optional Client ID Metadata Document (CIMD) configuration. When `cimd.enabled` is `true`, the auth server accepts HTTPS URLs as `client_id` values and resolves them via CIMD, letting clients (for example, VS Code) authenticate without prior Dynamic Client Registration. See [Client ID Metadata Document (CIMD)](../concepts/embedded-auth-server.mdx#client-id-metadata-document-cimd). |
| `disableUpstreamTokenInjection` | Optional. When `true`, the embedded auth server authenticates clients normally but the proxy strips `Authorization`, `Cookie`, and `Proxy-Authorization` from forwarded requests instead of swapping the JWT for an upstream token. Use this for public backends (such as documentation servers) that you still want to gate behind client auth. Cannot be combined with `tokenExchange` or `awsSts` on the same workload. |

**Step 5: Create the MCPOIDCConfig and MCPServer resources**

31 changes: 16 additions & 15 deletions docs/toolhive/guides-k8s/redis-session-storage.mdx
@@ -15,11 +15,11 @@ different configuration:
pattern. See
[Embedded auth server session storage](#embedded-auth-server-session-storage).

- **MCPServer and VirtualMCPServer horizontal scaling** - shares MCP session
state across pod replicas so any pod can handle any request. Uses a standalone
Redis instance with a simple password. Session data is not persisted to disk.
If the Redis pod restarts, active sessions are lost and clients must
reconnect. See
- **MCPServer, MCPRemoteProxy, and VirtualMCPServer horizontal scaling** -
shares MCP session state across pod replicas so any pod can handle any
request. Uses a standalone Redis instance with a simple password. Session data
is not persisted to disk. If the Redis pod restarts, active sessions are lost
and clients must reconnect. See
[Horizontal scaling session storage](#horizontal-scaling-session-storage).

Redis is also required for [rate limiting](./rate-limiting.mdx), which stores
@@ -705,10 +705,11 @@ session storage is working correctly.

## Horizontal scaling session storage

When you run multiple replicas of an `MCPServer` proxy runner or a
`VirtualMCPServer`, MCP sessions must be shared across pods so that any replica
can handle any client request. ToolHive stores this session state in Redis using
a simple password. No ACL user configuration or Sentinel is required.
When you run multiple replicas of an `MCPServer` proxy runner, an
`MCPRemoteProxy`, or a `VirtualMCPServer`, MCP sessions must be shared across
pods so that any replica can handle any client request. ToolHive stores this
session state in Redis using a simple password. No ACL user configuration or
Sentinel is required.

### Deploy a standalone Redis instance

@@ -935,12 +936,12 @@ kubectl delete pod -n toolhive-system \
## Sharing a Redis instance

You can reuse the same Redis instance for embedded auth server sessions,
MCPServer scaling, and VirtualMCPServer scaling by using different `keyPrefix`
values per use case. If you share an instance, use the Redis Sentinel
StatefulSet from the [embedded auth server section](#deploy-redis-sentinel),
which has persistent storage. The standalone `Deployment` from the scaling
section is not suitable as a shared instance because it has no persistent
storage.
MCPServer scaling, MCPRemoteProxy scaling, and VirtualMCPServer scaling by using
different `keyPrefix` values per use case. If you share an instance, use the
Redis Sentinel StatefulSet from the
[embedded auth server section](#deploy-redis-sentinel), which has persistent
storage. The standalone `Deployment` from the scaling section is not suitable as
a shared instance because it has no persistent storage.

The embedded auth server uses `thv:auth:*` by default; set distinct prefixes for
your scaling workloads:
62 changes: 62 additions & 0 deletions docs/toolhive/guides-k8s/remote-mcp-proxy.mdx
@@ -626,6 +626,68 @@ curl -X POST http://localhost:8080/mcp \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
```

## Run multiple replicas for high availability

By default, an `MCPRemoteProxy` runs as a single proxy pod. To survive pod
restarts and absorb higher request volumes, set `spec.replicas` and configure
shared session storage so MCP sessions resolve across replicas.

```yaml title="analytics-proxy-ha.yaml"
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPRemoteProxy
metadata:
name: analytics-proxy
namespace: toolhive-system
spec:
remoteUrl: https://mcp.analytics.example.com
transport: streamable-http
# ... other config ...

# highlight-start
replicas: 3
sessionStorage:
provider: redis
address: redis.toolhive-system.svc.cluster.local:6379
db: 0
keyPrefix: analytics-proxy-sessions
passwordRef:
name: redis-password
key: password
sessionAffinity: None
# highlight-end
```

When `spec.replicas` is omitted, the operator does not write `replicas` on the
Deployment, so an HPA or other external controller can own scaling without the
operator overwriting the live count. Set `spec.replicas` to make the operator
authoritative.

`sessionStorage` configures a shared store for the proxy's session state.
Without it, each pod keeps sessions in memory only, so a request routed to a
different replica than the one that established the session fails with
`Session not found`. For multi-replica deployments, set `provider` to `redis`;
the `memory` provider stays pod-local.

`sessionAffinity` controls how the Service routes connections. The default
`ClientIP` value pins each client to a single pod, which is correct when you
have no Redis-backed session storage but defeats the load-balancing benefit of
multiple replicas. With Redis-backed `sessionStorage`, set
`sessionAffinity: None` so requests distribute across pods and sessions resolve
through the shared store.
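
The difference between pod-local and shared session state can be sketched with a toy model. Everything here is hypothetical: the `Replica` class stands in for a proxy pod and a plain dictionary stands in for Redis; it is not how the operator or proxy is implemented.

```python
class Replica:
    """Hypothetical proxy pod that looks sessions up in the store it is given."""

    def __init__(self, store: dict[str, str]) -> None:
        self.store = store

    def create_session(self, session_id: str, client: str) -> None:
        self.store[session_id] = client

    def handle(self, session_id: str) -> str:
        # A replica can only serve sessions it can find in its store.
        if session_id not in self.store:
            return "Session not found"
        return f"ok for {self.store[session_id]}"


# Pod-local memory: each replica has its own private store.
pod_a, pod_b = Replica({}), Replica({})
pod_a.create_session("s1", "client-1")
local_result = pod_b.handle("s1")

# Shared store (standing in for Redis): both replicas see the same sessions.
shared: dict[str, str] = {}
pod_c, pod_d = Replica(shared), Replica(shared)
pod_c.create_session("s1", "client-1")
shared_result = pod_d.handle("s1")
```

With private stores, the second pod cannot resolve a session the first one created, which is why `ClientIP` affinity is needed in that setup. With the shared store, any pod can answer, so `sessionAffinity: None` is safe.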

For end-to-end Redis deployment steps and Sentinel configuration, see
[Redis Sentinel session storage](./redis-session-storage.mdx).

:::warning

If `spec.replicas` is greater than 1 without Redis-backed `sessionStorage`, the
operator sets a `SessionStorageWarning` condition on the resource but still
applies the replica count. Pods start, but any request routed to a replica that
did not establish the session fails. Configure Redis before scaling past a
single replica.

:::

## Expose the proxy externally

To make the proxy accessible from outside the cluster, create an Ingress
41 changes: 41 additions & 0 deletions docs/toolhive/guides-vmcp/authentication.mdx
@@ -636,6 +636,47 @@ auth server. See
[Redis Sentinel session storage](../guides-k8s/redis-session-storage.mdx) for a
complete walkthrough.

### Skip upstream token injection for public backends

By default, the embedded auth server swaps the client's ToolHive JWT for the
matching upstream token before forwarding the request to a backend, using
[upstream token injection](#upstream-token-injection) or
[token exchange with upstream tokens](#token-exchange-with-upstream-tokens).
That assumes the backend itself requires authentication.

If the backend MCP server is public (for example, a documentation server with no
auth of its own) but you still want clients to authenticate to vMCP, set
`disableUpstreamTokenInjection: true` on `authServerConfig`. The embedded auth
server still runs the OAuth flow for clients, but the proxy then **strips** the
client's credential headers (`Authorization`, `Cookie`, and
`Proxy-Authorization`) after the JWT is validated, so the backend receives an
unauthenticated request.

```yaml title="VirtualMCPServer resource"
spec:
authServerConfig:
issuer: https://auth.example.com
# highlight-next-line
disableUpstreamTokenInjection: true
upstreamProviders:
- name: github
# ...
```

Use `outgoingAuth` with `headerInjection` (see
[Static header injection](#static-header-injection)) if the backend still needs
a static credential such as an API key.

:::warning[Incompatible with token exchange and AWS STS]

Header stripping happens after JWT validation but before the token-exchange and
AWS STS middlewares would normally attach a new credential. Combining
`disableUpstreamTokenInjection: true` with `tokenExchange` or `awsSts` on the
same vMCP causes the embedded auth server to fail validation at startup. Use
this flag only when the backend should remain unauthenticated.

:::
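
The incompatibility rule in the warning above amounts to a simple startup check. The sketch below is an assumption-laden illustration: `validate_auth_config` and its boolean parameters are hypothetical names, and the error text is invented; only the rule itself comes from the docs.

```python
def validate_auth_config(
    disable_upstream_token_injection: bool,
    token_exchange: bool,
    aws_sts: bool,
) -> None:
    """Raise ValueError for the combination the warning describes.

    Hypothetical helper: stripping credentials and then attaching a new
    credential via token exchange or AWS STS are contradictory goals.
    """
    if disable_upstream_token_injection and (token_exchange or aws_sts):
        raise ValueError(
            "disableUpstreamTokenInjection cannot be combined with "
            "tokenExchange or awsSts"
        )


# Valid: public backend, no outgoing credential strategy.
validate_auth_config(True, token_exchange=False, aws_sts=False)

# Valid: default behavior with token exchange enabled.
validate_auth_config(False, token_exchange=True, aws_sts=False)
```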

### Complete example

This example deploys a vMCP with an embedded auth server that authenticates
48 changes: 48 additions & 0 deletions docs/toolhive/guides-vmcp/configuration.mdx
@@ -306,6 +306,54 @@ For detailed backend health monitoring, see
[Verify backend status](./backend-discovery.mdx#verify-backend-status) in the
Backend discovery guide.

## Forward client headers to backends

By default, vMCP opens a fresh request to each backend and sends only its own
headers; the client's request headers do not reach the backend. If a backend
authenticates per-user from a custom header (for example, an `x-api-key` it
resolves to a specific user), that header is dropped unless you allowlist it.

Use `spec.passthroughHeaders` to allowlist headers that vMCP should forward
verbatim from the client request to every backend it calls:

```yaml title="VirtualMCPServer resource"
spec:
groupRef:
name: my-backends
incomingAuth:
type: oidc
oidcConfigRef:
name: my-oidc-config
audience: my-vmcp
# highlight-start
passthroughHeaders:
- x-mcp-api-key
# highlight-end
```

Each listed header is captured from the client request at the incoming-auth
boundary and attached to that session's backend requests. Headers that are not
listed are ignored.

A few names are reserved and rejected at admission time, because letting them
pass through would let a client overwrite vMCP's own routing or identity
signals:

- `Host`
- Hop-by-hop headers (such as `Connection`, `Keep-Alive`, `Transfer-Encoding`)
- `X-Forwarded-*` headers
- `Proxy-Authorization` and other proxy-control headers

:::warning[Forwarded headers are attacker-influenceable]

Anything you list in `passthroughHeaders` flows through from the client
verbatim. A backend that grants authority based on the value (for example, a
shared API key that maps to a user) trusts the client to send the right value.
Use this only when a trusted upstream sets the header on behalf of the client,
or when the backend independently revalidates it.

:::
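
As a rough sketch of the allowlist and reserved-name rules in this section, the example below filters client headers and rejects reserved names. All function names are hypothetical, and the reserved set is deliberately partial: it covers only the names the list above spells out, not the full set of hop-by-hop or proxy-control headers that vMCP may reject.

```python
# Partial, illustrative set: only the names listed in the docs above.
HOP_BY_HOP = {"connection", "keep-alive", "transfer-encoding"}
RESERVED_EXACT = {"host", "proxy-authorization"} | HOP_BY_HOP


def is_reserved(name: str) -> bool:
    """Check a header name against the reserved names, case-insensitively."""
    lowered = name.lower()
    return lowered in RESERVED_EXACT or lowered.startswith("x-forwarded-")


def validate_allowlist(names: list[str]) -> None:
    """Reject an allowlist that contains reserved names (admission-time check)."""
    rejected = [n for n in names if is_reserved(n)]
    if rejected:
        raise ValueError(f"reserved headers not allowed: {rejected}")


def select_passthrough(
    headers: dict[str, str], allowlist: list[str]
) -> dict[str, str]:
    """Keep only allowlisted client headers; everything else is ignored."""
    allowed = {n.lower() for n in allowlist}
    return {k: v for k, v in headers.items() if k.lower() in allowed}


allowlist = ["x-mcp-api-key"]
validate_allowlist(allowlist)
client_headers = {"X-MCP-API-Key": "secret", "Accept": "application/json"}
to_backend = select_passthrough(client_headers, allowlist)
```

Note that the forwarded value is whatever the client sent, unchanged, which is exactly why the warning above asks for a trusted upstream or backend-side revalidation.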

## Next steps

- [Configure backend discovery](./backend-discovery.mdx) to control how vMCP
1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv_llm_proxy_start.md
@@ -31,6 +31,7 @@ thv llm proxy start [flags]

```
-h, --help help for start
--skip-browser Print the OIDC authorization URL instead of opening a browser, then wait for the callback. Use in headless/SSH/CI environments where no system browser is available.
--tls-skip-verify Skip TLS certificate verification for the upstream gateway (overrides stored config; local dev only)
```

1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv_llm_setup.md
@@ -50,6 +50,7 @@ thv llm setup [flags]
--issuer string OIDC issuer URL
--lazy Skip the interactive OIDC login and defer it until the first time a configured tool accesses the gateway. Tool config and persisted settings are written normally. Useful for unattended provisioning (e.g. an MDM profile).
--proxy-port int Localhost proxy listen port (omit to keep current; default: 14000)
--skip-browser Print the OIDC authorization URL instead of opening a browser, then wait for the callback. Use in headless/SSH/CI environments where no system browser is available.
--tls-skip-verify Skip TLS certificate verification for the upstream gateway (local dev only). For direct-mode tools (Claude Code, Gemini CLI) this sets NODE_TLS_REJECT_UNAUTHORIZED=0, disabling TLS for ALL of that tool's outbound connections. For proxy-mode tools only the proxy-to-gateway connection is affected.
```
