|
| 1 | +# Runbook — `invalid_client: client registration expired` |
| 2 | + |
| 3 | +When an MCP client (Claude Code, codex_rmcp_client, Cursor, …) |
| 4 | +fails with: |
| 5 | + |
| 6 | +``` |
| 7 | +OAuth token refresh failed: Server returned error response: |
| 8 | +invalid_client: client registration expired |
| 9 | +``` |
| 10 | + |
| 11 | +it has outlived the TTL of its sealed `client_id`. |
| 12 | +Most MCP clients run Dynamic Client Registration (DCR) only once, so they |
| 13 | +do not re-register automatically, and the user sees the error directly. |
| 14 | + |
| 15 | +## Signals |
| 16 | + |
| 17 | +- Per-client log line: `client_registration_expired` |
| 18 | + (`handlers/helpers.go`, fired by `openAndValidateClient`). |
| 19 | +- Counter: `mcp_auth_access_denied_total{reason="invalid_client"}` |
| 20 | + with `error_description="client registration expired"` in the |
| 21 | + log line that accompanies it. |
| 22 | +- User report: "MCP server stopped working after a few days of |
| 23 | + uptime, restart fixes it." |
| 24 | + |
| 25 | +## Why it happens |
| 26 | + |
| 27 | +The sealed `client_id` returned by `POST /register` has a |
| 28 | +lifetime baked into its encrypted payload (`ExpiresAt`). After |
| 29 | +that timestamp, every endpoint that re-validates the |
| 30 | +`client_id` (`/authorize`, `/token` for both grant types) |
| 31 | +rejects with `invalid_client: client registration expired`. |
| 32 | + |
| 33 | +The lifetime is set by the `CLIENT_REGISTRATION_TTL` env var, which |
| 34 | +defaults to **7 days** (matching `refreshTokenTTL`, so a client holding |
| 35 | +a still-valid refresh token can always exchange it). The cap is 90 days. |
| 36 | + |
| 37 | +## Response |
| 38 | + |
| 39 | +### Client side |
| 40 | + |
| 41 | +The MCP client must re-register: re-run DCR (`POST /register`) |
| 42 | +to obtain a fresh `client_id`, then retry the OAuth flow from |
| 43 | +`/authorize`. Most MCP clients do this on a fresh connection, |
| 44 | +so restarting the client is the simplest fix. |
| 45 | + |
| 46 | +### Operator side |
| 47 | + |
| 48 | +If users are hitting this faster than `CLIENT_REGISTRATION_TTL` |
| 49 | +suggests they should: |
| 50 | + |
| 51 | +1. **Verify `CLIENT_REGISTRATION_TTL` is what you think.** |
| 52 | + `kubectl exec ... -- env | grep CLIENT_REGISTRATION_TTL`. |
| 53 | +2. **Check `REVOKE_BEFORE`.** `REVOKE_BEFORE` rejects access |
| 54 | + tokens whose `iat` predates the cutoff, but does NOT shorten |
| 55 | + `client_id` lifetime. If both fire on the same flow, the |
| 56 | + user sees `invalid_client` because the `client_id` check runs first, |
| 57 | + even though the token cutoff may be the actual cause. Check the logs to tell them apart. |
| 58 | +3. **Lengthen the TTL** for long-running deployments by setting |
| 59 | + `CLIENT_REGISTRATION_TTL=720h` (30d) or up to the 90d cap. |
| 60 | +4. **Consider Option 4** (auto-extend `client_id` on each |
| 61 | + `/token` use) — see `misc/next-steps.md`. Not yet |
| 62 | + implemented as of this writing. |
| 63 | + |
| 64 | +## Rolling-deploy transient |
| 65 | + |
| 66 | +Bumping `CLIENT_REGISTRATION_TTL` does **NOT** retroactively |
| 67 | +extend already-issued `client_id`s. The TTL is sealed into the |
| 68 | +encrypted payload at registration time. Existing clients |
| 69 | +running on the old TTL keep that TTL until they re-register, |
| 70 | +no matter what the env var says now. Plan accordingly: |
| 71 | +- A deploy that raises the TTL takes effect immediately for |
| 72 | + newly-registered clients. |
| 73 | +- Already-affected users won't be unblocked until their MCP |
| 74 | + client re-registers (manual restart, or running long enough |
| 75 | + to hit any other re-registration trigger). |
| 76 | + |
| 77 | +## What NOT to do |
| 78 | + |
| 79 | +- **Don't disable `CLIENT_REGISTRATION_TTL` checks.** The TTL |
| 80 | + bounds the residual reach of an exfiltrated `client_id` |
| 81 | + (which is unauthenticated metadata sent in the clear on |
| 82 | + `/authorize`). A 0 or near-infinite value silently extends |
| 83 | + that window. |
| 84 | +- **Don't try to extend an existing `client_id` server-side.** |
| 85 | + The sealed payload is immutable. The only way to extend is |
| 86 | + re-issuing a fresh `client_id`. |
| 87 | +- **Don't raise the TTL past 90d** (the value is capped at startup). Wider |
| 88 | + windows add no operational value once the auto-extend design |
| 89 | + ships, and increase the residual-reuse threat in the |
| 90 | + meantime. |
| 91 | + |
| 92 | +## Prevention |
| 93 | + |
| 94 | +- **Default `CLIENT_REGISTRATION_TTL` matches refresh-token |
| 95 | + lifetime (7d).** A client that successfully refreshes its |
| 96 | + access token at least every 7d cycles its session before |
| 97 | + the `client_id` envelope lapses. Long-idle clients are |
| 98 | + the failure mode this runbook catches. |
| 99 | +- **Document re-registration in the MCP client's own UX.** |
| 100 | + Out of scope for the proxy; in scope for the client author |
| 101 | + if the client expects long-lived sessions without |
| 102 | + intervention. |
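
The TTL rules this runbook describes (7-day default, 90-day cap, expiry sealed at registration time) can be sketched in Go as below. This is a minimal illustration, not the proxy's actual code. The names `resolveClientTTL`, `sealedClient`, `register`, and `expiredAfter` are hypothetical. Clamping values above the cap and rejecting non-positive values are assumptions, as is treating the registration as expired only strictly after `ExpiresAt`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Figures from the runbook: 7-day default, 90-day cap.
const (
	defaultClientTTL = 7 * 24 * time.Hour
	maxClientTTL     = 90 * 24 * time.Hour
)

// resolveClientTTL (hypothetical) turns a raw CLIENT_REGISTRATION_TTL value
// into a duration. Assumptions: empty means default, non-positive values are
// rejected, and values above the cap are clamped rather than refused.
func resolveClientTTL(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultClientTTL, nil
	}
	ttl, err := time.ParseDuration(raw) // accepts "720h"; Go has no "d" unit
	if err != nil {
		return 0, fmt.Errorf("parse CLIENT_REGISTRATION_TTL: %w", err)
	}
	if ttl <= 0 {
		return 0, errors.New("CLIENT_REGISTRATION_TTL must be positive")
	}
	if ttl > maxClientTTL {
		ttl = maxClientTTL
	}
	return ttl, nil
}

// sealedClient stands in for the decrypted client_id payload.
type sealedClient struct {
	ExpiresAt time.Time
}

// register seals the expiry using the TTL in force at registration time.
func register(now time.Time, ttl time.Duration) sealedClient {
	return sealedClient{ExpiresAt: now.Add(ttl)}
}

// expired reports whether the registration has lapsed at the given instant.
func (c sealedClient) expired(now time.Time) bool {
	return now.After(c.ExpiresAt)
}

// expiredAfter registers a client under rawTTL and checks it ageHours later.
func expiredAfter(rawTTL string, ageHours int) (bool, error) {
	ttl, err := resolveClientTTL(rawTTL)
	if err != nil {
		return false, err
	}
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	c := register(start, ttl)
	return c.expired(start.Add(time.Duration(ageHours) * time.Hour)), nil
}

func main() {
	for _, raw := range []string{"", "720h", "2400h"} {
		ttl, err := resolveClientTTL(raw)
		fmt.Printf("raw=%q ttl=%v err=%v\n", raw, ttl, err)
	}
	// A client registered under the 7d default is checked on day 8. Its
	// ExpiresAt was fixed at registration, so a later change to the env var
	// would not alter the result.
	lapsed, _ := expiredAfter("", 8*24)
	fmt.Println("default TTL, day 8, expired:", lapsed)
}
```

Because `ExpiresAt` is computed once in `register`, raising the env var afterwards has no effect on a `sealedClient` that already exists. That is the rolling-deploy transient described above: only a fresh registration picks up the new TTL.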