Skip to content

Commit 2597d5f

Browse files
authored
fix(register): make client_id TTL configurable, default 7d (#28)
The sealed client_id minted by POST /register previously used a hardcoded 24h TTL. Refresh tokens last 7 days, so any MCP client running longer than 24h failed its first refresh-token rotation with `invalid_client: client registration expired`, even with a still-valid refresh token. Most MCP clients (codex_rmcp_client, Claude Code, Cursor, …) treat DCR as one-shot at startup and surface the error to the user instead of re-registering. Fix: - Default TTL bumped to 7 days, matching refreshTokenTTL so the sealed client_id always covers a full refresh-token cycle. - New CLIENT_REGISTRATION_TTL env var lets operators raise or lower the envelope. Go duration syntax; capped at 90d at startup (wider windows extend the residual-reuse threat of an exfiltrated client_id with no operational upside). - Register handler signature gains a clientTTL parameter. All call sites (main.go + tests) pass either the wired cfg.ClientRegistrationTTL or DefaultClientTTL. - Regression test (TestTokenRefresh_ClientLifecycleAroundTTL) covers three boundary points: 48h-old (the bug repro), 6d23h-old (just inside the 7d envelope), and 1m past expiry (must reject). - 5 config tests for the env var: default, custom, non- positive reject, above-cap reject, bad-format reject (the "7d" case — time.ParseDuration doesn't accept the d suffix; documented). Rolling-deploy note: the TTL is sealed into each client_id at registration time. Bumping the env var only affects newly- issued client_ids; existing registrations stay on whatever TTL was in effect when they were minted. Documented in the new docs/runbooks/client-registration-expired.md.
1 parent 4f33f36 commit 2597d5f

10 files changed

Lines changed: 305 additions & 30 deletions

File tree

README.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -140,6 +140,7 @@ All configuration via **environment variables**. Bold = required.
140140
| `REDIS_REQUIRED` | `true` | Fail startup when `REDIS_URL` is unset. Set `false` only for dev / single-replica; stateless mode leaves codes/refresh replayable within TTL |
141141
| `REDIS_KEY_PREFIX` | `mcp-auth-proxy:` | Key prefix for shared Redis; set to empty to opt out of namespacing |
142142
| `REFRESH_RACE_GRACE_SEC` | `2` | Grace window in seconds during which a refresh-rotation collision is treated as a benign concurrent submit (parallel-tab refresh, slow-network double-submit) and returns 429 `refresh_concurrent_submit` without revoking the family. Outside the window every collision still revokes. Range `[0, 10]`; `0` disables (every collision = reuse). The 10s ceiling is a security cap — wider windows are statistically attacker-shaped |
143+
| `CLIENT_REGISTRATION_TTL` | `168h` (7d) | Lifetime of a sealed `client_id` minted by `POST /register`. Default matches the 7-day refresh-token TTL so a client holding a still-valid refresh can always exchange it; a shorter value silently kills long-running MCP clients (which treat DCR as one-shot at startup) the moment their access token first expires. Go duration syntax (`168h`, `720h`, …); capped at 90d. **Rolling-deploy note:** the TTL is sealed into each `client_id` at registration time, so bumping this env var only affects newly-issued client_ids — existing registrations stay on whatever TTL was in effect when they were minted. See [`docs/runbooks/client-registration-expired.md`](./docs/runbooks/client-registration-expired.md) |
143144
| `IDP_EXCHANGE_RATE_PER_SEC` | _(empty / disabled)_ | Cap on outbound proxy → IdP token-endpoint requests at `/callback`. Defense in depth: a flood of `/callback` hits that slips past the per-IP limiter (distributed sources, permissive XFF trust matrix) is bounded by this token bucket before reaching the IdP. Denied requests get 503 `temporarily_unavailable` + `error_code=idp_exchange_throttled` + `Retry-After: 1`. Set to a positive number (e.g. `20`) to enable |
144145
| `IDP_EXCHANGE_BURST` | `50` | Burst size for the IdP-exchange limiter when `IDP_EXCHANGE_RATE_PER_SEC > 0`. Higher burst absorbs a short spike (e.g. a deploy-time reconnect storm) without 503s; lower burst keeps the ceiling tighter. Ignored when `IDP_EXCHANGE_RATE_PER_SEC` is unset/zero (limiter is not constructed) |
145146
| `RATE_LIMIT_ENABLED` | `true` | Per-IP rate limiting on pre-auth endpoints and on the authenticated MCP route. Disable only behind a WAF that already enforces it |
@@ -173,7 +174,7 @@ payloads alone remain replayable within their TTL.
173174

174175
| Flow state | Encrypted into | TTL |
175176
|---|---|---|
176-
| Client registration | `client_id` | 24h |
177+
| Client registration | `client_id` | 7d (configurable via `CLIENT_REGISTRATION_TTL`) |
177178
| Authorize session | IdP `state` parameter | 10min |
178179
| Authorization code | `code` parameter | 60s |
179180
| Access token | Opaque bearer | 1h |

config/config.go

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,15 @@ type Config struct {
7272
// IdPExchangeBurst is the burst size for the IdP-exchange limiter
7373
// when IdPExchangeRatePerSec > 0. env: IDP_EXCHANGE_BURST.
7474
IdPExchangeBurst int
75+
// ClientRegistrationTTL is the lifetime of a sealed client_id
76+
// minted by POST /register. Defaults to 7 days so the envelope
77+
// always covers a full refresh-token cycle — a shorter value
78+
// would silently kill long-running MCP clients (which treat DCR
79+
// as one-shot at startup) the moment their access token first
80+
// expired. env: CLIENT_REGISTRATION_TTL (Go duration: 168h, 7d-
81+
// equivalent values like 168h0m0s, etc.; "d" suffix not supported
82+
// by time.ParseDuration).
83+
ClientRegistrationTTL time.Duration
7584
// CompatAllowStateless keeps the legacy Cursor/MCP Inspector behavior of
7685
// accepting /authorize requests without a client-supplied state. Default
7786
// false — strict mode refuses stateless requests so the client cannot
@@ -352,6 +361,27 @@ func Load() (*Config, error) {
352361
c.IdPExchangeBurst = n
353362
}
354363

364+
// CLIENT_REGISTRATION_TTL: how long a sealed client_id stays
365+
// openable. Default 7d so a client holding a still-valid
366+
// refresh token (refreshTokenTTL = 7d) can always exchange it.
367+
// Hard cap at 90d — a longer envelope materially extends the
368+
// window an exfiltrated client_id (which is unauthenticated
369+
// metadata) can be reused.
370+
c.ClientRegistrationTTL = 7 * 24 * time.Hour
371+
if raw := os.Getenv("CLIENT_REGISTRATION_TTL"); raw != "" {
372+
d, err := time.ParseDuration(raw)
373+
if err != nil {
374+
return nil, fmt.Errorf("CLIENT_REGISTRATION_TTL must be a Go duration (e.g. 168h, 24h, 720h): %w", err)
375+
}
376+
if d <= 0 {
377+
return nil, fmt.Errorf("CLIENT_REGISTRATION_TTL must be positive, got %s", d)
378+
}
379+
if d > 90*24*time.Hour {
380+
return nil, fmt.Errorf("CLIENT_REGISTRATION_TTL exceeds 90d cap, got %s", d)
381+
}
382+
c.ClientRegistrationTTL = d
383+
}
384+
355385
if ag := os.Getenv("ALLOWED_GROUPS"); ag != "" {
356386
for _, g := range strings.Split(ag, ",") {
357387
if g = strings.TrimSpace(g); g != "" {

config/config_test.go

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -361,6 +361,56 @@ func TestLoad_IdPExchangeBurst_RejectsZero(t *testing.T) {
361361
}
362362
}
363363

364+
func TestLoad_ClientRegistrationTTL_Default(t *testing.T) {
365+
setAllRequired(t)
366+
cfg, err := Load()
367+
if err != nil {
368+
t.Fatalf("unexpected error: %v", err)
369+
}
370+
if cfg.ClientRegistrationTTL != 7*24*time.Hour {
371+
t.Errorf("ClientRegistrationTTL default = %v, want 168h (7d)", cfg.ClientRegistrationTTL)
372+
}
373+
}
374+
375+
func TestLoad_ClientRegistrationTTL_Custom(t *testing.T) {
376+
setAllRequired(t)
377+
t.Setenv("CLIENT_REGISTRATION_TTL", "720h")
378+
cfg, err := Load()
379+
if err != nil {
380+
t.Fatalf("unexpected error: %v", err)
381+
}
382+
if cfg.ClientRegistrationTTL != 720*time.Hour {
383+
t.Errorf("ClientRegistrationTTL = %v, want 720h", cfg.ClientRegistrationTTL)
384+
}
385+
}
386+
387+
func TestLoad_ClientRegistrationTTL_RejectsNonPositive(t *testing.T) {
388+
setAllRequired(t)
389+
t.Setenv("CLIENT_REGISTRATION_TTL", "0s")
390+
_, err := Load()
391+
if err == nil || !strings.Contains(err.Error(), "positive") {
392+
t.Fatalf("want positive rejection, got %v", err)
393+
}
394+
}
395+
396+
func TestLoad_ClientRegistrationTTL_RejectsAboveCap(t *testing.T) {
397+
setAllRequired(t)
398+
t.Setenv("CLIENT_REGISTRATION_TTL", "2400h") // 100 days, > 90d cap
399+
_, err := Load()
400+
if err == nil || !strings.Contains(err.Error(), "90d cap") {
401+
t.Fatalf("want 90d-cap rejection, got %v", err)
402+
}
403+
}
404+
405+
func TestLoad_ClientRegistrationTTL_RejectsBadFormat(t *testing.T) {
406+
setAllRequired(t)
407+
t.Setenv("CLIENT_REGISTRATION_TTL", "7d") // time.ParseDuration doesn't accept "d"
408+
_, err := Load()
409+
if err == nil || !strings.Contains(err.Error(), "Go duration") {
410+
t.Fatalf("want format-error, got %v", err)
411+
}
412+
}
413+
364414
func TestLoad_PKCERequired_Default(t *testing.T) {
365415
setAllRequired(t)
366416
cfg, err := Load()
Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
1+
# Runbook — `invalid_client: client registration expired`
2+
3+
When an MCP client (Claude Code, codex_rmcp_client, Cursor, …)
4+
fails with:
5+
6+
```
7+
OAuth token refresh failed: Server returned error response:
8+
invalid_client: client registration expired
9+
```
10+
11+
it has been alive longer than the sealed `client_id`'s TTL.
12+
DCR is one-shot for most MCP clients, so they don't auto-
13+
re-register and the user sees the error directly.
14+
15+
## Signals
16+
17+
- Per-client log line: `client_registration_expired`
18+
(`handlers/helpers.go`, fired by `openAndValidateClient`).
19+
- Counter: `mcp_auth_access_denied_total{reason="invalid_client"}`
20+
with `error_description="client registration expired"` in the
21+
log line that accompanies it.
22+
- User report: "MCP server stopped working after a few days of
23+
uptime, restart fixes it."
24+
25+
## Why it happens
26+
27+
The sealed `client_id` returned by `POST /register` has a
28+
lifetime baked into its encrypted payload (`ExpiresAt`). After
29+
that timestamp, every endpoint that re-validates the
30+
`client_id` (`/authorize`, `/token` for both grant types)
31+
rejects with `invalid_client: client registration expired`.
32+
33+
The lifetime is the `CLIENT_REGISTRATION_TTL` env var, default
34+
**7 days** (matches `refreshTokenTTL` so a client holding a
35+
still-valid refresh can always exchange it). Cap is 90 days.
36+
37+
## Response
38+
39+
### Client side
40+
41+
The MCP client must re-register: re-run DCR (`POST /register`)
42+
to obtain a fresh `client_id` + retry the OAuth flow from
43+
`/authorize`. Most MCP clients do this on a fresh connection,
44+
so a restart of the client is the simplest fix.
45+
46+
### Operator side
47+
48+
If users are hitting this faster than `CLIENT_REGISTRATION_TTL`
49+
suggests they should:
50+
51+
1. **Verify `CLIENT_REGISTRATION_TTL` is what you think.**
52+
`kubectl exec ... -- env | grep CLIENT_REGISTRATION_TTL`.
53+
2. **Check `REVOKE_BEFORE`.** `REVOKE_BEFORE` rejects access
54+
tokens whose `iat` predates the cutoff, but does NOT shorten
55+
`client_id` lifetime. If both fire on the same flow, the
56+
user sees `invalid_client` (client_id check runs first); the
57+
actual cause may be the token cutoff. Logs disambiguate.
58+
3. **Lengthen the TTL** for long-running deployments by setting
59+
`CLIENT_REGISTRATION_TTL=720h` (30d) or up to the 90d cap.
60+
4. **Consider Option 4** (auto-extend `client_id` on each
61+
`/token` use) — see `misc/next-steps.md`. Not yet
62+
implemented as of this writing.
63+
64+
## Rolling-deploy transient
65+
66+
Bumping `CLIENT_REGISTRATION_TTL` does **NOT** retroactively
67+
extend already-issued `client_id`s. The TTL is sealed into the
68+
encrypted payload at registration time. Existing clients
69+
running on the old TTL keep that TTL until they re-register,
70+
no matter what the env var says now. Plan accordingly:
71+
- A deploy that raises the TTL takes effect immediately for
72+
newly-registered clients.
73+
- Already-affected users won't be unblocked until their MCP
74+
client re-registers (manual restart, or running long enough
75+
to hit any other re-registration trigger).
76+
77+
## What NOT to do
78+
79+
- **Don't disable `CLIENT_REGISTRATION_TTL` checks.** The TTL
80+
bounds the residual reach of an exfiltrated `client_id`
81+
(which is unauthenticated metadata sent in the clear on
82+
`/authorize`). A 0 or near-infinite value silently extends
83+
that window.
84+
- **Don't try to extend an existing `client_id` server-side.**
85+
The sealed payload is immutable. The only way to extend is
86+
re-issuing a fresh `client_id`.
87+
- **Don't increase past 90d** (capped at startup). Wider
88+
windows add no operational value once the auto-extend design
89+
ships, and increase the residual-reuse threat in the
90+
meantime.
91+
92+
## Prevention
93+
94+
- **Default `CLIENT_REGISTRATION_TTL` matches refresh-token
95+
lifetime (7d).** A client that successfully refreshes its
96+
access token at least every 7d cycles its session before
97+
the `client_id` envelope lapses. Long-idle clients are
98+
the failure mode this runbook catches.
99+
- **Document re-registration in the MCP client's own UX.**
100+
Out of scope for the proxy; in scope for the client author
101+
if the client expects long-lived sessions without
102+
intervention.

e2e_test.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -205,7 +205,7 @@ func buildTestProxy(t *testing.T, oidcProvider *mockOIDCProvider, mcpServer *moc
205205

206206
r := chi.NewRouter()
207207
registerDiscoveryRoutes(r, proxyBaseURL, "/mcp", "", nil)
208-
r.Post("/register", handlers.Register(tm, zap.NewNop(), proxyBaseURL))
208+
r.Post("/register", handlers.Register(tm, zap.NewNop(), proxyBaseURL, handlers.DefaultClientTTL))
209209
r.Get("/authorize", handlers.Authorize(tm, zap.NewNop(), proxyBaseURL, oauth2Cfg, handlers.AuthorizeConfig{
210210
PKCERequired: true,
211211
CanonicalResource: proxyBaseURL + "/mcp",

0 commit comments

Comments
 (0)