Skip to content

Commit c8d495d

Browse files
authored
docs(security): STRIDE threat-model coverage matrix (#23)
New docs/threat-model.md enumerates 16 threats across STRIDE with one row each linking the mitigation to code, test, and (where applicable) runbook. Closes T1.4 from misc/next-steps.md and turns "we thought about this" into a reviewable artifact. Includes an honest out-of-scope section (IdP compromise, browser XSS, MITM with TLS verify off, side-channel timing, post-quantum, operator-bypass-of-PROD_MODE, per-mount RBAC) so future work can pick up gaps explicitly rather than assuming coverage. Closing section names how to keep the doc in sync during review: when an auth/authz path changes, the matrix row's Mitigation column must keep matching the code, or the row needs a doc update in the same PR. Cross-linked from README "Standards conformance" section, from specs.md companion-docs line at the top, and from main.go's /register rate-limit constant comment (row 1 is the load-bearing DCR-abuse mitigation).
1 parent ad18290 commit c8d495d

4 files changed

Lines changed: 96 additions & 1 deletion

File tree

README.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -76,7 +76,9 @@ Drop this in front, point it at your existing IdP, done.
7676

7777
Full design notes live in [`specs.md`](./specs.md). The implementation
7878
claim matrix, compatibility notes, and current IdP evidence live in
79-
[`docs/conformance.md`](./docs/conformance.md).
79+
[`docs/conformance.md`](./docs/conformance.md). The STRIDE-style
80+
threat coverage (one row per threat → code + test + runbook) lives
81+
in [`docs/threat-model.md`](./docs/threat-model.md).
8082

8183
---
8284

docs/threat-model.md

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
# Threat model
2+
3+
Auditable mapping from identified threats to the code, tests, and
4+
runbooks that mitigate them. Companion to [`specs.md`](../specs.md)
5+
(what the proxy implements) and [`conformance.md`](conformance.md)
6+
(which RFCs it claims). Goal: when a reviewer asks *"what's the
7+
worst case here?"*, the answer is on a row in this document with a
8+
file path, a test name, and an operational runbook.
9+
10+
Categories use STRIDE: **S**poofing, **T**ampering, **R**epudiation,
11+
**I**nformation disclosure, **D**enial of service, **E**levation of
12+
privilege.
13+
14+
## Coverage matrix
15+
16+
| # | STRIDE | Threat | Mitigation | Code | Test | Runbook |
17+
|---|---|---|---|---|---|---|
18+
| 1 | T / I | **DCR abuse** — mass client registration to exhaust quota or phish via attacker-controlled `client_name` | Per-IP rate limit on `/register` (10/min default); control-byte strip on `client_name` at registration; `mcp_auth_clients_registered_total` for trend monitoring | `handlers/register.go`, `main.go` (`registerLimit`) | `handlers/handlers_test.go` (`TestRegister_*`) ||
19+
| 2 | S | **Active-IdP-session phishing** — malicious DCR client + an active IdP session = silent token issuance without user interaction | Proxy-rendered consent page (default `RENDER_CONSENT_PAGE=true`); per-render `jti` makes the consent token single-use (T1.2); `client_name` rendered via `html/template` contextual escaping; CSP locks the page (no JS, no remote subresources) | `handlers/authorize.go` (consent fork), `handlers/consent.go` | `handlers/consent_test.go`, `handlers/single_use_replay_test.go` | `docs/runbooks/consent-denials.md` |
20+
| 3 | T | **Redirect-URI / open-redirect abuse** — crafted `redirect_uri` matches loosely or smuggles attacker host | Exact-match against registered URIs with RFC 8252 loopback-port relaxation; fragment-bearing URIs rejected at DCR; fragment scrubbed again on the success redirect (defense in depth) | `handlers/helpers.go` (`redirectURIMatches`), `handlers/register.go`, `handlers/callback.go` (fragment scrub) | `handlers/handlers_test.go` (redirect tests) ||
21+
| 4 | S | **Authorization-code replay** — captured authorization code submitted twice within its TTL | `replay.Store.ClaimOnce` on `TokenID` at `/token`, run after parameter validation but before token issuance so a malformed legitimate retry doesn't burn the code; on detected replay the family of refresh tokens seeded by that code is revoked (RFC 6749 §4.1.2 MUST) | `handlers/token.go` (`handleAuthorizationCode`), `replay/redis.go` | `handlers/handlers_test.go` (replay tests) | `docs/runbooks/redis-outage.md` |
22+
| 5 | S | **Refresh-token replay / family compromise** — captured refresh submitted after legit rotation | Atomic Lua `ClaimOrCheckFamily` on Redis: claim + family revoke happen as one EVAL so the invariant "alreadyClaimed ⇒ family revoked" cannot be violated by a client disconnect or Redis blip | `handlers/token.go` (`handleRefreshToken`), `replay/redis.go` | `handlers/handlers_test.go` (refresh tests), `replay/redis_test.go` | `docs/runbooks/revoke.md` |
23+
| 6 | S | **Consent-token replay** — captured `consent_token` POSTed twice within its 5-min TTL | `JTI` `ClaimOnce` at POST `/consent` before either Approve or Deny; per-render `JTI` so back-button = re-consent (a new claim slot) | `handlers/consent.go` | `handlers/single_use_replay_test.go` (`TestConsent_SingleUse_*`) ||
24+
| 7 | S | **Callback-state replay** — captured `/callback` URL replayed (e.g. attacker-observable redirect) | `SessionID` `ClaimOnce` BEFORE the upstream OIDC token-endpoint exchange — replay never fans out to the IdP and never produces audit-log noise | `handlers/callback.go` | `handlers/single_use_replay_test.go` (`TestCallback_SingleUse_*`) ||
25+
| 8 | D | **Redis outage** — replay store unreachable | Fail-closed at every claim site: 503 `server_error` + `error_code=replay_store_unavailable`. `mcp_auth_access_denied_total{reason="replay_store_unavailable"}` is the single alerting source. `PROD_MODE=true` validates `REDIS_REQUIRED=true` + `REDIS_URL` set at startup | `replay/redis.go`, `config/config.go` (`Validate`), every claim site | `config/config_test.go` (production posture) | `docs/runbooks/redis-outage.md` |
26+
| 9 | D | **IdP outage** — upstream OIDC unavailable | 10s context timeout on `oauth2Cfg.Exchange`; surfaces 502 `server_error`; readiness probe lives on the metrics port only so an unauthenticated public-listener flood cannot flip every replica out of the K8s Service | `handlers/callback.go` (`oidcExchangeTTL`), `main.go` (port split) | `handlers/handlers_test.go` (callback timeout tests) | `docs/runbooks/idp-outage.md` |
27+
| 10 | S | **XFF / proxy-header spoofing**`X-Forwarded-For` set by an attacker to bypass per-IP rate limiting | `TRUSTED_PROXY_CIDRS` allowlist scopes which upstream hops can set forwarding headers; `TRUSTED_PROXY_HEADER` names the trusted header; `PROD_MODE=true` rejects `TRUST_PROXY_HEADERS=true` without a `TRUSTED_PROXY_CIDRS` allowlist | `config/config.go` (`TrustedProxyCIDRs`), `main.go` (`ipKeyFunc`) | `config/config_test.go` (XFF cases) ||
28+
| 11 | R | **Multi-replica drift** — two proxy replicas issue inconsistent decisions on the same code, refresh, or callback state | All claim sites use a shared Redis store; `FamilyIssuedAt` + `REVOKE_BEFORE` propagate bulk-cutoff revocations across replicas without per-replica state | `replay/redis.go`, `token/` (`FamilyIssuedAt`) | `replay/redis_test.go` | `docs/runbooks/key-rotation.md` |
29+
| 12 | D | **Sealed-input parse exhaustion** — oversized sealed blob slows `token.Open()` under load | Per-input length cap on every sealed-token open call (PR #18); 1 MB body cap via `MaxBytesReader` on `/token`, `/consent`, `/register` | `token/seal.go`, every handler entry point | `token/seal_test.go` (`FuzzOpenJSON`, `FuzzValidate`) ||
30+
| 13 | I | **`id_token` claim-shape drift** — IdP schema migration silently bypasses `ALLOWED_GROUPS` via empty-groups admit | Distinct `mcp_auth_groups_claim_shape_mismatch_total` counter and warn-level log on every shape mismatch — surfaces drift before it cascades into a `group` denial spike. Empty groups still trigger the standard `group` denial when `ALLOWED_GROUPS` is non-empty | `handlers/callback.go` | `handlers/handlers_test.go` (group-shape tests) ||
31+
| 14 | I | **IdP `error_description` phishing** — crafted `/callback?error=…&error_description=<phishing text>` reflects on the proxy origin or in a legit MCP-client error UI | Fixed proxy-owned description on BOTH the validated-session redirect and the no-session JSON paths; RFC 6749 §4.1.2.1 `error` allowlist (anything else collapses to `server_error`) | `handlers/callback.go` (idpError branch) | `handlers/handlers_test.go` (`TestCallback_OIDCError_*`) ||
32+
| 15 | E | **PKCE downgrade** — client without `code_challenge` at `/authorize` then supplies `code_verifier` at `/token`, papering over the missing PKCE binding | When `PKCE_REQUIRED=false` AND the client omitted `code_challenge`, the proxy mints its own server-side PKCE pair (H6) so the issued code is anchored to a verifier; `/token` rejects `code_verifier` against a code with no challenge (RFC 9700 §4.8.2) | `handlers/authorize.go`, `handlers/consent.go`, `handlers/token.go` | `handlers/handlers_test.go` (PKCE tests) ||
33+
| 16 | T | **Signing-key compromise / bulk revoke** — proxy signing material exfiltrated, in-flight tokens still valid | `REVOKE_BEFORE` env var: any access or refresh token sealed before the cutoff is rejected by `middleware/auth.go` and `handlers/token.go` regardless of whether the replay store has seen it. Operational pattern: rotate the signing key, set `REVOKE_BEFORE` to the cutoff timestamp, redeploy | `token/`, `middleware/auth.go`, `handlers/token.go` | `middleware/auth_test.go` | `docs/runbooks/key-rotation.md` |
34+
35+
## Out of scope / accepted residual risk
36+
37+
These are documented gaps — the proxy does not claim to defend
38+
against them. Listed so future work can pick them up explicitly
39+
rather than assuming they're already covered.
40+
41+
- **IdP compromise.** If Keycloak / Entra / Auth0 is owned, the
42+
proxy still trusts what it says. The signed `id_token`
43+
verification, `nonce` echo, and audience checks all assume the
44+
IdP itself is honest. Out of scope for this proxy; in scope for
45+
the IdP operator's own threat model.
46+
- **Browser-side XSS in the consent page.** The page is JS-free and
47+
CSP-locked (`default-src 'none'`, `style-src 'unsafe-inline'`,
48+
`script-src` defaults to none, `frame-ancestors 'none'`). A
49+
browser-engine bug that escapes contextual HTML escaping is not
50+
separately mitigated.
51+
- **Network-level MITM between proxy and IdP.** TLS verification is
52+
on by default in the `oauth2` library; an operator who disables
53+
it (or a CA compromise) lets a MITM observe the upstream code
54+
exchange. Standard PKI assumption.
55+
- **Side-channel timing on AES-GCM.** Go's `crypto/aes` is constant
56+
time on architectures with AES-NI (the production target). Not
57+
separately tested.
58+
- **Quantum / cryptanalytic attacks on AES-GCM-256 or SHA-256.**
59+
The proxy's sealed-token format is conventional symmetric crypto
60+
(AES-GCM for authenticated encryption; SHA-256 for key derivation
61+
and PKCE-S256). Not in scope.
62+
- **Operator misconfiguration that bypasses `PROD_MODE`.** The
63+
validation rejects unsafe combinations *when `PROD_MODE=true`*.
64+
An operator who runs in production with `PROD_MODE=false` and a
65+
permissive config gets the legacy posture by their own choice.
66+
The CI manifest gate (T1.3) blocks this for the shipped overlay
67+
but cannot block a hand-rolled deployment.
68+
- **Per-resource-mount RBAC beyond `ALLOWED_GROUPS`.** The proxy
69+
enforces a single group allowlist for the whole mount. Per-tool
70+
or per-method RBAC inside the MCP server itself is the
71+
responsibility of that server.
72+
73+
## How to use this document during review
74+
75+
When a code change touches an authentication or authorization path:
76+
77+
1. Identify which row(s) in the matrix the change sits under.
78+
2. Check that the **Mitigation** column still describes what the
79+
code does — if not, the row needs an update in the same PR.
80+
3. If the change introduces a new threat that doesn't map to an
81+
existing row, add a row (with **Code** + **Test** + **Runbook**
82+
columns filled in) before merge.
83+
4. If a row's **Code** path moves, update the link.
84+
5. If a row gains a test, update the **Test** column.
85+
86+
The matrix is load-bearing: it's the artifact a security reviewer
87+
or auditor reads first. Drift in this document means drift in the
88+
project's claimed posture.

main.go

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -298,6 +298,9 @@ func main() {
298298
mcpLimit := passthrough
299299
discoveryLimit := passthrough
300300
if cfg.RateLimitEnabled {
301+
// /register's 10/min cap is the load-bearing mitigation in
302+
// docs/threat-model.md row 1 (DCR abuse). Keep this in sync
303+
// with that document if the value changes.
301304
registerLimit = rateLimiter(10, time.Minute, "register", ipKeyFunc)
302305
authorizeLimit = rateLimiter(30, time.Minute, "authorize", ipKeyFunc)
303306
// /consent has its own bucket so the user-driven approve/deny

specs.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,8 @@
44

55
Reverse proxy in Go acting as an OAuth 2.1 Authorization Server compatible with the MCP auth spec, federating authentication to any OIDC Identity Provider (Keycloak, Microsoft Entra ID, Auth0, Okta, Google...) via auto-discovery. It lets MCP clients (claude.ai, Claude Code) access a private MCP server through a standard PKCE flow.
66

7+
Companion docs: [`docs/conformance.md`](docs/conformance.md) (RFC claim matrix + IdP evidence), [`docs/threat-model.md`](docs/threat-model.md) (STRIDE coverage with code + test + runbook links).
8+
79
---
810

911
## Standards Conformance

0 commit comments

Comments
 (0)