Skip to content

feat(callback): outbound rate-bucket on the IdP token exchange#26

Merged
babs merged 1 commit into
masterfrom
feat/outbound-rate-bucket
Apr 27, 2026
Merged

feat(callback): outbound rate-bucket on the IdP token exchange#26
babs merged 1 commit into
masterfrom
feat/outbound-rate-bucket

Conversation

@babs
Copy link
Copy Markdown
Owner

@babs babs commented Apr 27, 2026

Summary

Closes T2.2 from `misc/next-steps.md`. Adds a defense-in-depth outbound rate-limit on the proxy → IdP token-endpoint exchange (the second leg of `/callback`).

A flood of `/callback` hits that slips past the per-IP limiter (distributed sources, permissive XFF trust matrix) is bounded by the token bucket before reaching the IdP — preventing a remote-traffic spike from becoming the IdP's problem (and then cascading back as the proxy's problem when the IdP starts dropping requests).

Behavior

  • Atomic, non-blocking: `golang.org/x/time/rate.Limiter.Allow()`. `Wait()` would tie up handler goroutines on the IdP's behalf and turn an IdP outage into a proxy outage.
  • On deny: 503 `temporarily_unavailable` + `error_code=idp_exchange_throttled` + `Retry-After: 1`; increments `mcp_auth_idp_exchange_throttled_total`. The 503 + `temporarily_unavailable` shape matches RFC 6749 §4.1.2.1 for AS overload.
  • Ordered before the SessionID replay claim so a transiently-throttled legitimate user can retry the same `/callback` URL without burning their claim slot. Replay defense (per-state) and throttling (global) are orthogonal — placing throttle first keeps the UX safe under bursts.
  • Default OFF: operators opt in by setting `IDP_EXCHANGE_RATE_PER_SEC > 0` (optional `IDP_EXCHANGE_BURST`, default 50).
  • Per-replica scope: in-process limiter. An `N`-replica deployment admits up to `N × IDP_EXCHANGE_RATE_PER_SEC` to the IdP — divide your IdP-side ceiling by replica count when sizing.

Tests

  • `TestCallback_IdPExchangeThrottled` — drained limiter denies, 503 + correct error_code + Retry-After + metric +1, verifyFunc never runs.
  • `TestCallback_IdPExchangeNoLimiter` — back-compat: nil limiter, no throttle short-circuit.
  • 5 config tests: default-disabled, custom rate+burst, negative reject, non-number reject, burst-must-be->=1.

Docs

  • `README.md`: new env-var entries (`IDP_EXCHANGE_RATE_PER_SEC`, `IDP_EXCHANGE_BURST`) under configuration; new `mcp_auth_idp_exchange_throttled_total` entry under observability.
  • `docs/threat-model.md` row 9 (IdP outage): rate-bucket mitigation added, per-replica scope spelled out.
  • `docs/runbooks/idp-outage.md`: new "IdP overload — proxy → IdP rate-bucket" section with a 4-point tuning playbook.

Test plan

  • CI green (`test`, `lint`, `keycloak-e2e`, `fuzz-smoke`, `manifest-prod`, `govulncheck`, `build`).
  • No regression — existing tests pass `CallbackConfig{}` (zero-value, limiter nil) and get the prior unthrottled behavior. Backwards-compatible.
  • Manual: set `IDP_EXCHANGE_RATE_PER_SEC=1` + `IDP_EXCHANGE_BURST=1` in the demo stack and verify a second `/callback` hit within 1s returns 503 + `error_code=idp_exchange_throttled`.

Closes T2.2 from misc/next-steps.md. Adds a defense-in-depth
outbound rate-limit on the proxy → IdP token-endpoint exchange
(the second leg of /callback). When wired, a flood of /callback
hits that slips past the per-IP limiter (distributed sources,
permissive XFF trust matrix) is bounded by the token bucket
before reaching the IdP — preventing a remote-traffic spike from
becoming the IdP's problem (and then cascading back as the
proxy's problem when the IdP starts dropping requests).

  - golang.org/x/time/rate.Limiter, atomic Allow() (non-blocking).
    Wait() would tie up handler goroutines on the IdP's behalf
    and turn an IdP outage into a proxy outage.

  - On deny: 503 temporarily_unavailable + error_code
    idp_exchange_throttled + Retry-After: 1, increments
    mcp_auth_idp_exchange_throttled_total. The 503 +
    temporarily_unavailable shape matches RFC 6749 §4.1.2.1 for
    AS overload.

  - Limiter check runs BEFORE the SessionID replay claim so a
    transiently-throttled legitimate user can retry the same
    /callback URL without burning their claim slot — replay
    defense (per-state) and throttling (global) are orthogonal,
    placing throttle first keeps the UX safe under bursts.

  - Default OFF: operators opt in by setting
    IDP_EXCHANGE_RATE_PER_SEC > 0 (with optional
    IDP_EXCHANGE_BURST, default 50). Per-replica scope —
    operators sizing the env var multiply by replica count to
    get the IdP-side ceiling.

  - Tests: throttled (denies, metric +1, verify never runs);
    no-limiter (back-compat, pre-T2.2 behavior unchanged); 5
    config tests (default-disabled, custom rate+burst, negative
    reject, non-number reject, burst-must-be->=1).

Docs: README env vars + observability metric entry,
docs/threat-model.md row 9 (IdP outage) extended with the rate-
bucket mitigation + per-replica note, docs/runbooks/idp-outage.md
gained a tuning playbook section.
@babs babs merged commit d512047 into master Apr 27, 2026
7 checks passed
@babs babs deleted the feat/outbound-rate-bucket branch April 27, 2026 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant