Auth: refresh tokens with rotation (replaces in-memory blacklist) #658

@lbedner

Context

The current auth flow has two cracks:

  1. The blacklist is in-memory. app/services/auth/token_blacklist.py
    keeps revoked jtis in a process-local dict. Restart the server and
    every previously-revoked token is valid again until its natural
    ACCESS_TOKEN_EXPIRE_MINUTES window closes. Logout durability is
    fiction across deploys.
  2. Sessions die every 30 minutes. Default ACCESS_TOKEN_EXPIRE_MINUTES=30,
    no refresh, so users get bounced to /login twice an hour. Real-world UX is
    meaningfully worse than what cookie-session apps deliver.

You can fix #1 by persisting the blacklist (DB or Redis), but that solves
the symptom and leaves #2 untouched. The standard answer to both is
short access tokens + persistent refresh tokens + rotation, which
also adds free stolen-token detection. Doing this lets us delete the
blacklist module entirely.

The design

Two tokens, two cookies

  • Access token: JWT, 15 minutes, signed claims (sub, role?, jti,
    exp). Cookie aegis_session, path=/, SameSite=Lax. Verified
    statelessly on every protected request.
  • Refresh token: opaque random 32-byte string (NOT a JWT — server-side
    row is the source of truth). Cookie aegis_refresh, path=/api/v1/auth/refresh,
    SameSite=Strict. Rides ONLY on refresh calls. Long life (e.g. 14 days,
    configurable).
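A minimal sketch of the two cookies, assuming they are set through keyword arguments shaped like Starlette's `Response.set_cookie`. The helper names are hypothetical, and the `httponly`/`secure` flags are assumptions this issue does not spell out.

```python
import secrets

ACCESS_TOKEN_EXPIRE_MINUTES = 15
REFRESH_TOKEN_EXPIRE_DAYS = 14  # configurable per this issue


def new_refresh_token() -> str:
    """Opaque token: 32 random bytes, base64url-encoded (43 chars, no padding)."""
    return secrets.token_urlsafe(32)


def access_cookie_params(jwt_value: str) -> dict:
    # Sent on every request under "/", so protected routes can verify it statelessly.
    return {
        "key": "aegis_session",
        "value": jwt_value,
        "max_age": ACCESS_TOKEN_EXPIRE_MINUTES * 60,
        "path": "/",
        "samesite": "lax",
        "httponly": True,  # assumption: not stated in the issue
        "secure": True,    # assumption: not stated in the issue
    }


def refresh_cookie_params(token: str) -> dict:
    # Path-scoped so the browser attaches it only to refresh calls.
    return {
        "key": "aegis_refresh",
        "value": token,
        "max_age": REFRESH_TOKEN_EXPIRE_DAYS * 24 * 60 * 60,
        "path": "/api/v1/auth/refresh",
        "samesite": "strict",
        "httponly": True,  # assumption: not stated in the issue
        "secure": True,    # assumption: not stated in the issue
    }
```

In a route handler this would presumably be applied as `response.set_cookie(**refresh_cookie_params(token))`.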

New table

refresh_token
  token         text    PK   -- random 32B base64url, indexed
  user_id       int     FK   -- cascades on user delete
  family_id     uuid    idx  -- groups rotated tokens for reuse detection
  expires_at    timestamp
  revoked_at    timestamp NULL
  created_at    timestamp default now()
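
For illustration only, the table above could be prototyped against SQLite as below. The real change would ship as an Alembic migration. The `users` table name, the index name, and the `TEXT` type for `family_id` are assumptions made to keep the sketch self-contained.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY);  -- hypothetical parent table
CREATE TABLE refresh_token (
    token      TEXT PRIMARY KEY,              -- random 32B base64url
    user_id    INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    family_id  TEXT NOT NULL,                 -- uuid stored as text in SQLite
    expires_at TIMESTAMP NOT NULL,
    revoked_at TIMESTAMP NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX ix_refresh_token_family_id ON refresh_token (family_id);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys (and therefore cascades) only when asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```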

Endpoint changes

  • POST /auth/token — mints both tokens, sets both cookies, returns the
    access token in the body too (API clients still use bearer).
  • POST /auth/register — same (auto-sign-in already does this for the
    access token; add refresh).
  • POST /auth/refresh (NEW) — reads aegis_refresh cookie, validates
    the row (revoked_at IS NULL AND expires_at > NOW()), rotates
    (revokes old, inserts new in the same family), mints a new access
    JWT, sets both cookies fresh.
  • POST /auth/logout: UPDATE refresh_token SET revoked_at = NOW()
    for the caller's token; clears both cookies. Already partially in
    place — just swap blacklist call for refresh revocation.
  • OAuth callback — also issues a refresh token (currently sets only
    access cookie).

Rotation + reuse detection

Every successful /auth/refresh revokes the inbound token and issues a
new one in the same family_id. If a refresh request arrives bearing a
token whose row has revoked_at IS NOT NULL, that's a replay attempt:
revoke the entire family_id (every refresh ever issued in this chain),
log it, force re-auth. Standard OAuth2-style refresh rotation.
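
The rotation and reuse rules above can be sketched with an in-memory stand-in for the refresh_token table. `InMemoryRefreshStore`, `RefreshRow`, and `ReauthRequired` are hypothetical names. A real refresh_service.py would presumably run the lookup, revoke, and insert in a single DB transaction and emit the audit log on reuse.

```python
import secrets
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

REFRESH_TOKEN_EXPIRE_DAYS = 14


class ReauthRequired(Exception):
    """Refresh rejected; the caller should clear cookies and force a login."""


@dataclass
class RefreshRow:
    token: str
    user_id: int
    family_id: str
    expires_at: datetime
    revoked_at: Optional[datetime] = None


class InMemoryRefreshStore:
    """Hypothetical stand-in for the refresh_token table."""

    def __init__(self) -> None:
        self.rows: dict[str, RefreshRow] = {}

    def mint(self, user_id: int, family_id: Optional[str] = None) -> RefreshRow:
        now = datetime.now(timezone.utc)
        row = RefreshRow(
            token=secrets.token_urlsafe(32),
            user_id=user_id,
            # A fresh login starts a new family; rotation passes the old one in.
            family_id=family_id or str(uuid.uuid4()),
            expires_at=now + timedelta(days=REFRESH_TOKEN_EXPIRE_DAYS),
        )
        self.rows[row.token] = row
        return row

    def revoke_family(self, family_id: str) -> None:
        now = datetime.now(timezone.utc)
        for row in self.rows.values():
            if row.family_id == family_id and row.revoked_at is None:
                row.revoked_at = now

    def rotate(self, presented: str) -> RefreshRow:
        now = datetime.now(timezone.utc)
        row = self.rows.get(presented)
        if row is None:
            raise ReauthRequired("unknown refresh token")
        if row.revoked_at is not None:
            # Replay of an already-rotated token: revoke the whole chain.
            self.revoke_family(row.family_id)
            raise ReauthRequired("refresh token reuse detected")
        if row.expires_at <= now:
            raise ReauthRequired("refresh token expired")
        row.revoked_at = now
        return self.mint(row.user_id, family_id=row.family_id)
```

Replaying the first token after one rotation revokes the second token as well, so both the attacker and the legitimate user are pushed back to login.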

APIClient retry

app/core/client.py adds a refresh-on-401 layer. On 401 from any
endpoint other than /auth/refresh itself:

  1. POST /auth/refresh (cookies do the work).
  2. If 200 → retry the original request once.
  3. If still 401 (or refresh itself 401s) → fire on_unauthorized
    (existing path: sign_out/login).

The recursion guard added in #648 (or whatever number lands; it's the
_in_unauthorized flag in APIClient) already protects against
/auth/refresh triggering its own callback.
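
A rough sketch of the retry layer, assuming a synchronous transport for brevity. `RetryingClient`, `Response`, and the `send` callable are hypothetical stand-ins, not the existing APIClient API. The path check here plays the role of the recursion guard.

```python
from dataclasses import dataclass
from typing import Callable

REFRESH_PATH = "/auth/refresh"


@dataclass
class Response:
    status_code: int
    body: str = ""


class RetryingClient:
    """Hypothetical wrapper; `send` stands in for the real HTTP transport."""

    def __init__(
        self,
        send: Callable[[str, str], Response],
        on_unauthorized: Callable[[], None],
    ) -> None:
        self._send = send
        self._on_unauthorized = on_unauthorized

    def request(self, method: str, path: str) -> Response:
        resp = self._send(method, path)
        # Only a 401 from a non-refresh endpoint enters the retry path.
        if resp.status_code != 401 or path == REFRESH_PATH:
            return resp
        # Step 1: try to refresh; cookies carry the credentials.
        refreshed = self._send("POST", REFRESH_PATH)
        if refreshed.status_code == 200:
            # Step 2: retry the original request exactly once.
            resp = self._send(method, path)
            if resp.status_code != 401:
                return resp
        # Step 3: refresh failed or the retry still returned 401.
        self._on_unauthorized()
        return resp
```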

Sequence: access expires mid-session

Browser           APIClient                 FastAPI            DB
  |                  |                         |                |
  | GET /insights    |                         |                |
  |----------------->|                         |                |
  |                  |  (cookie: aegis_session,|                |
  |                  |    expired)             |                |
  |                  |------------------------>|                |
  |                  |                         | JWT exp -> 401 |
  |                  |<------------------------|                |
  |                  |                         |                |
  |                  | POST /auth/refresh      |                |
  |                  |  (cookie: aegis_refresh)|                |
  |                  |------------------------>|                |
  |                  |                         | look up row    |
  |                  |                         |--------------->|
  |                  |                         |<---------------|
  |                  |                         | revoke old,    |
  |                  |                         | insert new     |
  |                  |                         |--------------->|
  |                  |                         | mint new JWT   |
  |                  |  Set-Cookie x2          |                |
  |                  |<------------------------|                |
  |                  |                         |                |
  |                  | retry GET /insights     |                |
  |                  |------------------------>|                |
  |                  |                         | 200 + payload  |
  |                  |<------------------------|                |
  |<-----------------|                         |                |

User sees nothing. Just a 200.

Files to change

backend
  alembic/versions/                              new migration: refresh_token
  app/services/auth/
    refresh_service.py                           NEW: mint, rotate, revoke, detect-reuse
    token_blacklist.py                           DELETE
    auth_service.py                              drop blacklist consult; keep verify_token
  app/components/backend/api/auth/
    router.py.jinja                              /auth/refresh endpoint;
                                                 update /auth/token, /auth/register,
                                                 /auth/logout to use refresh service;
                                                 OAuth callback also mints refresh
  app/core/security.py.jinja                     set/clear refresh cookie helpers
                                                 (mirror existing aegis_session ones)
  app/core/config.py.jinja                       REFRESH_TOKEN_EXPIRE_DAYS,
                                                 REFRESH_COOKIE_NAME constants
  app/services/auth/deps.py.jinja                unchanged (still cookie-or-bearer)

frontend
  app/core/client.py                             refresh-on-401 retry layer

tests
  tests/api/test_auth_endpoints.py.jinja         refresh flow, rotation, reuse detection
  tests/services/test_refresh_service.py         NEW: unit tests
  tests/test_client.py                           refresh-retry path

Acceptance criteria

  • refresh_token table + migration ships.
  • /auth/token and /auth/register set both cookies; OAuth callback too.
  • /auth/refresh rotates and returns new cookies.
  • /auth/logout revokes the refresh; the access token expires naturally.
  • Reuse-after-revoke triggers family-wide revocation + audit event.
  • APIClient retries on 401 transparently; on_unauthorized only fires
    after refresh itself fails.
  • token_blacklist.py is deleted; nothing imports it.
  • Default ACCESS_TOKEN_EXPIRE_MINUTES drops to 15.
  • Default REFRESH_TOKEN_EXPIRE_DAYS lands at 14 (configurable).
  • Tests cover: happy refresh, expired refresh, revoked refresh, reused
    refresh (family revocation), refresh while no cookie present.

Out of scope

  • Redis-backed refresh store. DB is the right default for the same
    reasons it's the right default for any persistent blacklist: auth
    already requires the DB, refresh rows are bounded (one active per
    device per session), and a primary-key lookup on the token is
    cheap. A RefreshTokenStore protocol leaves room to plug Redis in
    later without redesigning anything.
  • Per-device session listing (a la "active sessions" UI). The
    family_id column makes it cheap to add later, but it's not v1.
  • Sliding-window refresh expiration. v1 mints a fixed-life refresh;
    rotation gives a new full-life refresh on each use, which is
    effectively sliding. Explicit sliding (extend on use without rotating)
    is a v2 knob if anyone asks.

Replaces

This issue obsoletes the in-memory-blacklist persistence problem. Deleting
the blacklist module is part of this work.
