Context
The current auth flow has two cracks:
- The blacklist is in-memory. app/services/auth/token_blacklist.py
keeps revoked jtis in a process-local dict. Restart the server and
every previously revoked token is valid again until its natural
ACCESS_TOKEN_EXPIRE_MINUTES window closes. Logout durability is
fiction across deploys.
- Sessions die every 30 minutes. The default is
ACCESS_TOKEN_EXPIRE_MINUTES=30 with no refresh, so users get bounced
to /login twice an hour. Real-world UX is meaningfully worse than what
cookie-session apps deliver.
You can fix the first crack by persisting the blacklist (DB or Redis),
but that treats the symptom and leaves the second untouched. The
standard answer to both is short-lived access tokens + persistent
refresh tokens + rotation, which also adds stolen-token detection at no
extra cost. Doing this lets us delete the blacklist module entirely.
The design
Two tokens, two cookies
- Access token: JWT, 15 minutes, signed claims (sub, role?, jti, exp).
Cookie aegis_session, path=/, SameSite=Lax. Verified statelessly on
every protected request.
- Refresh token: opaque random 32-byte string (NOT a JWT; the
server-side row is the source of truth). Cookie aegis_refresh,
path=/api/v1/auth/refresh, SameSite=Strict. Rides ONLY on refresh
calls. Long life (e.g. 14 days, configurable).
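A minimal sketch of the refresh half of this, using only the standard library: it mints the opaque token and builds the Set-Cookie value with the path and SameSite settings listed above. The helper names are hypothetical, and the HttpOnly and Secure flags are assumptions that the text above does not state.

```python
import secrets
from http.cookies import SimpleCookie

REFRESH_COOKIE_NAME = "aegis_refresh"
REFRESH_COOKIE_PATH = "/api/v1/auth/refresh"
REFRESH_TOKEN_EXPIRE_DAYS = 14  # "e.g. 14 days, configurable"


def new_refresh_token() -> str:
    """Return 32 random bytes encoded as unpadded base64url text."""
    return secrets.token_urlsafe(32)


def refresh_cookie_header(token: str) -> str:
    """Build the Set-Cookie value for the refresh cookie (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie[REFRESH_COOKIE_NAME] = token
    morsel = cookie[REFRESH_COOKIE_NAME]
    morsel["path"] = REFRESH_COOKIE_PATH  # cookie rides only on refresh calls
    morsel["samesite"] = "Strict"
    morsel["max-age"] = REFRESH_TOKEN_EXPIRE_DAYS * 24 * 3600
    morsel["httponly"] = True  # assumption: not stated in the design
    morsel["secure"] = True    # assumption: not stated in the design
    return morsel.OutputString()
```

Because the cookie path is scoped to the refresh endpoint, the browser never attaches the long-lived token to ordinary API requests; only the short-lived aegis_session cookie travels with those.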
New table
refresh_token
  token       text PK                  -- random 32B base64url, indexed
  user_id     int FK                   -- cascades on user delete
  family_id   uuid idx                 -- groups rotated tokens for reuse detection
  expires_at  timestamp
  revoked_at  timestamp NULL
  created_at  timestamp default now()
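For illustration, the table above could be expressed as DDL roughly like this. The sketch runs against in-memory SQLite so it is self-contained; the real migration would go through Alembic against the project's database, and the app_user table name, the index name, and storing family_id as text are assumptions made for the example.

```python
import sqlite3

# Assumption: "app_user" stands in for whatever the real user table is called.
SCHEMA = """
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY
);
CREATE TABLE refresh_token (
    token      TEXT PRIMARY KEY,                  -- random 32B base64url
    user_id    INTEGER NOT NULL
               REFERENCES app_user(id) ON DELETE CASCADE,
    family_id  TEXT NOT NULL,                     -- uuid, stored as text in SQLite
    expires_at TIMESTAMP NOT NULL,
    revoked_at TIMESTAMP NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX ix_refresh_token_family_id ON refresh_token (family_id);
"""


def open_db() -> sqlite3.Connection:
    """Open an in-memory database with the sketch schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
    conn.executescript(SCHEMA)
    return conn
```

The primary key on token already gives the indexed lookup the refresh endpoint needs; the separate index on family_id serves family-wide revocation.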
Endpoint changes
- POST /auth/token: mints both tokens, sets both cookies, returns the
access token in the body too (API clients still use bearer).
- POST /auth/register: same (auto-sign-in already does this for the
access token; add refresh).
- POST /auth/refresh (NEW): reads the aegis_refresh cookie, validates
the row (revoked_at IS NULL AND expires_at > NOW()), rotates
(revokes old, inserts new in the same family), mints a new access
JWT, sets both cookies fresh.
- POST /auth/logout: UPDATE refresh_token SET revoked_at = NOW()
for the caller's token; clears both cookies. Already partially in
place: swap the blacklist call for refresh revocation.
- OAuth callback: also issues a refresh token (currently sets only
the access cookie).
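The row check used by /auth/refresh and the revocation done by /auth/logout might look like the following sketch. It uses sqlite3 and UTC ISO-8601 strings purely to stay self-contained; the function names, the text timestamp columns, and passing "now" in from Python instead of calling NOW() are all assumptions.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional, Tuple

# Trimmed-down table for the sketch; timestamps are stored as ISO-8601 text.
SCHEMA = """CREATE TABLE refresh_token (
    token TEXT PRIMARY KEY, user_id INTEGER NOT NULL, family_id TEXT NOT NULL,
    expires_at TEXT NOT NULL, revoked_at TEXT)"""


def iso(dt: datetime) -> str:
    """UTC ISO-8601 with fixed precision, so text comparison follows time order."""
    return dt.astimezone(timezone.utc).isoformat(timespec="microseconds")


def find_active_refresh(
    conn: sqlite3.Connection, token: str, now: datetime
) -> Optional[Tuple[int, str]]:
    """Return (user_id, family_id) if the row is unrevoked and unexpired, else None."""
    return conn.execute(
        "SELECT user_id, family_id FROM refresh_token "
        "WHERE token = ? AND revoked_at IS NULL AND expires_at > ?",
        (token, iso(now)),
    ).fetchone()


def revoke_on_logout(conn: sqlite3.Connection, token: str, now: datetime) -> bool:
    """Mark the caller's refresh token revoked; True if a live row was updated."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE refresh_token SET revoked_at = ? "
            "WHERE token = ? AND revoked_at IS NULL",
            (iso(now), token),
        )
    return cur.rowcount == 1
```

Keeping revoked_at IS NULL in the UPDATE makes logout idempotent: a second logout with the same cookie changes nothing and leaves the original revocation time intact.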
Rotation + reuse detection
Every successful /auth/refresh revokes the inbound token and issues a
new one in the same family_id. If a refresh request arrives bearing a
token whose row has revoked_at IS NOT NULL, that's a replay attempt:
revoke the entire family_id (every refresh ever issued in this chain),
log it, force re-auth. Standard OAuth2-style refresh rotation.
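The rotation and reuse-detection rule can be sketched as below. This is an illustration under stated assumptions, not the contents of refresh_service.py: the names mint, rotate and RefreshRejected are hypothetical, timestamps are ISO-8601 text in SQLite, and a production version would also need to guard against two concurrent refreshes of the same token.

```python
import secrets
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

REFRESH_TOKEN_EXPIRE_DAYS = 14

SCHEMA = """CREATE TABLE refresh_token (
    token TEXT PRIMARY KEY, user_id INTEGER NOT NULL, family_id TEXT NOT NULL,
    expires_at TEXT NOT NULL, revoked_at TEXT)"""


class RefreshRejected(Exception):
    """The presented refresh token cannot be exchanged; force re-auth."""


def iso(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).isoformat(timespec="microseconds")


def mint(conn: sqlite3.Connection, user_id: int, now: datetime,
         family_id: Optional[str] = None) -> str:
    """Insert a new refresh row; a missing family_id starts a new family."""
    token = secrets.token_urlsafe(32)
    expires = iso(now + timedelta(days=REFRESH_TOKEN_EXPIRE_DAYS))
    conn.execute(
        "INSERT INTO refresh_token (token, user_id, family_id, expires_at) "
        "VALUES (?, ?, ?, ?)",
        (token, user_id, family_id or str(uuid.uuid4()), expires),
    )
    return token


def rotate(conn: sqlite3.Connection, presented: str, now: datetime) -> str:
    """Exchange a live refresh token for a new one in the same family."""
    row = conn.execute(
        "SELECT user_id, family_id, expires_at, revoked_at "
        "FROM refresh_token WHERE token = ?",
        (presented,),
    ).fetchone()
    if row is None:
        raise RefreshRejected("unknown refresh token")
    user_id, family_id, expires_at, revoked_at = row
    if revoked_at is not None:
        # Replay of an already-rotated token: revoke every live token in the chain.
        with conn:
            conn.execute(
                "UPDATE refresh_token SET revoked_at = ? "
                "WHERE family_id = ? AND revoked_at IS NULL",
                (iso(now), family_id),
            )
        raise RefreshRejected("refresh token reuse detected; family revoked")
    if expires_at <= iso(now):
        raise RefreshRejected("refresh token expired")
    with conn:  # revoke-old and insert-new commit together
        conn.execute(
            "UPDATE refresh_token SET revoked_at = ? WHERE token = ?",
            (iso(now), presented),
        )
        return mint(conn, user_id, now, family_id)
```

If an attacker replays a stolen token after the legitimate client has rotated it (or the other way round), the second presenter hits the revoked row, and the family-wide UPDATE also kills whichever token the first presenter is currently holding.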
APIClient retry
app/core/client.py adds a refresh-on-401 layer. On 401 from any
endpoint other than /auth/refresh itself:
- POST /auth/refresh (cookies do the work).
- If 200 → retry the original request once.
- If still 401 (or refresh itself 401s) → fire on_unauthorized
(existing path: sign_out → /login).
The recursion guard added in #648 (or whatever number lands; it's the
_in_unauthorized flag in APIClient) already protects against
/auth/refresh triggering its own callback.
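The retry rule above can be sketched independently of the real APIClient. In this illustration the transport is a plain callable that returns a status code, the RetryingClient class and its send parameter are hypothetical, and the _in_unauthorized flag is reproduced only to show where the existing recursion guard would sit.

```python
from typing import Callable

REFRESH_PATH = "/auth/refresh"


class RetryingClient:
    """Refresh-on-401 layer around a transport callable (hypothetical shape)."""

    def __init__(self, send: Callable[[str, str], int],
                 on_unauthorized: Callable[[], None]) -> None:
        self._send = send                      # (method, path) -> HTTP status
        self._on_unauthorized = on_unauthorized
        self._in_unauthorized = False          # recursion guard

    def request(self, method: str, path: str) -> int:
        status = self._send(method, path)
        # No retry layer for non-401s or for the refresh endpoint itself.
        if status != 401 or path == REFRESH_PATH:
            return status
        # Step 1: try to refresh; the cookies carry the credentials.
        if self._send("POST", REFRESH_PATH) == 200:
            # Step 2: retry the original request exactly once.
            status = self._send(method, path)
            if status != 401:
                return status
        # Step 3: refresh failed or the retry still 401s.
        self._fire_unauthorized()
        return 401

    def _fire_unauthorized(self) -> None:
        if self._in_unauthorized:
            return
        self._in_unauthorized = True
        try:
            self._on_unauthorized()
        finally:
            self._in_unauthorized = False
```

Retrying only once keeps a broken refresh endpoint from turning a single expired session into a request loop.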
Sequence: access expires mid-session
Browser            APIClient                 FastAPI          DB
|                  |                         |                |
| GET /insights    |                         |                |
|----------------->|                         |                |
|                  | (cookie: aegis_session, |                |
|                  |  expired)               |                |
|                  |------------------------>|                |
|                  |                         | JWT exp -> 401 |
|                  |<------------------------|                |
|                  |                         |                |
|                  | POST /auth/refresh      |                |
|                  | (cookie: aegis_refresh) |                |
|                  |------------------------>|                |
|                  |                         | look up row    |
|                  |                         |--------------->|
|                  |                         |<---------------|
|                  |                         | revoke old,    |
|                  |                         | insert new     |
|                  |                         |--------------->|
|                  |                         | mint new JWT   |
|                  | Set-Cookie x2           |                |
|                  |<------------------------|                |
|                  |                         |                |
|                  | retry GET /insights     |                |
|                  |------------------------>|                |
|                  |                         | 200 + payload  |
|                  |<------------------------|                |
|<-----------------|                         |                |
User sees nothing. Just a 200.
Files to change
backend
  alembic/versions/                        new migration: refresh_token
  app/services/auth/
    refresh_service.py                     NEW: mint, rotate, revoke, detect-reuse
    token_blacklist.py                     DELETE
    auth_service.py                        drop blacklist consult; keep verify_token
  app/components/backend/api/auth/
    router.py.jinja                        /auth/refresh endpoint;
                                           update /auth/token, /auth/register,
                                           /auth/logout to use refresh service;
                                           OAuth callback also mints refresh
  app/core/security.py.jinja               set/clear refresh cookie helpers
                                           (mirror existing aegis_session ones)
  app/core/config.py.jinja                 REFRESH_TOKEN_EXPIRE_DAYS,
                                           REFRESH_COOKIE_NAME constants
  app/services/auth/deps.py.jinja          unchanged (still cookie-or-bearer)
frontend
  app/core/client.py                       refresh-on-401 retry layer
tests
  tests/api/test_auth_endpoints.py.jinja   refresh flow, rotation, reuse detection
  tests/services/test_refresh_service.py   NEW: unit tests
  tests/test_client.py                     refresh-retry path
Out of scope
- Redis-backed refresh store. DB is the right default for the same
reasons it's the right default for any persistent blacklist: auth
already requires the DB, refresh rows are bounded (one active per
device per session), and indexed lookup is microseconds. A
RefreshTokenStore protocol leaves room to plug Redis in later
without redesigning anything.
- Per-device session listing (a la "active sessions" UI). The
family_id column makes it cheap to add later, but it's not v1.
- Sliding-window refresh expiration. v1 mints a fixed-life refresh;
rotation gives a new full-life refresh on each use, which is
effectively sliding. Explicit sliding (extend on use without rotating)
is a v2 knob if anyone asks.
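As a rough illustration of the RefreshTokenStore seam mentioned in the first item, the protocol might look something like the sketch below. The method names, the RefreshRow dataclass, and the dict-backed implementation are all hypothetical; the dict version is meant only as a test double, since v1 stores rows in the DB.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Protocol, runtime_checkable


@dataclass
class RefreshRow:
    token: str
    user_id: int
    family_id: str
    expires_at: datetime
    revoked_at: Optional[datetime] = None


@runtime_checkable
class RefreshTokenStore(Protocol):
    """Storage operations the refresh service needs (hypothetical method set)."""

    def get(self, token: str) -> Optional[RefreshRow]: ...
    def insert(self, row: RefreshRow) -> None: ...
    def revoke(self, token: str, now: datetime) -> None: ...
    def revoke_family(self, family_id: str, now: datetime) -> None: ...


class DictRefreshTokenStore:
    """Dict-backed test double; a DB or Redis store would expose the same methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, RefreshRow] = {}

    def get(self, token: str) -> Optional[RefreshRow]:
        return self._rows.get(token)

    def insert(self, row: RefreshRow) -> None:
        self._rows[row.token] = row

    def revoke(self, token: str, now: datetime) -> None:
        row = self._rows.get(token)
        if row is not None and row.revoked_at is None:
            row.revoked_at = now

    def revoke_family(self, family_id: str, now: datetime) -> None:
        for row in self._rows.values():
            if row.family_id == family_id and row.revoked_at is None:
                row.revoked_at = now
```

With the service written against the protocol, swapping the backing store later touches only the class that implements these four methods.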
Replaces
This issue obsoletes the in-memory-blacklist persistence problem. Closing
the blacklist module is part of this work.
Acceptance criteria
- refresh_token table + migration ships.
- /auth/token and /auth/register set both cookies; OAuth callback too.
- /auth/refresh rotates and returns new cookies.
- /auth/logout revokes the refresh; the access token expires naturally.
- on_unauthorized only fires after refresh itself fails.
- token_blacklist.py is deleted; nothing imports it.
- ACCESS_TOKEN_EXPIRE_MINUTES drops to 15.
- REFRESH_TOKEN_EXPIRE_DAYS lands at 14 (configurable).
- [...] refresh (family revocation), refresh while no cookie present.