
Commit bd31069

add docs
1 parent c5b4977 commit bd31069

6 files changed

Lines changed: 249 additions & 81 deletions


docs/database-management.md

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
# Database management

MLPA talks to two PostgreSQL databases. This doc covers what lives where, how the
connection pools are configured, and the query timeout budgets (why we have a few
of them and which query uses which).

## The two databases

| DB | Owner | What MLPA does with it |
|----|-------|------------------------|
| `litellm` (`LITELLM_DB_NAME`) | LiteLLM | Reads/writes a couple of tables directly for things the free-tier LiteLLM API doesn't expose (block/unblock, budget tier change, user listing/counts). |
| `app_attest` (`APP_ATTEST_DB_NAME`) | MLPA (via Alembic) | App Attest challenges + keys, and the signup capacity state. |

Each DB gets its own asyncpg pool, wrapped in a `PGService`
(`src/mlpa/core/pg_services/`):

- `LiteLLMPGService` → `litellm`
- `AppAttestPGService` → `app_attest` (also holds a reference to the litellm
  service, because the capacity gate reads from both)

## Tables

### litellm DB (LiteLLM owns the schema)

**`LiteLLM_EndUserTable`** - one row per end user.

- `user_id` is `{base_identity}:{service_type}`, e.g. `fxa_uid:ai`. The colon is
  load-bearing: we `split_part(user_id, ':', ...)` all over the place.
- MLPA uses: `user_id`, `budget_id` (which tier the user is on), `blocked`.
- Touched by: `get_user`, `list_users`, `update_user_budget`, `block_user`,
  `count_users_by_service_type`, `list_managed_base_identities`,
  `has_managed_user_rows`.
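
The Python below is a small model of Postgres `split_part()`. It uses a 1-based field index and returns an empty string when the field is missing. It is applied to a `user_id` to show why the colon is load-bearing. `make_user_id` is a hypothetical helper for illustration, not something MLPA is known to define. It refuses parts containing a colon, because a stray colon would shift every field after it.

```python
def split_part(s: str, delimiter: str, n: int) -> str:
    """Model of Postgres split_part(s, delimiter, n) for n >= 1.

    Returns the n-th field (1-based), or "" when there are fewer than n fields,
    matching what Postgres returns in that case.
    """
    if n < 1:
        raise ValueError("this sketch only models n >= 1")
    fields = s.split(delimiter)
    return fields[n - 1] if n <= len(fields) else ""


def make_user_id(base_identity: str, service_type: str) -> str:
    """Hypothetical helper: build `{base_identity}:{service_type}`."""
    if ":" in base_identity or ":" in service_type:
        # A stray colon would shift every split_part() field after it.
        raise ValueError("user_id parts must not contain ':'")
    return f"{base_identity}:{service_type}"


uid = make_user_id("fxa_uid", "ai")
print(split_part(uid, ":", 1), split_part(uid, ":", 2))  # fxa_uid ai
print(repr(split_part("fxa_uid", ":", 2)))  # '' (no colon, no service type)
```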

**`LiteLLM_BudgetTable`** - the budget tiers (one per service type).

- `budget_id`, `max_budget`, `rpm_limit`, `tpm_limit`, `budget_duration`.
- MLPA upserts all tiers from config on startup (`create_budget()`), so changing
  a limit in `config.py` takes effect on the next restart, not live.
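
Re-running an upsert on every boot is what makes a config change land on restart. The sketch below shows this with Python's built-in `sqlite3`. SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` stands in for the Postgres upsert. This assumes `create_budget()` has a similar shape, since its full SQL isn't shown in this doc. The tier values are made up.

```python
import sqlite3

# SQLite stand-in for the Postgres table (assumption: similar upsert shape).
# Column names come from the list above; types and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "LiteLLM_BudgetTable" ('
    "budget_id TEXT PRIMARY KEY, max_budget REAL, rpm_limit INTEGER, "
    "tpm_limit INTEGER, budget_duration TEXT)"
)

UPSERT = """
INSERT INTO "LiteLLM_BudgetTable"
    (budget_id, max_budget, rpm_limit, tpm_limit, budget_duration)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (budget_id) DO UPDATE SET
    max_budget = excluded.max_budget,
    rpm_limit = excluded.rpm_limit,
    tpm_limit = excluded.tpm_limit,
    budget_duration = excluded.budget_duration
"""


def create_budget(tiers: dict[str, dict]) -> None:
    """Upsert every tier from config; safe to run on every startup."""
    for budget_id, cfg in tiers.items():
        conn.execute(
            UPSERT,
            (budget_id, cfg["max_budget"], cfg["rpm_limit"],
             cfg["tpm_limit"], cfg["budget_duration"]),
        )
    conn.commit()


# Hypothetical tier config: first boot, then a "restart" with a raised rpm_limit.
tiers = {"ai": {"max_budget": 1.0, "rpm_limit": 10, "tpm_limit": 1000, "budget_duration": "1d"}}
create_budget(tiers)
tiers["ai"]["rpm_limit"] = 20
create_budget(tiers)
print(conn.execute('SELECT budget_id, rpm_limit FROM "LiteLLM_BudgetTable"').fetchall())
# [('ai', 20)]
```

The second run updates the existing row instead of failing on the primary key. That is why editing a limit and restarting is enough.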

### app_attest DB (MLPA owns the schema, managed by Alembic)

**`challenges`** - App Attest challenge nonces.

| Column | Type | Notes |
|--------|------|-------|
| `key_id_b64` | `VARCHAR(255)` PK | the attested key id |
| `challenge` | `VARCHAR(255)` | the nonce we issued |
| `created_at` | `TIMESTAMPTZ` | the challenge expires `CHALLENGE_EXPIRY_SECONDS` (300s) after this |

**`public_keys`** - the iOS attested key + replay counter.

| Column | Type | Notes |
|--------|------|-------|
| `key_id_b64` | `VARCHAR(255)` PK | |
| `public_key_pem` | `TEXT` | attested public key |
| `counter` | `BIGINT` | assertion counter, only goes up (replay protection) |
| `created_at` / `updated_at` | `TIMESTAMPTZ` | |
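
The "only goes up" rule for `counter` can be sketched as a pure function. The names are hypothetical, and MLPA's actual check isn't shown in this doc. An assertion is rejected unless its counter is strictly greater than the stored one, which stops a captured assertion from being replayed.

```python
class ReplayError(Exception):
    """Raised when an assertion's counter does not move forward."""


def next_counter(stored: int, received: int) -> int:
    """Return the counter to persist, or raise if the assertion looks replayed.

    The counter must be strictly greater than the stored value; an equal or
    lower counter means this assertion (or an older one) was already seen.
    """
    if received <= stored:
        raise ReplayError(f"counter {received} is not greater than stored {stored}")
    return received


print(next_counter(stored=41, received=42))  # 42
try:
    next_counter(stored=42, received=42)
except ReplayError as e:
    print("rejected:", e)  # rejected: counter 42 is not greater than stored 42
```

Done as a separate read and write, two concurrent requests could both pass this check. One way to close that race is to put the comparison in the `UPDATE`'s `WHERE counter < $new` clause and treat "0 rows updated" as a replay. This is an assumption, not MLPA's documented approach.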

**`mlpa_user_capacity`** - the signup cap counter. Single row.

| Column | Type | Notes |
|--------|------|-------|
| `id` | `SMALLINT` PK `CHECK (id = 1)` | singleton, always 1 |
| `max_identities` | `BIGINT` | the cap (`MLPA_MAX_SIGNED_IN_USERS`) |
| `current_identities` | `BIGINT` | how many distinct identities are claimed |
| `updated_at` | `TIMESTAMPTZ` | |

**`mlpa_user_capacity_identities`** - one row per claimed identity.

| Column | Type | Notes |
|--------|------|-------|
| `base_identity` | `TEXT` PK | the `{base_identity}` part of `user_id` |
| `created_at` | `TIMESTAMPTZ` | |

On startup, `ensure_capacity_state()` rebuilds the claim table from the distinct
base identities in `LiteLLM_EndUserTable` whose service type is capped
(`MLPA_CAPPED_SERVICE_TYPES`). It then refreshes `current_identities`, so the two
capacity tables don't drift from the real user base.
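
The reconcile's result can be modelled in plain Python under a few assumptions. `user_id` is split at the first colon, as described above, and only users whose service type is in the capped set count. The claim table ends up as the distinct base identities, and `current_identities` is their count. The function name and sample ids are hypothetical; the real work is SQL in `_reconcile_capacity_claims`.

```python
def reconcile_capacity(
    user_ids: list[str], capped_service_types: set[str], max_identities: int
) -> dict:
    """Model of the startup reconcile (hypothetical name, not MLPA's code).

    claims             ~ rows of mlpa_user_capacity_identities
    current_identities ~ mlpa_user_capacity.current_identities
    """
    claims: set[str] = set()
    for user_id in user_ids:
        base_identity, _, service_type = user_id.partition(":")
        if service_type in capped_service_types:
            # One claim per base identity, however many rows it has.
            claims.add(base_identity)
    return {
        "max_identities": max_identities,
        "current_identities": len(claims),
        "claims": sorted(claims),
    }


# Hypothetical rows: alice has two service types, carol only an uncapped one.
rows = ["alice:ai", "alice:other", "bob:ai", "carol:other"]
print(reconcile_capacity(rows, capped_service_types={"ai"}, max_identities=1000))
# {'max_identities': 1000, 'current_identities': 2, 'claims': ['alice', 'bob']}
```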

## Connection pool

Set up in `PGService.connect()`, configured from `config.py`:

| Setting | Default | What |
|---------|---------|------|
| `PG_POOL_MIN_SIZE` | 1 | min connections |
| `PG_POOL_MAX_SIZE` | 10 | max connections |
| `PG_PREPARED_STMT_CACHE_MAX_SIZE` | 100 | prepared statement cache size |

On connect we set these server-side (per session, so they apply to every
connection the pool opens and every query run on it):

- `statement_timeout` = `PG_STATEMENT_TIMEOUT_MS`
- `idle_in_transaction_session_timeout` = `PG_IDLE_IN_TX_TIMEOUT_MS`
- `application_name` = `mlpa:{db_name}` (handy for `pg_stat_activity`)

## Timeout budgets

The idea: keep a tight default so a runaway query gets killed by Postgres even if
the client or event loop hangs (no connection pile-up). Then raise the budget
only for the few queries that legitimately need longer.

| Budget | Default | Used for |
|--------|---------|----------|
| `PG_STATEMENT_TIMEOUT_MS` | 3000 (3s) | pool default, every query unless raised |
| `PG_IDLE_IN_TX_TIMEOUT_MS` | 10000 (10s) | reaps sessions left idle mid-transaction |
| `PG_ADMIN_READ_TIMEOUT_MS` | 15000 (15s) | admin reads that full-scan the user table |
| `PG_MAINTENANCE_STATEMENT_TIMEOUT_MS` | 30000 (30s) | startup reconciliation (bigger scans) |
| `MLPA_ADMISSION_LOCK_TIMEOUT_MS` | 5000 (5s) | `lock_timeout` for the capacity row `FOR UPDATE` |
| `PG_COMMAND_TIMEOUT_S` | None | optional asyncpg client-side backstop, off by default |

All values are in milliseconds, with 0 = unlimited. The exception is
`PG_COMMAND_TIMEOUT_S`, which is in seconds and is disabled when `None`.

### How a budget gets applied

Two ways:

1. **Pool-wide** via `server_settings` (the 3s `statement_timeout` and 10s
   idle-in-tx). This is the baseline for everything.
2. **Per-transaction** via `SET LOCAL`, using two context managers in
   `PGService`. `SET LOCAL` only lasts for the transaction, so the connection
   goes back to the pool defaults on release.

- `statement_timeout(ms)` - raises `statement_timeout` AND idle-in-tx to the
  same `ms`. idle-in-tx has to match. Otherwise, if an `await` lands between
  statements, the 10s reaper could kill a transaction we deliberately gave a
  longer budget.
- `admission_transaction()` - the capacity gate path. Sets `lock_timeout` =
  `MLPA_ADMISSION_LOCK_TIMEOUT_MS`, and `statement_timeout` = `lock_timeout +
  PG_STATEMENT_TIMEOUT_MS` (so 5s + 3s = 8s). The statement budget has to sit
  above the lock budget, because Postgres counts lock-wait time toward
  `statement_timeout`. Otherwise the 3s default would cap the lock wait
  before the 5s `lock_timeout` ever fired.

### Which query uses which budget

| Budget | Queries |
|--------|---------|
| default 3s | challenge + key CRUD, `get_user`, `update_user_budget`, `block_user`, `create_budget` upsert |
| admin-read 15s | `list_users` (COUNT(*) + deep OFFSET), `count_users_by_service_type` (GROUP BY `split_part`), `has_managed_user_rows` (EXISTS) |
| maintenance 30s | `list_managed_base_identities` (DISTINCT scan), `_reconcile_capacity_claims` (bulk DELETE + INSERT) |
| admission 8s | `admit_managed_base_identity`, `maybe_release_managed_base_identity_if_no_managed_users` |

The admin-read and maintenance queries all grow with the user base:

- `list_users` pays for `COUNT(*)` and a deep `OFFSET`.
- The other `LiteLLM_EndUserTable` reads filter or group on a part of `user_id`
  (`base:service_type`) via `split_part`/`position`. An index on `user_id` can't
  serve that, so they full-scan the table.
- `_reconcile_capacity_claims` rewrites the claim table in bulk.

Any of these can blow past 3s on a large user base, so they get a bigger budget
instead.
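
The "unindexable" point is shown below with an analogy in SQLite, chosen because it ships with Python. Postgres plans look different, but the principle carries over. Equality on the key can use the primary-key index. A predicate on a function of the key cannot, so the planner falls back to visiting every row. SQLite has no `split_part`, so `substr`/`instr` stand in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "LiteLLM_EndUserTable" '
    "(user_id TEXT PRIMARY KEY, budget_id TEXT, blocked INTEGER)"
)


def plan(sql: str, *params: str) -> str:
    """Join the 'detail' column (last field) of EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


by_key = plan(
    'SELECT budget_id FROM "LiteLLM_EndUserTable" WHERE user_id = ?', "fxa_uid:ai"
)
# Filter on the service-type part of the key, split_part-style.
by_service_type = plan(
    'SELECT budget_id FROM "LiteLLM_EndUserTable" '
    "WHERE substr(user_id, instr(user_id, ':') + 1) = ?",
    "ai",
)
print(by_key)           # a SEARCH ... USING INDEX line: the key's index is used
print(by_service_type)  # a SCAN line: every row is visited
```

The exact wording of the plan lines varies between SQLite versions. The SEARCH versus SCAN distinction is the part that matters.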

### Cross-pool read ordering

The capacity reconcile and release paths read from the litellm pool and then open
a transaction on the app_attest pool. That read always happens BEFORE the
app_attest transaction opens. If it happened inside the transaction, the
app_attest session would sit idle-in-transaction across the cross-pool `await`.
The idle-in-tx reaper could then kill it and abort the work. On the release path
that session would also be holding the `FOR UPDATE` lock, and an aborted release
leaks a capacity claim. See `_reconcile_capacity_claims` and
`maybe_release_managed_base_identity_if_no_managed_users`.
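
Here is the ordering rule as a runnable sketch, with in-memory stand-ins for the two pools. All names are hypothetical. The real release path also locks and updates the capacity row inside the transaction. The sketch is only about the order of events, which the `events` list records.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


async def litellm_has_managed_rows(base_identity: str) -> bool:
    """Stand-in for the litellm-pool read (hypothetical)."""
    events.append("litellm read")
    await asyncio.sleep(0)  # the cross-pool await
    return False


@asynccontextmanager
async def app_attest_transaction():
    """Stand-in for a transaction on the app_attest pool (hypothetical)."""
    events.append("app_attest BEGIN")
    yield
    events.append("app_attest COMMIT")


async def release_if_unused(base_identity: str) -> bool:
    # 1. Cross-pool read first: no app_attest transaction is open yet, so no
    #    session can sit idle-in-transaction while this await is pending.
    if await litellm_has_managed_rows(base_identity):
        return False
    # 2. Only now open the app_attest transaction; nothing inside it awaits
    #    the other pool.
    async with app_attest_transaction():
        events.append("delete claim")
    return True


print(asyncio.run(release_if_unused("fxa_uid")), events)
# True ['litellm read', 'app_attest BEGIN', 'delete claim', 'app_attest COMMIT']
```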

### Client-side backstop

`PG_COMMAND_TIMEOUT_S` is asyncpg's own client-side cancel, and it's off by
default. Careful: it is NOT relaxed by the per-transaction `SET LOCAL` budgets. If
you turn it on, set it above `PG_MAINTENANCE_STATEMENT_TIMEOUT_MS` (30s). Because
this setting is in seconds, that means a value greater than `30`. Otherwise it
will cancel the maintenance and admin reads.

### These timeouts only apply to MLPA

All of the above is set by MLPA on its own connection pools. The defaults are
asyncpg `server_settings` applied at connect time, and the raised budgets are
`SET LOCAL` inside a transaction. It's per-session, not database-wide and not on
the DB role. Nothing in the migrations or scripts sets `statement_timeout` at the
`ALTER DATABASE` / `ALTER ROLE` level.

So anything else that connects to these databases on its own session is NOT
affected. That includes the cleanup cron job in the llm-proxy infra, LiteLLM
itself, and the Cloud SQL console. They run with the Postgres default (usually
unlimited) unless someone sets a timeout for that role separately. The cron job
can take as long as it needs; the 3s default won't touch it.

## Migrations

Alembic manages the `app_attest` DB only. LiteLLM manages its own schema.

```bash
uv run alembic upgrade head       # apply
uv run alembic downgrade -1       # roll back one
uv run alembic revision -m "..."  # new migration
```

The `mlpa_user_capacity*` tables are created by migration, then reconciled on
every startup via `ensure_capacity_state()`. Deploy runs
`scripts/migrate-app-attest-database.sh` with `-x sqlalchemy.url=...`.

## Startup work

The `lifespan` in `run.py` does two DB things on boot:

1. `litellm_pg.create_budget()` - upsert all budget tiers from config.
2. `app_attest_pg.ensure_capacity_state()` - seed the singleton capacity row
   (fatal on failure: without the row every admission returns a 500), then
   reconcile the claim table (best-effort: if that fails, the row keeps a stale
   count and admissions still work).
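
Here is the fatal-seed / best-effort-reconcile split, sketched with async callables standing in for the DB work. `seed_then_reconcile`, `ok`, `fail` and the logger name are hypothetical. Only the control flow mirrors what `ensure_capacity_state()` is described as doing.

```python
import asyncio
import logging

logger = logging.getLogger("mlpa.startup")  # hypothetical logger name


async def seed_then_reconcile(seed, reconcile) -> None:
    """Fatal seed, best-effort reconcile.

    seed and reconcile are async callables standing in for the real DB work.
    """
    await seed()  # deliberately unguarded: a failure propagates and aborts startup
    try:
        await reconcile()
    except Exception:
        # The row exists, just with a possibly stale count; keep serving.
        logger.exception("capacity reconcile failed; continuing with stale count")


async def ok() -> None:
    return None


async def fail() -> None:
    raise RuntimeError("db unavailable")


asyncio.run(seed_then_reconcile(ok, fail))  # logs the error, returns normally
try:
    asyncio.run(seed_then_reconcile(fail, ok))
except RuntimeError:
    print("startup aborted: seed failed")
```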

src/mlpa/core/config.py

Lines changed: 9 additions & 14 deletions
@@ -295,24 +295,19 @@ def valid_service_type_for_model(self, service_type: str, model: str) -> bool:
 PG_POOL_MAX_SIZE: int = 10
 PG_PREPARED_STMT_CACHE_MAX_SIZE: int = 100
 READINESS_CHECK_TIMEOUT_S: float = 2.0
-# Postgres query timeouts. Server-enforced (statement_timeout) so a runaway
-# query is killed even if the client/event loop hangs, preventing connection
-# pile-up. Values are milliseconds; 0 = unlimited (Postgres semantics).
+# Server-enforced query timeout (ms, 0 = unlimited): kills a runaway query
+# even if the client or event loop hangs, so connections don't pile up.
 PG_STATEMENT_TIMEOUT_MS: int = 3000
-# Reaps transactions left idle between statements (releases held locks).
-# Should be >= statement_timeout since a tx legitimately spans round-trips.
+# Reaps sessions left idle mid-transaction (releasing their locks). Keep this
+# >= statement_timeout, since a transaction can legitimately span round-trips.
 PG_IDLE_IN_TX_TIMEOUT_MS: int = 10000
-# Raised budget for heavy startup work (capacity reconciliation), applied
-# per-transaction via SET LOCAL. 0 = unlimited.
+# Raised budget for heavy startup reconciliation, applied per-transaction via SET LOCAL.
 PG_MAINTENANCE_STATEMENT_TIMEOUT_MS: int = 30000
-# Raised budget for client-facing admin reads that do unindexable full-table
-# scans (user listing, counts-by-service-type). 0 = unlimited.
+# Raised budget for admin reads that full-scan the user table (listing, counts).
 PG_ADMIN_READ_TIMEOUT_MS: int = 15000
-# Optional asyncpg client-side backstop (seconds). None = disabled.
-# WARNING: this is a pool-level client-side cancel that is NOT relaxed by the
-# per-transaction SET LOCAL statement_timeout. If enabled, set it above the
-# largest per-statement budget (PG_MAINTENANCE_STATEMENT_TIMEOUT_MS), or it
-# will silently cancel the maintenance/admin-read queries.
+# Optional asyncpg client-side timeout (seconds, None = off). Unlike the SET
+# LOCAL budgets above, this one isn't relaxed by them, so keep it above
+# PG_MAINTENANCE_STATEMENT_TIMEOUT_MS or it'll cancel those queries.
 PG_COMMAND_TIMEOUT_S: float | None = None
 
 # LLM request default values

src/mlpa/core/pg_services/app_attest_pg_service.py

Lines changed: 10 additions & 15 deletions
@@ -104,12 +104,10 @@ async def ensure_capacity_state(self) -> None:
 """
 Seed the singleton capacity row, then reconcile the claim table.
 
-The seed is critical and fatal on failure: without the row every
-admission 500s, so a failure should crash startup rather than serve
-broken. Reconciliation is best-effort (see _reconcile_capacity_claims):
-if it fails the row still exists with a stale count and admissions work.
+The seed is fatal on failure: without the row every admission 500s, so we
+let it crash startup. Reconciliation is best-effort — if it fails the row
+still holds a stale count and admissions keep working.
 """
-# Seed the singleton row (fatal on failure).
 async with self.pool.acquire() as conn:
 async with conn.transaction():
 await conn.execute(
@@ -130,7 +128,6 @@ async def ensure_capacity_state(self) -> None:
 env.MLPA_MAX_SIGNED_IN_USERS,
 )
 
-# Reconcile the claim table (best-effort).
 try:
 await self._reconcile_capacity_claims()
 except Exception as e:
@@ -143,16 +140,15 @@ async def _reconcile_capacity_claims(self) -> None:
 """Rebuild the claim table from LiteLLM and refresh current_identities."""
 managed_service_types = list(env.MLPA_CAPPED_SERVICE_TYPES)
 
-# Read from the litellm pool before opening the app_attest transaction:
-# doing it inside would leave the session idle-in-transaction across a
+# Read the litellm pool before opening the app_attest transaction: doing
+# it inside would leave the session idle-in-transaction across the
 # cross-pool await, where idle_in_transaction_session_timeout could reap it.
 base_identities = await self.litellm_pg.list_managed_base_identities(
 managed_service_types
 )
 
-# Bulk delete + insert scales with the user base and can exceed the tight
-# pool-wide statement_timeout. Statements run back-to-back (no inter-
-# statement await), so the raised statement_timeout alone suffices.
+# The bulk delete + insert grows with the user base, so run it under the
+# raised maintenance budget rather than the tight pool default.
 async with self.statement_timeout(
 env.PG_MAINTENANCE_STATEMENT_TIMEOUT_MS
 ) as conn:
@@ -258,10 +254,9 @@ async def maybe_release_managed_base_identity_if_no_managed_users(
 
 managed_service_types = list(env.MLPA_CAPPED_SERVICE_TYPES)
 
-# Read the litellm state before opening the app_attest transaction: doing
-# it inside would hold the FOR UPDATE lock idle-in-transaction across a
-# cross-pool await, where idle_in_transaction_session_timeout could reap
-# it and abort the release, leaking the claim (mirrors ensure_capacity_state).
+# Read the litellm state before opening the app_attest transaction (same
+# cross-pool idle-in-transaction risk as ensure_capacity_state); reaping
+# the session here would abort the release and leak the claim.
 has_managed_user_rows = await self.litellm_pg.has_managed_user_rows(
 base_identity,
 managed_service_types,

src/mlpa/core/pg_services/litellm_pg_service.py

Lines changed: 10 additions & 15 deletions
@@ -77,8 +77,7 @@ async def block_user(self, user_id: str, blocked: bool = True) -> dict:
 
 async def list_users(self, limit: int = 50, offset: int = 0) -> dict:
 try:
-# COUNT(*) + deep OFFSET scan the full table; admin-read budget
-# rather than the tight pool-wide default.
+# COUNT(*) + deep OFFSET full-scan the table, so use the admin-read budget.
 async with self.statement_timeout(env.PG_ADMIN_READ_TIMEOUT_MS) as conn:
 total = await conn.fetchval(
 'SELECT COUNT(*) FROM "LiteLLM_EndUserTable"'
@@ -109,8 +108,8 @@ async def count_users_by_service_type(self) -> dict:
 `{base_user_id}:{service_type}`.
 """
 try:
-# GROUP BY split_part(...) is unindexable, so always a full-table
-# scan; admin-read budget rather than the tight pool-wide default.
+# GROUP BY split_part(...) is unindexable, so always a full scan: use
+# the admin-read budget.
 async with self.statement_timeout(env.PG_ADMIN_READ_TIMEOUT_MS) as conn:
 rows = await conn.fetch(
 """
@@ -148,9 +147,8 @@ async def list_managed_base_identities(
 """
 Return distinct base identities for cap-managed service types.
 
-The DISTINCT scan over the full end-user table can exceed the tight
-pool-wide statement_timeout on a large user base, so it runs under the
-maintenance budget (startup reconciliation work).
+The DISTINCT full-scan is startup reconciliation work, so it runs under
+the maintenance budget.
 """
 async with self.statement_timeout(
 env.PG_MAINTENANCE_STATEMENT_TIMEOUT_MS
@@ -172,12 +170,9 @@ async def has_managed_user_rows(
 """
 Return True if the base identity has any cap-managed LiteLLM end-user rows.
 """
-# The split_part/position predicate is unindexable, so a no-match EXISTS
-# scans the full end-user table and can exceed the tight pool-wide
-# statement_timeout on a large user base (same pattern as
-# count_users_by_service_type / list_managed_base_identities). Run under
-# the admin-read budget so a legitimate slow scan on the LiteLLM table is
-# not killed at 3s, which would leak a capacity claim on the release path.
+# Unindexable predicate, so a no-match EXISTS full-scans the table. Use
+# the admin-read budget: killing a slow scan at 3s would leak a capacity
+# claim on the release path.
 async with self.statement_timeout(env.PG_ADMIN_READ_TIMEOUT_MS) as conn:
 return bool(
 await conn.fetchval(
@@ -206,8 +201,8 @@ async def create_budget(self):
 
 for service_type, budget_config in user_feature_budgets.items():
 try:
-# Fast single-row PK upsert: stays a plain autocommit call (it
-# cannot realistically hit the pool-wide statement_timeout).
+# Fast single-row PK upsert: a plain autocommit call won't hit
+# the pool statement_timeout.
 await self.pool.fetchrow(
 """
 INSERT INTO "LiteLLM_BudgetTable"

src/mlpa/core/pg_services/pg_service.py

Lines changed: 9 additions & 12 deletions
@@ -66,8 +66,7 @@ async def _timed_transaction(
 """
 Yield a connection in a transaction with statement_timeout (and
 optionally idle_in_transaction_session_timeout / lock_timeout) set via
-SET LOCAL, scoped to the transaction so the connection reverts to the
-pool-wide defaults on release.
+SET LOCAL, so the connection reverts to the pool defaults on release.
 """
 async with self.pool.acquire() as conn:
 async with conn.transaction():
@@ -87,12 +86,10 @@ async def _timed_transaction(
 @asynccontextmanager
 async def statement_timeout(self, timeout_ms: int):
 """
-Raise statement_timeout for statements that legitimately exceed the
-tight pool-wide default (e.g. unindexable full-table scans).
-
-idle_in_transaction_session_timeout is lifted to the same budget so the
-pool-wide reaper (10s) cannot abort a transaction we deliberately granted
-a longer statement budget if an await ever lands between its statements.
+Raise statement_timeout for a transaction that legitimately exceeds the
+tight pool default (e.g. unindexable full-table scans). idle-in-tx is
+lifted to match, so the pool reaper can't abort it if an await lands
+between statements.
 """
 async with self._timed_transaction(
 timeout_ms, idle_in_tx_timeout_ms=timeout_ms
@@ -102,10 +99,10 @@ async def statement_timeout(self, timeout_ms: int):
 @asynccontextmanager
 async def admission_transaction(self):
 """
-Signup-capacity admission path: a bounded lock_timeout for the FOR UPDATE
-on the singleton capacity row, plus a statement_timeout set above it so
-the lock wait is governed by lock_timeout rather than silently capped by
-the pool-wide statement_timeout (Postgres counts lock-wait toward it).
+Signup-capacity admission path. Bounds the FOR UPDATE wait on the
+capacity row with lock_timeout, and sets statement_timeout above it so
+the wait is governed by lock_timeout rather than the tight pool default
+(Postgres counts lock-wait toward statement_timeout).
 """
 lock_ms = env.MLPA_ADMISSION_LOCK_TIMEOUT_MS
 stmt_ms = lock_ms + env.PG_STATEMENT_TIMEOUT_MS
