| 1 | +# Database management |
| 2 | + |
| 3 | +MLPA talks to two PostgreSQL databases. This doc covers what lives where, how the |
| 4 | +connection pools are configured, and the query timeout budgets (why we have a few |
| 5 | +of them and which query uses which). |
| 6 | + |
| 7 | +## The two databases |
| 8 | + |
| 9 | +| DB | Owner | What MLPA does with it | |
| 10 | +|----|-------|------------------------| |
| 11 | +| `litellm` (`LITELLM_DB_NAME`) | LiteLLM | Reads/writes a couple of tables directly for things the free-tier LiteLLM API doesn't expose (block/unblock, budget tier change, user listing/counts). | |
| 12 | +| `app_attest` (`APP_ATTEST_DB_NAME`) | MLPA (via Alembic) | App Attest challenges + keys, and the signup capacity state. | |
| 13 | + |
| 14 | +Each DB gets its own asyncpg pool, wrapped in a `PGService` |
| 15 | +(`src/mlpa/core/pg_services/`): |
| 16 | + |
| 17 | +- `LiteLLMPGService` → `litellm` |
| 18 | +- `AppAttestPGService` → `app_attest` (also holds a reference to the litellm |
| 19 | + service, because the capacity gate reads from both) |
| 20 | + |
| 21 | +## Tables |
| 22 | + |
| 23 | +### litellm DB (LiteLLM owns the schema) |
| 24 | + |
| 25 | +**`LiteLLM_EndUserTable`** - one row per end user. |
| 26 | + |
| 27 | +- `user_id` is `{base_identity}:{service_type}`, e.g. `fxa_uid:ai`. The colon is
| 28 | + load-bearing: many queries call `split_part(user_id, ':', ...)` on it.
| 29 | +- MLPA uses: `user_id`, `budget_id` (which tier the user is on), `blocked`. |
| 30 | +- Touched by: `get_user`, `list_users`, `update_user_budget`, `block_user`, |
| 31 | + `count_users_by_service_type`, `list_managed_base_identities`, |
| 32 | + `has_managed_user_rows`. |
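A minimal sketch of the same split on the Python side, in case you need to take a `user_id` apart outside SQL. `split_user_id` is a hypothetical helper, not something this doc claims exists in MLPA, and it assumes the service type itself never contains a colon.

```python
def split_user_id(user_id: str) -> tuple[str, str]:
    """Split '{base_identity}:{service_type}' at the first colon.

    Hypothetical helper for illustration. It matches split_part(user_id, ':', 1)
    and split_part(user_id, ':', 2) only as long as there is exactly one colon.
    """
    base_identity, sep, service_type = user_id.partition(":")
    if not sep or not base_identity or not service_type:
        raise ValueError(f"expected '<base_identity>:<service_type>', got {user_id!r}")
    return base_identity, service_type
```

For example, `split_user_id("fxa_uid:ai")` returns `("fxa_uid", "ai")`, and a value without a colon raises `ValueError` instead of silently producing an empty service type.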
| 33 | + |
| 34 | +**`LiteLLM_BudgetTable`** - the budget tiers (one per service type). |
| 35 | + |
| 36 | +- `budget_id`, `max_budget`, `rpm_limit`, `tpm_limit`, `budget_duration`. |
| 37 | +- MLPA upserts all tiers from config on startup (`create_budget()`), so changing |
| 38 | + a limit in `config.py` takes effect on next restart, not live. |
| 39 | + |
| 40 | +### app_attest DB (MLPA owns the schema, managed by Alembic) |
| 41 | + |
| 42 | +**`challenges`** - App Attest challenge nonces, one row per key id.
| 43 | + |
| 44 | +| Column | Type | Notes | |
| 45 | +|--------|------|-------| |
| 46 | +| `key_id_b64` | `VARCHAR(255)` PK | the attested key id | |
| 47 | +| `challenge` | `VARCHAR(255)` | the nonce we issued | |
| 48 | +| `created_at` | `TIMESTAMPTZ` | expires after `CHALLENGE_EXPIRY_SECONDS` (300s) | |
| 49 | + |
| 50 | +**`public_keys`** - the iOS attested key + replay counter. |
| 51 | + |
| 52 | +| Column | Type | Notes | |
| 53 | +|--------|------|-------| |
| 54 | +| `key_id_b64` | `VARCHAR(255)` PK | | |
| 55 | +| `public_key_pem` | `TEXT` | attested public key | |
| 56 | +| `counter` | `BIGINT` | assertion counter, only goes up (replay protection) | |
| 57 | +| `created_at` / `updated_at` | `TIMESTAMPTZ` | | |
| 58 | + |
| 59 | +**`mlpa_user_capacity`** - the signup cap counter. Single row. |
| 60 | + |
| 61 | +| Column | Type | Notes | |
| 62 | +|--------|------|-------| |
| 63 | +| `id` | `SMALLINT` PK `CHECK (id = 1)` | singleton, always 1 | |
| 64 | +| `max_identities` | `BIGINT` | the cap (`MLPA_MAX_SIGNED_IN_USERS`) | |
| 65 | +| `current_identities` | `BIGINT` | how many distinct identities are claimed | |
| 66 | +| `updated_at` | `TIMESTAMPTZ` | | |
| 67 | + |
| 68 | +**`mlpa_user_capacity_identities`** - one row per claimed identity. |
| 69 | + |
| 70 | +| Column | Type | Notes | |
| 71 | +|--------|------|-------| |
| 72 | +| `base_identity` | `TEXT` PK | the `{base_identity}` part of `user_id` | |
| 73 | +| `created_at` | `TIMESTAMPTZ` | | |
| 74 | + |
| 75 | +The two capacity tables are reconciled from `LiteLLM_EndUserTable` on startup |
| 76 | +(`ensure_capacity_state()`), so they don't drift from the real user base. |
| 77 | + |
| 78 | +## Connection pool |
| 79 | + |
| 80 | +Set up in `PGService.connect()`, configured from `config.py`: |
| 81 | + |
| 82 | +| Setting | Default | What | |
| 83 | +|---------|---------|------| |
| 84 | +| `PG_POOL_MIN_SIZE` | 1 | min connections | |
| 85 | +| `PG_POOL_MAX_SIZE` | 10 | max connections | |
| 86 | +| `PG_PREPARED_STMT_CACHE_MAX_SIZE` | 100 | prepared statement cache | |
| 87 | + |
| 88 | +On connect we set these server-side (per session, so they apply to every query |
| 89 | +on the pool): |
| 90 | + |
| 91 | +- `statement_timeout` = `PG_STATEMENT_TIMEOUT_MS` |
| 92 | +- `idle_in_transaction_session_timeout` = `PG_IDLE_IN_TX_TIMEOUT_MS` |
| 93 | +- `application_name` = `mlpa:{db_name}` (handy for `pg_stat_activity`) |
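Roughly what that connect step could look like with asyncpg. This is a sketch, not the actual `PGService.connect()`: the function names are made up, the defaults are copied from the tables in this doc, and mapping `PG_PREPARED_STMT_CACHE_MAX_SIZE` to asyncpg's `statement_cache_size` is an assumption.

```python
def build_server_settings(
    db_name: str, statement_timeout_ms: int, idle_in_tx_timeout_ms: int
) -> dict[str, str]:
    """Session-level settings sent to Postgres when a connection is opened.

    asyncpg expects string values; a bare number is read as milliseconds
    for both timeout settings.
    """
    return {
        "statement_timeout": str(statement_timeout_ms),
        "idle_in_transaction_session_timeout": str(idle_in_tx_timeout_ms),
        "application_name": f"mlpa:{db_name}",
    }


async def connect_pool(dsn: str, db_name: str):
    """Hypothetical pool factory using the defaults documented above."""
    import asyncpg  # imported lazily so the helper above works without the driver

    return await asyncpg.create_pool(
        dsn,
        min_size=1,                # PG_POOL_MIN_SIZE
        max_size=10,               # PG_POOL_MAX_SIZE
        statement_cache_size=100,  # assumed to be PG_PREPARED_STMT_CACHE_MAX_SIZE
        server_settings=build_server_settings(db_name, 3000, 10000),
    )
```

Because the settings go in at connect time, every connection in the pool starts with the same baseline and nothing has to be re-applied per query.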
| 94 | + |
| 95 | +## Timeout budgets |
| 96 | + |
| 97 | +The idea: keep a tight default so a runaway query gets killed by Postgres even if |
| 98 | +the client or event loop hangs (no connection pile-up). Then raise the budget |
| 99 | +only for the few queries that legitimately need longer. |
| 100 | + |
| 101 | +| Budget | Default | Used for | |
| 102 | +|--------|---------|----------| |
| 103 | +| `PG_STATEMENT_TIMEOUT_MS` | 3000 (3s) | pool default, every query unless raised | |
| 104 | +| `PG_IDLE_IN_TX_TIMEOUT_MS` | 10000 (10s) | reaps sessions left idle mid-transaction | |
| 105 | +| `PG_ADMIN_READ_TIMEOUT_MS` | 15000 (15s) | admin reads that full-scan the user table | |
| 106 | +| `PG_MAINTENANCE_STATEMENT_TIMEOUT_MS` | 30000 (30s) | startup reconciliation (bigger scans) | |
| 107 | +| `MLPA_ADMISSION_LOCK_TIMEOUT_MS` | 5000 (5s) | `lock_timeout` for the capacity row `FOR UPDATE` | |
| 108 | +| `PG_COMMAND_TIMEOUT_S` | None | optional asyncpg client-side backstop, off by default | |
| 109 | + |
| 110 | +All values are ms (except `PG_COMMAND_TIMEOUT_S`, which is seconds). 0 = unlimited. |
| 111 | + |
| 112 | +### How a budget gets applied |
| 113 | + |
| 114 | +Two ways: |
| 115 | + |
| 116 | +1. **Pool-wide** via `server_settings` (the 3s `statement_timeout` and 10s |
| 117 | + idle-in-tx). This is the baseline for everything. |
| 118 | +2. **Per-transaction** via `SET LOCAL`, using two context managers in |
| 119 | + `PGService`. `SET LOCAL` only lasts for the transaction, so the connection |
| 120 | + goes back to the pool defaults on release. |
| 121 | + |
| 122 | + - `statement_timeout(ms)` - raises `statement_timeout` AND idle-in-tx to the |
| 123 | + same `ms`. idle-in-tx has to match, otherwise the 10s reaper could kill a |
| 124 | + transaction we deliberately gave a longer budget. |
| 125 | + - `admission_transaction()` - the capacity gate path. Sets `lock_timeout` = |
| 126 | + `MLPA_ADMISSION_LOCK_TIMEOUT_MS`, and `statement_timeout` = `lock_timeout + |
| 127 | + PG_STATEMENT_TIMEOUT_MS` (so 5s + 3s = 8s). The statement budget has to sit |
| 128 | + above the lock budget, because Postgres counts lock-wait time toward |
| 129 | + `statement_timeout`. Left at the 3s default, `statement_timeout` would cancel
| 130 | + the statement before the 5s `lock_timeout` ever fired.
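A sketch of how the per-transaction budgets could be written, assuming an asyncpg-style connection (`transaction()` and `execute()`). These are not the real `PGService` context managers. The sketch uses `set_config(name, value, true)`, which has the same transaction-local effect as `SET LOCAL` but accepts bind parameters; whether MLPA does it this way is not stated here.

```python
from contextlib import asynccontextmanager


async def _set_local(conn, name: str, value_ms: int) -> None:
    # is_local = true: the setting reverts when the transaction ends.
    await conn.execute("SELECT set_config($1, $2, true)", name, str(value_ms))


@asynccontextmanager
async def statement_timeout(conn, ms: int):
    """Run one transaction with a raised statement budget.

    idle-in-tx is raised to the same value so the pool-wide reaper
    does not undercut the longer budget.
    """
    async with conn.transaction():
        await _set_local(conn, "statement_timeout", ms)
        await _set_local(conn, "idle_in_transaction_session_timeout", ms)
        yield conn


def admission_budgets(lock_timeout_ms: int, statement_timeout_ms: int) -> tuple[int, int]:
    """Return (lock_timeout, statement_timeout) for the admission path.

    Lock-wait time counts toward statement_timeout, so the statement
    budget is the lock budget plus the normal statement budget.
    """
    return lock_timeout_ms, lock_timeout_ms + statement_timeout_ms
```

With the defaults, `admission_budgets(5000, 3000)` gives `(5000, 8000)`, which is the 8s admission budget in the table below.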
| 131 | + |
| 132 | +### Which query uses which budget |
| 133 | + |
| 134 | +| Budget | Queries | |
| 135 | +|--------|---------| |
| 136 | +| default 3s | challenge + key CRUD, `get_user`, `update_user_budget`, `block_user`, `create_budget` upsert | |
| 137 | +| admin-read 15s | `list_users` (COUNT(*) + deep OFFSET), `count_users_by_service_type` (GROUP BY `split_part`), `has_managed_user_rows` (EXISTS) | |
| 138 | +| maintenance 30s | `list_managed_base_identities` (DISTINCT scan), `_reconcile_capacity_claims` (bulk DELETE + INSERT) | |
| 139 | +| admission 8s | `admit_managed_base_identity`, `maybe_release_managed_base_identity_if_no_managed_users` | |
| 140 | + |
| 141 | +The admin-read and maintenance ones all hit the same problem: the `user_id` is
| 142 | +`base:service_type`, so any filter or group on the service-type part uses
| 143 | +`split_part`/`position`, which a plain index on `user_id` can't serve. That
| 144 | +means a full-table scan that grows with the user base and can blow past 3s. So
| 145 | +they get a bigger budget instead.
| 146 | + |
| 147 | +### Cross-pool read ordering |
| 148 | + |
| 149 | +The capacity reconcile and release paths read from the litellm pool and then open |
| 150 | +a transaction on the app_attest pool. That read always happens BEFORE the |
| 151 | +app_attest transaction opens. If you did it inside, the app_attest session would |
| 152 | +sit idle-in-transaction across the cross-pool `await`, and the idle-in-tx reaper |
| 153 | +could kill it (aborting the work and leaking a capacity claim). See |
| 154 | +`_reconcile_capacity_claims` and `maybe_release_managed_base_identity_if_no_managed_users`. |
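The ordering, sketched with asyncpg-style pools. This is a simplified illustration, not the real release path: `release_if_unused` is a made-up name, the SQL is an assumption based on the tables above, and it leaves out the `current_identities` update and the row lock.

```python
async def release_if_unused(litellm_pool, app_attest_pool, base_identity: str) -> bool:
    """Drop a capacity claim if no litellm rows remain for the identity."""
    # Step 1: cross-pool read, done while no app_attest transaction is open.
    async with litellm_pool.acquire() as litellm_conn:
        still_used = await litellm_conn.fetchval(
            'SELECT EXISTS (SELECT 1 FROM "LiteLLM_EndUserTable" '
            "WHERE split_part(user_id, ':', 1) = $1)",
            base_identity,
        )
    if still_used:
        return False

    # Step 2: open the app_attest transaction only now, so it never sits
    # idle-in-transaction while we await the other pool.
    async with app_attest_pool.acquire() as conn:
        async with conn.transaction():
            await conn.execute(
                "DELETE FROM mlpa_user_capacity_identities WHERE base_identity = $1",
                base_identity,
            )
    return True
```

The point is only the shape: the litellm connection is released before the app_attest transaction begins, so the app_attest transaction contains nothing but its own statements.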
| 155 | + |
| 156 | +### Client-side backstop |
| 157 | + |
| 158 | +`PG_COMMAND_TIMEOUT_S` is asyncpg's own client-side cancel and it's off by |
| 159 | +default. Careful: it is NOT relaxed by the per-transaction `SET LOCAL` budgets. If |
| 160 | +you turn it on, set it above `PG_MAINTENANCE_STATEMENT_TIMEOUT_MS` (30s) or it |
| 161 | +will cancel the maintenance/admin reads. |
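Since the two settings use different units, a config check along these lines can catch a bad combination early. It's a hypothetical guard, not something MLPA is stated to ship.

```python
from typing import Optional


def check_command_timeout(
    command_timeout_s: Optional[float], maintenance_timeout_ms: int = 30_000
) -> None:
    """Raise if the client-side backstop would fire before the largest
    server-side budget. None means the backstop is off, which is always fine.
    """
    if command_timeout_s is None:
        return
    # Convert seconds to milliseconds before comparing.
    if command_timeout_s * 1000 <= maintenance_timeout_ms:
        raise ValueError(
            f"PG_COMMAND_TIMEOUT_S={command_timeout_s}s must be above "
            f"PG_MAINTENANCE_STATEMENT_TIMEOUT_MS={maintenance_timeout_ms}ms"
        )
```

With the default 30s maintenance budget, `check_command_timeout(45)` passes and `check_command_timeout(20)` raises.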
| 162 | + |
| 163 | +### These timeouts only apply to MLPA |
| 164 | + |
| 165 | +All of the above is set on MLPA's own sessions: the baseline as asyncpg
| 166 | +`server_settings` at connect time, the raised budgets via `SET LOCAL`. None of
| 167 | +it is database-wide or on the DB role. Nothing in the migrations or scripts
| 168 | +sets `statement_timeout` at the `ALTER DATABASE` / `ALTER ROLE` level.
| 169 | + |
| 170 | +So anything else that connects to these databases on its own session is NOT |
| 171 | +affected. That includes the cleanup cron job in the llm-proxy infra, LiteLLM |
| 172 | +itself, and the Cloud SQL console. They run with the Postgres default (usually |
| 173 | +unlimited) unless someone sets a timeout for that role separately. The cron job |
| 174 | +can take as long as it needs; the 3s default won't touch it.
| 175 | + |
| 176 | +## Migrations |
| 177 | + |
| 178 | +Alembic manages the `app_attest` DB only. LiteLLM manages its own schema. |
| 179 | + |
| 180 | +```bash |
| 181 | +uv run alembic upgrade head # apply |
| 182 | +uv run alembic downgrade -1 # roll back one |
| 183 | +uv run alembic revision -m "..." # new migration |
| 184 | +``` |
| 185 | + |
| 186 | +The `mlpa_user_capacity*` tables are created by migration, then reconciled on |
| 187 | +every startup via `ensure_capacity_state()`. Deploy runs |
| 188 | +`scripts/migrate-app-attest-database.sh` with `-x sqlalchemy.url=...`. |
| 189 | + |
| 190 | +## Startup work |
| 191 | + |
| 192 | +The `lifespan` in `run.py` does two DB things on boot: |
| 193 | + |
| 194 | +1. `litellm_pg.create_budget()` - upsert all budget tiers from config. |
| 195 | +2. `app_attest_pg.ensure_capacity_state()` - seed the singleton capacity row
| 196 | + (fatal if it fails: without the row every admission returns a 500), then
| 197 | + reconcile the claim table (best-effort: if it fails, the row keeps a stale
| 198 | + count and admissions still work).
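The fatal vs best-effort split in step 2 can be sketched like this. `startup_capacity_state` and its two callables are placeholders for illustration, not the actual `ensure_capacity_state()` body.

```python
import logging

logger = logging.getLogger("mlpa.startup")


async def startup_capacity_state(seed_row, reconcile_claims) -> bool:
    """Seed the capacity row (fatal), then reconcile claims (best-effort).

    Both arguments are async callables standing in for the real steps.
    Returns True if the reconcile ran cleanly, False if it was skipped.
    """
    # Fatal: let any exception propagate and abort startup.
    await seed_row()

    # Best-effort: log and carry on with whatever count the row already has.
    try:
        await reconcile_claims()
    except Exception:
        logger.exception("capacity reconcile failed; keeping the existing count")
        return False
    return True
```

A failing seed stops the boot, while a failing reconcile only logs and returns `False`.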