Commit 43b2fec

docs: record Day 81 storage work + correct the Pro-plan / region framing
DEVLOG Day 81 covers the cached_jobs storage-management pass: the hybrid-search lean/full switch (commit a39804f), the PERFDB-4 base-table DDL capture (commit 07eec13), the VACUUM FULL that reclaimed 123 MB (504 -> 381 MB), and the discovery that the jobagent Supabase org is on the Pro plan (so 504 MB was never a live cap problem — the 500 MB framing is for a future free-tier downgrade).

deployment.md: the "Hybrid-search lean/full switch" runbook now leads with the Pro-plan context (8 GB, not 500 MB — the cap framing is for a future downgrade), records the VACUUM as run (not hypothetical), notes free-downgrade-eligibility, and adds the at-launch caveat (free tier auto-pauses after 7d idle + has no backups; billing changes are dashboard-only).

(AGENT.md §3 also updated locally — adds the Pro-plan fact + corrects the Supabase region from "EU" to the actual ap-south-1/Mumbai per get_project; untracked working briefing, not in this commit.)
1 parent 07eec13 commit 43b2fec

2 files changed

Lines changed: 100 additions & 13 deletions

docs/DEVLOG.md

Lines changed: 60 additions & 0 deletions
@@ -4172,3 +4172,63 @@ Live smoke post-deploy: `GET /health/sentry-debug` no auth → **401**
 `GET /workspace/analyze-jobs/<fake>` no auth → 401 (SECURITY-1 still
 enforced), security headers still healthy on both subdomains, 31/31
 hermetic new cleanup tests pass locally.
+
+## Day 81: cached_jobs storage — lean/full switch, PERFDB-4 base DDL, VACUUM
+
+A storage-management pass on `cached_jobs`, prompted by the question
+"can we drop to the Supabase free tier when we can't pay for Pro, and
+flip back when we can?" Two commits + one operational action.
+
+**The switch (commit `a39804f`).** Most of it already existed from the
+Tier 1→2 work: the `JOB_SEARCH_HYBRID_ENABLED` flag already gates BOTH
+the search path (`search_cached_jobs_hybrid` vs `…_ranked`) and the
+embed-on-write path, and self-degrades to lexical on error. What was
+missing was the *downgrade* DDL + a runbook. Added
+`docs/sql/supabase-cached-jobs-lean-mode.sql` (drop HNSW index → drop
+hybrid RPC → drop embedding column → VACUUM FULL; idempotent) and a
+"Hybrid-search lean/full switch" runbook in `docs/deployment.md`. The
+upgrade-back half is the existing `…-pgvector.sql` + `…-hybrid.sql` +
+`scripts/backfill_job_embeddings.py`. Lean mode keeps all ~20k jobs —
+it only drops the embeddings, falling back to synonym-expanded lexical
+search (loses concept-matching like "ML engineer" ↔ "machine learning
+specialist").
+
+**PERFDB-4 resolved (commit `07eec13`).** The `cached_jobs` base-table
+DDL lived ONLY in the prod DB (the largest reproducibility gap from the
+launch audit). Captured it verbatim from the live catalog
+(`pg_get_indexdef` + `pg_attribute` + `pg_get_expr` +
+`pg_get_constraintdef`) into the tracked
+`docs/sql/supabase-cached-jobs-base.sql`. The catalog caught things
+reconstruction-from-memory would have missed — **two `pg_trgm` GIN
+indexes** (title + company fuzzy match) and the exact CASE expressions
+of all three GENERATED STORED columns (`search_tsv`, `work_mode`,
+`employment_type_norm`). base.sql is the complete Tier 1 *lean* schema;
+the embedding column + HNSW stay in pgvector.sql so the column has one
+source of truth and base.sql doubles as the free-tier rebuild file.
+
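Reading aid, not part of the commit: one way the catalog capture named above could be scripted. The catalog functions and tables (`pg_get_indexdef`, `pg_attribute`, `pg_get_expr`, `pg_get_constraintdef`, plus `pg_index`, `pg_attrdef`, `pg_constraint`) are standard PostgreSQL; the exact queries the author ran are not shown in the entry, so these are assumptions. `capture_schema` and the DB-API-style `cursor` with `%s` placeholders are hypothetical.

```python
from typing import Any

TABLE = "public.cached_jobs"

# Each query takes the table name as its single parameter.
CATALOG_QUERIES: dict[str, str] = {
    # Full CREATE INDEX statements, including GIN/trigram operator classes.
    "indexes": """
        SELECT pg_get_indexdef(i.indexrelid)
        FROM pg_index AS i
        WHERE i.indrelid = %s::regclass
    """,
    # Column types plus default / GENERATED expressions
    # (attgenerated = 's' marks a GENERATED ... STORED column).
    "columns": """
        SELECT a.attname,
               format_type(a.atttypid, a.atttypmod),
               a.attnotnull,
               a.attgenerated,
               pg_get_expr(d.adbin, d.adrelid)
        FROM pg_attribute AS a
        LEFT JOIN pg_attrdef AS d
               ON d.adrelid = a.attrelid AND d.adnum = a.attnum
        WHERE a.attrelid = %s::regclass
          AND a.attnum > 0
          AND NOT a.attisdropped
        ORDER BY a.attnum
    """,
    # Primary key, unique, and check constraints as DDL fragments.
    "constraints": """
        SELECT c.conname, pg_get_constraintdef(c.oid)
        FROM pg_constraint AS c
        WHERE c.conrelid = %s::regclass
    """,
}


def capture_schema(cursor: Any, table: str = TABLE) -> dict[str, list]:
    """Run every catalog query against `table` and collect the raw rows."""
    captured: dict[str, list] = {}
    for name, sql in CATALOG_QUERIES.items():
        cursor.execute(sql, (table,))
        captured[name] = cursor.fetchall()
    return captured
```

Reading the definitions back from the catalog, instead of retyping them, is what surfaces details such as the two trigram indexes and the exact CASE expressions mentioned in the entry.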
+**The VACUUM (operational, 2026-06-23).** Live check found the DB at
+**504 MB**, with `cached_jobs` 96% of it. heap+TOAST (335 MB) far
+exceeded live column data (~220 MB) — ~115 MB of dead-tuple bloat from
+the 4-hourly refresh churn (autovacuum reclaims-for-reuse but never
+shrinks the file; only VACUUM FULL rewrites compactly + returns pages to
+the OS). Ran `VACUUM FULL public.cached_jobs`: **504 → 381 MB** (−123
+MB; heap+TOAST 335→241, indexes 150→121). No data lost, semantic search
+intact.
+
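Reading aid, not part of the commit: the figures in this paragraph can be cross-checked with a few lines of arithmetic. All numbers are the MB values quoted in the entry and in the `docs/deployment.md` hunk (which gives the 150 MB index size); the variable names are illustrative only.

```python
# Sizes in MB, as quoted in the DEVLOG entry and the deployment.md hunk.
db_before, db_after = 504, 381
heap_toast_before, heap_toast_after = 335, 241
indexes_before, indexes_after = 150, 121
live_column_data = 220  # approximate ("~220 MB" in the source)

# Estimated dead-tuple bloat before the VACUUM FULL.
bloat_estimate = heap_toast_before - live_column_data

# Space reclaimed, computed two independent ways.
reclaimed_total = db_before - db_after
reclaimed_by_part = (heap_toast_before - heap_toast_after) + (
    indexes_before - indexes_after
)

# Share of the database taken by cached_jobs before the VACUUM FULL.
table_share = (heap_toast_before + indexes_before) / db_before

print(bloat_estimate)        # 115
print(reclaimed_total)       # 123
print(reclaimed_by_part)     # 123
print(f"{table_share:.1%}")  # 96.2%
```

The two reclaimed totals agree (94 MB from heap+TOAST plus 29 MB from indexes). The post-VACUUM heap+TOAST of 241 MB is still above the ~220 MB live-data estimate; the entry does not explain that gap, and the live figure is approximate.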
+**The framing correction.** Discovered via `get_organization` that the
+jobagent org `Job_Application_Copilot` is on the **Supabase Pro plan**
+(8 GB DB, daily backups, no idle auto-pause) — so 504 MB was never a
+live emergency (the "over the cap" language is the *free*-tier number,
+for the hypothetical future downgrade). Also corrected AGENT.md §3,
+which mislabeled the Supabase region as "EU" — `get_project` confirms
+**`ap-south-1`** (Mumbai); the EU label belongs to Sentry/PostHog, not
+the DB. Verified the project is free-downgrade-eligible (DB 381/500 MB,
+storage 0/1 GB, MAU 0/50k, 1 project/org) but flagged that downgrading
+*at launch* is penny-wise: free tier auto-pauses after 7d idle and has
+no backups. Supabase billing changes are dashboard-only (no
+management-API capability), so the actual Pro→free flip is the owner's
+to click — the docs leave it downgrade-ready.
+
+No application code changed this day — the flag wiring was already in
+`src/cached_jobs_store.py` + `src/config.py`; everything else is SQL +
+docs + one maintenance command.

docs/deployment.md

Lines changed: 40 additions & 13 deletions
@@ -273,19 +273,46 @@ botched drop can't lose the exact index config. The lean-mode script
 only ever drops the *semantic* add-ons (embedding column + HNSW); it
 never touches the base table or the lexical indexes.
 
-**Live storage check (jobagent prod, 2026-06-22):** the DB was **504
-MB — already over the free-tier 500 MB cap.** `cached_jobs` is 485 MB of
-that (14,085 rows, all embedded): heap+TOAST 335 MB, indexes 150 MB.
-The semantic layer alone = the `embedding` column (83 MB live) + the
-HNSW index (110 MB) = **193 MB**. Lean mode therefore takes the DB to
-**~300 MB**, comfortably under the cap. Note the heap+TOAST (335 MB) far
-exceeds the live column data (~220 MB) — the 4-hourly refresh churn
-leaves ~100 MB of dead-tuple bloat, so a plain `VACUUM FULL
-public.cached_jobs` *without* dropping embeddings is a cheaper immediate
-lever that may by itself bring the DB back under 500 MB while keeping
-semantic search. Use lean mode when you need the headroom long-term on
-the free tier; use a standalone VACUUM FULL when you just need to duck
-back under the cap for a while.
+**IMPORTANT — the 500 MB cap is NOT a current constraint.** The
+jobagent Supabase org (`Job_Application_Copilot`) is on the **Pro plan
+(8 GB database, daily backups, no idle auto-pause)**. So none of the
+"over the cap" framing below is a live emergency — it all describes the
+*future* scenario where the org is downgraded **Pro → free** to save the
+~$25/mo (no users yet). The lean/full switch + the VACUUM lever exist to
+make that downgrade possible and survivable, not to rescue a current
+overage.
+
+**Live storage check + VACUUM (jobagent prod, 2026-06-23):** the DB was
+**504 MB** (fine on Pro's 8 GB, but over the *free*-tier 500 MB number).
+`cached_jobs` was 485 MB of that (14,085 rows, all embedded): heap+TOAST
+335 MB, indexes 150 MB; the semantic layer alone = the `embedding`
+column (83 MB live) + the HNSW index (110 MB) = **193 MB**. The
+heap+TOAST (335 MB) far exceeded the live column data (~220 MB) — ~115
+MB of dead-tuple bloat from the 4-hourly refresh churn (autovacuum
+reclaims-for-reuse but never shrinks the file). A **`VACUUM FULL
+public.cached_jobs` was run** to reclaim it: **504 → 381 MB** (−123 MB;
+heap+TOAST 335→241, indexes 150→121). No data lost, semantic search
+intact. This made the project **free-downgrade-eligible** (free limits
+all clear: DB 381/500 MB, storage 0/1 GB, MAU 0/50k, 1 project/org).
+
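Reading aid, not part of the commit: the eligibility claim can be expressed as a small check over the quotas quoted above. The used/limit pairs are taken from the runbook text; the `Quota` class and `downgrade_eligible` are hypothetical names, and treating usage exactly at the limit as acceptable is an assumption. The "1 project/org" figure is left out because the text gives a count but not the limit it is compared against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    used: float
    limit: float
    unit: str

    def within_limit(self) -> bool:
        # Assumption: usage exactly at the limit still counts as within it.
        return self.used <= self.limit


def downgrade_eligible(quotas: dict[str, Quota]) -> bool:
    """True only if every quota is within its free-tier limit."""
    return all(q.within_limit() for q in quotas.values())


# Figures after the VACUUM FULL, as quoted in the runbook.
after_vacuum = {
    "database": Quota(381, 500, "MB"),
    "storage": Quota(0, 1, "GB"),
    "monthly_active_users": Quota(0, 50_000, "users"),
}

# Same project before the VACUUM FULL: only the database size differs.
before_vacuum = {**after_vacuum, "database": Quota(504, 500, "MB")}

print(downgrade_eligible(after_vacuum))   # True
print(downgrade_eligible(before_vacuum))  # False
```

The before/after pair shows why the VACUUM FULL mattered for the downgrade question even though it was never needed on the Pro plan.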
+Two levers, two purposes:
+- **Standalone `VACUUM FULL`** (what was run) — reclaims churn bloat,
+keeps semantic search. ~2-min ACCESS EXCLUSIVE lock. The churn
+re-accumulates ~30-60 MB/month, so it's a periodic reset (re-run
+when the DB nears ~480 MB), not a permanent fix.
+- **Lean mode** (the switch above) — drops embeddings + HNSW to ~300 MB
+durably; loses semantic concept-matching. The fallback for staying
+under 500 MB long-term without re-VACUUMing or paying for Pro.
+
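Reading aid, not part of the commit: a rough estimate of how often the standalone `VACUUM FULL` would need repeating, using only the figures in the list above (381 MB after the run, a ~480 MB re-run threshold, 30 to 60 MB/month of re-accumulated churn). Linear growth is an assumption, and `months_until_threshold` is a hypothetical helper.

```python
def months_until_threshold(
    current_mb: float, threshold_mb: float, growth_mb_per_month: float
) -> float:
    """Months until the DB reaches the threshold, assuming linear growth."""
    if growth_mb_per_month <= 0:
        raise ValueError("growth_mb_per_month must be positive")
    return max(0.0, (threshold_mb - current_mb) / growth_mb_per_month)


POST_VACUUM_MB = 381   # size after the VACUUM FULL
RERUN_AT_MB = 480      # "re-run when the DB nears ~480 MB"

fastest = months_until_threshold(POST_VACUUM_MB, RERUN_AT_MB, 60)
slowest = months_until_threshold(POST_VACUUM_MB, RERUN_AT_MB, 30)

print(f"re-run in roughly {fastest:.2f} to {slowest:.2f} months")
```

Under these assumptions the reset would recur every couple of months or so, which fits the runbook's description of the standalone VACUUM as a periodic reset and of lean mode as the durable option.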
+**Downgrade-at-launch caveat:** free tier (a) **auto-pauses a project
+after 7 days of inactivity** — a launched-but-quiet app sleeps and the
+next visitor hits a cold backend, and (b) has **no automated backups**
+(Pro includes daily + 7-day PITR). For a launching product with real
+user data, $25/mo Pro buys backups + always-on; downgrading is sound
+*pre*-launch (no users, no risk) but penny-wise at launch. If you do
+downgrade, set up a self-managed `pg_dump` backup routine FIRST, then
+flip the plan in the dashboard (Organization → Billing → Change plan;
+the management API/MCP cannot change billing — it's dashboard-only).
 
 ## Operational gotchas (the runbook entries that cost real time)
 