docs: record Day 81 storage work + correct the Pro-plan / region framing

LEANDERANTONY · LEANDERANTONY · commit 43b2fecd7fd0 · 2026-06-23T00:47:00.000+05:30
DEVLOG Day 81 covers the cached_jobs storage-management pass: the hybrid-search lean/full switch (commit a39804f), the PERFDB-4 base-table DDL capture (commit 07eec13), the VACUUM FULL that reclaimed 123 MB (504 -> 381 MB), and the discovery that the jobagent Supabase org is on the Pro plan (so 504 MB was never a live cap problem; the 500 MB framing applies only to a future free-tier downgrade).

deployment.md: the "Hybrid-search lean/full switch" runbook now leads with the Pro-plan context (8 GB, not 500 MB), records the VACUUM as run (not hypothetical), notes free-downgrade eligibility, and adds the at-launch caveat (free tier auto-pauses after 7d idle and has no backups; billing changes are dashboard-only).

AGENT.md §3 was also updated locally: it adds the Pro-plan fact and corrects the Supabase region from "EU" to the actual ap-south-1/Mumbai per get_project. It is an untracked working briefing and is not in this commit.
diff --git a/docs/DEVLOG.md b/docs/DEVLOG.md
@@ -4172,3 +4172,63 @@ Live smoke post-deploy: `GET /health/sentry-debug` no auth → **401**
 `GET /workspace/analyze-jobs/<fake>` no auth → 401 (SECURITY-1 still
 enforced), security headers still healthy on both subdomains, 31/31
 hermetic new cleanup tests pass locally.
+
+## Day 81: cached_jobs storage — lean/full switch, PERFDB-4 base DDL, VACUUM
+
+A storage-management pass on `cached_jobs`, prompted by the question
+"can we drop to the Supabase free tier when we can't pay for Pro, and
+flip back when we can?" Two commits + one operational action.
+
+**The switch (commit `a39804f`).** Most of it already existed from the
+Tier 1→2 work: the `JOB_SEARCH_HYBRID_ENABLED` flag already gates BOTH
+the search path (`search_cached_jobs_hybrid` vs `…_ranked`) and the
+embed-on-write path, and self-degrades to lexical on error. What was
+missing was the *downgrade* DDL + a runbook. Added
+`docs/sql/supabase-cached-jobs-lean-mode.sql` (drop HNSW index → drop
+hybrid RPC → drop embedding column → VACUUM FULL; idempotent) and a
+"Hybrid-search lean/full switch" runbook in `docs/deployment.md`. The
+upgrade-back half is the existing `…-pgvector.sql` + `…-hybrid.sql` +
+`scripts/backfill_job_embeddings.py`. Lean mode keeps all ~20k jobs —
+it only drops the embeddings, falling back to synonym-expanded lexical
+search (loses concept-matching like "ML engineer" ↔ "machine learning
+specialist").
+
+**PERFDB-4 resolved (commit `07eec13`).** The `cached_jobs` base-table
+DDL lived ONLY in the prod DB (the largest reproducibility gap from the
+launch audit). Captured it verbatim from the live catalog
+(`pg_get_indexdef` + `pg_attribute` + `pg_get_expr` +
+`pg_get_constraintdef`) into the tracked
+`docs/sql/supabase-cached-jobs-base.sql`. The catalog caught things
+reconstruction-from-memory would have missed — **two `pg_trgm` GIN
+indexes** (title + company fuzzy match) and the exact CASE expressions
+of all three GENERATED STORED columns (`search_tsv`, `work_mode`,
+`employment_type_norm`). base.sql is the complete Tier 1 *lean* schema;
+the embedding column + HNSW stay in pgvector.sql so the column has one
+source of truth and base.sql doubles as the free-tier rebuild file.
+
+**The VACUUM (operational, 2026-06-23).** Live check found the DB at
+**504 MB**, with `cached_jobs` 96% of it. Heap+TOAST (335 MB) far
+exceeded live column data (~220 MB): ~115 MB of dead-tuple bloat from
+the 4-hourly refresh churn (autovacuum frees dead space for reuse but
+rarely shrinks the file; VACUUM FULL rewrites compactly + returns pages to
+the OS). Ran `VACUUM FULL public.cached_jobs`: **504 → 381 MB** (−123
+MB; heap+TOAST 335→241, indexes 150→121). No data lost, semantic search
+intact.
+
+**The framing correction.** Discovered via `get_organization` that the
+jobagent org `Job_Application_Copilot` is on the **Supabase Pro plan**
+(8 GB DB, daily backups, no idle auto-pause), so 504 MB was never a
+live emergency (the "over the cap" language refers to the *free*-tier
+500 MB limit, relevant only after a downgrade). Also corrected AGENT.md §3,
+which mislabeled the Supabase region as "EU" — `get_project` confirms
+**`ap-south-1`** (Mumbai); the EU label belongs to Sentry/PostHog, not
+the DB. Verified the project is free-downgrade-eligible (DB 381/500 MB,
+storage 0/1 GB, MAU 0/50k, 1 project/org) but flagged that downgrading
+*at launch* is penny-wise: free tier auto-pauses after 7d idle and has
+no backups. Supabase billing changes are dashboard-only (no
+management-API capability), so the actual Pro→free flip is the owner's
+to click — the docs leave it downgrade-ready.
+
+No application code changed this day — the flag wiring was already in
+`src/cached_jobs_store.py` + `src/config.py`; everything else is SQL +
+docs + one maintenance command.
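
Reviewer sketch (not part of this commit): the flag behaviour described in the DEVLOG entry above, where `JOB_SEARCH_HYBRID_ENABLED` selects the hybrid path and the search degrades to lexical on error, might look roughly like the following. Only the flag name comes from the source. The function names, the injected `hybrid_search` / `lexical_search` callables, and the accepted truthy spellings are assumptions for illustration; the real wiring lives in `src/cached_jobs_store.py` and `src/config.py` and may differ.

```python
import logging
import os
from typing import Callable, List, Mapping, Optional

logger = logging.getLogger(__name__)

# Hypothetical signature: a search backend takes a query and returns job rows.
SearchFn = Callable[[str], List[dict]]


def hybrid_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read JOB_SEARCH_HYBRID_ENABLED. The truthy spellings are an assumption."""
    env = os.environ if env is None else env
    value = env.get("JOB_SEARCH_HYBRID_ENABLED", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def search_jobs(
    query: str,
    hybrid_search: SearchFn,
    lexical_search: SearchFn,
    *,
    enabled: bool,
) -> List[dict]:
    """Use the hybrid path when enabled; fall back to lexical on any error."""
    if enabled:
        try:
            return hybrid_search(query)
        except Exception:
            # For example, the hybrid RPC or the embedding column is missing
            # because the lean-mode SQL has been applied.
            logger.warning("hybrid search failed; using lexical", exc_info=True)
    return lexical_search(query)
```

With this shape, applying the lean-mode SQL while the flag is still on would cost a failed hybrid call per search rather than an outage, which is presumably why the runbook can treat the flag flip and the DDL as separate steps.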
diff --git a/docs/deployment.md b/docs/deployment.md
@@ -273,19 +273,46 @@ botched drop can't lose the exact index config. The lean-mode script
 only ever drops the *semantic* add-ons (embedding column + HNSW); it
 never touches the base table or the lexical indexes.
 
-**Live storage check (jobagent prod, 2026-06-22):** the DB was **504
-MB — already over the free-tier 500 MB cap.** `cached_jobs` is 485 MB of
-that (14,085 rows, all embedded): heap+TOAST 335 MB, indexes 150 MB.
-The semantic layer alone = the `embedding` column (83 MB live) + the
-HNSW index (110 MB) = **193 MB**. Lean mode therefore takes the DB to
-**~300 MB**, comfortably under the cap. Note the heap+TOAST (335 MB) far
-exceeds the live column data (~220 MB) — the 4-hourly refresh churn
-leaves ~100 MB of dead-tuple bloat, so a plain `VACUUM FULL
-public.cached_jobs` *without* dropping embeddings is a cheaper immediate
-lever that may by itself bring the DB back under 500 MB while keeping
-semantic search. Use lean mode when you need the headroom long-term on
-the free tier; use a standalone VACUUM FULL when you just need to duck
-back under the cap for a while.
+**IMPORTANT — the 500 MB cap is NOT a current constraint.** The
+jobagent Supabase org (`Job_Application_Copilot`) is on the **Pro plan
+(8 GB database, daily backups, no idle auto-pause)**. So none of the
+"over the cap" framing below is a live emergency — it all describes the
+*future* scenario where the org is downgraded **Pro → free** to save the
+~$25/mo (no users yet). The lean/full switch + the VACUUM lever exist to
+make that downgrade possible and survivable, not to rescue a current
+overage.
+
+**Live storage check + VACUUM (jobagent prod, 2026-06-23):** the DB was
+**504 MB** (fine on Pro's 8 GB, but over the *free*-tier 500 MB number).
+`cached_jobs` was 485 MB of that (14,085 rows, all embedded): heap+TOAST
+335 MB, indexes 150 MB; the semantic layer alone = the `embedding`
+column (83 MB live) + the HNSW index (110 MB) = **193 MB**. The
+heap+TOAST (335 MB) far exceeded the live column data (~220 MB): ~115
+MB of dead-tuple bloat from the 4-hourly refresh churn (autovacuum frees
+space for reuse but rarely shrinks the file). A **`VACUUM FULL
+public.cached_jobs` was run** to reclaim it: **504 → 381 MB** (−123 MB;
+heap+TOAST 335→241, indexes 150→121). No data lost, semantic search
+intact. This made the project **free-downgrade-eligible** (free limits
+all clear: DB 381/500 MB, storage 0/1 GB, MAU 0/50k, 1 project/org).
+
+Two levers, two purposes:
+- **Standalone `VACUUM FULL`** (what was run) — reclaims churn bloat,
+  keeps semantic search. ~2-min ACCESS EXCLUSIVE lock. The churn
+  re-accumulates ~30-60 MB/month, so it's a periodic reset (re-run
+  when the DB nears ~480 MB), not a permanent fix.
+- **Lean mode** (the switch above) — drops embeddings + HNSW to ~300 MB
+  durably; loses semantic concept-matching. The fallback for staying
+  under 500 MB long-term without re-VACUUMing or paying for Pro.
+
+**Downgrade-at-launch caveat:** free tier (a) **auto-pauses a project
+after 7 days of inactivity** — a launched-but-quiet app sleeps and the
+next visitor hits a cold backend, and (b) has **no automated backups**
+(Pro: daily backups, 7-day retention; PITR is a paid add-on). For a launching
+product with real user data, $25/mo Pro buys backups + always-on; downgrading
+is sound *pre*-launch (no users, no risk) but penny-wise at launch. If you do
+downgrade, set up a self-managed `pg_dump` backup routine FIRST, then
+flip the plan in the dashboard (Organization → Billing → Change plan;
+the management API/MCP cannot change billing — it's dashboard-only).
 
 ## Operational gotchas (the runbook entries that cost real time)
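
Reviewer sketch (not part of this commit): the storage figures in the deployment.md hunk above can be cross-checked with a few lines of arithmetic, which also gives a rough idea of how often the standalone `VACUUM FULL` would need re-running. All sizes are the ones quoted in the runbook. The linear-growth model and the helper names are assumptions for illustration only.

```python
from dataclasses import dataclass

FREE_DB_CAP_MB = 500          # free-tier database limit quoted in the runbook
REVACUUM_THRESHOLD_MB = 480   # "re-run when the DB nears ~480 MB"


@dataclass(frozen=True)
class TableSize:
    heap_toast_mb: int
    indexes_mb: int

    @property
    def total_mb(self) -> int:
        return self.heap_toast_mb + self.indexes_mb


def months_until_revacuum(
    db_mb: float,
    growth_mb_per_month: float,
    threshold_mb: float = REVACUUM_THRESHOLD_MB,
) -> float:
    """Months until churn bloat reaches the threshold (assumes linear growth)."""
    return max(threshold_mb - db_mb, 0) / growth_mb_per_month


before = TableSize(heap_toast_mb=335, indexes_mb=150)  # cached_jobs, pre-VACUUM
after = TableSize(heap_toast_mb=241, indexes_mb=121)   # cached_jobs, post-VACUUM

reclaimed_mb = before.total_mb - after.total_mb   # 485 - 362
db_after_mb = 504 - reclaimed_mb                  # whole-DB size after the VACUUM
headroom_mb = FREE_DB_CAP_MB - db_after_mb        # room left under the free cap

fastest = months_until_revacuum(db_after_mb, 60)  # upper churn estimate
slowest = months_until_revacuum(db_after_mb, 30)  # lower churn estimate

print(f"reclaimed {reclaimed_mb} MB; DB now {db_after_mb} MB; "
      f"{headroom_mb} MB under the free cap")
print(f"next VACUUM FULL due in about {fastest:.2f} to {slowest:.2f} months")
```

The totals agree with the reported 504 → 381 MB. Under the stated 30-60 MB/month churn, the 99 MB of room below the 480 MB trigger lasts somewhere between about one and a half and a little over three months, which supports the runbook's "periodic reset, not a permanent fix" framing.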