Commit a39804f

docs(ops): add hybrid-search lean/full switch (fit Supabase free tier on demand)
Adds a reversible downgrade path so the project can drop to the Supabase free-tier 500 MB cap when a paid plan isn't in place, and restore Tier 2 semantic search when it is. Most of the machinery already existed from the Tier 1 -> Tier 2 work; this lands the missing downgrade DDL + the operator runbook.

- docs/sql/supabase-cached-jobs-lean-mode.sql: the LEAN half. Drops the HNSW index, the hybrid RPC, and the embedding column (the bulk of cached_jobs's footprint: the ~20k rows of job text are cheap, the per-row 1536-dim vectors + HNSW graph are what blow past 500 MB), then a VACUUM FULL to return the pages to the OS. Idempotent. The vector extension + all Tier 1 lexical/filter indexes are deliberately left intact.
- docs/deployment.md: "Hybrid-search lean/full switch" runbook: full->lean and lean->full step orders, the JOB_SEARCH_HYBRID_ENABLED flag (already gates both search + embed-on-write, self-degrades on error), and the PERFDB-4 prerequisite (pin the base-table DDL via pg_dump before the first downgrade so "restore to full" is reproducible).
- docs/README.md: register the new SQL file in the migrations index.

The UPGRADE half is the existing supabase-cached-jobs-pgvector.sql + -hybrid.sql + scripts/backfill_job_embeddings.py (idempotent/resumable, ~20k rows ~= ~200 embedding calls). No application code change: the flag wiring is already in src/cached_jobs_store.py + src/config.py.

Not yet done (needs live DB access; Supabase MCP was disconnected this session):
(1) confirm the exact embeddings+HNSW footprint to verify lean mode lands under 500 MB;
(2) capture the cached_jobs base-table DDL into a tracked migration (PERFDB-4) as the restore safety net.
Both have ready-to-run queries in the runbook.
1 parent d5518a6 commit a39804f

3 files changed

Lines changed: 165 additions & 0 deletions

docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ The `docs/sql/*.sql` files are reference copies of the Supabase migrations appli
| `docs/sql/supabase-cached-jobs-search.sql` | `search_cached_jobs_ranked` RPC: text-search + filters + sort + `LIMIT`/`OFFSET` pagination over `cached_jobs` (Tier 1 lexical search). **service_role-only** EXECUTE; the REVOKEs are part of the canonical definition |
| `docs/sql/supabase-cached-jobs-pgvector.sql` | Tier 2 semantic-search schema: the `vector` extension, the `cached_jobs.embedding vector(1536)` column, and the HNSW cosine index |
| `docs/sql/supabase-cached-jobs-hybrid.sql` | `search_cached_jobs_hybrid` RPC: Reciprocal Rank Fusion of the Tier 1 lexical ranking and a pgvector semantic ranking (HNSW candidate pools). **service_role-only** EXECUTE |
| `docs/sql/supabase-cached-jobs-lean-mode.sql` | **The downgrade half of the hybrid-search on/off switch.** Drops the Tier 2 semantic add-ons (HNSW index + hybrid RPC + `embedding` column) and reclaims their storage so the project fits the Supabase free-tier 500 MB cap; the pgvector + hybrid files above are the upgrade-back half. Idempotent. See the "Hybrid-search lean/full switch" runbook in `deployment.md` |
| `docs/sql/job_cache_cron_setup.sql` | **Template, not source of truth.** The `cached_jobs` refresh pg_cron schedule. Defaults to `*/30`; production runs `0 */4`. `SELECT jobname, schedule FROM cron.job;` is authoritative |

Update trigger: only when a new migration lands. Old `.sql` files are append-only.

docs/deployment.md

Lines changed: 74 additions & 0 deletions
@@ -202,6 +202,80 @@ spend. A row with no corresponding user request is the signature of a
stuck retry or a forgotten manual eval, not a rogue cron (there is no
LLM-spending cron; see the inventory at the top).

## Hybrid-search lean/full switch (fit the Supabase free tier on demand)

Job search runs in one of two modes, toggled without a code change. The
**full** mode is the current production state: Tier 2 hybrid search
(lexical + pgvector semantic, fused by RRF). The **lean** mode is the
pre-Tier-2 state: Tier 1 lexical-only (synonym-expanded full-text). The
ONLY reason to go lean is **storage**: the `embedding vector(1536)`
column + its HNSW index are the bulk of `cached_jobs`'s footprint (the
~20k rows of job *text* are cheap; the per-row 1536-dim vectors and the
HNSW graph are what push the database past the Supabase **free-tier 500
MB** cap). Going lean reclaims that space so the project fits the free
plan; going full restores semantic search when a paid plan is in place.
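
As a rough back-of-the-envelope check on the storage claim, the raw vector payload can be estimated from the row count and the dimension. The sketch below assumes 4 bytes per float4 component and assumes the 500 MB cap is counted in MiB; the function and constant names are illustrative, not part of the codebase. It ignores tuple and TOAST overhead and the HNSW graph, so it is a lower bound, not a measurement.

```python
def raw_vector_bytes(rows: int, dims: int = 1536, bytes_per_component: int = 4) -> int:
    """Raw embedding payload only; excludes tuple/TOAST overhead and the HNSW index."""
    return rows * dims * bytes_per_component


# Assumption: the free-tier cap is 500 MiB; adjust if the plan counts decimal MB.
FREE_TIER_CAP_BYTES = 500 * 1024 * 1024

payload = raw_vector_bytes(20_000)
print(f"raw vectors: {payload / 1024**2:.1f} MiB")  # raw vectors: 117.2 MiB
print(f"share of cap: {payload / FREE_TIER_CAP_BYTES:.0%}")
```

The raw vectors alone come to a bit under a quarter of the cap under these assumptions. The HNSW index and the rest of the database are not modeled here, which is why the real footprint still has to be measured on the live DB.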

**This is NOT a row-count change.** Lean mode hosts the *same* ~20k
jobs and just drops the embeddings. Lexical search (exact-keyword +
synonym/abbreviation expansion) keeps working; what you lose is
concept-level matching (e.g. "ML engineer" ↔ "machine learning
specialist" with no shared keyword).

**The pieces (most already exist):**
- Env flag `JOB_SEARCH_HYBRID_ENABLED`: gates BOTH the search path
  (`search_cached_jobs_hybrid` vs `search_cached_jobs_ranked`) and the
  embed-on-write path in the 4-hourly refresh. Off = lexical-only +
  zero embedding spend. The hybrid RPC also self-degrades to lexical on
  any error, so a stale flag can never 500 the search.
- `docs/sql/supabase-cached-jobs-lean-mode.sql`: the downgrade DDL
  (drop HNSW index → drop hybrid RPC → drop embedding column → VACUUM
  FULL). Idempotent.
- `docs/sql/supabase-cached-jobs-pgvector.sql` + `…-hybrid.sql`: the
  upgrade DDL (re-add column + HNSW, re-create hybrid RPC). Idempotent
  (`IF NOT EXISTS` / `CREATE OR REPLACE`).
- `scripts/backfill_job_embeddings.py`: re-embeds every row on the way
  back to full. Idempotent + resumable (`embedding IS NULL` only).
  ~20k rows ≈ ~200 embedding API calls ≈ a few cents, a few minutes.
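
The flag's gating behaviour can be pictured with a minimal sketch. This is not the code in `src/cached_jobs_store.py`: the `search` and `hybrid_enabled` helpers, the `call_rpc` callable, the accepted truthy spellings, and the `false` default are all assumptions for illustration. Only the flag name and the two RPC names come from the runbook. Where the lexical fallback actually lives (client or SQL) is not shown in the source; the sketch puts it in the caller.

```python
import os
from typing import Callable, Mapping, Optional

HYBRID_RPC = "search_cached_jobs_hybrid"
LEXICAL_RPC = "search_cached_jobs_ranked"


def hybrid_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Parse JOB_SEARCH_HYBRID_ENABLED (truthy spellings and default are assumed)."""
    env = os.environ if env is None else env
    value = env.get("JOB_SEARCH_HYBRID_ENABLED", "false")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def search(query: str, call_rpc: Callable[[str, str], list],
           env: Optional[Mapping[str, str]] = None) -> list:
    """Use the hybrid RPC only when the flag is on; fall back to lexical on any error."""
    if hybrid_enabled(env):
        try:
            return call_rpc(HYBRID_RPC, query)
        except Exception:
            pass  # degrade to lexical instead of failing the request
    return call_rpc(LEXICAL_RPC, query)
```

With the flag off, the hybrid RPC is never called, which is what makes it safe to drop the function and its backing column afterwards.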

**FULL → LEAN (downgrade to the free tier):**
1. Set `JOB_SEARCH_HYBRID_ENABLED=false` in the VPS `.env`; redeploy api
   (`docker compose -p ai_job_application_agent up -d --force-recreate api`).
   Flip the flag FIRST so no request hits the hybrid RPC after its
   backing column is gone.
2. Apply statements 1–3 of `supabase-cached-jobs-lean-mode.sql` (drop
   index → drop RPC → drop column) via the Supabase SQL editor.
3. Run `VACUUM FULL public.cached_jobs;` as its OWN statement (can't run
   in a transaction; brief ACCESS EXCLUSIVE lock, ~seconds on ~20k rows;
   search is unavailable for that window). This is what actually
   returns the freed pages to the OS so `pg_database_size` drops.
4. Confirm: `SELECT pg_size_pretty(pg_database_size(current_database()));`
   is under the free-tier cap.

**LEAN → FULL (upgrade after a paid plan is in place):**
1. Apply `supabase-cached-jobs-pgvector.sql` (re-adds `embedding` +
   HNSW; the `vector` extension was left enabled so this is one step).
2. Run `python -m scripts.backfill_job_embeddings` (or `docker exec
   ai-job-application-agent-api python -m scripts.backfill_job_embeddings`)
   to embed all rows. Resumable: re-run if interrupted.
3. Apply `supabase-cached-jobs-hybrid.sql` (re-creates the hybrid RPC
   the lean-mode drop removed).
4. Set `JOB_SEARCH_HYBRID_ENABLED=true`; redeploy api. Embed-on-write
   resumes for new rows from the next refresh.
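
The "resumable" property of the backfill comes from only ever selecting rows whose embedding is still missing. The sketch below illustrates that idea on in-memory rows; it is not the contents of `scripts/backfill_job_embeddings.py`. The `backfill` function, the row shape, and the `embed_batch` callable are hypothetical, and the batch size of 100 is an assumption inferred from "~20k rows ≈ ~200 embedding API calls".

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def backfill(rows: List[Row],
             embed_batch: Callable[[List[str]], List[List[float]]],
             batch_size: int = 100) -> int:
    """Embed only rows whose embedding is None; return the number of batch calls made."""
    calls = 0
    # Equivalent in spirit to filtering on `embedding IS NULL`.
    pending = [r for r in rows if r["embedding"] is None]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        vectors = embed_batch([str(r["text"]) for r in batch])
        calls += 1
        for row, vec in zip(batch, vectors):
            # Written back per batch, so an interrupted run redoes at most one batch.
            row["embedding"] = vec
    return calls
```

Because already-embedded rows are skipped, a second run after a successful one makes zero calls, and a run after an interruption picks up only the remaining rows.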

**Prerequisite: do this BEFORE the first downgrade.** The full
`cached_jobs` base-table DDL (the table, the `search_tsv` generated
tsvector, the GIN index, the `unique (source, job_id)`, the
`work_mode`/`employment_type_norm` generated columns + partial indexes,
the recency btree) currently lives ONLY in the prod DB; it is NOT in a
tracked migration (the parked **PERFDB-4** finding in `report.md`).
Capture it first with `pg_dump --schema-only -t cached_jobs` into a
tracked `docs/sql/supabase-cached-jobs-base.sql`, so "restore to full"
is reproducible and a botched drop can't lose the exact index config.
The lean-mode script only ever drops the *semantic* add-ons (HNSW index,
hybrid RPC, embedding column); it never touches the base table or the
lexical indexes. Pinning the base DDL is still the safety net that makes
the whole switch safe to operate.

## Operational gotchas (the runbook entries that cost real time)

1. **Docker Compose project-name is load-bearing.** The VPS runs

docs/sql/supabase-cached-jobs-lean-mode.sql

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
-- ---------------------------------------------------------------------------
-- supabase-cached-jobs-lean-mode: DOWNGRADE Tier 2 → Tier 1 (reclaim storage)
-- ---------------------------------------------------------------------------
-- THE "LEAN MODE" HALF OF THE HYBRID-SEARCH ON/OFF SWITCH.
--
-- Purpose: when the project must fit the Supabase FREE tier (500 MB database
-- cap), this file strips the Tier 2 semantic layer off `cached_jobs` and
-- reclaims the storage it occupies: the `embedding vector(1536)` column and
-- its HNSW index, which together are the bulk of the table's footprint
-- (the ~20k rows of job TEXT are cheap; the per-row 1536-dim vectors + the
-- HNSW graph are what blow past 500 MB). Search degrades to Tier 1 lexical
-- (synonym-expanded full-text), which is exactly what the product ran before
-- the Tier 2 upgrade (ADR-033).
--
-- This is REVERSIBLE. The "full mode" half is the existing
-- `supabase-cached-jobs-pgvector.sql` (re-adds the column + HNSW index) +
-- `supabase-cached-jobs-hybrid.sql` (re-creates the hybrid RPC) +
-- `scripts/backfill_job_embeddings.py` (re-embeds every row; ~20k rows ≈
-- ~200 embedding API calls ≈ a few cents + a few minutes). See the runbook in
-- `docs/deployment.md` ("Hybrid-search lean/full switch") for both flip orders.
--
-- WHAT THE APP DOES WHILE LEAN (no code change needed; already wired):
--   * Set `JOB_SEARCH_HYBRID_ENABLED=false` (env). The store's `search()`
--     stays on the Tier 1 `search_cached_jobs_ranked` RPC and NEVER calls
--     the hybrid RPC (src/cached_jobs_store.py), and `_embed_new_rows`
--     early-returns so the 4-hourly refresh stops embedding new rows (zero
--     OpenAI embedding spend while lean). The hybrid RPC also self-degrades
--     to lexical on any error, so even a stale flag can't 500 the search.
--   * Flip the flag FIRST, redeploy the API, THEN apply this file. That
--     ordering means no request is ever routed at the hybrid RPC after its
--     backing column is gone.
--
-- ORDER OF OPERATIONS (operator):
--   1. Set JOB_SEARCH_HYBRID_ENABLED=false in the VPS `.env`; redeploy api.
--   2. Apply statements 1–3 below (drop index → drop RPC → drop column).
--   3. Run statement 4 (VACUUM FULL) as a SEPARATE standalone statement:
--      it cannot run inside a transaction block and takes a brief
--      ACCESS EXCLUSIVE lock (~seconds on ~20k rows; search is unavailable
--      for that window, which is acceptable for a deliberate downgrade).
--   4. Confirm `pg_database_size` is back under the free-tier cap.
--
-- IDEMPOTENT: every statement is `IF EXISTS`, so re-applying is a no-op and
-- safe to run even if you're already lean.
--
-- WHAT THIS DELIBERATELY DOES NOT TOUCH:
--   * The `vector` extension stays enabled: it occupies ~0 storage (just
--     type/operator definitions) and leaving it makes the upgrade-back path
--     one step shorter. Re-adding the column later needs the extension.
--   * The GIN index on `search_tsv`, the `unique (source, job_id)`, the
--     generated `work_mode`/`employment_type_norm` columns + their partial
--     indexes, and the recency btree: all are Tier 1 lexical/filter
--     infrastructure and MUST survive. Only the embedding column, its HNSW
--     index, and the hybrid RPC are semantic-only and safe to drop.
--
-- SECURITY: only touches schema on `public.cached_jobs` (RLS-enabled, no
-- policies, so service-role-only). Dropping a column / index / function does
-- not change that posture.
-- ---------------------------------------------------------------------------

-- 1. Drop the HNSW semantic index. This alone reclaims the single largest
--    chunk (the HNSW graph) immediately: index space is returned on drop
--    with no VACUUM needed. (DROP COLUMN below would cascade-drop it anyway;
--    we drop it explicitly first so the intent is legible and so a partial
--    re-run is still correct.)
DROP INDEX IF EXISTS public.cached_jobs_embedding_hnsw_idx;

-- 2. Drop the hybrid RPC. It references `cached_jobs.embedding`, so once the
--    column is gone it would error at call time (plpgsql late-binds, so the
--    drop in step 3 wouldn't block, but an orphaned function that 500s if
--    ever called is worse than no function). The flag is already off so
--    nothing calls it; `supabase-cached-jobs-hybrid.sql` re-creates it on
--    the way back to full mode. The signature must match exactly.
DROP FUNCTION IF EXISTS public.search_cached_jobs_hybrid(text,text,text[],boolean,integer,integer,text[],text[],text,integer,vector);

-- 3. Drop the embedding column. Reclaims the per-row vector payload (1536-dim
--    float4 ≈ 6 KB/row, TOASTed). The heap pages aren't physically shrunk
--    until VACUUM FULL (step 4); DROP COLUMN only marks the attribute
--    dropped on existing rows.
ALTER TABLE public.cached_jobs
    DROP COLUMN IF EXISTS embedding;

-- 4. Reclaim the freed heap/TOAST pages back to the OS so pg_database_size
--    actually drops under the free-tier cap. RUN THIS SEPARATELY: VACUUM
--    FULL cannot run inside a transaction block (so it can't go through a
--    migration wrapper; run it via a plain SQL statement in the SQL editor
--    or psql), and it takes an ACCESS EXCLUSIVE lock for its duration.
-- VACUUM FULL public.cached_jobs;
--
-- (Left commented so applying statements 1–3 via apply_migration doesn't
-- choke on the in-transaction restriction. Uncomment + run on its own.)
