docs(ops): add hybrid-search lean/full switch (fit Supabase free tier on demand)

LEANDERANTONY · LEANDERANTONY · commit a39804f19128 · 2026-06-22T23:21:37.000+05:30
Adds a reversible downgrade path so the project can drop to the Supabase
free-tier 500 MB cap when a paid plan isn't in place, and restore Tier 2
semantic search when it is. Most of the machinery already existed from
the Tier 1 -> Tier 2 work; this lands the missing downgrade DDL + the
operator runbook.

- docs/sql/supabase-cached-jobs-lean-mode.sql: the LEAN half. Drops the
  HNSW index, the hybrid RPC, and the embedding column (the bulk of
  cached_jobs's footprint — the ~20k rows of job text are cheap, the
  per-row 1536-dim vectors + HNSW graph are what blow past 500 MB), then
  a VACUUM FULL to return the pages to the OS. Idempotent. The vector
  extension + all Tier 1 lexical/filter indexes are deliberately left
  intact.
- docs/deployment.md: "Hybrid-search lean/full switch" runbook —
  full->lean and lean->full step orders, the JOB_SEARCH_HYBRID_ENABLED
  flag (already gates both search + embed-on-write, self-degrades on
  error), and the PERFDB-4 prerequisite (pin the base-table DDL via
  pg_dump before the first downgrade so "restore to full" is
  reproducible).
- docs/README.md: register the new SQL file in the migrations index.
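
A minimal sketch of the flag gating described above, for readers who
have not opened src/cached_jobs_store.py. Everything here is
illustrative: `flag_enabled`, `search_jobs`, the accepted truthy
strings, the default-off behaviour, and the client-side fallback are
assumptions, not the store's actual code.

```python
from typing import Callable, Dict, List, Mapping

Rows = List[Dict[str, str]]


def flag_enabled(env: Mapping[str, str]) -> bool:
    """Read JOB_SEARCH_HYBRID_ENABLED; parsing and default are assumptions."""
    raw = env.get("JOB_SEARCH_HYBRID_ENABLED", "false")
    return raw.strip().lower() in {"1", "true", "yes"}


def search_jobs(
    query: str,
    *,
    hybrid_enabled: bool,
    lexical_rpc: Callable[[str], Rows],
    hybrid_rpc: Callable[[str], Rows],
) -> Rows:
    """Use the hybrid RPC only when the flag is on; fall back on any error."""
    if not hybrid_enabled:
        # Lean mode: the hybrid RPC is never called.
        return lexical_rpc(query)
    try:
        return hybrid_rpc(query)
    except Exception:
        # Hypothetical fallback: a stale flag degrades to lexical results
        # instead of surfacing an error to the caller.
        return lexical_rpc(query)
```

With the flag off, `hybrid_rpc` is never invoked, which is why flipping
the flag first makes the later column drop safe.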

The UPGRADE half is the existing supabase-cached-jobs-pgvector.sql +
-hybrid.sql + scripts/backfill_job_embeddings.py (idempotent/resumable,
~20k rows ~= ~200 embedding calls). No application code change — the
flag wiring is already in src/cached_jobs_store.py + src/config.py.
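
The "~20k rows ~= ~200 embedding calls" figure implies roughly 100 rows
per API call. The sketch below shows that arithmetic and the
"only rows whose embedding is NULL" resume rule. The batch size of 100
and the helper names are assumptions inferred from those numbers, not
taken from scripts/backfill_job_embeddings.py.

```python
from math import ceil
from typing import Iterator, List, Optional, Sequence, Tuple

# Assumed batch size, inferred from ~20k rows ~= ~200 calls.
ASSUMED_BATCH_SIZE = 100

Row = Tuple[int, Optional[List[float]]]  # (row id, embedding or None)


def pending_ids(rows: Sequence[Row]) -> List[int]:
    """Rows still needing an embedding; already-embedded rows are skipped."""
    return [row_id for row_id, embedding in rows if embedding is None]


def batches(ids: Sequence[int], size: int = ASSUMED_BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def api_calls_needed(n_rows: int, size: int = ASSUMED_BATCH_SIZE) -> int:
    """One embedding API call per batch."""
    return ceil(n_rows / size)
```

Because selection depends only on which rows still lack an embedding,
re-running after an interruption picks up where the last run stopped.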

Not yet done (needs live DB access — Supabase MCP was disconnected this
session): (1) confirm the exact embeddings+HNSW footprint to verify lean
mode lands under 500 MB; (2) capture the cached_jobs base-table DDL into
a tracked migration (PERFDB-4) as the restore safety net. Both have
ready-to-run queries in the runbook.
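
Until the live footprint is measured, a back-of-envelope estimate of
the raw vector payload can be sketched as below. It counts only the
float4 values (the "1536-dim float4 ~= 6 KB/row" figure) and ignores
per-value headers, TOAST overhead, and the HNSW graph, so it is a lower
bound on what lean mode reclaims, not a substitute for the live query.
The function names are illustrative.

```python
BYTES_PER_FLOAT4 = 4


def vector_payload_bytes(dims: int) -> int:
    """Raw bytes for one vector's float4 values (no header, no TOAST overhead)."""
    return dims * BYTES_PER_FLOAT4


def column_payload_mib(rows: int, dims: int) -> float:
    """Raw payload of the whole embedding column, in MiB."""
    return rows * vector_payload_bytes(dims) / 2**20


if __name__ == "__main__":
    per_row = vector_payload_bytes(1536)
    total = column_payload_mib(20_000, 1536)
    print(f"{per_row} bytes/row, about {total:.1f} MiB for 20k rows")
```

For 1536 dimensions and ~20k rows this gives 6144 bytes per row and
roughly 117 MiB of raw vectors, before the HNSW index is counted.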
diff --git a/docs/README.md b/docs/README.md
@@ -52,6 +52,7 @@ The `docs/sql/*.sql` files are reference copies of the Supabase migrations appli
 | `docs/sql/supabase-cached-jobs-search.sql` | `search_cached_jobs_ranked` RPC: text-search + filters + sort + `LIMIT`/`OFFSET` pagination over `cached_jobs` (Tier 1 lexical search). **service_role-only** EXECUTE — the REVOKEs are part of the canonical definition |
 | `docs/sql/supabase-cached-jobs-pgvector.sql` | Tier 2 semantic-search schema: the `vector` extension, the `cached_jobs.embedding vector(1536)` column, and the HNSW cosine index |
 | `docs/sql/supabase-cached-jobs-hybrid.sql` | `search_cached_jobs_hybrid` RPC: Reciprocal Rank Fusion of the Tier 1 lexical ranking and a pgvector semantic ranking (HNSW candidate pools). **service_role-only** EXECUTE |
+| `docs/sql/supabase-cached-jobs-lean-mode.sql` | **The downgrade half of the hybrid-search on/off switch.** Drops the Tier 2 semantic add-ons (HNSW index + hybrid RPC + `embedding` column) and reclaims their storage so the project fits the Supabase free-tier 500 MB cap; the pgvector + hybrid files above are the upgrade-back half. Idempotent. See the "Hybrid-search lean/full switch" runbook in `deployment.md` |
 | `docs/sql/job_cache_cron_setup.sql` | **Template, not source of truth.** The `cached_jobs` refresh pg_cron schedule. Defaults to `*/30`; production runs `0 */4`. `SELECT jobname, schedule FROM cron.job;` is authoritative |
 
 Update trigger: only when a new migration lands. Old `.sql` files are append-only.
diff --git a/docs/deployment.md b/docs/deployment.md
@@ -202,6 +202,80 @@ spend. A row with no corresponding user request is the signature of a
 stuck retry or a forgotten manual eval, not a rogue cron (there is no
 LLM-spending cron — see the inventory at the top).
 
+## Hybrid-search lean/full switch (fit the Supabase free tier on demand)
+
+Job search runs in one of two modes, toggled without a code change. The
+**full** mode is the current production state: Tier 2 hybrid search
+(lexical + pgvector semantic, fused by RRF). The **lean** mode is the
+pre-Tier-2 state: Tier 1 lexical-only (synonym-expanded full-text). The
+ONLY reason to go lean is **storage**: the `embedding vector(1536)`
+column + its HNSW index are the bulk of `cached_jobs`'s footprint (the
+~20k rows of job *text* are cheap — the per-row 1536-dim vectors and the
+HNSW graph are what push the database past the Supabase **free-tier 500
+MB** cap). Going lean reclaims that space so the project fits the free
+plan; going full restores semantic search when a paid plan is in place.
+
+**This is NOT a row-count change.** Lean mode hosts the *same* ~20k
+jobs — it just drops the embeddings. Lexical search (exact-keyword +
+synonym/abbreviation expansion) keeps working; what you lose is
+concept-level matching (e.g. "ML engineer" ↔ "machine learning
+specialist" with no shared keyword).
+
+**The pieces (most already exist):**
+- Env flag `JOB_SEARCH_HYBRID_ENABLED` — gates BOTH the search path
+  (`search_cached_jobs_hybrid` vs `search_cached_jobs_ranked`) and the
+  embed-on-write path in the 4-hourly refresh. Off = lexical-only +
+  zero embedding spend. The hybrid RPC also self-degrades to lexical on
+  any error, so a stale flag can never 500 the search.
+- `docs/sql/supabase-cached-jobs-lean-mode.sql` — the downgrade DDL
+  (drop HNSW index → drop hybrid RPC → drop embedding column → VACUUM
+  FULL). Idempotent.
+- `docs/sql/supabase-cached-jobs-pgvector.sql` + `…-hybrid.sql` — the
+  upgrade DDL (re-add column + HNSW, re-create hybrid RPC). Idempotent
+  (`IF NOT EXISTS` / `CREATE OR REPLACE`).
+- `scripts/backfill_job_embeddings.py` — re-embeds every row on the way
+  back to full. Idempotent + resumable (`embedding IS NULL` only).
+  ~20k rows ≈ ~200 embedding API calls ≈ a few cents, a few minutes.
+
+**FULL → LEAN (downgrade to the free tier):**
+1. Set `JOB_SEARCH_HYBRID_ENABLED=false` in the VPS `.env`; redeploy api
+   (`docker compose -p ai_job_application_agent up -d --force-recreate api`).
+   Flip the flag FIRST so no request hits the hybrid RPC after its
+   backing column is gone.
+2. Apply statements 1–3 of `supabase-cached-jobs-lean-mode.sql` (drop
+   index → drop RPC → drop column) via the Supabase SQL editor.
+3. Run `VACUUM FULL public.cached_jobs;` as its OWN statement (can't run
+   in a transaction; brief ACCESS EXCLUSIVE lock, ~seconds on ~20k rows
+   — search is unavailable for that window). This is what actually
+   returns the freed pages to the OS so `pg_database_size` drops.
+4. Confirm: `SELECT pg_size_pretty(pg_database_size(current_database()));`
+   is under the free-tier cap.
+
+**LEAN → FULL (upgrade after a paid plan is in place):**
+1. Apply `supabase-cached-jobs-pgvector.sql` (re-adds `embedding` +
+   HNSW; the `vector` extension was left enabled so this is one step).
+2. Run `python -m scripts.backfill_job_embeddings` (or `docker exec
+   ai-job-application-agent-api python -m scripts.backfill_job_embeddings`)
+   to embed all rows. Resumable — re-run if interrupted.
+3. Apply `supabase-cached-jobs-hybrid.sql` (re-creates the hybrid RPC
+   the lean-mode drop removed).
+4. Set `JOB_SEARCH_HYBRID_ENABLED=true`; redeploy api. Embed-on-write
+   resumes for new rows from the next refresh.
+
+**Prerequisite — do this BEFORE the first downgrade:** the full
+`cached_jobs` base-table DDL (the table, the `search_tsv` generated
+tsvector, the GIN index, the `unique (source, job_id)`, the
+`work_mode`/`employment_type_norm` generated columns + partial indexes,
+the recency btree) currently lives ONLY in the prod DB — it is NOT in a
+tracked migration (the parked **PERFDB-4** finding in `report.md`).
+Capture it first with `pg_dump --schema-only -t cached_jobs` into a
+tracked `docs/sql/supabase-cached-jobs-base.sql`, so "restore to full"
+is reproducible and a botched drop can't lose the exact index config.
+The lean-mode script only ever drops the *semantic* add-ons (HNSW index,
+hybrid RPC, `embedding` column); it never touches the base table's other
+columns or the lexical indexes. Pinning the base DDL is still the safety
+net that makes the whole switch safe to operate.
+
 ## Operational gotchas (the runbook entries that cost real time)
 
 1. **Docker Compose project-name is load-bearing.** The VPS runs
diff --git a/docs/sql/supabase-cached-jobs-lean-mode.sql b/docs/sql/supabase-cached-jobs-lean-mode.sql
@@ -0,0 +1,90 @@
+-- ---------------------------------------------------------------------------
+-- supabase-cached-jobs-lean-mode — DOWNGRADE Tier 2 → Tier 1 (reclaim storage)
+-- ---------------------------------------------------------------------------
+-- THE "LEAN MODE" HALF OF THE HYBRID-SEARCH ON/OFF SWITCH.
+--
+-- Purpose: when the project must fit the Supabase FREE tier (500 MB database
+-- cap), this file strips the Tier 2 semantic layer off `cached_jobs` and
+-- reclaims the storage it occupies — the `embedding vector(1536)` column and
+-- its HNSW index, which together are the bulk of the table's footprint
+-- (the ~20k rows of job TEXT are cheap; the per-row 1536-dim vectors + the
+-- HNSW graph are what blow past 500 MB). Search degrades to Tier 1 lexical
+-- (synonym-expanded full-text), which is exactly what the product ran before
+-- the Tier 2 upgrade (ADR-033).
+--
+-- This is REVERSIBLE. The "full mode" half is the existing trio
+-- `supabase-cached-jobs-pgvector.sql` (re-adds the column + HNSW index),
+-- `scripts/backfill_job_embeddings.py` (re-embeds every row, ~20k rows ≈
+-- ~200 embedding API calls ≈ a few cents + a few minutes), and
+-- `supabase-cached-jobs-hybrid.sql` (re-creates the hybrid RPC). See
+-- `docs/deployment.md` ("Hybrid-search lean/full switch") for both flip orders.
+--
+-- WHAT THE APP DOES WHILE LEAN (no code change needed — already wired):
+--   * Set `JOB_SEARCH_HYBRID_ENABLED=false` (env). The store's `search()`
+--     stays on the Tier 1 `search_cached_jobs_ranked` RPC and NEVER calls
+--     the hybrid RPC (src/cached_jobs_store.py), and `_embed_new_rows`
+--     early-returns so the 4-hourly refresh stops embedding new rows (zero
+--     OpenAI embedding spend while lean). The hybrid RPC also self-degrades
+--     to lexical on any error, so even a stale flag can't 500 the search.
+--   * Flip the flag FIRST, redeploy the API, THEN apply this file. That
+--     ordering means no request is ever routed to the hybrid RPC after its
+--     backing column is gone.
+--
+-- ORDER OF OPERATIONS (operator):
+--   1. Set JOB_SEARCH_HYBRID_ENABLED=false in the VPS `.env`; redeploy api.
+--   2. Apply statements 1–3 below (drop index → drop RPC → drop column).
+--   3. Run statement 4 (VACUUM FULL) as a SEPARATE standalone statement —
+--      it cannot run inside a transaction block and takes a brief
+--      ACCESS EXCLUSIVE lock (~seconds on ~20k rows; search is unavailable
+--      for that window — acceptable for a deliberate downgrade).
+--   4. Confirm `pg_database_size` is back under the free-tier cap.
+--
+-- IDEMPOTENT: every DROP uses `IF EXISTS`, so re-applying is a no-op and
+-- safe to run even if you're already lean.
+--
+-- WHAT THIS DELIBERATELY DOES NOT TOUCH:
+--   * The `vector` extension stays enabled — it occupies ~0 storage (just
+--     type/operator definitions) and leaving it makes the upgrade-back path
+--     one step shorter. Re-adding the column later needs the extension.
+--   * The GIN index on `search_tsv`, the `unique (source, job_id)`, the
+--     generated `work_mode`/`employment_type_norm` columns + their partial
+--     indexes, and the recency btree — all are Tier 1 lexical/filter
+--     infrastructure and MUST survive. Only the embedding column, its HNSW
+--     index, and the hybrid RPC are semantic-only and safe to drop.
+--
+-- SECURITY: only touches schema on `public.cached_jobs` (RLS-enabled, no
+-- policies — service-role-only). Dropping a column / index / function does
+-- not change that posture.
+-- ---------------------------------------------------------------------------
+
+-- 1. Drop the HNSW semantic index. This alone reclaims the single largest
+--    chunk (the HNSW graph) immediately — index space is returned on drop
+--    with no VACUUM needed. (DROP COLUMN below would cascade-drop it anyway;
+--    we drop it explicitly first so the intent is legible and so a partial
+--    re-run is still correct.)
+DROP INDEX IF EXISTS public.cached_jobs_embedding_hnsw_idx;
+
+-- 2. Drop the hybrid RPC. It references `cached_jobs.embedding`, so once the
+--    column is gone it would error at call time (plpgsql late-binds, so the
+--    drop in step 3 wouldn't block, but an orphaned function that 500s if
+--    ever called is worse than no function). The flag is already off so
+--    nothing calls it; `supabase-cached-jobs-hybrid.sql` re-creates it on
+--    the way back to full mode. The signature must match exactly.
+DROP FUNCTION IF EXISTS public.search_cached_jobs_hybrid(text,text,text[],boolean,integer,integer,text[],text[],text,integer,vector);
+
+-- 3. Drop the embedding column. Frees the per-row vector payload (1536-dim
+--    float4 ≈ 6 KB/row, TOASTed) logically: DROP COLUMN only marks the
+--    attribute as dropped in the catalog, and existing rows keep their
+--    bytes on disk until VACUUM FULL (step 4) rewrites the table.
+ALTER TABLE public.cached_jobs
+    DROP COLUMN IF EXISTS embedding;
+
+-- 4. Reclaim the freed heap/TOAST pages back to the OS so pg_database_size
+--    actually drops under the free-tier cap. RUN THIS SEPARATELY — VACUUM
+--    FULL cannot run inside a transaction block (so it can't go through a
+--    migration wrapper; run it via a plain SQL statement in the SQL editor
+--    or psql), and it takes an ACCESS EXCLUSIVE lock for its duration.
+--      VACUUM FULL public.cached_jobs;
+--
+--    (Left commented so applying statements 1–3 via apply_migration doesn't
+--    choke on the in-transaction restriction. Uncomment + run on its own.)