Refresh worker: fix Workday 0-jobs + Ashby chunk timeouts

LEANDERANTONY · claude · LEANDERANTONY · commit 18fde26b0565 · 2026-05-09T04:45:25.000+05:30
Two targeted fixes diagnosed against the live VPS this evening
after the cache layer was activated.

Workday — `_PAGE_SIZE = 50 → 20`
  Workday's CXS API silently tightened its per-page limit. Probed
  all 11 configured tenants (nvidia.wd5, micron.wd1, hpe.wd5,
  citi.wd5, walmart.wd5, adobe.wd5, boeing.wd1, disney.wd5,
  hp.wd5, blackrock.wd1, workday.wd5):
    limit=5/10/20 → 200 OK
    limit=50     → 400 with body
                   {"errorCode":"HTTP_400", "message":""}
  The old `_PAGE_SIZE = 50` made the first-page POST fail for
  every tenant, so `_fetch_board_jobs` raised, `fetch_all_postings`
  yielded ('error', ...) for all 11 boards, and the cache worker
  had zero Workday postings to upsert. ~13K Fortune 500 jobs
  were silently missing from the index.
  20 matches the browser default and is the largest size every
  validated tenant currently accepts.
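
  A minimal sketch of what the smaller page size means for paging,
  not the code in workday.py. `page_payloads` is a hypothetical
  helper, and the `limit`/`offset` body shape is assumed from the
  probe above; only `_PAGE_SIZE` and `_MAX_JOBS_PER_BOARD` come
  from the source.

```python
_PAGE_SIZE = 20            # largest limit every probed tenant accepted
_MAX_JOBS_PER_BOARD = 250  # per-board cap already defined in workday.py


def page_payloads(total_available, page_size=_PAGE_SIZE, cap=_MAX_JOBS_PER_BOARD):
    """Yield one assumed request body per page (hypothetical helper).

    Stops once the board is exhausted or the per-board cap is reached.
    The last page may overshoot the cap, so a caller would trim results.
    """
    wanted = min(total_available, cap)
    for offset in range(0, wanted, page_size):
        yield {"limit": page_size, "offset": offset}


# A board with at least 250 jobs needs 13 POSTs at limit=20,
# versus 5 at the old limit=50.
pages_at_20 = list(page_payloads(250))
pages_at_50 = list(page_payloads(250, page_size=50))
```

  The trade-off is more POSTs per board, which matters given
  Workday's IP rate limiting, but every page is one the API accepts.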

Ashby — per-source chunk_size override
  Ashby postings carry significantly larger description bodies
  than Greenhouse/Lever/Workday. The cache table's `search_tsv`
  is a GENERATED STORED tsvector that gets re-derived on every
  insert, and Supabase's default `statement_timeout` for the
  `service_role` REST path is 60 s. At chunk_size=100 we observed
  five consecutive `canceling statement due to statement timeout`
  failures per refresh, silently dropping ~500 rows.

  Lowered Ashby's chunk_size to 30; left other sources at 100.
  Per refresh, Ashby goes from ~18 requests to ~60, but each
  finishes in well under 60 s and every row lands.
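
  The request-count arithmetic can be sketched as below. This is an
  illustration, not the worker code: the commit uses an inline
  conditional, while `CHUNK_SIZE_OVERRIDES`, `chunk_size_for`, and
  `iter_chunks` are hypothetical names, and the 1,800-row total is
  an assumption inferred from "~18 requests" at chunk_size=100.

```python
DEFAULT_CHUNK_SIZE = 100
# Hypothetical lookup table; the actual change is
# `30 if source_name == "ashby" else 100`.
CHUNK_SIZE_OVERRIDES = {"ashby": 30}


def chunk_size_for(source_name):
    """Return the upsert chunk size for a source, falling back to the default."""
    return CHUNK_SIZE_OVERRIDES.get(source_name, DEFAULT_CHUNK_SIZE)


def iter_chunks(rows, source_name):
    """Yield consecutive slices of `rows`, sized for the given source."""
    size = chunk_size_for(source_name)
    for i in range(0, len(rows), size):
        yield rows[i : i + size]


# Assumed refresh size: roughly 1,800 Ashby postings.
rows = list(range(1800))
ashby_requests = len(list(iter_chunks(rows, "ashby")))       # 60
other_requests = len(list(iter_chunks(rows, "greenhouse")))  # 18
```

  A table keyed by source name would make a future per-source
  override a one-line change, at the cost of one more constant.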

Other sources are unchanged. Both fixes take effect on the next CI
deploy, which this push triggers: the api container is recreated
from the new image, and the next cron tick (HH:00 or HH:30)
exercises both paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/backend/services/job_cache_service.py b/backend/services/job_cache_service.py
@@ -143,9 +143,19 @@ def refresh_cached_jobs(
         # endpoint — earlier 200-row chunks intermittently failed
         # mid-refresh after sustained writes (likely the supabase REST
         # tier's per-connection write budget). 100 is a good middle
-        # ground: low overhead, high success rate, ~1.5x more requests.
+        # ground for the lighter-payload sources.
+        #
+        # Ashby is the exception: its postings carry much larger
+        # description bodies, and the GENERATED STORED `search_tsv`
+        # column has to be re-derived on every row insert. At
+        # chunk_size=100 we observed five consecutive statement
+        # timeouts per refresh ("canceling statement due to statement
+        # timeout") on Supabase's default 60 s `statement_timeout`,
+        # silently losing ~500 rows. chunk_size=30 finishes each
+        # chunk in well under 60 s; total Ashby refresh goes from
+        # ~18 requests to ~60, but every row lands.
         if all_postings:
-            chunk_size = 100
+            chunk_size = 30 if source_name == "ashby" else 100
             for i in range(0, len(all_postings), chunk_size):
                 chunk = all_postings[i : i + chunk_size]
                 try:
diff --git a/src/job_sources/workday.py b/src/job_sources/workday.py
@@ -57,8 +57,14 @@
 # 90% of a typical Fortune 500 hiring window without ballooning the
 # refresh into hundreds of HTTP calls.
 _MAX_JOBS_PER_BOARD = 250
-# Workday accepts up to 50 per page reliably; 20 is the browser default.
-_PAGE_SIZE = 50
+# Workday's CXS API now rejects `limit > 20` with a bare HTTP 400
+# (errorCode "HTTP_400", empty message — diagnosed against
+# nvidia.wd5 / micron.wd1 / hpe.wd5 / etc., all 11 tenants). The
+# old code used 50 (the previously-permitted ceiling) and silently
+# got every board kicked out at the first page, producing 0 jobs
+# for the entire workday source. 20 matches the browser default and
+# every tenant in the validated pool accepts it without complaint.
+_PAGE_SIZE = 20
 # Workday IP-rate-limits aggressively (we got 400s after ~80 POSTs in
 # a few minutes during validation). Two throttles below mitigate:
 #   1) Lower per-provider concurrency (3 not 8) so we don't slam them