
Commit 18fde26

LEANDERANTONY and claude committed
Refresh worker: fix Workday 0-jobs + Ashby chunk timeouts
Two targeted fixes, diagnosed against the live VPS this evening after the cache layer was activated.

Workday: `_PAGE_SIZE = 50 → 20`

Workday's CXS API silently tightened its per-page limit. Probed all 11 configured tenants (nvidia.wd5, micron.wd1, hpe.wd5, citi.wd5, walmart.wd5, adobe.wd5, boeing.wd1, disney.wd5, hp.wd5, blackrock.wd1, workday.wd5):

  limit=5/10/20 → 200 OK
  limit=50      → 400 with body {"errorCode":"HTTP_400", "message":""}

The old `_PAGE_SIZE = 50` made the first-page POST fail for every tenant, so `_fetch_board_jobs` raised, `fetch_all_postings` yielded ('error', ...) for all 11 boards, and the cache worker ended up with zero Workday postings to upsert. About 13K Fortune 500 jobs were silently missing from the index. 20 matches the browser default and is the largest of the probed sizes that every validated tenant currently accepts.

Ashby: per-source chunk_size override

Ashby postings carry significantly larger description bodies than Greenhouse/Lever/Workday postings. The cache table's `search_tsv` is a GENERATED STORED tsvector that is re-derived on every insert, and Supabase's default `statement_timeout` for the `service_role` REST path is 60 s. At chunk_size=100 we observed five consecutive `canceling statement due to statement timeout` failures per refresh, silently dropping ~500 rows (5 chunks × 100 rows).

Lowered Ashby's chunk_size to 30; other sources stay at 100. Per refresh, Ashby goes from ~18 requests to ~60, but each one finishes in well under 60 s and every row lands. Other sources are unchanged.

The fix only takes effect on the next CI deploy, which this push triggers: the api container will be recreated from the new image, and the next cron tick (HH:00 / HH:30) will exercise both paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
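The tenant probe described in the message boils down to a search over candidate page sizes, keeping only sizes that every tenant accepts. This is a minimal sketch, not the script that was actually run: `largest_accepted_limit` and `fake_probe` are hypothetical names, and the HTTP request is abstracted as a callable `probe(tenant, limit)` that returns a status code, so the selection logic can run offline. The fake probe only encodes the behaviour reported above (5/10/20 → 200, 50 → 400).

```python
from typing import Callable, Iterable, Optional, Sequence


def largest_accepted_limit(
    tenants: Iterable[str],
    candidate_limits: Sequence[int],
    probe: Callable[[str, int], int],
) -> Optional[int]:
    """Return the largest candidate page size that every tenant accepts.

    `probe(tenant, limit)` is a hypothetical callable that sends one
    first-page search request with the given `limit` and returns the
    HTTP status code. Returns None if no candidate works everywhere.
    """
    tenant_list = list(tenants)
    best: Optional[int] = None
    for limit in sorted(candidate_limits):
        # A size only qualifies if *every* tenant answers 200 for it.
        if all(probe(t, limit) == 200 for t in tenant_list):
            best = limit
    return best


def fake_probe(tenant: str, limit: int) -> int:
    # Stand-in for the real POST: mimics the observed limit > 20 -> 400.
    return 200 if limit <= 20 else 400


TENANTS = ["nvidia.wd5", "micron.wd1", "hpe.wd5"]
print(largest_accepted_limit(TENANTS, [5, 10, 20, 50], fake_probe))  # 20
```

Requiring agreement across all tenants is what makes the result safe as a single module-level constant: one tenant rejecting a size is enough to zero out that board.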
1 parent 630b6e8 commit 18fde26

2 files changed

Lines changed: 20 additions & 4 deletions


backend/services/job_cache_service.py

Lines changed: 12 additions & 2 deletions
@@ -143,9 +143,19 @@ def refresh_cached_jobs(
     # endpoint — earlier 200-row chunks intermittently failed
     # mid-refresh after sustained writes (likely the supabase REST
     # tier's per-connection write budget). 100 is a good middle
-    # ground: low overhead, high success rate, ~1.5x more requests.
+    # ground for the lighter-payload sources.
+    #
+    # Ashby is the exception: its postings carry much larger
+    # description bodies, and the GENERATED STORED `search_tsv`
+    # column has to be re-derived on every row insert. At
+    # chunk_size=100 we observed five consecutive statement
+    # timeouts per refresh ("canceling statement due to statement
+    # timeout") on Supabase's default 60 s `statement_timeout`,
+    # silently losing ~500 rows. chunk_size=30 finishes each
+    # chunk in well under 60 s; total Ashby refresh goes from
+    # ~18 requests to ~60, but every row lands.
     if all_postings:
-        chunk_size = 100
+        chunk_size = 30 if source_name == "ashby" else 100
         for i in range(0, len(all_postings), chunk_size):
             chunk = all_postings[i : i + chunk_size]
             try:
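The per-source override amounts to picking the chunk size from the source name before slicing. Below is a minimal sketch, assuming a hypothetical `upsert(chunk)` callable that writes one batch and raises on failure (the real Supabase call is outside this diff); `CHUNK_SIZE_OVERRIDES` and `upsert_in_chunks` are illustrative names, and the 1,800-row input is an assumed figure chosen to reproduce the ~18 → ~60 request counts from the commit message.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

DEFAULT_CHUNK_SIZE = 100
# Hypothetical override table; the diff uses an inline conditional instead.
CHUNK_SIZE_OVERRIDES: Dict[str, int] = {"ashby": 30}


def chunk_size_for(source_name: str) -> int:
    return CHUNK_SIZE_OVERRIDES.get(source_name, DEFAULT_CHUNK_SIZE)


def upsert_in_chunks(
    source_name: str,
    postings: Sequence[Dict[str, Any]],
    upsert: Callable[[List[Dict[str, Any]]], None],
) -> Tuple[int, int]:
    """Upsert `postings` in source-sized chunks; return (written, failed)."""
    size = chunk_size_for(source_name)
    written = failed = 0
    for i in range(0, len(postings), size):
        chunk = list(postings[i : i + size])
        try:
            upsert(chunk)
            written += len(chunk)
        except Exception:
            # Count the lost rows so a timed-out chunk is visible
            # instead of disappearing silently.
            failed += len(chunk)
    return written, failed


rows = [{"id": n} for n in range(1800)]
calls: List[List[Dict[str, Any]]] = []
print(upsert_in_chunks("ashby", rows, calls.append), len(calls))  # (1800, 0) 60
```

A lookup table keeps the default at 100 for every source and makes a future heavy-payload source a one-line change; for a single exception, the inline conditional in the diff is equivalent.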

src/job_sources/workday.py

Lines changed: 8 additions & 2 deletions
@@ -57,8 +57,14 @@
 # 90% of a typical Fortune 500 hiring window without ballooning the
 # refresh into hundreds of HTTP calls.
 _MAX_JOBS_PER_BOARD = 250
-# Workday accepts up to 50 per page reliably; 20 is the browser default.
-_PAGE_SIZE = 50
+# Workday's CXS API now rejects `limit > 20` with a bare HTTP 400
+# (errorCode "HTTP_400", empty message — diagnosed against
+# nvidia.wd5 / micron.wd1 / hpe.wd5 / etc., all 11 tenants). The
+# old code used 50 (the previously-permitted ceiling) and silently
+# got every board kicked out at the first page, producing 0 jobs
+# for the entire workday source. 20 matches the browser default and
+# every tenant in the validated pool accepts it without complaint.
+_PAGE_SIZE = 20
 # Workday IP-rate-limits aggressively (we got 400s after ~80 POSTs in
 # a few minutes during validation). Two throttles below mitigate:
 # 1) Lower per-provider concurrency (3 not 8) so we don't slam them
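To show how `_PAGE_SIZE` and `_MAX_JOBS_PER_BOARD` interact, here is a sketch of offset/limit pagination bounded by both. It assumes a hypothetical `fetch_page(offset, limit)` callable wrapping the CXS search POST; the actual `_fetch_board_jobs` body is not part of this diff, so this illustrates the technique rather than reproducing it, and it leaves out the rate-limit throttles mentioned above.

```python
from typing import Any, Callable, Dict, List

_PAGE_SIZE = 20            # value set by this commit
_MAX_JOBS_PER_BOARD = 250  # cap shown in the diff context


def fetch_board_jobs(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    page_size: int = _PAGE_SIZE,
    max_jobs: int = _MAX_JOBS_PER_BOARD,
) -> List[Dict[str, Any]]:
    """Page through one board until it is exhausted or the cap is hit.

    `fetch_page(offset, limit)` is a hypothetical callable that returns
    the postings on one page and raises on an HTTP error.
    """
    jobs: List[Dict[str, Any]] = []
    offset = 0
    while len(jobs) < max_jobs:
        # Never request more than the page size or the remaining budget.
        limit = min(page_size, max_jobs - len(jobs))
        page = fetch_page(offset, limit)
        if not page:
            break
        jobs.extend(page)
        offset += len(page)
        if len(page) < limit:
            break  # short page: the board has no more postings
    return jobs[:max_jobs]


board = [{"id": n} for n in range(1000)]
requested: List[int] = []


def fake_page(offset: int, limit: int) -> List[Dict[str, Any]]:
    requested.append(limit)
    return board[offset : offset + limit]


jobs = fetch_board_jobs(fake_page)
print(len(jobs), len(requested))  # 250 13
```

With the new page size, reaching the 250-job cap takes 13 requests instead of 5 at the old size of 50. That cost is acceptable because, at 50, the very first request was being rejected.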
