fix(glassdoor): fix CSRF token URL (404) and non-fatal GraphQL error handling #347

Open
EnxhiT wants to merge 1 commit into speedyapply:main from EnxhiT:main

EnxhiT commented Mar 19, 2026

Problem

Two bugs in jobspy/glassdoor/__init__.py cause Glassdoor to return 0 results regardless of location or search term.

Bug 1 — CSRF token URL returns 404

_get_csrf_token() fetches /Job/computer-science-jobs.htm to extract the CSRF token. Since Glassdoor's migration to Next.js, this URL returns a 404. The scraper then falls back to the default token, but the subsequent _get_location() call fails with a 403 because the session was never properly initialized.

Fix: Fetch the homepage (/) instead, which reliably returns the token.
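
For illustration, the homepage-based token fetch could look roughly like the sketch below. It is not the exact diff in this PR: the helper names `extract_csrf_token` and `fetch_csrf_token`, the `fallback` parameter, and the token regex are assumptions made for the example, and `session` is any object with a requests-style `get()` method.

```python
import re
from typing import Optional

# Assumed shape of the token in the page source; the real markup may differ.
TOKEN_PATTERN = re.compile(r'"token":\s*"([^"]+)"')


def extract_csrf_token(html: str) -> Optional[str]:
    """Return the first token found in the page source, or None."""
    match = TOKEN_PATTERN.search(html)
    return match.group(1) if match else None


def fetch_csrf_token(session, base_url: str, fallback: Optional[str] = None) -> Optional[str]:
    """Fetch the homepage (/) instead of /Job/computer-science-jobs.htm.

    `session` is assumed to expose a requests-style get() whose result has
    `status_code` and `text` attributes.
    """
    response = session.get(f"{base_url.rstrip('/')}/")
    if response.status_code != 200:
        # Homepage unavailable: fall back to the caller-supplied token.
        return fallback
    return extract_csrf_token(response.text) or fallback
```

Fetching the homepage also gives the session its cookies before _get_location() is called, which is presumably why the 403 disappears.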

Bug 2 — Non-fatal GraphQL errors abort all results

_fetch_jobs_page() raises ValueError("Error encountered in API response") whenever the GraphQL response contains an errors key, regardless of which field the errors refer to. In practice, Glassdoor commonly returns non-critical 503 sub-errors on peripheral fields like jobsPageSeoData (SEO metadata) while the actual jobListings data is fully intact.

This causes the scraper to discard all 30 job results on every page.

Fix: Only treat errors on the jobListings path (excluding jobsPageSeoData) as fatal.
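
A minimal sketch of that rule, assuming each error entry carries the standard GraphQL `path` list. The function names `fatal_graphql_errors` and `check_response` and the sample paths in the usage note are hypothetical, not taken from the diff.

```python
from typing import Any, Dict, List


def fatal_graphql_errors(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the errors that touch jobListings but not jobsPageSeoData."""
    fatal = []
    for error in payload.get("errors") or []:
        # Path segments may be field names (str) or list indices (int).
        path = [str(segment) for segment in (error.get("path") or [])]
        # Assumption: errors without a path are treated as non-fatal here,
        # following the rule above literally; a stricter variant could
        # treat them as fatal instead.
        if "jobListings" in path and "jobsPageSeoData" not in path:
            fatal.append(error)
    return fatal


def check_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Raise only when a fatal error is present; otherwise pass the payload through."""
    if fatal_graphql_errors(payload):
        raise ValueError("Error encountered in API response")
    return payload
```

With this check, a response whose only error has a path such as `["jobListings", "jobsPageSeoData"]` passes through and its job listings are kept, while an error on the listings themselves still raises.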

Verification

Tested locally against glassdoor.es with location="Spain", search_term="engineer":

  • Before fix: 0 results, ERROR - Glassdoor: Error encountered in API response
  • After fix: 30 jobs returned, totalJobsCount: 7576

Related issues: #279, #270, #273

Two bugs prevented Glassdoor from returning any results:

1. _get_csrf_token() was fetching /Job/computer-science-jobs.htm which
   now returns 404 after Glassdoor's Next.js migration. Changed to fetch
   the homepage (/) which reliably returns the token.

2. _fetch_jobs_page() treated any "errors" key in the GraphQL response
   as fatal, dropping all job results. Glassdoor commonly returns non-
   critical 503s on peripheral fields (e.g. jobsPageSeoData) while the
   actual jobListings data is intact. Now only errors on the jobListings
   path itself are treated as fatal.

Verified: 30 jobs returned for Spain/engineer with both fixes applied.

EnxhiT requested a review from cullenwatson as a code owner on March 19, 2026, 00:58

Astidor commented Mar 23, 2026

I ran into the same issue and found the same Bug 2. There is also another bug, a 400 error, which apparently has to do with Glassdoor changing how the graph behaves.

ERROR - JobSpy:Glassdoor - Glassdoor response status code 400
ERROR - JobSpy:Glassdoor - Glassdoor: location not parsed
