fix(glassdoor): fix CSRF token URL (404) and non-fatal GraphQL error handling #347
Open
EnxhiT wants to merge 1 commit into speedyapply:main
Two bugs prevented Glassdoor from returning any results:

1. _get_csrf_token() was fetching /Job/computer-science-jobs.htm, which now returns 404 after Glassdoor's Next.js migration. Changed to fetch the homepage (/), which reliably returns the token.
2. _fetch_jobs_page() treated any "errors" key in the GraphQL response as fatal, dropping all job results. Glassdoor commonly returns non-critical 503s on peripheral fields (e.g. jobsPageSeoData) while the actual jobListings data is intact. Now only errors on the jobListings path itself are treated as fatal.

Verified: 30 jobs returned for Spain/engineer with both fixes applied.
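The first fix can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the helper names `extract_csrf_token` and `fetch_csrf_token`, the `fallback` parameter, and the `"token": "..."` pattern are assumptions made for the example, and `session` stands for any object with a `requests.Session`-style `get()` method.

```python
import re
from typing import Optional

# Assumption: the token appears in the page source as "token": "<value>".
TOKEN_PATTERN = re.compile(r'"token":\s*"([^"]+)"')


def extract_csrf_token(html: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the first token found in the page source, else the fallback."""
    match = TOKEN_PATTERN.search(html)
    return match.group(1) if match else fallback


def fetch_csrf_token(session, base_url: str, fallback: Optional[str] = None) -> Optional[str]:
    """Fetch the homepage (/) and extract the token from it.

    The homepage is requested instead of /Job/computer-science-jobs.htm,
    which the PR reports as returning 404.
    """
    res = session.get(f"{base_url}/")
    if res.status_code != 200:
        # Hypothetical handling: fall back rather than parse an error page.
        return fallback
    return extract_csrf_token(res.text, fallback)
```

Separating the parsing step from the request keeps the token extraction checkable without network access.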
I also encountered the same issue and found the same Bug 2. There also appears to be a separate bug that produces a 400 error, apparently related to Glassdoor changing how its GraphQL API behaves:

ERROR - JobSpy:Glassdoor - Glassdoor response status code 400
Problem

Two bugs in jobspy/glassdoor/__init__.py cause Glassdoor to return 0 results regardless of location or search term.

Bug 1: CSRF token URL returns 404

_get_csrf_token() fetches /Job/computer-science-jobs.htm to extract the CSRF token. This URL now returns a 404 after Glassdoor's migration to Next.js. Without a valid token, the fallback token is used, but the subsequent _get_location() call also fails with a 403 because the session is not properly initialized.

Fix: Fetch the homepage (/) instead, which reliably returns the token.

Bug 2: Non-fatal GraphQL errors abort all results
_fetch_jobs_page() raises ValueError("Error encountered in API response") if the GraphQL response contains an errors key. In practice, Glassdoor commonly returns non-critical 503 sub-errors on peripheral fields like jobsPageSeoData (SEO metadata) while the actual jobListings data is fully intact. This causes the scraper to discard all 30 job results on every page.

Fix: Only treat errors on the jobListings path (excluding jobsPageSeoData) as fatal.

Verification
Tested locally against glassdoor.es with location="Spain", search_term="engineer":

Before the fix: ERROR - Glassdoor: Error encountered in API response
After the fix: totalJobsCount: 7576

Related issues: #279, #270, #273
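The error-handling rule in Bug 2 can be sketched as a small filter over the GraphQL `errors` array. This is an illustrative sketch under stated assumptions, not the code in the PR: the function name `fatal_errors` and the `NON_FATAL_FIELDS` set are hypothetical, each error is assumed to carry a `path` list as described in the GraphQL specification, and errors without a `path` are treated as non-fatal here purely for simplicity.

```python
from typing import Any, Dict, List

# Assumption: peripheral fields whose errors can be ignored.
# Only jobsPageSeoData is named in the PR description.
NON_FATAL_FIELDS = {"jobsPageSeoData"}


def fatal_errors(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the errors that should abort the page.

    An error counts as fatal only if its path goes through jobListings
    and does not go through a known peripheral field.
    """
    fatal = []
    for err in response.get("errors", []):
        # Path entries can be field names or list indices; compare as strings.
        path = [str(part) for part in err.get("path", [])]
        if "jobListings" in path and not NON_FATAL_FIELDS.intersection(path):
            fatal.append(err)
    return fatal


# Example: a 503 on the SEO field is ignored, an error on jobListings is not.
seo_only = {
    "errors": [{"message": "503", "path": ["jobListings", "jobsPageSeoData"]}],
    "data": {"jobListings": {"jobListings": []}},
}
listing_failure = {"errors": [{"message": "failed", "path": ["jobListings"]}]}
```

A caller would raise only when `fatal_errors(response)` is non-empty, and otherwise continue to read the `jobListings` data from the response.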