fix(glassdoor): fix CSRF token URL (404) and non-fatal GraphQL error handling #347
Open
EnxhiT wants to merge 1 commit into
Conversation
Two bugs prevented Glassdoor from returning any results:

1. _get_csrf_token() was fetching /Job/computer-science-jobs.htm, which now returns 404 after Glassdoor's Next.js migration. Changed to fetch the homepage (/), which reliably returns the token.
2. _fetch_jobs_page() treated any "errors" key in the GraphQL response as fatal, dropping all job results. Glassdoor commonly returns non-critical 503s on peripheral fields (e.g. jobsPageSeoData) while the actual jobListings data is intact. Now only errors on the jobListings path itself are treated as fatal.

Verified: 30 jobs returned for Spain/engineer with both fixes applied.
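
A minimal sketch of the token extraction behind fix 1, assuming the homepage HTML embeds the token as a `"token":"..."` JSON fragment. The `extract_csrf_token` helper, the regex, and the sample pages are hypothetical illustrations, not the scraper's actual code.

```python
import re
from typing import Optional

# Assumed shape of the embedded payload: "token":"<value>"
TOKEN_PATTERN = re.compile(r'"token":\s*"([^"]+)"')


def extract_csrf_token(html: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the first embedded token in the page, or the fallback if none is found."""
    match = TOKEN_PATTERN.search(html)
    return match.group(1) if match else fallback


# Hypothetical page bodies: one with an embedded token, one 404 page without.
sample_home = '<script>window.__DATA__ = {"token":"abc123","locale":"es"}</script>'
sample_404 = "<html><body>Page not found</body></html>"

token_from_home = extract_csrf_token(sample_home, fallback="FALLBACK")
token_from_404 = extract_csrf_token(sample_404, fallback="FALLBACK")
```

With a page that no longer carries the token (the 404 case), the helper silently returns the fallback, which is why a dead URL can go unnoticed until a later request is rejected.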
I ran into the same issue and found the same Bug 2. There is also another bug: requests fail with HTTP 400, which apparently has to do with Glassdoor changing how the graph behaves.
ERROR - JobSpy:Glassdoor - Glassdoor response status code 400
The CSRF token URL fix here is solid and worth keeping. However, PR #350 covers the other two bugs (HTTP 400 from unencoded location params, and the GraphQL errors false-positive) more completely and with simpler logic. It would be great if the CSRF fix from this PR were folded into #350, so there is one clean merge rather than two overlapping fixes.
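
To illustrate the HTTP 400 from unencoded location params, here is a small sketch that percent-encodes the location before it goes into the query string. The `build_location_url` helper, the base URL, and the exact URL layout are assumptions for illustration; only `urllib.parse.quote` is a real standard-library call.

```python
from urllib.parse import quote


def build_location_url(base_url: str, location: str) -> str:
    """Percent-encode the location before placing it in the query string."""
    # safe="" also encodes "/" so the term cannot alter the path or query.
    encoded = quote(location, safe="")
    return f"{base_url}/findPopularLocationAjax.htm?term={encoded}"


# Hypothetical base URL; spaces and commas are the characters that break a raw interpolation.
raw_location = "San Francisco, CA"
url = build_location_url("https://www.glassdoor.com", raw_location)
```

Encoding every term, including bare city names, keeps the request well-formed regardless of what the caller passes in.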
feiyangliu2023 pushed a commit to feiyangliu2023/JobSpyFeiyang that referenced this pull request on May 9, 2026
The previous commit only improved diagnostics; sources still returned nothing. This commit makes them actually work.

glassdoor (upstream PRs speedyapply#347 + speedyapply#350):
- CSRF URL `/Job/computer-science-jobs.htm` 404s after the Next.js migration. Switch to homepage `/`, which reliably returns the embedded `"token":"..."` payload.
- `findPopularLocationAjax.htm?term=...` raw-interpolated locations with commas/spaces, causing HTTP 400 ("location not parsed"). Wrap the term in `urllib.parse.quote`. This was the actual root cause of every 0-row Glassdoor call: the city-only stripping in monitor's `run_search` only narrowed the pain and didn't fix it (Glassdoor still needs encoding even on bare city names with no comma).
- The `/graph` endpoint regularly returns 30 valid jobs alongside non-fatal `errors` on peripheral fields like `jobsPageSeoData`. Old code raised on ANY `errors` key and discarded everything. Only fail when `data.jobListings` is missing.
- `_get_location` now raises GlassdoorException with the underlying HTTP status code instead of returning `(None, None)` and losing context.
- Per-instance header dict so concurrent scrapes don't trample each other's CSRF token / UA, and so `_fetch_job_description` reads this scrape's token rather than whatever the most recent Glassdoor instance left in the module-global.

bayt:
- Slugify now folds accented Latin chars to ASCII before stripping non-ASCII (`ingénieur logiciel` → `ingenieur-logiciel` rather than `ingnieur-logiciel`).

google:
- The job-listing wrapper key (`520084652`) is a Google-internal function ID that gets rotated on redeploy; old parser failed immediately when it changed. Make the parser tolerate rotation: try the known keys first, then fall back to scanning every 9-digit-key payload whose first record structurally matches a job shape (title/company/location strings + nested URL list). Log the fallback key when it fires so we can bake it back into the known list.
- `find_job_info` (used by paginated responses) gets the same treatment: if no known key matches, walk the JSON looking for arrays whose first item shape-matches a job record.
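
The bayt slugify change in this commit message can be sketched as follows. This is one common way to fold accents, using `unicodedata.normalize`, and is not necessarily how the commit implements it; the `slugify` function here is a hypothetical stand-in.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Fold accented Latin characters to ASCII, then build a hyphenated slug."""
    # NFKD splits "é" into "e" plus a combining accent; the ASCII encode drops the accent.
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")


slug = slugify("ingénieur logiciel")
```

Stripping non-ASCII characters without normalizing first would delete the `é` outright, which is how the `ingnieur-logiciel` slug arises.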
Problem
Two bugs in `jobspy/glassdoor/__init__.py` cause Glassdoor to return 0 results regardless of location or search term.

Bug 1: CSRF token URL returns 404
`_get_csrf_token()` fetches `/Job/computer-science-jobs.htm` to extract the CSRF token. This URL now returns a 404 after Glassdoor's migration to Next.js. Without a valid token, the fallback token is used, but the subsequent `_get_location()` call also fails with a 403 because the session is not properly initialized.

Fix: Fetch the homepage (`/`) instead, which reliably returns the token.

Bug 2: Non-fatal GraphQL errors abort all results
`_fetch_jobs_page()` raises `ValueError("Error encountered in API response")` if the GraphQL response contains any `errors` key. In practice, Glassdoor commonly returns non-critical 503 sub-errors on peripheral fields like `jobsPageSeoData` (SEO metadata) while the actual `jobListings` data is fully intact. This causes the scraper to discard all 30 job results on every page.

Fix: Only treat errors on the `jobListings` path (excluding `jobsPageSeoData`) as fatal.

Verification
Tested locally against `glassdoor.es` with `location="Spain"`, `search_term="engineer"`:
- Before: `ERROR - Glassdoor: Error encountered in API response`
- After: `totalJobsCount: 7576`

Related issues: #279, #270, #273
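
The Bug 2 fix can be sketched roughly as below. It relies on the standard GraphQL convention that each entry in `errors` carries a `path` list. The response layout (a single dict with `jobsPageSeoData` nested under `jobListings`) and the helper names `fatal_errors` and `extract_job_listings` are assumptions for illustration, not the PR's actual code.

```python
from typing import Any, Dict, List


def fatal_errors(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the errors whose path touches jobListings but not jobsPageSeoData."""
    fatal = []
    for error in response.get("errors", []):
        path = error.get("path") or []
        if "jobListings" in path and "jobsPageSeoData" not in path:
            fatal.append(error)
    return fatal


def extract_job_listings(response: Dict[str, Any]) -> Dict[str, Any]:
    """Raise only when jobListings itself failed or is missing; ignore peripheral errors."""
    if fatal_errors(response):
        raise ValueError("Error encountered in API response")
    listings = (response.get("data") or {}).get("jobListings")
    if listings is None:
        raise ValueError("Response has no jobListings data")
    return listings


# Hypothetical partial response: valid listings plus a peripheral, non-fatal error.
partial = {
    "data": {"jobListings": {"totalJobsCount": 7576, "jobListings": []}},
    "errors": [{"message": "503", "path": ["jobListings", "jobsPageSeoData"]}],
}
listings = extract_job_listings(partial)
```

Checking for missing `jobListings` data as well as for fatal error paths guards against responses that fail without reporting a usable `path`.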