This guide explains how to use the fetch_job_listing.py script to pull job listings from the internet and save them as markdown files.
The script provides two methods for fetching job listings:
- Method 1: Using the `requests` library (simple, but may be blocked by some sites)
- Method 2: Using Selenium (more robust; works with JavaScript-heavy sites)
To run the script directly:
```bash
python fetch_job_listing.py
```
This will attempt to fetch the job listing from the URL defined in the script and save it as a markdown file in the `job_listings/` directory.
To call it from your own Python code:
```python
from fetch_job_listing import fetch_job_listing

# Fetch a job listing and save it under job_listings/
url = "https://www.indeed.com/?from=gnav-viewjob&advn=100919326538784&vjk=fcd29f6d7f5168f9"
filepath = fetch_job_listing(url, output_dir="job_listings")
print(f"Job listing saved to: {filepath}")
```

Method 1: Using the `requests` library

How it works:
- Uses the `requests` library to fetch the HTML content
- Parses the HTML with BeautifulSoup
- Extracts job title, company, location, and description
- Saves the content as a markdown file
Pros:
- Simple and lightweight
- No additional dependencies beyond `requests` and `beautifulsoup4`
- Fast

Cons:
- May be blocked by sites with anti-bot protection (such as Indeed)
- Doesn't execute JavaScript, so dynamically loaded content won't be captured
Setup:
The required packages are already listed in `requirements.txt`:
```text
requests
beautifulsoup4
```
If they are not installed, run:
```bash
pip install requests beautifulsoup4
```
Usage:
```python
from fetch_job_listing import fetch_job_listing

url = "https://example.com/job-listing"
filepath = fetch_job_listing(url)
```

Method 2: Using Selenium

How it works:
- Uses Selenium to open a real browser (Chrome)
- Waits for JavaScript to execute and content to load
- Parses the rendered HTML with BeautifulSoup
- Saves the content as a markdown file
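The waiting step is what lets Selenium capture content that `requests` misses. This guide does not show how the script waits; with Selenium's own API the usual pattern is `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "h1")))`. The sketch below shows the same polling idea with only the standard library, so it runs without a browser. `wait_until` and its parameters are hypothetical names, not part of the script.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Hypothetical helper: poll condition() until it returns something truthy.

    Same idea as Selenium's WebDriverWait: return the first truthy result,
    or raise TimeoutError if none arrives before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # content is ready
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # give the page's JavaScript time to run
```

In a Selenium script, `condition` could be `lambda: driver.find_elements(By.TAG_NAME, "h1")`, which returns an empty (falsy) list until the element appears.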
Pros:
- Works with JavaScript-heavy sites
- Gets past many, though not all, anti-bot protections
- Captures fully rendered content

Cons:
- Requires additional setup (ChromeDriver)
- Slower than the `requests` method
- Requires more system resources
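Setup:
- Install Selenium:
  ```bash
  pip install selenium
  ```
- Download ChromeDriver:
  - Visit: https://chromedriver.chromium.org/
  - Download the version matching your Chrome browser version (for Chrome 115 and newer, ChromeDriver is distributed through the Chrome for Testing downloads)
  - Extract the executable to a location in your PATH, or specify the path when creating the driver
- Verify the installation:
  ```bash
  chromedriver --version
  ```

With Selenium 4.6 or later, Selenium Manager can also download a matching ChromeDriver automatically when none is found on your PATH.

Usage:
```python
from fetch_job_listing import fetch_job_listing_selenium

url = "https://www.indeed.com/?from=gnav-viewjob&advn=100919326538784&vjk=fcd29f6d7f5168f9"
filepath = fetch_job_listing_selenium(url)
```

The script saves job listings as markdown files with the following structure: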
```markdown
# Job Title
**Company:** Company Name
**Location:** City, State
---
Job description content here...
```
The filename is derived from the job title, with special characters replaced by underscores.
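If you post-process saved listings or want to change the layout, the sketch below reproduces that output from already-extracted fields. `listing_filename`, `render_markdown` and `save_listing` are hypothetical names. Two details are assumptions: the exact set of characters replaced in filenames, and the blank lines between sections (a blank line before `---` keeps it a horizontal rule; placed directly under a text line, Markdown would treat it as a heading underline instead).

```python
import os
import re


def listing_filename(title):
    """Hypothetical helper: build a .md filename from a job title.

    Assumption: every run of characters other than letters, digits,
    '_' or '-' becomes one underscore; the script's exact rule may differ.
    """
    stem = re.sub(r"[^\w-]+", "_", title).strip("_")
    return (stem or "job_listing") + ".md"


def render_markdown(title, company, location, description):
    """Render a listing in the layout above, with blank lines between parts."""
    return (
        f"# {title}\n\n"
        f"**Company:** {company}\n\n"
        f"**Location:** {location}\n\n"
        "---\n\n"
        f"{description}\n"
    )


def save_listing(title, company, location, description, output_dir="job_listings"):
    """Write the rendered listing into output_dir and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, listing_filename(title))
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_markdown(title, company, location, description))
    return path
```

For example, a title like "Senior Data Engineer (Remote) / Python" becomes `Senior_Data_Engineer_Remote_Python.md`.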
Problem: The script returns a 403 Forbidden error.
Solutions:
- Use Selenium instead of requests
- Add delays between requests if fetching multiple listings
- Use a VPN or proxy
- Check if the site has a robots.txt that blocks automated access
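The first solution can be automated by trying the `requests` method and falling back to Selenium only when it fails. This is a sketch under one stated assumption: that `fetch_job_listing` signals failure (such as a 403) either by raising an exception or by returning a falsy value; this guide does not say which. `fetch_with_fallback` is a hypothetical helper.

```python
def fetch_with_fallback(url, primary, fallback):
    """Hypothetical helper: try the fast fetcher first, then the browser one.

    Assumption: `primary` signals failure (e.g. a 403) either by raising
    or by returning a falsy value; this guide does not say which.
    """
    try:
        result = primary(url)
    except Exception as exc:  # broad on purpose: the exact error type is unknown
        print(f"First method failed ({exc!r}); retrying with the fallback")
        result = None
    if not result:
        result = fallback(url)
    return result
```

Usage: `fetch_with_fallback(url, fetch_job_listing, fetch_job_listing_selenium)`.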
Problem: "ChromeDriver not found" error when using Selenium.
Solutions:
- Download ChromeDriver from https://chromedriver.chromium.org/
- Ensure it matches your Chrome browser version
- Add the ChromeDriver location to your system PATH
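- Or specify the path directly in the code (Selenium 4 takes the driver path through a `Service` object rather than as a positional argument):
  ```python
  from selenium.webdriver.chrome.service import Service
  driver = webdriver.Chrome(service=Service('/path/to/chromedriver'), options=options)
  ```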
Problem: The script runs but doesn't extract job details properly.
Possible causes:
- The website's HTML structure differs from what the script expects
- The job content is loaded dynamically (use Selenium)
- The HTML selectors need to be updated
Solution: Inspect the website's HTML and update the selectors in the script.
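Before editing the selectors, it can help to confirm which tag and class now hold a field, using a saved copy of the page. The script parses with BeautifulSoup, where a lookup such as `soup.select_one("h1.job-title")` does this job; the sketch below is a dependency-free stand-in built on Python's `html.parser` that returns the whitespace-normalized text of the first element with a given tag and class. `FirstMatchText`, `extract_first` and the class name `job-title` are hypothetical.

```python
from html.parser import HTMLParser


class FirstMatchText(HTMLParser):
    """Collect the text inside the first <tag class="... css_class ...">."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside the matched element
        self.done = False   # set once the matched element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def extract_first(html, tag, css_class):
    """Return the normalized text of the first match, or None if absent."""
    parser = FirstMatchText(tag, css_class)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None
```

For example, `extract_first(saved_html, "h1", "job-title")` returns `None` when that class no longer exists, which suggests the selector in the script needs updating.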
To save listings to a different directory, pass `output_dir`:
```python
fetch_job_listing(url, output_dir="my_job_listings")
```
You can also modify the script to extract additional fields, such as:
- Salary
- Job type (Full-time, Part-time, etc.)
- Application deadline
- Required qualifications
Edit the extraction logic in the fetch_job_listing() or fetch_job_listing_selenium() functions.
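As one example of extending the extraction logic, the sketch below pulls a salary string out of the description text with a regular expression. `extract_salary` and its pattern are hypothetical and assume US-style dollar amounts; when a site shows salary in its own element, a selector is usually more reliable than text matching.

```python
import re

# Hypothetical pattern for US-style salaries such as "$85,000",
# "$85,000 - $100,000 a year" or "$25 an hour". Real listings vary widely,
# so treat this as a starting point rather than a complete rule.
SALARY_RE = re.compile(
    r"\$\d[\d,]*(?:\.\d+)?"                                   # first amount
    r"(?:\s*(?:-|to)\s*\$\d[\d,]*(?:\.\d+)?)?"                # optional upper bound
    r"(?:\s+(?:a|an|per)\s+(?:hour|day|week|month|year))?",   # optional period
    re.IGNORECASE,
)


def extract_salary(text):
    """Return the first salary-like substring of `text`, or None."""
    match = SALARY_RE.search(text)
    return match.group(0) if match else None
```

The result could then be written as an extra `**Salary:**` line alongside Company and Location.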
Best practices:
- Always check the website's `robots.txt` and Terms of Service
- Respect rate limits and don't make excessive requests
- Some sites may prohibit automated scraping
- Consider using official APIs if available
- Add appropriate delays between requests when fetching multiple listings
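The robots.txt check and the delays can be combined in a small wrapper around either fetch function. This sketch uses the standard library's `urllib.robotparser`; `fetch_politely` is a hypothetical helper, and the 5-second default delay is an arbitrary placeholder, not a rate any site publishes. To download robots.txt instead of passing its text, `RobotFileParser` also offers `set_url()` and `read()`. A robots.txt check does not replace reading the Terms of Service.

```python
import time
from urllib import robotparser


def fetch_politely(urls, fetch, robots_txt, delay_seconds=5.0, user_agent="*"):
    """Hypothetical helper: fetch URLs allowed by robots.txt, pausing between them.

    `fetch` is any single-URL function (for example fetch_job_listing).
    `robots_txt` is the text of the site's robots.txt. Disallowed URLs map
    to None and are never requested.
    """
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())

    results = {}
    first_request = True
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            results[url] = None  # skipped: disallowed by robots.txt
            continue
        if not first_request:
            time.sleep(delay_seconds)  # pause between consecutive requests
        first_request = False
        results[url] = fetch(url)
    return results
```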
For issues or questions:
- Check the troubleshooting section above
- Review the script comments for implementation details
- Inspect the website's HTML to understand its structure
- Consider using the website's official API if available