This guide will help you set up SmartCrawler on Windows systems.
Before you begin, you will need:

- Windows 10 or later
- Administrator access for installation
- Internet connection for downloads
To install from the ZIP archive:

- Go to the SmartCrawler releases page
- Download the latest `smart-crawler-windows-x64.zip`
- Extract the ZIP file to a folder (e.g., `C:\SmartCrawler\`)
- Optional: Add the folder to your system PATH:
  - Press `Win + X` and select "System"
  - Click "Advanced system settings"
  - Click "Environment Variables"
  - Under "System variables", find and select "Path", then click "Edit"
  - Click "New" and add your SmartCrawler folder path
  - Click "OK" to save
  - Open a new Command Prompt or PowerShell window so the updated PATH takes effect

To install with the MSI installer:

- Download `smart-crawler-[version].msi` from the releases page
- Double-click the MSI file to install
- Follow the installation wizard
- SmartCrawler will be automatically added to your PATH
If you have Rust (and Git) installed, you can build SmartCrawler from source:

```
git clone https://github.com/pixlie/SmartCrawler.git
cd SmartCrawler
cargo build --release
# Binary will be in target\release\smart-crawler.exe
```

SmartCrawler requires a WebDriver server to control a browser. Choose one:
For Firefox:

- Install Firefox (if not already installed):
  - Download it from firefox.com
  - Run the installer
- Install GeckoDriver:
  - Download the Windows build from the Mozilla GeckoDriver releases page
  - Extract `geckodriver.exe` from the ZIP file and place it in either:
    - The same folder as `smart-crawler.exe`, OR
    - A folder in your system PATH (e.g., `C:\Windows\System32`)
For Chrome:

- Install Chrome (if not already installed):
  - Download it from chrome.com
  - Run the installer
- Install ChromeDriver:
  - Check your Chrome version: open `chrome://version/` in Chrome
  - Download the ChromeDriver build that matches your Chrome version from Chrome for Testing (at minimum, the major version number must match)
  - Extract `chromedriver.exe` and place it in either:
    - The same folder as `smart-crawler.exe`, OR
    - A folder in your system PATH
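To run SmartCrawler:

- Open Command Prompt or PowerShell
- Start WebDriver (choose one) and leave this window open while you crawl:

  ```
  # For Firefox (GeckoDriver)
  geckodriver.exe --port 4444

  # For Chrome (ChromeDriver)
  chromedriver.exe --port=4444
  ```

  In PowerShell, if the driver is in the current folder rather than on your PATH, prefix it with `.\` (for example, `.\geckodriver.exe --port 4444`).
- Open a new Command Prompt/PowerShell window
- Test SmartCrawler:

  ```
  smart-crawler --link "https://example.com"
  ```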
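If the test run cannot reach the browser, you can optionally confirm that the driver is listening before trying again. GeckoDriver and ChromeDriver both implement the W3C WebDriver `GET /status` endpoint, which returns JSON containing a `value.ready` flag. The Python 3 script below is a minimal sketch and is not part of SmartCrawler. It assumes Python 3 is installed and that the driver listens on `127.0.0.1:4444`. The function name `webdriver_ready` is hypothetical.

```python
import http.client
import json
import urllib.request


def webdriver_ready(base_url: str = "http://127.0.0.1:4444", timeout: float = 3.0) -> bool:
    """Hypothetical helper: ask a WebDriver server's /status endpoint if it is ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError: refused connection, timeout, or HTTP error status (URLError/HTTPError).
        # ValueError: body is not valid JSON. HTTPException: not an HTTP server at all.
        return False
    # W3C WebDriver status responses look like {"value": {"ready": true, "message": ...}}.
    value = payload.get("value") if isinstance(payload, dict) else None
    return isinstance(value, dict) and value.get("ready") is True


if __name__ == "__main__":
    if webdriver_ready():
        print("WebDriver is ready on port 4444")
    else:
        print("WebDriver is not reachable or not ready on port 4444")
```

Save it as, for example, `check_webdriver.py` and run `python check_webdriver.py` in a second window while the driver is running. A driver that already has an active session may report that it is not ready for a new one.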
Basic usage examples:

```
# Basic crawl
smart-crawler --link "https://example.com"

# Crawl with verbose output
smart-crawler --link "https://example.com" --verbose

# Crawl with template detection
smart-crawler --link "https://example.com" --template --verbose

# Crawl multiple sites
smart-crawler --link "https://example.com" --link "https://another.com"
```

If SmartCrawler cannot connect to the WebDriver:

- Ensure WebDriver is running on port 4444
- Check that the browser is installed
- Try restarting the WebDriver
- Verify no other application is using port 4444
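To check whether anything is listening on port 4444 at all, you can use a small Python 3 sketch. It is not part of SmartCrawler and assumes Python 3 is installed. The function name `port_is_listening` is hypothetical.

```python
import socket


def port_is_listening(port: int = 4444, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Hypothetical helper: return True if some process accepts TCP connections on host:port."""
    try:
        # create_connection succeeds only if a listener completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is listening there.
        return False


if __name__ == "__main__":
    state = "in use" if port_is_listening() else "free"
    print(f"Port 4444 on 127.0.0.1 is {state}")
```

If the port shows as in use before you have started any driver, another application is probably holding it. If it shows as free while the driver window says it is running, check that the driver was started with the port shown in the steps above.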
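If Windows reports that `smart-crawler` is not recognized as a command:

- If you didn't add SmartCrawler to PATH, run it with its full path (adjust the folder to where you extracted it):

  ```
  C:\SmartCrawler\smart-crawler.exe --link "https://example.com"
  ```
- Or add the folder to your PATH (see the PATH instructions in the installation steps above)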
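You can check which of the tools your shell can find on PATH. Windows' own `where smart-crawler` command does this lookup (use `where.exe` in PowerShell, where `where` is an alias for `Where-Object`). An equivalent Python 3 sketch follows. It assumes Python 3 is installed, and the function name `find_tools` is hypothetical.

```python
import shutil


def find_tools(names=("smart-crawler", "geckodriver", "chromedriver"), path=None):
    """Hypothetical helper: map each command name to the executable found on PATH, or None."""
    # shutil.which searches PATH (or the given path string). On Windows it also
    # tries the PATHEXT suffixes, so "smart-crawler" matches smart-crawler.exe.
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

Suppose a driver is stored only in the SmartCrawler folder and that folder is not on PATH. It will be reported as not found, even though you can still start it from that folder.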
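If you get permission errors:

- Run Command Prompt as Administrator
- Check that the executable has proper permissions. Windows may block files downloaded from the internet; to clear this, right-click the file, choose Properties, and select "Unblock" if it is shown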
If the WebDriver will not start because port 4444 is already in use:

- Kill any existing WebDriver processes:

  ```
  taskkill /F /IM geckodriver.exe
  taskkill /F /IM chromedriver.exe
  ```
Next steps:

- Read the CLI Options documentation for advanced usage
- Learn more about template detection for content pattern analysis
- Explore verbose mode for detailed HTML tree analysis
If you encounter issues:
- Check the troubleshooting section above
- Visit the GitHub Issues page
- Search for existing solutions or create a new issue
- Include your Windows version, browser version, and error messages
Useful links:

- Firefox Download
- Chrome Download
- GeckoDriver Releases
- ChromeDriver Downloads
- Rust Installation (if building from source)