This guide will help you set up SmartCrawler on Linux systems.
- Linux distribution (Ubuntu, Debian, CentOS, Fedora, etc.)
- Root/sudo access for installation
- Internet connection for downloads
- Go to the SmartCrawler releases page
- Download smart-crawler-linux-x64.tar.gz
- Extract and install:
# Extract the downloaded file
tar -xzf smart-crawler-linux-x64.tar.gz
# Move to a directory in your PATH
sudo mv smart-crawler /usr/local/bin/
# Make it executable
chmod +x /usr/local/bin/smart-crawler
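The extract, move, and chmod steps above can be wrapped into one function. This is only a sketch: `install_from_tarball` is a hypothetical helper, not something SmartCrawler ships, and it assumes the archive holds a `smart-crawler` binary at its top level, as the `mv` command above implies.

```shell
# Hypothetical helper (not part of SmartCrawler): extract a release archive
# into a scratch directory and install the binary into a target directory.
# Assumes the archive contains a top-level file named smart-crawler.
install_from_tarball() {
    archive=$1
    dest=${2:-/usr/local/bin}
    workdir=$(mktemp -d) || return 1
    # Extract into a scratch directory so the current directory stays clean.
    if tar -xzf "$archive" -C "$workdir" && [ -f "$workdir/smart-crawler" ]; then
        # install copies the file and sets mode 755 in one step.
        install -m 755 "$workdir/smart-crawler" "$dest/smart-crawler"
        status=$?
    else
        echo "smart-crawler binary not found in $archive" >&2
        status=1
    fi
    rm -rf "$workdir"
    return "$status"
}
```

Writing to /usr/local/bin needs root, so either run the script that defines this function with sudo or pass a directory you own, for example `install_from_tarball smart-crawler-linux-x64.tar.gz "$HOME/.local/bin"`.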
# Download and install the DEB package
wget https://github.com/pixlie/SmartCrawler/releases/latest/download/smart-crawler-[version].deb
sudo dpkg -i smart-crawler-[version].deb
# Install dependencies if needed
sudo apt-get install -f

# Download and install the RPM package
wget https://github.com/pixlie/SmartCrawler/releases/latest/download/smart-crawler-[version].rpm
sudo rpm -i smart-crawler-[version].rpm
# Or with dnf/yum
sudo dnf install smart-crawler-[version].rpm

If you have Rust installed:
git clone https://github.com/pixlie/SmartCrawler.git
cd SmartCrawler
cargo build --release
# Binary will be in target/release/smart-crawler

SmartCrawler requires a WebDriver server to control a browser. Choose one:
# Install Firefox
sudo apt update
sudo apt install firefox
# Install GeckoDriver
wget https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz
tar -xzf geckodriver-v0.33.0-linux64.tar.gz
sudo mv geckodriver /usr/local/bin/
chmod +x /usr/local/bin/geckodriver

# Install Firefox
sudo dnf install firefox
# Install GeckoDriver
wget https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz
tar -xzf geckodriver-v0.33.0-linux64.tar.gz
sudo mv geckodriver /usr/local/bin/
chmod +x /usr/local/bin/geckodriver

# Install Firefox and GeckoDriver
sudo pacman -S firefox geckodriver

# Install Chrome
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt update
sudo apt install google-chrome-stable
# Install ChromeDriver
# First check Chrome version
google-chrome --version
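# Download matching ChromeDriver version
# Note: chromedriver.storage.googleapis.com only serves ChromeDriver for Chrome 114 and earlier;
# drivers for newer Chrome versions are published through Chrome for Testing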
CHROME_VERSION=$(google-chrome --version | cut -d' ' -f3 | cut -d'.' -f1-3)
wget https://chromedriver.storage.googleapis.com/LATEST_RELEASE_${CHROME_VERSION}
CHROMEDRIVER_VERSION=$(cat LATEST_RELEASE_${CHROME_VERSION})
wget https://chromedriver.storage.googleapis.com/${CHROMEDRIVER_VERSION}/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/
chmod +x /usr/local/bin/chromedriver

# Install Chrome
sudo dnf install google-chrome-stable
# Install ChromeDriver (follow similar steps as Ubuntu above)

- Open a terminal
- Start WebDriver (choose one):
# For Firefox (GeckoDriver)
geckodriver --port 4444
# For Chrome (ChromeDriver)
chromedriver --port=4444
- Open a new terminal
- Test SmartCrawler:
smart-crawler --link "https://example.com"
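When the steps above are scripted rather than run in two terminals, the crawl can start before the driver is listening. A minimal sketch of a readiness check follows, assuming curl is installed and the driver is on port 4444 as above. `wait_for_webdriver` is a hypothetical helper name; it polls the `/status` endpoint that the W3C WebDriver protocol defines.

```shell
# Hypothetical helper: poll the WebDriver /status endpoint until it answers.
# Arguments: base URL (default http://localhost:4444), attempts (default 10).
wait_for_webdriver() {
    url=${1:-http://localhost:4444}
    attempts=${2:-10}
    if ! command -v curl >/dev/null 2>&1; then
        echo "curl is required for this check" >&2
        return 1
    fi
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl fail on HTTP errors; -s keeps it quiet.
        if curl -sf --max-time 2 "$url/status" >/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

A script could then run `geckodriver --port 4444 &` followed by `wait_for_webdriver && smart-crawler --link "https://example.com"`.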
# Basic crawl
smart-crawler --link "https://example.com"
# Crawl with verbose output
smart-crawler --link "https://example.com" --verbose
# Crawl with template detection
smart-crawler --link "https://example.com" --template --verbose
# Crawl multiple sites
smart-crawler --link "https://example.com" --link "https://another.com"

- Ensure WebDriver is running on port 4444
- Check that the browser is installed
- Try restarting the WebDriver
- Verify no other application is using port 4444
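The last check in this list can be scripted. The sketch below assumes either `ss` or `netstat` is installed; `port_in_use` is a hypothetical helper name, and it only looks at listening TCP sockets.

```shell
# Hypothetical helper: exit 0 if something is listening on the given TCP port,
# 1 if nothing is, and 2 if neither ss nor netstat is available.
port_in_use() {
    port=$1
    if command -v ss >/dev/null 2>&1; then
        listing=$(ss -tln)
    elif command -v netstat >/dev/null 2>&1; then
        listing=$(netstat -tln)
    else
        echo "neither ss nor netstat is available" >&2
        return 2
    fi
    # In both tools the fourth column is the local address; match ":<port>" at its end.
    printf '%s\n' "$listing" |
        awk -v port="$port" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}
```

For example, `port_in_use 4444 && echo "port 4444 is already taken"` reports a conflict before you start the driver.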
- Make sure the binary is executable:
chmod +x /usr/local/bin/smart-crawler
- Check that /usr/local/bin is in your PATH:
echo $PATH
- If you didn't install to /usr/local/bin, add the directory that contains the binary to your PATH:
export PATH=$PATH:/path/to
- Or run with the full path:
/path/to/smart-crawler --link "https://example.com"
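Reading the output of `echo $PATH` by eye is error-prone. The following sketch tests for a directory directly; `path_contains` is a hypothetical helper name, and its optional second argument exists only so the check can be tried against an arbitrary PATH-style string.

```shell
# Hypothetical helper: exit 0 if the directory is an entry in a colon-separated
# search path (defaults to the current $PATH), 1 otherwise.
path_contains() {
    dir=${1%/}                 # ignore one trailing slash on the argument
    search=${2:-$PATH}
    case ":$search:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `path_contains /usr/local/bin || echo "/usr/local/bin is not in PATH"`.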
- Check if browser is installed correctly:
firefox --version
google-chrome --version
- Verify WebDriver is accessible:
geckodriver --version
chromedriver --version
- Kill any existing WebDriver processes:
pkill geckodriver
pkill chromedriver
- Check what's using port 4444:
sudo netstat -tlnp | grep 4444
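The browser and driver checks in this list can be run in one pass. This sketch reports which of the four commands named in this guide are on your PATH; `report_tool` is a hypothetical helper name, and only one browser and driver pair needs to be present.

```shell
# Hypothetical helper: print where a command lives, or report it as missing.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: found at %s\n' "$1" "$(command -v "$1")"
    else
        printf '%s: missing\n' "$1"
        return 1
    fi
}

# Check both browser/driver pairs; "|| true" keeps going past missing tools.
for tool in firefox geckodriver google-chrome chromedriver; do
    report_tool "$tool" || true
done
```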
- Install missing libraries:
# Ubuntu/Debian
sudo apt install libssl-dev pkg-config
# CentOS/RHEL/Fedora
sudo dnf install openssl-devel pkgconfig
- Use apt for package management
- Firefox ESR is available via the firefox-esr package
- Chrome installation requires adding Google's repository

- Use dnf or yum for package management
- EPEL repository may be needed for some packages
- Chrome is available through Google's repository

- Use pacman for package management
- Both Firefox and GeckoDriver are available in official repositories
- Chrome is available in AUR as google-chrome

- Use apk for package management
- May need additional setup for glibc compatibility
- Read the CLI Options documentation for advanced usage
- Learn more about template detection for content pattern analysis
- Explore verbose mode for detailed HTML tree analysis
If you encounter issues:
- Check the troubleshooting section above
- Visit the GitHub Issues page
- Search for existing solutions or create a new issue
- Include your Linux distribution, browser version, and error messages
- Firefox Download
- Chrome Download
- GeckoDriver Releases
- ChromeDriver Downloads
- Rust Installation (if building from source)