Contributions are welcome. Please read the guidelines below before opening a pull request — they keep the codebase consistent and reviews fast.
git clone https://github.com/FAAQJAVED/Leadhunter_Pro.git
cd Leadhunter_Pro
pip install -r requirements.txt -r requirements-dev.txt

pytest tests/ --tb=short

All tests live under `tests/`. They require no browser and no internet
connection, and the full suite completes in under 5 seconds. Run it before
pushing; CI also runs it automatically on every PR across 3 OS ×
3 Python versions.
The project uses ruff for linting and formatting (configured in
`pyproject.toml`). Line length is 100 characters.
ruff check .
ruff format .

Fix all ruff warnings before submitting. Do not suppress rules without a comment explaining why.
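
As a sketch of what a justified suppression can look like, the snippet below silences one rule on one line and states the reason next to it. It assumes the line-length rule (`E501`) is enabled in the project's ruff configuration; the constant name and the URL are made up for illustration.

```python
# The URL cannot be wrapped without breaking it, so the line-length rule
# is suppressed for this one line only.
DOCS_URL = "https://example.invalid/a/deliberately/long/path/that/would/otherwise/exceed/the/one/hundred/character/limit"  # noqa: E501
```

Scoping the `noqa` to a single rule code on a single line keeps the rest of the file fully linted.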

To add a new search engine:

- Create `engines/<engine_name>.py` and subclass `engine_base.EngineBase`.
- Implement `search(query: str, pages: int) -> list[SearchResult]`.
- Register the engine in `engines/__init__.py` → add it to `ENGINE_MAP`.
- Add `"myengine"` to `ENGINES_PRIORITY` in `config.py` if it should run by default.
- Add at least one HTML-parsing test in `tests/test_engines.py` (see existing tests for the pattern).
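
The steps above might look roughly like the following sketch. The real `EngineBase` and `SearchResult` definitions are not shown in this guide, so the stand-in classes below, the `MyEngine` name, the `parse_results` and `fetch` helpers, and the `example.invalid` URL are all assumptions made for illustration; check `engine_base.py` and the existing engines for the actual interface.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


# Stand-ins so the sketch runs on its own. In the project these would be
# imported instead; the fields and methods here are assumptions.
@dataclass
class SearchResult:
    url: str
    title: str


class EngineBase:
    def search(self, query: str, pages: int) -> list[SearchResult]:
        raise NotImplementedError


class _AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> whose href starts with "http"."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith("http"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


class MyEngine(EngineBase):
    # Hypothetical results URL; a real engine would point at an actual site.
    SEARCH_URL = "https://example.invalid/search?q={query}&page={page}"

    def parse_results(self, html: str) -> list[SearchResult]:
        collector = _AnchorCollector()
        collector.feed(html)
        return [SearchResult(url=u, title=t) for u, t in collector.links]

    def search(self, query: str, pages: int) -> list[SearchResult]:
        results = []
        for page in range(1, pages + 1):
            html = self.fetch(self.SEARCH_URL.format(query=query, page=page))
            results.extend(self.parse_results(html))
        return results

    def fetch(self, url: str) -> str:
        # Network access is deliberately left out of the sketch.
        raise NotImplementedError("plug in the project's HTTP helper here")


# An HTML-parsing test in the spirit of tests/test_engines.py: it feeds a
# fixed snippet to the parser, so it needs no browser and no network.
def test_parse_results_keeps_only_absolute_links():
    html = '<div><a href="https://acme.example/">Acme Ltd</a><a href="/about">About</a></div>'
    assert MyEngine().parse_results(html) == [
        SearchResult(url="https://acme.example/", title="Acme Ltd")
    ]
```

Keeping parsing separate from fetching is one way to make the engine testable offline; registration in `ENGINE_MAP` and `ENGINES_PRIORITY` is omitted here because the shape of those objects is not described in this guide.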

The 10-step URL cleaning and scoring pipeline lives in
`pipeline/data_cleaner.py`. Common areas to improve:

- Expanding the `_ALWAYS_EXCLUDED` or `_DIRECTORY_DOMAINS` sets with newly identified junk domains.
- Tightening or loosening `SCORE_BOOST_KEYWORDS` in `config.py` for different industry verticals.
- Adding new flagging rules in `_assess()` for domain patterns observed in real runs.
- Adding regression tests in `tests/test_cleaner.py` for any new rule.
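
A regression test for a new rule could follow the shape below. The internals of `pipeline/data_cleaner.py` are not reproduced in this guide, so `DIRECTORY_DOMAINS`, `is_directory` and the `.example` domains are hypothetical stand-ins; a real test would import the project's own functions instead.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for a module-level domain set in the cleaner.
DIRECTORY_DOMAINS = {"listings.example", "bizfinder.example"}


def is_directory(url: str) -> bool:
    """Return True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in DIRECTORY_DOMAINS)


def test_directory_domain_is_flagged():
    assert is_directory("https://www.listings.example/biz/acme-plumbing")


def test_lookalike_domain_is_not_flagged():
    # "notlistings.example" ends with "listings.example" as a string,
    # but it is a different domain and must not match.
    assert not is_directory("https://notlistings.example/")


def test_input_without_host_is_not_flagged():
    assert not is_directory("not a url")
```

Pairing each positive case with a near-miss negative case helps catch rules that are loosened by accident later.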

Documentation contributions are welcome too:

- Improving or expanding `BLUEPRINT.md` with new architectural notes.
- Adding worked example queries and output screenshots to `Assets/`.
- Fixing typos or unclear wording in `README.md` or `CONTRIBUTING.md`.

Before opening a PR, confirm all of the following:

- `pytest tests/ --tb=short` passes with zero failures
- `ruff check .` reports no errors
- `CHANGELOG.md` updated with a brief description of the change
- Commit message is short and imperative: `Add Ecosia engine`, `Fix Yahoo warmup retry logic`, `Expand directory domain blocklist`
- One PR per feature or fix; mixed-concern PRs are hard to review and harder to revert
- If the change touches scraping logic, include a note on which engine/site was tested and what the result looked like

Reference the relevant issue in the PR description (`Closes #42`).