**Multi-engine search scraper + contact enricher. Finds business leads, extracts emails & phones, scores lead quality.**
**Production-grade Python lead generation engine — scrapes 4 independent search engines simultaneously, enriches every result with email and phone, and scores each lead HOT / WARM / COLD for prioritised outreach. Type a query, get a ready-to-use Excel lead list.**
LeadHunter Pro searches four independent search engines simultaneously to find real business websites matching your query. It then visits each website to extract a contact email address and phone number, and scores every lead as HOT, WARM, COLD, or NOISE based on how closely the page content matches what you searched for. The final output is a colour-coded Excel spreadsheet, ready to use.
Found this useful? A ⭐ on GitHub helps other developers find it.
@@ -39,6 +32,31 @@ LeadHunter Pro searches four independent search engines simultaneously to find r

---

## What It Does

1. **Reads `queries.txt`**: one search query per line (e.g. `property managers manchester`).
2. **Phase 1: scrapes 4 search engines** (Mojeek, DuckDuckGo, Yahoo, Bing) for each query, deduplicates results across engines, and saves a lead CSV.
3. **Phase 2: enriches every lead** by visiting each website, first with a fast HTTP GET (Pass 1), then with a Playwright fallback for JS-rendered sites.
4. **Scores each lead** HOT / WARM / COLD / NOISE based on keyword matching against the original query, so the list is prioritised for outreach.
5. **Outputs a styled Excel file**: colour-coded by score, sorted by quality, with hyperlinked websites and a Summary sheet with engine statistics.
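
The scoring in step 4 can be pictured as a keyword-overlap calculation. The sketch below is illustrative only: the function names `keyword_match_percent` and `classify_lead` and the 75 / 40 / 10 thresholds are assumptions made for this example, not values taken from LeadHunter Pro.

```python
import re


def keyword_match_percent(query: str, page_text: str) -> int:
    """Percentage of distinct query words that appear in the page text."""
    keywords = set(re.findall(r"[a-z0-9]+", query.lower()))
    if not keywords:
        return 0
    page_words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    hits = len(keywords & page_words)
    return round(100 * hits / len(keywords))


def classify_lead(match_percent: int) -> str:
    """Map a match percentage to a label (thresholds are assumed, not the tool's)."""
    if match_percent >= 75:
        return "HOT"
    if match_percent >= 40:
        return "WARM"
    if match_percent >= 10:
        return "COLD"
    return "NOISE"


page = "Prime Residential: property managers serving Manchester landlords"
pct = keyword_match_percent("property managers manchester", page)
print(pct, classify_lead(pct))
```

Matching on whole words rather than substrings keeps short query terms from matching inside unrelated longer words, at the cost of missing plural or hyphenated variants.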

Each engine runs in its own session with a warmup request to avoid HTTP 202 bot challenges. Results are deduplicated across all four engines using URL normalisation and domain deduplication before enrichment begins. A built-in `diagnose.py` tool checks each engine's health before a run.
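
As a rough illustration of URL normalisation followed by domain deduplication, here is a minimal sketch using only the standard library. The helper names `normalise_url` and `dedupe_by_domain` are hypothetical, and the exact normalisation rules the tool applies may differ.

```python
from urllib.parse import urlsplit


def normalise_url(url: str) -> str:
    """Drop scheme, 'www.', query, fragment and trailing slash; lower-case the host."""
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def dedupe_by_domain(urls):
    """Keep the first URL seen for each domain, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        domain = normalise_url(url).split("/", 1)[0]
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(url)
    return unique


results = [
    "https://www.example.com/",
    "http://example.com/contact",
    "https://other.co.uk/about",
]
print(dedupe_by_domain(results))
```

Keeping the first URL per domain means that when several engines return the same site, the result from the engine queried first is the one retained.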

---

## Use Cases

| Who uses it | What they do | Example query |
|---|---|---|
| **Sales teams** | Generate targeted prospect lists for cold email campaigns | `"accountants london"` → 400+ HOT leads with email |
| **Marketing agencies** | Deliver multi-source lead lists for any UK industry vertical | `"estate agents birmingham"` → enriched Excel in 2 hours |
| **Freelance lead gen** | Automate research for clients across any niche and geography | Any query → score-sorted Excel ready for CRM import |
| **Recruiters** | Identify employers in a sector and geography with direct contact | `"law firms edinburgh"` → HR emails and direct lines |
| **Market researchers** | Map a category using 4 independent search indexes simultaneously | Any query → deduplicated coverage from all 4 engines |
| **SDRs** | Build daily outreach lists with pre-scored priority rankings | Multiple queries → HOT leads on top, COLD at bottom |

---

## How It Works
```
@@ -96,6 +114,33 @@ LeadHunter Pro searches four independent search engines simultaneously to find r
| Single query | 1 | 20–60 leads | All 4 engines | 3–8 min |
| Small batch | 5–10 queries | 100–300 leads | Full 2-pass | 20–40 min |
| Overnight run | 50+ queries | 800–2,000 leads | Full 2-pass | 3–8 hours |

> **Real run:** `"property managers manchester"` (1 query across all 4 engines) returned **62 unique leads from Mojeek alone** (pages 1–9), with the full enrichment pipeline applied. HOT leads were sorted to the top with 100% keyword match.

---

## What Data You Get

| Field | Example |
|---|---|
| Company Name | Prime Residential |
| Website | https://primeresidentialpm.com/ |
| Email | manchester@primeresidentialpm.com |
| Phone | 01612413335 |
| Lead Quality | HOT |
| Keyword Match % | 100 |

See [`assets/sample_output.csv`](assets/sample_output.csv) for 20 rows of real output extracted from a live scrape.
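
To show how output like this might be consumed downstream, the sketch below filters for HOT leads that have an email. It assumes the CSV column headers match the field names in the table above, which has not been checked against the actual file. The second sample row is an invented placeholder, and `hot_leads_with_email` is a hypothetical helper.

```python
import csv
import io

# One row from the table above plus one invented placeholder row.
SAMPLE = """Company Name,Website,Email,Phone,Lead Quality,Keyword Match %
Prime Residential,https://primeresidentialpm.com/,manchester@primeresidentialpm.com,01612413335,HOT,100
Example Lettings,https://example.com/,,,COLD,33
"""


def hot_leads_with_email(csv_file):
    """Return rows scored HOT that also have an email, best match first."""
    rows = [
        row for row in csv.DictReader(csv_file)
        if row["Lead Quality"] == "HOT" and row["Email"]
    ]
    return sorted(rows, key=lambda r: int(r["Keyword Match %"]), reverse=True)


for lead in hot_leads_with_email(io.StringIO(SAMPLE)):
    print(lead["Company Name"], lead["Email"], lead["Phone"])
```

To run it against a real file, pass a file object opened with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` wrapper. Reading with `csv.DictReader` keeps phone numbers as strings, so leading zeros survive.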

---

## Quick Start
```bash
@@ -117,6 +162,12 @@ python main.py

---

## Blueprint Reference

For a complete technical deep-dive (architecture decisions, engine behaviour, rate-limit strategy, scoring model, and extension guide), see [BLUEPRINT.md](BLUEPRINT.md).

---

## Or Run Phases Separately
```bash
@@ -242,6 +293,22 @@ Launching a headless browser for every site would take 3–5 s per site versus ~
| `requests` | Phase 2: lightweight HTTP GET for contact enrichment pass |
| `openpyxl` | Excel output with colour-coded rows and Summary sheet |
| `pyyaml` | YAML config loading for Phase 2 settings |
| `tqdm` | Live terminal progress bar with ETA for both phases |
| `python-dotenv` | Optional: loads `BING_PROXY` from the `.env` file |

---

## Project Structure
```
@@ -302,21 +369,33 @@ Leadhunter_Pro/

---

## Troubleshooting

**Bing returning results in wrong language or region:**
Set `BING_PROXY=http://user:pass@host:8080` in your `.env` file. `BING_PROXY` is read automatically at startup.
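
For orientation, this is one way a proxy value from the environment could be turned into the `proxies` mapping that `requests` accepts. It is a sketch, not the project's code: `build_proxies` is a hypothetical helper, and it assumes the `.env` file has already been loaded into the process environment (for example by `python-dotenv`).

```python
import os


def build_proxies(env=os.environ):
    """Return a requests-style proxies dict, or None when BING_PROXY is unset."""
    proxy = env.get("BING_PROXY", "").strip()
    if not proxy:
        return None
    # requests expects one entry per URL scheme.
    return {"http": proxy, "https": proxy}


proxies = build_proxies({"BING_PROXY": "http://user:pass@host:8080"})
print(proxies)
```

The result can be passed as `requests.get(url, proxies=proxies)`. Passing `None` leaves `requests` on its default behaviour.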

**DuckDuckGo returning HTTP 202 with no results:**
DDG's warmup mechanism is handled automatically. If the problem persists, increase `DELAY_BETWEEN_ENGINES` in `config.py` or pause for 10–15 minutes.

**One engine returning zero results consistently:**
Run `python diagnose.py`. It fires a test query at each engine and reports the HTTP status, result count, and any error. Use it to identify which engine to temporarily disable in `ENGINES_PRIORITY` in `config.py`.

**Script stops mid-run:**
A checkpoint is saved every 50 queries. Re-run with the same `queries.txt` to resume from where it stopped.
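
The resume behaviour can be pictured with a small sketch. The checkpoint file name, its JSON format, and the `run`, `load_done` and `save_done` helpers are all assumptions made for illustration. Only the 50-query interval comes from the text above.

```python
import json
import os
import tempfile

CHECKPOINT_EVERY = 50  # interval taken from the description above


def load_done(path):
    """Return the set of queries already processed, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(path, done):
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run(queries, path, process):
    """Process queries not yet in the checkpoint, saving every CHECKPOINT_EVERY."""
    done = load_done(path)
    pending = (q for q in queries if q not in done)
    for i, query in enumerate(pending, start=1):
        process(query)
        done.add(query)
        if i % CHECKPOINT_EVERY == 0:
            save_done(path, done)
    save_done(path, done)  # final save covers the last partial batch
    return done


with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt = os.path.join(tmp_dir, "checkpoint.json")
    seen = []
    run(["q1", "q2"], ckpt, seen.append)        # first run processes both
    run(["q1", "q2", "q3"], ckpt, seen.append)  # resumed run only processes q3
    print(seen)
```

With a periodic checkpoint like this, a crash loses at most the queries processed since the last save, and those are simply repeated on the next run.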

| **[Leadhunter Pro](https://github.com/FAAQJAVED/Leadhunter_Pro)** ← *you are here* | Multi-engine search scraper with HOT/WARM/COLD lead scoring |
| **[Email Phone Enrichment Tool](https://github.com/FAAQJAVED/Email-Phone-Number-Enrichment-Tool)** | Scrapes contact emails + phones from company websites |
| **[Google Maps Business Scraper](https://github.com/FAAQJAVED/Google-Maps-Business-Scraper)** | Extracts and enriches business listings from Google Maps |
| **[Trustpilot Business Scraper](https://github.com/FAAQJAVED/trustpilot-business-scraper)** | Extracts business listings from Trustpilot search results |
| **[JSON Directory Harvester](https://github.com/FAAQJAVED/json-directory-harvester)** | Configurable harvester for any JSON directory API with geo-filtering |