@@ -14,6 +14,11 @@ Found this useful? A ⭐ on GitHub helps other developers find it.
 
 ---
 
+> **Responsible use:** Only scrape websites you have permission to access.
+> Always check a site's `robots.txt` and terms of service before running
+> LeadHunter Pro against it at scale.
+
+
 ## Table of Contents
 
 [Preview](#preview) · [What It Does](#what-it-does) · [Use Cases](#use-cases) · [How It Works](#how-it-works) · [Features](#features) · [Performance](#performance) · [What Data You Get](#what-data-you-get) · [Quick Start](#quick-start) · [Blueprint Reference](#blueprint-reference) · [Run Phases Separately](#or-run-phases-separately) · [Configuration](#configuration) · [Runtime Controls](#runtime-controls) · [Output Format](#output-format) · [Diagnose Your Engines](#diagnose-your-engines) · [Architecture Notes](#architecture-notes) · [Tech Stack](#tech-stack) · [Project Structure](#project-structure) · [Requirements](#requirements) · [Troubleshooting](#troubleshooting) · [B2B Lead Toolkit](#part-of-the-b2b-lead-toolkit) · [License](#license)
@@ -209,7 +214,25 @@ BING_PROXY = 'socks5://user:pass@proxy-host:1080'
 cp config.example.yaml config.yaml
 ```
 
-Key settings: `http_timeout`, `playwright_timeout`, `stop_at`, `contact_paths`, `skip_email_keywords`.
+| Key | Default | Description |
+| --- | --- | --- |
+| `output_format` | `xlsx` | Output format — `xlsx` or `csv` |
+| `http_timeout` | `[4, 6]` | Pass 1 HTTP timeout range `[min, max]` in seconds |
+| `playwright_timeout` | `8000` | Pass 2 Playwright page load timeout in milliseconds |
+| `browser_restart_every` | `150` | Restart Chromium every N sites to prevent memory leaks |
+| `stop_at` | `""` | Wall-clock auto-stop in 24h format — `""` = disabled (e.g. `"23:00"`) |
+| `autosave_interval` | `60` | Background checkpoint save interval in seconds |
+| `enricher_workers` | `5` | Concurrent worker count for Pass 1 HTTP enrichment |
+| `rate_limit.min_seconds` | `0.1` | Minimum delay between HTTP requests |
+| `rate_limit.max_seconds` | `0.5` | Maximum delay between HTTP requests |
+| `GEO_SUSPECT_TLDS` | `[]` | TLDs flagged as geo-suspect — e.g. `['in', 'pk', 'ru']` |
+| `score_boost_keywords` | `[]` | URL keywords that give a +1 score boost to a lead |
+| `skip_email_keywords` | `[noreply, no-reply, …]` | Local-part patterns that discard an email entirely (score 999) |
+| `generic_email_keywords` | `[info, admin, support, …]` | Generics used to assign email quality tier (2 or 3) |
+| `junk_email_domains` | `[mailinator.com, …]` | Domains whose emails are always discarded |
+| `contact_paths` | `[/contact, /about, …]` | Sub-pages visited per site in Pass 1 after the homepage |
+| `locale` | `en-US` | Browser locale passed to Playwright for Pass 2 |
+| `cookie_selectors` | `[…]` | Playwright selectors tried for cookie banner dismissal (10 defaults) |
 
 ---
 
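Reviewer note: the scalar defaults in the added table can be read as the sketch below, which may help when checking the table against the code. It is only an illustration under assumptions. The helper names (`merge_config`, `pick_http_timeout`, `should_stop`), the nested layout of `rate_limit`, and the exact `stop_at` comparison are all hypothetical and are not taken from the LeadHunter Pro source. Keys whose defaults are elided in the table are left out.

```python
import random
from datetime import datetime

# Defaults copied from the table; the nesting of rate_limit is an assumption.
DEFAULTS = {
    "output_format": "xlsx",
    "http_timeout": [4, 6],          # seconds, [min, max]
    "playwright_timeout": 8000,      # milliseconds
    "browser_restart_every": 150,
    "stop_at": "",                   # "" means disabled
    "autosave_interval": 60,         # seconds
    "enricher_workers": 5,
    "rate_limit": {"min_seconds": 0.1, "max_seconds": 0.5},
    "locale": "en-US",
}


def merge_config(overrides):
    """Overlay user overrides on DEFAULTS (hypothetical helper, one nesting level)."""
    merged = {k: (dict(v) if isinstance(v, dict) else v) for k, v in DEFAULTS.items()}
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)
        else:
            merged[key] = value
    return merged


def pick_http_timeout(cfg, rng=random):
    """Draw one timeout from the [min, max] range, one possible reading of the table."""
    low, high = cfg["http_timeout"]
    return rng.uniform(low, high)


def should_stop(cfg, now):
    """True once the wall clock has reached stop_at; ignores midnight rollover."""
    if not cfg["stop_at"]:
        return False
    stop = datetime.strptime(cfg["stop_at"], "%H:%M").time()
    return now.time() >= stop
```

For example, `merge_config({"stop_at": "23:00"})` keeps every other default, and `should_stop` then returns `True` for any same-day time at or after 23:00.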