Commit 09dfd05

Enhance README with key settings table
Added key settings table with descriptions and default values.
1 parent 9cad2fc commit 09dfd05

1 file changed

Lines changed: 24 additions & 1 deletion

README.md

@@ -14,6 +14,11 @@ Found this useful? A ⭐ on GitHub helps other developers find it.
 
 ---
 
+> **Responsible use:** Only scrape websites you have permission to access.
+> Always check a site's `robots.txt` and terms of service before running
+> LeadHunter Pro against it at scale.
+
+
 ## Table of Contents
 
 [Preview](#preview) · [What It Does](#what-it-does) · [Use Cases](#use-cases) · [How It Works](#how-it-works) · [Features](#features) · [Performance](#performance) · [What Data You Get](#what-data-you-get) · [Quick Start](#quick-start) · [Blueprint Reference](#blueprint-reference) · [Run Phases Separately](#or-run-phases-separately) · [Configuration](#configuration) · [Runtime Controls](#runtime-controls) · [Output Format](#output-format) · [Diagnose Your Engines](#diagnose-your-engines) · [Architecture Notes](#architecture-notes) · [Tech Stack](#tech-stack) · [Project Structure](#project-structure) · [Requirements](#requirements) · [Troubleshooting](#troubleshooting) · [B2B Lead Toolkit](#part-of-the-b2b-lead-toolkit) · [License](#license)
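
Note on the hunk above: the responsible-use callout asks readers to check a site's `robots.txt` before scraping. As a rough illustration of what that check can look like, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the sample rules are hypothetical; nothing here is taken from LeadHunter Pro itself, and the sketch does not cover terms of service.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching `url`.

    Hypothetical helper for illustration; not part of LeadHunter Pro.
    """
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for the example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "https://example.com/contact"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/team"))
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` would fetch the file instead of parsing a string.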
@@ -209,7 +214,25 @@ BING_PROXY = 'socks5://user:pass@proxy-host:1080'
 cp config.example.yaml config.yaml
 ```
 
-Key settings: `http_timeout`, `playwright_timeout`, `stop_at`, `contact_paths`, `skip_email_keywords`.
+| Key | Default | Description |
+|---|---|---|
+| `output_format` | `xlsx` | Output format — `xlsx` or `csv` |
+| `http_timeout` | `[4, 6]` | Pass 1 HTTP timeout range `[min, max]` in seconds |
+| `playwright_timeout` | `8000` | Pass 2 Playwright page load timeout in milliseconds |
+| `browser_restart_every` | `150` | Restart Chromium every N sites to prevent memory leaks |
+| `stop_at` | `""` | Wall-clock auto-stop in 24h format — `""` = disabled (e.g. `"23:00"`) |
+| `autosave_interval` | `60` | Background checkpoint save interval in seconds |
+| `enricher_workers` | `5` | Concurrent worker count for Pass 1 HTTP enrichment |
+| `rate_limit.min_seconds` | `0.1` | Minimum delay between HTTP requests |
+| `rate_limit.max_seconds` | `0.5` | Maximum delay between HTTP requests |
+| `GEO_SUSPECT_TLDS` | `[]` | TLDs flagged as geo-suspect — e.g. `['in', 'pk', 'ru']` |
+| `score_boost_keywords` | `[]` | URL keywords that give a +1 score boost to a lead |
+| `skip_email_keywords` | `[noreply, no-reply, …]` | Local-part patterns that discard an email entirely (score 999) |
+| `generic_email_keywords` | `[info, admin, support, …]` | Generics used to assign email quality tier (2 or 3) |
+| `junk_email_domains` | `[mailinator.com, …]` | Domains whose emails are always discarded |
+| `contact_paths` | `[/contact, /about, …]` | Sub-pages visited per site in Pass 1 after the homepage |
+| `locale` | `en-US` | Browser locale passed to Playwright for Pass 2 |
+| `cookie_selectors` | `[…]` | Playwright selectors tried for cookie banner dismissal (10 defaults) |
 
 ---
 
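
Note on the settings table added above: several keys mix units (seconds for `http_timeout`, milliseconds for `playwright_timeout`) and shapes (a `[min, max]` range, a 24h clock string). The sketch below shows one plausible way such values could be interpreted. It is an illustration under stated assumptions, not the project's code: the `DEFAULTS` dict layout and every function name are hypothetical, drawing a random value from each range is an assumption, and the same-day comparison in `should_stop` ignores runs that cross midnight.

```python
import random
from datetime import datetime

# Default values copied from the settings table; the nested dict layout
# is an assumption about how the loaded config might look.
DEFAULTS = {
    "http_timeout": [4, 6],          # seconds, [min, max]
    "playwright_timeout": 8000,      # milliseconds
    "stop_at": "",                   # "" means disabled
    "rate_limit": {"min_seconds": 0.1, "max_seconds": 0.5},
}


def pick_http_timeout(cfg, rng=random):
    """Draw a timeout in seconds from the [min, max] range (assumed behavior)."""
    low, high = cfg["http_timeout"]
    return rng.uniform(low, high)


def playwright_timeout_seconds(cfg):
    """Convert the millisecond Playwright timeout to seconds."""
    return cfg["playwright_timeout"] / 1000


def pick_delay(cfg, rng=random):
    """Draw a delay in seconds between the rate-limit bounds (assumed behavior)."""
    limits = cfg["rate_limit"]
    return rng.uniform(limits["min_seconds"], limits["max_seconds"])


def should_stop(cfg, now):
    """Return True once `now` has reached the `stop_at` wall-clock time.

    Simplification: compares times within a single day only.
    """
    raw = cfg["stop_at"]
    if not raw:
        return False  # empty string disables the auto-stop
    stop_time = datetime.strptime(raw, "%H:%M").time()
    return now.time() >= stop_time


cfg = {**DEFAULTS, "stop_at": "23:00"}
print(playwright_timeout_seconds(cfg))
print(should_stop(cfg, datetime(2024, 1, 1, 23, 30)))
```

Keeping the unit conversions in small named helpers makes it harder to pass a millisecond value where seconds are expected.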
