
Commit 585bf17

Authored by B4nan and claude
docs: add Impit HTTP client guide to v3.15 docs (apify#3360)
## Summary

- Copy the Impit HTTP client guide to the version 3.15 docs snapshot
- Add the guide to the v3.15 sidebar

Follow-up to apify#3359

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent d70fc8f · 6 files changed · 213 additions, 0 deletions


Lines changed: 16 additions & 0 deletions

```ts
import { CheerioCrawler } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({
        // Impersonate Chrome browser
        browser: Browser.Chrome,
        // Enable HTTP/3 protocol
        http3: true,
    }),
    async requestHandler({ $ }) {
        console.log(`Title: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);
```
Lines changed: 14 additions & 0 deletions

```ts
import { BasicCrawler } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new BasicCrawler({
    httpClient: new ImpitHttpClient({
        browser: Browser.Firefox,
    }),
    async requestHandler({ sendRequest, log }) {
        const response = await sendRequest();
        log.info('Received response', { statusCode: response.statusCode });
    },
});

await crawler.run(['https://example.com']);
```
Lines changed: 23 additions & 0 deletions

```ts
import { CheerioCrawler } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({
        browser: Browser.Chrome,
    }),
    async requestHandler({ $, request, enqueueLinks, pushData }) {
        const title = $('title').text();
        const h1 = $('h1').first().text();

        await pushData({
            url: request.url,
            title,
            h1,
        });

        // Enqueue links found on the page
        await enqueueLinks();
    },
});

await crawler.run(['https://example.com']);
```
Lines changed: 20 additions & 0 deletions

```ts
import { HttpCrawler } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const crawler = new HttpCrawler({
    httpClient: new ImpitHttpClient({
        browser: Browser.Firefox,
        http3: true,
    }),
    async requestHandler({ body, request, log, pushData }) {
        log.info(`Processing ${request.url}`);

        // body is the raw HTML string
        await pushData({
            url: request.url,
            bodyLength: body.length,
        });
    },
});

await crawler.run(['https://example.com']);
```
Lines changed: 139 additions & 0 deletions

---
id: impit-http-client
title: Impit HTTP Client
description: Browser impersonation for HTTP requests using the Impit library
---

import CodeBlock from '@theme/CodeBlock';

import BasicUsageSource from '!!raw-loader!./basic-usage.ts';
import CheerioCrawlerSource from '!!raw-loader!./cheerio-crawler.ts';
import HttpCrawlerSource from '!!raw-loader!./http-crawler.ts';
import AdvancedConfigSource from '!!raw-loader!./advanced-config.ts';

## Introduction

The `ImpitHttpClient` is an HTTP client implementation based on the [Impit](https://github.com/apify/impit) library. It enables browser impersonation for HTTP requests, helping you bypass bot detection systems without running an actual browser.

:::info Successor to got-scraping

Impit is the successor to `got-scraping`, which is no longer actively maintained. We recommend using `ImpitHttpClient` for all new projects. Impit provides better anti-bot evasion through TLS fingerprinting and HTTP/3 support, while maintaining a smaller package size.

**Impit will become the default HTTP client in the next major version of Crawlee.**

:::

### Why use Impit?

Websites increasingly use sophisticated bot detection that analyzes:

- **HTTP fingerprints**: User-Agent strings, header ordering, HTTP/2 pseudo-header sequences
- **TLS fingerprints**: Cipher suites, TLS extensions, and cryptographic details in the ClientHello message

Standard HTTP clients like `fetch` or `axios` are easily detected because their fingerprints don't match real browsers. Unlike `got-scraping`, which only handles HTTP-level fingerprinting, Impit also mimics TLS fingerprints, making requests appear to come from real browsers.

## Installation

Install the `@crawlee/impit-client` package:

```bash npm2yarn
npm install @crawlee/impit-client
```

:::note

The `impit` package includes native binaries and supports Windows, macOS (including ARM), and Linux out of the box.

:::

## Basic usage

Pass the `ImpitHttpClient` instance to the `httpClient` option of any Crawlee crawler:

<CodeBlock language="ts">{BasicUsageSource}</CodeBlock>

## Usage with different crawlers

### CheerioCrawler

<CodeBlock language="ts">{CheerioCrawlerSource}</CodeBlock>

### HttpCrawler

<CodeBlock language="ts">{HttpCrawlerSource}</CodeBlock>

## Configuration options

The `ImpitHttpClient` constructor accepts the following options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `browser` | `'chrome'` \| `'firefox'` | `undefined` | Browser to impersonate. Affects TLS fingerprint and default headers. |
| `http3` | `boolean` | `false` | Enable HTTP/3 (QUIC) protocol support. |
| `ignoreTlsErrors` | `boolean` | `false` | Ignore TLS certificate errors. Useful for testing or self-signed certificates. |
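To make the table concrete, the three options can be sketched as a plain TypeScript shape. The interface name below is ours for illustration, not necessarily the type the package exports:

```typescript
// Illustrative sketch of the option shape described in the table above;
// the interface name is hypothetical, not the package's exported type.
interface ImpitHttpClientOptions {
    browser?: 'chrome' | 'firefox';
    http3?: boolean;
    ignoreTlsErrors?: boolean;
}

// All three options combined, e.g. for a staging server that uses a
// self-signed certificate:
const options: ImpitHttpClientOptions = {
    browser: 'firefox',    // impersonate Firefox's TLS/header fingerprint
    http3: true,           // negotiate HTTP/3 (QUIC) where available
    ignoreTlsErrors: true, // only for testing; avoid in production
};

console.log(options);
```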
### Browser impersonation

Use the `Browser` enum to specify which browser to impersonate:

```ts
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

// Impersonate Firefox
const firefoxClient = new ImpitHttpClient({ browser: Browser.Firefox });

// Impersonate Chrome
const chromeClient = new ImpitHttpClient({ browser: Browser.Chrome });
```

### Advanced configuration

<CodeBlock language="ts">{AdvancedConfigSource}</CodeBlock>

## Proxy support

Proxies are configured per-request through Crawlee's proxy management system, not on the `ImpitHttpClient` itself. Use `ProxyConfiguration` as you normally would:

```ts
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy1.example.com:8080', 'http://proxy2.example.com:8080'],
});

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({ browser: Browser.Chrome }),
    proxyConfiguration,
    async requestHandler({ $, request }) {
        console.log(`Scraped ${request.url}`);
    },
});
```

## How it works

Impit achieves browser impersonation at two levels:

1. **HTTP level**: Mimics browser-specific header ordering, HTTP/2 settings, and pseudo-header sequences that anti-bot services analyze.

2. **TLS level**: Uses a patched version of `rustls` to replicate the exact TLS ClientHello message that browsers send, including cipher suites and extensions.

This dual-layer approach makes requests appear to come from a real browser, significantly reducing blocks from bot detection systems.
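The TLS-level point can be made concrete with a rough sketch of JA3, one widely used TLS fingerprinting scheme (not necessarily what any particular anti-bot vendor runs): detection services concatenate fields of the ClientHello and hash the result, so any client whose cipher suites or extensions differ from a real browser's produces a different hash. All numeric values below are illustrative, not a real browser's ClientHello:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical ClientHello field values -- NOT a real browser's fingerprint.
const clientHello = {
    tlsVersion: 771, // 0x0303, i.e. TLS 1.2 in the handshake
    cipherSuites: [4865, 4866, 4867],
    extensions: [0, 23, 65281],
    ellipticCurves: [29, 23, 24],
    ecPointFormats: [0],
};

// JA3 joins the five fields with commas (dashes within a field)...
const ja3String = [
    clientHello.tlsVersion,
    clientHello.cipherSuites.join('-'),
    clientHello.extensions.join('-'),
    clientHello.ellipticCurves.join('-'),
    clientHello.ecPointFormats.join('-'),
].join(',');

// ...and hashes the result with MD5 to form the fingerprint.
const ja3Hash = createHash('md5').update(ja3String).digest('hex');

console.log(ja3String); // 771,4865-4866-4867,0-23-65281,29-23-24,0
console.log(ja3Hash);
```

This is why Impit patches the TLS stack itself rather than just rewriting headers: the hashed fields live below the HTTP layer, where an ordinary HTTP client cannot change them.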
## Comparison with other solutions

| Feature | got-scraping | curl-impersonate | Impit |
|---------|--------------|------------------|-------|
| TLS fingerprinting | No | Yes | Yes |
| HTTP/3 support | No | Yes | Yes |
| Native Node.js package | Yes | No (child process) | Yes |
| Windows/macOS ARM | Yes | No | Yes |
| Package size | ~10 MB | ~20 MB | ~8 MB |

**Related links**

- [Impit GitHub repository](https://github.com/apify/impit)
- [Custom HTTP Client guide](./custom-http-client)
- [Proxy Management guide](./proxy-management)
- [Avoiding blocking guide](./avoid-blocking)

website/versioned_sidebars/version-3.15-sidebars.json

Lines changed: 1 addition & 0 deletions
```diff
@@ -43,6 +43,7 @@
     "guides/scaling-crawlers",
     "guides/avoid-blocking",
     "guides/jsdom-crawler-guide",
+    "guides/impit-http-client/impit-http-client",
     "guides/got-scraping",
     "guides/typescript-project",
     "guides/docker-images",
```

0 commit comments
