---
id: impit-http-client
title: Impit HTTP Client
description: Browser impersonation for HTTP requests using the Impit library
---

import CodeBlock from '@theme/CodeBlock';

import BasicUsageSource from '!!raw-loader!./basic-usage.ts';
import CheerioCrawlerSource from '!!raw-loader!./cheerio-crawler.ts';
import HttpCrawlerSource from '!!raw-loader!./http-crawler.ts';
import AdvancedConfigSource from '!!raw-loader!./advanced-config.ts';

## Introduction

The `ImpitHttpClient` is an HTTP client implementation based on the [Impit](https://github.com/apify/impit) library. It enables browser impersonation for HTTP requests, helping you bypass bot detection systems without running an actual browser.

:::info Successor to got-scraping

Impit is the successor to `got-scraping`, which is no longer actively maintained. We recommend using `ImpitHttpClient` for all new projects. Impit provides better anti-bot evasion through TLS fingerprint impersonation and HTTP/3 support, and it has a smaller package size.

**Impit will become the default HTTP client in the next major version of Crawlee.**

:::

### Why use Impit?

Websites increasingly use sophisticated bot detection that analyzes:

- **HTTP fingerprints**: User-Agent strings, header ordering, and HTTP/2 pseudo-header sequences
- **TLS fingerprints**: Cipher suites, TLS extensions, and other cryptographic details in the ClientHello message

Standard HTTP clients like `fetch` or `axios` are easily detected because their fingerprints don't match those of real browsers. Unlike `got-scraping`, which only handles HTTP-level fingerprinting, Impit also mimics TLS fingerprints, making requests appear to come from real browsers.

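Here is a toy sketch that makes the HTTP-level part concrete. It shows one consistency check a detector might run: the browser a client claims to be must match the order in which it sends HTTP/2 pseudo-headers. The names `PSEUDO_HEADER_PROFILES`, `pseudoHeaderOrder`, and `matchesBrowserProfile` are hypothetical. The per-browser orders are illustrative assumptions and can differ between browser versions. Real detection systems combine many more signals, and their exact rules are not public.

```ts
// Toy model of an HTTP-level fingerprint check (hypothetical, simplified).
// The pseudo-header orders are illustrative assumptions; real browsers may differ by version.
type BrowserName = 'chrome' | 'firefox';

const PSEUDO_HEADER_PROFILES: Record<BrowserName, string[]> = {
    chrome: [':method', ':authority', ':scheme', ':path'],
    firefox: [':method', ':path', ':authority', ':scheme'],
};

// Keep only HTTP/2 pseudo-headers (names starting with ':'), in the order they were sent.
function pseudoHeaderOrder(headerNames: string[]): string[] {
    return headerNames.filter((name) => name.startsWith(':'));
}

// `claimed` stands in for the browser named in the User-Agent header.
// The request passes only if its pseudo-header order matches that browser's profile exactly.
function matchesBrowserProfile(claimed: BrowserName, headerNames: string[]): boolean {
    const expected = PSEUDO_HEADER_PROFILES[claimed];
    const actual = pseudoHeaderOrder(headerNames);
    return actual.length === expected.length && actual.every((name, i) => name === expected[i]);
}

// Same User-Agent claim, different wire behavior.
const uaOnlySpoof = [':method', ':path', ':scheme', ':authority', 'user-agent', 'accept'];
const chromeLike = [':method', ':authority', ':scheme', ':path', 'user-agent', 'accept'];

console.log(matchesBrowserProfile('chrome', uaOnlySpoof)); // false
console.log(matchesBrowserProfile('chrome', chromeLike)); // true
```

Changing only the User-Agent string does not help here, because the observed wire behavior still has to agree with the claimed browser. Impersonating clients such as Impit aim to close exactly this gap.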
## Installation

Install the `@crawlee/impit-client` package:

```bash npm2yarn
npm install @crawlee/impit-client
```

:::note

The `impit` package includes native binaries and supports Windows, macOS (including ARM), and Linux out of the box.

:::

## Basic usage

Pass the `ImpitHttpClient` instance to the `httpClient` option of any Crawlee crawler:

<CodeBlock language="ts">{BasicUsageSource}</CodeBlock>

## Usage with different crawlers

### CheerioCrawler

<CodeBlock language="ts">{CheerioCrawlerSource}</CodeBlock>

### HttpCrawler

<CodeBlock language="ts">{HttpCrawlerSource}</CodeBlock>

## Configuration options

The `ImpitHttpClient` constructor accepts the following options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `browser` | `'chrome'` \| `'firefox'` | `undefined` | Browser to impersonate. Affects the TLS fingerprint and default headers. |
| `http3` | `boolean` | `false` | Enable HTTP/3 (QUIC) protocol support. |
| `ignoreTlsErrors` | `boolean` | `false` | Ignore TLS certificate errors. Useful for testing or for servers with self-signed certificates. |

### Browser impersonation

Use the `Browser` enum to specify which browser to impersonate:

```ts
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

// Impersonate Firefox
const firefoxClient = new ImpitHttpClient({ browser: Browser.Firefox });

// Impersonate Chrome
const chromeClient = new ImpitHttpClient({ browser: Browser.Chrome });
```

### Advanced configuration

<CodeBlock language="ts">{AdvancedConfigSource}</CodeBlock>

## Proxy support

Proxies are not set on the `ImpitHttpClient` itself. They are configured per request through Crawlee's proxy management system, so use `ProxyConfiguration` as you normally would:

```ts
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

// Crawlee rotates through these proxies across requests.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy1.example.com:8080', 'http://proxy2.example.com:8080'],
});

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({ browser: Browser.Chrome }),
    proxyConfiguration,
    async requestHandler({ $, request }) {
        // Log the page title to confirm the page was fetched and parsed.
        console.log(`Scraped ${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://crawlee.dev']);
```

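The division of labor described above can be sketched as follows: the crawler chooses a proxy for each request, and the client only uses it. `RoundRobinProxyPicker` and `OutgoingRequest` are hypothetical names. This is a simplified illustration of per-request rotation, not how Crawlee's `ProxyConfiguration` is implemented.

```ts
// Simplified, hypothetical model of per-request proxy selection.
// Crawlee's real ProxyConfiguration has more features (for example, session-aware selection).
class RoundRobinProxyPicker {
    private readonly proxyUrls: string[];
    private index = 0;

    constructor(proxyUrls: string[]) {
        if (proxyUrls.length === 0) {
            throw new Error('At least one proxy URL is required');
        }
        this.proxyUrls = [...proxyUrls];
    }

    // Return the next proxy URL, wrapping around at the end of the list.
    next(): string {
        const url = this.proxyUrls[this.index];
        this.index = (this.index + 1) % this.proxyUrls.length;
        return url;
    }
}

// The HTTP client stays proxy-agnostic: each outgoing request just carries the chosen proxy URL.
interface OutgoingRequest {
    url: string;
    proxyUrl: string;
}

const picker = new RoundRobinProxyPicker([
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
]);

const requests: OutgoingRequest[] = [
    'https://example.com/a',
    'https://example.com/b',
    'https://example.com/c',
].map((url) => ({ url, proxyUrl: picker.next() }));

for (const request of requests) {
    console.log(`${request.url} -> ${request.proxyUrl}`);
}
```

With two proxies and three requests, the picker returns the first proxy, then the second, then the first again. Because the proxy is chosen outside the client and attached to each request, one client instance can send requests through different proxies.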
## How it works

Impit achieves browser impersonation at two levels:

1. **HTTP level**: Mimics browser-specific header ordering, HTTP/2 settings, and pseudo-header sequences that anti-bot services analyze.

2. **TLS level**: Uses a patched version of `rustls` to replicate the exact TLS ClientHello message that browsers send, including cipher suites and extensions.

Because both layers match the impersonated browser, requests look like they come from a real browser, which reduces blocking by bot detection systems.

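Detectors need a compact way to compare ClientHello messages. One widely used method is the JA3 fingerprint. It is named here as an example, not as the method any particular service uses. JA3 writes the TLS version, cipher suites, extensions, elliptic curves, and point formats as decimal numbers, joins them into a string, and hashes that string with MD5. The sketch below computes a JA3-style fingerprint for a made-up ClientHello. `ClientHelloSummary`, `isGrease`, `ja3String`, and `ja3Hash` are hypothetical names, and none of this code is part of Impit.

```ts
import { createHash } from 'node:crypto';

// ClientHello fields that JA3 uses, as decimal numbers.
interface ClientHelloSummary {
    version: number;
    cipherSuites: number[];
    extensions: number[];
    ellipticCurves: number[];
    pointFormats: number[];
}

// JA3 skips GREASE values (0x0a0a, 0x1a1a, ..., 0xfafa), which clients insert randomly.
function isGrease(value: number): boolean {
    return (value & 0x0f0f) === 0x0a0a && (value >> 8) === (value & 0xff);
}

// Fields are joined with ',' and values inside each list with '-'.
function ja3String(hello: ClientHelloSummary): string {
    const list = (values: number[]) => values.filter((v) => !isGrease(v)).join('-');
    return [
        String(hello.version),
        list(hello.cipherSuites),
        list(hello.extensions),
        list(hello.ellipticCurves),
        list(hello.pointFormats),
    ].join(',');
}

function ja3Hash(hello: ClientHelloSummary): string {
    return createHash('md5').update(ja3String(hello)).digest('hex');
}

// Illustrative ClientHello values, not taken from any specific browser.
const hello: ClientHelloSummary = {
    version: 771, // 0x0303, the legacy_version value modern ClientHellos send
    cipherSuites: [0x1a1a, 4865, 4866, 4867, 49195], // 0x1a1a is GREASE and gets skipped
    extensions: [0, 23, 65281, 10, 11],
    ellipticCurves: [29, 23, 24],
    pointFormats: [0],
};

console.log(ja3String(hello)); // 771,4865-4866-4867-49195,0-23-65281-10-11,29-23-24,0
console.log(ja3Hash(hello));

// Dropping a single cipher suite yields a different fingerprint.
const altered: ClientHelloSummary = { ...hello, cipherSuites: [4865, 4866, 4867] };
console.log(ja3Hash(altered) !== ja3Hash(hello)); // true
```

Changing even one cipher suite or extension changes the hash. A client therefore has to reproduce a browser's ClientHello exactly, which is why Impit works at the TLS layer instead of only adjusting headers.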
## Comparison with other solutions

| Feature | got-scraping | curl-impersonate | Impit |
|---------|--------------|------------------|-------|
| TLS fingerprint impersonation | No | Yes | Yes |
| HTTP/3 support | No | Yes | Yes |
| Native Node.js package | Yes | No (child process) | Yes |
| Windows and macOS ARM support | Yes | No | Yes |
| Package size | ~10 MB | ~20 MB | ~8 MB |

**Related links**

- [Impit GitHub repository](https://github.com/apify/impit)
- [Custom HTTP Client guide](./custom-http-client)
- [Proxy Management guide](./proxy-management)
- [Avoiding blocking guide](./avoid-blocking)