
Commit c62a42e

B4nan and claude committed
chore(docs): regenerate v4 docs snapshot

Sync versioned_docs/version-4.0 with current /docs/ and regenerate api-typedoc.json from the latest TypeScript sources. The snapshot still showed the removed Configuration.get/set methods and stale guide copy; regenerating pulls in the new property-based getters and the updated configuration guide. Also adds the custom-logger guide that landed in v4 after the snapshot was taken.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 8404901 commit c62a42e

21 files changed

Lines changed: 74287 additions & 68258 deletions

website/versioned_docs/version-4.0/api-typedoc.json

Lines changed: 73986 additions & 68217 deletions
Large diffs are not rendered by default.

website/versioned_docs/version-4.0/guides/configuration.mdx

Lines changed: 20 additions & 18 deletions
@@ -15,9 +15,9 @@ There are three ways of changing the configuration parameters:
 - using the `Configuration` class
 
 You could also combine all the above, but you should keep in mind, that the precedence for these 3 options is the following:
-***`crawlee.json`*** < ***constructor options*** < ***environment variables***.
+***constructor options*** > ***environment variables*** > ***`crawlee.json`***.
 
-`crawlee.json` is a baseline. The options provided in the `Configuration` constructor will override the options provided in the JSON. Environment variables will override both.
+Constructor options have the highest priority. Environment variables override `crawlee.json`. The JSON file serves as a baseline.
 
 ## `crawlee.json`
 
@@ -133,7 +133,7 @@ the autoscaling feature will only use up to 2048 MB of memory.
 
 ## Configuration class
 
-The last option to adjust Crawlee configuration is to use the <ApiLink to="core/class/Configuration">`Configuration`</ApiLink> class in the code.
+The last option to adjust Crawlee configuration is to use the <ApiLink to="core/class/Configuration">`Configuration`</ApiLink> class in the code. Configuration is immutable — values are set via the constructor and cannot be changed afterwards.
 
 ### Global Configuration
 
@@ -144,13 +144,17 @@ import { CheerioCrawler, Configuration, sleep } from 'crawlee';
 
 // Get the global configuration
 const config = Configuration.getGlobalConfig();
-// Set the 'persistStateIntervalMillis' option
-// of global configuration to 10 seconds
-config.set('persistStateIntervalMillis', 10_000);
+// Access configuration values directly as properties
+console.log(config.persistStateIntervalMillis);
 
-// Note, that we are not passing the configuration to the crawler
-// as it's using the global configuration
-const crawler = new CheerioCrawler();
+// To use custom configuration values, create a new Configuration instance
+const customConfig = new Configuration({
+    // Set the 'persistStateIntervalMillis' option to 10 seconds
+    persistStateIntervalMillis: 10_000,
+});
+
+// Pass the configuration to the crawler
+const crawler = new CheerioCrawler({ configuration: customConfig });
 
 crawler.router.addDefaultHandler(async ({ request }) => {
     // For the first request we wait for 5 seconds,
@@ -170,15 +174,13 @@ crawler.router.addDefaultHandler(async ({ request }) => {
 await crawler.run(['https://www.example.com/1']);
 ```
 
-This is pretty much the same example we used for showing `crawlee.json` usage,
-but now we're using the global configuration, which is the only difference.
-If you run this example - you will find the `SDK_CRAWLER_STATISTICS` file in default Key-Value store as before,
-which would show the same number of finishes requests (one) and the same crawler runtime (~10 seconds).
-This confirms that provided parameters worked: the state was persisted after 10 seconds, as it was set in the global configuration.
+If you run this example - you will find the `SDK_CRAWLER_STATISTICS` file in default Key-Value store,
+which would show the same number of finished requests (one) and the same crawler runtime (~10 seconds).
+This confirms that provided parameters worked: the state was persisted after 10 seconds, as it was set in the configuration.
 
 :::note
 
-After running the same example with commented two lines of code related to `Configuration` there will be
+After running the same example without the custom configuration, there will be
 no `SDK_CRAWLER_STATISTICS` file stored in the default Key-Value store:
 as we did not change the `persistStateIntervalMillis`, Crawlee used the default value of 60 seconds,
 and the crawler was forcefully aborted after ~15 seconds of run time before it persisted the state for the first time.
@@ -187,7 +189,7 @@ and the crawler was forcefully aborted after ~15 seconds of run time before it persisted the state for the first time.
 
 ### Custom configuration
 
-Alternatively, you can create a custom configuration. In this case you need to pass it to the class that is going to use it, e.g. to the crawler. Let's adjust the previous example:
+You can create a custom configuration and pass it to the crawler via the `configuration` option:
 
 ```js
 import { CheerioCrawler, Configuration, sleep } from 'crawlee';
@@ -198,8 +200,8 @@ const config = new Configuration({
     persistStateIntervalMillis: 10_000,
 });
 
-// Now we need to pass the configuration to the crawler
-const crawler = new CheerioCrawler({}, config);
+// Pass the configuration to the crawler
+const crawler = new CheerioCrawler({ configuration: config });
 
 crawler.router.addDefaultHandler(async ({ request }) => {
     // for the first request we wait for 5 seconds,
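The precedence rule documented in this guide (constructor options > environment variables > `crawlee.json`) boils down to a first-defined-wins lookup. A minimal standalone sketch of that rule (the `resolveOption` helper below is hypothetical, for illustration only, and is not part of Crawlee's API):

```typescript
// Hypothetical illustration of the documented precedence:
// constructor options > environment variables > crawlee.json.
type ConfigSource = Record<string, unknown>;

function resolveOption(
    key: string,
    constructorOptions: ConfigSource,
    envVars: ConfigSource,
    crawleeJson: ConfigSource,
): unknown {
    // The first source that defines the key wins, in decreasing priority.
    return constructorOptions[key] ?? envVars[key] ?? crawleeJson[key];
}

// crawlee.json is the baseline...
const fromJson = resolveOption('persistStateIntervalMillis', {}, {}, { persistStateIntervalMillis: 60_000 });
// ...environment variables override it...
const fromEnv = resolveOption('persistStateIntervalMillis', {}, { persistStateIntervalMillis: 30_000 }, { persistStateIntervalMillis: 60_000 });
// ...and constructor options override both.
const fromCtor = resolveOption('persistStateIntervalMillis', { persistStateIntervalMillis: 10_000 }, { persistStateIntervalMillis: 30_000 }, { persistStateIntervalMillis: 60_000 });
```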
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+---
+id: custom-logger
+title: Custom logger
+description: Use your own logging library (Winston, Pino, etc.) with Crawlee
+---
+
+import ApiLink from '@site/src/components/ApiLink';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import CodeBlock from '@theme/CodeBlock';
+
+import WinstonSource from '!!raw-loader!./winston.ts';
+import PinoSource from '!!raw-loader!./pino.ts';
+
+Crawlee uses `@apify/log` as its default logging library, but you can replace it with any logger you prefer, such as Winston or Pino. This is done by implementing a small adapter and passing it to the crawler.
+
+## Creating an adapter
+
+All Crawlee logging goes through the <ApiLink to="core/interface/CrawleeLogger">`CrawleeLogger`</ApiLink> interface. To plug in your own logger, extend the <ApiLink to="core/class/BaseCrawleeLogger">`BaseCrawleeLogger`</ApiLink> abstract class and implement two methods:
+
+- **`logWithLevel(level, message, data)`** — dispatches a log message to your logging library. The `level` parameter uses <ApiLink to="core/enum/LogLevel">`LogLevel`</ApiLink> constants (`ERROR = 1`, `SOFT_FAIL = 2`, `WARNING = 3`, `INFO = 4`, `DEBUG = 5`, `PERF = 6`). Map these to your logger's native levels. The `message` is a human-readable `string`, and `data` is an optional `Record<string, unknown>` with structured context (e.g. `{ url, statusCode }`) — pass it to your logger as metadata or structured fields.
+- **`createChild(options)`** — returns a new child logger instance scoped to a specific component. Crawlee calls this internally to give each subsystem (e.g. `CheerioCrawler`, `AutoscaledPool`, `SessionPool`) its own identifiable logger. The `options` parameter is a <ApiLink to="core/interface/CrawleeLoggerOptions">`CrawleeLoggerOptions`</ApiLink> object with a single field: `prefix` — a string label prepended to each log line from that component.
+
+All other methods (`error`, `warning`, `info`, `debug`, `exception`, `perf`, etc.) are derived automatically from `logWithLevel` — you don't need to implement them.
+
+:::info Level filtering
+
+`logWithLevel()` is called for **every** log message, regardless of the configured level. Level filtering is the responsibility of the underlying logging library (e.g. Winston's `level` option or Pino's `level` setting). This means your adapter doesn't need to check log levels — just forward everything and let the library decide what to output.
+
+:::
+
+## Injecting the logger
+
+There are two ways to inject a custom logger: per-crawler and globally.
+
+### Per-crawler logger
+
+Pass your adapter via the `logger` option in the crawler constructor. When a `logger` is provided, the crawler creates its own isolated <ApiLink to="core/class/ServiceLocator">`ServiceLocator`</ApiLink> instance, so the custom logger is used by all internal components of that crawler (autoscaling, session pool, statistics, etc.):
+
+```ts
+import { CheerioCrawler } from 'crawlee';
+
+const crawler = new CheerioCrawler({
+    logger: new WinstonAdapter(winstonLogger),
+    async requestHandler({ log }) {
+        // `log` is a child of your custom logger, with prefix set to the crawler class name
+        log.info('Hello from my custom logger!');
+    },
+});
+```
+
+The same logger is available as `crawler.log` outside of the request handler, for example when setting up routes.
+
+### Global logger via service locator
+
+Instead of passing the logger to each crawler individually, you can set it globally via the `serviceLocator`. This is useful when you run multiple crawlers and want them all to use the same logging backend:
+
+```ts
+import { serviceLocator, CheerioCrawler, PlaywrightCrawler } from 'crawlee';
+
+// Set the logger globally — must be done before creating any crawlers
+serviceLocator.setLogger(new WinstonAdapter(winstonLogger));
+
+// Both crawlers will use the Winston logger
+const cheerioCrawler = new CheerioCrawler({ /* ... */ });
+const playwrightCrawler = new PlaywrightCrawler({ /* ... */ });
+```
+
+:::warning
+
+`serviceLocator.setLogger()` must be called **before** any crawler is created. Once a logger has been retrieved from the service locator (which happens during crawler construction), it cannot be replaced — an error will be thrown.
+
+:::
+
+## Full examples
+
+<Tabs>
+<TabItem value="winston" label="Winston" default>
+
+<CodeBlock language="ts">{WinstonSource}</CodeBlock>
+
+</TabItem>
+<TabItem value="pino" label="Pino">
+
+<CodeBlock language="ts">{PinoSource}</CodeBlock>
+
+</TabItem>
+</Tabs>
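The adapter design described in the guide above, where one abstract dispatch method drives automatically derived convenience methods, can be illustrated outside Crawlee with a tiny standalone mock. The `MiniLogger` and `ArrayAdapter` classes below are hypothetical and only mirror the pattern; the numeric level values follow the guide's `LogLevel` table (`ERROR = 1` through `PERF = 6`):

```typescript
// Standalone sketch of the adapter pattern: one abstract dispatch method,
// with convenience methods derived from it (as with `logWithLevel`).
const ERROR = 1;
const WARNING = 3;
const INFO = 4;

abstract class MiniLogger {
    constructor(protected prefix?: string) {}

    // The single method subclasses must implement.
    abstract logWithLevel(level: number, message: string, data?: Record<string, unknown>): void;

    // Convenience methods are derived automatically from logWithLevel.
    error(message: string, data?: Record<string, unknown>) { this.logWithLevel(ERROR, message, data); }
    warning(message: string, data?: Record<string, unknown>) { this.logWithLevel(WARNING, message, data); }
    info(message: string, data?: Record<string, unknown>) { this.logWithLevel(INFO, message, data); }
}

// A test adapter that records messages, where a real adapter
// would forward them to Winston or Pino.
class ArrayAdapter extends MiniLogger {
    lines: string[] = [];
    logWithLevel(level: number, message: string): void {
        const tag = this.prefix ? `[${this.prefix}] ` : '';
        this.lines.push(`${level} ${tag}${message}`);
    }
}

const log = new ArrayAdapter('CheerioCrawler');
log.info('Starting');
log.error('Request failed');
```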
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+import { CheerioCrawler, BaseCrawleeLogger, LogLevel } from 'crawlee';
+import type { CrawleeLogger, CrawleeLoggerOptions } from 'crawlee';
+import pino from 'pino';
+
+// Map Crawlee log levels to Pino levels
+const CRAWLEE_TO_PINO: Record<number, string> = {
+    [LogLevel.ERROR]: 'error',
+    [LogLevel.SOFT_FAIL]: 'warn',
+    [LogLevel.WARNING]: 'warn',
+    [LogLevel.INFO]: 'info',
+    [LogLevel.DEBUG]: 'debug',
+    [LogLevel.PERF]: 'trace',
+};
+
+class PinoAdapter extends BaseCrawleeLogger {
+    constructor(
+        private logger: pino.Logger,
+        options?: Partial<CrawleeLoggerOptions>,
+    ) {
+        super(options);
+    }
+
+    logWithLevel(level: number, message: string, data?: Record<string, unknown>): void {
+        const pinoLevel = CRAWLEE_TO_PINO[level] ?? 'info';
+        this.logger[pinoLevel as pino.Level](data ?? {}, message);
+    }
+
+    protected createChild(options: Partial<CrawleeLoggerOptions>): CrawleeLogger {
+        return new PinoAdapter(this.logger.child({ prefix: options.prefix }), { ...this.getOptions(), ...options });
+    }
+}
+
+// Create a Pino logger with your preferred configuration
+const pinoLogger = pino({
+    level: 'debug',
+});
+
+// Pass the adapter to the crawler via the `logger` option
+const crawler = new CheerioCrawler({
+    logger: new PinoAdapter(pinoLogger),
+    async requestHandler({ request, $, log }) {
+        log.info(`Processing ${request.url}`);
+        const title = $('title').text();
+        log.debug('Page title extracted', { title });
+    },
+});
+
+await crawler.run(['https://crawlee.dev']);
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+import { CheerioCrawler, BaseCrawleeLogger, LogLevel } from 'crawlee';
+import type { CrawleeLogger, CrawleeLoggerOptions } from 'crawlee';
+import winston from 'winston';
+
+// Map Crawlee log levels to Winston levels
+const CRAWLEE_TO_WINSTON: Record<number, string> = {
+    [LogLevel.ERROR]: 'error',
+    [LogLevel.SOFT_FAIL]: 'warn',
+    [LogLevel.WARNING]: 'warn',
+    [LogLevel.INFO]: 'info',
+    [LogLevel.DEBUG]: 'debug',
+    [LogLevel.PERF]: 'debug',
+};
+
+class WinstonAdapter extends BaseCrawleeLogger {
+    constructor(
+        private logger: winston.Logger,
+        options?: Partial<CrawleeLoggerOptions>,
+    ) {
+        super(options);
+    }
+
+    logWithLevel(level: number, message: string, data?: Record<string, unknown>): void {
+        const winstonLevel = CRAWLEE_TO_WINSTON[level] ?? 'info';
+        this.logger.log(winstonLevel, message, data);
+    }
+
+    protected createChild(options: Partial<CrawleeLoggerOptions>): CrawleeLogger {
+        return new WinstonAdapter(this.logger.child({ prefix: options.prefix }), { ...this.getOptions(), ...options });
+    }
+}
+
+// Create a Winston logger with your preferred configuration
+const winstonLogger = winston.createLogger({
+    level: 'debug',
+    format: winston.format.combine(
+        winston.format.colorize(),
+        winston.format.timestamp(),
+        winston.format.printf(({ level, message, timestamp, prefix }) => {
+            const tag = prefix ? `[${prefix}] ` : '';
+            return `${timestamp} ${level}: ${tag}${message}`;
+        }),
+    ),
+    transports: [new winston.transports.Console()],
+});
+
+// Pass the adapter to the crawler via the `logger` option
+const crawler = new CheerioCrawler({
+    logger: new WinstonAdapter(winstonLogger),
+    async requestHandler({ request, $, log }) {
+        log.info(`Processing ${request.url}`);
+        const title = $('title').text();
+        log.debug('Page title extracted', { title });
+    },
+});
+
+await crawler.run(['https://crawlee.dev']);
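The `printf` formatter in the Winston example above folds the child logger's `prefix` into each line. Its output shape can be checked with a plain function; `formatLine` below is a hypothetical standalone extraction of that template, not part of the diff:

```typescript
// Standalone version of the printf template from the Winston example above.
function formatLine(info: { level: string; message: string; timestamp: string; prefix?: string }): string {
    const tag = info.prefix ? `[${info.prefix}] ` : '';
    return `${info.timestamp} ${info.level}: ${tag}${info.message}`;
}

// A child logger created via createChild({ prefix: 'AutoscaledPool' }) would log with a tag:
const withPrefix = formatLine({ level: 'info', message: 'state persisted', timestamp: '2024-01-01T00:00:00', prefix: 'AutoscaledPool' });
// The root logger has no prefix, so the tag is omitted:
const withoutPrefix = formatLine({ level: 'debug', message: 'starting', timestamp: '2024-01-01T00:00:00' });
```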

website/versioned_docs/version-4.0/guides/http-clients/cheerio-got-scraping-example.ts

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-import { CheerioCrawler, GotScrapingHttpClient } from 'crawlee';
+import { CheerioCrawler } from 'crawlee';
+import { GotScrapingHttpClient } from '@crawlee/got-scraping-client';
 
 const crawler = new CheerioCrawler({
     httpClient: new GotScrapingHttpClient(),

website/versioned_docs/version-4.0/guides/impit-http-client/basic-usage.ts

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ const crawler = new BasicCrawler({
     }),
     async requestHandler({ sendRequest, log }) {
         const response = await sendRequest();
-        log.info('Received response', { statusCode: response.statusCode });
+        log.info('Received response', { status: response.status });
     },
 });
 
website/versioned_docs/version-4.0/guides/proxy_management_session_cheerio.ts

Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@ const proxyConfiguration = new ProxyConfiguration({
 });
 
 const crawler = new CheerioCrawler({
-    useSessionPool: true,
     persistCookiesPerSession: true,
     proxyConfiguration,
     // ...

website/versioned_docs/version-4.0/guides/proxy_management_session_http.ts

Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@ const proxyConfiguration = new ProxyConfiguration({
 });
 
 const crawler = new HttpCrawler({
-    useSessionPool: true,
     persistCookiesPerSession: true,
     proxyConfiguration,
     // ...

website/versioned_docs/version-4.0/guides/proxy_management_session_jsdom.ts

Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@ const proxyConfiguration = new ProxyConfiguration({
 });
 
 const crawler = new JSDOMCrawler({
-    useSessionPool: true,
     persistCookiesPerSession: true,
     proxyConfiguration,
     // ...

0 commit comments
