
Commit fc4c358

B4nan and claude committed
fix: sync v4.0 docs snapshot with updated parallel-scraping examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4063bf6 commit fc4c358

2 files changed

Lines changed: 8 additions & 13 deletions

website/versioned_docs/version-4.0/guides/parallel-scraping/parallel-scraper.mjs

Lines changed: 3 additions & 5 deletions
@@ -73,15 +73,13 @@ if (!process.env.IN_WORKER_THREAD) {
 // or a configuration option. This is just for show 😈
 workerLogger.setLevel(log.LEVELS.DEBUG);
 
-// Disable the automatic purge on start
-// This is needed when running locally, as otherwise multiple processes will try to clear the default storage (and that will cause clashes)
-Configuration.set('purgeOnStart', false);
-
 // Get the request queue
 const requestQueue = await getOrInitQueue(false);
 
-// Configure crawlee to store the worker-specific data in a separate directory (needs to be done AFTER the queue is initialized when running locally)
+// Disable the automatic purge on start and configure crawlee to store the worker-specific data in a separate directory
+// (needs to be done AFTER the queue is initialized when running locally)
 const config = new Configuration({
+    purgeOnStart: false,
     storageClientOptions: {
         localDataDirectory: `./storage/worker-${process.env.WORKER_INDEX}`,
     },
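
For readers skimming the hunk above: the change folds the `purgeOnStart: false` setting into the same options object that already carried the per-worker storage directory, replacing the separate `Configuration.set('purgeOnStart', false)` call. The following is a minimal sketch of that options object in plain JavaScript, with no Crawlee import. `buildWorkerOptions` is a hypothetical helper introduced only for illustration; it is not part of this commit or of Crawlee.

```javascript
// Hypothetical helper (not in the commit): builds the plain options object
// that the worker passes to `new Configuration(...)` in the hunk above.
function buildWorkerOptions(workerIndex) {
    return {
        // Keep the queue the parent process prepared instead of purging it.
        purgeOnStart: false,
        storageClientOptions: {
            // One directory per worker so workers do not collide with each other.
            localDataDirectory: `./storage/worker-${workerIndex}`,
        },
    };
}

// The diff reads the index from process.env.WORKER_INDEX;
// the '0' fallback is an assumption made only for this sketch.
const options = buildWorkerOptions(process.env.WORKER_INDEX ?? '0');
console.log(options.storageClientOptions.localDataDirectory);
```

In the actual worker code, an object of this shape would be handed to `new Configuration(...)` after the request queue has been obtained, as the comment in the diff requires.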

website/versioned_docs/version-4.0/guides/parallel-scraping/parallel-scraping.mdx

Lines changed: 5 additions & 8 deletions
@@ -132,22 +132,19 @@ We use this to ensure the parent process stays alive until all the worker proces
 
 There are three steps we want to do for the worker processes:
 
-- ensure the default storages do **not** get purged on start, as otherwise we'd lose the queue we prepared
 - get the queue that supports locking from the same location as the parent process
-- initialize a special storage for worker processes so they do not collide with each other
+- ensure the default storages do **not** get purged on start, as otherwise we'd lose the queue we prepared, and initialize a special storage for worker processes so they do not collide with each other
 
 In order, that's what these lines do:
 
 ```javascript title="src/parallel-scraper.mjs"
-// Disable the automatic purge on start (step 1)
-// This is needed when running locally, as otherwise multiple processes will try to clear the default storage (and that will cause clashes)
-Configuration.set('purgeOnStart', false);
-
-// Get the request queue from the parent process (step 2)
+// Get the request queue from the parent process (step 1)
 const requestQueue = await getOrInitQueue(false);
 
-// Configure crawlee to store the worker-specific data in a separate directory (needs to be done AFTER the queue is initialized when running locally) (step 3)
+// Disable the automatic purge on start and configure crawlee to store the worker-specific data
+// in a separate directory (needs to be done AFTER the queue is initialized when running locally) (step 2)
 const config = new Configuration({
+    purgeOnStart: false,
     storageClientOptions: {
         localDataDirectory: `./storage/worker-${process.env.WORKER_INDEX}`,
     },
