docs/guides/parallel-scraping/parallel-scraping.mdx
5 additions & 8 deletions
@@ -132,22 +132,19 @@ We use this to ensure the parent process stays alive until all the worker proces
 
 There are three steps we want to do for the worker processes:
 
-- ensure the default storages do **not** get purged on start, as otherwise we'd lose the queue we prepared
 - get the queue that supports locking from the same location as the parent process
-- initialize a special storage for worker processes so they do not collide with each other
+- ensure the default storages do **not** get purged on start, as otherwise we'd lose the queue we prepared, and initialize a special storage for worker processes so they do not collide with each other
 
 In order, that's what these lines do:
 
 ```javascript title="src/parallel-scraper.mjs"
-// Disable the automatic purge on start (step 1)
-// This is needed when running locally, as otherwise multiple processes will try to clear the default storage (and that will cause clashes)
-Configuration.set('purgeOnStart', false);
-
-// Get the request queue from the parent process (step 2)
+// Get the request queue from the parent process (step 1)
 const requestQueue = await getOrInitQueue(false);
 
-// Configure crawlee to store the worker-specific data in a separate directory (needs to be done AFTER the queue is initialized when running locally) (step 3)
+// Disable the automatic purge on start and configure crawlee to store the worker-specific data
+// in a separate directory (needs to be done AFTER the queue is initialized when running locally) (step 2)