@@ -115,6 +116,97 @@ The `KeyValueStore.getPublicUrl` method is now asynchronous and reads the public
The `preNavigationHooks` option in `HttpCrawler` subclasses no longer accepts the `gotOptions` object as a second parameter. Modify the `crawlingContext` fields (e.g. `.request`) directly instead.
## Service management moved from `Configuration` to `ServiceLocator`

The service management functionality has been extracted from `Configuration` into a new `ServiceLocator` class, following the pattern established in Crawlee for Python.

### Breaking changes

The following methods and properties have been removed from `Configuration`:
- `Configuration.getStorageClient()` - moved to `ServiceLocator.getStorageClient()`
- `Configuration.getEventManager()` - moved to `ServiceLocator.getEventManager()`
- `Configuration.useStorageClient()` - use `ServiceLocator.setStorageClient()` instead
- `Configuration.useEventManager()` - use `ServiceLocator.setEventManager()` instead
- `Configuration.storageManagers` - moved to `ServiceLocator.storageManagers`

The `EventManager` and `LocalEventManager` constructors now accept an options object for configuring event intervals (e.g. `persistStateIntervalMillis`, `systemInfoIntervalMillis`). You can also use the new `LocalEventManager.fromConfig()` factory method to create an instance with intervals derived from a `Configuration` object.
### Migration guide

If you were using the removed `Configuration` methods directly, update your code to call the corresponding `ServiceLocator` methods listed above.

The new `ServiceLocator` also supports per-crawler service isolation: you can use different storage clients or event managers for different crawlers by passing them via options. Crawlers that are not given their own services use the global service locator by default:
```typescript
import { BasicCrawler } from 'crawlee';

// All crawlers will use the global service locator by default
const crawler = new BasicCrawler({
    requestHandler: async ({ request, log }) => {
        log.info(`Processing ${request.url}`);
    },
});
```
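
The difference between the global default and per-crawler isolation can be pictured with a small self-contained model. This is only a sketch of the general service-locator pattern, not Crawlee's implementation: `ToyServiceLocator`, `ToyStorageClient`, `ToyCrawler` and the `storageClient` option name are hypothetical stand-ins invented for illustration.

```typescript
// Hypothetical stand-ins for illustration only; these are not Crawlee classes.
interface ToyStorageClient {
    name: string;
}

class ToyServiceLocator {
    private storageClient?: ToyStorageClient;

    setStorageClient(client: ToyStorageClient): void {
        this.storageClient = client;
    }

    getStorageClient(): ToyStorageClient {
        // Lazily fall back to a default client when none was set explicitly.
        if (!this.storageClient) {
            this.storageClient = { name: 'default' };
        }
        return this.storageClient;
    }
}

// One shared locator that every crawler uses unless told otherwise.
const globalLocator = new ToyServiceLocator();

class ToyCrawler {
    readonly services: ToyServiceLocator;

    constructor(options: { storageClient?: ToyStorageClient } = {}) {
        if (options.storageClient) {
            // A crawler that receives its own service gets an isolated locator.
            this.services = new ToyServiceLocator();
            this.services.setStorageClient(options.storageClient);
        } else {
            // Otherwise it shares the global locator.
            this.services = globalLocator;
        }
    }
}

const sharedCrawler = new ToyCrawler();
const isolatedCrawler = new ToyCrawler({ storageClient: { name: 'custom' } });
```

In this model, `sharedCrawler` resolves its services through the shared locator, while `isolatedCrawler` carries its own locator, so setting a service on one does not affect the other.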
### Accessing configuration
`Configuration.getGlobalConfig()` remains as a utility function, but in most cases, you should use `serviceLocator.getConfiguration()` instead:
```typescript
import { serviceLocator } from 'crawlee';

const config = serviceLocator.getConfiguration();
```
Note that `Configuration.getGlobalConfig()` is currently misnamed: in specific circumstances, it returns the configuration of the currently active service locator rather than the global configuration object.
## `transformRequestFunction` precedence in `enqueueLinks`
The `transformRequestFunction` callback in `enqueueLinks` now runs **after** URL pattern filtering (`globs`, `regexps`, `pseudoUrls`) instead of before. This means it has the highest priority and can overwrite any request options set by patterns or the global `label` option.
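
To make the new ordering concrete, here is a minimal model of the pipeline. It is a sketch under stated assumptions, not Crawlee's code: `sketchEnqueue`, `SketchRequest` and `SketchOptions` are hypothetical names, and only `regexps` filtering and the global `label` are modelled.

```typescript
// Hypothetical, simplified types; not Crawlee's real request options.
interface SketchRequest {
    url: string;
    label?: string;
}

interface SketchOptions {
    regexps: RegExp[];
    label?: string;
    transformRequestFunction?: (req: SketchRequest) => SketchRequest;
}

function sketchEnqueue(urls: string[], options: SketchOptions): SketchRequest[] {
    const { regexps, label, transformRequestFunction } = options;
    return urls
        // Step 1: pattern filtering runs first and drops non-matching URLs.
        .filter((url) => regexps.some((re) => re.test(url)))
        // Matching URLs receive the global label.
        .map((url): SketchRequest => ({ url, label }))
        // Step 2: the transform runs last, so whatever it sets wins.
        .map((req) => (transformRequestFunction ? transformRequestFunction(req) : req));
}

const urls = ['https://example.com/docs/a', 'https://example.com/blog/b'];

// Without a transform, the matching URL keeps the global label.
const plain = sketchEnqueue(urls, { regexps: [/\/docs\//], label: 'DEFAULT' });

// With a transform, the label it sets overrides the global one.
const transformed = sketchEnqueue(urls, {
    regexps: [/\/docs\//],
    label: 'DEFAULT',
    transformRequestFunction: (req) => ({ ...req, label: 'DETAIL' }),
});
```

In this model the transform never sees the `/blog/b` URL, because filtering has already removed it, and the `DETAIL` label replaces `DEFAULT` on the request that remains.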