refactor!: Make the StorageManager more like StorageInstanceManager from crawlee-python#3584
refactor!: Make the StorageManager more like StorageInstanceManager from crawlee-python#3584
Conversation
fc1d565 to
afc367c
Compare
fc3505c to
4f69d63
Compare
barjin
left a comment
There was a problem hiding this comment.
Thank you @janbuchar !
Here are a few ideas while I still have the context from the previous PR.
| client, | ||
| config, | ||
| clientOpener, | ||
| clientCacheKey, |
There was a problem hiding this comment.
With client, clientOpener, and clientCacheKey, the client- prefix is imo too overloaded to read this comfortably.
Maybe it's time to switch to the backend naming? iiuc, client and clientCacheKey refer to the "big" client (backend), and clientOpener refers to the DatasetClient etc., right?
| const instance = new cls( | ||
| { | ||
| id: storageInfo.id, | ||
| name: storageInfo.name, | ||
| client: subClient, | ||
| }, | ||
| serviceLocator.getConfiguration(), | ||
| ) as T; |
There was a problem hiding this comment.
As discussed in the previous PR (and tracked in this issue - #3594), shouldn't subClient be enough?
Additionally, won't the storage frontends fetch the configuration from the serviceLocator by default?
barjin
left a comment
There was a problem hiding this comment.
Thank you @janbuchar , I have a few more ideas / questions, see below ⬇️
| // pass in failed requests back to the `crawler.run()`, otherwise they would be considered as handled and | ||
| // ignored - as a failed requests is still handled. | ||
| if (this.requestQueue?.name === 'default' && purgeRequestQueue) { | ||
| const isDefaultQueue = this.requestQueue?.name === 'default' || this.requestQueue?.name === '__default__'; |
There was a problem hiding this comment.
| const isDefaultQueue = this.requestQueue?.name === 'default' || this.requestQueue?.name === '__default__'; | |
| const isDefaultQueue = this.requestQueue?.name === 'default' || this.requestQueue?.alias === '__default__'; |
Is this right? I haven't got used to this new paradigm yet, but __default__ seems to be referenced to as an alias throughout this PR.
| for (const instance of ServiceLocator.storageInstanceManager.getAllInstances()) { | ||
| if ('clearCache' in instance && typeof (instance as any).clearCache === 'function') { | ||
| (instance as any).clearCache(); | ||
| } |
There was a problem hiding this comment.
What is the reason for the ServiceLocator to access the internals of StorageInstanceManager? Couldn't we extract this to storageInstanceManager.clearCache?
Additionally, if this happens, do we need ServiceLocator.clearStorageManagerCache? Wouldn't calling ServiceLocator.storageInstanceManager.clearCache(); suffice?
| } | ||
|
|
||
| if (typeof identifier === 'string') { | ||
| if (client.storageExists && (await client.storageExists(identifier, 'Dataset'))) { |
There was a problem hiding this comment.
Why is Dataset hardcoded here? What if we want to spawn, e.g., a named RequestQueue or KVS with this plain string arg? 🤔
There was a problem hiding this comment.
Why are we even using string names here btw? Bundlers hate this (they mangle class names), and I would love v4 to be bundler-friendly.
There was a problem hiding this comment.
This is a string enum, it's not compared to a constructor.name or anything. But what Jindra says is valid, the storage type should be an argument to resolveStorageIdentifier and passed to storageExists.
Uh oh!
There was an error while loading. Please reload this page.