| id | upgrading-to-v3 |
|---|---|
| title | Upgrading to v3 |
This page summarizes the breaking changes between Apify Python SDK v2.x and v3.0.
Support for Python 3.9 has been dropped. The Apify Python SDK v3.x now requires Python 3.10 or later. Make sure your environment is running a compatible version before upgrading.
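One way to confirm the interpreter before upgrading is a small guard like the sketch below. The helper name `is_supported` and the `MIN_PYTHON` constant are illustrative and not part of the SDK; the only fact taken from this page is the 3.10 minimum.

```python
import sys

# Minimum Python version stated above for Apify Python SDK v3.x.
MIN_PYTHON = (3, 10)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == '__main__':
    if not is_supported():
        found = '.'.join(map(str, sys.version_info[:3]))
        raise SystemExit(f'Python 3.10 or later is required, found {found}')
```

Running the file under the target interpreter exits with an error message if the version is too old.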
Apify Python SDK v3.0 includes Crawlee v1.0, which brings significant changes to the storage APIs. In Crawlee v1.0, the Dataset, KeyValueStore, and RequestQueue storage APIs have been updated for consistency and simplicity. Below is a detailed overview of what's new, what's changed, and what's been removed.
See Crawlee's Storages guide for more details.
The Dataset API now includes several new methods, such as:
- `get_metadata` - retrieves metadata information for the dataset.
- `purge` - completely clears the dataset, including all items (keeps the metadata only).
- `list_items` - returns the dataset's items in a list format.
Some older methods have been removed or replaced:
- `from_storage_object` constructor has been removed. You should now use the `open` method with either a `name` or `id` parameter.
- `get_info` method and the `storage_object` property have been replaced by the new `get_metadata` method.
- `set_metadata` method has been removed.
- `write_to_json` and `write_to_csv` methods have been removed; instead, use the `export_to` method for exporting data in different formats.
The KeyValueStore API now includes several new methods, such as:
- `get_metadata` - retrieves metadata information for the key-value store.
- `purge` - completely clears the key-value store, removing all keys and values (keeps the metadata only).
- `delete_value` - deletes a specific key and its associated value.
- `list_keys` - lists all keys in the key-value store.
Some older methods have been removed or replaced:
- `from_storage_object` - removed; use the `open` method with either a `name` or `id` instead.
- `get_info` and `storage_object` - replaced by the new `get_metadata` method.
- `set_metadata` method has been removed.
The RequestQueue API now includes several new methods, such as:
- `get_metadata` - retrieves metadata information for the request queue.
- `purge` - completely clears the request queue, including all pending and processed requests (keeps the metadata only).
- `add_requests` - replaces the previous `add_requests_batched` method, offering the same functionality under a simpler name.
Some older methods have been removed or replaced:
- `from_storage_object` - removed; use the `open` method with either a `name` or `id` instead.
- `get_info` and `storage_object` - replaced by the new `get_metadata` method.
- `get_request` now takes a `unique_key` argument instead of `request_id`, because the `id` field was removed from `Request`.
- `set_metadata` method has been removed.
Some changes in the related model classes:
- `resource_directory` in `RequestQueueMetadata` - removed; use the corresponding `path_to_*` property instead.
- `stats` field in `RequestQueueMetadata` - removed as it was unused.
- `RequestQueueHead` - replaced by `RequestQueueHeadWithLocks`.
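To find leftover v2.x storage calls in an existing codebase, a simple text scan is often enough. The sketch below is not part of the SDK: `REMOVED_NAMES` and `find_removed_names` are hypothetical helpers that encode the removals listed above, and the scan matches on names only, so it can flag unrelated identifiers that happen to share a name.

```python
import re

# Removed v2.x storage names mapped to the replacement described above.
REMOVED_NAMES = {
    'from_storage_object': 'use open() with a name or id',
    'get_info': 'use get_metadata()',
    'storage_object': 'use get_metadata()',
    'set_metadata': 'removed without a replacement',
    'write_to_json': 'use export_to()',
    'write_to_csv': 'use export_to()',
    'add_requests_batched': 'use add_requests()',
    'RequestQueueHead': 'use RequestQueueHeadWithLocks',
}

# Word boundaries keep RequestQueueHead from matching RequestQueueHeadWithLocks.
_PATTERN = re.compile(r'\b(' + '|'.join(map(re.escape, REMOVED_NAMES)) + r')\b')


def find_removed_names(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, removed name, hint) for each occurrence in source."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            findings.append((line_number, name, REMOVED_NAMES[name]))
    return findings
```

Feeding each project file's text to `find_removed_names` gives a checklist of lines to review by hand.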
- `Actor.config` property has been removed. Use `Actor.configuration` instead.
`Actor` initialization and the setup of global `service_locator` services are now stricter and more predictable.
- Services in `Actor` can't be changed after calling `Actor.init`, entering the `async with Actor` context manager, or after requesting them from the `Actor`.
- Services in `Actor` can be different from services in Crawler.
Now (v3.0):
```python
from crawlee.crawlers import BasicCrawler
from crawlee.storage_clients import MemoryStorageClient
from crawlee.configuration import Configuration
from crawlee.events import LocalEventManager
from apify import Actor


async def main():
    async with Actor():
        # This crawler will use same services as Actor and global service_locator
        crawler_1 = BasicCrawler()

        # This crawler will use custom services
        custom_configuration = Configuration()
        custom_event_manager = LocalEventManager.from_config(custom_configuration)
        custom_storage_client = MemoryStorageClient()
        crawler_2 = BasicCrawler(
            configuration=custom_configuration,
            event_manager=custom_event_manager,
            storage_client=custom_storage_client,
        )
```
- `Configuration.default_key_value_store_id` changed from `'default'` to `None`.
- `Configuration.default_dataset_id` changed from `'default'` to `None`.
- `Configuration.default_request_queue_id` changed from `'default'` to `None`.
Previously, using the default storage without specifying its id in `Configuration` resulted in a storage with the id `'default'`. Now an unnamed storage is created with an id assigned by the Apify platform, and consecutive calls to get the default storage return that same storage.