docs: fix inaccuracies and expand SDK documentation
Correct several factual errors verified against the codebase (Python 3.11
requirement, web server config properties, Actor event payload types, a
stale env var, and an empty API link), fix a latent runtime bug in the
events snippet, and fill gaps by adding a Storage clients page, documenting
Actor Standby, and expanding the Introduction and Configuration pages.
-The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.
+The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It gives you everything you need to build an Actor and run it both locally and on the [Apify platform](https://docs.apify.com/platform), including:
+
+- **Actor lifecycle management**: initialization, graceful shutdown, status messages, rebooting, and metamorphing.
+- **Storage access**: datasets, key-value stores, and request queues, with automatic local emulation when running outside the platform.
+- **Actor input**: convenient access to the Actor input, including automatic decryption of secret fields.
+- **Events & state persistence**: react to platform events (system info, migration, abort) and persist state across migrations and restarts.
+- **Proxy management**: Apify Proxy and custom proxies, with session and tiered-proxy support.
+- **Platform interaction**: start, call, and abort other Actors and tasks, create webhooks, and reach the full Apify API client.
+- **Monetization**: charge users with the pay-per-event pricing model.
+- **Framework integrations**: first-class support for [Crawlee](../guides/crawlee) and [Scrapy](../guides/scrapy).

 <CodeBlock className="language-python">
 {IntroductionExample}
@@ -29,7 +38,7 @@ Explore the Guides section in the sidebar for a deeper understanding of the SDK'

 ## Installation

-The Apify SDK for Python requires Python version 3.10 or above. It is typically installed when you create a new Actor project using the [Apify CLI](https://docs.apify.com/cli). To install it manually in an existing project, use:
+The Apify SDK for Python requires Python version 3.11 or above. It is typically installed when you create a new Actor project using the [Apify CLI](https://docs.apify.com/cli). To install it manually in an existing project, use:
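For reviewers who want to see what the corrected requirement means in practice, here is a minimal sketch of a startup guard that rejects interpreters older than 3.11. The names `MIN_PYTHON` and `python_is_supported` are hypothetical helpers for illustration; the SDK itself is not known to ship such a function.

```python
import sys

# Minimum version stated in the updated Installation text.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == '__main__':
    if not python_is_supported():
        found = '.'.join(str(part) for part in sys.version_info[:3])
        raise SystemExit(f'Python 3.11 or above is required, found {found}')
```

Comparing tuples avoids string comparisons, which would wrongly rank `'3.9'` above `'3.11'`.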
docs/02_concepts/01_actor_lifecycle.mdx
Lines changed: 1 addition, 1 deletion
@@ -106,4 +106,4 @@ Update the status only when the user's understanding of progress changes - avoid

 ## Conclusion

-This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="">reference docs</ApiLink>, [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
+This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="class/Actor">`Actor` API reference</ApiLink>, [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
docs/02_concepts/03_storages.mdx
Lines changed: 1 addition, 1 deletion
@@ -183,6 +183,6 @@ To check if all the requests in the queue are handled, you can use the <ApiLink

 ## Storage clients

-Behind the scenes, the SDK uses storage clients to communicate with the storage backend. The appropriate client is selected automatically based on the runtime environment: on the Apify platform, data is persisted via the Apify API, while local runs use the filesystem. For most use cases, you don't need to think about storage clients at all. If you want to learn more about how storage clients work, the available implementations, or how to configure them, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients). The Apify-specific clients are available in the `apify.storage_clients` module.
+Behind the scenes, the SDK uses storage clients to communicate with the storage backend. The appropriate client is selected automatically based on the runtime environment: on the Apify platform, data is persisted via the Apify API, while local runs use the filesystem. For most use cases, you don't need to think about storage clients at all. To learn about the available implementations, how to switch between a single and shared request queue, or how to configure a custom client, see the [Storage clients](./storage-clients) page. For a deeper look at how storage clients work internally, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients).

 For comprehensive information about storage on the Apify platform, see the [storage documentation](https://docs.apify.com/platform/storage), including the pages on [datasets](https://docs.apify.com/platform/storage/dataset), [key-value stores](https://docs.apify.com/platform/storage/key-value-store), and [request queues](https://docs.apify.com/platform/storage/request-queue).
docs/02_concepts/04_actor_events.mdx
Lines changed: 14 additions, 14 deletions
@@ -14,6 +14,8 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o

 ## Event types

+A listener can optionally receive a single argument: a Pydantic model with the event's data. The table below lists the events, the type of that data object, and when each event is emitted.
+
 <table>
 <thead>
 <tr>
@@ -25,25 +27,23 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
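The "optional single argument" rule added in the hunk above can be illustrated with a small stand-in dispatcher. This sketch is not the SDK's event manager: `SystemInfoData`, `call_listener`, and both handlers are hypothetical names, and a dataclass stands in for the Pydantic model the docs describe. In a real Actor the listener would be registered through `Actor.on` instead.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SystemInfoData:
    """Hypothetical stand-in for an event's Pydantic data model."""

    cpu_usage: float


def call_listener(listener: Callable[..., Any], data: Any) -> Any:
    """Pass the event data only to listeners that declare a parameter."""
    params = inspect.signature(listener).parameters
    return listener(data) if params else listener()


def on_system_info(data: SystemInfoData) -> str:
    # Listener that wants the event payload.
    return f'CPU usage: {data.cpu_usage:.0%}'


def on_any_event() -> str:
    # Listener that ignores the payload entirely.
    return 'event received'
```

Both listener shapes are accepted, so a handler that only needs to know that an event happened does not have to declare an unused parameter.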
docs/02_concepts/10_configuration.mdx
Lines changed: 18 additions, 3 deletions
@@ -27,14 +27,29 @@ This will cause the Actor to persist its state every 10 seconds:

 ## Configuring via environment variables

-All the configuration options can be set via environment variables. The environment variables are prefixed with `APIFY_`, and the configuration options are in uppercase, with underscores as separators. See the <ApiLink to="class/Configuration">`Configuration`</ApiLink> API reference for the full list of configuration options.
+All configuration options can also be set via environment variables. Most options are read from an environment variable named after the option in uppercase; many options accept several aliases, commonly with an `APIFY_`, `ACTOR_`, or `CRAWLEE_` prefix. See the <ApiLink to="class/Configuration">`Configuration`</ApiLink> API reference for the full list of configuration options.

-This Actor run will not persist its local storages to the filesystem:
+For example, this Actor run will keep the contents of its local storages instead of purging them on start:

 ```bash
-APIFY_PERSIST_STORAGE=0 apify run
+APIFY_PURGE_ON_START=0 apify run
 ```

+### Commonly used options
+
+The table below lists a few options you are most likely to set yourself. When running on the Apify platform or via the Apify CLI, the platform-related options are populated automatically.
+| `token` | `APIFY_TOKEN` | `None` | API token used to authenticate calls to the Apify API. |
+| `proxy_password` | `APIFY_PROXY_PASSWORD` | `None` | Password for [Apify Proxy](https://docs.apify.com/proxy). |
+| `purge_on_start` | `APIFY_PURGE_ON_START` | `True` | Whether to purge local storages when the Actor starts. |
+| `persist_state_interval` | `APIFY_PERSIST_STATE_INTERVAL_MILLIS` | `1 min` | How often the `PERSIST_STATE` event is emitted (the variable is in milliseconds). |
+| `log_level` | `APIFY_LOG_LEVEL` | `'INFO'` | Minimum severity of log messages that are printed. |
+| `headless` | `APIFY_HEADLESS` | `True` | Whether to run browsers in headless mode. |
+| `storage_dir` | `APIFY_LOCAL_STORAGE_DIR` | `'./storage'` | Directory holding local storages when running outside the platform. |
+| `is_at_home` | `APIFY_IS_AT_HOME` | `False` | Set by the platform: `True` when the Actor runs on Apify. |
+
 ## Reading the runtime environment

 The <ApiLink to="class/Actor#get_env">`Actor.get_env`</ApiLink> method returns a dictionary with all `APIFY_*` environment variables parsed into their typed values. This is useful for inspecting the Actor's runtime context, such as the Actor ID, run ID, or default storage IDs. Variables that are not set or are invalid will have a value of `None`.
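To make the "typed values, `None` when unset or invalid" behaviour concrete, here is a rough sketch of that parsing rule using only the standard library. It is not the implementation of `Actor.get_env`: the helper names `parse_bool` and `read_env`, the choice of three variables, and the accepted boolean spellings are all assumptions made for illustration.

```python
from __future__ import annotations

import os
from typing import Callable, Mapping


def parse_bool(raw: str) -> bool:
    """Parse a boolean flag; the accepted spellings are an assumption."""
    value = raw.strip().lower()
    if value in {'1', 'true'}:
        return True
    if value in {'0', 'false'}:
        return False
    raise ValueError(f'not a boolean: {raw!r}')


# Variable names taken from the table above; parsers chosen for illustration.
PARSERS: dict[str, Callable[[str], object]] = {
    'APIFY_IS_AT_HOME': parse_bool,
    'APIFY_PERSIST_STATE_INTERVAL_MILLIS': int,
    'APIFY_LOG_LEVEL': str,
}


def read_env(environ: Mapping[str, str] = os.environ) -> dict[str, object | None]:
    """Return typed values, using None for unset or unparsable variables."""
    result: dict[str, object | None] = {}
    for name, parser in PARSERS.items():
        raw = environ.get(name)
        try:
            result[name] = parser(raw) if raw is not None else None
        except ValueError:
            result[name] = None
    return result
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.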
+Storage clients are the components that actually read and write your [storages](./storages): datasets, key-value stores, and request queues. The Apify SDK selects an appropriate client automatically based on where the Actor runs, so for most Actors you never need to think about them. This page explains the available clients and how to customize them when you do.
+
+## How the Actor selects a storage client
+
+By default, the Actor uses a <ApiLink to="class/SmartApifyStorageClient">`SmartApifyStorageClient`</ApiLink>, a hybrid client that delegates to one of two underlying clients depending on the environment:
+
+- When running **on the Apify platform** (detected automatically), or when you pass `force_cloud=True`, it uses the **cloud** client, <ApiLink to="class/ApifyStorageClient">`ApifyStorageClient`</ApiLink>, which persists data through the Apify API.
+- When running **locally**, it uses the **local** client, <ApiLink to="class/FileSystemStorageClient">`FileSystemStorageClient`</ApiLink>, which emulates platform storages on your filesystem under the `storage` folder.
+
+This is what lets the same Actor code run unchanged both locally and on the platform.
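The selection rule in the two bullets above boils down to a single condition. The sketch below restates it as a plain function; `select_storage_backend` is a hypothetical name and the returned strings are labels, not SDK objects, so treat it as a reading aid for the rule and not as how `SmartApifyStorageClient` is written.

```python
def select_storage_backend(*, is_at_home: bool, force_cloud: bool = False) -> str:
    """Return which underlying client would handle a storage call.

    'cloud' stands for ApifyStorageClient and 'local' for the local client
    (FileSystemStorageClient by default), as described in the text above.
    """
    return 'cloud' if (is_at_home or force_cloud) else 'local'
```

Note that `force_cloud=True` only matters for local runs; on the platform the cloud client is already selected.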
+
+## Available storage clients
+
+The `apify.storage_clients` module provides the following clients:
+
+- <ApiLink to="class/SmartApifyStorageClient">`SmartApifyStorageClient`</ApiLink>: the default hybrid client described above. It wraps a `cloud_storage_client` and a `local_storage_client` and routes each call to the right one.
+- <ApiLink to="class/ApifyStorageClient">`ApifyStorageClient`</ApiLink>: talks to the Apify API. Used as the cloud client.
+- <ApiLink to="class/FileSystemStorageClient">`FileSystemStorageClient`</ApiLink>: persists data to the local filesystem. Used as the default local client.
+- <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink>: keeps everything in memory only; nothing is persisted. Useful for tests and short-lived runs.
+
+## Single vs. shared request queue
+
+`ApifyStorageClient` supports two ways of accessing the Apify request queue, selected via its `request_queue_access` argument:
+
+- **`'single'`** (default): optimized for a single consumer. It makes far fewer API calls, so it is cheaper and faster, but it does not support multiple clients consuming the same queue concurrently. This is the right choice for the majority of Actors.
+- **`'shared'`**: supports multiple consumers working on the same queue at the same time, at the cost of more API calls.
+
+To opt into the shared client, set it as the cloud client of the `SmartApifyStorageClient` in the [service locator](https://crawlee.dev/python/docs/guides/service-locator) before entering the Actor context:
+When developing locally, storages are read from and written to the local filesystem by default. To work with a storage on the Apify platform instead (for example, to read the output of a remote Actor run), pass `force_cloud=True` to <ApiLink to="class/Actor#open_dataset">`Actor.open_dataset`</ApiLink>, <ApiLink to="class/Actor#open_key_value_store">`Actor.open_key_value_store`</ApiLink>, or <ApiLink to="class/Actor#open_request_queue">`Actor.open_request_queue`</ApiLink>. This requires an Apify token, provided via the `APIFY_TOKEN` environment variable.
+
+## Customizing the storage client
+
+You can replace either of the underlying clients, for example to keep all local data in memory instead of on disk. To do this, set a `SmartApifyStorageClient` with your chosen sub-clients in the service locator **before** entering the Actor context (or awaiting <ApiLink to="class/Actor#init">`Actor.init`</ApiLink>):
+The Actor's storage client must be a `SmartApifyStorageClient`. Setting a bare `ApifyStorageClient` or `MemoryStorageClient` directly in the service locator raises an error; wrap it in a `SmartApifyStorageClient` as shown above.
+
+:::
+
+For a deeper look at how storage clients work and how to write your own, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients).
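The snippets from the new page are not visible in this view of the diff, so here is a hedged sketch of what the customization described above might look like. It assumes that `service_locator.set_storage_client` from the `crawlee` package is the way to register the client, and that `SmartApifyStorageClient` accepts the `local_storage_client` keyword named on the page. The function name `use_in_memory_local_storage` is hypothetical, and the import guard exists only so the sketch degrades gracefully when the packages are not installed.

```python
def use_in_memory_local_storage() -> bool:
    """Register a SmartApifyStorageClient whose local client keeps data in memory.

    Returns False when the required classes cannot be imported, for example
    because `apify` or `crawlee` is not installed in this environment.
    """
    try:
        from apify.storage_clients import MemoryStorageClient, SmartApifyStorageClient
        from crawlee import service_locator
    except ImportError:
        return False

    # Wrap the in-memory client; a bare MemoryStorageClient would be rejected
    # according to the admonition above.
    service_locator.set_storage_client(
        SmartApifyStorageClient(local_storage_client=MemoryStorageClient())
    )
    return True
```

Call the function once, before `async with Actor:` or `await Actor.init()`, so the Actor picks up the customized client when it initializes.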