
Commit 6d81eac

docs: fix inaccuracies, unify wording, and expand SDK documentation
1 parent 0f4dd21 commit 6d81eac

22 files changed

Lines changed: 253 additions & 58 deletions

docs/01_introduction/index.mdx

Lines changed: 11 additions & 2 deletions
@@ -9,7 +9,16 @@ import CodeBlock from '@theme/CodeBlock';
99

1010
import IntroductionExample from '!!raw-loader!./code/01_introduction.py';
1111

12-
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.
12+
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It gives you everything you need to build an Actor and run it both locally and on the [Apify platform](https://docs.apify.com/platform), including:
13+
14+
- **Actor lifecycle management** — initialization, graceful shutdown, status messages, rebooting, and metamorphing.
15+
- **Storage access** — datasets, key-value stores, and request queues, with automatic local emulation when running outside the platform.
16+
- **Actor input** — convenient access to the Actor input, including automatic decryption of secret fields.
17+
- **Events & state persistence** — react to platform events (system info, migration, abort) and persist state across migrations and restarts.
18+
- **Proxy management** — Apify Proxy and custom proxies, with session and tiered-proxy support.
19+
- **Platform interaction** — start, call, and abort other Actors and tasks, create webhooks, and reach the full Apify API client.
20+
- **Monetization** — charge users with the pay-per-event pricing model.
21+
- **Framework integrations** — first-class support for [Crawlee](../guides/crawlee) and [Scrapy](../guides/scrapy), with guides for [Playwright](../guides/playwright) and others.
1322

1423
<CodeBlock className="language-python">
1524
{IntroductionExample}
@@ -29,7 +38,7 @@ Explore the Guides section in the sidebar for a deeper understanding of the SDK'
2938

3039
## Installation
3140

32-
The Apify SDK for Python requires Python version 3.10 or above. It is typically installed when you create a new Actor project using the [Apify CLI](https://docs.apify.com/cli). To install it manually in an existing project, use:
41+
The Apify SDK for Python requires Python version 3.11 or above. It is typically installed when you create a new Actor project using the [Apify CLI](https://docs.apify.com/cli). To install it manually in an existing project, use:
3342

3443
```bash
3544
pip install apify

docs/01_introduction/quick-start.mdx

Lines changed: 6 additions & 3 deletions
@@ -59,15 +59,15 @@ The Actor's runtime dependencies are specified in the `requirements.txt` file, w
5959
The Actor's source code is in the `src` folder. This folder contains two important files:
6060

6161
- `main.py` - which contains the main function of the Actor
62-
- `__main__.py` - which is the entrypoint of the Actor package setting up the Actor [logger](../concepts/logging) and executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).
62+
- `__main__.py` - which is the entrypoint of the Actor package, executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).
6363

6464
<Tabs>
6565
<TabItem value="main.py" label="main.py" default>
6666
<CodeBlock className="language-python">
6767
{MainExample}
6868
</CodeBlock>
6969
</TabItem>
70-
<TabItem value="__main__.py" label="__main.py__">
70+
<TabItem value="__main__.py" label="__main__.py">
7171
<CodeBlock className="language-python">
7272
{UnderscoreMainExample}
7373
</CodeBlock>
@@ -79,13 +79,15 @@ We recommend keeping the entrypoint for the Actor in the `src/__main__.py` file.
7979

8080
## Next steps
8181

82+
Now that you can create and run an Actor locally, explore the rest of the SDK's features and its framework integrations.
83+
8284
### Concepts
8385

8486
To learn more about the features of the Apify SDK and how to use them, check out the Concepts section in the sidebar:
8587

8688
- [Actor lifecycle](../concepts/actor-lifecycle)
8789
- [Actor input](../concepts/actor-input)
88-
- [Working with storages](../concepts/storages)
90+
- [Storages](../concepts/storages)
8991
- [Actor events & state persistence](../concepts/actor-events)
9092
- [Proxy management](../concepts/proxy-management)
9193
- [Interacting with other Actors](../concepts/interacting-with-other-actors)
@@ -94,6 +96,7 @@ To learn more about the features of the Apify SDK and how to use them, check out
9496
- [Logging](../concepts/logging)
9597
- [Actor configuration](../concepts/actor-configuration)
9698
- [Pay-per-event monetization](../concepts/pay-per-event)
99+
- [Storage clients](../concepts/storage-clients)
97100

98101
### Guides
99102

docs/02_concepts/01_actor_lifecycle.mdx

Lines changed: 1 addition & 1 deletion
@@ -106,4 +106,4 @@ Update the status only when the user's understanding of progress changes - avoid
106106

107107
## Conclusion
108108

109-
This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="">reference docs</ApiLink>, [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
109+
This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="class/Actor">`Actor`</ApiLink> API reference, [guides](../guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).

docs/02_concepts/02_actor_input.mdx

Lines changed: 5 additions & 1 deletion
@@ -12,7 +12,7 @@ import ApiLink from '@theme/ApiLink';
1212

1313
The Actor gets its [input](https://docs.apify.com/platform/actors/running/input) from the input record in its default [key-value store](https://docs.apify.com/platform/storage/key-value-store).
1414

15-
To access it, instead of reading the record manually, you can use the <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink> convenience method. It will get the input record key from the Actor configuration, read the record from the default key-value store,and decrypt any [secret input fields](https://docs.apify.com/platform/actors/development/secret-input).
15+
To access it, instead of reading the record manually, you can use the <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink> convenience method. It will get the input record key from the Actor configuration, read the record from the default key-value store, and decrypt any [secret input fields](https://docs.apify.com/platform/actors/development/secret-input).
1616

1717
For example, if an Actor received a JSON input with two fields, `{ "firstNumber": 1, "secondNumber": 2 }`, this is how you might process it:
1818

@@ -34,4 +34,8 @@ The Apify platform supports [secret input fields](https://docs.apify.com/platfor
3434

3535
No special handling is needed in your code — when you call <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink>, encrypted fields are automatically decrypted using the Actor's private key, which is provided by the platform via environment variables. You receive the plaintext values directly.
3636

37+
## Conclusion
38+
39+
This page has shown how to read Actor input with <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink>, how to load URL sources with <ApiLink to="class/ApifyRequestList">`ApifyRequestList`</ApiLink>, and how secret input fields are decrypted automatically when you read them.
40+
3741
For more details on Actor input and how to define input schemas, see the [Actor input](https://docs.apify.com/platform/actors/running/input) and [input schema](https://docs.apify.com/platform/actors/development/input-schema) documentation on the Apify platform.
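
As a rough illustration of the input flow this file describes, the sketch below sums the two example fields from `{ "firstNumber": 1, "secondNumber": 2 }`. Only `Actor.get_input` and the two field names come from the documentation. The `sum_numbers` helper and its zero defaults are assumptions made for this example.

```python
from __future__ import annotations

from typing import Any


def sum_numbers(actor_input: dict[str, Any] | None) -> int:
    """Add the two example fields, treating a missing input or field as 0."""
    data = actor_input or {}
    return data.get('firstNumber', 0) + data.get('secondNumber', 0)


async def main() -> None:
    # Imported lazily so the helper above stays usable without the SDK installed.
    from apify import Actor

    async with Actor:
        # Secret input fields, if any, are already decrypted at this point.
        actor_input = await Actor.get_input()
        Actor.log.info('Sum: %d', sum_numbers(actor_input))
```

Keeping the arithmetic in a plain function means it can be checked without a running Actor, while `main()` would be started with `asyncio.run(main())` from the entrypoint.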

docs/02_concepts/03_storages.mdx

Lines changed: 10 additions & 6 deletions
@@ -1,6 +1,6 @@
11
---
22
id: storages
3-
title: Working with storages
3+
title: Storages
44
description: Use datasets, key-value stores, and request queues to persist Actor data.
55
---
66

@@ -45,11 +45,11 @@ Each dataset item, key-value store record, or request in a request queue is then
4545

4646
When developing locally, opening any storage will by default use local storage. To change this behavior and to use remote storage you have to use `force_cloud=True` argument in <ApiLink to="class/Actor#open_dataset">`Actor.open_dataset`</ApiLink>, <ApiLink to="class/Actor#open_request_queue">`Actor.open_request_queue`</ApiLink> or <ApiLink to="class/Actor#open_key_value_store">`Actor.open_key_value_store`</ApiLink>. Proper use of this argument allows you to work with both local and remote storages.
4747

48-
Calling another remote Actor and accessing its default storage is typical use-case for using `force-cloud=True` argument to open remote Actor's storages.
48+
Calling another remote Actor and accessing its default storage is a typical use-case for using `force_cloud=True` argument to open remote Actor's storages.
4949

5050
### Local storage persistence
5151

52-
By default, the storage contents are persisted across multiple Actor runs. To clean up the Actor storages before the running the Actor, use the `--purge` flag of the [`apify run`](https://docs.apify.com/cli/docs/reference#apify-run) command of the Apify CLI.
52+
By default, the storage contents are persisted across multiple Actor runs. To clean up the Actor storages before running the Actor, use the `--purge` flag of the [`apify run`](https://docs.apify.com/cli/docs/reference#apify-run) command of the Apify CLI.
5353

5454
```bash
5555
apify run --purge
@@ -106,8 +106,8 @@ To get an iterator of the data, you can use the <ApiLink to="class/Dataset#itera
106106
### Exporting items
107107

108108
You can also export the dataset items into a key-value store, as either a CSV or a JSON record,
109-
using the <ApiLink to="class/Dataset#export_to_csv">`Dataset.export_to_csv`</ApiLink>
110-
or <ApiLink to="class/Dataset#export_to_json">`Dataset.export_to_json`</ApiLink> method.
109+
using the <ApiLink to="class/Dataset#export_to">`Dataset.export_to`</ApiLink> method with the
110+
`content_type` argument set to `'csv'` or `'json'`.
111111

112112
<RunnableCodeBlock className="language-python" language="python">
113113
{DatasetExportsExample}
@@ -183,6 +183,10 @@ To check if all the requests in the queue are handled, you can use the <ApiLink
183183

184184
## Storage clients
185185

186-
Behind the scenes, the SDK uses storage clients to communicate with the storage backend. The appropriate client is selected automatically based on the runtime environment — on the Apify platform, data is persisted via the Apify API, while local runs use the filesystem. For most use cases, you don't need to think about storage clients at all. If you want to learn more about how storage clients work, the available implementations, or how to configure them, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients). The Apify-specific clients are available in the `apify.storage_clients` module.
186+
Behind the scenes, the SDK uses storage clients to communicate with the storage backend. The appropriate client is selected automatically based on the runtime environment — on the Apify platform, data is persisted via the Apify API, while local runs use the filesystem. For most use cases, you don't need to think about storage clients at all. To learn about the available implementations, how to switch between a single and shared request queue, or how to configure a custom client, see the [Storage clients](./storage-clients) page. For a deeper look at how storage clients work internally, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients).
187+
188+
## Conclusion
189+
190+
This page has covered the three storage types — datasets, key-value stores, and request queues — how they are emulated on the local filesystem, how to open named and unnamed storages, and how to read from and write to each through the `Actor` shortcuts and the storage classes.
187191

188192
For comprehensive information about storage on the Apify platform, see the [storage documentation](https://docs.apify.com/platform/storage), including the pages on [datasets](https://docs.apify.com/platform/storage/dataset), [key-value stores](https://docs.apify.com/platform/storage/key-value-store), and [request queues](https://docs.apify.com/platform/storage/request-queue).
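
The `force_cloud=True` argument and the `Dataset.export_to` wording changed in this file can be combined in a short sketch like the one below. It assumes that `export_to` accepts `key` and `content_type` as keyword arguments. The dataset name `my-results`, the record keys, and the `export_dataset` helper are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from typing import Any


async def export_dataset(dataset: Any, key_prefix: str = 'items') -> list[str]:
    """Export the dataset as CSV and as JSON and return the record keys used."""
    keys = []
    for content_type in ('csv', 'json'):
        key = f'{key_prefix}.{content_type}'
        # Assumed call shape: export_to(key=..., content_type=...).
        await dataset.export_to(key=key, content_type=content_type)
        keys.append(key)
    return keys


async def main() -> None:
    # Imported lazily so the helper above can be exercised without the SDK.
    from apify import Actor

    async with Actor:
        # force_cloud=True opens the storage on the platform even in a local run.
        dataset = await Actor.open_dataset(name='my-results', force_cloud=True)
        await export_dataset(dataset)
```

The helper takes any object with an awaitable `export_to` method, so it works the same for a local and a remote dataset.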

docs/02_concepts/04_actor_events.mdx

Lines changed: 18 additions & 14 deletions
@@ -14,6 +14,8 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
1414

1515
## Event types
1616

17+
A listener can optionally receive a single argument — a Pydantic model with the event's data. The table below lists the events, the type of that data object, and when each event is emitted.
18+
1719
<table>
1820
<thead>
1921
<tr>
@@ -25,25 +27,23 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
2527
<tbody>
2628
<tr>
2729
<td><code>SYSTEM_INFO</code></td>
28-
<td><pre>{`{
29-
"created_at": datetime,
30-
"cpu_current_usage": float,
31-
"mem_current_bytes": int,
32-
"is_cpu_overloaded": bool
33-
}`}
34-
</pre></td>
30+
<td><ApiLink to="class/EventSystemInfoData"><code>EventSystemInfoData</code></ApiLink></td>
3531
<td>
36-
<p>This event is emitted regularly and it indicates the current resource usage of the Actor.</p>
37-
The <code>is_cpu_overloaded</code> argument indicates whether the current CPU usage is higher than <code>Config.max_used_cpu_ratio</code>
32+
<p>Emitted regularly to report the Actor's current resource usage. The
33+
<code>cpu_info.used_ratio</code> field reports the fraction of CPU currently in use
34+
(a float between <code>0.0</code> and <code>1.0</code>), and <code>memory_info.current_size</code>
35+
reports the current memory usage. Compare <code>cpu_info.used_ratio</code> against
36+
<code>Configuration.max_used_cpu_ratio</code> to detect CPU overload.</p>
3837
</td>
3938
</tr>
4039
<tr>
4140
<td><code>MIGRATING</code></td>
42-
<td><code>None</code></td>
41+
<td><ApiLink to="class/EventMigratingData"><code>EventMigratingData</code></ApiLink></td>
4342
<td>
4443
<p>Emitted when the Actor running on the Apify platform
4544
is going to be <a href="https://docs.apify.com/platform/actors/development/state-persistence#what-is-a-migration">migrated</a>
46-
{' '}to another worker server soon.</p>
45+
{' '}to another worker server soon. The <code>time_remaining</code> field reports how much time
46+
the Actor has left before it is force-migrated.</p>
4747
You can use it to persist the state of the Actor so that once it is executed again on the new server,
4848
it doesn't have to start over from the beginning.
4949
Once you have persisted the state of your Actor, you can call <ApiLink to="class/Actor#reboot">`Actor.reboot`</ApiLink>
@@ -52,7 +52,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
5252
</tr>
5353
<tr>
5454
<td><code>ABORTING</code></td>
55-
<td><code>None</code></td>
55+
<td><ApiLink to="class/EventAbortingData"><code>EventAbortingData</code></ApiLink></td>
5656
<td>
5757
When a user aborts an Actor run on the Apify platform,
5858
they can choose to abort gracefully to allow the Actor some time before getting killed.
@@ -61,7 +61,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
6161
</tr>
6262
<tr>
6363
<td><code>PERSIST_STATE</code></td>
64-
<td><pre>{`{ "is_migrating": bool }`}</pre></td>
64+
<td><ApiLink to="class/EventPersistStateData"><code>EventPersistStateData</code></ApiLink></td>
6565
<td>
6666
<p>Emitted in regular intervals (by default 60 seconds) to notify the Actor that it should persist its state,
6767
in order to avoid repeating all work when the Actor restarts.</p>
@@ -73,7 +73,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
7373
</tr>
7474
<tr>
7575
<td><code>EXIT</code></td>
76-
<td><code>None</code></td>
76+
<td><ApiLink to="class/EventExitData"><code>EventExitData</code></ApiLink></td>
7777
<td>
7878
Emitted by the SDK (not the platform) when the Actor is about to exit. You can use this event to perform final cleanup tasks,
7979
such as closing external connections or sending notifications, before the Actor shuts down.
@@ -103,4 +103,8 @@ You can optionally specify a `key` (the key-value store key under which the stat
103103
{UseStateExample}
104104
</RunnableCodeBlock>
105105

106+
## Conclusion
107+
108+
This page has described the events emitted during a run — `SYSTEM_INFO`, `MIGRATING`, `ABORTING`, `PERSIST_STATE`, and `EXIT` — how to handle them with <ApiLink to="class/Actor#on">`Actor.on`</ApiLink>, and how to persist state automatically with <ApiLink to="class/Actor#use_state">`Actor.use_state`</ApiLink>.
109+
106110
For more details on platform events and state persistence, see the [system events](https://docs.apify.com/platform/actors/development/programming-interface/system-events) and [state persistence](https://docs.apify.com/platform/actors/development/state-persistence) documentation on the Apify platform.
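
The reworded `SYSTEM_INFO` row suggests comparing `cpu_info.used_ratio` against `Configuration.max_used_cpu_ratio`. A minimal sketch of such a listener follows. It assumes that the configuration is reachable as `Actor.configuration` and that `Event` can be imported from `apify`. The `is_cpu_overloaded` helper is hypothetical.

```python
from __future__ import annotations

from typing import Any


def is_cpu_overloaded(event_data: Any, max_used_cpu_ratio: float) -> bool:
    """Return True when the reported CPU ratio exceeds the configured limit."""
    return event_data.cpu_info.used_ratio > max_used_cpu_ratio


async def main() -> None:
    # Imported lazily so the helper above can be checked without the SDK.
    from apify import Actor, Event

    async with Actor:
        # Assumption: the active configuration is exposed as Actor.configuration.
        limit = Actor.configuration.max_used_cpu_ratio

        def on_system_info(event_data: Any) -> None:
            if is_cpu_overloaded(event_data, limit):
                Actor.log.warning('CPU overloaded: %.2f', event_data.cpu_info.used_ratio)

        Actor.on(Event.SYSTEM_INFO, on_system_info)
        # The Actor's actual work would run here while the listener is active.
```

The comparison lives in its own function so that it can be tested with any object exposing a `cpu_info.used_ratio` attribute.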

docs/02_concepts/05_proxy_management.mdx

Lines changed: 9 additions & 3 deletions
@@ -22,7 +22,7 @@ The Apify SDK provides built-in proxy management through the <ApiLink to="class/
2222

2323
If you want to use Apify Proxy locally, make sure that you run your Actors via the Apify CLI and that you are [logged in](https://docs.apify.com/cli/docs/installation#login-with-your-apify-account) with your Apify account in the CLI.
2424

25-
### Using Apify proxy
25+
### Using Apify Proxy
2626

2727
<RunnableCodeBlock className="language-python" language="python">
2828
{ApifyProxyExample}
@@ -38,7 +38,7 @@ If you want to use Apify Proxy locally, make sure that you run your Actors via t
3838

3939
All your proxy needs are managed by the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> class. You create an instance using the <ApiLink to="class/Actor#create_proxy_configuration">`Actor.create_proxy_configuration()`</ApiLink> method. Then you generate proxy URLs using the <ApiLink to="class/ProxyConfiguration#new_url">`ProxyConfiguration.new_url()`</ApiLink> method.
4040

41-
### Apify proxy vs. your own proxies
41+
### Apify Proxy vs. your own proxies
4242

4343
The `ProxyConfiguration` class covers both Apify Proxy and custom proxy URLs, so that you can easily switch between proxy providers. However, some features of the class are available only to Apify Proxy users, mainly because Apify Proxy is what one would call a super-proxy. It's not a single proxy server, but an API endpoint that allows connection through millions of different IP addresses. So the class essentially has two modes: Apify Proxy or Your proxy.
4444

@@ -54,7 +54,7 @@ When no `session_id` is provided, your custom proxy URLs are rotated round-robin
5454
{ProxyRotationExample}
5555
</RunnableCodeBlock>
5656

57-
### Apify proxy configuration
57+
### Apify Proxy configuration
5858

5959
With Apify Proxy, you can select specific proxy groups to use, or countries to connect from. For even finer control, you can also target a specific subdivision (e.g. a US state) using the `subdivision_code` parameter alongside `country_code`. This allows you to get better proxy performance after some initial research.
6060

@@ -106,6 +106,8 @@ You can then use that input to create the proxy configuration:
106106

107107
## Using the generated proxy URLs
108108

109+
`ProxyConfiguration` only generates proxy URLs — it does not make requests itself. Pass a generated URL to whichever HTTP client your Actor uses to route requests through the proxy.
110+
109111
### HTTPX
110112

111113
To use the generated proxy URLs with the `httpx` library, use the [`proxies`](https://www.python-httpx.org/advanced/#http-proxying) argument:
@@ -120,4 +122,8 @@ Make sure you have the `httpx` library installed:
120122
pip install httpx
121123
```
122124

125+
## Conclusion
126+
127+
This page has explained how to manage proxies with the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> class — using Apify Proxy or your own servers, keeping sessions sticky across requests, configuring tiered proxy rotation, and feeding proxy settings from Actor input.
128+
123129
For full details on proxy configuration options, see the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> API reference and the [Apify Proxy documentation](https://docs.apify.com/proxy).
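
Since `ProxyConfiguration` only generates URLs, the hand-off to an HTTP client could look roughly like the sketch below. To stay dependency-free it uses the standard library's `urllib.request` rather than the `httpx` client shown in the docs. The `opener_for_proxy` and `fetch` helpers and the target URL `https://example.com` are hypothetical.

```python
from __future__ import annotations

import asyncio
import urllib.request


def opener_for_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return urllib.request.build_opener(handler)


def fetch(proxy_url: str, url: str) -> bytes:
    """Blocking GET of url through the given proxy."""
    with opener_for_proxy(proxy_url).open(url, timeout=30) as response:
        return response.read()


async def main() -> None:
    # Imported lazily so the helpers above can be checked without the SDK.
    from apify import Actor

    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration()
        if proxy_configuration is None:
            raise RuntimeError('No proxy configuration is available.')

        proxy_url = await proxy_configuration.new_url()
        if proxy_url is None:
            raise RuntimeError('No proxy URL was generated.')

        # urllib is blocking, so run it in a worker thread.
        body = await asyncio.to_thread(fetch, proxy_url, 'https://example.com')
        Actor.log.info('Fetched %d bytes through the proxy', len(body))
```

Any other client that accepts a proxy URL can be substituted for `fetch` without changing how the URL is generated.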
