Commit fbbf06e

vdusek and claude authored
docs: Add versioned docs for v0.2 (#856)
- Add versioned documentation for SDK v0.2 (last zero-based version)
- Fix typo in `api-typedoc.json` (`Retrive` → `Retrieve`)
- Exclude changelog files from spell check in `typos.toml`
- Remove `changelog.md` from versioned docs (single top-level CHANGELOG is used)
- Consolidate inner `.gitignore` files into the top-level one

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 9474a6b commit fbbf06e

16 files changed, +6782 -5 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ Session.vim
 
 # Docs
 docs/changelog.md
+website/versioned_docs/*/changelog.md
 
 # Website build artifacts, node dependencies
 website/build

typos.toml

Lines changed: 2 additions & 1 deletion
@@ -16,5 +16,6 @@ extend-exclude = [
 "*.lock",
 "*.min.js",
 "*.min.css",
-"CHANGELOG.md",
+"**/CHANGELOG.md",
+"**/changelog.md",
 ]
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
1+
---
2+
id: introduction
3+
title: Overview
4+
sidebar_label: Overview
5+
slug: /overview
6+
description: 'The official library for creating Apify Actors in Python, providing tools for web scraping, automation, and data storage integration.'
7+
---
8+
9+
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python.
10+
It provides useful features like automatic retries and convenience functions that improve the experience of using the Apify API.
11+
```python
import asyncio
from apify import Actor

async def main():
    async with Actor:
        actor_input = await Actor.get_input()
        print('Actor input:', actor_input)
        await Actor.push_data([{'result': 'Hello, world!'}])
        await Actor.set_value('OUTPUT', 'Done!')

asyncio.run(main())
```
25+
26+
## What are Actors?
27+
28+
Actors are serverless cloud programs that can do almost anything a human can do in a web browser. They can handle anything from small tasks, such as filling in forms or unsubscribing from online services, to scraping and processing vast numbers of web pages.
29+
30+
Actors can be run either locally, or on the [Apify platform](https://docs.apify.com/platform/), where you can run them at scale, monitor them, schedule them, and even publish and monetize them.
31+
32+
If you're new to Apify, see [What is Apify](https://docs.apify.com/platform/about) in the Apify platform documentation.
33+
34+
## Quick start
35+
36+
To create and run Actors through Apify Console, see the [Console documentation](https://docs.apify.com/academy/getting-started/creating-actors#choose-your-template). For creating and running Python Actors locally, refer to the [quick start guide](./quick-start).
37+
38+
## Installation
39+
40+
The Apify SDK for Python requires Python 3.8 or above. You can install it from [PyPI](https://pypi.org/project/apify/):
41+
```bash
pip install apify
```
45+
46+
## Features
47+
48+
### Local storage emulation
49+
50+
When running Actors locally, the Apify SDK performs storage operations like `Actor.push_data()` or `Actor.set_value()` on the local filesystem, in the `storage` folder in the Actor project directory.
51+
52+
### Automatic configuration
53+
54+
When running Actors on the Apify platform, the SDK automatically configures the Actor using the environment variables the platform provides to the Actor's container. This means you don't have to specify your Apify API token, your Apify Proxy password, or the default storage IDs manually.
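
As a rough illustration of what this saves you from doing by hand, the sketch below collects a few settings from environment variables. The variable names (`APIFY_TOKEN`, `APIFY_PROXY_PASSWORD`, `APIFY_DEFAULT_DATASET_ID`, `APIFY_DEFAULT_KEY_VALUE_STORE_ID`) are assumed here to match what the platform sets, and `read_platform_settings` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping, Optional

# Assumed environment variable names; verify them against the platform docs.
ENV_VARS = {
    'token': 'APIFY_TOKEN',
    'proxy_password': 'APIFY_PROXY_PASSWORD',
    'default_dataset_id': 'APIFY_DEFAULT_DATASET_ID',
    'default_key_value_store_id': 'APIFY_DEFAULT_KEY_VALUE_STORE_ID',
}


def read_platform_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: map each setting to its env value, or None if unset."""
    env = os.environ if env is None else env
    return {name: env.get(var) for name, var in ENV_VARS.items()}


if __name__ == '__main__':
    # Print only whether each setting is present, never the secret values.
    for name, value in read_platform_settings().items():
        print(f'{name}: {"set" if value is not None else "not set"}')
```

When running locally, most of these are likely to be unset, which is where the local storage emulation described above comes in.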
55+
56+
### Interacting with other Actors
57+
58+
The SDK provides convenience wrappers around the API for interacting with other Actors:
59+
- `Actor.start(other_actor_id, run_input=...)` starts a run of another Actor.
60+
- `Actor.call(other_actor_id, run_input=...)` starts a run and waits for it to finish.
61+
- `Actor.call_task(actor_task_id)` starts an Actor task run and waits for it to finish.
62+
63+
:::note API client alternative
64+
65+
If you need to interact with the Apify API programmatically without creating Actors, use the [Apify API client for Python](https://docs.apify.com/api/client/python) instead.
66+
67+
:::
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
1+
---
2+
id: quick-start
3+
title: Quick start
4+
sidebar_label: Quick start
5+
description: 'Get started with the Apify SDK for Python by creating your first Actor and learning the basics.'
6+
---
7+
8+
Learn how to create and run Actors using the Apify SDK for Python.
9+
10+
---
11+
12+
## Step 1: Create Actors
13+
14+
To create a new Apify Actor on your computer, you can use the [Apify CLI](https://docs.apify.com/cli) and select one of the Python Actor templates.
15+
```bash
apify create my-python-actor --template python-sdk
cd my-python-actor
```
20+
21+
This will create a new folder called `my-python-actor`, download and extract the Python SDK Actor template there, create a virtual environment in `my-python-actor/.venv`, and install the Actor dependencies in it.
22+
23+
## Step 2: Run Actors
24+
25+
To run the Actor, you can use the [`apify run` command](https://docs.apify.com/cli/docs/reference#apify-run):
26+
```bash
apify run
```
30+
31+
This command:
32+
33+
- Activates the virtual environment in `.venv` (if no other virtual environment is activated yet)
34+
- Starts the Actor with the appropriate environment variables for local running
35+
- Configures it to use local storages from the `storage` folder
36+
37+
The Actor input, for example, will be in `storage/key_value_stores/default/INPUT.json`.
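
To try a local run with custom input, you could create that file before calling `apify run`. The following is a minimal sketch using only the standard library; the input fields (`url`, `max_pages`) and the helper name `write_local_input` are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_local_input(actor_input: dict, project_dir: str = '.') -> Path:
    """Write the input to the local default key-value store location."""
    input_path = (
        Path(project_dir) / 'storage' / 'key_value_stores' / 'default' / 'INPUT.json'
    )
    # Create the storage folders if this is the first local run.
    input_path.parent.mkdir(parents=True, exist_ok=True)
    input_path.write_text(json.dumps(actor_input, indent=2), encoding='utf-8')
    return input_path


if __name__ == '__main__':
    # Demo in a temporary directory so that no real project is touched.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_local_input({'url': 'https://example.com', 'max_pages': 5}, tmp)
        print(path.read_text(encoding='utf-8'))
```

In a real project you would pass the Actor project directory (or rely on the default `'.'`) instead of a temporary one.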
38+
39+
## Step 3: Understand Actor structure
40+
41+
The `.actor` directory contains the [Actor configuration](https://docs.apify.com/platform/actors/development/actor-config), such as the Actor's definition and input schema, and the Dockerfile necessary to run the Actor on the Apify platform.
42+
43+
The Actor's runtime dependencies are specified in the `requirements.txt` file, which follows the [standard requirements file format](https://pip.pypa.io/en/stable/reference/requirements-file-format/).
44+
45+
The Actor's source code is in the `src` folder with two important files:
46+
47+
- `main.py` - which contains the main function of the Actor
48+
- `__main__.py` - which is the entrypoint of the Actor package, setting up the Actor logger and executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).
49+
```python title="src/main.py"
from apify import Actor

async def main():
    async with Actor:
        print('Actor input:', await Actor.get_input())
        await Actor.set_value('OUTPUT', 'Hello, world!')
```

```python title="src/__main__.py"
import asyncio
import logging

from apify.log import ActorLogFormatter

from .main import main

handler = logging.StreamHandler()
handler.setFormatter(ActorLogFormatter())

apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
apify_logger.addHandler(handler)

asyncio.run(main())
```
76+
77+
If you want to modify the Actor structure, you need to make sure that your Actor is executable as a module, via `python -m src`, as that is the command started by `apify run` in the Apify CLI.
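
To illustrate what "executable as a module" means, the sketch below builds a throwaway package named `src` with a `__main__.py` in a temporary directory and runs it with `python -m src`. The file contents are placeholders rather than the template's actual code, and both helper names are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_package(project_dir: Path) -> None:
    """Create a placeholder `src` package whose __main__.py calls main()."""
    package = project_dir / 'src'
    package.mkdir()
    (package / '__init__.py').write_text('', encoding='utf-8')
    (package / 'main.py').write_text(
        "def main():\n    print('running src.main')\n", encoding='utf-8'
    )
    (package / '__main__.py').write_text(
        'from .main import main\n\nmain()\n', encoding='utf-8'
    )


def run_package_as_module(project_dir: Path) -> str:
    """Run `python -m src` inside project_dir and return its output."""
    result = subprocess.run(
        [sys.executable, '-m', 'src'],
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        print(run_package_as_module(Path(tmp)))
```

If you rename or move `src/__main__.py`, a check like `run_package_as_module` is a quick way to confirm the package still starts.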
78+
79+
## Next steps
80+
81+
To learn more about the features of the Apify SDK and how to use them, check out the Concepts section, especially:
82+
83+
- [Actor lifecycle](../concepts/actor-lifecycle)
84+
- [Working with storages](../concepts/storages)
85+
- [Working with proxies](../concepts/proxy-management)
86+
- [Managing Actor events](../concepts/actor-events)
87+
- [Direct access to the Apify API](../concepts/access-apify-api)
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
1+
---
2+
title: Actor lifecycle
3+
sidebar_label: Actor lifecycle
4+
---
5+
6+
## Lifecycle methods
7+
8+
### Initialization and cleanup
9+
10+
At the start of its runtime, the Actor needs to initialize itself, its event manager and its storages,
11+
and at the end of the runtime it needs to close these cleanly.
12+
The Apify SDK provides several ways to manage this.
13+
14+
#### `Actor.init()` and `Actor.exit()`
15+
16+
The `Actor.init()` method initializes the Actor,
17+
the event manager which processes the Actor events from the platform event websocket,
18+
and the storage client used in the execution environment.
19+
It should be called before performing any other Actor operations.
20+
21+
The `Actor.exit()` method then exits the Actor cleanly,
22+
tearing down the event manager and the storage client.
23+
There is also the `Actor.fail()` method, which exits the Actor while marking it as failed.
24+
```python title="src/main.py"
import asyncio
from apify import Actor
from apify.consts import ActorExitCodes

async def main():
    await Actor.init()
    try:
        print('Actor input:', await Actor.get_input())
        await Actor.set_value('OUTPUT', 'Hello, world!')
        await Actor.exit()
    except Exception as e:
        print('Error while running Actor', e)
        await Actor.fail(exit_code=ActorExitCodes.ERROR_USER_FUNCTION_THREW, exception=e)

asyncio.run(main())
```
42+
43+
#### Context manager
44+
45+
So that you don't have to call the lifecycle methods manually, the `Actor` class provides a context manager,
46+
which calls the `Actor.init()` method on enter,
47+
the `Actor.exit()` method on a clean exit,
48+
and the `Actor.fail()` method when there is an exception during the run of the Actor.
49+
50+
This is the recommended way to work with the `Actor` class.
51+
```python title="src/main.py"
import asyncio
from apify import Actor

async def main():
    async with Actor:
        print('Actor input:', await Actor.get_input())
        await Actor.set_value('OUTPUT', 'Hello, world!')

asyncio.run(main())
```
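
To make the enter and exit behavior concrete, here is a toy async context manager that follows the same pattern. It is not the SDK's implementation: `ToyActor`, its `calls` list and the `run` helper are hypothetical names used only to record which lifecycle step ran.

```python
import asyncio


class ToyActor:
    """Toy stand-in that mimics the init / exit / fail pattern."""

    def __init__(self):
        self.calls = []

    async def init(self):
        self.calls.append('init')

    async def exit(self):
        self.calls.append('exit')

    async def fail(self, exception):
        self.calls.append(f'fail: {exception}')

    async def __aenter__(self):
        await self.init()
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        if exc_value is None:
            await self.exit()
        else:
            await self.fail(exc_value)
        # This toy version suppresses the exception so the demo can continue.
        return True


async def run(should_fail: bool) -> list:
    actor = ToyActor()
    async with actor:
        if should_fail:
            raise ValueError('boom')
    return actor.calls


if __name__ == '__main__':
    print(asyncio.run(run(False)))
    print(asyncio.run(run(True)))
```

The point of the pattern is that cleanup runs on both paths, so the body of the `async with` block only has to contain the Actor's own logic.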
63+
64+
#### Main function
65+
66+
Another option is to pass a function to the Actor via the `Actor.main(main_func)` method,
67+
which causes the Actor to initialize, run the main function, and exit, catching any runtime errors in the passed function.
68+
```python title="src/main.py"
import asyncio
from apify import Actor

async def actor_main_func():
    print('Actor input:', await Actor.get_input())
    await Actor.set_value('OUTPUT', 'Hello, world!')

async def main():
    await Actor.main(actor_main_func)

asyncio.run(main())
```
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
---
2+
title: Working with storages
3+
sidebar_label: Storages
4+
---
5+
6+
The `Actor` class provides methods to work either with the default storages of the Actor, or with any other storage, named or unnamed.
7+
8+
## Convenience methods for default storages
9+
10+
There are several methods for directly working with the default key-value store or default dataset of the Actor.
11+
12+
- `Actor.get_value('my-record')` reads a record from the default key-value store of the Actor.
13+
- `Actor.set_value('my-record', 'my-value')` saves a new value to the record in the default key-value store.
14+
- `Actor.get_input()` reads the Actor input from the default key-value store of the Actor.
15+
- `Actor.push_data([{'result': 'Hello, world!'}, ...])` saves results to the default dataset of the Actor.
16+
17+
## Opening other storages
18+
19+
The `Actor.open_dataset()`, `Actor.open_key_value_store()` and `Actor.open_request_queue()` methods can be used to open any storage for reading and writing. You can either use them without arguments to open the default storages, or you can pass a storage ID or name to open another storage.
20+
```python
import asyncio
from apify import Actor

async def main():
    async with Actor:
        # Work with the default dataset of the Actor
        dataset = await Actor.open_dataset()
        await dataset.push_data({'result': 'Hello, world!'})

        # Work with the key-value store with ID 'mIJVZsRQrDQf4rUAf'
        key_value_store = await Actor.open_key_value_store(id='mIJVZsRQrDQf4rUAf')
        await key_value_store.set_value('record', 'Hello, world!')

        # Work with the request queue with name 'my-queue'
        request_queue = await Actor.open_request_queue(name='my-queue')
        await request_queue.add_request({'uniqueKey': 'v0Ngr', 'url': 'https://example.com'})

asyncio.run(main())
```
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
---
2+
title: Proxy management
3+
sidebar_label: Proxy management
4+
---
5+
6+
To work with proxies in your Actor, you can use the `Actor.create_proxy_configuration()` method, which allows you to generate proxy URLs either for the Apify Proxy, or even for your own proxies, with automatic proxy rotation and support for sessions.
7+
```python
import asyncio
import httpx
from apify import Actor

async def main():
    async with Actor:
        # You can either set the proxy config manually
        proxy_configuration = await Actor.create_proxy_configuration(
            groups=['RESIDENTIAL'],
            country_code='US',
        )

        # --- OR ---
        # You can get the proxy config from the Actor input
        actor_input = await Actor.get_input()
        selected_proxy_config = actor_input['proxyConfiguration']
        proxy_configuration = await Actor.create_proxy_configuration(
            actor_proxy_input=selected_proxy_config,
        )

        # --- OR ---
        # You can use your own proxy servers
        proxy_configuration = await Actor.create_proxy_configuration(
            proxy_urls=[
                'http://my-proxy.com:8000',
                'http://my-other-proxy.com:8000',
            ],
        )

        proxy_url = await proxy_configuration.new_url(session_id='my_session')

        # Note: newer httpx releases take `proxy=` instead of `proxies=`
        async with httpx.AsyncClient(proxies=proxy_url) as client:
            response = await client.get('http://example.com')
            await Actor.set_value('OUTPUT', response.text)

asyncio.run(main())
```
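
The idea of rotation with sessions can be sketched without the SDK: requests without a session ID cycle through the proxy list, while a given session ID keeps getting the proxy URL it was first assigned. This is a toy model under that assumption, not the SDK's actual algorithm, and `ToyProxyRotator` is a hypothetical name.

```python
import itertools
from typing import Dict, List, Optional


class ToyProxyRotator:
    """Toy model: round-robin rotation with sticky sessions."""

    def __init__(self, proxy_urls: List[str]):
        if not proxy_urls:
            raise ValueError('At least one proxy URL is required')
        self._cycle = itertools.cycle(proxy_urls)
        self._session_urls: Dict[str, str] = {}

    def new_url(self, session_id: Optional[str] = None) -> str:
        # Without a session, simply hand out the next proxy in the cycle.
        if session_id is None:
            return next(self._cycle)
        # With a session, assign a proxy once and reuse it afterwards.
        if session_id not in self._session_urls:
            self._session_urls[session_id] = next(self._cycle)
        return self._session_urls[session_id]


if __name__ == '__main__':
    rotator = ToyProxyRotator(['http://my-proxy.com:8000', 'http://my-other-proxy.com:8000'])
    print(rotator.new_url())
    print(rotator.new_url(session_id='my_session'))
    print(rotator.new_url(session_id='my_session'))
```

Sticky sessions matter when a target site ties cookies or login state to an IP address, so that consecutive requests in one session should appear to come from the same place.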
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
---
2+
title: Actor events
3+
sidebar_label: Actor events
4+
---
5+
6+
The Apify platform sends several events to the Actor. If you want to work with them, you can use the `Actor.on()` and `Actor.off()` methods.
7+
8+
## Available events
9+
- **`ActorEventTypes.SYSTEM_INFO`**: Emitted every minute. The event data contains info about the resource usage of the Actor.
- **`ActorEventTypes.MIGRATING`**: Emitted when the Actor running on the Apify platform is going to be migrated to another worker server soon. You can use it to persist the state of the Actor and abort the run, to speed up the migration.
- **`ActorEventTypes.PERSIST_STATE`**: Emitted at regular intervals (by default every 60 seconds) to notify the Actor that it should persist its state, in order to avoid repeating all work when the Actor restarts.
- **`ActorEventTypes.ABORTING`**: Emitted when a user aborts an Actor run on the Apify platform and chooses to abort gracefully, which allows the Actor some time before it is terminated.
14+
15+
## Example
16+
```python
import asyncio
from pprint import pprint
from apify import Actor
from apify.consts import ActorEventTypes

async def print_system_info(event_data):
    print('Actor system info from platform:')
    pprint(event_data)

async def react_to_abort(event_data):
    print('The Actor is aborting!')
    pprint(event_data)

async def persist_state(event_data):
    print('The Actor should persist its state!')
    pprint(event_data)
    # Add your state persisting logic here

async def main():
    async with Actor:
        Actor.on(ActorEventTypes.SYSTEM_INFO, print_system_info)
        Actor.on(ActorEventTypes.ABORTING, react_to_abort)
        Actor.on(ActorEventTypes.PERSIST_STATE, persist_state)

        # Do some work here
        ...

        # Remove the event handler when no longer needed
        Actor.off(ActorEventTypes.SYSTEM_INFO, print_system_info)

        # Do some more work here
        ...

asyncio.run(main())
```
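
The state persisting placeholder in the `persist_state` handler could be filled in along these lines. This sketch keeps the state in a local JSON file purely for illustration; in an Actor you would more likely store it with `Actor.set_value()` and read it back with `Actor.get_value()`. The file name, the `processed_urls` layout and both helper names are assumptions.

```python
import json
import tempfile
from pathlib import Path


def save_state(state: dict, path: Path) -> None:
    """Write the state to a temporary file first, then swap it into place."""
    tmp_path = path.with_suffix('.tmp')
    tmp_path.write_text(json.dumps(state), encoding='utf-8')
    # Replacing the file in one step avoids leaving a half-written state file.
    tmp_path.replace(path)


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh (assumed) default on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding='utf-8'))
    return {'processed_urls': []}


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        state_path = Path(tmp) / 'state.json'
        state = load_state(state_path)
        state['processed_urls'].append('https://example.com')
        save_state(state, state_path)
        print(load_state(state_path))
```

Calling `load_state` at startup and `save_state` from the handler means a restarted run can skip the URLs it has already processed.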
