The official Python SDK for building Apify Actors.
apify is the official SDK for building Apify Actors in Python. Actors are serverless programs that run on the Apify platform, where you can scale them, schedule them, and monetize them. The SDK manages the Actor lifecycle, gives you access to storages (datasets, key-value stores, request queues), handles platform events, configures Apify Proxy, and supports pay-per-event monetization. It builds on the Crawlee web scraping framework and bundles the Apify API client.
If you only need to consume the Apify API from Python (running Actors, reading datasets, managing storages) rather than building Actors, use the Apify API client for Python instead. It comes bundled with this SDK.
- Installation
- Quick start
- Features
- Usage examples
- What are Actors?
- Documentation
- Related projects
- Support and community
- Contributing
- License
The Apify SDK for Python requires Python 3.11 or higher. It is published on PyPI as the `apify` package and can be installed with pip:

```bash
pip install apify
```

or with uv:

```bash
uv add apify
```

To use the Scrapy integration, install the `scrapy` extra:

```bash
pip install 'apify[scrapy]'
```

An Actor is a Python program that runs inside the `async with Actor:` context. The context initializes the Actor when it starts and tears it down when it finishes. Here's a minimal Actor that reads its input and stores a result:
```python
from apify import Actor


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input()
        Actor.log.info('Actor input: %s', actor_input)
        await Actor.set_value('OUTPUT', 'Hello, world!')
```

The quickest way to scaffold a full Actor project, with the `.actor` configuration, input schema, and Dockerfile already in place, is the Apify CLI:
1. Install the CLI:

   ```bash
   npm install -g apify-cli
   ```

2. Create a new Actor from the Python "getting started" template:

   ```bash
   apify create my-actor --template python-start
   ```

3. Run it locally:

   ```bash
   cd my-actor
   apify run
   ```

To create, run, and deploy your first Actor step by step, see the Quick start guide.
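To make the lifecycle behind `async with Actor:` more concrete, here is a toy model of the pattern in plain Python. `ToyActor` and `run` are hypothetical names invented for this illustration. This is not the SDK's `Actor` class, which does far more. The sketch only shows the general shape: setup happens on entering the block, and teardown runs on leaving it, whether the body finished normally or raised.

```python
import asyncio


class ToyActor:
    """Hypothetical stand-in for illustration only; not the SDK's Actor class."""

    def __init__(self) -> None:
        self.events = []

    async def __aenter__(self) -> 'ToyActor':
        # Runs when the `async with` block is entered.
        self.events.append('init')
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Runs when the block is left, with or without an exception.
        self.events.append('fail' if exc_type else 'exit')
        return False  # Do not swallow the exception.


async def run(should_fail: bool) -> list:
    """Run a toy Actor body and return the recorded lifecycle events."""
    actor = ToyActor()
    try:
        async with actor:
            actor.events.append('work')
            if should_fail:
                raise RuntimeError('simulated failure')
    except RuntimeError:
        pass
    return actor.events
```

Calling `asyncio.run(run(False))` records the events in the order init, work, exit. With `run(True)` the last event is fail instead. An async `main()` such as the minimal Actor above is typically started the same way, with `asyncio.run(main())`.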
- Actor lifecycle management: `async with Actor:` initializes the Actor, then handles exit, failure, status messages, and reboots (Actor lifecycle).
- Typed Actor input: read input validated against your input schema with `Actor.get_input()` (Actor input).
- Storage access: read and write datasets, key-value stores, and request queues, both locally and on the platform (Working with storages).
- Platform events: react to system info, migration, and abort events streamed over a WebSocket (Actor events).
- Proxy management: route requests through Apify Proxy with residential or datacenter groups, country targeting, and rotation (Proxy management).
- Actor orchestration: start, call, abort, and metamorph other Actors and tasks, and register webhooks for run events (Interacting with other Actors, Webhooks).
- Pay-per-event monetization: charge for the events your Actor emits (Pay-per-event).
- Direct Apify API access: reach the full Apify API through a preconfigured `ApifyClient` (Accessing the Apify API).
- Built on Crawlee: combine the SDK with Crawlee crawlers, or any HTTP or browser library you prefer (Crawlee guide).
- Scrapy integration: run existing Scrapy spiders as Apify Actors through the `apify[scrapy]` extra (Scrapy guide).
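As a rough illustration of the storage access feature listed above, the sketch below pushes items to the default dataset and saves a summary record to the default key-value store. The `summarize` and `store_results` helpers and the `SUMMARY` key are assumptions made for this example, not names from the SDK. The SDK import is deferred into the function only so that the pure-Python helper can be tried without the package installed.

```python
from typing import Any


def summarize(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a small summary record from scraped items (plain Python, no SDK needed)."""
    urls = sorted({item['url'] for item in items if item.get('url')})
    return {'item_count': len(items), 'unique_urls': urls}


async def store_results(items: list[dict[str, Any]]) -> None:
    """Push items to the default dataset and save a summary to the key-value store."""
    # Deferred import (an assumption of this sketch): keeps `summarize`
    # usable in environments where the `apify` package is not installed.
    from apify import Actor

    async with Actor:
        # Each dict becomes one record in the default dataset.
        await Actor.push_data(items)
        # 'SUMMARY' is a hypothetical key chosen for this example.
        await Actor.set_value('SUMMARY', summarize(items))
```

In a real Actor the import would normally sit at the top of the module, and the storage calls would live inside your existing `async with Actor:` block rather than in a separate one.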
The SDK works with whatever scraping stack you prefer. The examples below show two common setups. For more, see the Guides.
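Both examples in this section read a `start_urls` list of `{'url': ...}` objects from the Actor input and fall back to `https://apify.com` when the field is missing. A possible way to make that step more defensive is sketched below. `extract_start_urls` is a hypothetical helper, not part of the SDK, and it assumes the input shape used in the examples.

```python
from typing import Any, Optional


def extract_start_urls(
    actor_input: Optional[dict[str, Any]],
    default: tuple[str, ...] = ('https://apify.com',),
) -> list[str]:
    """Return de-duplicated URL strings from the `start_urls` input field."""
    data = actor_input or {}
    # Use the default only when the field is absent; an explicit empty list stays empty.
    entries = data.get('start_urls', [{'url': url} for url in default])
    urls: list[str] = []
    for entry in entries:
        url = entry.get('url') if isinstance(entry, dict) else None
        # Skip entries without a usable URL and drop repeats, keeping the original order.
        if isinstance(url, str) and url and url not in urls:
            urls.append(url)
    return urls
```

Inside an Actor this could be called as `start_urls = extract_start_urls(await Actor.get_input())`, which avoids passing `None` on to the request queue or crawler when an input entry has no `url` key.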
Scrape pages with HTTPX and BeautifulSoup, using the Actor's request queue to track URLs:
```python
from bs4 import BeautifulSoup
from httpx import AsyncClient

from apify import Actor


async def main() -> None:
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}
        start_urls = actor_input.get('start_urls', [{'url': 'https://apify.com'}])

        # Open the default request queue for handling URLs to be processed.
        request_queue = await Actor.open_request_queue()

        # Enqueue the start URLs.
        for start_url in start_urls:
            url = start_url.get('url')
            await request_queue.add_request(url)

        # Process the URLs from the request queue.
        while request := await request_queue.fetch_next_request():
            Actor.log.info(f'Scraping {request.url} ...')

            # Fetch the HTTP response from the specified URL using HTTPX.
            async with AsyncClient() as client:
                response = await client.get(request.url)

            # Parse the HTML content using Beautiful Soup.
            soup = BeautifulSoup(response.content, 'html.parser')

            # Extract the desired data.
            data = {
                'url': request.url,
                # Pages without a <title> element yield None instead of raising.
                'title': soup.title.string if soup.title else None,
                'h1s': [h1.text for h1 in soup.find_all('h1')],
                'h2s': [h2.text for h2 in soup.find_all('h2')],
                'h3s': [h3.text for h3 in soup.find_all('h3')],
            }

            # Store the extracted data to the default dataset.
            await Actor.push_data(data)

            # Mark the request as handled so the queue does not process it again.
            await request_queue.mark_request_as_handled(request)
```

Scrape pages with Crawlee's `PlaywrightCrawler`, which handles queueing, concurrency, and browser automation for you:
```python
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

from apify import Actor


async def main() -> None:
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}
        start_urls = [url.get('url') for url in actor_input.get('start_urls', [{'url': 'https://apify.com'}])]

        # Exit if no start URLs are provided.
        if not start_urls:
            Actor.log.info('No start URLs specified in Actor input, exiting...')
            await Actor.exit()

        # Create a crawler.
        crawler = PlaywrightCrawler(
            # Limit the crawl to max requests. Remove or increase it for crawling all links.
            max_requests_per_crawl=50,
            headless=True,
        )

        # Define a request handler, which will be called for every request.
        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            url = context.request.url
            Actor.log.info(f'Scraping {url}...')

            # Extract the desired data.
            data = {
                'url': context.request.url,
                'title': await context.page.title(),
                'h1s': [await h1.text_content() for h1 in await context.page.locator('h1').all()],
                'h2s': [await h2.text_content() for h2 in await context.page.locator('h2').all()],
                'h3s': [await h3.text_content() for h3 in await context.page.locator('h3').all()],
            }

            # Store the extracted data to the default dataset.
            await context.push_data(data)

            # Enqueue additional links found on the current page.
            await context.enqueue_links()

        # Run the crawler with the starting URLs.
        await crawler.run(start_urls)
```

Actors are serverless cloud programs that can do almost anything a human can do in a web browser. They range from small tasks, such as filling in forms or unsubscribing from online services, all the way up to scraping and processing vast numbers of web pages.

They run either locally or on the Apify platform, where you can run them at scale, monitor them, schedule them, or publish and monetize them. If you're new to Apify, learn what Apify is in the platform documentation.
The full documentation lives at docs.apify.com/sdk/python.
| Section | What you'll find |
|---|---|
| Overview | What the SDK is, what Actors are, and how the pieces fit together. |
| Quick start | Create, run, and deploy your first Python Actor. |
| Concepts | Actor lifecycle, input, storages, events, proxy management, interacting with other Actors, webhooks, accessing the Apify API, logging, configuration, and pay-per-event. |
| Guides | Integrations with BeautifulSoup, Parsel, Playwright, Selenium, Crawlee, Scrapy, Crawl4AI, and Browser Use, plus running a web server and using uv. |
| Upgrading | Migrating between major versions. |
| API reference | Generated reference for every class and method. |
| Changelog | Release history and breaking changes. |
- Apify API client for Python — talk to the Apify API directly from Python (bundled with this SDK).
- Crawlee for Python — the web scraping and browser automation framework the SDK builds on.
- Apify SDK for JavaScript / TypeScript — the equivalent SDK for Node.js.
- Apify API client for JavaScript / TypeScript — the equivalent API client for Node.js.
- Crawlee for JavaScript / TypeScript — the original Node.js implementation of Crawlee.
- Apify CLI — command-line tool for creating, running, and deploying Actors locally and on the platform.
- Discord — chat with the team and other users on the Apify Discord server.
- GitHub issues — report a bug or request a feature in the issue tracker.
Bug reports, fixes, and improvements are welcome! See CONTRIBUTING.md for the development setup, coding standards, testing, and release process. The project uses uv for project management and Poe the Poet as a task runner; the typical loop is:
```bash
uv run poe install-dev  # install dev dependencies and git hooks
uv run poe check-code   # lint, type-check, and unit tests
```

Released under the Apache License 2.0.