Apify SDK for Python

The official Python SDK for building Apify Actors.

The apify package lets you build Apify Actors in Python. Actors are serverless programs that run on the Apify platform, where you can scale, schedule, and monetize them. The SDK manages the Actor lifecycle, provides access to storages (datasets, key-value stores, and request queues), handles platform events, configures Apify Proxy, and supports pay-per-event monetization. It builds on the Crawlee web scraping framework and bundles the Apify API client.

If you only need to consume the Apify API from Python (running Actors, reading datasets, managing storages) rather than building Actors, use the Apify API client for Python instead. It comes bundled with this SDK.

Installation
Quick start
Features
Usage examples
What are Actors?
Documentation
Related projects
Support and community
Contributing
License

Installation

The Apify SDK for Python requires Python 3.11 or higher. It is published on PyPI as the apify package and can be installed with pip:

pip install apify

or with uv:

uv add apify

To use the Scrapy integration, install the scrapy extra:

pip install 'apify[scrapy]'
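
To confirm that an environment meets the requirements above, a quick check along these lines can help. This is only a sketch: python_is_supported and installed_apify_version are hypothetical helper names, not part of the SDK, and the minimum version is copied from the requirement stated in this section.

```python
from __future__ import annotations

import sys
from importlib import metadata

# Minimum interpreter version, taken from the requirement stated above.
MINIMUM_PYTHON = (3, 11)


def python_is_supported(version: tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True when the (major, minor) part of the version meets the minimum."""
    return tuple(version[:2]) >= MINIMUM_PYTHON


def installed_apify_version() -> str | None:
    """Return the installed version of the apify distribution, or None if it is absent."""
    try:
        return metadata.version('apify')
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    print('Python supported:', python_is_supported())
    print('apify version:', installed_apify_version())
```

If the second line prints None, the package is not installed in the active environment.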

Quick start

An Actor is a Python program that runs inside the async with Actor: context. The context initializes the Actor when it starts and tears it down when it finishes. Here's a minimal Actor that reads its input and stores a result:

from apify import Actor


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input()
        Actor.log.info('Actor input: %s', actor_input)
        await Actor.set_value('OUTPUT', 'Hello, world!')
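
The snippet above only defines main(); something still has to start an event loop and run it. A minimal sketch of such an entry point, assuming the file is executed directly: the main() body below is a stand-in so the example runs without the SDK installed, and run() is a hypothetical wrapper name.

```python
import asyncio


async def main() -> str:
    # Stand-in for the Actor body shown above; a real Actor would put its
    # logic inside `async with Actor:` here.
    await asyncio.sleep(0)
    return 'finished'


def run() -> str:
    # asyncio.run() creates an event loop, runs main() to completion,
    # and closes the loop afterwards.
    return asyncio.run(main())


if __name__ == '__main__':
    print(run())
```

Projects scaffolded with the Apify CLI already ship an entry point, so this matters mainly when you start from a single file.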

The quickest way to scaffold a full Actor project, with the .actor configuration, input schema, and Dockerfile already in place, is the Apify CLI:

Install the CLI:
```
npm install -g apify-cli
```
Create a new Actor from the Python "getting started" template:
```
apify create my-actor --template python-start
```
Run it locally:
```
cd my-actor
apify run
```

To create, run, and deploy your first Actor step by step, see the Quick start guide.

Features

Actor lifecycle management — async with Actor: initializes the Actor, then handles exit, failure, status messages, and reboots (Actor lifecycle).
Typed Actor input — read input validated against your input schema with Actor.get_input() (Actor input).
Storage access — read and write datasets, key-value stores, and request queues, both locally and on the platform (Working with storages).
Platform events — react to system info, migration, and abort events streamed over a WebSocket (Actor events).
Proxy management — route requests through Apify Proxy with residential or datacenter groups, country targeting, and rotation (Proxy management).
Actor orchestration — start, call, abort, and metamorph other Actors and tasks, and register webhooks for run events (Interacting with other Actors, Webhooks).
Pay-per-event monetization — charge for the events your Actor emits (Pay-per-event).
Direct Apify API access — reach the full Apify API through a preconfigured ApifyClient (Accessing the Apify API).
Built on Crawlee — combine the SDK with Crawlee crawlers, or any HTTP or browser library you prefer (Crawlee guide).
Scrapy integration — run existing Scrapy spiders as Apify Actors through the apify[scrapy] extra (Scrapy guide).

Usage examples

The SDK works with whatever scraping stack you prefer. The examples below show two common setups. For more, see the Guides.

HTTPX with BeautifulSoup

Scrape pages with HTTPX and BeautifulSoup, using the Actor's request queue to track URLs:

from bs4 import BeautifulSoup
from httpx import AsyncClient

from apify import Actor


async def main() -> None:
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}
        start_urls = actor_input.get('start_urls', [{'url': 'https://apify.com'}])

        # Open the default request queue for handling URLs to be processed.
        request_queue = await Actor.open_request_queue()

        # Enqueue the start URLs.
        for start_url in start_urls:
            url = start_url.get('url')
            await request_queue.add_request(url)

        # Process the URLs from the request queue.
        while request := await request_queue.fetch_next_request():
            Actor.log.info(f'Scraping {request.url} ...')

            # Fetch the HTTP response from the specified URL using HTTPX.
            async with AsyncClient() as client:
                response = await client.get(request.url)

            # Parse the HTML content using Beautiful Soup.
            soup = BeautifulSoup(response.content, 'html.parser')

            # Extract the desired data.
            data = {
                'url': request.url,
                'title': soup.title.string if soup.title else None,
                'h1s': [h1.text for h1 in soup.find_all('h1')],
                'h2s': [h2.text for h2 in soup.find_all('h2')],
                'h3s': [h3.text for h3 in soup.find_all('h3')],
            }

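            # Store the extracted data to the default dataset.
            await Actor.push_data(data)

            # Mark the request as handled so it is not processed again.
            await request_queue.mark_request_as_handled(request)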
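
The request-queue loop in this example follows an add, fetch, mark-as-handled cycle. The toy model below illustrates that cycle in memory so the control flow is easy to follow. It is not the SDK's implementation: ToyRequestQueue is a hypothetical class, it is synchronous, and it deduplicates by exact URL string, which is a simplification.

```python
from __future__ import annotations

from collections import deque


class ToyRequestQueue:
    """In-memory stand-in that mimics the add / fetch / mark-handled cycle."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()
        self._in_progress: set[str] = set()
        self._seen: set[str] = set()
        self.handled: list[str] = []

    def add_request(self, url: str) -> bool:
        """Queue a URL once; return False if it was already added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def fetch_next_request(self) -> str | None:
        """Hand out the next pending URL, or None when nothing is pending."""
        if not self._pending:
            return None
        url = self._pending.popleft()
        self._in_progress.add(url)
        return url

    def mark_request_as_handled(self, url: str) -> None:
        """Record that processing of the URL is complete."""
        self._in_progress.discard(url)
        self.handled.append(url)

    def is_finished(self) -> bool:
        """True when no URL is pending or still being processed."""
        return not self._pending and not self._in_progress


queue = ToyRequestQueue()
for start_url in ['https://apify.com', 'https://apify.com', 'https://crawlee.dev']:
    queue.add_request(start_url)  # the duplicate is ignored

while (url := queue.fetch_next_request()) is not None:
    # A real Actor would download and parse the page here.
    queue.mark_request_as_handled(url)

print(queue.handled, queue.is_finished())
```

Until a fetched URL is marked as handled, the toy queue still counts it as in progress and is_finished() stays False.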

PlaywrightCrawler from Crawlee

Scrape pages with Crawlee's PlaywrightCrawler, which handles queueing, concurrency, and browser automation for you:

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

from apify import Actor


async def main() -> None:
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}
        start_urls = [url.get('url') for url in actor_input.get('start_urls', [{'url': 'https://apify.com'}])]

        # Exit if no start URLs are provided.
        if not start_urls:
            Actor.log.info('No start URLs specified in Actor input, exiting...')
            await Actor.exit()

        # Create a crawler.
        crawler = PlaywrightCrawler(
            # Stop after 50 requests. Remove or raise the limit to crawl all discovered links.
            max_requests_per_crawl=50,
            headless=True,
        )

        # Define a request handler, which will be called for every request.
        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            url = context.request.url
            Actor.log.info(f'Scraping {url}...')

            # Extract the desired data.
            data = {
                'url': context.request.url,
                'title': await context.page.title(),
                'h1s': [await h1.text_content() for h1 in await context.page.locator('h1').all()],
                'h2s': [await h2.text_content() for h2 in await context.page.locator('h2').all()],
                'h3s': [await h3.text_content() for h3 in await context.page.locator('h3').all()],
            }

            # Store the extracted data to the default dataset.
            await context.push_data(data)

            # Enqueue additional links found on the current page.
            await context.enqueue_links()

        # Run the crawler with the starting URLs.
        await crawler.run(start_urls)
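
Both examples read start_urls as a list of {'url': ...} objects and pass the values on unchecked, so a missing or malformed entry would reach the queue or crawler as None. One way to guard against that is a small normalization step before enqueueing. This is a sketch under assumptions: extract_start_urls is a hypothetical helper, not an SDK function, and the http/https filter is an illustrative choice.

```python
from __future__ import annotations

from typing import Any

# Same fallback the examples above use when no input is provided.
DEFAULT_START_URLS = [{'url': 'https://apify.com'}]


def extract_start_urls(actor_input: dict[str, Any] | None) -> list[str]:
    """Return unique, non-empty http(s) URLs from the input, in their original order."""
    entries = (actor_input or {}).get('start_urls', DEFAULT_START_URLS)
    urls: list[str] = []
    for entry in entries:
        # Skip entries that are not objects or that lack a string 'url' value.
        if not isinstance(entry, dict):
            continue
        url = entry.get('url')
        if not isinstance(url, str):
            continue
        url = url.strip()
        if url.startswith(('http://', 'https://')) and url not in urls:
            urls.append(url)
    return urls


if __name__ == '__main__':
    print(extract_start_urls({'start_urls': [{'url': 'https://crawlee.dev'}, {}]}))
```

In either example, the result of extract_start_urls(await Actor.get_input()) could replace the hand-written start_urls handling.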

What are Actors?

Actors are serverless cloud programs that can do almost anything a human can do in a web browser. They range from small tasks, such as filling in forms or unsubscribing from online services, all the way up to scraping and processing vast numbers of web pages.

They run either locally or on the Apify platform, where you can run them at scale, monitor them, schedule them, or publish and monetize them. If you're new to Apify, learn what Apify is in the platform documentation.

Documentation

The full documentation lives at docs.apify.com/sdk/python.

| Section | What you'll find |
| --- | --- |
| Overview | What the SDK is, what Actors are, and how the pieces fit together. |
| Quick start | Create, run, and deploy your first Python Actor. |
| Concepts | Actor lifecycle, input, storages, events, proxy management, interacting with other Actors, webhooks, accessing the Apify API, logging, configuration, and pay-per-event. |
| Guides | Integrations with BeautifulSoup, Parsel, Playwright, Selenium, Crawlee, Scrapy, Crawl4AI, and Browser Use, plus running a web server and using uv. |
| Upgrading | Migrating between major versions. |
| API reference | Generated reference for every class and method. |
| Changelog | Release history and breaking changes. |

Related projects

Apify API client for Python — talk to the Apify API directly from Python (bundled with this SDK).
Crawlee for Python — the web scraping and browser automation framework the SDK builds on.
Apify SDK for JavaScript / TypeScript — the equivalent SDK for Node.js.
Apify API client for JavaScript / TypeScript — the equivalent API client for Node.js.
Crawlee for JavaScript / TypeScript — the original Node.js implementation of Crawlee.
Apify CLI — command-line tool for creating, running, and deploying Actors locally and on the platform.

Support and community

Discord — chat with the team and other users on the Apify Discord server.
GitHub issues — report a bug or request a feature in the issue tracker.

Contributing

Bug reports, fixes, and improvements are welcome! See CONTRIBUTING.md for the development setup, coding standards, testing, and release process. The project uses uv for project management and Poe the Poet as a task runner; the typical loop is:

uv run poe install-dev   # install dev dependencies and git hooks
uv run poe check-code    # lint, type-check, and unit tests

License

Released under the Apache License 2.0.
