Commit d7a44ef

vdusek and claude authored

docs: improve guides consistency (#845)

## Summary

- Fix inlined code examples in Introduction and Quick Start
- Standardize all guides to follow the same pattern: intro paragraph, Introduction, Example Actor, Conclusion, Additional resources
- Move "Running webserver" from concepts to guides in v1.7 and v2.7
- Add linked library names, Apify template links, and official documentation references to all guides
- Update quick-start links to guides and concepts

## Test plan

- [x] Executed locally and manually checked
- [x] CI passes

---

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent c1b2124 commit d7a44ef


49 files changed: +491 −267 lines

docs/01_introduction/index.mdx — 7 additions & 12 deletions

@@ -6,20 +6,15 @@ slug: /overview
 description: 'The official library for creating Apify Actors in Python, providing tools for web scraping, automation, and data storage integration.'
 ---
 
+import CodeBlock from '@theme/CodeBlock';
+
+import IntroductionExample from '!!raw-loader!./code/01_introduction.py';
+
 The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.
 
-```python
-from apify import Actor
-from bs4 import BeautifulSoup
-import requests
-
-async def main():
-    async with Actor:
-        input = await Actor.get_input()
-        response = requests.get(input['url'])
-        soup = BeautifulSoup(response.content, 'html.parser')
-        await Actor.push_data({ 'url': input['url'], 'title': soup.title.string })
-```
+<CodeBlock className="language-python">
+    {IntroductionExample}
+</CodeBlock>
 
 ## What are Actors
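The inline example removed above (and moved into `01_introduction.py`) fetches a page with requests and pulls the page title out with BeautifulSoup. As a dependency-free illustration of that title-extraction step, here is a sketch using only the stdlib `html.parser` — a stand-in for the guide's BeautifulSoup call, not the SDK's actual example file:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


parser = TitleParser()
parser.feed('<html><head><title>Apify</title></head><body></body></html>')
print(parser.title)  # → Apify
```

BeautifulSoup's `soup.title.string` does the same extraction in one call, which is why the example file keeps the third-party dependency.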

docs/01_introduction/quick-start.mdx — 30 additions & 37 deletions

@@ -13,6 +13,9 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
 
+import MainExample from '!!raw-loader!./code/actor_structure/main.py';
+import UnderscoreMainExample from '!!raw-loader!./code/actor_structure/__main__.py';
+
 ## Step 1: Create Actors
 
 To create and run Actors in Apify Console, refer to the [Console documentation](/platform/actors/development/quick-start/web-ide).

@@ -61,33 +64,14 @@ The Actor's source code is in the `src` folder. This folder contains two importa
 <Tabs>
     <TabItem value="main.py" label="main.py" default>
-        <CodeBlock language="python">{
-            `from apify import Actor
-${''}
-async def main():
-    async with Actor:
-        Actor.log.info('Actor input:', await Actor.get_input())
-        await Actor.set_value('OUTPUT', 'Hello, world!')`
-        }</CodeBlock>
+        <CodeBlock className="language-python">
+            {MainExample}
+        </CodeBlock>
     </TabItem>
     <TabItem value="__main__.py" label="__main.py__">
-        <CodeBlock language="python">{
-            `import asyncio
-import logging
-${''}
-from apify.log import ActorLogFormatter
-${''}
-from .main import main
-${''}
-handler = logging.StreamHandler()
-handler.setFormatter(ActorLogFormatter())
-${''}
-apify_logger = logging.getLogger('apify')
-apify_logger.setLevel(logging.DEBUG)
-apify_logger.addHandler(handler)
-${''}
-asyncio.run(main())`
-        }</CodeBlock>
+        <CodeBlock className="language-python">
+            {UnderscoreMainExample}
+        </CodeBlock>
     </TabItem>
 </Tabs>

@@ -96,21 +80,30 @@ We recommend keeping the entrypoint for the Actor in the `src/__main__.py` file.
 ## Next steps
 
+### Concepts
+
+To learn more about the features of the Apify SDK and how to use them, check out the Concepts section in the sidebar:
+
+- [Actor lifecycle](../concepts/actor-lifecycle)
+- [Actor input](../concepts/actor-input)
+- [Working with storages](../concepts/storages)
+- [Actor events & state persistence](../concepts/actor-events)
+- [Proxy management](../concepts/proxy-management)
+- [Interacting with other Actors](../concepts/interacting-with-other-actors)
+- [Creating webhooks](../concepts/webhooks)
+- [Accessing Apify API](../concepts/access-apify-api)
+- [Logging](../concepts/logging)
+- [Actor configuration](../concepts/actor-configuration)
+- [Pay-per-event monetization](../concepts/pay-per-event)
+
 ### Guides
 
-To see how you can integrate the Apify SDK with some of the most popular web scraping libraries, check out our guides for working with:
+To see how you can integrate the Apify SDK with popular web scraping libraries, check out our guides:
 
-- [Requests or HTTPX](../guides/requests-and-httpx)
-- [Beautiful Soup](../guides/beautiful-soup)
+- [BeautifulSoup with HTTPX](../guides/beautifulsoup-httpx)
+- [Parsel with Impit](../guides/parsel-impit)
 - [Playwright](../guides/playwright)
 - [Selenium](../guides/selenium)
+- [Crawlee](../guides/crawlee)
 - [Scrapy](../guides/scrapy)
-
-### Usage concepts
-
-To learn more about the features of the Apify SDK and how to use them, check out the Usage Concepts section in the sidebar, especially the guides for:
-
-- [Actor lifecycle](../concepts/actor-lifecycle)
-- [Working with storages](../concepts/storages)
-- [Handling Actor events](../concepts/actor-events)
-- [How to use proxies](../concepts/proxy-management)
+- [Running webserver](../guides/running-webserver)
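The `__main__.py` snippet this commit replaces with an imported file configures logging before running the Actor's `main()`. Its stdlib core can be sketched as follows, with Apify's `ActorLogFormatter` stood in by a plain `logging.Formatter` so the sketch runs without the SDK installed (the stand-in format string is an assumption):

```python
import asyncio
import logging

# Attach a formatted handler to the 'apify' logger, as the old inline
# __main__.py did. ActorLogFormatter is Apify-specific; a plain stdlib
# Formatter stands in for it here.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('[%(levelname)s] %(message)s'))

apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
apify_logger.addHandler(handler)


async def main() -> None:
    # In the real entrypoint this is imported from .main and contains
    # the Actor logic; here it only exercises the configured logger.
    apify_logger.info('Actor is starting')


asyncio.run(main())
```

Keeping this setup in `src/__main__.py` means the logging configuration runs exactly once, before any Actor code, whichever way the package is launched.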

docs/03_guides/01_beautifulsoup_httpx.mdx — 6 additions & 0 deletions

@@ -28,3 +28,9 @@ Below is a simple Actor that recursively scrapes titles from all linked websites
 ## Conclusion
 
 In this guide, you learned how to use [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) with [HTTPX](https://www.python-httpx.org/) in your Apify Actors. By combining these libraries, you can efficiently extract data from HTML or XML files, making it easy to build web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: BeautifulSoup](https://apify.com/templates/python-beautifulsoup)
+- [BeautifulSoup: Official documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
+- [HTTPX: Official documentation](https://www.python-httpx.org/)
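The guide's Example Actor "recursively scrapes titles from all linked websites". The crawl skeleton behind that description can be sketched with the HTTPX fetch and BeautifulSoup parsing stubbed out by a hypothetical in-memory site map, so the logic runs offline (the `SITE` data and `crawl` helper are illustrative, not part of the guide's code):

```python
from collections import deque

# Hypothetical site map: url -> (title, outgoing links). In the real
# Actor these come from an HTTPX GET plus BeautifulSoup parsing.
SITE = {
    'https://example.com': ('Home', ['https://example.com/a']),
    'https://example.com/a': ('Page A', ['https://example.com']),
}


def crawl(start: str, max_depth: int = 2) -> dict[str, str]:
    """Breadth-first crawl collecting page titles, skipping seen URLs."""
    titles: dict[str, str] = {}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in titles or depth > max_depth:
            continue  # already scraped, or too deep
        title, links = SITE[url]  # real code: fetch + parse here
        titles[url] = title
        queue.extend((link, depth + 1) for link in links)
    return titles


print(crawl('https://example.com'))
```

The seen-URL check is what makes the recursion safe on sites that link back to themselves; the real Actor would additionally call `Actor.push_data` for each scraped title.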

docs/03_guides/02_parsel_impit.mdx — 6 additions & 0 deletions

@@ -26,3 +26,9 @@ The following example shows a simple Actor that recursively scrapes titles from
 ## Conclusion
 
 In this guide, you learned how to use [Parsel](https://github.com/scrapy/parsel) with [Impit](https://github.com/apify/impit) in your Apify Actors. By combining these libraries, you get a powerful and efficient solution for web scraping: [Parsel](https://github.com/scrapy/parsel) provides excellent CSS selector and XPath support for data extraction, while [Impit](https://github.com/apify/impit) offers a fast and simple HTTP client built by Apify. This combination makes it easy to build scalable web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Crawlee + Parsel](https://apify.com/templates/python-crawlee-parsel)
+- [Parsel: GitHub repository](https://github.com/scrapy/parsel)
+- [Impit: GitHub repository](https://github.com/apify/impit)

docs/03_guides/03_playwright.mdx — 10 additions & 2 deletions

@@ -10,6 +10,10 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
 
 import PlaywrightExample from '!!raw-loader!roa-loader!./code/03_playwright.py';
 
+In this guide, you'll learn how to use [Playwright](https://playwright.dev) for web scraping in your Apify Actors.
+
+## Introduction
+
 [Playwright](https://playwright.dev) is a tool for web automation and testing that can also be used for web scraping. It allows you to control a web browser programmatically and interact with web pages just as a human would.
 
 Some of the key features of Playwright for web scraping include:

@@ -19,8 +23,6 @@ Some of the key features of Playwright for web scraping include:
 - **Powerful selectors** - Playwright provides a variety of powerful selectors that allow you to target specific elements on a web page, including CSS selectors, XPath, and text matching.
 - **Emulation of user interactions** - Playwright allows you to emulate user interactions like clicking, scrolling, filling out forms, and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
 
-## Using Playwright in Actors
-
 To create Actors which use Playwright, start from the [Playwright & Python](https://apify.com/templates/categories/python) Actor template.
 
 On the Apify platform, the Actor will already have Playwright and the necessary browsers preinstalled in its Docker image, including the tools and setup necessary to run browsers in headful mode.

@@ -55,3 +57,9 @@ It uses Playwright to open the pages in an automated Chrome browser, and to extr
 ## Conclusion
 
 In this guide you learned how to create Actors that use Playwright to scrape websites. Playwright is a powerful tool that can be used to manage browser instances and scrape websites that require JavaScript execution. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Playwright + Chrome](https://apify.com/templates/python-playwright)
+- [Apify templates: Crawlee + Playwright + Chrome](https://apify.com/templates/python-crawlee-playwright)
+- [Playwright: Official documentation](https://playwright.dev/python/)

docs/03_guides/04_selenium.mdx — 9 additions & 2 deletions

@@ -7,6 +7,10 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
 
 import SeleniumExample from '!!raw-loader!roa-loader!./code/04_selenium.py';
 
+In this guide, you'll learn how to use [Selenium](https://www.selenium.dev/) for web scraping in your Apify Actors.
+
+## Introduction
+
 [Selenium](https://www.selenium.dev/) is a tool for web automation and testing that can also be used for web scraping. It allows you to control a web browser programmatically and interact with web pages just as a human would.
 
 Some of the key features of Selenium for web scraping include:

@@ -21,8 +25,6 @@ including CSS selectors, XPath, and text matching.
 - **Emulation of user interactions** - Selenium allows you to emulate user interactions like clicking, scrolling, filling out forms, and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
 
-## Using Selenium in Actors
-
 To create Actors which use Selenium, start from the [Selenium & Python](https://apify.com/templates/categories/python) Actor template.
 
 On the Apify platform, the Actor will already have Selenium and the necessary browsers preinstalled in its Docker image,

@@ -44,3 +46,8 @@ It uses Selenium ChromeDriver to open the pages in an automated Chrome browser,
 ## Conclusion
 
 In this guide you learned how to use Selenium for web scraping in Apify Actors. You can now create your own Actors that use Selenium to scrape dynamic websites and interact with web pages just like a human would. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Selenium + Chrome](https://apify.com/templates/python-selenium)
+- [Selenium: Official documentation](https://www.selenium.dev/documentation/)

docs/03_guides/05_crawlee.mdx — 9 additions & 0 deletions

@@ -44,3 +44,12 @@ The [`PlaywrightCrawler`](https://crawlee.dev/python/api/class/PlaywrightCrawler
 ## Conclusion
 
 In this guide, you learned how to use the [Crawlee](https://crawlee.dev/python) library in your Apify Actors. By using the [`BeautifulSoupCrawler`](https://crawlee.dev/python/api/class/BeautifulSoupCrawler), [`ParselCrawler`](https://crawlee.dev/python/api/class/ParselCrawler), and [`PlaywrightCrawler`](https://crawlee.dev/python/api/class/PlaywrightCrawler) crawlers, you can efficiently scrape static or dynamic web pages, making it easy to build web scraping tasks in Python. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own scraping tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Apify templates: Crawlee + BeautifulSoup](https://apify.com/templates/python-crawlee-beautifulsoup)
+- [Apify templates: Crawlee + Parsel](https://apify.com/templates/python-crawlee-parsel)
+- [Apify templates: Crawlee + Playwright + Chrome](https://apify.com/templates/python-crawlee-playwright)
+- [Crawlee: Official website](https://crawlee.dev/python)
+- [Crawlee: Documentation](https://crawlee.dev/python/docs)
+- [Crawlee: GitHub repository](https://github.com/apify/crawlee-python)

docs/03_guides/06_scrapy.mdx — 4 additions & 0 deletions

@@ -13,6 +13,10 @@ import ItemsExample from '!!raw-loader!./code/scrapy_project/src/items.py';
 import SpidersExample from '!!raw-loader!./code/scrapy_project/src/spiders/title.py';
 import SettingsExample from '!!raw-loader!./code/scrapy_project/src/settings.py';
 
+In this guide, you'll learn how to use the [Scrapy](https://scrapy.org/) framework in your Apify Actors.
+
+## Introduction
+
 [Scrapy](https://scrapy.org/) is an open-source web scraping framework for Python. It provides tools for defining scrapers, extracting data from web pages, following links, and handling pagination. With the Apify SDK, Scrapy projects can be converted into Apify [Actors](https://docs.apify.com/platform/actors), integrated with Apify [storages](https://docs.apify.com/platform/storage), and executed on the Apify [platform](https://docs.apify.com/platform).
 
 ## Integrating Scrapy with the Apify platform
Lines changed: 15 additions & 3 deletions

@@ -1,12 +1,16 @@
 ---
 id: running-webserver
-title: Running webserver in your Actor
+title: Running webserver
 ---
 
 import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
 
 import WebserverExample from '!!raw-loader!roa-loader!./code/07_webserver.py';
 
+In this guide, you'll learn how to run a web server inside your Apify Actor. This is useful for monitoring Actor progress, creating custom APIs, or serving content during the Actor run.
+
+## Introduction
+
 Each Actor run on the Apify platform is assigned a unique hard-to-guess URL (for example `https://8segt5i81sokzm.runs.apify.net`), which enables HTTP access to an optional web server running inside the Actor run's container.
 
 The URL is available in the following places:

@@ -17,10 +21,18 @@ The URL is available in the following places:
 
 The web server running inside the container must listen at the port defined by the `Actor.configuration.container_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
 
-## Example
+## Example Actor
 
-The following example demonstrates how to start a simple web server in your Actor,which will respond to every GET request with the number of items that the Actor has processed so far:
+The following example demonstrates how to start a simple web server in your Actor, which will respond to every GET request with the number of items that the Actor has processed so far:
 
 <RunnableCodeBlock className="language-python" language="python">
     {WebserverExample}
 </RunnableCodeBlock>
+
+## Conclusion
+
+In this guide, you learned how to run a web server inside your Apify Actor. By leveraging the container URL and port provided by the platform, you can expose HTTP endpoints for monitoring, reporting, or serving content during Actor execution. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU).
+
+## Additional resources
+
+- [Apify templates: Standby Python project](https://apify.com/templates/python-standby)

docs/03_guides/code/07_webserver.py — 3 additions & 4 deletions

@@ -7,9 +7,9 @@
 http_server = None
 
 
-# Just a simple handler that will print the number of processed items so far
-# on every GET request.
 class RequestHandler(BaseHTTPRequestHandler):
+    """A handler that prints the number of processed items on every GET request."""
+
     def do_get(self) -> None:
         self.log_request()
         self.send_response(200)

@@ -18,8 +18,7 @@ def do_get(self) -> None:
 
 
 def run_server() -> None:
-    # Start the HTTP server on the provided port,
-    # and save a reference to the server.
+    """Start the HTTP server on the provided port, and save a reference to the server."""
     global http_server
     with ThreadingHTTPServer(
         ('', Actor.configuration.web_server_port), RequestHandler
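The pattern this example file implements (a background web server reporting progress) can be sketched in a self-contained way. The Actor-specific port (`Actor.configuration.web_server_port`) is replaced here with an ephemeral local port, and the item counter is a plain module global, so this sketch runs anywhere without the SDK; note that stdlib `http.server` dispatches GET requests to a method named `do_GET` (uppercase):

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

processed_items = 0  # incremented by the scraping loop in a real Actor


class RequestHandler(BaseHTTPRequestHandler):
    """Respond to every GET request with the number of processed items."""

    def do_GET(self) -> None:  # http.server dispatches GETs to do_GET
        self.send_response(200)
        self.end_headers()
        self.wfile.write(f'Processed items: {processed_items}'.encode())

    def log_message(self, *args) -> None:
        pass  # keep the sketch quiet


# Port 0 asks the OS for any free port; an Actor would use the
# platform-provided web server port instead.
server = ThreadingHTTPServer(('127.0.0.1', 0), RequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

processed_items = 42
body = urlopen(f'http://127.0.0.1:{server.server_port}').read().decode()
server.shutdown()
print(body)  # → Processed items: 42
```

Running the server in a daemon thread mirrors the guide's approach: the scraping loop and the web server share the process, and the handler reads the counter at request time.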
