docs: standardize language, differentiate guides, and add closing notes (#851)
## Summary
- Rewrite opening paragraphs for 5 concept pages (proxy management,
interacting with other Actors, API access, logging, configuration) to
follow a consistent "what it does + why it matters" pattern
- Standardize all 7 guide titles from gerund form ("Using X") to
imperative form ("Use X"), including "Running webserver" → "Run a web
server"
- Differentiate Playwright and Selenium feature lists — Playwright now
highlights auto-waiting, locator API, and network interception; Selenium
highlights its broad ecosystem, WebDriver protocol, and flexible
selection strategies
- Standardize example intro phrasing to "The following example shows..."
across Crawlee, Scrapy, and webserver guides; fix a stray backtick typo
in the Crawlee guide
- Remove duplicate opening sentence in the Crawlee guide
- Add closing sentences with API reference/docs links to 5 concept pages
that ended abruptly after code blocks
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-[IP address blocking](https://en.wikipedia.org/wiki/IP_address_blocking) is one of the oldest and most effective ways of preventing access to a website. It is therefore paramount for a good web scraping library to provide easy to use but powerful tools which can work around IP blocking. The most powerful weapon in your anti IP blocking arsenal is a [proxy server](https://en.wikipedia.org/wiki/Proxy_server).
-
-With the Apify SDK, you can use your own proxy servers, proxy servers acquired from third-party providers, or you can rely on [Apify Proxy](https://apify.com/proxy) for your scraping needs.
+The Apify SDK provides built-in proxy management through the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> class, supporting both [Apify Proxy](https://apify.com/proxy) and custom proxy servers. Proxies are essential for web scraping to avoid [IP address blocking](https://en.wikipedia.org/wiki/IP_address_blocking) and distribute requests across multiple addresses.

 ## Quick start

@@ -107,3 +106,5 @@ Make sure you have the `httpx` library installed:

 ```bash
 pip install httpx
 ```
+
+For full details on proxy configuration options, see the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> API reference and the [Apify Proxy documentation](https://docs.apify.com/proxy).
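The round-robin rotation that proxy management performs over a list of proxy URLs can be sketched in plain Python. This is an illustrative sketch only, not the SDK's actual implementation; the proxy URLs and the `new_url` helper are hypothetical:

```python
from itertools import cycle

# Hypothetical proxy URLs; a real configuration would receive these
# from your proxy provider or from Apify Proxy.
PROXY_URLS = [
    'http://proxy-a.example.com:8000',
    'http://proxy-b.example.com:8000',
]

# cycle() yields the list endlessly, giving simple round-robin rotation.
_rotation = cycle(PROXY_URLS)

def new_url() -> str:
    """Return the next proxy URL in round-robin order."""
    return next(_rotation)
```

Each request takes the next URL in the list, so traffic is spread evenly across all configured proxies.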
-There are several methods that interact with other Actors and Actor tasks on the Apify platform.
+The Apify SDK lets you start, call, and transform (metamorph) other Actors directly from your Actor code. This is useful for composing complex workflows from smaller, reusable Actors.

 ## Actor start

@@ -50,3 +51,5 @@ For example, imagine you have an Actor that accepts a hotel URL on input, and th
+
+For more information about webhooks, including event types and payloads, see the [Apify webhooks documentation](https://docs.apify.com/platform/integrations/webhooks).
-The Apify SDK contains many useful features for making Actor development easier. However, it does not cover all the features the Apify API offers.
-
-For working with the Apify API directly, you can use the provided instance of the [Apify API Client](https://docs.apify.com/api/client/python) library.
+The Apify SDK provides a built-in instance of the [Apify API Client](https://docs.apify.com/api/client/python) for accessing Apify platform features beyond what the SDK covers directly.

 ## Actor client

@@ -30,3 +28,5 @@ If you want to create a completely new instance of the client, for example, to g
-The Apify SDK is logging useful information through the [`logging`](https://docs.python.org/3/library/logging.html) module from Python's standard library, into the logger with the name `apify`.
+The Apify SDK logs through Python's standard [`logging`](https://docs.python.org/3/library/logging.html) module, using the `apify` logger. Configuring log levels and formatting helps you debug Actors locally and monitor them on the platform.
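Because the SDK uses the standard `logging` module, the `apify` logger can be configured like any other logger. A minimal stdlib-only sketch; the format string and DEBUG level are arbitrary choices, not SDK defaults:

```python
import logging

# Attach a handler with a custom format to the `apify` logger and lower
# its level to DEBUG so detailed SDK messages become visible.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('[%(levelname)s] %(name)s: %(message)s'))

apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
apify_logger.addHandler(handler)

apify_logger.debug('verbose diagnostics now visible')
```

Any SDK message emitted under the `apify` logger (or its child loggers) inherits this configuration.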
-The [`Actor`](../../reference/class/Actor) class gets configured using the [`Configuration`](../../reference/class/Configuration) class, which initializes itself based on the provided environment variables.
-
-If you're using the Apify SDK in your Actors on the Apify platform, or Actors running locally through the Apify CLI, you don't need to configure the `Actor` class manually, unless you have some specific requirements, everything will get configured automatically.
+The <ApiLink to="class/Actor">`Actor`</ApiLink> class is configured through the <ApiLink to="class/Configuration">`Configuration`</ApiLink> class, which reads its settings from environment variables. When running on the Apify platform or through the Apify CLI, configuration is automatic; manual setup is only needed for custom requirements.

 If you need some special configuration, you can adjust it either through the `Configuration` class directly, or by setting environment variables when running the Actor locally.

@@ -33,3 +32,5 @@ This Actor run will not persist its local storages to the filesystem:

 ```bash
 APIFY_PERSIST_STORAGE=0 apify run
 ```
+
+For the full list of configuration options, see the <ApiLink to="class/Configuration">`Configuration`</ApiLink> API reference.
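The way a boolean environment variable such as `APIFY_PERSIST_STORAGE` is interpreted can be sketched as follows. The `env_flag` helper is hypothetical and only illustrates the idea; it is not the SDK's actual parsing logic:

```python
import os

def env_flag(name: str, default: bool = True) -> bool:
    """Interpret an environment variable as a boolean flag (illustrative only)."""
    value = os.environ.get(name)
    if value is None:
        return default
    # Treat '0', 'false', and empty string as disabled; anything else as enabled.
    return value.strip().lower() not in ('0', 'false', '')

os.environ['APIFY_PERSIST_STORAGE'] = '0'
persist_storage = env_flag('APIFY_PERSIST_STORAGE')  # -> False
```

Setting the variable before launching the run, as in the `apify run` example above, has the same effect as exporting it in the shell.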
docs/03_guides/03_playwright.mdx (6 additions, 5 deletions)
@@ -1,6 +1,6 @@
 ---
 id: playwright
-title: Using Playwright
+title: Use Playwright
 description: Build an Apify Actor that scrapes dynamic web pages using Playwright browser automation.
 ---

@@ -19,10 +19,11 @@ In this guide, you'll learn how to use [Playwright](https://playwright.dev) for

 Some of the key features of Playwright for web scraping include:

-- **Cross-browser support** - Playwright supports the latest versions of major browsers like Chrome, Firefox, and Safari, so you can choose the one that suits your needs the best.
-- **Headless mode** - Playwright can run in headless mode, meaning that the browser window is not visible on your screen while it is scraping, which can be useful for running scraping tasks in the background or in containers without a display.
-- **Powerful selectors** - Playwright provides a variety of powerful selectors that allow you to target specific elements on a web page, including CSS selectors, XPath, and text matching.
-- **Emulation of user interactions** - Playwright allows you to emulate user interactions like clicking, scrolling, filling out forms, and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
+- **Cross-browser support** - Playwright supports Chromium, Firefox, and WebKit with a single API, ensuring consistent behavior across all browsers.
+- **Auto-waiting** - Playwright automatically waits for elements to be ready before performing actions, reducing flaky scripts and eliminating the need for manual sleep calls.
+- **Headless and headful modes** - Playwright can run with or without a visible browser window, making it suitable for both local development and containerized environments.
+- **Powerful selectors** - Playwright provides CSS selectors, XPath, text matching, and its own resilient locator API for targeting elements on a page.
+- **Network interception** - Playwright can intercept and modify network requests, allowing you to block unnecessary resources or mock API responses during scraping.

 To create Actors which use Playwright, start from the [Playwright & Python](https://apify.com/templates/categories/python) Actor template.
docs/03_guides/04_selenium.mdx (6 additions, 10 deletions)
@@ -1,6 +1,6 @@
 ---
 id: selenium
-title: Using Selenium
+title: Use Selenium
 description: Build an Apify Actor that scrapes dynamic web pages using Selenium WebDriver.
 ---

@@ -16,15 +16,11 @@ In this guide, you'll learn how to use [Selenium](https://www.selenium.dev/) for

 Some of the key features of Selenium for web scraping include:

-- **Cross-browser support** - Selenium supports the latest versions of major browsers like Chrome, Firefox, and Safari,
-so you can choose the one that suits your needs the best.
-- **Headless mode** - Selenium can run in headless mode,
-meaning that the browser window is not visible on your screen while it is scraping,
-which can be useful for running scraping tasks in the background or in containers without a display.
-- **Powerful selectors** - Selenium provides a variety of powerful selectors that allow you to target specific elements on a web page,
-including CSS selectors, XPath, and text matching.
-- **Emulation of user interactions** - Selenium allows you to emulate user interactions like clicking, scrolling, filling out forms,
-and even typing in text, which can be useful for scraping websites that have dynamic content or require user input.
+- **Broad ecosystem** - Selenium has a large community and extensive documentation, with support for multiple programming languages beyond Python.
+- **WebDriver protocol** - Selenium uses the W3C WebDriver protocol, providing standardized browser automation that works with Chrome, Firefox, Edge, and Safari.
+- **Headless and headful modes** - Selenium can run with or without a visible browser window, making it suitable for both local development and containerized environments.
+- **Flexible element selection** - Selenium provides CSS selectors, XPath, ID, class name, and other strategies for locating elements on a page.
+- **User interaction emulation** - Selenium allows you to emulate user actions like clicking, scrolling, filling out forms, and typing, which is useful for scraping dynamic websites.

 To create Actors which use Selenium, start from the [Selenium & Python](https://apify.com/templates/categories/python) Actor template.