Commit 005249b

docs: rotate proxy per request in the Playwright example and tidy guide wording
1 parent 10c8199 commit 005249b

3 files changed

Lines changed: 14 additions & 8 deletions

docs/03_guides/03_playwright.mdx

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ It uses Playwright to open the pages in an automated Chrome browser, and to extr
 
 ## Using Apify Proxy
 
-Running on the Apify platform gives your scraper access to [Apify Proxy](https://docs.apify.com/platform/proxy), which rotates IP addresses to avoid rate limiting and blocking. The example creates a proxy configuration with `Actor.create_proxy_configuration` and launches the browser through it. Playwright applies the proxy at the browser level, so the whole run shares a single proxy URL rather than rotating per request; the `to_playwright_proxy` helper splits that URL into the `server`, `username`, and `password` fields Playwright expects. To select specific proxy groups or a country, pass the relevant arguments to `Actor.create_proxy_configuration`. For more details, see the [Proxy management](../concepts/proxy-management) guide.
+Running on the Apify platform gives your scraper access to [Apify Proxy](https://docs.apify.com/platform/proxy), which rotates IP addresses to avoid rate limiting and blocking. The example creates a proxy configuration with `Actor.create_proxy_configuration` and fetches a fresh proxy URL for every request. Playwright applies a proxy per browser context, so each request runs in its own new context to rotate the IP. The `to_playwright_proxy` helper splits that URL into the `server`, `username`, and `password` fields Playwright expects. To select specific proxy groups or a country, pass the relevant arguments to `Actor.create_proxy_configuration`. For more details, see the [Proxy management](../concepts/proxy-management) guide.
 
 ## Conclusion
 
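The `to_playwright_proxy` helper referenced in the new paragraph is not shown in this diff. As a rough sketch of what such a helper might do, assuming the proxy URL has the form `scheme://user:password@host:port` and that Playwright's `proxy` option takes the `server`, `username`, and `password` keys the paragraph names, the standard library's `urllib.parse.urlsplit` is enough. This is a hypothetical reconstruction, not the example's actual code.

```python
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(proxy_url: str) -> dict[str, str]:
    """Split a proxy URL into the `server`/`username`/`password` fields Playwright expects.

    Hypothetical sketch; the helper used by the actual example may differ.
    """
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f'Not a proxy URL: {proxy_url!r}')

    # Keep host and port exactly as written, dropping only the `user:password@` prefix.
    host_port = parts.netloc.rpartition('@')[2]
    proxy = {'server': f'{parts.scheme}://{host_port}'}

    # Credentials can be percent-encoded in a URL, so decode them for Playwright.
    if parts.username is not None:
        proxy['username'] = unquote(parts.username)
    if parts.password is not None:
        proxy['password'] = unquote(parts.password)
    return proxy


# Made-up URL and credentials, for illustration only.
print(to_playwright_proxy('http://user:p%40ss@proxy.example.com:8000'))
# {'server': 'http://proxy.example.com:8000', 'username': 'user', 'password': 'p@ss'}
```

Taking the host and port verbatim from `netloc`, rather than rebuilding them from `hostname` and `port`, avoids lowercasing the host or dropping the brackets around an IPv6 address.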
docs/03_guides/12_running_webserver.mdx

Lines changed: 3 additions & 3 deletions
@@ -33,7 +33,7 @@ The following example shows how to start a simple web server in your Actor, whic
 
 ## Using FastAPI
 
-The example above relies only on Python's standard library, which keeps it dependency-free but leaves you handling requests by hand. For anything beyond a single endpoint, a web framework such as [FastAPI](https://fastapi.tiangolo.com/) is a better fit - it gives you routing, request parsing, and automatic JSON responses, and is served by an ASGI server like [uvicorn](https://www.uvicorn.org/).
+The example above relies only on Python's standard library, which keeps it dependency-free but leaves you handling requests by hand. For anything beyond a single endpoint, a web framework such as [FastAPI](https://fastapi.tiangolo.com/) is a better fit. It gives you routing, request parsing, and automatic JSON responses, and is served by an ASGI server like [uvicorn](https://www.uvicorn.org/).
 
 Install both, for example by adding them to your `requirements.txt`:
 
@@ -52,13 +52,13 @@ A few things worth pointing out:
 
 - `uvicorn.Server(...).serve()` is a coroutine, so it runs as an `asyncio` task alongside the Actor's own work instead of blocking it. Setting `server.should_exit = True` triggers a graceful shutdown once the work is done.
 - The server binds to `0.0.0.0` (all interfaces) rather than `localhost`, so it's reachable through the container URL, not only from inside the container.
-- The same pattern powers an [Actor Standby](#actor-standby) service - swap the one-off work loop for an Actor that just keeps serving requests.
+- The same pattern powers an [Actor Standby](#actor-standby) service. Swap the one-off work loop for an Actor that just keeps serving requests.
 
 ## Actor Standby
 
 The example above runs a web server for the duration of a single Actor run. With [Actor Standby](https://docs.apify.com/platform/actors/development/programming-interface/standby), you can instead expose your Actor as an always-ready HTTP API: the platform keeps the Actor running in the background and routes incoming HTTP requests to the web server inside it, spinning up additional instances as the load grows.
 
-From the SDK's perspective, a Standby Actor is built the same way as the web server above start an HTTP server listening on the port from `Actor.configuration.web_server_port`. The difference is operational: instead of doing its work once and exiting, a Standby Actor stays up and serves requests. This makes it a good fit for low-latency, on-demand use cases, such as serving scraped data or acting as a microservice.
+From the SDK's perspective, a Standby Actor is built the same way as the web server above. You start an HTTP server listening on the port from `Actor.configuration.web_server_port`. The difference is operational: instead of doing its work once and exiting, a Standby Actor stays up and serves requests. This makes it a good fit for low-latency, on-demand use cases, such as serving scraped data or acting as a microservice.
 
 To get started quickly, use the [Standby Python template](https://apify.com/templates/python-standby). For details on enabling Standby, request routing, and readiness probes, see the [Actor Standby documentation](https://docs.apify.com/platform/actors/development/programming-interface/standby).
 
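The bullets in this guide describe running the web server as an `asyncio` task next to the Actor's own work, binding it to `0.0.0.0`, and shutting it down gracefully once the work is done. The sketch below shows the same shape using only the standard library, with `asyncio.start_server` standing in for uvicorn and closing the server standing in for `server.should_exit = True`. It illustrates the pattern and is not the guide's FastAPI code. `handle_request`, `fetch`, and `run` are hypothetical names; the port is a parameter, so inside an Actor you would pass `Actor.configuration.web_server_port`, while the default `0` lets the OS pick a free port for a local run.

```python
import asyncio
import contextlib


async def handle_request(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Reply to any request with a fixed plain-text body (hypothetical handler)."""
    # Read the request head and ignore it; a framework would route on method and path.
    await reader.readuntil(b'\r\n\r\n')
    body = b'Hello from the Actor'
    head = (
        'HTTP/1.1 200 OK\r\n'
        'Content-Type: text/plain\r\n'
        f'Content-Length: {len(body)}\r\n'
        'Connection: close\r\n'
        '\r\n'
    )
    writer.write(head.encode('ascii') + body)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port: int) -> bytes:
    """Stand-in for the Actor's own work: call the server once and return the raw response."""
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n')
    await writer.drain()
    response = await reader.read()  # the server closes the connection after replying
    writer.close()
    await writer.wait_closed()
    return response


async def run(port: int = 0) -> bytes:
    # Bind to all interfaces so the server is reachable from outside the container.
    server = await asyncio.start_server(handle_request, '0.0.0.0', port)
    bound_port = server.sockets[0].getsockname()[1]
    # serve_forever() is a coroutine, so it runs as a task alongside the main work.
    serve_task = asyncio.create_task(server.serve_forever())
    try:
        return await fetch(bound_port)
    finally:
        # Graceful shutdown once the work is done: stop listening, end the task, wait.
        server.close()
        serve_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await serve_task
        await server.wait_closed()


if __name__ == '__main__':
    print(asyncio.run(run()).decode().splitlines()[0])  # HTTP/1.1 200 OK
```

For a Standby-style service, the `fetch` call would be replaced by waiting indefinitely, so the server keeps answering requests until the platform stops the Actor.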
docs/03_guides/code/03_playwright.py

Lines changed: 10 additions & 4 deletions
@@ -83,9 +83,8 @@ async def main() -> None:
             Actor.log.info('No start URLs specified in Actor input, exiting...')
             await Actor.exit()
 
-        # Playwright proxies at the browser level, so one URL is shared per run.
+        # Set up the proxy configuration; a fresh proxy URL is fetched per request below.
         proxy_configuration = await Actor.create_proxy_configuration()
-        proxy_url = await proxy_configuration.new_url() if proxy_configuration else None
 
         # Open the request queue and enqueue the start URLs (crawl depth 0).
         request_queue = await Actor.open_request_queue()
@@ -103,10 +102,8 @@ async def main() -> None:
         async with async_playwright() as playwright:
             browser = await playwright.chromium.launch(
                 headless=Actor.configuration.headless,
-                proxy=to_playwright_proxy(proxy_url) if proxy_url else None,
                 args=['--no-sandbox', '--disable-dev-shm-usage', '--disable-gpu'],
             )
-            context = await browser.new_context()
 
             while handled_requests < max_requests and (
                 request := await request_queue.fetch_next_request()
@@ -116,6 +113,14 @@ async def main() -> None:
                 depth = request.crawl_depth
                 Actor.log.info(f'Scraping {url} (depth={depth}) ...')
 
+                # A new context with a fresh proxy URL per request rotates the proxy IP.
+                proxy_url = (
+                    await proxy_configuration.new_url() if proxy_configuration else None
+                )
+                context = await browser.new_context(
+                    proxy=to_playwright_proxy(proxy_url) if proxy_url else None,
+                )
+
                 try:
                     data, links = await scrape_page(context, url)
                     await Actor.push_data(data)
@@ -131,6 +136,7 @@ async def main() -> None:
                     Actor.log.exception(f'Cannot extract data from {url}.')
 
                 finally:
+                    await context.close()
                     await request_queue.mark_request_as_handled(request)
 
 
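The updated loop pairs `browser.new_context(proxy=...)` with `await context.close()` in the `finally` block, so a per-request context is not leaked even when scraping fails. One possible way to package that pairing is an async context manager, assuming the same `browser` object and the Playwright-style proxy dictionary the example builds. `fresh_context` is a hypothetical name, not part of the example or of Playwright; it only calls `new_context` and `close`, so it works with any object that provides those two coroutines.

```python
from __future__ import annotations

from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from typing import Any


@asynccontextmanager
async def fresh_context(browser: Any, proxy: dict[str, str] | None) -> AsyncIterator[Any]:
    """Open a new browser context with its own proxy settings and always close it.

    Hypothetical helper: `browser` is expected to behave like a Playwright `Browser`.
    """
    # Each call creates a separate context, so each request can use a different proxy.
    context = await browser.new_context(proxy=proxy)
    try:
        yield context
    finally:
        # Runs whether the body finished normally or raised.
        await context.close()
```

Inside the request loop, the `new_context` call and the `await context.close()` in `finally` would then become a single `async with fresh_context(browser, to_playwright_proxy(proxy_url) if proxy_url else None) as context:` block. The outer `try`/`except`/`finally` would still be needed to log failures and mark the request as handled.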