docs: unify crawl caps and fix runnable examples

vdusek · vdusek · commit 0b8d10a36d34 · 2026-06-18T15:03:40.000+02:00
Lower the page cap from 50 to 10 across all crawling examples so the browser-based ones finish within the runnable-demo timeout. Make the Selenium example (its snippet is too large for the Run-on-Apify URL) and the Browser Use example (it needs an LLM API key) non-runnable, with comments explaining why, and add a comment explaining why the Scrapy example isn't runnable either. Make the Scrapling browser example runnable alongside the HTTP one, and have the Pydantic example fail cleanly via `Actor.fail` instead of re-raising into a raw traceback.
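The hand-rolled crawlers share one loop shape: the crawl stops once `handled_requests` reaches `max_requests`, so lowering the cap from 50 to 10 bounds the number of page loads per demo run. The following is a minimal, stdlib-only sketch of that loop, not the examples' code. It swaps Apify's request queue for a `collections.deque` and real HTTP for a hypothetical in-memory link graph (`LINKS` and `fetch_links` are illustrative names). The second half of the loop condition, "queue not empty", is an assumption, since the examples' actual condition is cut off in this diff.

```python
import asyncio
from collections import deque

# Hypothetical link graph standing in for real pages: each URL maps to the
# links a scraper would extract from it. Pages 100 and 101 have no entry.
LINKS: dict[str, list[str]] = {
    f'https://example.com/{i}': [f'https://example.com/{i + 1}', f'https://example.com/{i + 2}']
    for i in range(100)
}


async def fetch_links(url: str) -> list[str]:
    """Pretend to load `url` and return the links found on it (hypothetical)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return LINKS.get(url, [])


async def crawl(start_url: str, max_requests: int = 10) -> list[str]:
    """Breadth-first crawl that stops after `max_requests` handled pages."""
    queue = deque([start_url])
    seen = {start_url}
    handled: list[str] = []
    handled_requests = 0

    # Stop when the cap is reached or nothing is left to crawl (assumed condition).
    while handled_requests < max_requests and queue:
        url = queue.popleft()
        for link in await fetch_links(url):
            if link not in seen:  # deduplicate, as a request queue does
                seen.add(link)
                queue.append(link)
        handled.append(url)
        handled_requests += 1

    return handled


if __name__ == '__main__':
    pages = asyncio.run(crawl('https://example.com/0', max_requests=10))
    print(len(pages))  # 10
```

The cap only bounds the amount of work; it does not change the traversal order. The Crawlee-based examples get the same effect from the `max_requests_per_crawl` argument instead of a manual counter.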
diff --git a/docs/03_guides/04_selenium.mdx b/docs/03_guides/04_selenium.mdx
@@ -4,9 +4,9 @@ title: Browser automation with Selenium
 description: Build an Apify Actor that scrapes dynamic web pages using Selenium WebDriver.
 ---
 
-import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
+import CodeBlock from '@theme/CodeBlock';
 
-import SeleniumExample from '!!raw-loader!roa-loader!./code/04_selenium.py';
+import SeleniumExample from '!!raw-loader!./code/04_selenium.py';
 
 In this guide, you'll learn how to use [Selenium](https://www.selenium.dev/) for browser automation and web scraping in your Apify Actors.
 
@@ -36,9 +36,10 @@ This is a simple Actor that recursively scrapes data from linked pages on the sa
 
 It uses Selenium ChromeDriver to open the pages in an automated Chrome browser, and to extract the title, headings, and links after the pages load.
 
-<RunnableCodeBlock className="language-python" language="python">
+{/* Not runnable from the docs: the "Run on Apify" link encodes the whole snippet into the URL, and this Actor (with its inline proxy-auth extension) is large enough to exceed the URL length limit and fail with an HTTP 414. */}
+<CodeBlock className="language-python">
     {SeleniumExample}
-</RunnableCodeBlock>
+</CodeBlock>
 
 ## Using Apify Proxy
 
diff --git a/docs/03_guides/06_scrapy.mdx b/docs/03_guides/06_scrapy.mdx
@@ -73,6 +73,7 @@ For further details, see the [Scrapy migration guide](https://docs.apify.com/cli
 
 The following example shows a Scrapy Actor that scrapes page titles and enqueues links found on each page. This example aligns with the structure provided in the Apify Actor templates.
 
+{/* Not runnable from the docs: a Scrapy Actor is a multi-file project, while the "Run on Apify" runner executes a single self-contained snippet. */}
 <Tabs>
     <TabItem value="__main__.py" label="__main__.py">
         <CodeBlock className="language-python">
diff --git a/docs/03_guides/07_scrapling.mdx b/docs/03_guides/07_scrapling.mdx
@@ -4,11 +4,10 @@ title: Adaptive scraping with Scrapling
 description: Build an Apify Actor that scrapes web pages using the Scrapling adaptive web scraping library.
 ---
 
-import CodeBlock from '@theme/CodeBlock';
 import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
 
 import ScraplingExample from '!!raw-loader!roa-loader!./code/07_scrapling.py';
-import ScraplingBrowserScraper from '!!raw-loader!./code/07_scrapling_browser.py';
+import ScraplingBrowserScraper from '!!raw-loader!roa-loader!./code/07_scrapling_browser.py';
 
 In this guide, you'll learn how to use the [Scrapling](https://scrapling.readthedocs.io/) library for adaptive web scraping in your Apify Actors.
 
@@ -101,9 +100,9 @@ scrapling install
 
 To switch the example from HTTP to a real browser, fetch each page through a browser session instead of `AsyncFetcher`. Opening a fresh browser for every page would be wasteful, so `main` enters an `AsyncDynamicSession` once and reuses it for the whole crawl, while `scrape_page` fetches with `session.fetch`. The parsing API is identical, so the extraction code stays the same:
 
-<CodeBlock className="language-python">
+<RunnableCodeBlock className="language-python" language="python">
     {ScraplingBrowserScraper}
-</CodeBlock>
+</RunnableCodeBlock>
 
 Note that:
 
diff --git a/docs/03_guides/09_browser_use.mdx b/docs/03_guides/09_browser_use.mdx
@@ -4,9 +4,9 @@ title: Browser AI agents with Browser Use
 description: Build an Apify Actor that automates a browser with an LLM agent using the Browser Use library.
 ---
 
-import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
+import CodeBlock from '@theme/CodeBlock';
 
-import BrowserUseExample from '!!raw-loader!roa-loader!./code/09_browser_use.py';
+import BrowserUseExample from '!!raw-loader!./code/09_browser_use.py';
 
 In this guide, you'll learn how to use the [Browser Use](https://browser-use.com/) library to drive a browser with an LLM agent in your Apify Actors.
 
@@ -46,9 +46,10 @@ The following Actor runs a Browser Use agent for a single task and stores its st
 
 The whole Actor fits in a single file. A `run_agent_task` helper holds the Browser Use-specific logic: it defines the output schema and builds the LLM, browser, and agent. The `main` coroutine handles the [Actor](https://docs.apify.com/platform/actors) lifecycle, reads the input, sets up [Apify Proxy](https://docs.apify.com/platform/proxy), runs the agent, and stores the result:
 
-<RunnableCodeBlock className="language-python" language="python">
+{/* Not runnable from the docs: the agent needs an LLM API key (OPENAI_API_KEY) that the shared example runner does not provide. */}
+<CodeBlock className="language-python">
     {BrowserUseExample}
-</RunnableCodeBlock>
+</CodeBlock>
 
 Note that:
 
diff --git a/docs/03_guides/11_pydantic.mdx b/docs/03_guides/11_pydantic.mdx
@@ -56,7 +56,7 @@ The following Actor declares its input as a Pydantic `BaseModel`, validates the
 ### About the validation
 
 - `model_validate` parses the raw dictionary into a typed `ActorInput` instance. It fills in defaults and guarantees every field is valid, or raises a `ValidationError` that describes every problem at once.
-- Catching that error, logging a readable summary, and re-raising makes the Actor fail fast with a clear explanation right at the start, rather than crashing with an obscure error somewhere deep in the run. Because the body runs inside `async with Actor:`, the re-raised exception automatically marks the run as `FAILED`.
+- Catching that error, logging a readable summary, and failing the run with <ApiLink to="class/Actor#fail">`Actor.fail`</ApiLink> marks the run as `FAILED` with a clear status message. The Actor fails fast, right at the start, instead of crashing with a raw traceback somewhere deeper in the run.
 - The error messages refer to the fields by their input-schema aliases. For invalid input like `{"searchTerms": [], "maxResults": 999, "outputFormat": "xml"}`, the log shows exactly what's wrong:
 
   ```text
diff --git a/docs/03_guides/code/01_beautifulsoup_httpx.py b/docs/03_guides/code/01_beautifulsoup_httpx.py
@@ -82,7 +82,7 @@ async def main() -> None:
             await request_queue.add_request(Request.from_url(url))
 
         # Cap the crawl. Raise or remove the limit to follow more pages.
-        max_requests = 50
+        max_requests = 10
         handled_requests = 0
 
         while handled_requests < max_requests and (
diff --git a/docs/03_guides/code/02_parsel_impit.py b/docs/03_guides/code/02_parsel_impit.py
@@ -82,7 +82,7 @@ async def main() -> None:
             await request_queue.add_request(Request.from_url(url))
 
         # Cap the crawl. Raise or remove the limit to follow more pages.
-        max_requests = 50
+        max_requests = 10
         handled_requests = 0
 
         while handled_requests < max_requests and (
diff --git a/docs/03_guides/code/03_playwright.py b/docs/03_guides/code/03_playwright.py
@@ -94,7 +94,7 @@ async def main() -> None:
             await request_queue.add_request(Request.from_url(url))
 
         # Cap the crawl. Raise or remove the limit to follow more pages.
-        max_requests = 50
+        max_requests = 10
         handled_requests = 0
 
         Actor.log.info('Launching Playwright...')
diff --git a/docs/03_guides/code/04_selenium.py b/docs/03_guides/code/04_selenium.py
@@ -151,7 +151,7 @@ async def main() -> None:
             await request_queue.add_request(Request.from_url(url))
 
         # Cap the crawl. Raise or remove the limit to follow more pages.
-        max_requests = 50
+        max_requests = 10
         handled_requests = 0
 
         # Fresh proxy URL for the run (None if no proxy).
diff --git a/docs/03_guides/code/05_crawlee_beautifulsoup.py b/docs/03_guides/code/05_crawlee_beautifulsoup.py
@@ -51,7 +51,7 @@ async def main() -> None:
             proxy_configuration=proxy_configuration,
             request_handler=router,
             # Cap the crawl. Remove or increase the limit to follow all links.
-            max_requests_per_crawl=50,
+            max_requests_per_crawl=10,
         )
 
         await crawler.run(start_urls)
diff --git a/docs/03_guides/code/05_crawlee_parsel.py b/docs/03_guides/code/05_crawlee_parsel.py
@@ -51,7 +51,7 @@ async def main() -> None:
             proxy_configuration=proxy_configuration,
             request_handler=router,
             # Cap the crawl. Remove or increase the limit to follow all links.
-            max_requests_per_crawl=50,
+            max_requests_per_crawl=10,
         )
 
         await crawler.run(start_urls)
diff --git a/docs/03_guides/code/05_crawlee_playwright.py b/docs/03_guides/code/05_crawlee_playwright.py
@@ -54,7 +54,7 @@ async def main() -> None:
             proxy_configuration=proxy_configuration,
             request_handler=router,
             # Cap the crawl. Remove or increase the limit to follow all links.
-            max_requests_per_crawl=50,
+            max_requests_per_crawl=10,
             headless=True,
             browser_launch_options={'args': browser_args},
         )
diff --git a/docs/03_guides/code/07_scrapling.py b/docs/03_guides/code/07_scrapling.py
@@ -84,7 +84,7 @@ async def main() -> None:
             await request_queue.add_request(Request.from_url(url))
 
         # Cap the crawl. Raise or remove the limit to follow more pages.
-        max_requests = 50
+        max_requests = 10
         handled_requests = 0
 
         while handled_requests < max_requests and (
diff --git a/docs/03_guides/code/07_scrapling_browser.py b/docs/03_guides/code/07_scrapling_browser.py
@@ -79,7 +79,7 @@ async def main() -> None:
             await request_queue.add_request(Request.from_url(url))
 
         # Cap the crawl. Raise or remove the limit to follow more pages.
-        max_requests = 50
+        max_requests = 10
         handled_requests = 0
 
         # Open the browser once and reuse it for every page in the crawl.
diff --git a/docs/03_guides/code/08_crawl4ai.py b/docs/03_guides/code/08_crawl4ai.py
@@ -82,7 +82,7 @@ async def main() -> None:
             await request_queue.add_request(Request.from_url(url))
 
         # Cap the crawl; raise or remove to follow more pages.
-        max_requests = 50
+        max_requests = 10
         handled_requests = 0
 
         # Reuse one headless browser-backed crawler for every request.
diff --git a/docs/03_guides/code/11_pydantic.py b/docs/03_guides/code/11_pydantic.py
@@ -44,9 +44,10 @@ async def main() -> None:
         try:
             actor_input = ActorInput.model_validate(raw_input)
         except ValidationError as exc:
-            # Log a per-field summary, then re-raise to fail the run.
+            # Log a per-field summary and fail the run cleanly, without a raw traceback.
             Actor.log.error('The Actor input is invalid:\n%s', exc)
-            raise
+            await Actor.fail(status_message='The Actor input is invalid.')
+            return
 
         # Work with typed attributes from here on.
         Actor.log.info('Input passed validation: %s', actor_input.model_dump())

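The Pydantic change swaps a bare `raise` for `await Actor.fail(status_message=...)` followed by `return`. That control flow can be tried without the Apify SDK or Pydantic in a stdlib-only sketch. `StubActor`, `InputError`, and `validate` are hypothetical stand-ins for Apify's `Actor`, Pydantic's `ValidationError`, and `ActorInput.model_validate`. The field constraints (a non-empty `searchTerms`, `maxResults` between 1 and 100, `outputFormat` of `json` or `csv`) are assumptions, chosen so that the guide's invalid sample input fails on all three fields.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('sketch')


class InputError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError: reports every problem at once."""

    def __init__(self, problems: list[str]) -> None:
        super().__init__('\n'.join(problems))
        self.problems = problems


class StubActor:
    """Hypothetical stand-in for Apify's `Actor`; it only records how the run ended."""

    def __init__(self) -> None:
        self.status = 'RUNNING'
        self.status_message = ''

    async def fail(self, *, status_message: str) -> None:
        # The real Actor.fail also ends the run; this stub only records the outcome.
        self.status = 'FAILED'
        self.status_message = status_message


def validate(raw: dict) -> dict:
    """Hypothetical validator; the constraints below are assumptions, not the guide's schema."""
    problems = []
    if not raw.get('searchTerms'):
        problems.append('searchTerms: list should have at least 1 item')
    if not 1 <= raw.get('maxResults', 10) <= 100:
        problems.append('maxResults: must be between 1 and 100')
    if raw.get('outputFormat', 'json') not in ('json', 'csv'):
        problems.append("outputFormat: must be 'json' or 'csv'")
    if problems:
        raise InputError(problems)
    return raw


async def main(actor: StubActor, raw_input: dict) -> None:
    try:
        actor_input = validate(raw_input)
    except InputError as exc:
        # Log a per-field summary and fail the run cleanly, without a raw traceback.
        log.error('The Actor input is invalid:\n%s', exc)
        await actor.fail(status_message='The Actor input is invalid.')
        return
    log.info('Input passed validation: %s', actor_input)


if __name__ == '__main__':
    actor = StubActor()
    asyncio.run(main(actor, {'searchTerms': [], 'maxResults': 999, 'outputFormat': 'xml'}))
    print(actor.status, actor.status_message)  # FAILED The Actor input is invalid.
```

Because the error is handled, nothing propagates out of `main`: the stub ends up `FAILED` with a short status message while the per-field details go to the log, which mirrors the behavior the guide describes for the real `Actor.fail`.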