Commit d3ced60

docs: Add post_navigation_hook to Playwright crawler guide (#1800)
### Description

- Update `Playwright crawler` guide

### Issues

- Follow-up: #1795
1 parent 8812d48 commit d3ced60

2 files changed: +13 -3 lines changed

docs/guides/code_examples/playwright_crawler/pre_navigation_hook_example.py renamed to docs/guides/code_examples/playwright_crawler/navigation_hooks_example.py

Lines changed: 10 additions & 0 deletions
@@ -3,8 +3,10 @@
 from crawlee.crawlers import (
     PlaywrightCrawler,
     PlaywrightCrawlingContext,
+    PlaywrightPostNavCrawlingContext,
     PlaywrightPreNavCrawlingContext,
 )
+from crawlee.errors import SessionError
 
 
 async def main() -> None:
@@ -24,6 +26,14 @@ async def configure_page(context: PlaywrightPreNavCrawlingContext) -> None:
         # to speed up page loading
         await context.block_requests()
 
+    @crawler.post_navigation_hook
+    async def custom_captcha_check(context: PlaywrightPostNavCrawlingContext) -> None:
+        # check if the page contains a captcha
+        captcha_element = context.page.locator('input[name="captcha"]').first
+        if await captcha_element.is_visible():
+            context.log.warning('Captcha detected! Skipping the page.')
+            raise SessionError('Captcha detected')
+
     # Run the crawler with the initial list of URLs.
     await crawler.run(['https://crawlee.dev'])
 
docs/guides/playwright_crawler.mdx

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
 
 import MultipleLaunchExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/multiple_launch_example.py';
 import BrowserConfigurationExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/browser_configuration_example.py';
-import PreNavigationExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/pre_navigation_hook_example.py';
+import NavigationHooksExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/navigation_hooks_example.py';
 import BrowserPoolPageHooksExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/browser_pool_page_hooks_example.py';
 import PluginBrowserConfigExample from '!!raw-loader!./code_examples/playwright_crawler/plugin_browser_configuration_example.py';
 
@@ -67,10 +67,10 @@ For additional setup or event-driven actions around page creation and closure, t
 
 ## Navigation hooks
 
-Navigation hooks allow for additional configuration at specific points during page navigation. For example, the <ApiLink to="class/PlaywrightCrawler#pre_navigation_hook">`pre_navigation_hook`</ApiLink> is called before each navigation and provides <ApiLink to="class/PlaywrightPreNavCrawlingContext">`PlaywrightPreNavCrawlingContext`</ApiLink> - including the [page](https://playwright.dev/python/docs/api/class-page) instance and a <ApiLink to="class/PlaywrightPreNavCrawlingContext#block_requests">`block_requests`</ApiLink> helper for filtering unwanted resource types and URL patterns. See the [block requests example](https://crawlee.dev/python/docs/examples/playwright-crawler-with-block-requests) for a dedicated walkthrough.
+Navigation hooks allow for additional configuration at specific points during page navigation. The <ApiLink to="class/PlaywrightCrawler#pre_navigation_hook">`pre_navigation_hook`</ApiLink> is called before each navigation and provides <ApiLink to="class/PlaywrightPreNavCrawlingContext">`PlaywrightPreNavCrawlingContext`</ApiLink> - including the [page](https://playwright.dev/python/docs/api/class-page) instance and a <ApiLink to="class/PlaywrightPreNavCrawlingContext#block_requests">`block_requests`</ApiLink> helper for filtering unwanted resource types and URL patterns. See the [block requests example](https://crawlee.dev/python/docs/examples/playwright-crawler-with-block-requests) for a dedicated walkthrough. Similarly, the <ApiLink to="class/PlaywrightCrawler#post_navigation_hook">`post_navigation_hook`</ApiLink> is called after each navigation and provides <ApiLink to="class/PlaywrightPostNavCrawlingContext">`PlaywrightPostNavCrawlingContext`</ApiLink> - useful for post-load checks such as detecting CAPTCHAs or verifying page state.
 
 <RunnableCodeBlock className="language-python" language="python">
-{PreNavigationExample}
+{NavigationHooksExample}
 </RunnableCodeBlock>
 
 ## Conclusion
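
The ordering that the updated guide describes (pre-navigation hook, then navigation, then post-navigation hook, then the request handler) can be illustrated with a small standard-library sketch. This is a toy model only, not Crawlee's implementation: `ToyCrawler`, `ToyContext`, and `ToySessionError` are hypothetical stand-ins invented for illustration, the "navigation" is a dictionary lookup, and the captcha check is a plain substring test rather than a Playwright locator. How Crawlee itself reacts to a `SessionError` (for example session rotation or retries) is not modeled here; the toy simply records the error and skips the handler.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field


class ToySessionError(Exception):
    """Hypothetical stand-in for crawlee.errors.SessionError."""


@dataclass
class ToyContext:
    """Hypothetical stand-in for a crawling context; holds only what the toy needs."""

    url: str
    html: str = ''
    events: list[str] = field(default_factory=list)


Hook = Callable[[ToyContext], Awaitable[None]]


class ToyCrawler:
    """Toy model of the hook order only; not Crawlee's implementation."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages  # fake "web": URL -> HTML
        self._pre_hooks: list[Hook] = []
        self._post_hooks: list[Hook] = []

    def pre_navigation_hook(self, hook: Hook) -> Hook:
        # Register a hook to run before the simulated navigation.
        self._pre_hooks.append(hook)
        return hook

    def post_navigation_hook(self, hook: Hook) -> Hook:
        # Register a hook to run after the simulated navigation.
        self._post_hooks.append(hook)
        return hook

    async def process(self, url: str, handler: Hook) -> ToyContext:
        context = ToyContext(url=url)
        try:
            for hook in self._pre_hooks:
                await hook(context)
            context.html = self._pages[url]  # simulated navigation
            context.events.append('navigate')
            for hook in self._post_hooks:
                await hook(context)
            await handler(context)
        except ToySessionError as exc:
            # The toy records the error; the handler never runs for this URL.
            context.events.append(f'session-error: {exc}')
        return context


async def demo() -> list[list[str]]:
    crawler = ToyCrawler({
        'https://example.com/ok': '<h1>Hello</h1>',
        'https://example.com/blocked': '<input name="captcha">',
    })

    @crawler.pre_navigation_hook
    async def configure_page(context: ToyContext) -> None:
        context.events.append('pre-hook')

    @crawler.post_navigation_hook
    async def custom_captcha_check(context: ToyContext) -> None:
        context.events.append('post-hook')
        # Simplified check: a substring test instead of a page locator.
        if 'name="captcha"' in context.html:
            raise ToySessionError('Captcha detected')

    async def handler(context: ToyContext) -> None:
        context.events.append('handler')

    results = []
    for url in ('https://example.com/ok', 'https://example.com/blocked'):
        results.append((await crawler.process(url, handler)).events)
    return results


if __name__ == '__main__':
    for events in asyncio.run(demo()):
        print(events)
```

In this sketch the first URL passes through all four stages, while the second stops at the post-navigation hook, so `'handler'` never appears in its event list. Raising from the post-navigation hook is what keeps the check separate from the request handler's scraping logic.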
