---
id: run-parallel-crawlers
title: Run parallel crawlers
---

import ApiLink from '@site/src/components/ApiLink';
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import RunParallelCrawlersExample from '!!raw-loader!roa-loader!./code_examples/run_parallel_crawlers.py';

This example demonstrates how to run two crawlers in parallel, where one crawler processes links discovered by the other.

In some situations, you may need different approaches for scraping data from a website. For example, you might use `PlaywrightCrawler` to navigate JavaScript-heavy pages and a faster, more lightweight `ParselCrawler` to process static pages. One way to solve this is to use `AdaptivePlaywrightCrawler`; see the Adaptive Playwright crawler example to learn more.

The code below demonstrates an alternative approach using two separate crawlers. Links are passed between the crawlers via `RequestQueue` aliases. The `keep_alive` option lets the Playwright crawler run in the background and wait for incoming links instead of stopping when its queue is empty. You can also use different storage clients for each crawler without losing the ability to pass links between queues. Learn more about the available storage clients in this guide.
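To illustrate the coordination idea behind this setup, here is a minimal stdlib `asyncio` sketch, not Crawlee's actual API: a background "browser" worker keeps waiting on a shared queue (mimicking `keep_alive`), while a "static" worker discovers links and forwards the JavaScript-heavy ones. The URLs and the `#js` marker are made-up placeholders for illustration.

```python
import asyncio


async def main() -> list[str]:
    # Plays the role of the shared RequestQueue alias between the two crawlers.
    js_queue: asyncio.Queue = asyncio.Queue()
    handled: list[str] = []

    async def browser_worker() -> None:
        # Runs until an explicit stop signal (None), like a keep_alive crawler
        # that waits in the background for incoming links.
        while True:
            url = await js_queue.get()
            if url is None:
                break
            handled.append(f"browser:{url}")

    async def static_worker(urls: list[str]) -> None:
        for url in urls:
            if url.endswith("#js"):
                # Forward JavaScript-heavy pages to the background worker.
                await js_queue.put(url)
            else:
                handled.append(f"static:{url}")
        # Discovery finished: release the background worker.
        await js_queue.put(None)

    background = asyncio.create_task(browser_worker())
    await static_worker(["https://a.example", "https://b.example#js"])
    await background
    return handled


result = asyncio.run(main())
print(result)  # ['static:https://a.example', 'browser:https://b.example#js']
```

The key design point mirrors the crawler setup: the consumer is started first and never exits on an empty queue; only an explicit signal (here `None`, in Crawlee the end of the producing crawler's run) stops it.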

<RunnableCodeBlock className="language-python" language="python">
    {RunParallelCrawlersExample}
</RunnableCodeBlock>