I have a list of websites from which I am trying to scrape a given piece of info. For each site, once I have found that info, I want to stop and move on to the next (with several sites being scraped concurrently).
I have tried the following approach (emptying the request queue when my goal is found):
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.storages import RequestQueue

request_queue = await RequestQueue.open()
crawler = PlaywrightCrawler(
    request_provider=request_queue,
    headless=True,  # Run the browser without a visible window.
    browser_type='firefox',  # Use the Firefox browser.
)
await crawler.add_requests([root_url])

@crawler.router.default_handler
async def request_handler(context: PlaywrightCrawlingContext) -> None:
    # ...
    if found:
        await request_queue.drop()
But that actually raises an error:
ValueError: Request queue with id "default" does not exist.
Any idea how I should proceed to get finer control over the request queue? Thanks!