---
id: request-throttling
title: Request throttling
description: How to throttle requests per domain using the ThrottlingRequestManager.
---

import ApiLink from '@site/src/components/ApiLink';
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import ThrottlingExample from '!!raw-loader!roa-loader!./code_examples/request_throttling/throttling_example.py';

When crawling websites that enforce rate limits (HTTP 429) or specify a `crawl-delay` in their `robots.txt`, you need a way to throttle requests per domain without slowing down crawling of unrelated domains. The `ThrottlingRequestManager` provides exactly this.

## Overview

The `ThrottlingRequestManager` wraps a `RequestQueue` and manages per-domain throttling. You specify which domains to throttle at initialization, and the manager automatically:

- Routes requests for listed domains into dedicated sub-queues at insertion time.
- Enforces delays from HTTP 429 responses (exponential backoff) and `robots.txt` `crawl-delay` directives.
- Schedules fairly by fetching from the domain that has been waiting the longest.
- Sleeps intelligently when all configured domains are throttled, instead of busy-waiting.

Requests for domains not in the configured list pass through to the main queue without any throttling.
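The routing decision can be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: the helper name, return values, and the choice to match subdomains are all assumptions.

```python
from urllib.parse import urlparse

def route_request(url: str, throttled_domains: set[str]) -> str:
    """Hypothetical sketch of insertion-time routing: requests whose hostname
    matches a configured domain go to that domain's sub-queue; everything
    else falls through to the main queue."""
    hostname = urlparse(url).hostname or ""
    for domain in throttled_domains:
        # Match the domain itself as well as its subdomains (an assumption).
        if hostname == domain or hostname.endswith("." + domain):
            return f"sub-queue:{domain}"
    return "main-queue"
```

Because the decision is made once per request, at insertion time, the same request never needs to live in more than one queue.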

## Basic usage

To use request throttling, create a `ThrottlingRequestManager` with the domains you want to throttle and pass it as the `request_manager` to your crawler:

<RunnableCodeBlock className="language-python" language="python">
	{ThrottlingExample}
</RunnableCodeBlock>

## How it works

1. **Insertion-time routing:** When you add requests via `add_request` or `add_requests`, each request is checked against the configured domain list. Matching requests go directly into a per-domain sub-queue; all others go to the main queue. Because routing happens exactly once, at insertion, a request is never duplicated across queues.

2. **429 backoff:** When the crawler detects an HTTP 429 response, the `ThrottlingRequestManager` records an exponential backoff delay for that domain (starting at 2 s and doubling up to a 60 s cap). If the response includes a `Retry-After` header, that value takes priority.

3. **Crawl-delay:** If `robots.txt` specifies a `crawl-delay`, the manager enforces a minimum interval between consecutive requests to that domain.

4. **Fair scheduling:** `fetch_next_request` sorts the available sub-queues by how long each domain has been waiting, ensuring no domain is starved.

:::tip

The `ThrottlingRequestManager` is an opt-in feature. If you don't pass it to your crawler, requests are processed normally without any per-domain throttling.

:::