docs: fix broken links in current and versioned docs

vdusek · vdusek · commit 7e4647e5faa5 · 2026-06-18T13:42:25.000+02:00
diff --git a/docs/deployment/google_cloud_run.mdx b/docs/deployment/google_cloud_run.mdx
@@ -17,7 +17,7 @@ GCP Cloud Run allows you to deploy using Docker containers, giving you full cont
 
 ## Preparing the project
 
-We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://www.uvicorn.org/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
+We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://uvicorn.dev/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
 
 :::info
 
diff --git a/docs/examples/add_data_to_dataset.mdx b/docs/examples/add_data_to_dataset.mdx
@@ -12,7 +12,7 @@ import BeautifulSoupExample from '!!raw-loader!roa-loader!./code_examples/add_da
 import PlaywrightExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_pw.py';
 import DatasetExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_dataset.py';
 
-This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction#open">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction#open">`push_data`</ApiLink> function.
+This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction">`push_data`</ApiLink> function.
 
 <Tabs groupId="main">
     <TabItem value="BeautifulSoupCrawler" label="BeautifulSoupCrawler">
diff --git a/docs/guides/architecture_overview.mdx b/docs/guides/architecture_overview.mdx
@@ -70,7 +70,7 @@ PlaywrightCrawler --|> StagehandCrawler
 
 ### HTTP crawlers
 
-HTTP crawlers use HTTP clients to fetch pages and parse them with HTML parsing libraries. They are fast and efficient for sites that do not require JavaScript rendering. HTTP clients are Crawlee components that wrap around HTTP libraries like [httpx](https://www.python-httpx.org/), [curl-impersonate](https://github.com/lwthiker/curl-impersonate) or [impit](https://apify.github.io/impit) and handle HTTP communication for requests and responses. You can learn more about them in the [HTTP clients guide](./http-clients).
+HTTP crawlers use HTTP clients to fetch pages and parse them with HTML parsing libraries. They are fast and efficient for sites that do not require JavaScript rendering. HTTP clients are Crawlee components that wrap around HTTP libraries like [httpx](https://www.python-httpx.org/), [curl-impersonate](https://github.com/lwthiker/curl-impersonate) or [impit](https://github.com/apify/impit) and handle HTTP communication for requests and responses. You can learn more about them in the [HTTP clients guide](./http-clients).
 
 HTTP crawlers inherit from <ApiLink to="class/AbstractHttpCrawler">`AbstractHttpCrawler`</ApiLink> and there are three crawlers that belong to this category:
 
@@ -235,7 +235,7 @@ Crawlee provides several built-in storage client implementations:
 
 - <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink> - Stores data in memory with no persistence (ideal for testing and fast operations).
 - <ApiLink to="class/FileSystemStorageClient">`FileSystemStorageClient`</ApiLink> - Provides persistent file system storage with caching (default client).
-- [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient) - Manages storage on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python). You can find more information about it in the [Apify SDK documentation](https://docs.apify.com/sdk/python/docs/overview/introduction).
+- [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient) - Manages storage on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python). You can find more information about it in the [Apify SDK documentation](https://docs.apify.com/sdk/python/).
 
 ```mermaid
 ---
@@ -332,7 +332,7 @@ Crawlee provides several implementations of the event manager:
 
 - <ApiLink to="class/EventManager">`EventManager`</ApiLink> is the base class for event management in Crawlee.
 - <ApiLink to="class/LocalEventManager">`LocalEventManager`</ApiLink> extends the base event manager for local environments by automatically emitting `SYSTEM_INFO` events at regular intervals. This provides real-time system metrics including CPU usage and memory consumption, which are essential for internal components like the <ApiLink to="class/Snapshotter">`Snapshotter`</ApiLink> and <ApiLink to="class/AutoscaledPool">`AutoscaledPool`</ApiLink>.
-- [`ApifyEventManager`](https://docs.apify.com/sdk/python/reference/class/PlatformEventManager) - Manages events on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://docs.apify.com/sdk/python/).
+- [`ApifyEventManager`](https://docs.apify.com/sdk/python/reference/class/ApifyEventManager) - Manages events on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://docs.apify.com/sdk/python/).
 
 :::info
 
diff --git a/docs/guides/http_clients.mdx b/docs/guides/http_clients.mdx
@@ -13,7 +13,7 @@ import ParselHttpxExample from '!!raw-loader!roa-loader!./code_examples/http_cli
 import ParselCurlImpersonateExample from '!!raw-loader!roa-loader!./code_examples/http_clients/parsel_curl_impersonate_example.py';
 import ParselImpitExample from '!!raw-loader!roa-loader!./code_examples/http_clients/parsel_impit_example.py';
 
-HTTP clients are utilized by HTTP-based crawlers (e.g., <ApiLink to="class/ParselCrawler">`ParselCrawler`</ApiLink> and <ApiLink to="class/BeautifulSoupCrawler">`BeautifulSoupCrawler`</ApiLink>) to communicate with web servers. They use external HTTP libraries for communication rather than a browser. Examples of such libraries include [httpx](https://pypi.org/project/httpx/), [aiohttp](https://pypi.org/project/aiohttp/), [curl-cffi](https://pypi.org/project/curl-cffi/), and [impit](https://apify.github.io/impit/). After retrieving page content, an HTML parsing library is typically used to facilitate data extraction. Examples of such libraries include [beautifulsoup](https://pypi.org/project/beautifulsoup4/), [parsel](https://pypi.org/project/parsel/), [selectolax](https://pypi.org/project/selectolax/), and [pyquery](https://pypi.org/project/pyquery/). These crawlers are faster than browser-based crawlers but cannot execute client-side JavaScript.
+HTTP clients are utilized by HTTP-based crawlers (e.g., <ApiLink to="class/ParselCrawler">`ParselCrawler`</ApiLink> and <ApiLink to="class/BeautifulSoupCrawler">`BeautifulSoupCrawler`</ApiLink>) to communicate with web servers. They use external HTTP libraries for communication rather than a browser. Examples of such libraries include [httpx](https://pypi.org/project/httpx/), [aiohttp](https://pypi.org/project/aiohttp/), [curl-cffi](https://pypi.org/project/curl-cffi/), and [impit](https://pypi.org/project/impit/). After retrieving page content, an HTML parsing library is typically used to facilitate data extraction. Examples of such libraries include [beautifulsoup](https://pypi.org/project/beautifulsoup4/), [parsel](https://pypi.org/project/parsel/), [selectolax](https://pypi.org/project/selectolax/), and [pyquery](https://pypi.org/project/pyquery/). These crawlers are faster than browser-based crawlers but cannot execute client-side JavaScript.
 
 ```mermaid
 ---
diff --git a/docs/introduction/02_first_crawler.mdx b/docs/introduction/02_first_crawler.mdx
@@ -86,7 +86,7 @@ When you run this code, you'll see exactly the same output as with the earlier,
 
 :::info
 
-This method not only makes the code shorter, it will help with performance too! Internally it calls  <ApiLink to="class/RequestQueue#add_requests_batched">`RequestQueue.add_requests_batched`</ApiLink> method. It will wait only for the initial batch of 1000 requests to be added to the queue before resolving, which means the processing will start almost instantly. After that, it will continue adding the rest of the requests in the background (again, in batches of 1000 items, once every second).
+This method not only makes the code shorter, it will help with performance too! Internally it calls  <ApiLink to="class/RequestQueue#add_requests">`RequestQueue.add_requests`</ApiLink> method. It will wait only for the initial batch of 1000 requests to be added to the queue before resolving, which means the processing will start almost instantly. After that, it will continue adding the rest of the requests in the background (again, in batches of 1000 items, once every second).
 
 :::
 
diff --git a/docs/upgrading/upgrading_to_v1.md b/docs/upgrading/upgrading_to_v1.md
@@ -34,7 +34,7 @@ HeaderGeneratorOptions(browsers=['safari'])
 
 ## New default HTTP client
 
-Crawlee v1.0 now uses `ImpitHttpClient` (based on [impit](https://apify.github.io/impit/) library) as the **default HTTP client**, replacing `HttpxHttpClient` (based on [httpx](https://www.python-httpx.org/) library).
+Crawlee v1.0 now uses `ImpitHttpClient` (based on [impit](https://github.com/apify/impit) library) as the **default HTTP client**, replacing `HttpxHttpClient` (based on [httpx](https://www.python-httpx.org/) library).
 
 If you want to keep using `HttpxHttpClient`, install Crawlee with `httpx` extra, e.g. using pip:
 
diff --git a/website/versioned_docs/version-0.6/deployment/google_cloud_run.mdx b/website/versioned_docs/version-0.6/deployment/google_cloud_run.mdx
@@ -17,7 +17,7 @@ GCP Cloud Run allows you to deploy using Docker containers, giving you full cont
 
 ## Preparing the project
 
-We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://www.uvicorn.org/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
+We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://uvicorn.dev/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
 
 :::info
 
diff --git a/website/versioned_docs/version-0.6/examples/add_data_to_dataset.mdx b/website/versioned_docs/version-0.6/examples/add_data_to_dataset.mdx
@@ -12,7 +12,7 @@ import BeautifulSoupExample from '!!raw-loader!roa-loader!./code_examples/add_da
 import PlaywrightExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_pw.py';
 import DatasetExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_dataset.py';
 
-This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction#open">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction#open">`push_data`</ApiLink> function.
+This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction">`push_data`</ApiLink> function.
 
 <Tabs groupId="main">
     <TabItem value="BeautifulSoupCrawler" label="BeautifulSoupCrawler">
diff --git a/website/versioned_docs/version-0.6/introduction/01_setting_up.mdx b/website/versioned_docs/version-0.6/introduction/01_setting_up.mdx
@@ -111,7 +111,7 @@ First, ensure you have Pipx installed. You can check if Pipx is installed by run
 pipx --version
 ```
 
-If Pipx is not installed, follow the official [installation guide](https://pipx.pypa.io/stable/installation/).
+If Pipx is not installed, follow the official [installation guide](https://pipx.pypa.io/stable/).
 
 Then, run the Crawlee CLI using Pipx and choose from the available templates:
 
diff --git a/website/versioned_docs/version-1.7/deployment/google_cloud_run.mdx b/website/versioned_docs/version-1.7/deployment/google_cloud_run.mdx
@@ -17,7 +17,7 @@ GCP Cloud Run allows you to deploy using Docker containers, giving you full cont
 
 ## Preparing the project
 
-We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://www.uvicorn.org/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
+We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://uvicorn.dev/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
 
 :::info
 
diff --git a/website/versioned_docs/version-1.7/examples/add_data_to_dataset.mdx b/website/versioned_docs/version-1.7/examples/add_data_to_dataset.mdx
@@ -12,7 +12,7 @@ import BeautifulSoupExample from '!!raw-loader!roa-loader!./code_examples/add_da
 import PlaywrightExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_pw.py';
 import DatasetExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_dataset.py';
 
-This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction#open">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction#open">`push_data`</ApiLink> function.
+This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction">`push_data`</ApiLink> function.
 
 <Tabs groupId="main">
     <TabItem value="BeautifulSoupCrawler" label="BeautifulSoupCrawler">
diff --git a/website/versioned_docs/version-1.7/guides/architecture_overview.mdx b/website/versioned_docs/version-1.7/guides/architecture_overview.mdx
@@ -70,7 +70,7 @@ PlaywrightCrawler --|> StagehandCrawler
 
 ### HTTP crawlers
 
-HTTP crawlers use HTTP clients to fetch pages and parse them with HTML parsing libraries. They are fast and efficient for sites that do not require JavaScript rendering. HTTP clients are Crawlee components that wrap around HTTP libraries like [httpx](https://www.python-httpx.org/), [curl-impersonate](https://github.com/lwthiker/curl-impersonate) or [impit](https://apify.github.io/impit) and handle HTTP communication for requests and responses. You can learn more about them in the [HTTP clients guide](./http-clients).
+HTTP crawlers use HTTP clients to fetch pages and parse them with HTML parsing libraries. They are fast and efficient for sites that do not require JavaScript rendering. HTTP clients are Crawlee components that wrap around HTTP libraries like [httpx](https://www.python-httpx.org/), [curl-impersonate](https://github.com/lwthiker/curl-impersonate) or [impit](https://github.com/apify/impit) and handle HTTP communication for requests and responses. You can learn more about them in the [HTTP clients guide](./http-clients).
 
 HTTP crawlers inherit from <ApiLink to="class/AbstractHttpCrawler">`AbstractHttpCrawler`</ApiLink> and there are three crawlers that belong to this category:
 
@@ -235,7 +235,7 @@ Crawlee provides several built-in storage client implementations:
 
 - <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink> - Stores data in memory with no persistence (ideal for testing and fast operations).
 - <ApiLink to="class/FileSystemStorageClient">`FileSystemStorageClient`</ApiLink> - Provides persistent file system storage with caching (default client).
-- [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient) - Manages storage on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python). You can find more information about it in the [Apify SDK documentation](https://docs.apify.com/sdk/python/docs/overview/introduction).
+- [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient) - Manages storage on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python). You can find more information about it in the [Apify SDK documentation](https://docs.apify.com/sdk/python/).
 
 ```mermaid
 ---
@@ -332,7 +332,7 @@ Crawlee provides several implementations of the event manager:
 
 - <ApiLink to="class/EventManager">`EventManager`</ApiLink> is the base class for event management in Crawlee.
 - <ApiLink to="class/LocalEventManager">`LocalEventManager`</ApiLink> extends the base event manager for local environments by automatically emitting `SYSTEM_INFO` events at regular intervals. This provides real-time system metrics including CPU usage and memory consumption, which are essential for internal components like the <ApiLink to="class/Snapshotter">`Snapshotter`</ApiLink> and <ApiLink to="class/AutoscaledPool">`AutoscaledPool`</ApiLink>.
-- [`ApifyEventManager`](https://docs.apify.com/sdk/python/reference/class/PlatformEventManager) - Manages events on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://docs.apify.com/sdk/python/).
+- [`ApifyEventManager`](https://docs.apify.com/sdk/python/reference/class/ApifyEventManager) - Manages events on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://docs.apify.com/sdk/python/).
 
 :::info
 
diff --git a/website/versioned_docs/version-1.7/guides/http_clients.mdx b/website/versioned_docs/version-1.7/guides/http_clients.mdx
@@ -13,7 +13,7 @@ import ParselHttpxExample from '!!raw-loader!roa-loader!./code_examples/http_cli
 import ParselCurlImpersonateExample from '!!raw-loader!roa-loader!./code_examples/http_clients/parsel_curl_impersonate_example.py';
 import ParselImpitExample from '!!raw-loader!roa-loader!./code_examples/http_clients/parsel_impit_example.py';
 
-HTTP clients are utilized by HTTP-based crawlers (e.g., <ApiLink to="class/ParselCrawler">`ParselCrawler`</ApiLink> and <ApiLink to="class/BeautifulSoupCrawler">`BeautifulSoupCrawler`</ApiLink>) to communicate with web servers. They use external HTTP libraries for communication rather than a browser. Examples of such libraries include [httpx](https://pypi.org/project/httpx/), [aiohttp](https://pypi.org/project/aiohttp/), [curl-cffi](https://pypi.org/project/curl-cffi/), and [impit](https://apify.github.io/impit/). After retrieving page content, an HTML parsing library is typically used to facilitate data extraction. Examples of such libraries include [beautifulsoup](https://pypi.org/project/beautifulsoup4/), [parsel](https://pypi.org/project/parsel/), [selectolax](https://pypi.org/project/selectolax/), and [pyquery](https://pypi.org/project/pyquery/). These crawlers are faster than browser-based crawlers but cannot execute client-side JavaScript.
+HTTP clients are utilized by HTTP-based crawlers (e.g., <ApiLink to="class/ParselCrawler">`ParselCrawler`</ApiLink> and <ApiLink to="class/BeautifulSoupCrawler">`BeautifulSoupCrawler`</ApiLink>) to communicate with web servers. They use external HTTP libraries for communication rather than a browser. Examples of such libraries include [httpx](https://pypi.org/project/httpx/), [aiohttp](https://pypi.org/project/aiohttp/), [curl-cffi](https://pypi.org/project/curl-cffi/), and [impit](https://pypi.org/project/impit/). After retrieving page content, an HTML parsing library is typically used to facilitate data extraction. Examples of such libraries include [beautifulsoup](https://pypi.org/project/beautifulsoup4/), [parsel](https://pypi.org/project/parsel/), [selectolax](https://pypi.org/project/selectolax/), and [pyquery](https://pypi.org/project/pyquery/). These crawlers are faster than browser-based crawlers but cannot execute client-side JavaScript.
 
 ```mermaid
 ---
diff --git a/website/versioned_docs/version-1.7/introduction/02_first_crawler.mdx b/website/versioned_docs/version-1.7/introduction/02_first_crawler.mdx
diff --git a/website/versioned_docs/version-1.7/upgrading/upgrading_to_v1.md b/website/versioned_docs/version-1.7/upgrading/upgrading_to_v1.md