
Commit 9a8921e

docs: fix broken links in current and versioned docs (#1977)
Fixes broken links across the documentation, found by running the [lychee](https://github.com/lycheeverse/lychee) link checker over the built docs (the same setup used in apify-docs).

Fixed in the current docs (`docs/`) and in both served versioned snapshots (`1.7` and `0.6`):

- `apify.github.io/impit` (dead docs site) → impit's GitHub repo / PyPI page, matching each list's existing convention
- `docs.apify.com/sdk/python/docs/overview/introduction` (404) → the SDK docs landing page
- `.../reference/class/PlatformEventManager` (404) → renamed to `ApifyEventManager`
- `www.uvicorn.org` (domain DNS is dead) → `uvicorn.dev`, Uvicorn's current docs site
- `class/PushDataFunction#open` (no such anchor) → `class/PushDataFunction`
- `class/RequestQueue#add_requests_batched` → `#add_requests` (method renamed in v1). Left unchanged in `0.6`, where the old name still exists.
- `pipx.pypa.io/stable/installation/` (404, `0.6` only) → `pipx.pypa.io/stable/`

Verified by rebuilding the docs (zero broken anchors from doc pages) and re-running lychee (zero errors) for all three versions. The auto-generated API reference and changelog were out of scope.
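The substitutions listed above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the commit does not say the edits were scripted, the function and dictionary names are hypothetical, and the impit links are left out because their new target (GitHub repo or PyPI page) depends on each list's existing convention.

```python
# Hypothetical sketch: the old -> new link targets are taken from the commit
# message; the names below are illustrative and not part of the repository.

# Applied to every docs version in this sketch.
COMMON_RULES = {
    "https://www.uvicorn.org/": "https://uvicorn.dev/",
    "https://docs.apify.com/sdk/python/docs/overview/introduction": "https://docs.apify.com/sdk/python/",
    "reference/class/PlatformEventManager": "reference/class/ApifyEventManager",
    "class/PushDataFunction#open": "class/PushDataFunction",
}

# The method was renamed in v1, so 0.6 keeps the old anchor.
V1_RULES = {
    "class/RequestQueue#add_requests_batched": "class/RequestQueue#add_requests",
}

# This URL was only broken in the 0.6 snapshot.
V06_RULES = {
    "https://pipx.pypa.io/stable/installation/": "https://pipx.pypa.io/stable/",
}


def fix_links(text: str, version: str = "current") -> str:
    """Return `text` with the known broken link targets replaced."""
    rules = dict(COMMON_RULES)
    rules.update(V06_RULES if version == "0.6" else V1_RULES)
    for old, new in rules.items():
        text = text.replace(old, new)
    return text
```

For example, `fix_links("[Uvicorn](https://www.uvicorn.org/)")` returns the same link pointing at `https://uvicorn.dev/`. The sketch rewrites link targets only; visible link text, such as the method name in the `RequestQueue` reference, still needs a manual edit.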
1 parent 970f93b commit 9a8921e

15 files changed

Lines changed: 19 additions & 19 deletions

docs/deployment/google_cloud_run.mdx

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ GCP Cloud Run allows you to deploy using Docker containers, giving you full cont

 ## Preparing the project

-We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://www.uvicorn.org/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
+We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://uvicorn.dev/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.

 :::info

docs/examples/add_data_to_dataset.mdx

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ import BeautifulSoupExample from '!!raw-loader!roa-loader!./code_examples/add_da
 import PlaywrightExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_pw.py';
 import DatasetExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_dataset.py';

-This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction#open">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction#open">`push_data`</ApiLink> function.
+This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction">`push_data`</ApiLink> function.

 <Tabs groupId="main">
 <TabItem value="BeautifulSoupCrawler" label="BeautifulSoupCrawler">

docs/guides/architecture_overview.mdx

Lines changed: 3 additions & 3 deletions
@@ -70,7 +70,7 @@ PlaywrightCrawler --|> StagehandCrawler

 ### HTTP crawlers

-HTTP crawlers use HTTP clients to fetch pages and parse them with HTML parsing libraries. They are fast and efficient for sites that do not require JavaScript rendering. HTTP clients are Crawlee components that wrap around HTTP libraries like [httpx](https://www.python-httpx.org/), [curl-impersonate](https://github.com/lwthiker/curl-impersonate) or [impit](https://apify.github.io/impit) and handle HTTP communication for requests and responses. You can learn more about them in the [HTTP clients guide](./http-clients).
+HTTP crawlers use HTTP clients to fetch pages and parse them with HTML parsing libraries. They are fast and efficient for sites that do not require JavaScript rendering. HTTP clients are Crawlee components that wrap around HTTP libraries like [httpx](https://www.python-httpx.org/), [curl-impersonate](https://github.com/lwthiker/curl-impersonate) or [impit](https://github.com/apify/impit) and handle HTTP communication for requests and responses. You can learn more about them in the [HTTP clients guide](./http-clients).

 HTTP crawlers inherit from <ApiLink to="class/AbstractHttpCrawler">`AbstractHttpCrawler`</ApiLink> and there are three crawlers that belong to this category:

@@ -235,7 +235,7 @@ Crawlee provides several built-in storage client implementations:

 - <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink> - Stores data in memory with no persistence (ideal for testing and fast operations).
 - <ApiLink to="class/FileSystemStorageClient">`FileSystemStorageClient`</ApiLink> - Provides persistent file system storage with caching (default client).
-- [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient) - Manages storage on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python). You can find more information about it in the [Apify SDK documentation](https://docs.apify.com/sdk/python/docs/overview/introduction).
+- [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient) - Manages storage on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python). You can find more information about it in the [Apify SDK documentation](https://docs.apify.com/sdk/python/).

 ```mermaid
 ---
@@ -332,7 +332,7 @@ Crawlee provides several implementations of the event manager:

 - <ApiLink to="class/EventManager">`EventManager`</ApiLink> is the base class for event management in Crawlee.
 - <ApiLink to="class/LocalEventManager">`LocalEventManager`</ApiLink> extends the base event manager for local environments by automatically emitting `SYSTEM_INFO` events at regular intervals. This provides real-time system metrics including CPU usage and memory consumption, which are essential for internal components like the <ApiLink to="class/Snapshotter">`Snapshotter`</ApiLink> and <ApiLink to="class/AutoscaledPool">`AutoscaledPool`</ApiLink>.
-- [`ApifyEventManager`](https://docs.apify.com/sdk/python/reference/class/PlatformEventManager) - Manages events on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://docs.apify.com/sdk/python/).
+- [`ApifyEventManager`](https://docs.apify.com/sdk/python/reference/class/ApifyEventManager) - Manages events on the [Apify platform](https://apify.com/) (cloud-based). It is implemented in the [Apify SDK](https://docs.apify.com/sdk/python/).

 :::info

docs/guides/http_clients.mdx

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ import ParselHttpxExample from '!!raw-loader!roa-loader!./code_examples/http_cli
 import ParselCurlImpersonateExample from '!!raw-loader!roa-loader!./code_examples/http_clients/parsel_curl_impersonate_example.py';
 import ParselImpitExample from '!!raw-loader!roa-loader!./code_examples/http_clients/parsel_impit_example.py';

-HTTP clients are utilized by HTTP-based crawlers (e.g., <ApiLink to="class/ParselCrawler">`ParselCrawler`</ApiLink> and <ApiLink to="class/BeautifulSoupCrawler">`BeautifulSoupCrawler`</ApiLink>) to communicate with web servers. They use external HTTP libraries for communication rather than a browser. Examples of such libraries include [httpx](https://pypi.org/project/httpx/), [aiohttp](https://pypi.org/project/aiohttp/), [curl-cffi](https://pypi.org/project/curl-cffi/), and [impit](https://apify.github.io/impit/). After retrieving page content, an HTML parsing library is typically used to facilitate data extraction. Examples of such libraries include [beautifulsoup](https://pypi.org/project/beautifulsoup4/), [parsel](https://pypi.org/project/parsel/), [selectolax](https://pypi.org/project/selectolax/), and [pyquery](https://pypi.org/project/pyquery/). These crawlers are faster than browser-based crawlers but cannot execute client-side JavaScript.
+HTTP clients are utilized by HTTP-based crawlers (e.g., <ApiLink to="class/ParselCrawler">`ParselCrawler`</ApiLink> and <ApiLink to="class/BeautifulSoupCrawler">`BeautifulSoupCrawler`</ApiLink>) to communicate with web servers. They use external HTTP libraries for communication rather than a browser. Examples of such libraries include [httpx](https://pypi.org/project/httpx/), [aiohttp](https://pypi.org/project/aiohttp/), [curl-cffi](https://pypi.org/project/curl-cffi/), and [impit](https://pypi.org/project/impit/). After retrieving page content, an HTML parsing library is typically used to facilitate data extraction. Examples of such libraries include [beautifulsoup](https://pypi.org/project/beautifulsoup4/), [parsel](https://pypi.org/project/parsel/), [selectolax](https://pypi.org/project/selectolax/), and [pyquery](https://pypi.org/project/pyquery/). These crawlers are faster than browser-based crawlers but cannot execute client-side JavaScript.

 ```mermaid
 ---

docs/introduction/02_first_crawler.mdx

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ When you run this code, you'll see exactly the same output as with the earlier,

 :::info

-This method not only makes the code shorter, it will help with performance too! Internally it calls <ApiLink to="class/RequestQueue#add_requests_batched">`RequestQueue.add_requests_batched`</ApiLink> method. It will wait only for the initial batch of 1000 requests to be added to the queue before resolving, which means the processing will start almost instantly. After that, it will continue adding the rest of the requests in the background (again, in batches of 1000 items, once every second).
+This method not only makes the code shorter, it will help with performance too! Internally it calls <ApiLink to="class/RequestQueue#add_requests">`RequestQueue.add_requests`</ApiLink> method. It will wait only for the initial batch of 1000 requests to be added to the queue before resolving, which means the processing will start almost instantly. After that, it will continue adding the rest of the requests in the background (again, in batches of 1000 items, once every second).

 :::

docs/upgrading/upgrading_to_v1.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ HeaderGeneratorOptions(browsers=['safari'])

 ## New default HTTP client

-Crawlee v1.0 now uses `ImpitHttpClient` (based on [impit](https://apify.github.io/impit/) library) as the **default HTTP client**, replacing `HttpxHttpClient` (based on [httpx](https://www.python-httpx.org/) library).
+Crawlee v1.0 now uses `ImpitHttpClient` (based on [impit](https://github.com/apify/impit) library) as the **default HTTP client**, replacing `HttpxHttpClient` (based on [httpx](https://www.python-httpx.org/) library).

 If you want to keep using `HttpxHttpClient`, install Crawlee with `httpx` extra, e.g. using pip:

website/versioned_docs/version-0.6/deployment/google_cloud_run.mdx

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ GCP Cloud Run allows you to deploy using Docker containers, giving you full cont

 ## Preparing the project

-We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://www.uvicorn.org/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
+We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://uvicorn.dev/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.

 :::info

website/versioned_docs/version-0.6/examples/add_data_to_dataset.mdx

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ import BeautifulSoupExample from '!!raw-loader!roa-loader!./code_examples/add_da
 import PlaywrightExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_pw.py';
 import DatasetExample from '!!raw-loader!roa-loader!./code_examples/add_data_to_dataset_dataset.py';

-This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction#open">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction#open">`push_data`</ApiLink> function.
+This example demonstrates how to store extracted data into datasets using the <ApiLink to="class/PushDataFunction">`context.push_data`</ApiLink> helper function. If the specified dataset does not already exist, it will be created automatically. Additionally, you can save data to custom datasets by providing `dataset_id` or `dataset_name` parameters to the <ApiLink to="class/PushDataFunction">`push_data`</ApiLink> function.

 <Tabs groupId="main">
 <TabItem value="BeautifulSoupCrawler" label="BeautifulSoupCrawler">

website/versioned_docs/version-0.6/introduction/01_setting_up.mdx

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ First, ensure you have Pipx installed. You can check if Pipx is installed by run
 pipx --version
 ```

-If Pipx is not installed, follow the official [installation guide](https://pipx.pypa.io/stable/installation/).
+If Pipx is not installed, follow the official [installation guide](https://pipx.pypa.io/stable/).

 Then, run the Crawlee CLI using Pipx and choose from the available templates:

website/versioned_docs/version-1.7/deployment/google_cloud_run.mdx

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ GCP Cloud Run allows you to deploy using Docker containers, giving you full cont

 ## Preparing the project

-We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://www.uvicorn.org/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.
+We'll prepare our project using [Litestar](https://litestar.dev/) and the [Uvicorn](https://uvicorn.dev/) web server. The HTTP server handler will wrap the crawler to communicate with clients. Because the Cloud Run platform sees only an opaque Docker container, we have to take care of this bit ourselves.

 :::info
