Commit 81e4f15
## Summary
Search on docs.aztec.network has been broken since #22861 was merged.
The nightly Typesense docsearch-scraper run dropped from indexing
**~12,457 records to 48 records** and has stayed there.
### Root cause
Two compounding regressions from #22861:
1. **`augment_sitemap.js` flooded the scraper.** It appends every
`aztec-nr-api/mainnet/**/*.html` URL to the published `sitemap.xml`,
which the scraper then queues for crawling via `sitemap_urls`. The
previous day's baseline `sitemap.xml` had hundreds of URLs; after the PR it
had thousands. The resulting request volume tripped Netlify's rate limiter,
which began returning HTTP 403 on ~36% of responses, including every
`/developers/tags/*` page and many content pages that had worked the day
before.
2. **The `api-nr` `text` selector matched nothing.** It targeted
`.comments p, .comments li, .item-description` on nargo-doc pages.
`.item-description` is empty on most auto-generated index pages, so the
scraper produced **`0 records`** for every `aztec-nr-api/mainnet/*` URL
it managed to crawl.
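To see how much of the published sitemap comes from the augmentation step, the `<loc>` entries can be counted directly. This is a minimal standard-library sketch; it assumes `sitemap.xml` follows the standard sitemaps.org `urlset` schema, and the function name `count_sitemap_urls` and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace (sitemaps.org schema 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_sitemap_urls(xml_text: str, marker: str = "/aztec-nr-api/mainnet/") -> tuple[int, int]:
    """Return (total <loc> entries, entries whose URL contains `marker`)."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    api_locs = [loc for loc in locs if marker in loc]
    return len(locs), len(api_locs)


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the built sitemap.xml instead.
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.aztec.network/developers/overview</loc></url>
  <url><loc>https://docs.aztec.network/aztec-nr-api/mainnet/index.html</loc></url>
  <url><loc>https://docs.aztec.network/aztec-nr-api/mainnet/aztec/index.html</loc></url>
</urlset>"""
    total, api = count_sitemap_urls(sample)
    print(f"{total} URLs, {api} from aztec-nr-api/mainnet")  # 3 URLs, 2 from aztec-nr-api/mainnet
```

Running this against the baseline and post-PR sitemaps would show how much of the crawl queue the augmentation added.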
Evidence from the most recent nightly run: `request_count=1677, 200=814,
403=609`, `Nb hits: 48`. The previous-day baseline run reported `Nb hits:
12457`. The workflow exited 0 in both cases because the Docker container
exits 0 regardless of how many records it indexes.
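The ~36% figure follows directly from these counters. A minimal sketch, assuming the scraper output contains the counters in the `key=value` and `Nb hits: N` forms quoted above (the real log may format them differently); `parse_scraper_stats` is a hypothetical helper.

```python
import re


def parse_scraper_stats(log_text: str) -> dict[str, int]:
    """Extract the request count, 200/403 counts, and final hit count from scraper output."""
    stats: dict[str, int] = {}
    if m := re.search(r"request_count=(\d+)", log_text):
        stats["requests"] = int(m.group(1))
    for status in ("200", "403"):
        # \b stops e.g. "1200=" from matching the "200" pattern.
        if m := re.search(rf"\b{status}=(\d+)", log_text):
            stats[status] = int(m.group(1))
    if m := re.search(r"Nb hits:\s*(\d+)", log_text):
        stats["hits"] = int(m.group(1))
    return stats


if __name__ == "__main__":
    run = "request_count=1677, 200=814, 403=609 ... Nb hits: 48"
    s = parse_scraper_stats(run)
    print(f"403 share: {s['403'] / s['requests']:.1%}")  # 403 share: 36.3%
```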
## Fix
`docs/typesense.config.json`:
- Remove `sitemap_urls`. Keep `augment_sitemap.js` and the augmented
sitemap in place for SEO; rely on link traversal from the two
`start_urls` for indexing. This shrinks the scraper's request volume
back toward baseline.
- Drop `sitemap_alternate_links: true` (only affects sitemap-driven
crawling, which we no longer do).
- Broaden the `api-nr` `text` selector to `main .comments p, main
.comments li, main .padded-description, main .item-description, main
.struct-field, main li`. Verified against the checked-in nargo-doc HTML
in `docs/static/aztec-nr-api/mainnet/`: 465 files use `.comments`,
struct/fn pages use `.padded-description`, and module-index pages need
`main li` to surface the names of nested items.
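The file-level check against the checked-in nargo-doc HTML can be approximated with the standard library. This sketch counts how many `.html` files under a directory contain each CSS class. It ignores the `main` ancestor in the real selectors and does not evaluate `main li`, so treat its numbers as a rough sanity check rather than a reproduction of the scraper's matching. `ClassFinder` and `count_files_with_classes` are hypothetical names.

```python
import os
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records which of the target CSS classes appear on any element in a document."""

    def __init__(self, targets):
        super().__init__()
        self.targets = set(targets)
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.found.update(self.targets.intersection(value.split()))


def count_files_with_classes(
    root, targets=("comments", "padded-description", "item-description", "struct-field")
):
    """Map each class name to the number of .html files under `root` that use it."""
    counts = {t: 0 for t in targets}
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if not fname.endswith(".html"):
                continue
            parser = ClassFinder(targets)
            with open(os.path.join(dirpath, fname), encoding="utf-8") as fh:
                parser.feed(fh.read())
            for t in parser.found:
                counts[t] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the description above; run from the repository root.
    print(count_files_with_classes("docs/static/aztec-nr-api/mainnet"))
```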
`.github/workflows/docs-typesense.yml`:
- Capture the scraper output and fail the run if fewer than 5,000
records are indexed. The container exits 0 even when the config is
broken, which let the 48-record regression land silently and stay broken
across many nightly runs. The threshold catches the failure mode while
leaving plenty of headroom below the 12k baseline.
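The record-count gate reduces to a parse-and-compare on the captured output. The actual workflow step is not reproduced here; this Python sketch only illustrates the logic, assuming the output contains a `Nb hits: N` line. `check_hits` is a hypothetical name; `::error::` is the GitHub Actions workflow command for emitting an error annotation.

```python
import re
import sys

MIN_RECORDS = 5000  # threshold described in this PR


def check_hits(scraper_output: str, minimum: int = MIN_RECORDS) -> int:
    """Return a process exit code: 0 if enough records were indexed, 1 otherwise."""
    m = re.search(r"Nb hits:\s*(\d+)", scraper_output)
    if m is None:
        # No summary line at all (e.g. the scraper crashed): treat as failure.
        print("::error::could not find 'Nb hits' in scraper output")
        return 1
    hits = int(m.group(1))
    if hits < minimum:
        print(f"::error::only {hits} records indexed (expected >= {minimum})")
        return 1
    print(f"indexed {hits} records")
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_hits.py <captured-scraper-log>
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.exit(check_hits(fh.read()))
```

Treating a missing `Nb hits` line as a failure is a deliberate choice in this sketch: a run that never prints the summary should fail rather than pass silently.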
## Test plan
- [ ] Manually dispatch the `Docs Scraper` workflow on this branch via
`workflow_dispatch` and confirm `Nb hits` returns to baseline (>>5,000)
and the run logs no longer report a flood of 403s.
- [ ] After merge, confirm site search on https://docs.aztec.network/
returns results for common queries (e.g. `PXE`, `deploy`, `account`,
`ContractClassId`).
- [ ] Confirm Aztec.nr API entries (e.g. searching for
`ContractClassId`, `protocol_types`) now appear in search results.
2 files changed
Lines changed: 39 additions & 31 deletions