Commit 5951e05
fix(docs): apply api-nr selectors to nargo-doc pages (#23049)
## Summary
Follow-up to #23042. That PR restored the overall search index from 48
records back to ~12,360 records, but the underlying goal of #22861
(making Aztec.nr API pages searchable) is still not met:
**all 2,222 crawled `aztec-nr-api/mainnet/...` URLs emit `0 records`
in the DocSearch summary**.
## Root cause
The docsearch-scraper resolves a URL's `selectors_key` in
[`abstract_strategy.py`
`get_selectors_set_key()`](https://github.com/typesense/typesense-docsearch-scraper/blob/master/scraper/src/strategies/abstract_strategy.py):
it walks `start_urls` in declaration order, matches each with
`re.search` (a substring search, not a prefix anchor), and breaks on
the first match.
Our config listed the homepage start_url first:
```json
"start_urls": [
  {"url": "https://docs.aztec.network/", "page_rank": 10},
  {"url": "https://docs.aztec.network/aztec-nr-api/mainnet/",
   "selectors_key": "api-nr", "page_rank": 2}
]
```
Because `"https://docs.aztec.network/"` is a substring of every
aztec-nr-api URL, the homepage entry always matched first, so every API
URL was assigned `selectors_key: "default"` and the `api-nr`
selectors were never used.
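
The first-match behavior described above can be reproduced with a minimal sketch. This is not the scraper's actual code: `resolve_selectors_key` is a hypothetical stand-in for `get_selectors_set_key()`, it assumes each start URL is matched as a literal pattern, and the page URL is an illustrative path under the API prefix.

```python
import re


def resolve_selectors_key(start_urls, page_url):
    """Return the selectors_key of the first start_url that matches page_url."""
    for entry in start_urls:
        # re.search matches anywhere in the string, so a short URL such as
        # the site root matches every page beneath it.
        if re.search(re.escape(entry["url"]), page_url):
            # Entries without an explicit key use the default selectors.
            return entry.get("selectors_key", "default")
    return "default"


# Same declaration order as the config above: homepage entry first.
OLD_ORDER = [
    {"url": "https://docs.aztec.network/", "page_rank": 10},
    {"url": "https://docs.aztec.network/aztec-nr-api/mainnet/",
     "selectors_key": "api-nr", "page_rank": 2},
]

# Hypothetical API page URL, used only for illustration.
api_page = "https://docs.aztec.network/aztec-nr-api/mainnet/index.html"

old_order_key = resolve_selectors_key(OLD_ORDER, api_page)
print(old_order_key)  # the homepage entry wins, so this is "default"
```

The loop never reaches the `api-nr` entry for any page on the site, because the homepage URL matches first every time.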
The default selectors target Docusaurus markup (`header h1`, `article p`,
`menu__list ... active` XPath); none of those nodes exist on
rustdoc-style nargo-doc pages, so the scraper found nothing and emitted
zero records on every API page.
## Fix
Swap the order so the more-specific aztec-nr-api start_url is matched
first:
```json
"start_urls": [
  {"url": "https://docs.aztec.network/aztec-nr-api/mainnet/",
   "selectors_key": "api-nr", "page_rank": 2},
  {"url": "https://docs.aztec.network/", "page_rank": 10}
]
```
Now `/aztec-nr-api/mainnet/...` URLs hit the api-nr entry; everything
else falls through to the homepage entry. This also matches the standard
docsearch convention of listing the most-specific URL first.
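
To guard against the same ordering mistake in future config edits, a small check along these lines could flag shadowed entries. It is only a sketch: `find_shadowed` is a hypothetical helper, not part of the scraper or this repository, and it assumes start URLs are plain strings matched by substring.

```python
def find_shadowed(start_urls):
    """Return (earlier, later) URL pairs where the earlier entry shadows the later one."""
    shadowed = []
    for i, earlier in enumerate(start_urls):
        for later in start_urls[i + 1:]:
            # If the earlier URL is a substring of the later one, first-match
            # resolution picks the earlier entry for every page under the later URL.
            if earlier["url"] in later["url"]:
                shadowed.append((earlier["url"], later["url"]))
    return shadowed


HOMEPAGE = {"url": "https://docs.aztec.network/", "page_rank": 10}
API_NR = {"url": "https://docs.aztec.network/aztec-nr-api/mainnet/",
          "selectors_key": "api-nr", "page_rank": 2}

print(find_shadowed([HOMEPAGE, API_NR]))  # old order: one shadowed pair
print(find_shadowed([API_NR, HOMEPAGE]))  # new order: no shadowed pairs
```

An empty result means no entry is unreachable because of a broader URL listed before it.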
## Test plan
- [ ] Manually dispatch the `Docs Scraper` workflow on this branch via
`workflow_dispatch`. Confirm a non-trivial fraction of
`aztec-nr-api/mainnet/...` lines in the DocSearch summary report
`> 0 records`.
- [ ] Confirm the overall `Nb hits` stays comfortably above the 5,000
threshold and ideally lands meaningfully above the previous run's
12,360.
- [ ] After merge, search docs.aztec.network for an Aztec.nr identifier
(e.g. `ContractClassId`, `balance_set`, `compute_log_tag`) and
confirm the API reference pages appear in results.

1 file changed
Lines changed: 4 additions & 4 deletions