Commit 5951e05

fix(docs): apply api-nr selectors to nargo-doc pages (#23049)
## Summary

Follow-up to #23042. That PR restored the overall search index from 48 records back to ~12,360 records, but the underlying goal of #22861 (making Aztec.nr API pages searchable) is still not actually working: **all 2,222 crawled `aztec-nr-api/mainnet/...` URLs emit `0 records` in the DocSearch summary**.

## Root cause

The docsearch-scraper resolves a URL's `selectors_key` in [`abstract_strategy.py` `get_selectors_set_key()`](https://github.com/typesense/typesense-docsearch-scraper/blob/master/scraper/src/strategies/abstract_strategy.py): it walks `start_urls` in declaration order, matches each with `re.search` (a substring search, not a prefix anchor), and breaks on the first match. Our config listed the homepage start_url first:

```json
"start_urls": [
  {"url": "https://docs.aztec.network/", "page_rank": 10},
  {"url": "https://docs.aztec.network/aztec-nr-api/mainnet/", "selectors_key": "api-nr", "page_rank": 2}
]
```

Because `"https://docs.aztec.network/"` is a substring of every aztec-nr-api URL, the homepage entry always matched first, so every API URL was assigned `selectors_key: "default"` and the `api-nr` selectors were never used. The default selectors target Docusaurus markup (`header h1`, `article p`, `menu__list ... active` XPath); none of those nodes exist on rustdoc-style nargo-doc pages, so the scraper found nothing and emitted zero records on every API page.

## Fix

Swap the order so the more-specific aztec-nr-api start_url is matched first:

```json
"start_urls": [
  {"url": "https://docs.aztec.network/aztec-nr-api/mainnet/", "selectors_key": "api-nr", "page_rank": 2},
  {"url": "https://docs.aztec.network/", "page_rank": 10}
]
```

Now `/aztec-nr-api/mainnet/...` URLs hit the api-nr entry; everything else falls through to the homepage entry. This also matches the standard docsearch convention of listing the most-specific URL first.

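
The first-match behavior described above can be reproduced in a few lines. This is a minimal sketch, not the scraper's actual code: `resolve_selectors_key` is a hypothetical stand-in for `get_selectors_set_key()`, and the two page URLs are made up for illustration. It only assumes what the root-cause analysis states, namely declaration-order iteration, `re.search`, and a fallback to `"default"`.

```python
import re


def resolve_selectors_key(start_urls, page_url):
    """Hypothetical stand-in: return the key of the first start_url found in page_url."""
    for entry in start_urls:
        # re.search matches anywhere in the string, so a short URL
        # shadows every longer URL that contains it.
        if re.search(entry["url"], page_url):
            return entry.get("selectors_key", "default")
    return "default"


HOMEPAGE = {"url": "https://docs.aztec.network/", "page_rank": 10}
API_NR = {
    "url": "https://docs.aztec.network/aztec-nr-api/mainnet/",
    "selectors_key": "api-nr",
    "page_rank": 2,
}

# Illustrative page URLs (assumed, not taken from the crawl log).
api_page = "https://docs.aztec.network/aztec-nr-api/mainnet/index.html"
guide_page = "https://docs.aztec.network/developers/"

before = resolve_selectors_key([HOMEPAGE, API_NR], api_page)   # old order
after = resolve_selectors_key([API_NR, HOMEPAGE], api_page)    # new order
other = resolve_selectors_key([API_NR, HOMEPAGE], guide_page)  # non-API page

print(before, after, other)  # default api-nr default
```

With the old order the API page resolves to `default`; with the new order it resolves to `api-nr`, while non-API pages still fall through to the homepage entry.
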
## Test plan

- [ ] Manually dispatch the `Docs Scraper` workflow on this branch via `workflow_dispatch`. Confirm a non-trivial fraction of `aztec-nr-api/mainnet/...` lines in the DocSearch summary report `> 0 records`.
- [ ] Confirm the overall `Nb hits` stays comfortably above the 5,000 threshold and ideally lands meaningfully above the previous run's 12,360.
- [ ] After merge, search docs.aztec.network for an Aztec.nr identifier (e.g. `ContractClassId`, `balance_set`, `compute_log_tag`) and confirm the API reference pages appear in results.

2 parents 81e4f15 + 3b212ce commit 5951e05

1 file changed

Lines changed: 4 additions & 4 deletions

docs/typesense.config.json

@@ -1,14 +1,14 @@
 {
   "index_name": "aztec-docs",
   "start_urls": [
-    {
-      "url": "https://docs.aztec.network/",
-      "page_rank": 10
-    },
     {
       "url": "https://docs.aztec.network/aztec-nr-api/mainnet/",
       "selectors_key": "api-nr",
       "page_rank": 2
+    },
+    {
+      "url": "https://docs.aztec.network/",
+      "page_rank": 10
     }
   ],
   "stop_urls": [

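Since the bug was purely an ordering problem in `start_urls`, a small lint could catch a regression before the scraper runs. The following is a sketch under assumptions: `find_shadowed` is a hypothetical helper, not part of the scraper or this repository, and it assumes the same `re.search` first-match semantics described in the commit message.

```python
import re


def find_shadowed(start_urls):
    """Return (earlier_url, later_url) pairs where the earlier entry would win for the later URL."""
    shadowed = []
    for i, earlier in enumerate(start_urls):
        for later in start_urls[i + 1:]:
            # If the earlier pattern is found inside the later URL, pages under
            # the later URL can never reach the later entry's selectors.
            if re.search(earlier["url"], later["url"]):
                shadowed.append((earlier["url"], later["url"]))
    return shadowed


HOMEPAGE = {"url": "https://docs.aztec.network/", "page_rank": 10}
API_NR = {
    "url": "https://docs.aztec.network/aztec-nr-api/mainnet/",
    "selectors_key": "api-nr",
    "page_rank": 2,
}

OLD_ORDER = [HOMEPAGE, API_NR]  # order before this commit
NEW_ORDER = [API_NR, HOMEPAGE]  # order after this commit

print(find_shadowed(OLD_ORDER))  # one shadowed pair: homepage hides api-nr
print(find_shadowed(NEW_ORDER))  # []
```

In practice the list would come from `json.load` on `docs/typesense.config.json`; the inline dictionaries here just mirror the two entries in the diff.
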