Commit 3b212ce
fix(docs): apply api-nr selectors to nargo-doc pages
Follow-up to #23042: that PR fixed the indexing rate-limit problem,
but every aztec-nr-api page still emitted 0 records. Root cause: the
docsearch-scraper resolves a URL's selectors_key by walking start_urls
in order and matching with `re.search` (substring), breaking on first
match. With the homepage URL listed first, every aztec-nr-api URL
matched it (since "https://docs.aztec.network/" is a substring of every
aztec-nr-api URL) and was assigned the default selectors. The default
selectors target Docusaurus-only markup (`header h1`, `article p`,
`menu__list ... active` XPath), none of which exist on rustdoc-style
nargo-doc pages, so the scraper found no nodes and emitted no records.
Fix: list the more-specific aztec-nr-api start_url first so it wins
the selectors_key match for those URLs. The homepage start_url then
serves as the catch-all for everything else.
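
The first-match behavior described above can be reproduced in a few lines. This is a minimal sketch, not the scraper's actual code: `first_match_key`, the `/aztec-nr-api/` path, the page URL, and the `api-nr` / `default` key names are assumptions chosen for illustration. Only the "iterate in order, `re.search`, stop at first hit" logic comes from the description above.

```python
import re

# Hypothetical start_urls entries; the real config may differ in paths and keys.
HOMEPAGE = {"url": "https://docs.aztec.network/", "selectors_key": "default"}
API_NR = {"url": "https://docs.aztec.network/aztec-nr-api/", "selectors_key": "api-nr"}


def first_match_key(page_url, start_urls):
    """Return the selectors_key of the first start_url that matches page_url.

    Mirrors the described behavior: walk the list in declaration order,
    test each entry with re.search (an unanchored match), stop at the first hit.
    """
    for entry in start_urls:
        if re.search(entry["url"], page_url):
            return entry["selectors_key"]
    return "default"


# Illustrative page URL (assumed, not taken from the real site map).
page = "https://docs.aztec.network/aztec-nr-api/index.html"

# Old order: the homepage pattern matches every URL on the site, so it always wins.
before = first_match_key(page, [HOMEPAGE, API_NR])

# New order: the more-specific pattern is tested first and wins for API pages.
after = first_match_key(page, [API_NR, HOMEPAGE])

print(before, after)
```

With the homepage first, the API page resolves to the default selectors; with the more-specific entry first, it resolves to the API selectors, while any non-API URL still falls through to the homepage entry.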
Reference: scraper/src/strategies/abstract_strategy.py
get_selectors_set_key() iterates start_urls in declaration order and
breaks on the first re.search hit.
1 parent 81e4f15 commit 3b212ce
1 file changed
Lines changed: 4 additions & 4 deletions
Diff (line content not captured): original lines 4 to 7 removed; four lines added at new lines 8 to 11; original lines 8 to 11 become new lines 4 to 7; lines 1 to 3 and 12 to 14 unchanged.