fix(xref): preserve original case for dfn terms#508
Open
marcoscaceres wants to merge 7 commits intomainfrom
Open
fix(xref): preserve original case for dfn terms#508marcoscaceres wants to merge 7 commits intomainfrom
marcoscaceres wants to merge 7 commits intomainfrom
Conversation
* feat(xref): add headings lookup API for cross-spec section links
Extends the xref infrastructure to read and serve section heading data
from WebRef's ed/headings/ directory. This enables ReSpec's [[[SPEC#id]]]
syntax to display actual heading text instead of just the spec title.
New endpoint: POST /xref/headings
Request: { queries: [{ spec: 'fetch', id: 'cookie-header' }] }
Response: { result: [{ spec, id, title, number, href, level, specTitle }] }
Changes:
- scraper.ts: reads ed/headings/*.json during update, writes headings.json
- store.ts: loads headings data, adds getHeading(spec, id) lookup method
- headings.post.ts: new route handler
- index.ts: registers the /xref/headings POST endpoint
- update.ts: also triggers on ed/headings/ changes in webref webhook
* fix: address Copilot review feedback on headings API
- store.ts: index headings by id for O(1) lookup instead of linear scan;
precompute specTitleByShortname map; split readJson into required/optional
variants so missing xref.json still throws
- scraper.ts: fix doc comment on readAllHeadings
- headings.post.ts: validate queries is an array before mapping
* refactor(xref): address reviewer feedback on headings API
- Move `readdir` to top-level fs/promises import in scraper.ts
- Add generic type `readJSON<T>` with HeadingsJSON interface to reduce `any`
- Use typed readJSON<T> calls in getAllData and readAllHeadings
- Change specTitleByShortname to Map<string, string> in store.ts
- Add per-item validation (spec/id are strings) in headings.post.ts
Agent-Logs-Url: https://github.com/speced/respec-web-services/sessions/e14c5758-5d6f-4324-bb98-77839d96f660
* Apply suggestion from @sidvishnoi
Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com>
* Update routes/xref/index.ts
Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com>
* fix(xref/headings): fix specTitleMap, add query limit, harden error handling
- Fix buildSpecTitleMap to iterate nested specmap structure correctly
(was iterating top-level {current, snapshot} objects instead of entries)
- Add 1000-query limit and empty-string validation to headings endpoint
- Only catch ENOENT in readJsonOptional (was swallowing all errors)
- Remove dead spec?.shortname branch from scraper (webref doesn't include it)
- Update JSDoc to reflect actual route path (/xref/search/headings)
* refactor(xref/headings): pre-index headings by ID at scrape time
Per sidvishnoi's review: index headings by ID during scraping rather than
at store load time. Removes the redundant indexHeadings() runtime function.
* fix(xref/headings): trim inputs, add version/series fallback in getHeading
Per Copilot review:
- Trim spec and id before lookup (validation checked trim but passed raw)
- Add version-stripped and series-shortname fallback in getHeading so
both "cssom-view-1" and "cssom-view" resolve headings correctly
- Also resolves specTitle via stripped form as fallback
* fix: correct specmap type to match nested JSON structure
The specmap type previously described a flat map, but the actual data
is nested with current/snapshot groups. This caused TS2352 errors
after rebasing onto main's stricter type checking.
* fix: address Sid's review comments on PR #469
- Extract heading resolution fallback into resolveHeadings() method
- Remove unnecessary generator (specmapEntries) — iterate directly
- Use separate res.status()/res.json() calls instead of chaining
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com>
The scraper was lowercasing all dfn-type terms during indexing, destroying camelCase (e.g., processResponseConsumeBody became processresponseconsumebody). Webref has the correct casing. Changes: - Remove dfn lowercasing from normalizeTerm() in scraper - Add byTermLower index in Store for case-insensitive lookups - Use case-insensitive fallback in filterByTerm() when direct lookup misses (preserves fast path for exact-case IDL terms) - Update tests to reflect case-insensitive search behavior
The refactored sh.ts was rejecting with plain objects instead of Error
instances, breaking instanceof Error narrowing in callers. Restores the
original pattern: Object.assign(err, {...}) for spawn errors and
new Error(...) for non-zero exit codes.
Also removes unrelated files accidentally included in this branch.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the xref data/search pipeline so dfn terms keep the original casing supplied by Webref, with search using a secondary lowercase index for case-insensitive lookup. It also includes a small utils/sh error-handling change alongside the xref work.
Changes:
- Stop lowercasing dfn terms during xref scraping so camelCase terms survive into stored data.
- Add a lowercase lookup index in the xref store and use it as a fallback in search, with tests updated to cover the new behavior.
- Adjust shell command handling to reject on child-process errors and wait for
closeinstead ofexit.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| utils/sh.ts | Updates child-process completion/error handling for shell commands. |
| tests/routes/xref/lib/search.test.js | Revises search expectations for case-insensitive xref term lookup. |
| routes/xref/lib/store.ts | Adds and exports the lowercase term index builder used by the store/tests. |
| routes/xref/lib/search.ts | Implements direct term lookup plus lowercase fallback matching. |
| routes/xref/lib/scraper.ts | Removes dfn term lowercasing so scraped terms retain original case. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
WebIDL identifiers are case-sensitive, so "baseLine" must not match "Baseline" when types include an IDL type. The fallback now only fires when the query has no IDL types. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the case-insensitive fallback fires in filterByTerm, each returned
entry is now tagged with the canonical term it was indexed under. This
lets the xref UI build correct cite syntax (e.g., [=baseline=] and
{{Baseline}}) instead of using the user's potentially miscased input
(e.g., [=baseLine=]).
Changes:
- Add optional `term` field to DataEntry
- Clone entries in filterByTerm fallback path with canonical term
- Update xref UI to use entry.term for citation building
- Add test verifying term presence on fallback hits
Collaborator
Author
|
@copilot resolve the merge conflicts in this pull request |
# Conflicts: # tests/routes/xref/lib/search.test.js # utils/sh.ts Co-authored-by: marcoscaceres <870154+marcoscaceres@users.noreply.github.com>
Contributor
Resolved in commit 7b7ecc2. Two conflicts were present:
All 127 tests pass after the merge. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The scraper was lowercasing all dfn-type terms during indexing (
normalizeTerm()line 211), destroying camelCase. For example,processResponseConsumeBodyfrom the Fetch spec was stored asprocessresponseconsumebody. Webref has the correct casing; our scraper was the one destroying it.This broke the xref search UI display and the cite syntax for spec authors using
[=processResponseConsumeBody=].Changes:
if (type === "dfn") return term.toLowerCase()branch fromnormalizeTerm()in the scraper. Terms are now stored with their original case from webref.byTermLowerindex (Map<lowercase, original_keys[]>) inStore.fill()for O(1) case-insensitive lookups.filterByTerm()tries direct lookup first (fast path for exact-case IDL terms), then falls back to the lowercase index. No behavior change for IDL queries; concept/dfn queries now find camelCase terms via case-insensitive matching.buildTermLowerIndexfrom store.ts, import in test (no duplication).Behavioral note: Untyped queries (no
typesfilter) that previously returned empty for non-exact-case terms now return case-insensitive matches. For example,search({ term: "baseLine" })previously returned[]but now returns both the dfn and interface entries. This is intentional: the broadened matching is the core purpose of the fix.Requires re-running the scraper (
pnpm update-data-sources) after deploy to regenerate xref.json with preserved casing.