Skip to content

fix(xref): preserve original case for dfn terms#508

Open
marcoscaceres wants to merge 7 commits intomainfrom
fix/dfn-term-casing
Open

fix(xref): preserve original case for dfn terms#508
marcoscaceres wants to merge 7 commits intomainfrom
fix/dfn-term-casing

Conversation

@marcoscaceres
Copy link
Copy Markdown
Collaborator

@marcoscaceres marcoscaceres commented Apr 29, 2026

The scraper was lowercasing all dfn-type terms during indexing (normalizeTerm() line 211), destroying camelCase. For example, processResponseConsumeBody from the Fetch spec was stored as processresponseconsumebody. Webref has the correct casing; our scraper was the one destroying it.

This broke the xref search UI display and the cite syntax for spec authors using [=processResponseConsumeBody=].

Changes:

  • Remove the if (type === "dfn") return term.toLowerCase() branch from normalizeTerm() in the scraper. Terms are now stored with their original case from webref.
  • Add a byTermLower index (Map<lowercase, original_keys[]>) in Store.fill() for O(1) case-insensitive lookups.
  • filterByTerm() tries direct lookup first (fast path for exact-case IDL terms), then falls back to the lowercase index. No behavior change for IDL queries; concept/dfn queries now find camelCase terms via case-insensitive matching.
  • Export buildTermLowerIndex from store.ts, import in test (no duplication).

Behavioral note: Untyped queries (no types filter) that previously returned empty for non-exact-case terms now return case-insensitive matches. For example, search({ term: "baseLine" }) previously returned [] but now returns both the dfn and interface entries. This is intentional: the broadened matching is the core purpose of the fix.

Requires re-running the scraper (pnpm update-data-sources) after deploy to regenerate xref.json with preserved casing.

marcoscaceres and others added 2 commits April 28, 2026 21:07
* feat(xref): add headings lookup API for cross-spec section links

Extends the xref infrastructure to read and serve section heading data
from WebRef's ed/headings/ directory. This enables ReSpec's [[[SPEC#id]]]
syntax to display actual heading text instead of just the spec title.

New endpoint: POST /xref/headings
Request:  { queries: [{ spec: 'fetch', id: 'cookie-header' }] }
Response: { result: [{ spec, id, title, number, href, level, specTitle }] }

Changes:
- scraper.ts: reads ed/headings/*.json during update, writes headings.json
- store.ts: loads headings data, adds getHeading(spec, id) lookup method
- headings.post.ts: new route handler
- index.ts: registers the /xref/headings POST endpoint
- update.ts: also triggers on ed/headings/ changes in webref webhook

* fix: address Copilot review feedback on headings API

- store.ts: index headings by id for O(1) lookup instead of linear scan;
  precompute specTitleByShortname map; split readJson into required/optional
  variants so missing xref.json still throws
- scraper.ts: fix doc comment on readAllHeadings
- headings.post.ts: validate queries is an array before mapping

* refactor(xref): address reviewer feedback on headings API

- Move `readdir` to top-level fs/promises import in scraper.ts
- Add generic type `readJSON<T>` with HeadingsJSON interface to reduce `any`
- Use typed readJSON<T> calls in getAllData and readAllHeadings
- Change specTitleByShortname to Map<string, string> in store.ts
- Add per-item validation (spec/id are strings) in headings.post.ts

Agent-Logs-Url: https://github.com/speced/respec-web-services/sessions/e14c5758-5d6f-4324-bb98-77839d96f660

* Apply suggestion from @sidvishnoi

Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com>

* Update routes/xref/index.ts

Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com>

* fix(xref/headings): fix specTitleMap, add query limit, harden error handling

- Fix buildSpecTitleMap to iterate nested specmap structure correctly
  (was iterating top-level {current, snapshot} objects instead of entries)
- Add 1000-query limit and empty-string validation to headings endpoint
- Only catch ENOENT in readJsonOptional (was swallowing all errors)
- Remove dead spec?.shortname branch from scraper (webref doesn't include it)
- Update JSDoc to reflect actual route path (/xref/search/headings)

* refactor(xref/headings): pre-index headings by ID at scrape time

Per sidvishnoi's review: index headings by ID during scraping rather than
at store load time. Removes the redundant indexHeadings() runtime function.

* fix(xref/headings): trim inputs, add version/series fallback in getHeading

Per Copilot review:
- Trim spec and id before lookup (validation checked trim but passed raw)
- Add version-stripped and series-shortname fallback in getHeading so
  both "cssom-view-1" and "cssom-view" resolve headings correctly
- Also resolves specTitle via stripped form as fallback

* fix: correct specmap type to match nested JSON structure

The specmap type previously described a flat map, but the actual data
is nested with current/snapshot groups. This caused TS2352 errors
after rebasing onto main's stricter type checking.

* fix: address Sid's review comments on PR #469

- Extract heading resolution fallback into resolveHeadings() method
- Remove unnecessary generator (specmapEntries) — iterate directly
- Use separate res.status()/res.json() calls instead of chaining

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com>
The scraper was lowercasing all dfn-type terms during indexing,
destroying camelCase (e.g., processResponseConsumeBody became
processresponseconsumebody). Webref has the correct casing.

Changes:
- Remove dfn lowercasing from normalizeTerm() in scraper
- Add byTermLower index in Store for case-insensitive lookups
- Use case-insensitive fallback in filterByTerm() when direct
  lookup misses (preserves fast path for exact-case IDL terms)
- Update tests to reflect case-insensitive search behavior
The refactored sh.ts was rejecting with plain objects instead of Error
instances, breaking instanceof Error narrowing in callers. Restores the
original pattern: Object.assign(err, {...}) for spawn errors and
new Error(...) for non-zero exit codes.

Also removes unrelated files accidentally included in this branch.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the xref data/search pipeline so dfn terms keep the original casing supplied by Webref, with search using a secondary lowercase index for case-insensitive lookup. It also includes a small utils/sh error-handling change alongside the xref work.

Changes:

  • Stop lowercasing dfn terms during xref scraping so camelCase terms survive into stored data.
  • Add a lowercase lookup index in the xref store and use it as a fallback in search, with tests updated to cover the new behavior.
  • Adjust shell command handling to reject on child-process errors and wait for close instead of exit.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
utils/sh.ts Updates child-process completion/error handling for shell commands.
tests/routes/xref/lib/search.test.js Revises search expectations for case-insensitive xref term lookup.
routes/xref/lib/store.ts Adds and exports the lowercase term index builder used by the store/tests.
routes/xref/lib/search.ts Implements direct term lookup plus lowercase fallback matching.
routes/xref/lib/scraper.ts Removes dfn term lowercasing so scraped terms retain original case.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread routes/xref/lib/search.ts Outdated
Comment thread routes/xref/lib/search.ts Outdated
marcoscaceres and others added 2 commits May 3, 2026 21:26
WebIDL identifiers are case-sensitive, so "baseLine" must not
match "Baseline" when types include an IDL type. The fallback
now only fires when the query has no IDL types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the case-insensitive fallback fires in filterByTerm, each returned
entry is now tagged with the canonical term it was indexed under. This
lets the xref UI build correct cite syntax (e.g., [=baseline=] and
{{Baseline}}) instead of using the user's potentially miscased input
(e.g., [=baseLine=]).

Changes:
- Add optional `term` field to DataEntry
- Clone entries in filterByTerm fallback path with canonical term
- Update xref UI to use entry.term for citation building
- Add test verifying term presence on fallback hits
@marcoscaceres
Copy link
Copy Markdown
Collaborator Author

@copilot resolve the merge conflicts in this pull request

# Conflicts:
#	tests/routes/xref/lib/search.test.js
#	utils/sh.ts

Co-authored-by: marcoscaceres <870154+marcoscaceres@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented May 7, 2026

@copilot resolve the merge conflicts in this pull request

Resolved in commit 7b7ecc2. Two conflicts were present:

  • utils/sh.ts: adopted main's Promise.withResolvers() refactor (same behavior as my branch's new Promise approach)
  • tests/routes/xref/lib/search.test.js: kept both new test cases from both branches (canonical term on fallback hits + element-type case preservation)

All 127 tests pass after the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants