fix(xref): preserve original case for dfn terms by marcoscaceres · Pull Request #508 · speced/respec-web-services

marcoscaceres · 2026-04-29T23:30:19Z

The scraper was lowercasing all dfn-type terms during indexing (normalizeTerm() line 211), destroying camelCase. For example, processResponseConsumeBody from the Fetch spec was stored as processresponseconsumebody. Webref has the correct casing; our scraper was the one destroying it.

This broke the xref search UI display and the cite syntax for spec authors using [=processResponseConsumeBody=].

Changes:

Remove the if (type === "dfn") return term.toLowerCase() branch from normalizeTerm() in the scraper. Terms are now stored with their original case from webref.
Add a byTermLower index (Map<lowercase, original_keys[]>) in Store.fill() for O(1) case-insensitive lookups.
filterByTerm() tries direct lookup first (fast path for exact-case IDL terms), then falls back to the lowercase index. No behavior change for IDL queries; concept/dfn queries now find camelCase terms via case-insensitive matching.
Export buildTermLowerIndex from store.ts, import in test (no duplication).

Behavioral note: Untyped queries (no types filter) that previously returned empty for non-exact-case terms now return case-insensitive matches. For example, search({ term: "baseLine" }) previously returned [] but now returns both the dfn and interface entries. This is intentional: the broadened matching is the core purpose of the fix.

Requires re-running the scraper (pnpm update-data-sources) after deploy to regenerate xref.json with preserved casing.

@sidvishnoi

* feat(xref): add headings lookup API for cross-spec section links Extends the xref infrastructure to read and serve section heading data from WebRef's ed/headings/ directory. This enables ReSpec's [[[SPEC#id]]] syntax to display actual heading text instead of just the spec title. New endpoint: POST /xref/headings Request: { queries: [{ spec: 'fetch', id: 'cookie-header' }] } Response: { result: [{ spec, id, title, number, href, level, specTitle }] } Changes: - scraper.ts: reads ed/headings/*.json during update, writes headings.json - store.ts: loads headings data, adds getHeading(spec, id) lookup method - headings.post.ts: new route handler - index.ts: registers the /xref/headings POST endpoint - update.ts: also triggers on ed/headings/ changes in webref webhook * fix: address Copilot review feedback on headings API - store.ts: index headings by id for O(1) lookup instead of linear scan; precompute specTitleByShortname map; split readJson into required/optional variants so missing xref.json still throws - scraper.ts: fix doc comment on readAllHeadings - headings.post.ts: validate queries is an array before mapping * refactor(xref): address reviewer feedback on headings API - Move `readdir` to top-level fs/promises import in scraper.ts - Add generic type `readJSON<T>` with HeadingsJSON interface to reduce `any` - Use typed readJSON<T> calls in getAllData and readAllHeadings - Change specTitleByShortname to Map<string, string> in store.ts - Add per-item validation (spec/id are strings) in headings.post.ts Agent-Logs-Url: https://github.com/speced/respec-web-services/sessions/e14c5758-5d6f-4324-bb98-77839d96f660 * Apply suggestion from @sidvishnoi Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com> * Update routes/xref/index.ts Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com> * fix(xref/headings): fix specTitleMap, add query limit, harden error handling - Fix buildSpecTitleMap to iterate nested specmap structure correctly (was iterating top-level {current, snapshot} objects instead of entries) - Add 1000-query limit and empty-string validation to headings endpoint - Only catch ENOENT in readJsonOptional (was swallowing all errors) - Remove dead spec?.shortname branch from scraper (webref doesn't include it) - Update JSDoc to reflect actual route path (/xref/search/headings) * refactor(xref/headings): pre-index headings by ID at scrape time Per sidvishnoi's review: index headings by ID during scraping rather than at store load time. Removes the redundant indexHeadings() runtime function. * fix(xref/headings): trim inputs, add version/series fallback in getHeading Per Copilot review: - Trim spec and id before lookup (validation checked trim but passed raw) - Add version-stripped and series-shortname fallback in getHeading so both "cssom-view-1" and "cssom-view" resolve headings correctly - Also resolves specTitle via stripped form as fallback * fix: correct specmap type to match nested JSON structure The specmap type previously described a flat map, but the actual data is nested with current/snapshot groups. This caused TS2352 errors after rebasing onto main's stricter type checking. * fix: address Sid's review comments on PR #469 - Extract heading resolution fallback into resolveHeadings() method - Remove unnecessary generator (specmapEntries) — iterate directly - Use separate res.status()/res.json() calls instead of chaining --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Sid Vishnoi <8426945+sidvishnoi@users.noreply.github.com>

The scraper was lowercasing all dfn-type terms during indexing, destroying camelCase (e.g., processResponseConsumeBody became processresponseconsumebody). Webref has the correct casing. Changes: - Remove dfn lowercasing from normalizeTerm() in scraper - Add byTermLower index in Store for case-insensitive lookups - Use case-insensitive fallback in filterByTerm() when direct lookup misses (preserves fast path for exact-case IDL terms) - Update tests to reflect case-insensitive search behavior

The refactored sh.ts was rejecting with plain objects instead of Error instances, breaking instanceof Error narrowing in callers. Restores the original pattern: Object.assign(err, {...}) for spawn errors and new Error(...) for non-zero exit codes. Also removes unrelated files accidentally included in this branch.

Copilot

Pull request overview

This PR updates the xref data/search pipeline so dfn terms keep the original casing supplied by Webref, with search using a secondary lowercase index for case-insensitive lookup. It also includes a small utils/sh error-handling change alongside the xref work.

Changes:

Stop lowercasing dfn terms during xref scraping so camelCase terms survive into stored data.
Add a lowercase lookup index in the xref store and use it as a fallback in search, with tests updated to cover the new behavior.
Adjust shell command handling to reject on child-process errors and wait for close instead of exit.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
utils/sh.ts	Updates child-process completion/error handling for shell commands.
tests/routes/xref/lib/search.test.js	Revises search expectations for case-insensitive xref term lookup.
routes/xref/lib/store.ts	Adds and exports the lowercase term index builder used by the store/tests.
routes/xref/lib/search.ts	Implements direct term lookup plus lowercase fallback matching.
routes/xref/lib/scraper.ts	Removes dfn term lowercasing so scraped terms retain original case.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

WebIDL identifiers are case-sensitive, so "baseLine" must not match "Baseline" when types include an IDL type. The fallback now only fires when the query has no IDL types. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When the case-insensitive fallback fires in filterByTerm, each returned entry is now tagged with the canonical term it was indexed under. This lets the xref UI build correct cite syntax (e.g., [=baseline=] and {{Baseline}}) instead of using the user's potentially miscased input (e.g., [=baseLine=]). Changes: - Add optional `term` field to DataEntry - Clone entries in filterByTerm fallback path with canonical term - Update xref UI to use entry.term for citation building - Add test verifying term presence on fallback hits

marcoscaceres · 2026-05-07T06:47:50Z

@copilot resolve the merge conflicts in this pull request

# Conflicts: # tests/routes/xref/lib/search.test.js # utils/sh.ts Co-authored-by: marcoscaceres <870154+marcoscaceres@users.noreply.github.com>

Copilot · 2026-05-07T06:54:50Z

@copilot resolve the merge conflicts in this pull request

Resolved in commit 7b7ecc2. Two conflicts were present:

utils/sh.ts: adopted main's Promise.withResolvers() refactor (same behavior as my branch's new Promise approach)
tests/routes/xref/lib/search.test.js: kept both new test cases from both branches (canonical term on fallback hits + element-type case preservation)

All 127 tests pass after the merge.

…oncept #508 preserved original dfn term case, splitting "URL" and "url" into separate keys in the store. A bare [=URL=] lowercases to "url", which matched only the for-scoped basic URL parser variable and got filtered out by the no-for context check, never falling back to the canonical URL concept — breaking the link on every spec. filter() now tries the exact-case bucket first, then falls back to the case-insensitive variants when it yields nothing after filtering.

…oncept (#533) #508 preserved original dfn term case, splitting "URL" and "url" into separate keys in the store. A bare [=URL=] lowercases to "url", which matched only the for-scoped basic URL parser variable and got filtered out by the no-for context check, never falling back to the canonical URL concept — breaking the link on every spec. filter() now tries the exact-case bucket first, then falls back to the case-insensitive variants when it yields nothing after filtering.

marcoscaceres and others added 2 commits April 28, 2026 21:07

marcoscaceres requested a review from sidvishnoi April 29, 2026 23:30

marcoscaceres added 2 commits April 30, 2026 09:45

refactor: export buildTermLowerIndex, deduplicate test helper

db3a56c

marcoscaceres requested a review from Copilot May 3, 2026 11:13

Copilot started reviewing on behalf of marcoscaceres May 3, 2026 11:13 View session

Copilot AI reviewed May 3, 2026

View reviewed changes

Comment thread routes/xref/lib/search.ts Outdated

Comment thread routes/xref/lib/search.ts Outdated

marcoscaceres and others added 2 commits May 3, 2026 21:26

Copilot started work on behalf of marcoscaceres May 7, 2026 06:48 View session

Merge remote-tracking branch 'origin/main' into fix/dfn-term-casing

7b7ecc2

# Conflicts: # tests/routes/xref/lib/search.test.js # utils/sh.ts Co-authored-by: marcoscaceres <870154+marcoscaceres@users.noreply.github.com>

Copilot finished work on behalf of marcoscaceres May 7, 2026 06:55

Merge branch 'main' into fix/dfn-term-casing

8abe74f

sidvishnoi approved these changes Jun 3, 2026

View reviewed changes

marcoscaceres merged commit d8327aa into main Jun 3, 2026
3 checks passed

marcoscaceres deleted the fix/dfn-term-casing branch June 3, 2026 14:57

marcoscaceres mentioned this pull request Jul 2, 2026

fix(xref): don't let a for-scoped lowercase term shadow a canonical concept #533

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(xref): preserve original case for dfn terms#508

fix(xref): preserve original case for dfn terms#508
marcoscaceres merged 8 commits into
mainfrom
fix/dfn-term-casing

marcoscaceres commented Apr 29, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

marcoscaceres commented May 7, 2026

Uh oh!

Copilot AI commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

marcoscaceres commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

marcoscaceres commented May 7, 2026

Uh oh!

Copilot AI commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

marcoscaceres commented Apr 29, 2026 •

edited

Loading