Add commands and logic to combine json apis from all blazor dlls for astro api docs #12

MayaKirova wants to merge 15 commits into master
Conversation
…odel + handing to merge model in 1 file.
```js
        return value
          .replace(/&amp;/g, '&')
```
Check failure — Code scanning / CodeQL: Double escaping or unescaping (High)

Copilot Autofix (AI, 12 days ago)
In general, to fix double escaping/unescaping you need to ensure each piece of data is escaped or unescaped at most once, and that the escape character (`&`, i.e. the `&amp;` entity) is handled in the correct order relative to the other entities. In this file, unescaping of HTML entities already happens in parseFile for "text" values that can contain HTML. Doing a second, broad unescape in the JSON.stringify replacer risks double-unescaping and is unrelated to keeping the JSON itself valid (JSON does not require `&` to be escaped).
The best fix here, without changing the intended functionality, is to stop unescaping HTML entities in the final JSON.stringify replacer. The toc structure is already valid JSON; JSON.stringify will escape control characters as needed. Removing the unescape logic eliminates the possibility of double-unescaping while leaving the structure and non-HTML-related behavior intact. Concretely:
- In scripts/expand-toc.js, lines 72–80 define a custom replacer that, for every string value, performs `.replace(/&amp;/g, '&')` and similar replacements.
- Replace this with a plain `JSON.stringify(toc, null, 2)` call (which preserves the pretty-printing) so that no additional entity decoding happens at output time.
- No new imports or helper methods are needed.
```diff
@@ -69,16 +69,6 @@
   }

   const DIST_PATH = path.join(distDir, toc.name + '.json');
-  const output = JSON.stringify(toc, (key, value) => {
-    if (typeof value === 'string') {
-      return value
-        .replace(/&amp;/g, '&')
-        .replace(/&lt;/g, '<')
-        .replace(/&gt;/g, '>')
-        .replace(/&quot;/g, '"')
-        .replace(/&#39;/g, "'");
-    }
-    return value;
-  }, 2);
+  const output = JSON.stringify(toc, null, 2);
   fs.writeFileSync(DIST_PATH, output, 'utf-8');
   console.log(`Expanded ${DIST_PATH} — all hrefs inlined.`);
```
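To make the flagged hazard concrete, here is a minimal sketch (the decoder names and the input string are hypothetical, not from this PR). Because the replacer decodes `&amp;` *before* the other entities, the `&` it produces can combine with the following text to form new entities that the later replaces then decode a second time, all within a single pass:

```javascript
// Hypothetical sketch of the double-unescaping hazard CodeQL flags.

// BAD: decodes &amp; first, so freshly produced '&' characters can form
// entities (e.g. '&lt;') that the subsequent replaces decode again.
const badDecode = (s) => s
  .replace(/&amp;/g, '&')
  .replace(/&lt;/g, '<')
  .replace(/&gt;/g, '>');

// SAFER: decodes &amp; last, so each entity is decoded at most once.
const safeDecode = (s) => s
  .replace(/&lt;/g, '<')
  .replace(/&gt;/g, '>')
  .replace(/&amp;/g, '&');

// A value whose *literal* text is "&lt;b&gt;", entity-escaped once for storage:
const stored = '&amp;lt;b&amp;gt;';

console.log(badDecode(stored));  // "<b>"       — decoded twice, literal text lost
console.log(safeDecode(stored)); // "&lt;b&gt;" — decoded once, as intended
```

This is why the autofix prefers deleting the replacer outright: with no second decoding pass, the ordering question disappears entirely.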
```js
  fileContent = fileContent.replace(/"text":\s*"((?:[^"\\]|\\.)*)"/g, (match, val) => {
    let clean = val;
    // Decode HTML entities so we can strip the resulting HTML tags
    clean = clean.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#39;/g, "'");
```
Check failure — Code scanning / CodeQL: Double escaping or unescaping (High)

Copilot Autofix (AI, 12 days ago)
In general, to avoid double unescaping, HTML entities should be decoded in one well-defined place, not in multiple passes scattered through the code. Here, there are two decoding points: one inside parseFile (line 16) and another in the JSON.stringify replacer (lines 75–79). The cleanest fix that preserves existing behavior is to stop decoding entities inside parseFile and rely on the final replacer to handle entity decoding for all strings uniformly. parseFile only needs to strip HTML tags and clean up whitespace; it does not need to turn entities into characters to do that.
Concretely, in scripts/expand-toc.js, inside parseFile's "text"-specific replacement (lines 13–24), remove the line that decodes `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&#39;` into raw characters. clean should start as val, then have HTML tags stripped directly (they are already literal `<...>` tags in the JSON source) and whitespace normalized, then quotes escaped for JSON. No other files need changes, and no new helpers or imports are needed.
```diff
@@ -12,8 +12,6 @@
   // to avoid unescaped quotes/attributes breaking JSON
   fileContent = fileContent.replace(/"text":\s*"((?:[^"\\]|\\.)*)"/g, (match, val) => {
     let clean = val;
-    // Decode HTML entities so we can strip the resulting HTML tags
-    clean = clean.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#39;/g, "'");
     // Strip HTML tags
     clean = clean.replace(/<[^>]*>/g, '');
     // Collapse whitespace and escaped newlines
```
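Putting the autofix suggestions together, a minimal sketch of how the cleaned-up `"text"` replacement would behave (the sample JSON input is hypothetical; the regex and cleanup steps mirror the ones in the diff above):

```javascript
// Hypothetical sample input; real input comes from the Blazor API JSON files.
const fileContent = '{ "text": "Gets <b>the</b>   value" }';

const cleaned = fileContent.replace(/"text":\s*"((?:[^"\\]|\\.)*)"/g, (match, val) => {
  // Strip literal HTML tags directly -- no entity decoding needed first.
  let clean = val.replace(/<[^>]*>/g, '');
  // Collapse whitespace and escaped newlines.
  clean = clean.replace(/\\n/g, ' ').replace(/\s+/g, ' ').trim();
  // Let JSON.stringify produce a valid, fully escaped JSON string literal.
  return `"text": ${JSON.stringify(clean)}`;
});

console.log(JSON.parse(cleaned).text); // "Gets the value"
```

Note that the result still round-trips through `JSON.parse`, which is the property the whole parseFile pipeline depends on.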
```js
    // Decode HTML entities so we can strip the resulting HTML tags
    clean = clean.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#39;/g, "'");
    // Strip HTML tags
    clean = clean.replace(/<[^>]*>/g, '');
```
Check failure — Code scanning / CodeQL: Incomplete multi-character sanitization (High)

Copilot Autofix (AI, 12 days ago)
In general terms, the problem arises because we decode HTML entities into `<`, `>`, `"`, and `'`, then attempt to remove HTML tags using a regex that may not handle all corner cases and can be sensitive to multi-character interactions. To remove the risk that unsafe HTML fragments like `<script` survive, we should ensure that the final "text" content we inject into JSON never contains raw `<` or `>` characters at all. Since the comments say we only need plain text ("Strip HTML tags from "text" values ..."), removing every remaining `<` and `>` is acceptable and preserves semantics.
The best minimal fix within this file is to add an extra sanitization step after stripping tags: replace any remaining `<` or `>` characters in clean with an empty string (or some safe replacement such as a space). This guarantees that no `<script` sequence — or any HTML tag opener — can survive in the final clean string, even if the earlier tag-stripping regex misses some patterns. Concretely, in parseFile's callback at lines 13–24, after line 18 (`clean = clean.replace(/<[^>]*>/g, '');`), we add a new line `clean = clean.replace(/[<>]/g, '');`. No new imports or external libraries are needed; this is simple string manipulation.
```diff
@@ -16,6 +16,8 @@
     clean = clean.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#39;/g, "'");
     // Strip HTML tags
     clean = clean.replace(/<[^>]*>/g, '');
+    // Ensure no raw angle brackets remain that could start HTML tags
+    clean = clean.replace(/[<>]/g, '');
     // Collapse whitespace and escaped newlines
     clean = clean.replace(/\\n/g, ' ').replace(/\s+/g, ' ').trim();
     // Escape any double quotes for valid JSON
```
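A quick sketch of the gap the extra `[<>]` pass closes (function names and the sample string are hypothetical): the tag-stripping regex `/<[^>]*>/g` requires a closing `>`, so an unterminated opener survives it untouched, while the follow-up pass removes the leftover brackets:

```javascript
// Hypothetical illustration of why /<[^>]*>/g alone is incomplete.
const stripTags = (s) => s.replace(/<[^>]*>/g, '');
const hardened  = (s) => stripTags(s).replace(/[<>]/g, '');

// An unterminated tag: there is no closing '>' for the regex to anchor on.
const tricky = 'hello <script world';

console.log(stripTags(tricky)); // "hello <script world" — the opener survives
console.log(hardened(tricky));  // "hello script world"  — no angle brackets remain
```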
```js
    // Collapse whitespace and escaped newlines
    clean = clean.replace(/\\n/g, ' ').replace(/\s+/g, ' ').trim();
    // Escape any double quotes for valid JSON
    clean = clean.replace(/"/g, '\\"');
```
Check failure — Code scanning / CodeQL: Incomplete string escaping or encoding (High)

Copilot Autofix (AI, 12 days ago)
In general, the robust way to fix this is to avoid hand-rolling JSON string escaping and instead use JSON.stringify on the string value you want to insert, then splice that encoded fragment into your larger JSON text. JSON.stringify correctly escapes backslashes, double quotes, and other control characters per the JSON spec, avoiding the incomplete escaping problem.
For this specific code, we can keep the current parsing logic but change the "text" replacement so that, after computing the cleaned clean string, we no longer manually replace " with \". Instead, we should call JSON.stringify(clean) and use the resulting value directly, which will include surrounding quotes and properly-escaped contents. Concretely, within parseFile in scripts/expand-toc.js, inside the fileContent.replace(/"text":\s*"((?:[^"\\]|\\.)*)"/g, ...) callback:
- Remove the line `clean = clean.replace(/"/g, '\\"');`.
- Change the return from `` return `"text": "${clean}"`; `` to `` return `"text": ${JSON.stringify(clean)}`; ``.
This change leaves the existing trimming, HTML-tag stripping, and whitespace collapsing logic intact, but delegates the final JSON escaping step to the built-in JSON engine, which correctly escapes backslashes and any other required characters.
No new imports are needed since JSON is global in Node.js.
```diff
@@ -18,9 +18,8 @@
     clean = clean.replace(/<[^>]*>/g, '');
     // Collapse whitespace and escaped newlines
     clean = clean.replace(/\\n/g, ' ').replace(/\s+/g, ' ').trim();
-    // Escape any double quotes for valid JSON
-    clean = clean.replace(/"/g, '\\"');
-    return `"text": "${clean}"`;
+    // Use JSON.stringify to produce a valid JSON string literal (escapes backslashes, quotes, etc.)
+    return `"text": ${JSON.stringify(clean)}`;
   });
   return JSON.parse(fileContent);
 } catch (e) {
```
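To illustrate why only escaping double quotes is incomplete (the variable names and sample value below are hypothetical): a backslash in the value slips through the hand-rolled escaping and corrupts the parsed result, whereas `JSON.stringify` escapes backslashes, quotes, and control characters per the JSON spec, so the value round-trips:

```javascript
// Hypothetical value containing both a quote and a backslash.
const clean = 'He said "hi" at C:\\temp';

// Hand-rolled: escapes quotes only; the backslash is left bare, so the
// parsed text comes back as 'He said "hi" at C:<TAB>emp' ("\t" reads as a tab).
const handRolled = `"text": "${clean.replace(/"/g, '\\"')}"`;

// Built-in: JSON.stringify emits a complete, valid JSON string literal.
const viaStringify = `"text": ${JSON.stringify(clean)}`;

console.log(JSON.parse(`{${viaStringify}}`).text === clean); // true
```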
No description provided.