
Add commands and logic to combine json apis from all blazor dlls for astro api docs #12

Open
MayaKirova wants to merge 15 commits into master from mkirova/astro-json-model

Conversation

@MayaKirova
Contributor

No description provided.

Comment on lines +61 to +62
return value
    .replace(/&amp;/g, '&')

Check failure

Code scanning / CodeQL

Double escaping or unescaping High

This replacement may produce '&' characters that are double-unescaped here.

Copilot Autofix

AI 12 days ago

In general, to fix double escaping/unescaping you need to ensure each piece of data is escaped or unescaped at most once, and that the escape character (& for HTML entities) is handled in the correct order relative to other entities. In this file, unescaping of HTML entities already happens in parseFile for "text" values that can contain HTML. Doing a second, broad unescape in the JSON.stringify replacer risks double-unescaping and is unrelated to keeping the JSON itself valid (JSON does not require & to be escaped).

The best fix here, without changing the intended functionality, is to stop unescaping HTML entities in the final JSON.stringify replacer. The toc structure is already valid JSON; JSON.stringify will escape control characters as needed. Removing the unescape logic eliminates the possibility of double-unescaping while leaving the structure and non-HTML-related behavior intact. Concretely:

  • In scripts/expand-toc.js, lines 72–80 define a custom replacer that, for every string value, performs .replace(/&amp;/g, '&') and similar replacements.
  • Replace this with a plain JSON.stringify(toc, null, 2), which preserves pretty-printing, so that no additional entity decoding happens at output time.
  • No new imports or helper methods are needed.
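
To make the alert concrete, here is a minimal sketch of how the flagged chain double-unescapes. The helper name `decodeAmpFirst` and the sample strings are hypothetical and not part of the PR; only the replacement chain itself is copied from scripts/expand-toc.js.

```javascript
// Hypothetical helper: the same replacement chain as in the PR,
// with &amp; decoded before the other entities.
function decodeAmpFirst(value) {
    return value
        .replace(/&amp;/g, '&')
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'");
}

// Source text whose intended meaning is the literal characters "&lt;b&gt;".
// Decoding &amp; first exposes "&lt;" and "&gt;" to the later replacements.
const escapedTwice = '&amp;lt;b&amp;gt;';
const singleCall = decodeAmpFirst(escapedTwice);

// Running the chain at two stages (parse time and output time)
// decodes one more level on each pass.
const firstPass = decodeAmpFirst('AT&amp;amp;T');
const secondPass = decodeAmpFirst(firstPass);

console.log(singleCall); // <b>
console.log(firstPass);  // AT&amp;T
console.log(secondPass); // AT&T
```

In this sketch a single call already turns escaped markup into a real tag, and a second call at output time strips another level of escaping, which is the behavior the alert describes.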
Suggested changeset 1
scripts/expand-toc.js

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/expand-toc.js b/scripts/expand-toc.js
--- a/scripts/expand-toc.js
+++ b/scripts/expand-toc.js
@@ -69,16 +69,6 @@
 }
 
 const DIST_PATH = path.join(distDir, toc.name + '.json');
-const output = JSON.stringify(toc, (key, value) => {
-    if (typeof value === 'string') {
-        return value
-            .replace(/&amp;/g, '&')
-            .replace(/&lt;/g, '<')
-            .replace(/&gt;/g, '>')
-            .replace(/&quot;/g, '"')
-            .replace(/&#39;/g, "'");
-    }
-    return value;
-}, 2);
+const output = JSON.stringify(toc, null, 2);
 fs.writeFileSync(DIST_PATH, output, 'utf-8');
 console.log(`Expanded ${DIST_PATH} — all hrefs inlined.`);
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
fileContent = fileContent.replace(/"text":\s*"((?:[^"\\]|\\.)*)"/g, (match, val) => {
let clean = val;
// Decode HTML entities so we can strip the resulting HTML tags
clean = clean.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#39;/g, "'");

Check failure

Code scanning / CodeQL

Double escaping or unescaping High

This replacement may produce '&' characters that are double-unescaped here.

Copilot Autofix

AI 12 days ago

In general, to avoid double unescaping, HTML entities should be decoded in one well-defined place, not in multiple passes scattered through the code. Here, there are two decoding points: one inside parseFile (line 16) and another in the JSON.stringify replacer (lines 75–79). The cleanest fix that preserves existing behavior is to stop decoding entities inside parseFile and rely on the final replacer to handle entity decoding for all strings uniformly. parseFile only needs to strip HTML tags and clean up whitespace; it does not need to turn entities into characters to do that.

Concretely, in scripts/expand-toc.js, inside parseFile’s "text"-specific replacement (lines 13–24), remove the line that decodes &amp;, &lt;, &gt;, &quot;, and &#39; into raw characters. clean should start as val, then have HTML tags stripped directly (they are already literal <...> tags in the JSON source) and whitespace normalized, then quotes escaped for JSON. No other files need changes, and no new helpers or imports are needed.
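
As an illustration of decoding in one well-defined place, the following sketch decodes each entity exactly once with a single regex pass. It is not the patch proposed here; `ENTITY_MAP` and `decodeEntitiesOnce` are hypothetical names, and the entity list is limited to the five that appear in the PR.

```javascript
// Hypothetical lookup table for the five entities handled in the PR.
const ENTITY_MAP = {
    '&amp;': '&',
    '&lt;': '<',
    '&gt;': '>',
    '&quot;': '"',
    '&#39;': "'"
};

// One pass over the input: text produced by a replacement is never
// rescanned, so "&amp;lt;" becomes "&lt;" and stops there.
function decodeEntitiesOnce(value) {
    return value.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => ENTITY_MAP[entity]);
}

const nested = decodeEntitiesOnce('&amp;lt;b&amp;gt;');
const plain = decodeEntitiesOnce('a &lt; b &amp;&amp; c');

console.log(nested); // &lt;b&gt;
console.log(plain);  // a < b && c
```

Because `String.prototype.replace` with a global regex resumes scanning after each match, the output of one replacement cannot feed another, so one call removes exactly one level of escaping. Calling the function at two different stages would still decode twice.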

Suggested changeset 1
scripts/expand-toc.js

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/expand-toc.js b/scripts/expand-toc.js
--- a/scripts/expand-toc.js
+++ b/scripts/expand-toc.js
@@ -12,8 +12,6 @@
         // to avoid unescaped quotes/attributes breaking JSON
         fileContent = fileContent.replace(/"text":\s*"((?:[^"\\]|\\.)*)"/g, (match, val) => {
             let clean = val;
-            // Decode HTML entities so we can strip the resulting HTML tags
-            clean = clean.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#39;/g, "'");
             // Strip HTML tags
             clean = clean.replace(/<[^>]*>/g, '');
             // Collapse whitespace and escaped newlines
EOF
// Decode HTML entities so we can strip the resulting HTML tags
clean = clean.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#39;/g, "'");
// Strip HTML tags
clean = clean.replace(/<[^>]*>/g, '');

Check failure

Code scanning / CodeQL

Incomplete multi-character sanitization High

This string may still contain <script, which may cause an HTML element injection vulnerability.

Copilot Autofix

AI 12 days ago

In general terms, the problem arises because we decode HTML entities into <, >, ", and ', then attempt to remove HTML tags using a regex that may not handle all corner cases and can be sensitive to multi-character interactions. To remove the risk that unsafe HTML fragments like <script survive, we should ensure that the final "text" content we inject into JSON never contains raw < or > characters at all. Since the comments say we only need plain text (“Strip HTML tags from "text" values ...”), removing all remaining < and > characters is acceptable and preserves semantics.

The best minimal fix within this file is to add an extra sanitization step after stripping tags: replace any remaining < or > characters in clean with an empty string (or some safe replacement such as a space). This guarantees that no <script sequence – or any HTML tag opener – can survive in the final clean string, even if the earlier tag-stripping regex misses some patterns. Concretely, in parseFile’s callback at lines 13–24, after line 18 (clean = clean.replace(/<[^>]*>/g, '');), we add a new line clean = clean.replace(/[<>]/g, '');. No new imports or external libraries are needed; this is simple string manipulation.
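
A small sketch of why the extra step matters, using the two regexes quoted above. The function names and the sample input are hypothetical; the point is that an unclosed tag opener never matches `/<[^>]*>/g` and therefore survives tag stripping.

```javascript
// Tag stripping as currently written in parseFile.
function stripTags(value) {
    return value.replace(/<[^>]*>/g, '');
}

// Tag stripping followed by the suggested removal of leftover angle brackets.
function stripTagsAndBrackets(value) {
    return stripTags(value).replace(/[<>]/g, '');
}

// Hypothetical input: two complete tags plus an opener with no closing ">".
const input = 'Hello <b>world</b> <script src=x';

const tagsOnly = stripTags(input);
const fullyStripped = stripTagsAndBrackets(input);

console.log(tagsOnly);      // Hello world <script src=x
console.log(fullyStripped); // Hello world script src=x
```

The second result no longer contains any character that could open an HTML element, at the cost of also dropping angle brackets that were meant as plain text.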

Suggested changeset 1
scripts/expand-toc.js

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/expand-toc.js b/scripts/expand-toc.js
--- a/scripts/expand-toc.js
+++ b/scripts/expand-toc.js
@@ -16,6 +16,8 @@
             clean = clean.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#39;/g, "'");
             // Strip HTML tags
             clean = clean.replace(/<[^>]*>/g, '');
+            // Ensure no raw angle brackets remain that could start HTML tags
+            clean = clean.replace(/[<>]/g, '');
             // Collapse whitespace and escaped newlines
             clean = clean.replace(/\\n/g, ' ').replace(/\s+/g, ' ').trim();
             // Escape any double quotes for valid JSON
EOF
// Collapse whitespace and escaped newlines
clean = clean.replace(/\\n/g, ' ').replace(/\s+/g, ' ').trim();
// Escape any double quotes for valid JSON
clean = clean.replace(/"/g, '\\"');

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This does not escape backslash characters in the input.

Copilot Autofix

AI 12 days ago

In general, the robust way to fix this is to avoid hand-rolling JSON string escaping and instead use JSON.stringify on the string value you want to insert, then splice that encoded fragment into your larger JSON text. JSON.stringify correctly escapes backslashes, double quotes, and other control characters per the JSON spec, avoiding the incomplete escaping problem.

For this specific code, we can keep the current parsing logic but change the "text" replacement so that, after computing the cleaned clean string, we no longer manually replace " with \". Instead, we should call JSON.stringify(clean) and use the resulting value directly, which will include surrounding quotes and properly-escaped contents. Concretely, within parseFile in scripts/expand-toc.js, inside the fileContent.replace(/"text":\s*"((?:[^"\\]|\\.)*)"/g, ...) callback:

  • Remove the line clean = clean.replace(/"/g, '\\"');.
  • Change the return from return `"text": "${clean}"`; to return `"text": ${JSON.stringify(clean)}`;.

This change leaves the existing trimming, HTML-tag stripping, and whitespace collapsing logic intact, but delegates the final JSON escaping step to the built-in JSON engine, which correctly escapes backslashes and any other required characters.

No new imports are needed since JSON is global in Node.js.
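
The difference can be sketched as follows, assuming `clean` holds an already-decoded plain string. The sample value and variable names are hypothetical and chosen so that the backslash forms an invalid JSON escape.

```javascript
// Hypothetical cleaned text containing a backslash and double quotes.
// The actual characters are: C:\docs "api"
const clean = 'C:\\docs "api"';

// Manual approach from the original code: only double quotes are escaped,
// so the raw backslash ends up inside the JSON text as the escape "\d".
const manualJson = `{"text": "${clean.replace(/"/g, '\\"')}"}`;
let manualError = null;
try {
    JSON.parse(manualJson);
} catch (e) {
    manualError = e; // "\d" is not a valid JSON escape sequence
}

// JSON.stringify produces a complete string literal, quotes included,
// with backslashes, quotes, and control characters escaped.
const safeJson = `{"text": ${JSON.stringify(clean)}}`;
const roundTripped = JSON.parse(safeJson).text;

console.log(manualError instanceof SyntaxError); // true
console.log(roundTripped === clean);             // true
```

With other inputs the manual version may not throw at all: a sequence such as `\t` would be silently parsed as a tab, which is harder to notice than a parse error.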

Suggested changeset 1
scripts/expand-toc.js

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/expand-toc.js b/scripts/expand-toc.js
--- a/scripts/expand-toc.js
+++ b/scripts/expand-toc.js
@@ -18,9 +18,8 @@
             clean = clean.replace(/<[^>]*>/g, '');
             // Collapse whitespace and escaped newlines
             clean = clean.replace(/\\n/g, ' ').replace(/\s+/g, ' ').trim();
-            // Escape any double quotes for valid JSON
-            clean = clean.replace(/"/g, '\\"');
-            return `"text": "${clean}"`;
+            // Use JSON.stringify to produce a valid JSON string literal (escapes backslashes, quotes, etc.)
+            return `"text": ${JSON.stringify(clean)}`;
         });
         return JSON.parse(fileContent);
     } catch (e) {
EOF
