Skip to content

Commit 8ae1a4d

Browse files
HomenShumSun-sunshine06
authored andcommitted
fix(core): close-tag regex misses script/style end tags with attrs/whitespace
CodeQL js/bad-tag-filter (HIGH) on PR #241: the literal `<\/script>` / `<\/style>` patterns in stripTags() left bodies behind for HTML5-tolerated end-tag forms like `</script >` (trailing space) and `</script foo="bar">` (end-tag attributes, silently ignored by browsers per spec). A crafted source HTML could leak script/style body text into the visible-word vocabulary used for parity coverage scoring. Fix: mirror the opening-tag pattern's `\b[^>]*` on the close tag too. The `\b` after the tag name prevents over-matching `</scripts>` while the `[^>]*` consumes any tolerated end-tag content up to the closing `>`. Regression test covers all 4 previously-vulnerable forms: - `</script >` (trailing whitespace) - `</script foo="bar">` (end-tag attrs) - `</style >` (style branch) - `</SCRIPT>` (case) Asserts none of 4 leaked tokens appear in the parity report when the decomposition correctly omits the script/style content. Signed-off-by: homen <hshum2018@gmail.com>
1 parent 517d74c commit 8ae1a4d

2 files changed

Lines changed: 58 additions & 2 deletions

File tree

packages/core/src/tools/verify-ui-kit-parity.test.ts

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -154,6 +154,56 @@ describe('makeVerifyUiKitParityTool', () => {
154154
expect(result.details.signals.visibleTextCoverage).toBe(1);
155155
});
156156

157+
// Regression for CodeQL js/bad-tag-filter (HIGH) flagged on PR #241.
158+
// The pre-fix regex `<\/script>` literally missed end tags with trailing
159+
// whitespace or attributes, both of which HTML5 parsers tolerate. A crafted
160+
// source could leak script body text into the visible-word coverage check.
161+
// This test asserts that a script body using the previously-vulnerable
162+
// syntax does NOT contribute words to the source vocabulary, so a faithful
163+
// decomposition that omits the script still scores OK.
164+
it('strips script + style bodies even with attrs/whitespace in end tags', async () => {
165+
const sourceWithCraftedScript = `
166+
<html>
167+
<body>
168+
<header><h1>Acme Analytics</h1></header>
169+
<main>
170+
<div class="metric"><span>MRR</span><span>$12,400</span></div>
171+
</main>
172+
<script>secretLeakedTokenAaa()</script >
173+
<script>anotherLeakedTokenBbb()</script foo="bar">
174+
<style>.x{color:#f00}/*secretLeakedCssCcc*/</style >
175+
<SCRIPT>upperCaseLeakedTokenDdd()</SCRIPT>
176+
</body>
177+
</html>
178+
`;
179+
const cleanDecomp = `
180+
<html>
181+
<body>
182+
<header><h1>Acme Analytics</h1></header>
183+
<main>
184+
<div class="metric"><span>MRR</span><span>$12,400</span></div>
185+
</main>
186+
</body>
187+
</html>
188+
`;
189+
const fs = makeFs({
190+
'index.html': sourceWithCraftedScript,
191+
'ui_kits/x/index.html': cleanDecomp,
192+
'ui_kits/x/tokens.css': '',
193+
});
194+
const tool = makeVerifyUiKitParityTool(fs);
195+
const result = await tool.execute('t', { slug: 'x' }, undefined);
196+
const first = result.content[0];
197+
if (first?.type !== 'text') throw new Error('expected text');
198+
// None of the leaked tokens should appear in the parity report — if they
199+
// did, it would mean stripTags failed to remove the script/style bodies
200+
// and they leaked into the visible-word vocabulary.
201+
expect(first.text.toLowerCase()).not.toContain('secretleakedtoken');
202+
expect(first.text.toLowerCase()).not.toContain('anotherleakedtoken');
203+
expect(first.text.toLowerCase()).not.toContain('secretleakedcss');
204+
expect(first.text.toLowerCase()).not.toContain('uppercaseleakedtoken');
205+
});
206+
157207
it('summary text reflects pass/fail status', async () => {
158208
const fsOk = makeFs({
159209
'index.html': SOURCE_HTML,

packages/core/src/tools/verify-ui-kit-parity.ts

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -127,9 +127,15 @@ function elementParityScore(
127127
}
128128

129129
function stripTags(html: string): string {
130+
// Close-tag patterns mirror the opening pattern's `\b[^>]*` so we strip
131+
// the full `<script>...</script foo="bar" >` form. HTML5 parsers tolerate
132+
// attributes and trailing whitespace inside end tags (silently ignored)
133+
// and CodeQL's "Bad HTML filtering regexp" rule flags the literal
134+
// `</script>` form because it leaves bodies behind for `</script >` etc.
135+
// The `\b` after the tag name prevents over-matching like `</scripts>`.
130136
return html
131-
.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, ' ')
132-
.replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, ' ')
137+
.replace(/<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/gi, ' ')
138+
.replace(/<style\b[^>]*>[\s\S]*?<\/style\b[^>]*>/gi, ' ')
133139
.replace(/<[^>]+>/g, ' ');
134140
}
135141

0 commit comments

Comments
 (0)