fix: replace regex HTML pre-scan with linear scanner #49

Merged
Gumball12 merged 2 commits into main from codex/fix-cpu-denial-of-service-vulnerability on May 29, 2026

Conversation

@Gumball12 (Owner) commented May 21, 2026

Motivation

  • The HTML tag pre-scan used a backtracking regex that can degrade to quadratic CPU time on inputs with many < characters or unterminated <!-- comments. Because ignoreHtmlTag is enabled by default, the pre-scan runs on every call unless it is turned off, which creates a CPU DoS risk.
  • The change replaces the expensive regex-based materialization with a safer linear scanner so the event loop is not blocked on attacker-controlled text.

Description

  • Replace the matchAll/regex pre-scan in useCheckIsHtmlTag with a linear getHtmlTagRangeList scanner that uses indexOf to find <, >, and --> boundaries.
  • The scanner recognizes <!-- ... --> comments and normal <...> tags.
  • Preserve external behavior by returning the same tag-inclusion predicate from useCheckIsHtmlTag and by producing the same range-list shape for downstream checks. In particular, an unterminated <!-- falls through to normal <...> scanning so subsequent tags are still recognized — matching the original /<!--[^]*?-->|<[^>]+>/g behavior.
  • Export a MatchRange type alias from utils and annotate extractMatchRangeList's return type so the declaration build resolves the symbol.
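
For readers who want to see the shape of such a scanner, here is a minimal sketch. The names getHtmlTagRangeList and MatchRange come from the description above; the body, the half-open [start, end) range convention, and the commentCloseMissing flag are assumptions of this sketch and may not match the merged code in useCheckIsHtmlTag.ts.

```typescript
// Sketch only: an illustration of an indexOf-based scanner, not the merged code.
// Assumption: a range is [start, endExclusive).
type MatchRange = [number, number];

const getHtmlTagRangeList = (text: string): MatchRange[] => {
  const rangeList: MatchRange[] = [];
  // Assumption of this sketch: once a search for "-->" fails, no later search
  // can succeed, so remember the failure instead of rescanning the tail.
  let commentCloseMissing = false;
  let cursor = 0;

  while (cursor < text.length) {
    const openIndex = text.indexOf('<', cursor);
    if (openIndex === -1) break;

    if (!commentCloseMissing && text.startsWith('<!--', openIndex)) {
      const commentCloseIndex = text.indexOf('-->', openIndex + 4);
      if (commentCloseIndex !== -1) {
        rangeList.push([openIndex, commentCloseIndex + 3]);
        cursor = commentCloseIndex + 3;
        continue;
      }
      // Unterminated comment: fall through and treat "<" as a tag opener.
      commentCloseMissing = true;
    }

    const closeIndex = text.indexOf('>', openIndex + 1);
    if (closeIndex === -1) break; // no ">" remains, so no later tag can close

    if (closeIndex > openIndex + 1) {
      rangeList.push([openIndex, closeIndex + 1]);
      cursor = closeIndex + 1;
    } else {
      cursor = openIndex + 1; // empty "<>" is ignored
    }
  }

  return rangeList;
};
```

Every branch either stops the loop or moves cursor forward, and each failed search is performed at most once, which is what keeps the total work proportional to the input length in this sketch.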

Follow-up fixes (9a91465)

Addresses the two findings from the Codex review on b06a7ac:

  • P1 — MatchRange import: utils.ts did not export the type, which caused TS2305 during vite-plugin-dts declaration emit. Now exported as export type MatchRange = [number, number], and extractMatchRangeList's signature is annotated to match.
  • P2 — unterminated <!-- regression: the initial scanner broke out of its loop on a missing -->, dropping every later tag. The scanner now falls through to the normal <...> handler so the < is treated as a regular tag opener, restoring the previous regex behavior.
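
The P1 fix can be pictured roughly as follows. Only the alias export type MatchRange = [number, number], the function name extractMatchRangeList, and the fact that its return type is now annotated come from this PR; the parameter list and the body below are hypothetical and shown only so the snippet runs on its own.

```typescript
// Exported so that other modules (here, useCheckIsHtmlTag.ts) can import it.
export type MatchRange = [number, number];

// Hypothetical parameters and body; the real utils.ts may differ.
// Assumption: a range is [start, endExclusive).
export const extractMatchRangeList = (
  inputText: string,
  regex: RegExp,
): MatchRange[] => {
  // Work on a global copy so exec() advances and the caller's regex is untouched.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const globalRegex = new RegExp(regex.source, flags);
  const rangeList: MatchRange[] = [];

  let match: RegExpExecArray | null;
  while ((match = globalRegex.exec(inputText)) !== null) {
    if (match[0].length === 0) {
      globalRegex.lastIndex += 1; // avoid looping forever on empty matches
      continue;
    }
    rangeList.push([match.index, match.index + match[0].length]);
  }

  return rangeList;
};
```

One general TypeScript point that may be relevant here: without an explicit annotation, an array literal such as [start, end] is usually inferred as number[] rather than as a two-element tuple, so annotating the return type keeps the emitted declaration in terms of MatchRange.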

Testing

  • pnpm --filter text-vide test: 51 tests passed (47 → 51).
  • Coverage: 100.00% statements / 100.00% branches / 100.00% functions / 100.00% lines (codecov patch & project both pass).
  • pnpm --filter text-vide build — declaration emit succeeds with no TS errors.
  • Regression tests added:
    • tag ranges preserved after unterminated <!--
    • multiple tags after an unterminated <!--
    • lone <!-- opener with no further tags
    • empty <> is ignored

Codex Task

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (8adb359) to head (9a91465).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #49   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            7         7           
  Lines           80        98   +18     
  Branches        11        16    +5     
=========================================
+ Hits            80        98   +18     


@Gumball12 Gumball12 self-assigned this May 21, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b06a7ac421

@@ -1,10 +1,43 @@
import { extractMatchRangeList } from './utils';
import { MatchRange } from './utils';

P1: Fix unresolved MatchRange import

useCheckIsHtmlTag.ts now imports MatchRange from ./utils, but utils.ts does not export that symbol. This introduces a real TypeScript error (TS2305) during declaration generation (reproducible via pnpm --filter text-vide build), so any strict type-checking pipeline or downstream source type-check will fail until this type is exported or defined/imported correctly as a type.

Owner Author

Fixed in 9a91465. MatchRange is now exported from ./utils as export type MatchRange = [number, number], and extractMatchRangeList's return type is annotated to match. Verified locally: pnpm --filter text-vide build now completes declaration emit (vite-plugin-dts) without TS2305.

Comment on lines +15 to +16
if (closeIndex === -1) {
break;

P2: Continue scanning after unterminated HTML comment opener

Breaking out of the scanner when <!-- has no matching --> causes all later tags to be ignored by the pre-scan, so words inside subsequent real tags can be incorrectly highlighted. In this commit, inputs like "x <!-- unterminated <div>ab</div> yy <i>cd</i>" now produce no tag ranges at all, whereas the previous implementation still identified later tag-like ranges; this is a behavior regression for malformed-but-encountered text inputs.
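
To make the consequence concrete, here is a small sketch of how a list of tag ranges might feed a "is this position inside a tag?" predicate. The helper name createIsInsideTag and the half-open [start, end) convention are assumptions for illustration; the predicate returned by useCheckIsHtmlTag in this repo may be written differently.

```typescript
// Assumption: a range is [start, endExclusive).
type MatchRange = [number, number];

// Hypothetical helper: returns a predicate that reports whether an index
// falls inside any of the given tag ranges.
const createIsInsideTag =
  (rangeList: MatchRange[]) =>
  (index: number): boolean =>
    rangeList.some(([start, end]) => index >= start && index < end);

// With ranges for two tag-like spans, a word starting inside a span is
// reported as "inside" and can be skipped by the highlighter.
const isInsideTag = createIsInsideTag([[2, 25], [27, 33]]);
console.log(isInsideTag(21)); // true
console.log(isInsideTag(25)); // false

// With an empty range list nothing is ever "inside", so every word would be
// treated as plain text and highlighted.
console.log(createIsInsideTag([])(21)); // false
```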

Owner Author

Fixed in 9a91465. When <!-- has no matching -->, the scanner no longer breaks; it falls through to the normal <...> handler so the < is treated as a regular tag opener. This matches the original /<!--[^]*?-->|<[^>]+>/g behavior, where an unterminated comment fell back to the <[^>]+> alternative.

Concrete check on the example from this review (x <!-- unterminated <div>ab</div> yy <i>cd</i>):

  • Before this commit: getHtmlTagRangeList() returned [], so all words got highlighted, including those inside <div>/<i>.
  • After this commit: ranges cover <!-- unterminated <div>, </div>, <i>, </i> — identical to the previous regex output.

Regression tests were added covering this case (multiple tags after an unterminated <!--), a lone <!--, and an empty <>. Patch coverage is back to 100%.
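
The regex side of that comparison can be reproduced with a few lines. This sketch runs the original pattern quoted above against the review's example; the helper name collectRegexMatches is made up for illustration and is not part of the library.

```typescript
// Hypothetical helper: collects the matched substrings of the original pattern.
const collectRegexMatches = (text: string): string[] => {
  const htmlTagRegex = /<!--[^]*?-->|<[^>]+>/g;
  const matchedList: string[] = [];

  let match: RegExpExecArray | null;
  while ((match = htmlTagRegex.exec(text)) !== null) {
    matchedList.push(match[0]);
  }

  return matchedList;
};

const reviewSample = 'x <!-- unterminated <div>ab</div> yy <i>cd</i>';
console.log(collectRegexMatches(reviewSample));
// [ '<!-- unterminated <div>', '</div>', '<i>', '</i>' ]
```

The first match shows the fallback: the comment alternative cannot find -->, so the <[^>]+> alternative takes over at the same < and runs to the next >, which happens to be the end of <div>.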

- Export MatchRange from utils so useCheckIsHtmlTag's import resolves
  (fixes TS2305 during vite-plugin-dts declaration emit).
- Fall through to normal <...> scan when <!-- has no matching -->
  instead of aborting the loop, restoring the original regex behavior.
- Add regression tests for unterminated comments, lone <!--, and empty <>.
@Gumball12 Gumball12 merged commit 91b4e28 into main May 29, 2026
4 checks passed
@Gumball12 Gumball12 deleted the codex/fix-cpu-denial-of-service-vulnerability branch May 29, 2026 01:02