ie-html: implement CR/CRLF normalization in tokenizer #88

@thomasnemer

Goal

The WHATWG HTML spec requires the input stream to be preprocessed by normalizing newlines before tokenization: each \r\n pair becomes a single \n, and any remaining lone \r becomes \n.
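
As a rough illustration of that rule, here is a minimal sketch of an up-front normalization pass. The function name `normalize_newlines` is hypothetical and not taken from the ie-html codebase; it only shows the two replacement steps in the order given above.

```rust
/// Hypothetical helper (not from the ie-html codebase).
/// Step 1 collapses every CRLF pair into a single LF.
/// Step 2 turns any CR that is still left into LF.
fn normalize_newlines(input: &str) -> String {
    input.replace("\r\n", "\n").replace('\r', "\n")
}
```

The order matters: replacing lone \r first would turn \r\n into \n\n and produce a spurious extra newline.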

Current state

11 html5lib tokenizer tests fail due to missing CR normalization:

  • CR in comments, bogus comments
  • CRLF sequences not normalized

Implementation

  • Either add a preprocessing pass in Tokenizer::new(), or normalize in next_char() during character consumption
  • In both cases, replace \r\n with \n and any remaining lone \r with \n
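
The character-consumption option could look roughly like the sketch below. The `Input` struct and its fields are assumptions made for illustration. The real Tokenizer may store its input differently, and only the name `next_char()` comes from this issue.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical stand-in for the tokenizer's input cursor.
struct Input<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Input<'a> {
    fn new(src: &'a str) -> Self {
        Input { chars: src.chars().peekable() }
    }

    /// Returns the next character, folding CRLF and lone CR into one LF.
    fn next_char(&mut self) -> Option<char> {
        match self.chars.next()? {
            '\r' => {
                // Consume the LF of a CRLF pair so the pair yields a single LF.
                if self.chars.peek() == Some(&'\n') {
                    self.chars.next();
                }
                Some('\n')
            }
            c => Some(c),
        }
    }
}
```

Compared with a separate pass in Tokenizer::new(), this avoids allocating a second copy of the input, at the cost of one extra peek each time a \r is seen.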

Impact

Fixes the 11 failing conformance tests and raises the pass rate from 99.6% to ~100%.
