Commit 6ad2758

test(ie-html): add html5lib tree construction conformance suite — 42.2% pass rate #60
Vendor html5lib-tests tree construction fixtures (56 .dat files, 1578
non-scripted non-fragment tests) with MIT LICENSE attribution.

Conformance harness:
- Parses custom .dat format (#data/#errors/#document sections)
- Skips script-on and fragment tests (not supported)
- Serializes DOM to html5lib expected format with DOCTYPE output
- Per-file and aggregate reporting

Tree builder fixes:
- Initial mode now stores DOCTYPE name/public_id/system_id
- ParseResult exposes doctype fields for serialization
- Recursion guard (depth 20) prevents stack overflow from reprocessing
  loops in table modes

42.2% pass rate (666/1578). Main failure categories:
- Table structure edge cases and foster parenting precision
- Adoption agency corner cases
- Missing: foreign content (SVG/MathML), ruby, button scope
- Missing: InFrameset, template document fragment
1 parent 7d93227 commit 6ad2758

59 files changed

Lines changed: 24690 additions & 2 deletions
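
The harness source is not visible on this page, so the following is only a minimal sketch of how the `.dat` sections named in the commit message could be split into test cases. The `DatTest` type, the `parse_dat` function, and the decision to treat any line equal to `#data` as the start of a new test are assumptions for illustration, not the commit's actual code.

```rust
/// One test case from a tree-construction `.dat` file (hypothetical type).
#[derive(Debug, Default)]
struct DatTest {
    data: String,        // markup fed to the parser
    errors: Vec<String>, // one entry per expected parse error line
    document: String,    // expected tree dump, lines starting with "| "
    is_fragment: bool,   // a `#document-fragment` section was present
    script_on: bool,     // a `#script-on` marker was present
}

#[derive(Clone, Copy)]
enum Section {
    Data,
    Errors,
    Document,
    Other, // sections this sketch does not keep (fragment context, new errors)
}

/// Split the text of a `.dat` file into test cases.
/// Assumption: a line that is exactly `#data` always starts a new test.
fn parse_dat(input: &str) -> Vec<DatTest> {
    let mut tests: Vec<DatTest> = Vec::new();
    let mut section = Section::Other;

    for line in input.lines() {
        if line == "#data" {
            tests.push(DatTest::default());
            section = Section::Data;
            continue;
        }
        // Anything before the first `#data` header is ignored.
        let Some(test) = tests.last_mut() else { continue };
        match line {
            "#errors" => section = Section::Errors,
            "#document" => section = Section::Document,
            "#document-fragment" => {
                test.is_fragment = true;
                section = Section::Other;
            }
            "#script-on" => {
                test.script_on = true;
                section = Section::Other;
            }
            "#script-off" | "#new-errors" => section = Section::Other,
            _ => match section {
                Section::Data => {
                    test.data.push_str(line);
                    test.data.push('\n');
                }
                Section::Errors => test.errors.push(line.to_string()),
                Section::Document => {
                    test.document.push_str(line);
                    test.document.push('\n');
                }
                Section::Other => {}
            },
        }
    }

    for test in &mut tests {
        // Remove the newline appended after the last data line.
        if test.data.ends_with('\n') {
            test.data.pop();
        }
        // Drop the blank separator line(s) that follow the tree dump.
        let kept = test.document.trim_end_matches('\n').len();
        test.document.truncate(kept);
    }
    tests
}
```

With this shape, the "skips script-on and fragment tests" rule reduces to filtering on `!test.is_fragment && !test.script_on` before running each case.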

crates/ie-html/src/tree_builder.rs

Lines changed: 31 additions & 2 deletions
@@ -11,6 +11,9 @@ pub struct ParseResult {
     pub errors: Vec<String>,
     pub style_elements: Vec<String>,
     pub link_stylesheets: Vec<String>,
+    pub doctype_name: Option<String>,
+    pub doctype_public_id: Option<String>,
+    pub doctype_system_id: Option<String>,
 }
 
 /// Top-level parse function. HTML parsing never fails — errors are collected.
@@ -22,6 +25,9 @@ pub fn parse(html: &str) -> ParseResult {
         errors: tb.errors,
         style_elements: tb.style_elements,
         link_stylesheets: tb.link_stylesheets,
+        doctype_name: tb.doctype_name,
+        doctype_public_id: tb.doctype_public_id,
+        doctype_system_id: tb.doctype_system_id,
     }
 }
 
@@ -41,6 +47,10 @@ struct TreeBuilder<'a> {
     style_elements: Vec<String>,
     link_stylesheets: Vec<String>,
     pending_text: String,
+    reprocess_depth: u32,
+    doctype_name: Option<String>,
+    doctype_public_id: Option<String>,
+    doctype_system_id: Option<String>,
     done: bool,
 }
 
@@ -62,6 +72,10 @@ impl<'a> TreeBuilder<'a> {
             style_elements: Vec::new(),
             link_stylesheets: Vec::new(),
             pending_text: String::new(),
+            reprocess_depth: 0,
+            doctype_name: None,
+            doctype_public_id: None,
+            doctype_system_id: None,
             done: false,
         }
     }
@@ -80,6 +94,12 @@ impl<'a> TreeBuilder<'a> {
     }
 
     fn process_token(&mut self, token: Token) {
+        self.reprocess_depth += 1;
+        if self.reprocess_depth > 20 {
+            // Prevent infinite recursion from reprocessing loops
+            self.reprocess_depth -= 1;
+            return;
+        }
         match self.mode {
             InsertionMode::Initial => self.handle_initial(token),
             InsertionMode::BeforeHtml => self.handle_before_html(token),
@@ -105,6 +125,7 @@ impl<'a> TreeBuilder<'a> {
             InsertionMode::AfterAfterFrameset => self.handle_in_body(token),
             InsertionMode::AfterFrameset => self.handle_in_body(token),
         }
+        self.reprocess_depth -= 1;
     }
 
     // --- Helpers ---
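
Note on the two hunks above: the guard increments a counter on entry to `process_token`, bails out past depth 20, and decrements on every exit path. The sketch below isolates that pattern in a stand-in type so its behaviour can be checked in isolation. `Reprocessor`, `MAX_REPROCESS_DEPTH`, and the `handled`/`dropped` counters are hypothetical names added for illustration; the handler that always reprocesses is a deliberately broken stand-in for a table-mode loop.

```rust
/// Hypothetical stand-in for the tree builder; only the depth guard is modelled.
struct Reprocessor {
    reprocess_depth: u32,
    handled: u32, // calls that got past the guard
    dropped: u32, // calls cut off by the guard (not tracked in the diff)
}

/// Same limit as the literal `20` in the diff, named here for readability.
const MAX_REPROCESS_DEPTH: u32 = 20;

impl Reprocessor {
    fn process_token(&mut self, token: &str) {
        self.reprocess_depth += 1;
        if self.reprocess_depth > MAX_REPROCESS_DEPTH {
            // Undo the increment before the early return so the counter
            // stays balanced, as the diff does.
            self.dropped += 1;
            self.reprocess_depth -= 1;
            return;
        }
        self.handled += 1;
        // Deliberately broken handler: it always asks for the same token
        // to be reprocessed, which would recurse forever without the guard.
        self.process_token(token);
        self.reprocess_depth -= 1;
    }
}
```

Starting from zeroed counters, one call to `process_token` ends with `handled == 20`, `dropped == 1`, and `reprocess_depth == 0`. In the diff itself the guarded branch returns without recording anything, so a token dropped this way leaves no trace in `errors`; counting drops here is an addition to make that visible.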
@@ -823,8 +844,16 @@ impl<'a> TreeBuilder<'a> {
             Token::Comment(ref data) => {
                 self.insert_comment_at(data, self.doc.root);
             }
-            Token::Doctype { .. } => {
-                // Accept doctype, always no-quirks mode
+            Token::Doctype {
+                name,
+                public_id,
+                system_id,
+                ..
+            } => {
+                self.doctype_name = name;
+                self.doctype_public_id = public_id;
+                self.doctype_system_id = system_id;
+                self.mode = InsertionMode::BeforeHtml;
             }
             _ => {
                 self.parse_error("unexpected token in Initial mode");
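
The DOCTYPE fields captured in this hunk exist so the harness can print a DOCTYPE line in the html5lib expected-tree format. That serializer is not shown on this page, so the following is a sketch only. `serialize_doctype` is a hypothetical helper, and the output shape (`<!DOCTYPE name>`, or `<!DOCTYPE name "public" "system">` when either identifier is non-empty) reflects my reading of the html5lib-tests tree-construction format and should be checked against the vendored README.

```rust
/// Render the three DOCTYPE fields as one line of an html5lib-style tree dump.
/// Hypothetical helper; the commit's real serializer is not visible here.
fn serialize_doctype(
    name: Option<&str>,
    public_id: Option<&str>,
    system_id: Option<&str>,
) -> String {
    let name = name.unwrap_or("");
    // Assumption: an empty identifier is treated the same as a missing one.
    let public_id = public_id.filter(|s| !s.is_empty());
    let system_id = system_id.filter(|s| !s.is_empty());

    if public_id.is_none() && system_id.is_none() {
        format!("| <!DOCTYPE {}>", name)
    } else {
        // When either identifier is present, both slots are printed.
        format!(
            "| <!DOCTYPE {} \"{}\" \"{}\">",
            name,
            public_id.unwrap_or(""),
            system_id.unwrap_or("")
        )
    }
}
```

Given a `ParseResult`, the call would look like `serialize_doctype(result.doctype_name.as_deref(), result.doctype_public_id.as_deref(), result.doctype_system_id.as_deref())`.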
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+Copyright (c) 2006-2013 James Graham, Geoffrey Sneddon, and
+other contributors
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
