HTML API: Visit implied body at EOF after head #46

Open
sirreal wants to merge 3 commits into trunk from html-api-fuzz-fiz/implicit-body

sirreal commented Jun 10, 2026

Owner

What changed

This updates the HTML Processor EOF path so that, when parsing a full document, the end-of-file token is processed through the selected insertion mode exactly once. That allows the parser to visit the implied <body> and the virtual closers generated at EOF for documents that end in the "in head" or "after head" insertion modes.

It also marks EOF-driven stack events as virtual even when the last concrete token was ignored, avoiding stale real-token provenance on implied EOF nodes.

Why

When a full document ended after the head, the processor could complete before the "after head" insertion mode inserted the implied BODY. Serialization then omitted <body></body> even though the parsed HTML tree includes that element.
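The EOF reprocessing described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the WP_HTML_Processor implementation: the function name `toy_process_eof()`, the `+TAG`/`-TAG` event strings, and the reduced set of insertion modes are all hypothetical, and only the "in head" and "after head" rules from the HTML tree-construction algorithm are modeled.

```php
<?php
/**
 * Toy model of EOF handling across insertion modes (hypothetical helper,
 * not part of WordPress).
 *
 * @param string   $mode  Insertion mode when the end of input is reached.
 * @param string[] $stack Stack of open elements, outermost first.
 * @return string[] Virtual events produced at EOF: "+TAG" for an implied
 *                  opener, "-TAG" for an implied closer.
 */
function toy_process_eof( string $mode, array $stack ): array {
	$events = array();

	while ( true ) {
		switch ( $mode ) {
			case 'in head':
				// EOF acts like an implied </head>: pop HEAD, then reprocess.
				$events[] = '-' . array_pop( $stack );
				$mode     = 'after head';
				break;

			case 'after head':
				// Insert the implied BODY, then reprocess in "in body".
				$stack[]  = 'BODY';
				$events[] = '+BODY';
				$mode     = 'in body';
				break;

			default:
				// "in body" and later modes stop parsing: close whatever is still open.
				while ( count( $stack ) > 0 ) {
					$events[] = '-' . array_pop( $stack );
				}
				return $events;
		}
	}
}

// A document that ends while the HEAD is still open.
echo implode( ' ', toy_process_eof( 'in head', array( 'HTML', 'HEAD' ) ) ), "\n";
// Prints: -HEAD +BODY -BODY -HTML
```

If the loop stopped as soon as input ran out instead of reprocessing, the `+BODY` event would never be produced, which is the omission in serialized output described above.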

Validation

  • codex review --base 095fab550244b7e68541a95c5262a00089a45cc4 (only finding was an out-of-scope media CSS issue from the stale base; the PR compare is scoped to the HTML API files)
  • WP_TESTS_SKIP_INSTALL=1 ./vendor/bin/phpunit --group html-api,html-api-html5lib-tests
  • ./vendor/bin/phpcs src/wp-includes/html-api/class-wp-html-processor.php tests/phpunit/tests/html-api/wpHtmlProcessor-serialize.php

sirreal added 2 commits June 10, 2026 11:06
[P2] Mark EOF-implied BODY nodes as virtual — src/wp-includes/html-api/class-wp-html-processor.php:1072-1072

When EOF is processed after the last real token was an ignored BODY token, such as a full document ending with <template><body> or <noscript></body>, this branch leaves state->current_token pointing at that stale BODY. The implied BODY inserted at EOF is then classified by the stack push handler as the same real token, so next_token() exposes a null token and serialization emits </body> without a matching <body>. Please ensure EOF-created nodes are always treated as virtual, or clear/use a synthetic EOF token before implied nodes are inserted.
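A minimal sketch of the provenance problem, assuming the push handler decides "real" versus "virtual" by comparing the pushed node with the token currently being processed. The function `toy_push_provenance()` and its parameters are hypothetical and only illustrate the kind of guard being requested; they are not taken from the processor's source.

```php
<?php
/**
 * Toy classification of a node pushed onto the stack of open elements
 * (hypothetical helper, not part of WordPress).
 *
 * @param string|null $current_token_name Name of the last concrete token, if any.
 * @param string      $pushed_node_name   Name of the node being pushed.
 * @param bool        $at_eof             Whether the push happens while handling EOF.
 * @return string Either "real" or "virtual".
 */
function toy_push_provenance( ?string $current_token_name, string $pushed_node_name, bool $at_eof ): string {
	// Guard: anything created while handling EOF has no source token, so it is virtual.
	if ( $at_eof ) {
		return 'virtual';
	}

	// Otherwise a pushed node counts as real only when it matches the current token.
	return $current_token_name === $pushed_node_name ? 'real' : 'virtual';
}

// Stale BODY token left over from an ignored <body>, implied BODY pushed at EOF.
echo toy_push_provenance( 'BODY', 'BODY', false ), "\n"; // Without the guard: real
echo toy_push_provenance( 'BODY', 'BODY', true ), "\n";  // With the guard: virtual
```

The first call shows how a name match alone misclassifies the implied BODY as the stale real token; the second shows the EOF guard overriding that match.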
sirreal marked this pull request as ready for review June 10, 2026 10:10
@github-actions

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

sirreal commented Jun 10, 2026

Owner Author

Code review

Found 1 issue:

  1. The new $has_processed_eof property and is_eof_token() method are annotated @since 7.0.0, but trunk is at 7.1-alpha (src/wp-includes/version.php#L19) and Trac #65372 is milestoned 7.1. These should be @since 7.1.0 (same issue previously flagged on PR #44).

/**
 * Whether the end-of-file token has been processed through the insertion modes.
 *
 * @since 7.0.0
 *
 * @var bool
 */
private $has_processed_eof = false;

/**
 * Indicates if the Tag Processor has consumed all input.
 *
 * @since 7.0.0
 *
 * @return bool Whether the current token is the end-of-file token.
 */
private function is_eof_token(): bool {

Also verified (no action needed): the EOF reprocess chains terminate (each "in template" EOF pass shrinks the template insertion-mode stack before reprocessing); the insert_virtual_node() bookmark special case is needed because WP_HTML_Tag_Processor::set_bookmark() refuses to allocate at STATE_COMPLETE/STATE_INCOMPLETE_INPUT; and the is_eof_token() provenance guard in the push/pop handlers only fires past the end of input, so real tokens cannot be mis-marked as virtual.
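The termination argument for the "in template" EOF passes can be sketched as a toy loop. This assumes the behavior of the HTML tree-construction algorithm, where each pass pops open elements through the nearest TEMPLATE and removes one entry from the stack of template insertion modes; `toy_template_eof()` and its return shape are hypothetical and do not mirror the processor's code.

```php
<?php
/**
 * Toy model of repeated "in template" EOF passes (hypothetical helper,
 * not part of WordPress).
 *
 * @param string[] $stack          Stack of open elements, outermost first.
 * @param string[] $template_modes Stack of template insertion modes.
 * @return array Number of passes plus the remaining stacks.
 */
function toy_template_eof( array $stack, array $template_modes ): array {
	$passes = 0;

	// With no TEMPLATE on the stack, EOF handling stops instead of reprocessing.
	while ( in_array( 'TEMPLATE', $stack, true ) ) {
		// Pop open elements until a TEMPLATE has been popped.
		do {
			$popped = array_pop( $stack );
		} while ( 'TEMPLATE' !== $popped );

		// Each pass removes one template insertion mode, so the loop is bounded.
		array_pop( $template_modes );
		++$passes;
	}

	return array(
		'passes'         => $passes,
		'stack'          => $stack,
		'template_modes' => $template_modes,
	);
}

$result = toy_template_eof(
	array( 'HTML', 'HEAD', 'TEMPLATE', 'DIV', 'TEMPLATE', 'P' ),
	array( 'in template', 'in template' )
);
echo $result['passes'], "\n"; // Prints: 2
```

Because every pass strictly shrinks both stacks, the number of reprocess steps is bounded by the number of open TEMPLATE elements.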

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

sirreal added a commit that referenced this pull request Jun 10, 2026
# Conflicts:
#	src/wp-includes/html-api/class-wp-html-processor.php