Commit 26c781e

scan all tokens
Squashed commit of the following:

commit 4959837
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Tue Jan 16 17:08:27 2024 -0600

    Update to call `$this->next_token()` in `seek()` and fix tests

commit 3d8b20e
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Tue Jan 16 09:58:13 2024 -0600

    RAWTEXT and SCRIPT elements do no decode character references

commit f502153
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Mon Jan 15 15:56:54 2024 -0600

    WPCS

commit 543a0b8
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Mon Jan 15 15:54:41 2024 -0600

    Fix span-of-dashes comment modifiable text

commit b0aae8a
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Mon Jan 15 22:07:27 2024 +0100

    Add failing test for `<!----->`

commit 283df46
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Mon Jan 15 11:55:30 2024 -0600

    Expand comment introducing modifiable text.

commit ede20ca
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Mon Jan 15 11:38:59 2024 -0600

    Rename INCOMPLETE state to INCOMPLETE_INPUT

commit 7de4cc2
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Fri Jan 12 13:49:12 2024 -0500

    PR Feedback

    Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>

commit 094176e
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Fri Jan 12 13:23:12 2024 -0500

    Remove early bailout of special elements. It's duplicated.

commit 7fa58c8
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Fri Jan 12 13:10:53 2024 -0500

    Feedback updates.

    Co-authored-by: David Herrera <mail@dlh01.info>
    Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>

commit 28fc54d
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Fri Jan 12 12:54:22 2024 -0500

    Expand docblocks for CDATA/PINodes and re-add removed tests

commit 1194d6f
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Fri Jan 12 07:53:28 2024 -0500

    Provisionarily: add back CDATA and PI nodes

commit b6d4300
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Wed Jan 10 12:05:45 2024 -0500

    Fix + WPCS

commit 7d1c2e8
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Thu Jan 11 21:22:51 2024 -0500

    Remove support for CDATA sections.

commit e91a33b
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Wed Jan 10 11:51:17 2024 -0500

    Remove support for Processing Instructions

    Attempting to parse processing instructions conflicts with parsing
    bogus comments when a document may be incomplete, which might create
    a divergence in the HTML API from browser behavior.

commit 3d68e28
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Wed Jan 10 11:17:57 2024 -0500

    Fix non-PI-node tests

commit d596176
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Wed Jan 10 11:09:20 2024 -0500

    Add basic conformance tests

commit 2199e86
Author: Dennis Snell <dennis.snell@automattic.com>
Date:   Sun Dec 10 15:17:01 2023 +0100

    HTML API: Avoid processing incomplete syntax elements.

    The HTML Tag Processor is able to know if it starts parsing a syntax
    element and reaches the end of the document before it reaches the end
    of the element. In these cases, after this patch, the processor will
    indicate this condition.

    For example, when processing `<div><input type="te` there is an
    incomplete INPUT element. The processor will fail to find the INPUT,
    it will pause right after the DIV, and `paused_at_incomplete_token()`
    will return `true`.

    This patch doesn't change any existing behaviors, but it adds the new
    method to report on the final failure condition.

    It provides a mechanism for later use to add chunked parsing to the
    class, wherein it will be possible to process a document without
    having the entire document loaded in memory, for example when
    processing unbuffered output.

    This is also a necessary change for adding the ability to scan every
    token in the document. Currently the Tag Processor only exposes tags
    as tokens, but it will need to process `#text` nodes, HTML comments,
    and other markup in order to enable behaviors in the HTML Processor
    and in refactors of existing HTML processing in Core.
1 parent a64c96f commit 26c781e
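
Reviewer note: the `<div><input type="te` case described in commit 2199e86 can be illustrated with a small standalone sketch. This is not the Tag Processor's implementation; `sketch_find_incomplete_tag()` is a hypothetical helper invented here, and it assumes a much simpler grammar than real HTML (every `<` opens a tag; comments, SCRIPT/RAWTEXT content, and other syntax are ignored). It only shows the idea of noticing that the input ended partway through a tag.

```php
<?php
/**
 * Hypothetical helper (not part of the HTML API): returns the byte offset
 * where an unfinished tag starts, or null if every "<" is closed by a ">".
 *
 * Simplifying assumption: every "<" begins a tag. Comments, SCRIPT/RAWTEXT
 * content, and bare "<" characters in text are not handled.
 */
function sketch_find_incomplete_tag( string $html ): ?int {
	$at  = 0;
	$end = strlen( $html );

	while ( $at < $end ) {
		$tag_start = strpos( $html, '<', $at );
		if ( false === $tag_start ) {
			return null; // Only text remains.
		}

		// Walk toward the closing ">" while skipping quoted attribute values.
		$quote = null;
		for ( $i = $tag_start + 1; $i < $end; $i++ ) {
			$c = $html[ $i ];
			if ( null !== $quote ) {
				if ( $c === $quote ) {
					$quote = null; // The quoted value ended.
				}
			} elseif ( '"' === $c || "'" === $c ) {
				$quote = $c;
			} elseif ( '>' === $c ) {
				break;
			}
		}

		if ( $i >= $end ) {
			return $tag_start; // The input ended inside this tag.
		}

		$at = $i + 1;
	}

	return null;
}

var_dump( sketch_find_incomplete_tag( '<div><input type="te' ) );     // int(5)
var_dump( sketch_find_incomplete_tag( '<div><input type="text">' ) ); // NULL
```

In the first call the scan gets past the complete DIV and reports offset 5, where the unfinished INPUT begins. This loosely mirrors the "pause right after the DIV" behavior the commit message describes.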

4 files changed

Lines changed: 1459 additions & 142 deletions

src/wp-includes/html-api/class-wp-html-processor.php

Lines changed: 20 additions & 12 deletions
@@ -149,17 +149,6 @@ class WP_HTML_Processor extends WP_HTML_Tag_Processor {
 	 */
 	const MAX_BOOKMARKS = 100;
 
-	/**
-	 * Static query for instructing the Tag Processor to visit every token.
-	 *
-	 * @access private
-	 *
-	 * @since 6.4.0
-	 *
-	 * @var array
-	 */
-	const VISIT_EVERYTHING = array( 'tag_closers' => 'visit' );
-
 	/**
 	 * Holds the working state of the parser, including the stack of
 	 * open elements and the stack of active formatting elements.
@@ -424,6 +413,23 @@ public function next_tag( $query = null ) {
 		return false;
 	}
 
+	/**
+	 * Steps through the HTML document and stop at the next token, if any.
+	 *
+	 * Currently only supports stepping through tags.
+	 *
+	 * @return bool
+	 */
+	public function next_token() {
+		$found_a_token = parent::next_token();
+
+		if ( '#tag' === $this->get_token_type() ) {
+			$this->step( self::REPROCESS_CURRENT_NODE );
+		}
+
+		return $found_a_token;
+	}
+
 	/**
 	 * Indicates if the currently-matched tag matches the given breadcrumbs.
 	 *
@@ -520,7 +526,9 @@ public function step( $node_to_process = self::PROCESS_NEXT_NODE ) {
 				$this->state->stack_of_open_elements->pop();
 			}
 
-			parent::next_tag( self::VISIT_EVERYTHING );
+			while ( parent::next_token() && '#tag' !== $this->get_token_type() ) {
+				continue;
+			}
 		}
 
 		// Finish stepping when there are no more tokens in the document.
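
Reviewer note: the last hunk replaces a tag-only query with a loop that pulls tokens until it lands on a tag. The pattern can be sketched in isolation as below. `Sketch_Token_Scanner` and its pre-built token list are hypothetical stand-ins, not the real `WP_HTML_Tag_Processor`; only the `'#tag'` and `'#text'` type names come from this commit, and `'#comment'` is assumed here for illustration.

```php
<?php
/**
 * Hypothetical stand-in for a token scanner. It walks a pre-built token
 * list rather than parsing HTML, so that only the loop shape is on display.
 */
class Sketch_Token_Scanner {
	/** @var array[] Each token is array( type, name ). */
	private $tokens;

	/** @var int Index of the current token; -1 before the first step. */
	private $index = -1;

	public function __construct( array $tokens ) {
		$this->tokens = $tokens;
	}

	/** Advances to the next token of any type; false once exhausted. */
	public function next_token(): bool {
		if ( $this->index + 1 >= count( $this->tokens ) ) {
			$this->index = count( $this->tokens );
			return false;
		}
		++$this->index;
		return true;
	}

	public function get_token_type(): ?string {
		return $this->tokens[ $this->index ][0] ?? null;
	}

	public function get_token_name(): ?string {
		return $this->tokens[ $this->index ][1] ?? null;
	}

	/** Same loop shape as the diff: skip tokens until a tag or the end. */
	public function next_tag(): bool {
		while ( $this->next_token() && '#tag' !== $this->get_token_type() ) {
			continue;
		}
		return '#tag' === $this->get_token_type();
	}
}

$scanner = new Sketch_Token_Scanner(
	array(
		array( '#text', '#text' ),
		array( '#tag', 'DIV' ),
		array( '#comment', '#comment' ), // Assumed type name.
		array( '#tag', 'P' ),
		array( '#text', '#text' ),
	)
);

$names = array();
while ( $scanner->next_tag() ) {
	$names[] = $scanner->get_token_name();
}

echo implode( ',', $names ), "\n"; // DIV,P
```

The loop condition does both jobs: it stops when the document runs out (`next_token()` returns false) and when a tag is reached, so non-tag tokens are consumed silently, which matches the "Currently only supports stepping through tags" note in the new docblock.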
