Block parser: Share parsed innerHTML across validation fixes #77955
ellatrix wants to merge 2 commits into
Invalid blocks went through `applyBuiltInValidationFixes` and the deprecation loop, which re-parsed the same `originalContent` repeatedly: each fix wrapped `innerHTML` in a synthetic `<div data-X>` and re-parsed it to read a single root-element attribute, and every deprecation iteration re-parsed via `getBlockAttributes`. Parse `innerHTML` once in `parseRawBlock`, deep-clone the result so it's insulated from hpq's shared document body, and thread it down through the validation and deprecation paths. Fixes now read attributes directly off the pre-parsed root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
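The clone matters because of how a shared parse target behaves. The TypeScript sketch below reproduces the hazard with toy types: `parseShared`, `cloneBody`, `ToyBody`, and `ToyElement` are hypothetical stand-ins, not Gutenberg or hpq APIs, and the "parser" only reads a root tag and its attributes. It assumes, as the commit message describes, that every parse writes into one shared body object.

```typescript
// Toy stand-ins for DOM nodes; the real parser works with DOM Elements.
interface ToyElement {
  tagName: string;
  attributes: Record<string, string>;
}

interface ToyBody {
  firstElementChild: ToyElement | null;
}

// Hypothetical parser that, like hpq per the commit message, writes every
// result into ONE shared body object and returns that same object.
const sharedBody: ToyBody = { firstElementChild: null };

function parseShared(html: string): ToyBody {
  // Toy parsing: only the root tag name and its double-quoted attributes.
  const match = /^\s*<([a-zA-Z][\w-]*)([^>]*)>/.exec(html);
  if (!match) {
    sharedBody.firstElementChild = null;
    return sharedBody;
  }
  const attributes: Record<string, string> = {};
  const attrPattern = /([\w-]+)="([^"]*)"/g;
  let attr: RegExpExecArray | null;
  while ((attr = attrPattern.exec(match[2])) !== null) {
    attributes[attr[1]] = attr[2];
  }
  sharedBody.firstElementChild = {
    tagName: match[1].toUpperCase(),
    attributes,
  };
  return sharedBody;
}

// Deep copy: a new body, a new root element, and a new attributes map, so
// nothing is shared with the parser's body.
function cloneBody(body: ToyBody): ToyBody {
  const root = body.firstElementChild;
  return {
    firstElementChild: root
      ? { tagName: root.tagName, attributes: { ...root.attributes } }
      : null,
  };
}

const captured = parseShared('<p class="is-style-x" id="intro">Hi</p>');
const cloned = cloneBody(captured);

// A later, unrelated parse (e.g. of freshly serialized save content).
parseShared('<div class="other">Bye</div>');

console.log(captured.firstElementChild?.tagName); // DIV (overwritten)
console.log(cloned.firstElementChild?.tagName); // P (still the original)
```

Holding on to `captured` is the bug the clone avoids: the second parse silently swaps its contents. With real DOM nodes, `Node.cloneNode(true)` is the standard way to take such a deep copy.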
`parseRawBlock` types the cloned body as `Element | null`, but `getBlockAttributes` previously accepted only `Node | undefined`, causing a TS2345 error in `parseRawBlock` and in the deprecation iteration. Allowing `null` lets callers pass their captured-or-null body directly without coercion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
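A minimal sketch of the widened parameter, using a hypothetical `readAttributes` function with the same parameter order as `getBlockAttributes` (block, `innerHTML`, attributes, then the new optional body). `ToyBody`, `toyParse`, and `captureBody` are illustrative stand-ins; in particular, treating `null` the same as `undefined` (fall back to parsing) is an assumption, not something the commit message states.

```typescript
// Toy stand-in for a parsed body.
interface ToyBody {
  html: string;
}

let parseCount = 0;

// Hypothetical parser; it only counts calls so the saving is visible.
function toyParse(html: string): ToyBody {
  parseCount++;
  return { html };
}

// Hypothetical helper typed like the PR's captured body: `ToyBody | null`.
// Returning null for empty input is an illustrative assumption.
function captureBody(innerHTML: string): ToyBody | null {
  return innerHTML ? toyParse(innerHTML) : null;
}

// Same parameter order as getBlockAttributes, plus the optional 4th
// parameter. `parsedBody?: ToyBody | null` accepts a body, null, or
// undefined. ASSUMPTION: null and undefined both mean "not supplied".
function readAttributes(
  blockName: string,
  innerHTML: string,
  attributes: Record<string, string> = {},
  parsedBody?: ToyBody | null
): Record<string, string> {
  const body = parsedBody ?? toyParse(innerHTML);
  return { ...attributes, block: blockName, source: body.html };
}

const html = "<p>Hello</p>";
const captured = captureBody(html); // 1 parse

readAttributes("core/paragraph", html, {}, captured); // reuses the body
readAttributes("core/paragraph", html, {}, captured); // reuses the body
readAttributes("core/paragraph", html); // no 4th argument: parses
readAttributes("core/paragraph", html, {}, null); // null: parses

console.log(parseCount); // 3
```

Typing the parameter as `parsedBody?: ToyBody | null` is what lets `captured` (typed `ToyBody | null`) pass straight through; with `parsedBody?: ToyBody`, TypeScript under `strictNullChecks` reports TS2345 at those calls.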
Size Change: +105 B (0%)
Total Size: 7.94 MB
What?
When a block fails validation, the parser runs it through `applyBuiltInValidationFixes` and a deprecation loop, both of which repeatedly re-parse the same `originalContent` HTML. This PR parses each block's `innerHTML` once in `parseRawBlock` and threads the parsed body down through attribute extraction, validation fixes, and deprecation iteration so they all share that single parse.

Why?
For invalid blocks the parser was doing a lot of redundant HTML parsing:
- `getBlockAttributes` parses `innerHTML` once for matchers and discards the result.
- `fixCustomClassname` and `fixGlobalAttribute` each wrap `innerHTML` in a synthetic `<div data-X>` and re-parse it via `parseWithAttributeSchema`, just to read a single attribute (`class`, `id`, `aria-label`) off the root element. That's 3 extra parses per invalid block per fix attempt.
- `applyBlockDeprecatedVersions` re-parses the same `originalContent` on every deprecation iteration via `getBlockAttributes`, multiplying the cost for blocks with many deprecation entries (gallery, embed, cover, image, …).

For posts with many invalid or deprecation-heavy blocks, this is the hot path.
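To make the per-fix cost concrete, the sketch below counts parses for the three root-attribute lookups (`class`, `id`, `aria-label`) done the old way (one re-parse per lookup) versus reading all three off a single pre-parsed root. It is a toy: `parseRoot`, `readByReparsing`, and `readFromRoot` are hypothetical, `parseRoot` is a regex reader of the root tag only, and it skips the synthetic `<div data-X>` wrapper that the real fixes use to target the root element.

```typescript
// Toy stand-in for a root DOM element.
interface ToyElement {
  tagName: string;
  attributes: Record<string, string>;
}

let parseCount = 0;

// Hypothetical toy parser: reads only the root tag and its double-quoted
// attributes, and counts how often it runs.
function parseRoot(html: string): ToyElement | null {
  parseCount++;
  const match = /^\s*<([a-zA-Z][\w-]*)([^>]*)>/.exec(html);
  if (!match) {
    return null;
  }
  const attributes: Record<string, string> = {};
  const attrPattern = /([\w-]+)="([^"]*)"/g;
  let attr: RegExpExecArray | null;
  while ((attr = attrPattern.exec(match[2])) !== null) {
    attributes[attr[1]] = attr[2];
  }
  return { tagName: match[1].toUpperCase(), attributes };
}

// The root attributes the fixes inspect, per the PR description.
const ROOT_ATTRIBUTES = ["class", "id", "aria-label"];

type RootValues = Record<string, string | undefined>;

// Old shape: each lookup re-parses the same HTML.
function readByReparsing(innerHTML: string): RootValues {
  const values: RootValues = {};
  for (const name of ROOT_ATTRIBUTES) {
    values[name] = parseRoot(innerHTML)?.attributes[name];
  }
  return values;
}

// New shape: the caller parses once and every lookup reads the same root.
function readFromRoot(root: ToyElement | null): RootValues {
  const values: RootValues = {};
  for (const name of ROOT_ATTRIBUTES) {
    values[name] = root?.attributes[name];
  }
  return values;
}

const html =
  '<figure class="wp-block-image custom" id="hero" aria-label="Hero"><img src="a.jpg"/></figure>';

let start = parseCount;
const oldValues = readByReparsing(html);
const oldParses = parseCount - start;

start = parseCount;
const newValues = readFromRoot(parseRoot(html));
const newParses = parseCount - start;

console.log(oldParses, newParses); // 3 1
```

Both paths return the same values; only the number of parses differs (3 versus 1 here), and in the real parser that difference recurs for every fix attempt.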
How?
- `parseRawBlock` calls `parseHtml(innerHTML)` once and deep-clones the resulting body. The clone is necessary because hpq parses into a single shared document body: any subsequent `parseHtml(string)` call anywhere downstream (notably `fixCustomClassname` parsing freshly serialized save content) would otherwise overwrite the captured body in place. The clone is independent of that shared body.
- `getBlockAttributes` gains an optional `parsedBody` parameter; when supplied, it skips its internal parse.
- `applyBlockValidation`, `applyBuiltInValidationFixes`, and `applyBlockDeprecatedVersions` accept the pre-parsed body and thread it down.
- `applyBuiltInValidationFixes` extracts `firstElementChild` once and passes the root element to each fix.
- `fixCustomClassname` and `fixGlobalAttribute` accept an optional pre-parsed root element. When provided, they read `class`/`id`/`aria-label` directly off it, replacing the synthetic-`<div>` wrap-and-reparse trick.

Public API: `getBlockAttributes` gains an optional 4th parameter (`parsedBody`); existing callers (including `shortcode-converter`, the only external in-tree caller) work unchanged.

Testing Instructions
attributes.className.

Testing Instructions for Keyboard
No UI changes.
Screenshots or screencast
N/A — internal parser change.
Use of AI Tools
This PR was authored with assistance from Claude (Anthropic). The change was developed iteratively through dialogue: the human directed the goal and approach, the AI proposed code and explored tradeoffs, and the human reviewed each step before accepting. All code was reviewed and tested by the human author.