Icons Registry: improve SVG sanitizer#75550
Conversation
e537f45 to
7bdd4c1
Compare
| if ( ! preg_match( '/<svg\b/i', $filtered_content ) ) { | ||
| return ''; | ||
| } |
There was a problem hiding this comment.
wp_kses() removes disallowed tags but not the text inside them, so I added a logic that returns an empty string if the tag is not an SVG element.
| '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" width="24" height="24" class="icon" aria-hidden="true"><path d="M0 0" fill="currentColor" /></svg>', | ||
| '<svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24" width="24" height="24" class="icon" aria-hidden="true"><path d="M0 0" fill="currentColor" /></svg>', |
There was a problem hiding this comment.
As far as I've tested, lowercase viewbox is automatically recognized as viewBox by major browsers, so as long as we're using inline SVG as HTML content, we shouldn't have any problems.
<!DOCTYPE html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100" width="100">
<rect x="0" y="0" width="100%" height="100%" />
<circle cx="50%" cy="50%" r="4" fill="white" />
</svg>
<svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 100 100" width="100">
<rect x="0" y="0" width="100%" height="100%" />
<circle cx="50%" cy="50%" r="4" fill="white" />
</svg>
</body>
</html>
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
|
@t-hamano this seems like a good improvement since it retains the structure of the existing sanitization while adding extra attributes. did you verify that none of the newly-allowed attributes allows for script execution? one note here is that SVG is complicated to parse, and even the search for $svg = '';
$processor = WP_HTML_Processor::create_fragment( $icon_content );
if ( ! $processor->next_token() || 'SVG' !== $processor->get_tag() ) {
return '';
}
$svg .= $processor->serialize_token();
$svg_depth = $processor->get_current_depth();
while ( $processor->next_token() && $processor->get_current_depth() > $svg_depth ) {
$svg .= $processor->serialize_token();
}
// Parsing has stopped even though there might be additional content.
$filtered = wp_kses( ... )This could also be done after filtering with Unfortunately there could be issues with self-closing tags, which exist within the SVG content but otherwise not in the HTML. In HTML, sticking with for gradual improvement I think this is good as long as we have a thorough review of the attributes and tags we’re allowing. |
| * @return string The sanitized icon SVG content. | ||
| */ | ||
| private function sanitize_icon_content( $icon_content ) { | ||
| // Core attributes applicable to most elements. |
There was a problem hiding this comment.
This appears to be using the elements and attributes from my earlier comment, can you confirm?
Would be nice to validate those, as I haven't gone in-depth in that, and there are plenty of elements and attributes that SVGs can support. Also we know that any KSES-related changes shouldn't be taken lightly 😉
|
Sorry for the late reply. I'm working with @mcsf to build the infrastructure to allow the Icon API to be extended on the Gutenberg plugin. If that works out, I'll come back to this PR. |
7bdd4c1 to
26a51db
Compare
|
Ok, I implemented the following two things based on your feedback:
There are three things that concern me about the operation of the sanitizer:
|
There was a problem hiding this comment.
To avoid duplicate work, I would like to create a core backport PR after this PR is approved.
Let’s discuss one aspect here that I think partially addresses these concerns: the HTML Processor is parsing SVGs as they would appear inline in HTML. It is not attempting to process external SVGs as they would be loaded in a context like There’s a big distinction there because browsers handle inline and external SVGs in extremely different ways due to the legacy of browser parsers and the ongoing challenge of securing the web. In this mix is a third context, which is SVGs as part of data URIs and I don’t know off the top of my head where that lies, but it’s closer to external assets. These distinctions appear as:
A sanitization pipeline for external assets needs to be almost entirely different than those for assets bound to be included in the HTML, and unfortunately due to the extensible nature of WordPress, we cannot guarantee that external SVGs will not be inlined into the HTML via some plugin. Therefore, we are basically left where it’s probably mostly appropriate to assume these are all bound for HTML.
I’m not sure I follow. If you are looking at something like
This relates entirely to HTML parsing of inline SVG, but the SVGs should continue working regardless. HTML elements and attributes don’t have namespacing like XML elements do. There are a few exceptions in HTML where these are faked for the sake of inline SVG and MathML, but that is why it might appear to unexpectedly strip out the The rules are adjusting SVG attributes in the HTML spec. The function
See above, the same rules apply. |
What?
In the experimental icon registry, any string can be registered as SVG content. Unfortunately, there is still no core function to properly sanitize SVGs, so we've enhanced the private methods for sanitization and defined as many SVG-friendly tags and attributes as possible.
Testing Instructions
Please refer to the unit test to see what the function expects.