security: always-on attribute + URL hardening; cap inline nesting#253
Merged
Conversation
Dangerous content was only neutralized when safe mode was explicitly enabled, but safe mode is off by default - so a plain HtmlRenderer emitted javascript: hrefs, on* handlers, srcdoc/formaction and CSS expression() unfiltered. Apply a baseline that runs regardless of safe mode; safe mode still layers stricter filtering (raw-HTML policy, style, allowlists) on top. - HtmlRenderer: strip on*/srcdoc/formaction attribute names and blank values carrying a dangerous URL scheme (javascript/vbscript/data/file) or CSS expression()/url()/@import/behavior, always-on. href/src get the dangerous scheme denylist before the safe-mode allowlist. Scheme detection normalizes C0 controls + spaces to defeat java\tscript: evasion. - InlineParser: cap inline recursion (deeply nested links/emphasis rescanned balanced brackets at each level, ~O(n^2) - a few thousand levels took ~10s); beyond the cap the remaining text is emitted literally. - SafeModeTest: the two cases that asserted dangerous URLs pass through with safe mode off now assert the always-on baseline, demonstrating the off-by- default distinction via raw-HTML passthrough instead.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #253 +/- ##
============================================
- Coverage 92.34% 92.32% -0.02%
- Complexity 3593 3617 +24
============================================
Files 107 107
Lines 10165 10221 +56
============================================
+ Hits 9387 9437 +50
- Misses 778 784 +6 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Ports the confirmed default-output security hardening from the downstream carve-php fork.
Problem
All dangerous-content sanitization is gated on safe mode (
if ($this->safeMode !== null)inHtmlRenderer), and safe mode is off by default. So a plainnew DjotConverter()/new HtmlRenderer()emits, unfiltered:javascript:/vbscript:/data:URLs in linkhrefand imagesrconclick, ...) and the injection sinkssrcdoc/formactionexpression()instyleSeparately, deeply nested inline constructs (a bomb of nested links / emphasis) rescan balanced brackets at each recursion level - roughly O(n^2). A few thousand levels of nesting took ~10s of CPU, a parse-time DoS.
Change
HtmlRenderer- always-on baseline, independent of safe mode. Stripon*/srcdoc/formactionattribute names; blank a value carrying a dangerous URL scheme or a script-bearing CSS construct (expression()/url()/@import/behavior/-moz-binding).href/srcrun the dangerous-scheme denylist before the safe-mode allowlist. Scheme detection strips C0 controls + spaces first, sojava\tscript:cannot evade. Safe mode still layers its stricter filtering (raw-HTML policy,style, allowlists) on top - this only changes the default (no-safe-mode) output, which previously did nothing.InlineParser- cap inline recursion depth (100, far deeper than any real document). Beyond the cap the remaining text is emitted literally instead of recursing, bounding the O(n^2) blowup.AlwaysOnHardeningTest; twoSafeModeTestcases that asserted dangerous URLs pass through with safe mode off were updated to the new baseline (the off-by-default distinction is now shown via raw-HTML passthrough).Compatibility
This shifts the default rendering output: dangerous schemes / handlers that previously passed through are now neutralized even without safe mode. There is no legitimate reason to emit
javascript:oronclickfrom untrusted markup, so the baseline is unconditional; callers who genuinely need a permissive renderer still control the rest via safe-mode configuration.Scope notes
Confirmed-not-applicable and excluded: djot-php already drops a colliding
{href=...}attribute-block override on links/images (no change needed). The HTML importer (HtmlToDjot) carryingjavascript:through on import is a separate surface, left for a follow-up.