security: ignore round-trip raw attributes on untrusted HTML import; bound abbreviation expansion#254
Merged
Conversation
…bound abbreviation expansion Ports two confirmed hardening fixes from the downstream carve-php fork, on top of the prior always-on attribute/URL hardening baseline. HtmlToDjot round-trip injection: - The reverse converter re-emitted the data-djot-src (block) and data-djot-raw (inline) round-trip attributes verbatim as raw Djot, so untrusted HTML could smuggle a raw-HTML block and inject live markup (e.g. a script element) into the converted output. Both attributes are now ignored by default; honoring them is opt-in via new HtmlToDjot(trustedRoundTrip: true) and propagated through the recursive inline sub-conversion. The internal Tabs round-trip generators, which consume djot's own rendered HTML, use the trusted converter so faithful round-trips are preserved. Abbreviation output-amplification DoS: - Each abbreviation occurrence re-emitted its full definition, so a tiny source with a huge definition and many occurrences expanded to definition_len times occurrence_count bytes (a RAM-exhaustion DoS). A per-render byte budget, max(1MB, 8 * sourceLength), now bounds cumulative expansion across the HTML, Markdown and ANSI renderers; once the budget is exceeded an occurrence degrades to plain key text. Document carries the source length, set in DjotConverter::parse(). Adds HtmlToDjotRoundTripSecurityTest and AbbreviationBudgetTest; existing round-trip-fidelity tests use the trusted converter explicitly.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #254 +/- ##
============================================
+ Coverage 92.32% 92.35% +0.02%
- Complexity 3617 3628 +11
============================================
Files 107 108 +1
Lines 10221 10250 +29
============================================
+ Hits 9437 9466 +29
Misses 784 784 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Ports two confirmed hardening fixes from the downstream carve-php fork, on top of the prior always-on attribute/URL hardening baseline. Both code paths in djot-php were verified to match the fixed carve vulnerabilities 1:1.
1. HtmlToDjot round-trip injection
HtmlToDjotre-emitted the round-trip attributesdata-djot-src(block, viaextractRoundTripSource()) anddata-djot-raw(inline, viaprocessRawInline()) verbatim as raw Djot. These attributes carry literal Djot source, so untrusted HTML could set e.g.and have a raw-HTML block (
`...`{=html}) injected into the converted output, becoming live markup once rendered. The reverse converter is explicitly not a sanitizer, but silently honoring an attacker-controlled attribute as raw output is an injection sink.Change
new HtmlToDjot(trustedRoundTrip: true), intended only for trusted, djot-produced HTML (faithful round-trips).TabsExtensionround-trip generators, which consume djot's own rendered HTML, use the trusted converter, so existing round-trip fidelity is preserved.data-djot-srcblock renders normally, adata-djot-rawspan falls through to ordinary span processing (thedata-djot-*attribute is already dropped bygetElementAttributes).This is the importer follow-up that the prior always-on hardening change deferred.
2. Abbreviation output-amplification DoS
Each abbreviation occurrence re-emitted its full definition (the
<abbr title>in HTML, the inline<abbr>in Markdown, the(definition)suffix in ANSI). A tiny source such as*[HT]: <50 KB of text>followed by manyHToccurrences expanded todefinition_len * occurrence_countbytes (hundreds of MB), a RAM-exhaustion DoS.Change
max(1MB, 8 * sourceLength), bounds the cumulative abbreviation expansion across the HTML, Markdown and ANSI renderers (newAbbreviationBudgetTrait). Once the budget would be exceeded, that occurrence and every later one degrade to plain key text (no<abbr>wrapper, no title). The budget sits far above any real document, so normal output is unchanged.Documentnow carries the parsed source byte length (getSourceLength()/setSourceLength()), set inDjotConverter::parse(). Documents built directly default to 0, which falls back to the 1MB floor.<abbr>, so a&/<in a degraded key stays literal.Compatibility
trustedRoundTrip: true. Existing round-trip-fidelity tests were updated to use the trusted converter explicitly.Tests
New
HtmlToDjotRoundTripSecurityTestandAbbreviationBudgetTestcover both default-ignore / opt-in trust and the expansion bound (including the Markdown degraded-escaping case).