Skip to content

security: ignore round-trip raw attributes on untrusted HTML import; bound abbreviation expansion#254

Merged
dereuromark merged 1 commit into
masterfrom
security/importer-injection-abbr-dos
Jun 26, 2026
Merged

security: ignore round-trip raw attributes on untrusted HTML import; bound abbreviation expansion#254
dereuromark merged 1 commit into
masterfrom
security/importer-injection-abbr-dos

Conversation

@dereuromark

Copy link
Copy Markdown
Contributor

Ports two confirmed hardening fixes from the downstream carve-php fork, on top of the prior always-on attribute/URL hardening baseline. Both code paths in djot-php were verified to match the fixed carve vulnerabilities 1:1.

1. HtmlToDjot round-trip injection

HtmlToDjot re-emitted the round-trip attributes data-djot-src (block, via extractRoundTripSource()) and data-djot-raw (inline, via processRawInline()) verbatim as raw Djot. These attributes carry literal Djot source, so untrusted HTML could set e.g.

<pre data-djot-src="`&lt;script&gt;...&lt;/script&gt;`{=html}">x</pre>

and have a raw-HTML block (`...`{=html}) injected into the converted output, becoming live markup once rendered. The reverse converter is explicitly not a sanitizer, but silently honoring an attacker-controlled attribute as raw output is an injection sink.

Change

  • Both round-trip attributes are now ignored by default. Honoring them is opt-in via new HtmlToDjot(trustedRoundTrip: true), intended only for trusted, djot-produced HTML (faithful round-trips).
  • The trust flag is propagated through the recursive inline-fragment sub-conversion.
  • The internal TabsExtension round-trip generators, which consume djot's own rendered HTML, use the trusted converter, so existing round-trip fidelity is preserved.
  • Untrusted input degrades safely: a data-djot-src block renders normally, a data-djot-raw span falls through to ordinary span processing (the data-djot-* attribute is already dropped by getElementAttributes).

This is the importer follow-up that the prior always-on hardening change deferred.

2. Abbreviation output-amplification DoS

Each abbreviation occurrence re-emitted its full definition (the <abbr title> in HTML, the inline <abbr> in Markdown, the (definition) suffix in ANSI). A tiny source such as *[HT]: <50 KB of text> followed by many HT occurrences expanded to definition_len * occurrence_count bytes (hundreds of MB), a RAM-exhaustion DoS.

Change

  • A per-render byte budget, max(1MB, 8 * sourceLength), bounds the cumulative abbreviation expansion across the HTML, Markdown and ANSI renderers (new AbbreviationBudgetTrait). Once the budget would be exceeded, that occurrence and every later one degrade to plain key text (no <abbr> wrapper, no title). The budget sits far above any real document, so normal output is unchanged.
  • Document now carries the parsed source byte length (getSourceLength() / setSourceLength()), set in DjotConverter::parse(). Documents built directly default to 0, which falls back to the 1MB floor.
  • The Markdown degraded fallback returns the children as ordinary Markdown text rather than the HTML-escaped string used inside <abbr>, so a & / < in a degraded key stays literal.

Compatibility

  • The importer change shifts the default import behavior: round-trip raw attributes from untrusted HTML are no longer honored. Callers that round-trip djot's own output opt in with trustedRoundTrip: true. Existing round-trip-fidelity tests were updated to use the trusted converter explicitly.
  • The abbreviation budget is far above any real document, so normal rendering output is byte-identical.

Tests

New HtmlToDjotRoundTripSecurityTest and AbbreviationBudgetTest cover both default-ignore / opt-in trust and the expansion bound (including the Markdown degraded-escaping case).

…bound abbreviation expansion

Ports two confirmed hardening fixes from the downstream carve-php fork, on top
of the prior always-on attribute/URL hardening baseline.

HtmlToDjot round-trip injection:
- The reverse converter re-emitted the data-djot-src (block) and data-djot-raw
  (inline) round-trip attributes verbatim as raw Djot, so untrusted HTML could
  smuggle a raw-HTML block and inject live markup (e.g. a script element) into
  the converted output. Both attributes are now ignored by default; honoring
  them is opt-in via new HtmlToDjot(trustedRoundTrip: true) and propagated
  through the recursive inline sub-conversion. The internal Tabs round-trip
  generators, which consume djot's own rendered HTML, use the trusted converter
  so faithful round-trips are preserved.

Abbreviation output-amplification DoS:
- Each abbreviation occurrence re-emitted its full definition, so a tiny source
  with a huge definition and many occurrences expanded to definition_len times
  occurrence_count bytes (a RAM-exhaustion DoS). A per-render byte budget,
  max(1MB, 8 * sourceLength), now bounds cumulative expansion across the HTML,
  Markdown and ANSI renderers; once the budget is exceeded an occurrence
  degrades to plain key text. Document carries the source length, set in
  DjotConverter::parse().

Adds HtmlToDjotRoundTripSecurityTest and AbbreviationBudgetTest; existing
round-trip-fidelity tests use the trusted converter explicitly.
@dereuromark dereuromark added the bug Something isn't working label Jun 26, 2026
@codecov

codecov Bot commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.29730% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.35%. Comparing base (f066e5f) to head (98438a7).

Files with missing lines Patch % Lines
src/Renderer/AnsiRenderer.php 66.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #254      +/-   ##
============================================
+ Coverage     92.32%   92.35%   +0.02%     
- Complexity     3617     3628      +11     
============================================
  Files           107      108       +1     
  Lines         10221    10250      +29     
============================================
+ Hits           9437     9466      +29     
  Misses          784      784              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dereuromark dereuromark merged commit ff46886 into master Jun 26, 2026
6 checks passed
@dereuromark dereuromark deleted the security/importer-injection-abbr-dos branch June 26, 2026 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant