PHP Version
8.4
Version
2.1.0
Bug Description
DOMParser::setDocument() calls DOMDocument::loadHTML() to parse HTML content. The behavior of loadHTML() differs between libxml versions when processing plain text (non-HTML) input:
- libxml 2.9.x (Ubuntu 22.04/CI): auto-wraps bare text in <p> tags (HTML4/SGML behavior)
- libxml 2.10+ (macOS, newer distros): does not wrap bare text in <p> tags (HTML5 tokenizer)
This means Editor->setContent('Hello world') produces different Tiptap JSON depending on the platform.
Steps to reproduce
$editor = new \Tiptap\Editor;
$editor->setContent('Hello world');
echo $editor->getJSON();
On libxml 2.9.x:
{"type":"doc","content":[{"type":"paragraph","content":[{"type":"text","text":"Hello world"}]}]}
On libxml 2.10+ (e.g. 2.15.x):
{"type":"doc","content":[{"type":"text","text":"Hello world"}]}
The difference is that on older libxml, DOMDocument::loadHTML() wraps bare text inside <body> in a <p> element, while newer libxml (with its HTML5-conformant tokenizer) treats it as a raw text node.
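To check which behavior a given machine shows, a small standalone script along these lines may help. It is only a diagnostic sketch: `describeBareTextParsing()` is a hypothetical helper, not part of tiptap-php, and it calls `DOMDocument::loadHTML()` directly without the library's `minify()` / `makeValidXMLDocument()` steps, so the result may not match the library's internals exactly.

```php
<?php
// Hypothetical diagnostic helper (not part of tiptap-php).
// Parses bare text with DOMDocument::loadHTML() and returns the
// serialized <body> element so the presence or absence of <p> is visible.
function describeBareTextParsing(string $text): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($text);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);

    // Return an empty string if the parser produced no <body>.
    return $body === null ? '' : (string) $dom->saveHTML($body);
}

// Print the libxml version PHP was compiled against, then the parsed body.
echo 'libxml ', LIBXML_DOTTED_VERSION, PHP_EOL;
echo describeBareTextParsing('Hello world'), PHP_EOL;
```

Running it on both the CI image and a local machine should show whether the `<p>` wrapper appears on each.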
Root cause
In src/Core/DOMParser.php:45:
$this->DOM->loadHTML(
    $this->makeValidXMLDocument(
        $this->minify($value)
    )
);
loadHTML() delegates to libxml2's HTML parser, whose behavior changed starting with libxml 2.10.0 ("The HTML tokenizer now conforms fully to HTML5"). See also libxml2 GitLab issue #414.
Impact
This primarily affects tests that pass plain text to Tiptap-backed fields (e.g. Filament RichEditor's fillForm()). In production, the browser-based Tiptap editor always produces proper HTML with <p> tags, so the issue doesn't surface in normal usage.
However, it breaks the contract that setContent() should produce consistent output regardless of the server environment.
Possible solutions
- Detect plain text input (no HTML tags) and wrap it in <p> before calling loadHTML()
- Use PHP 8.4's \Dom\HTMLDocument (which has a built-in HTML5 parser) when available, with DOMDocument as fallback
- Document that setContent() expects HTML input, not plain text
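The first option could look roughly like the sketch below. `wrapBareText()` is a hypothetical name, and where it would be called (for example before `minify()` in `DOMParser`) is an assumption rather than something the library prescribes. The `strip_tags()` comparison is a heuristic for "contains no markup" and may need refinement for edge cases such as stray `<` characters.

```php
<?php
// Hypothetical normalization step (not part of tiptap-php).
// Wraps input that contains no HTML tags in a <p> element so the parser
// sees the same structure regardless of the libxml version.
function wrapBareText(string $value): string
{
    $trimmed = trim($value);

    // Leave empty or whitespace-only input untouched.
    if ($trimmed === '') {
        return $value;
    }

    // If strip_tags() changes the string, it found markup to remove:
    // treat the input as HTML and pass it through unchanged.
    if (strip_tags($trimmed) !== $trimmed) {
        return $value;
    }

    // No markup found: wrap the text in a paragraph. The text is not
    // escaped, because setContent() input is treated as HTML and any
    // entities it contains should be kept as written.
    return '<p>' . $trimmed . '</p>';
}

echo wrapBareText('Hello world'), PHP_EOL;
echo wrapBareText('<h1>Title</h1>'), PHP_EOL;
```

Wrapping before parsing keeps the change in one place and does not depend on which parser backend is in use.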
Environment
- tiptap-php: 2.1.0
- PHP: 8.4
- libxml: 2.15.1 (macOS) / 2.9.14 (Ubuntu 22.04 CI)
Expected Behavior
Editor->setContent() should produce identical Tiptap JSON output regardless of the underlying libxml version. When plain text (without HTML tags) is passed as input, the output should be consistent across all platforms:
$editor = new \Tiptap\Editor;
$editor->setContent('Hello world');
echo $editor->getJSON();
// Should always produce the same result, e.g.:
// {"type":"doc","content":[{"type":"paragraph","content":[{"type":"text","text":"Hello world"}]}]}
The library should either normalize plain text input before passing it to DOMDocument::loadHTML(), or document that setContent() expects HTML input rather than plain text.
Additional Context (Optional)
No response
Dependency Updates