fix(paste): detect heading levels from Google Docs styled paragraphs #2178
Changes from all commits
167f5aa
fbc19c7
ab90150
0ec3a54
11c24d3
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,6 +5,18 @@ import { createSingleItemList } from '../html/html-helpers.js'; | |
| import { getLvlTextForGoogleList, googleNumDefMap } from '../../helpers/pasteListHelpers.js'; | ||
| import { wrapTextsInRuns } from '../docx-paste/docx-paste.js'; | ||
|
|
||
| // Match Google Docs default heading sizes (H1=20pt, H2=16pt, H3=14pt, H4=12pt, H5=11pt). | ||
| // Descending order so oversized fonts (e.g. 24pt) still resolve to closest heading. | ||
| const headingSizeMap = [ | ||
| { minPt: 20, tag: 'h1' }, | ||
| { minPt: 16, tag: 'h2' }, | ||
| { minPt: 14, tag: 'h3' }, | ||
| { minPt: 12, tag: 'h4' }, | ||
| { minPt: 10, tag: 'h5' }, | ||
| ]; | ||
|
|
||
| const boldWeightRegex = /^(bold|700|800|900)$/i; | ||
|
|
||
| /** | ||
| * Main handler for pasted Google Docs content. | ||
| * | ||
|
|
@@ -21,7 +33,9 @@ export const handleGoogleDocsHtml = (html, editor, view) => { | |
| const tempDiv = document.createElement('div'); | ||
| tempDiv.innerHTML = cleanedHtml; | ||
|
|
||
| const htmlWithMergedLists = mergeSeparateLists(tempDiv); | ||
| const tempDivWithHeadings = convertStyledHeadings(tempDiv); | ||
|
|
||
| const htmlWithMergedLists = mergeSeparateLists(tempDivWithHeadings); | ||
| const flattenHtml = flattenListsInHtml(htmlWithMergedLists, editor); | ||
|
|
||
| let doc = DOMParser.fromSchema(editor.schema).parse(flattenHtml); | ||
|
|
@@ -253,3 +267,88 @@ function buildListPath(level, map) { | |
| } | ||
| return path; | ||
| } | ||
|
|
||
| /** | ||
| * Converts Google Docs styled <p> elements that represent headings into proper | ||
| * <h1>–<h5> tags before ProseMirror parsing. | ||
| * | ||
| * Google Docs converts heading levels to <p> tags with inline font-size / | ||
| * font-weight styling instead of semantic heading tags. This function detects | ||
| * that pattern and replaces the elements in-place. | ||
| * | ||
| * @param {HTMLElement} container | ||
| */ | ||
| function convertStyledHeadings(container) { | ||
| const paragraphs = Array.from(container.querySelectorAll('p')).filter((p) => !p.closest('li')); | ||
|
|
||
| paragraphs.forEach((p) => { | ||
| const { fontSize, isBold } = getHeadingStyleProps(p); | ||
| if (!isBold || fontSize === null) return; | ||
|
|
||
| const match = headingSizeMap.find(({ minPt }) => fontSize >= minPt); | ||
| if (!match) return; | ||
|
|
||
| const heading = document.createElement(match.tag); | ||
| heading.innerHTML = p.innerHTML; | ||
| Array.from(p.attributes).forEach((attr) => heading.setAttribute(attr.name, attr.value)); | ||
| p.replaceWith(heading); | ||
| }); | ||
|
|
||
| return container; | ||
| } | ||
|
|
||
| /** | ||
| * Reads font-size (in pt) and bold status from an element's inline style. | ||
| * When font-size is on the root element, bold is accepted from the root or | ||
| * all child spans. When font-size is only on child spans, all spans must | ||
| * share the same size, and bold is from the root or all child spans. | ||
| * | ||
| * @param {HTMLElement} el | ||
| * @returns {{ fontSize: number|null, isBold: boolean }} | ||
| */ | ||
| function getHeadingStyleProps(el) { | ||
| const elFontSize = parsePtValue(el.style.fontSize); | ||
| const elIsBold = boldWeightRegex.test(el.style.fontWeight || ''); | ||
| const spans = Array.from(el.querySelectorAll('span')); | ||
| const spanIsBold = (span) => boldWeightRegex.test(span.style.fontWeight || ''); | ||
| const notHeading = { fontSize: null, isBold: false }; | ||
|
|
||
| // font-size declared on root element: bold from itself or if all child spans are bold | ||
| const fromElement = () => { | ||
| const isBold = elIsBold || (spans.length > 0 && spans.every(spanIsBold)); | ||
| return { fontSize: elFontSize, isBold }; | ||
| }; | ||
|
|
||
| // font-size only on child spans: all must be same size, then bold from root or all spans | ||
| const fromSpans = () => { | ||
| // no span children, size is indeterminate | ||
| if (spans.length === 0) return notHeading; | ||
|
|
||
| // if not all spans declare a font-size, not a heading | ||
| const sizes = spans.map((span) => parsePtValue(span.style.fontSize)); | ||
| if (sizes.some((size) => size === null)) return notHeading; | ||
|
|
||
| // if inconsistent sizes, mixed body text, not a heading | ||
| const [firstSpanSize] = sizes; | ||
| if (sizes.some((size) => size !== firstSpanSize)) return notHeading; | ||
|
Contributor
These two checks don't have tests yet. If they broke, paragraphs with mixed spans would quietly turn into headings. Two quick cases to add:
<!-- one span has no font-size: should stay as <p> -->
<p><span style="font-size:20pt;font-weight:700">A</span><span style="font-weight:700">B</span></p>
<!-- spans have different sizes: should stay as <p> -->
<p><span style="font-size:20pt;font-weight:700">A</span><span style="font-size:14pt;font-weight:700">B</span></p>
Author
Yes, more coverage makes perfect sense. Thank you for the patient review, I've sent the new tests.
||
|
|
||
| // otherwise, first span size, and root element or all spans bold | ||
| const isBold = elIsBold || spans.every(spanIsBold); | ||
| return { fontSize: firstSpanSize, isBold }; | ||
| }; | ||
|
|
||
| return elFontSize !== null ? fromElement() : fromSpans(); | ||
| } | ||
|
|
||
| /** | ||
| * Parses a CSS font-size value in pt units, e.g. "20pt" → 20. Returns null | ||
| * for any other format. | ||
| * | ||
| * @param {string|undefined} cssValue | ||
| * @returns {number|null} | ||
| */ | ||
| function parsePtValue(cssValue) { | ||
| if (!cssValue) return null; | ||
| const m = cssValue.match(/^([\d.]+)pt$/i); | ||
| return m ? parseFloat(m[1]) : null; | ||
| } | ||
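The two span checks flagged in the inline review can be exercised without a DOM. The sketch below is only a model of the span-only branch, not the tests that were added to the PR: spans are represented as plain objects with `fontSize` and `fontWeight` strings, and `headingPropsFromSpans` is a hypothetical name introduced here for illustration.

```javascript
// Assumption: plain objects stand in for <span> elements' inline styles.
const boldWeightRegex = /^(bold|700|800|900)$/i;

// Same parsing rule as the diff: "20pt" -> 20, anything else -> null.
function parsePtValue(cssValue) {
  if (!cssValue) return null;
  const m = cssValue.match(/^([\d.]+)pt$/i);
  return m ? parseFloat(m[1]) : null;
}

// Hypothetical DOM-free mirror of the fromSpans branch.
function headingPropsFromSpans(spans, rootIsBold = false) {
  const notHeading = { fontSize: null, isBold: false };
  if (spans.length === 0) return notHeading;

  // Every span must declare a pt font-size.
  const sizes = spans.map((span) => parsePtValue(span.fontSize));
  if (sizes.some((size) => size === null)) return notHeading;

  // All spans must share the same size.
  const [firstSize] = sizes;
  if (sizes.some((size) => size !== firstSize)) return notHeading;

  const allSpansBold = spans.every((span) => boldWeightRegex.test(span.fontWeight || ''));
  return { fontSize: firstSize, isBold: rootIsBold || allSpansBold };
}

// Case 1: one span has no font-size, so the paragraph is not a heading.
const missingSize = headingPropsFromSpans([
  { fontSize: '20pt', fontWeight: '700' },
  { fontSize: '', fontWeight: '700' },
]);

// Case 2: spans have different sizes, so the paragraph is not a heading.
const mixedSizes = headingPropsFromSpans([
  { fontSize: '20pt', fontWeight: '700' },
  { fontSize: '14pt', fontWeight: '700' },
]);

// Control: uniform size and bold on every span qualifies.
const uniform = headingPropsFromSpans([
  { fontSize: '20pt', fontWeight: '700' },
  { fontSize: '20pt', fontWeight: 'bold' },
]);

console.log(missingSize, mixedSizes, uniform);
```

Passing the root's bold state as a parameter keeps the model free of DOM access; in the real function that value comes from `el.style.fontWeight`.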
Google Docs clipboard already uses <h1>–<h6> for headings set via the styles dropdown, and ProseMirror parses these natively. This conversion only runs on <p> elements (normal text), so bold body text at 11pt gets wrongly turned into h5.
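To make the 11pt case concrete, here is a minimal sketch of the size lookup from the diff, isolated from the DOM. The thresholds are copied from `headingSizeMap`; `resolveHeadingTag` is a hypothetical wrapper name used only for this illustration.

```javascript
// Thresholds copied from the diff, in descending order.
const headingSizeMap = [
  { minPt: 20, tag: 'h1' },
  { minPt: 16, tag: 'h2' },
  { minPt: 14, tag: 'h3' },
  { minPt: 12, tag: 'h4' },
  { minPt: 10, tag: 'h5' },
];

// Hypothetical helper: returns the tag the PR would pick, or null.
function resolveHeadingTag(fontSizePt, isBold) {
  if (!isBold || fontSizePt === null) return null;
  const match = headingSizeMap.find(({ minPt }) => fontSizePt >= minPt);
  return match ? match.tag : null;
}

console.log(resolveHeadingTag(11, true));  // bold 11pt body text -> 'h5'
console.log(resolveHeadingTag(24, true));  // oversized bold text -> 'h1'
console.log(resolveHeadingTag(11, false)); // not bold -> null
console.log(resolveHeadingTag(9, true));   // below every threshold -> null
```

Because the lowest threshold is 10pt, any fully bold paragraph at the default body size satisfies the lookup, which is the false positive described above.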
Hey @caio-pizzol, I want to make sure I'm not missing something here, because this seems to contradict the issue #2152 itself.
The issue description states:
That's exactly the pattern this PR detects. So could you clarify? Did something change in how Google Docs serializes to clipboard, or was the issue description inaccurate?
If Google Docs already outputs semantic heading tags upfront, then the issue itself would be invalid and need to be closed, but the repro steps suggest otherwise.
Hey @ErickPetru, you're right to question it - the issue description is wrong, and that's on us for not verifying the assumption before writing it up.
I just tested both paste flows with debug logging on the clipboard HTML:
<h1>–<h6> tags, 2 <p> elements (body text)
<h1>–<h6> tags, 5 <p> elements (body text)
Both sources already output proper heading tags in the clipboard. ProseMirror handles them natively, so no conversion is needed.
The convertStyledHeadings function only operates on <p> elements, which in practice are body text. That's why bold text at 11pt (Google Docs' default body size) gets incorrectly promoted to h5.
Sorry for the confusion on this, and I appreciate your patience working through the reviews.
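For anyone wanting to repeat that check, a rough sketch of the kind of tag count involved follows. It is an assumption about how such debug logging could look, not the code that was actually used: `countClipboardTags` is a hypothetical helper, the sample string is made up, and in a browser the input would come from `event.clipboardData.getData('text/html')` inside a paste handler.

```javascript
// Hypothetical helper: counts opening heading and paragraph tags in an HTML string.
function countClipboardTags(html) {
  // [\s>] after the tag name avoids matching <pre>, <picture>, <header>, etc.
  const headings = (html.match(/<h[1-6][\s>]/gi) || []).length;
  const paragraphs = (html.match(/<p[\s>]/gi) || []).length;
  return { headings, paragraphs };
}

// Made-up sample standing in for pasted clipboard HTML.
const sampleHtml =
  '<h1 style="font-size:20pt">Title</h1><p>Body one</p>' +
  '<h2>Section</h2><p class="c1">Body two</p><pre>code</pre>';

const counts = countClipboardTags(sampleHtml);
console.log(counts);
```

A regex count like this is only good enough for quick logging; a real check would parse the HTML.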
If you're game, there are other good first issue items here - feel free to pick any from the list (I will also double-check whether any there are misinterpreted