Closed
Changes from 2 commits
@@ -5,6 +5,17 @@ import { createSingleItemList } from '../html/html-helpers.js';
import { getLvlTextForGoogleList, googleNumDefMap } from '../../helpers/pasteListHelpers.js';
import { wrapTextsInRuns } from '../docx-paste/docx-paste.js';

// Ordered largest → smallest; first match wins.
Contributor

A comment saying these numbers come from Google Docs' default heading sizes (H1=20pt, H2=16pt, etc.) would help; it's not obvious where they came from otherwise.

Author

Good suggestion; what's obvious to one developer might not be to another.

const headingSizeMap = [
  { minPt: 20, tag: 'h1' },
  { minPt: 16, tag: 'h2' },
  { minPt: 14, tag: 'h3' },
  { minPt: 12, tag: 'h4' },
  { minPt: 10, tag: 'h5' },
];

const boldWeightRegex = /^(bold|700|800|900)$/i;

/**
 * Main handler for pasted Google Docs content.
 *
@@ -21,7 +32,9 @@ export const handleGoogleDocsHtml = (html, editor, view) => {
  const tempDiv = document.createElement('div');
  tempDiv.innerHTML = cleanedHtml;

- const htmlWithMergedLists = mergeSeparateLists(tempDiv);
+ const tempDivWithHeadings = convertStyledHeadings(tempDiv);
+
+ const htmlWithMergedLists = mergeSeparateLists(tempDivWithHeadings);
Comment on lines +36 to +38
Contributor

Google Docs' clipboard HTML already uses <h1>–<h6> for headings set via the styles dropdown, and ProseMirror parses these natively. This conversion only runs on <p> elements (normal text), so bold body text at 11pt gets wrongly turned into h5.

Suggested change
- const tempDivWithHeadings = convertStyledHeadings(tempDiv);
- const htmlWithMergedLists = mergeSeparateLists(tempDivWithHeadings);
+ // Google Docs already outputs semantic <h1>–<h6> for headings set via
+ // the paragraph style dropdown — ProseMirror handles them natively.
+ const htmlWithMergedLists = mergeSeparateLists(tempDiv);
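
To make the failure mode concrete, here is a minimal sketch of the size lookup that convertStyledHeadings performs, reduced to a pure function with no DOM. The thresholds are copied from headingSizeMap in this PR; `pickHeadingTag` is a hypothetical helper introduced only for illustration and is not part of the change.

```javascript
// Thresholds copied from the PR's headingSizeMap (largest → smallest).
const headingSizeMap = [
  { minPt: 20, tag: 'h1' },
  { minPt: 16, tag: 'h2' },
  { minPt: 14, tag: 'h3' },
  { minPt: 12, tag: 'h4' },
  { minPt: 10, tag: 'h5' },
];

// Hypothetical helper (assumption, not in the PR): mirrors the bold check and
// the first-match-wins lookup done inside convertStyledHeadings.
function pickHeadingTag(fontSizePt, isBold) {
  if (!isBold || fontSizePt === null) return null;
  const match = headingSizeMap.find(({ minPt }) => fontSizePt >= minPt);
  return match ? match.tag : null;
}

// Bold body text at 11pt clears the 10pt threshold and becomes a heading.
console.log(pickHeadingTag(11, true)); // h5
// Non-bold text is left alone regardless of size.
console.log(pickHeadingTag(11, false)); // null
```

Because the lowest threshold is 10pt, any fully bold paragraph at 11pt falls through to h5, which is the misclassification described above.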

Author

Hey @caio-pizzol, I want to make sure I'm not missing something here, because this seems to contradict issue #2152 itself.

The issue description states:

Google Docs converts most heading levels to <p> tags with inline font-size/font-weight styling instead of semantic <h1>-<h6> tags. The paragraph.parseDOM has rules for h1-h6 but they never fire for these styled <p> elements.

That's exactly the pattern this PR detects. So could you clarify? Did something change in how Google Docs serializes to clipboard, or was the issue description inaccurate?

If Google Docs already outputs semantic heading tags upfront, then the issue itself would be invalid and need to be closed, but the repro steps suggest otherwise.

Contributor

Hey @ErickPetru, you're right to question it - the issue description is wrong, and that's on us for not verifying the assumption before writing it up.

I just tested both paste flows with debug logging on the clipboard HTML:

  • Google Docs into SuperDoc: 6 semantic <h1>–<h6> tags, 2 <p> elements (body text)
  • Word into SuperDoc: 6 semantic <h1>–<h6> tags, 5 <p> elements (body text)

Both sources already output proper heading tags in the clipboard. ProseMirror handles them natively — no conversion needed.
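
For anyone who wants to repeat this kind of check, one possible approach is sketched below. It is an assumption about how such logging could be done, not the exact code used for the test above: `countBlockTags` is a hypothetical helper, and the sample string is invented rather than real clipboard output. In a browser, the HTML would come from a paste event via `event.clipboardData.getData('text/html')`.

```javascript
// Hypothetical helper: counts opening <h1>–<h6> and <p> tags in an HTML string.
// The lookahead requires whitespace or ">" after the tag name, so <pre> or
// <path> are not mistaken for <p>.
function countBlockTags(html) {
  const counts = { headings: 0, paragraphs: 0 };
  const openTag = /<(h[1-6]|p)(?=[\s>])/gi;
  for (const match of html.matchAll(openTag)) {
    if (match[1].toLowerCase() === 'p') counts.paragraphs += 1;
    else counts.headings += 1;
  }
  return counts;
}

// Invented sample input, for illustration only.
const sample =
  '<h1>Title</h1><h2 style="font-size:16pt">Section</h2><p>Body text</p><pre>code</pre>';
console.log(countBlockTags(sample)); // { headings: 2, paragraphs: 1 }

// Browser-only usage (not run here):
// document.addEventListener('paste', (event) => {
//   console.log(countBlockTags(event.clipboardData.getData('text/html')));
// });
```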

The convertStyledHeadings function only operates on <p> elements, which in practice are body text. That's why bold text at 11pt (Google Docs' default body size) gets incorrectly promoted to h5.

Sorry for the confusion on this, and I appreciate your patience working through the reviews.

If you're game, there are other good first issues here - feel free to pick any from the list (I will also double-check whether any of them are misinterpreted).

  const flattenHtml = flattenListsInHtml(htmlWithMergedLists, editor);

  let doc = DOMParser.fromSchema(editor.schema).parse(flattenHtml);
@@ -253,3 +266,68 @@ function buildListPath(level, map) {
  }
  return path;
}

/**
 * Converts Google Docs styled <p> elements that represent headings into proper
 * <h1>–<h5> tags before ProseMirror parsing.
 *
 * Google Docs converts heading levels to <p> tags with inline font-size /
 * font-weight styling instead of semantic heading tags. This function detects
 * that pattern and replaces the elements in-place.
 *
 * @param {HTMLElement} container
 */
function convertStyledHeadings(container) {
  const paragraphs = Array.from(container.querySelectorAll('p')).filter(
    (p) => p.parentElement?.tagName?.toLowerCase() !== 'li',
  );

  paragraphs.forEach((p) => {
    const { fontSize, isBold } = getHeadingStyleProps(p);
    if (!isBold || fontSize === null) return;

    const match = headingSizeMap.find(({ minPt }) => fontSize >= minPt);
    if (!match) return;

    const heading = document.createElement(match.tag);
    heading.innerHTML = p.innerHTML;
    Array.from(p.attributes).forEach((attr) => heading.setAttribute(attr.name, attr.value));
    p.replaceWith(heading);
  });

  return container;
}

/**
 * Reads font-size (in pt) and bold status from an element's inline style.
 * Checks both the element itself and its first child <span> to cover both
 * Google Docs style placements (style on <p> vs. style on inner <span>).
 *
 * @param {HTMLElement} el
 * @returns {{ fontSize: number|null, isBold: boolean }}
 */
function getHeadingStyleProps(el) {
  const fontSize = parsePtValue(el.style.fontSize);
  const isBoldOnEl = boldWeightRegex.test(el.style.fontWeight || '');

  const { children } = el;
  const singleSpan = children.length === 1 && children[0].tagName?.toLowerCase() === 'span' ? children[0] : null;

  return {
    fontSize: fontSize ?? parsePtValue(singleSpan?.style.fontSize),
    isBold: isBoldOnEl || boldWeightRegex.test(singleSpan?.style.fontWeight || ''),
  };
}

/**
 * Parses a CSS font-size value in pt units, e.g. "20pt" → 20. Returns null
 * for any other format.
 *
 * @param {string|undefined} cssValue
 * @returns {number|null}
 */
function parsePtValue(cssValue) {
  if (!cssValue) return null;
  const m = cssValue.match(/^([\d.]+)pt$/i);
  return m ? parseFloat(m[1]) : null;
}
@@ -95,4 +95,102 @@ describe('handleGoogleDocsHtml', () => {
    expect(replaceSelectionWith).toHaveBeenCalledWith(parseResult, true);
    expect(dispatch).toHaveBeenCalledWith('next');
  });

  describe('convertStyledHeadings', () => {
    function makeEditor(dispatch, replaceSelectionWith) {
      return {
        editor: { schema: {}, view: { dispatch }, options: {} },
        view: { state: { tr: { replaceSelectionWith } } },
      };
    }

    function parseHeadings(html) {
      const dispatch = vi.fn();
      const replaceSelectionWith = vi.fn(() => 'next');
      const { editor, view } = makeEditor(dispatch, replaceSelectionWith);
      handleGoogleDocsHtml(html, editor, view);
      return parseSpy.mock.calls[0][0];
    }

    it('converts bold <p> with large font-size to heading tags', () => {
      const html = `
        <p style="font-size:20pt;font-weight:700">Heading 1</p>
        <p style="font-size:16pt;font-weight:bold">Heading 2</p>
        <p style="font-size:14pt;font-weight:700">Heading 3</p>
        <p style="font-size:12pt;font-weight:700">Heading 4</p>
        <p style="font-size:11pt;font-weight:700">Heading 5</p>
      `;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1')?.textContent?.trim()).toBe('Heading 1');
      expect(dom.querySelector('h2')?.textContent?.trim()).toBe('Heading 2');
      expect(dom.querySelector('h3')?.textContent?.trim()).toBe('Heading 3');
      expect(dom.querySelector('h4')?.textContent?.trim()).toBe('Heading 4');
      expect(dom.querySelector('h5')?.textContent?.trim()).toBe('Heading 5');
    });

    it('converts when style is on a child <span> instead of the <p>', () => {
      const html = `
        <p><span style="font-size:20pt;font-weight:700">Heading from span</span></p>
      `;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1')?.textContent?.trim()).toBe('Heading from span');
      expect(dom.querySelector('p')).toBeNull();
    });

    it('does not convert non-bold paragraphs', () => {
      const html = `<p style="font-size:20pt">Not a heading</p>`;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1')).toBeNull();
      expect(dom.querySelector('p')?.textContent?.trim()).toBe('Not a heading');
    });

    it('does not convert bold paragraphs with small font-size', () => {
      const html = `<p style="font-size:9pt;font-weight:700">Small bold</p>`;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1,h2,h3,h4,h5')).toBeNull();
    });

    it('handles large font-sizes from alternate Google Docs themes (e.g. 24pt → h1)', () => {
      const html = `<p style="font-size:24pt;font-weight:700">Big Heading</p>`;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1')?.textContent?.trim()).toBe('Big Heading');
    });

    it('does not convert a paragraph where only the first of multiple spans is bold', () => {
      // Body paragraph with a bold opening word — must not become a heading.
      const html = `
        <p>
          <span style="font-size:11pt;font-weight:700">Bold word</span>
          <span style="font-size:11pt;">rest of text</span>
        </p>
      `;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1,h2,h3,h4,h5')).toBeNull();
    });

    it('does not convert <p> elements inside <li> to avoid corrupting list structure', () => {
      const html = `
        <ul>
          <li><p style="font-size:20pt;font-weight:700">List item</p></li>
        </ul>
      `;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1')).toBeNull();
      expect(dom.querySelector('p[data-num-id]')).not.toBeNull();
    });

    it('converts when font-size is on <p> but font-weight is only on the child <span>', () => {
      const html = `
        <p style="font-size:20pt"><span style="font-weight:700">Split style heading</span></p>
      `;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1')?.textContent?.trim()).toBe('Split style heading');
    });

    it('preserves attributes from the original <p> on the new heading element', () => {
      const html = `<p style="font-size:20pt;font-weight:700" data-custom="yes">With attr</p>`;
      const dom = parseHeadings(html);
      expect(dom.querySelector('h1')?.getAttribute('data-custom')).toBe('yes');
    });
  });
});