Skip to content

Commit 2407e93

Browse files
authored
fix(super-editor): stop default-rPr injection + normalize pgMar twips on round-trip (SD-2912) (#3237)
* fix(super-editor): stop injecting bCs/iCs/highlight defaults on round-trip (SD-2912) The customer fixture round-tripped to a document.xml ~37KB larger than the source, with 822 occurrences each of `<w:bCs/>`, `<w:iCs/>` and `<w:highlight w:val="none"/>` injected into runs whose source rPr contained none of them. Word treats explicit-off and absence as identical so the rendering was unchanged, but every consumer reading the XML saw thousands of "intentional" toggles that weren't real. Two changes: - `decodeRPrFromMarks` no longer auto-propagates `boldCs`/`italicCs` from the latin bold/italic mark. In OOXML these are independent properties (ECMA-376 §17.3.2). The propagation was emitting a complex-script companion element on every run that touched a bold or italic mark, even when the source rPr had no `<w:bCs/>` or `<w:iCs/>`. - The highlight case in the same function no longer emits an explicit `<w:highlight w:val="none"/>` for the transparent mark. That mark is synthesized from `<w:shd val="clear" fill="auto"/>` shading on import and the shading itself round-trips independently via its own element. To keep round-trip lossless for documents that genuinely carry `<w:bCs/>` or `<w:iCs/>` (which were previously preserved as a side effect of the auto-propagation), `boldCs` and `italicCs` are removed from `RUN_PROPERTIES_DERIVED_FROM_MARKS` in calculateInlineRunPropertiesPlugin. That moves them to the "preserve from existing runProperties" branch in `getInlineRunProperties`, so the values captured at import time survive appendTransactions verbatim. Verified on the SD-2912 fixture: - bCs: input=0 → before=822 → after=0 - iCs: input=0 → before=822 → after=0 - highlight: input=0 → before=822 → after=0 - document.xml: 1.13 MB → 1.08 MB (smaller than the 1.09 MB input) - All 12714 super-editor tests pass. * fix(super-editor): normalize pgMar attrs to integer twips on export (SD-2912) Customer ask: ECMA-376 §17.6.11 requires `ST_TwipsMeasure` values on <w:pgMar> to be non-negative whole numbers when expressed as raw twips. Athena Intelligence's source DOCX carries float-valued twips like `w:top="168.160400390625"` from an upstream pipeline. SuperDoc was preserving those floats verbatim on the paragraph-level sectPr passthrough (56 of 57 pgMar elements on the customer fixture). Strict consumers reject the result as schema-invalid. The body sectPr already produced integer twips (via the `pageMargins` → `inchesToTwips` write path); paragraph-level sectPrs went through the `savedTagsToRestore` passthrough that preserved source attrs verbatim, so floats survived. Fix is a single export-time normalization pass: `normalizePgMarTwipsInTree` walks the final XML JSON tree and rounds every numeric pgMar attribute to an integer via `Math.round`. Applied once at the export entry point in `translateDocumentNode`. Idempotent. Why a single pass instead of fixing each import or write site: - catches every path (current and future) that produces pgMar elements - one helper, one call site, easy to revert - doesn't touch import — `pageStyles.pageMargins` still in inches as before Verified on customer fixture: pgMar element count: 57 → 57 (preserved) decimal-valued pgMar attrs: 2549 → 0 customer-cited example: `w:top="168.160400390625"` → `w:top="168"` Tests: - 9 helper unit tests covering AAA scenarios (undefined/null input, no-pgMar tree, single pgMar, nested sectPrs, idempotency, integer-only no-op, non-numeric ignored, no cross-element bleed) - 3 extended integration tests: every exported pgMar attr is integer, raw XML has zero decimal-valued pgMar attrs, source fixture confirmed to carry decimals (otherwise the assertion would be vacuous) Full super-editor suite: 12 726 passed (+12), 13 skipped, 0 failed. * fix(super-editor): preserve highlight clears and cs formatting * fix(super-editor): cover review export regressions
1 parent ec5db77 commit 2407e93

17 files changed

Lines changed: 582 additions & 40 deletions

packages/super-editor/src/editors/v1/core/helpers/getMarksFromSelection.test.js

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -451,7 +451,8 @@ describe('getMarksFromSelection', () => {
451451

452452
const result = getSelectionFormattingState(cursorState);
453453

454-
expect(result.inlineRunProperties).toEqual({ bold: true, boldCs: true });
454+
// SD-2912: `boldCs` is no longer auto-propagated from the bold mark.
455+
expect(result.inlineRunProperties).toEqual({ bold: true });
455456
expect(result.inlineMarks.some((mark) => mark.type.name === 'bold')).toBe(true);
456457
expect(result.resolvedMarks.some((mark) => mark.type.name === 'bold')).toBe(true);
457458
});

packages/super-editor/src/editors/v1/core/helpers/syncParagraphRunProperties.test.js

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,8 @@ describe('syncParagraphRunProperties', () => {
5050
italic: true,
5151
styleId: 'Heading1Char',
5252
bold: true,
53-
boldCs: true,
53+
// SD-2912: `boldCs` is no longer auto-propagated from the bold mark — see
54+
// `decodeRPrFromMarks` in super-converter/styles.js.
5455
});
5556
});
5657

packages/super-editor/src/editors/v1/core/super-converter/exporter.js

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -106,6 +106,44 @@ export const ensureSectionLayoutDefaults = (sectPr, converter) => {
106106
return sectPr;
107107
};
108108

109+
/**
110+
* Walk an XML JSON tree in place and normalize every `<w:pgMar>` element's
111+
* numeric attributes to integer twips.
112+
*
113+
* ECMA-376 §17.6.11 requires `ST_TwipsMeasure` values to be non-negative whole
114+
* numbers when expressed as raw twips, but some authoring pipelines emit
115+
* float-valued twips like `w:top="168.160400390625"` that strict consumers
116+
* reject. SuperDoc preserves those floats verbatim on the paragraph-level
117+
* sectPr passthrough path. This pass is the single normalization point —
118+
* applied once at the export root — so we produce schema-valid output
119+
* regardless of which path the pgMar values reached the tree through
120+
* (body sectPr → `pageMargins` → `inchesToTwips`, or paragraph-level
121+
* passthrough sectPr that preserved source attrs).
122+
*
123+
* Idempotent. Mutates the tree in place; returns nothing. SD-2912.
124+
*
125+
* @param {{ name?: string, attributes?: Record<string, unknown>, elements?: Array<unknown> } | null | undefined} node
126+
* @returns {void}
127+
*/
128+
export const normalizePgMarTwipsInTree = (node) => {
129+
if (!node || typeof node !== 'object') return;
130+
if (node.name === 'w:pgMar' && node.attributes && typeof node.attributes === 'object') {
131+
for (const key of Object.keys(node.attributes)) {
132+
const value = node.attributes[key];
133+
if (value == null) continue;
134+
const serialized = String(value).trim();
135+
if (!serialized) continue;
136+
const num = Number(serialized);
137+
if (Number.isFinite(num) && !/^-?\d+$/.test(serialized)) {
138+
node.attributes[key] = String(Math.round(num));
139+
}
140+
}
141+
}
142+
if (Array.isArray(node.elements)) {
143+
for (const child of node.elements) normalizePgMarTwipsInTree(child);
144+
}
145+
};
146+
109147
export const isLineBreakOnlyRun = (node) => {
110148
if (!node) return false;
111149
if (node.type === 'lineBreak' || node.type === 'hardBreak') return true;
@@ -402,6 +440,12 @@ function translateDocumentNode(params) {
402440
attributes,
403441
};
404442

443+
// SD-2912: normalize every <w:pgMar> in the final tree to integer twips,
444+
// catching both the body sectPr path (already integer-correct via
445+
// inchesToTwips) and the paragraph-level passthrough sectPr path that
446+
// preserves source attrs verbatim.
447+
normalizePgMarTwipsInTree(node);
448+
405449
return [node, params];
406450
}
407451

@@ -568,6 +612,9 @@ function translateMark(mark) {
568612
break;
569613
case 'highlight': {
570614
const highlightValue = attrs.color ?? attrs.highlight ?? null;
615+
if (String(highlightValue).trim().toLowerCase() === 'transparent' && !attrs.ooxmlHighlightClear) {
616+
return {};
617+
}
571618
const translated = wHighlightTranslator.decode({ node: { attrs: { highlight: highlightValue } } });
572619
return translated || {};
573620
}
Lines changed: 156 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,156 @@
1+
import { describe, it, expect } from 'vitest';
2+
import { normalizePgMarTwipsInTree, processOutputMarks } from './exporter.js';
3+
4+
// SD-2912: <w:pgMar> attributes must be integer twips per ECMA-376 §17.6.11
5+
// (ST_TwipsMeasure). Some documents carry float-valued twips like
6+
// `w:top="168.160400390625"` that pass through the import → export pipeline
7+
// verbatim on paragraph-level sectPr passthrough; strict consumers reject the
8+
// result. This helper is the single normalization point: it walks the export
9+
// XML JSON tree and rounds every numeric pgMar attribute to an integer.
10+
11+
describe('normalizePgMarTwipsInTree', () => {
12+
it('does not throw on undefined input', () => {
13+
expect(() => normalizePgMarTwipsInTree(undefined)).not.toThrow();
14+
});
15+
16+
it('does not throw on null input', () => {
17+
expect(() => normalizePgMarTwipsInTree(null)).not.toThrow();
18+
});
19+
20+
it('leaves a tree without any w:pgMar element unchanged', () => {
21+
const tree = {
22+
name: 'w:document',
23+
elements: [{ name: 'w:body', elements: [{ name: 'w:p', elements: [] }] }],
24+
};
25+
const before = JSON.stringify(tree);
26+
normalizePgMarTwipsInTree(tree);
27+
expect(JSON.stringify(tree)).toBe(before);
28+
});
29+
30+
it('rounds a single float pgMar attribute to an integer twips value', () => {
31+
const tree = { name: 'w:pgMar', attributes: { 'w:top': '168.160400390625' } };
32+
normalizePgMarTwipsInTree(tree);
33+
expect(tree.attributes['w:top']).toBe('168');
34+
});
35+
36+
it('rounds every numeric pgMar attribute, leaves already-integer values exact', () => {
37+
const tree = {
38+
name: 'w:pgMar',
39+
attributes: {
40+
'w:top': '168.160400390625',
41+
'w:bottom': '146.0200023651123',
42+
'w:left': '352.31998443603516',
43+
'w:right': '663.9990234375',
44+
'w:gutter': '0',
45+
'w:header': '720',
46+
},
47+
};
48+
normalizePgMarTwipsInTree(tree);
49+
expect(tree.attributes).toEqual({
50+
'w:top': '168',
51+
'w:bottom': '146',
52+
'w:left': '352',
53+
'w:right': '664',
54+
'w:gutter': '0',
55+
'w:header': '720',
56+
});
57+
});
58+
59+
it('canonicalizes decimal pgMar tokens even when the numeric value is integral', () => {
60+
const tree = {
61+
name: 'w:pgMar',
62+
attributes: {
63+
'w:top': '168.0',
64+
'w:left': '352.000000',
65+
'w:right': '663.9990234375',
66+
'w:header': '720',
67+
},
68+
};
69+
70+
normalizePgMarTwipsInTree(tree);
71+
72+
expect(tree.attributes).toEqual({
73+
'w:top': '168',
74+
'w:left': '352',
75+
'w:right': '664',
76+
'w:header': '720',
77+
});
78+
});
79+
80+
it('walks into nested elements and normalizes pgMar attrs at any depth', () => {
81+
const tree = {
82+
name: 'w:document',
83+
elements: [
84+
{
85+
name: 'w:body',
86+
elements: [
87+
{
88+
name: 'w:p',
89+
elements: [
90+
{
91+
name: 'w:pPr',
92+
elements: [
93+
{
94+
name: 'w:sectPr',
95+
elements: [{ name: 'w:pgMar', attributes: { 'w:top': '146.0200023651123' } }],
96+
},
97+
],
98+
},
99+
],
100+
},
101+
{ name: 'w:sectPr', elements: [{ name: 'w:pgMar', attributes: { 'w:bottom': '352.31998443603516' } }] },
102+
],
103+
},
104+
],
105+
};
106+
normalizePgMarTwipsInTree(tree);
107+
const firstPgMar = tree.elements[0].elements[0].elements[0].elements[0].elements[0];
108+
const secondPgMar = tree.elements[0].elements[1].elements[0];
109+
expect(firstPgMar.attributes['w:top']).toBe('146');
110+
expect(secondPgMar.attributes['w:bottom']).toBe('352');
111+
});
112+
113+
it('is idempotent — re-running on already-normalized values is a no-op', () => {
114+
const tree = { name: 'w:pgMar', attributes: { 'w:top': '168.5' } };
115+
normalizePgMarTwipsInTree(tree);
116+
const afterFirst = { ...tree.attributes };
117+
normalizePgMarTwipsInTree(tree);
118+
expect(tree.attributes).toEqual(afterFirst);
119+
expect(tree.attributes['w:top']).toBe('169');
120+
});
121+
122+
it('ignores non-numeric attribute values (defensive against future OOXML extensions)', () => {
123+
const tree = { name: 'w:pgMar', attributes: { 'w:top': '168', 'w:custom': 'auto' } };
124+
normalizePgMarTwipsInTree(tree);
125+
expect(tree.attributes['w:custom']).toBe('auto');
126+
});
127+
128+
it('does not affect attributes on elements other than w:pgMar', () => {
129+
const tree = {
130+
name: 'w:document',
131+
elements: [
132+
{ name: 'w:pgSz', attributes: { 'w:w': '12240.5', 'w:h': '15840.7' } },
133+
{ name: 'w:pgMar', attributes: { 'w:top': '168.5' } },
134+
],
135+
};
136+
normalizePgMarTwipsInTree(tree);
137+
expect(tree.elements[0].attributes).toEqual({ 'w:w': '12240.5', 'w:h': '15840.7' });
138+
expect(tree.elements[1].attributes['w:top']).toBe('169');
139+
});
140+
});
141+
142+
describe('processOutputMarks highlight clear export', () => {
143+
it('does not emit highlight XML for plain transparent highlight marks', () => {
144+
const outputMarks = processOutputMarks([{ type: 'highlight', attrs: { color: 'transparent' } }]);
145+
146+
expect(outputMarks).toEqual([{}]);
147+
});
148+
149+
it('emits highlight none only for imported explicit highlight clears', () => {
150+
const outputMarks = processOutputMarks([
151+
{ type: 'highlight', attrs: { color: 'transparent', ooxmlHighlightClear: true } },
152+
]);
153+
154+
expect(outputMarks).toEqual([{ name: 'w:highlight', attributes: { 'w:val': 'none' } }]);
155+
});
156+
});

packages/super-editor/src/editors/v1/core/super-converter/styles.js

Lines changed: 23 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,8 @@ export function encodeMarksFromRPr(runProperties, docx) {
4545

4646
const marks = [];
4747
const textStyleAttrs = {};
48-
let highlightColor = null;
48+
/** @type {{ color: string, ooxmlHighlightClear?: boolean } | null} */
49+
let highlightAttrs = null;
4950
let hasHighlightTag = false;
5051
Object.keys(runProperties).forEach((key) => {
5152
const value = runProperties[key];
@@ -134,7 +135,10 @@ export function encodeMarksFromRPr(runProperties, docx) {
134135
const color = getHighLightValue(value);
135136
if (color) {
136137
hasHighlightTag = true;
137-
highlightColor = color;
138+
highlightAttrs = { color };
139+
if (color.toLowerCase() === 'transparent' && String(value?.['w:val']).toLowerCase() === 'none') {
140+
highlightAttrs.ooxmlHighlightClear = true;
141+
}
138142
}
139143
break;
140144
case 'shading': {
@@ -144,11 +148,11 @@ export function encodeMarksFromRPr(runProperties, docx) {
144148
const fill = value['fill'];
145149
const shdVal = value['val'];
146150
if (fill && String(fill).toLowerCase() !== 'auto') {
147-
highlightColor = `#${String(fill).replace('#', '')}`;
151+
highlightAttrs = { color: `#${String(fill).replace('#', '')}` };
148152
} else if (typeof shdVal === 'string') {
149153
const normalized = shdVal.toLowerCase();
150154
if (normalized === 'clear' || normalized === 'nil' || normalized === 'none') {
151-
highlightColor = 'transparent';
155+
highlightAttrs = { color: 'transparent' };
152156
}
153157
}
154158
break;
@@ -175,8 +179,8 @@ export function encodeMarksFromRPr(runProperties, docx) {
175179
marks.push({ type: 'textStyle', attrs: textStyleAttrs });
176180
}
177181

178-
if (highlightColor) {
179-
marks.push({ type: 'highlight', attrs: { color: highlightColor } });
182+
if (highlightAttrs) {
183+
marks.push({ type: 'highlight', attrs: highlightAttrs });
180184
}
181185

182186
return marks;
@@ -535,11 +539,14 @@ export function decodeRPrFromMarks(marks) {
535539
case 'italic':
536540
case 'bold':
537541
runProperties[type] = mark.attrs.value !== '0' && mark.attrs.value !== false;
538-
if (type === 'bold') {
539-
runProperties.boldCs = runProperties.bold;
540-
} else if (type === 'italic') {
541-
runProperties.italicCs = runProperties.italic;
542-
}
542+
// SD-2912: do NOT auto-propagate `boldCs` / `italicCs` from the latin
543+
// bold/italic mark. The complex-script companion is an independent OOXML
544+
// property (ECMA-376 §17.3.2). Auto-propagating it injects elements that
545+
// weren't in the source rPr — every run gets a `<w:bCs/>` regardless of
546+
// whether the original `<w:rPr>` contained one. When the source genuinely
547+
// had `<w:bCs/>`, it round-trips via the run's stored runProperties
548+
// (preserved by the plugin's existing-keys branch — see the matching
549+
// SD-2912 change in `calculateInlineRunPropertiesPlugin.js`).
543550
break;
544551
case 'underline': {
545552
const { underlineType, underlineColor, underlineThemeColor, underlineThemeTint, underlineThemeShade } =
@@ -566,12 +573,13 @@ export function decodeRPrFromMarks(marks) {
566573
break;
567574
}
568575
case 'highlight':
569-
if (mark.attrs.color) {
570-
if (mark.attrs.color.toLowerCase() === 'transparent') {
576+
if (!mark.attrs.color) break;
577+
if (mark.attrs.color.toLowerCase() === 'transparent') {
578+
if (mark.attrs.ooxmlHighlightClear) {
571579
runProperties.highlight = { 'w:val': 'none' };
572-
} else {
573-
runProperties.highlight = { 'w:val': mark.attrs.color };
574580
}
581+
} else {
582+
runProperties.highlight = { 'w:val': mark.attrs.color };
575583
}
576584
break;
577585
case 'link':

0 commit comments

Comments
 (0)