Skip to content

Commit 45c0532

Browse files
authored
fix(comments): one anchor pair per comment id, regardless of paragraph crossings (#3028)
* fix(comments): one anchor pair per comment id, regardless of paragraph crossings Resolving a comment whose mark spans more than one paragraph emitted N range pairs for an N-paragraph comment, producing a non-conformant DOCX with multiple `commentRangeStart` / `commentRangeEnd` / `commentReference` markers sharing the same `w:id`. Per ECMA-376 §17.13.4.3, §17.13.4.4, §17.13.4.5, `w:id` is the unique identifier for an annotation: each annotation produces exactly one start, one end, and one reference marker. Verified against Word output for a comment that spans a paragraph break: Word emits one pair, with the paragraph close sitting inside the range. `getCommentMarkRangesById` previously merged adjacent segments via `seg.from <= active.to`, which fails across paragraph boundaries because PM positions skip the close+open delta of the structural node. All segments returned by `getCommentMarkSegmentsById` belong to the same logical annotation by id (a single `commentMark` applied across the selection); collapse them into one envelope range covering the full extent. Tests: spec-derived suite for single-paragraph, multi-paragraph, three-paragraph, discontinuous, side-by-side, and overlapping nested cases (resolveCommentById emission), plus the canonical plugin-level regression test for the multi-paragraph case. Verified: 164/164 comment tests pass; 12087/12087 super-editor tests pass; runtime end-to-end against the BYO-UI demo (SuperDoc(11).docx → buggy: id=3 had 2/2/2 markers; SuperDoc(12).docx post-fix: id=3 has 1/1/1 markers). * fix(comments): preserve range scope for disjoint same-id segments (review feedback) The first iteration of this fix collapsed every segment carrying the same `commentId` into one envelope range. That handles the multi-paragraph case (the SuperDoc(8) bug) but expands the range scope when PM legitimately stores two non-adjacent regions for the same id — most commonly when a user copy-pastes a commented region. The CommentMark has no `transformPasted` hook, so PM preserves the mark attrs verbatim across paste. Both copies share the same `commentId`, but the user's intent is two annotations on two regions, not one annotation covering everything between them. Distinguish the two cases by walking the doc between consecutive segments: `doc.textBetween(prev.to, seg.from, '', '')` returns the concatenated text of every text leaf in the gap, with empty block separators so paragraph close+open contributes nothing. Empty result → structural boundary only → merge (paragraph-crossing). Non-empty result → real uncommented gap → keep ranges separate. Two range pairs with the same id is still non-conformant per spec (`w:id` should be unique per annotation), and remapping to fresh ids on resolve is the proper long-term fix. Filed as a follow-up. For now, preserving per-region scope is strictly better than collapsing it. Adds the disjoint regression test and tightens the assertions so it pins the layout, not just marker counts. * test(behavior): pin one comment range pair per id end-to-end (SD-3028) Pins the full pipeline (resolveCommentById → prepareCommentsForExport → comment-range translator → word/document.xml) at the export boundary. Builds a multi-segment TextTarget anchored across two paragraph blocks, posts the comment, resolves it, exports to DOCX, and asserts every comment id appears exactly once for each marker type (commentRangeStart / commentRangeEnd / commentReference). Verified the test catches the regression: against the pre-fix getCommentMarkRangesById it fails with 2/2/2 markers for the multi-paragraph id; against the fix it returns 1/1/1. 3/3 browsers green (chromium / firefox / webkit).
1 parent 72c3df5 commit 45c0532

4 files changed

Lines changed: 454 additions & 22 deletions

File tree

packages/super-editor/src/editors/v1/extensions/comment/comments-helpers.js

Lines changed: 73 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -106,9 +106,46 @@ const getCommentMarkSegmentsById = (commentId, doc, importedId) => {
106106
};
107107

108108
/**
109-
* Convert raw mark segments into merged contiguous ranges.
110-
* A single commentId can appear in multiple disjoint ranges (e.g. if content is split),
111-
* so this returns both the raw segments and the merged ranges.
109+
* Collapse raw mark segments for a single comment id into anchor ranges.
110+
*
111+
* Per ECMA-376 §17.13.4.3 / §17.13.4.4 / §17.13.4.5, `w:id` is the
112+
* unique identifier for an annotation, and the start / end / reference
113+
* triplet appears exactly once per id. A multi-paragraph comment is
114+
* still ONE annotation: PM splits it into multiple text-node mark
115+
* segments because the paragraph close + open structural delta sits
116+
* between them, but the OOXML emission must collapse them back into
117+
* a single `(commentRangeStart, commentRangeEnd)` pair covering the
118+
* full extent.
119+
*
120+
* Verified against Word: a comment that crosses a paragraph break
121+
* produces one `<w:commentRangeStart w:id="…"/>` at the first
122+
* commented position and one `<w:commentRangeEnd w:id="…"/>` after
123+
* the last commented position, with the paragraph break sitting
124+
* inside the range.
125+
*
126+
* Two flavors of "multiple segments" need to be told apart:
127+
*
128+
* 1. Paragraph-crossing: segments separated only by a structural
129+
* boundary (paragraph close + open). No uncommented text
130+
* between them. Logical extent is one contiguous range; merge.
131+
*
132+
* 2. Truly disjoint: segments separated by uncommented text. Most
133+
* common cause: a user copy-pasted commented content into a new
134+
* location; PM preserves the `commentMark` attrs (the mark has
135+
* no clipboard hook), so the same `commentId` ends up on
136+
* anchored regions that have unrelated content between them.
137+
* The two regions are logically two annotations that happen to
138+
* share an id; merging them into one envelope would expand the
139+
* comment's scope to cover the unrelated content. Keep them as
140+
* separate ranges instead — the resulting OOXML still has a
141+
* duplicate id (which a follow-up should remap to fresh ids),
142+
* but the per-range scope is preserved correctly.
143+
*
144+
* The previous adjacency-based merge (`seg.from <= active.to`)
145+
* conflated paragraph-crossing with disjoint and produced N pairs
146+
* for an N-paragraph contiguous comment. The fix walks the doc
147+
* between consecutive segments and merges only when the gap carries
148+
* no text content.
112149
*
113150
* @param {string} commentId The comment ID to match
114151
* @param {string} [importedId] The imported comment ID to match
@@ -119,33 +156,48 @@ const getCommentMarkRangesById = (commentId, doc, importedId) => {
119156
const segments = getCommentMarkSegmentsById(commentId, doc, importedId);
120157
if (!segments.length) return { segments, ranges: [] };
121158

159+
// Walk segments in document order, merging adjacent ones whenever
160+
// the gap between them carries no text content. PM's `textBetween`
161+
// walks every text leaf in the range and concatenates the text
162+
// (block separators omitted by passing empty strings), so a
163+
// paragraph close + open contributes nothing and a paragraph of
164+
// uncommented text contributes its full content.
165+
const sorted = [...segments].sort((a, b) => a.from - b.from);
122166
const ranges = [];
123-
let active = null;
124-
125-
segments.forEach((seg) => {
126-
if (!active) {
127-
active = {
128-
from: seg.from,
129-
to: seg.to,
130-
internal: !!seg.attrs?.internal,
131-
};
132-
return;
133-
}
134-
167+
let active = {
168+
from: sorted[0].from,
169+
to: sorted[0].to,
170+
internal: !!sorted[0].attrs?.internal,
171+
};
172+
for (let i = 1; i < sorted.length; i += 1) {
173+
const seg = sorted[i];
135174
if (seg.from <= active.to) {
136-
active.to = Math.max(active.to, seg.to);
137-
return;
175+
// Adjacent or overlapping in PM positions: definitely the
176+
// same logical region (e.g. two text nodes split by an inline
177+
// mark boundary).
178+
if (seg.to > active.to) active.to = seg.to;
179+
continue;
138180
}
139-
181+
const gapHasText = doc.textBetween(active.to, seg.from, '', '').length > 0;
182+
if (!gapHasText) {
183+
// Structural boundary only (paragraph break, inline node
184+
// boundary). Same logical annotation across a paragraph
185+
// crossing — merge.
186+
active.to = seg.to;
187+
continue;
188+
}
189+
// Real gap of uncommented content. Two logically distinct
190+
// anchored regions sharing an id (paste-preserved, etc.). Keep
191+
// them as separate ranges so the resolved range doesn't expand
192+
// over unrelated content.
140193
ranges.push(active);
141194
active = {
142195
from: seg.from,
143196
to: seg.to,
144197
internal: !!seg.attrs?.internal,
145198
};
146-
});
147-
148-
if (active) ranges.push(active);
199+
}
200+
ranges.push(active);
149201
return { segments, ranges };
150202
};
151203

packages/super-editor/src/editors/v1/extensions/comment/comments-helpers.test.js

Lines changed: 214 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
import { Schema } from 'prosemirror-model';
2-
import { prepareCommentsForImport } from './comments-helpers.js';
2+
import { EditorState } from 'prosemirror-state';
3+
import { prepareCommentsForImport, resolveCommentById } from './comments-helpers.js';
34

45
vi.mock('./comment-import-helpers.js', () => {
56
return {
@@ -49,3 +50,215 @@ describe('prepareCommentsForImport', () => {
4950
expect(addMarkFn).not.toHaveBeenCalled();
5051
});
5152
});
53+
54+
/**
55+
* Spec-derived contract for `resolveCommentById`.
56+
*
57+
* Per ECMA-376 §17.13.4.3 / §17.13.4.4 / §17.13.4.5, a comment's
58+
* `w:id` is a "unique identifier for an annotation" and the start /
59+
* end / reference triplet for a single annotation appears exactly
60+
* once. Verified against Word output (`/tmp/comment-fixture.docx`):
61+
* a comment whose anchor crosses a paragraph break still produces one
62+
* `commentRangeStart` and one `commentRangeEnd` per id.
63+
*
64+
* `resolveCommentById` converts a live `commentMark` into anchor
65+
* atoms before export. The contract this suite pins: ONE
66+
* `(commentRangeStart, commentRangeEnd)` pair per id, no matter how
67+
* many disjoint mark segments PM stores along the way.
68+
*/
69+
describe('resolveCommentById — anchor atom emission', () => {
70+
const schema = new Schema({
71+
nodes: {
72+
doc: { content: 'block+' },
73+
paragraph: { group: 'block', content: 'inline*' },
74+
commentRangeStart: { group: 'inline', inline: true, attrs: { 'w:id': {}, internal: { default: false } } },
75+
commentRangeEnd: { group: 'inline', inline: true, attrs: { 'w:id': {}, internal: { default: false } } },
76+
text: { group: 'inline' },
77+
},
78+
marks: {
79+
commentMark: {
80+
attrs: { commentId: {}, importedId: { default: null }, internal: { default: false } },
81+
},
82+
},
83+
});
84+
85+
/** Count atoms by type name. */
86+
const countAtoms = (doc, typeName) => {
87+
let n = 0;
88+
doc.descendants((node) => {
89+
if (node.type.name === typeName) n += 1;
90+
});
91+
return n;
92+
};
93+
94+
/** Count atoms by type name AND `w:id` attribute. */
95+
const countByIdAndType = (doc, typeName, wid) => {
96+
let n = 0;
97+
doc.descendants((node) => {
98+
if (node.type.name === typeName && node.attrs?.['w:id'] === wid) n += 1;
99+
});
100+
return n;
101+
};
102+
103+
const runResolve = (doc, commentId) => {
104+
const state = EditorState.create({ doc, schema });
105+
const tr = state.tr;
106+
let dispatched = false;
107+
const ok = resolveCommentById({
108+
commentId,
109+
state,
110+
tr,
111+
dispatch: () => {
112+
dispatched = true;
113+
},
114+
});
115+
return { ok, dispatched, doc: tr.doc };
116+
};
117+
118+
it('single-paragraph comment: emits one commentRangeStart/End pair', () => {
119+
const mark = schema.marks.commentMark.create({ commentId: 'c1', internal: false });
120+
const para = schema.nodes.paragraph.create(null, schema.text('Hello world', [mark]));
121+
const doc = schema.nodes.doc.create(null, [para]);
122+
123+
const result = runResolve(doc, 'c1');
124+
125+
expect(result.ok).toBe(true);
126+
expect(countByIdAndType(result.doc, 'commentRangeStart', 'c1')).toBe(1);
127+
expect(countByIdAndType(result.doc, 'commentRangeEnd', 'c1')).toBe(1);
128+
});
129+
130+
it('multi-paragraph comment (the SuperDoc(8) regression): one pair, not two', () => {
131+
// The exact shape Word produces for a comment that spans two
132+
// paragraphs: one `commentRangeStart` at the first commented
133+
// position and one `commentRangeEnd` after the last commented
134+
// position, with the paragraph break sitting inside the range.
135+
const mark = schema.marks.commentMark.create({ commentId: 'c-multi', internal: false });
136+
const para1 = schema.nodes.paragraph.create(null, schema.text('First half', [mark]));
137+
const para2 = schema.nodes.paragraph.create(null, schema.text('Second half', [mark]));
138+
const doc = schema.nodes.doc.create(null, [para1, para2]);
139+
140+
const result = runResolve(doc, 'c-multi');
141+
142+
expect(result.ok).toBe(true);
143+
expect(countByIdAndType(result.doc, 'commentRangeStart', 'c-multi')).toBe(1);
144+
expect(countByIdAndType(result.doc, 'commentRangeEnd', 'c-multi')).toBe(1);
145+
});
146+
147+
it('three-paragraph comment: still one pair', () => {
148+
const mark = schema.marks.commentMark.create({ commentId: 'c3p', internal: false });
149+
const p1 = schema.nodes.paragraph.create(null, schema.text('Para one', [mark]));
150+
const p2 = schema.nodes.paragraph.create(null, schema.text('Para two', [mark]));
151+
const p3 = schema.nodes.paragraph.create(null, schema.text('Para three', [mark]));
152+
const doc = schema.nodes.doc.create(null, [p1, p2, p3]);
153+
154+
const result = runResolve(doc, 'c3p');
155+
156+
expect(result.ok).toBe(true);
157+
expect(countByIdAndType(result.doc, 'commentRangeStart', 'c3p')).toBe(1);
158+
expect(countByIdAndType(result.doc, 'commentRangeEnd', 'c3p')).toBe(1);
159+
});
160+
161+
it('disjoint same-id (paste-preserved): two ranges, scope of each is preserved', () => {
162+
// The user copy-pastes a commented region; PM preserves the
163+
// commentMark attrs (no clipboard hook on the mark), so the
164+
// same commentId now sits on two non-adjacent regions with
165+
// uncommented content between them. They are logically TWO
166+
// annotations sharing an id — collapsing them into a single
167+
// envelope range would expand the comment to cover the
168+
// unrelated middle content.
169+
//
170+
// The OOXML output is still imperfect (two range pairs sharing
171+
// an id is non-conformant per spec; ids should be unique). A
172+
// follow-up should remap to fresh ids on resolve. Keeping the
173+
// ranges separate is strictly better than collapsing them: the
174+
// anchored extent of each region is preserved, matching the
175+
// pre-fix behavior for this case while still fixing the
176+
// paragraph-crossing case.
177+
const mark = schema.marks.commentMark.create({ commentId: 'c-paste', internal: false });
178+
const p1 = schema.nodes.paragraph.create(null, schema.text('First', [mark]));
179+
const p2 = schema.nodes.paragraph.create(null, schema.text('Uncommented middle paragraph'));
180+
const p3 = schema.nodes.paragraph.create(null, schema.text('Third', [mark]));
181+
const doc = schema.nodes.doc.create(null, [p1, p2, p3]);
182+
183+
const result = runResolve(doc, 'c-paste');
184+
185+
expect(result.ok).toBe(true);
186+
// Two pairs, one per anchored region. Scope of each is the
187+
// originally-marked text — uncommented middle is NOT inside
188+
// either range.
189+
expect(countByIdAndType(result.doc, 'commentRangeStart', 'c-paste')).toBe(2);
190+
expect(countByIdAndType(result.doc, 'commentRangeEnd', 'c-paste')).toBe(2);
191+
192+
// Confirm the scope: walk the doc and verify the uncommented
193+
// middle paragraph is NOT between any START and END of c-paste.
194+
const events = [];
195+
result.doc.descendants((node, pos) => {
196+
if (node.type.name === 'commentRangeStart' && node.attrs['w:id'] === 'c-paste') {
197+
events.push({ kind: 'start', pos });
198+
} else if (node.type.name === 'commentRangeEnd' && node.attrs['w:id'] === 'c-paste') {
199+
events.push({ kind: 'end', pos });
200+
} else if (node.isText) {
201+
events.push({ kind: 'text', pos, text: node.text });
202+
}
203+
});
204+
// Expect ordering: start, "First", end, "Uncommented...", start, "Third", end
205+
const seq = events.map((e) => (e.kind === 'text' ? `T(${e.text})` : e.kind.toUpperCase()));
206+
expect(seq).toEqual(['START', 'T(First)', 'END', 'T(Uncommented middle paragraph)', 'START', 'T(Third)', 'END']);
207+
});
208+
209+
it('two distinct comments side-by-side: two independent pairs, ids unique per annotation', () => {
210+
const a = schema.marks.commentMark.create({ commentId: 'cA', internal: false });
211+
const b = schema.marks.commentMark.create({ commentId: 'cB', internal: false });
212+
const para = schema.nodes.paragraph.create(null, [
213+
schema.text('Left', [a]),
214+
schema.text(' '),
215+
schema.text('Right', [b]),
216+
]);
217+
const doc = schema.nodes.doc.create(null, [para]);
218+
219+
const r1 = runResolve(doc, 'cA');
220+
const r2 = runResolve(r1.doc, 'cB');
221+
222+
expect(countByIdAndType(r2.doc, 'commentRangeStart', 'cA')).toBe(1);
223+
expect(countByIdAndType(r2.doc, 'commentRangeEnd', 'cA')).toBe(1);
224+
expect(countByIdAndType(r2.doc, 'commentRangeStart', 'cB')).toBe(1);
225+
expect(countByIdAndType(r2.doc, 'commentRangeEnd', 'cB')).toBe(1);
226+
expect(countAtoms(r2.doc, 'commentRangeStart')).toBe(2);
227+
expect(countAtoms(r2.doc, 'commentRangeEnd')).toBe(2);
228+
});
229+
230+
it('overlapping comments (one nested inside another, across paragraphs): one pair per id', () => {
231+
// PM allows multiple comment marks on the same node. Resolving
232+
// each one independently must still produce one pair per id.
233+
const outer = schema.marks.commentMark.create({ commentId: 'outer', internal: false });
234+
const inner = schema.marks.commentMark.create({ commentId: 'inner', internal: false });
235+
const p1 = schema.nodes.paragraph.create(null, [
236+
schema.text('Outside ', [outer]),
237+
schema.text('inside both', [outer, inner]),
238+
]);
239+
const p2 = schema.nodes.paragraph.create(null, [
240+
schema.text('still both', [outer, inner]),
241+
schema.text(' just outer', [outer]),
242+
]);
243+
const doc = schema.nodes.doc.create(null, [p1, p2]);
244+
245+
const r1 = runResolve(doc, 'outer');
246+
const r2 = runResolve(r1.doc, 'inner');
247+
248+
expect(countByIdAndType(r2.doc, 'commentRangeStart', 'outer')).toBe(1);
249+
expect(countByIdAndType(r2.doc, 'commentRangeEnd', 'outer')).toBe(1);
250+
expect(countByIdAndType(r2.doc, 'commentRangeStart', 'inner')).toBe(1);
251+
expect(countByIdAndType(r2.doc, 'commentRangeEnd', 'inner')).toBe(1);
252+
});
253+
254+
it('returns false (no-op) when the commentId has no mark in the doc', () => {
255+
const para = schema.nodes.paragraph.create(null, schema.text('uncommented'));
256+
const doc = schema.nodes.doc.create(null, [para]);
257+
258+
const result = runResolve(doc, 'nonexistent');
259+
260+
expect(result.ok).toBe(false);
261+
expect(countAtoms(result.doc, 'commentRangeStart')).toBe(0);
262+
expect(countAtoms(result.doc, 'commentRangeEnd')).toBe(0);
263+
});
264+
});

packages/super-editor/src/editors/v1/extensions/comment/comments.test.js

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -582,6 +582,50 @@ describe('comments plugin commands', () => {
582582
]);
583583
});
584584

585+
it('resolves a multi-paragraph comment to one OOXML range pair', () => {
586+
const schema = createCommentSchema();
587+
const mark = schema.marks[CommentMarkName].create({ commentId: 'comment-1', internal: true });
588+
const doc = schema.nodes.doc.create(null, [
589+
schema.nodes.paragraph.create(null, schema.text('First paragraph', [mark])),
590+
schema.nodes.paragraph.create(null, schema.text('Second paragraph', [mark])),
591+
]);
592+
const state = EditorState.create({ schema, doc });
593+
const tr = state.tr;
594+
const dispatch = vi.fn();
595+
596+
const result = CommentHelpers.resolveCommentById({
597+
commentId: 'comment-1',
598+
state,
599+
tr,
600+
dispatch,
601+
});
602+
603+
expect(result).toBe(true);
604+
expect(dispatch).toHaveBeenCalledWith(tr);
605+
606+
const applied = state.apply(tr);
607+
const remainingMarkIds = [];
608+
const commentNodes = [];
609+
610+
applied.doc.descendants((node, pos) => {
611+
node.marks.forEach((nodeMark) => {
612+
if (nodeMark.type === schema.marks[CommentMarkName]) {
613+
remainingMarkIds.push(nodeMark.attrs.commentId);
614+
}
615+
});
616+
if (node.type.name === 'commentRangeStart' || node.type.name === 'commentRangeEnd') {
617+
commentNodes.push({ type: node.type.name, id: node.attrs['w:id'], pos });
618+
}
619+
});
620+
621+
expect(remainingMarkIds).toEqual([]);
622+
expect(commentNodes).toEqual([
623+
{ type: 'commentRangeStart', id: 'comment-1', pos: expect.any(Number) },
624+
{ type: 'commentRangeEnd', id: 'comment-1', pos: expect.any(Number) },
625+
]);
626+
expect(commentNodes[0].pos).toBeLessThan(commentNodes[1].pos);
627+
});
628+
585629
it('reopens a resolved comment by removing range nodes and restoring the mark', () => {
586630
const { commands, state, schema } = setup();
587631

0 commit comments

Comments
 (0)