fix(comments): one anchor pair per comment id, regardless of paragraph crossings (#3028)

caio-pizzol · web-flow · commit 45c0532ec882 · 2026-04-30T15:48:54.000-03:00
* fix(comments): one anchor pair per comment id, regardless of paragraph crossings

Resolving a comment whose mark spans more than one paragraph emitted
N range pairs for an N-paragraph comment, producing a non-conformant
DOCX with multiple `commentRangeStart` / `commentRangeEnd` /
`commentReference` markers sharing the same `w:id`.

Per ECMA-376 §17.13.4.3, §17.13.4.4, §17.13.4.5, `w:id` is the unique
identifier for an annotation: each annotation produces exactly one
start, one end, and one reference marker. Verified against Word
output for a comment that spans a paragraph break: Word emits one
pair, with the paragraph close sitting inside the range.

`getCommentMarkRangesById` previously merged adjacent segments via
`seg.from <= active.to`, which fails across paragraph boundaries
because PM positions skip the close+open delta of the structural
node. All segments returned by `getCommentMarkSegmentsById` belong
to the same logical annotation by id (a single `commentMark`
applied across the selection); collapse them into one envelope
range covering the full extent.
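
To make the position gap concrete, here is a minimal sketch of
ProseMirror-style position counting for a flat doc > paragraph > text
layout. `paragraphTextSegments` is a hypothetical helper written for
illustration, not part of the codebase; it only assumes that each
paragraph open and close token occupies one position.

```javascript
// Hypothetical helper: computes ProseMirror-style positions for the text
// of each paragraph in a flat doc (doc > paragraph > text). Every
// paragraph open token and close token occupies one position.
function paragraphTextSegments(paragraphs) {
  const segments = [];
  let pos = 0;
  for (const text of paragraphs) {
    pos += 1; // paragraph open token
    segments.push({ from: pos, to: pos + text.length });
    pos += text.length;
    pos += 1; // paragraph close token
  }
  return segments;
}

const [first, second] = paragraphTextSegments(['First half', 'Second half']);
// first  -> { from: 1, to: 11 }
// second -> { from: 13, to: 24 }

// The old adjacency check compares 13 <= 11, so it never merges the
// two halves of a paragraph-crossing comment.
const adjacencyMerges = second.from <= first.to; // false
```

The two-position difference between `first.to` and `second.from` is
the paragraph close + open delta that the adjacency check trips over.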

Tests: spec-derived suite for single-paragraph, multi-paragraph,
three-paragraph, discontinuous, side-by-side, and overlapping
nested cases (resolveCommentById emission), plus the canonical
plugin-level regression test for the multi-paragraph case.

Verified: 164/164 comment tests pass; 12087/12087 super-editor
tests pass; runtime end-to-end against the BYO-UI demo
(SuperDoc(11).docx → buggy: id=3 had 2/2/2 markers; SuperDoc(12).docx
post-fix: id=3 has 1/1/1 markers).

* fix(comments): preserve range scope for disjoint same-id segments (review feedback)

The first iteration of this fix collapsed every segment carrying the
same `commentId` into one envelope range. That handles the
multi-paragraph case (the SuperDoc(8) bug) but expands the range
scope when PM legitimately stores two non-adjacent regions for the
same id — most commonly when a user copy-pastes a commented region.
The CommentMark has no `transformPasted` hook, so PM preserves the
mark attrs verbatim across paste. Both copies share the same
`commentId`, but the user's intent is two annotations on two
regions, not one annotation covering everything between them.

Distinguish the two cases by walking the doc between consecutive
segments: `doc.textBetween(prev.to, seg.from, '', '')` returns the
concatenated text of every text leaf in the gap, with empty block
separators so paragraph close+open contributes nothing. Empty
result → structural boundary only → merge (paragraph-crossing).
Non-empty result → real uncommented gap → keep ranges separate.
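
The rule above can be sketched without a real PM document. In this
sketch `mergeSegments` and `makeGapText` are hypothetical names;
`gapText(from, to)` stands in for `doc.textBetween(from, to, '', '')`
and is backed by a toy flat-paragraph model rather than ProseMirror.

```javascript
// Toy stand-in for doc.textBetween(from, to, '', ''): returns the text
// that falls inside [from, to) for a flat list of paragraphs, with
// paragraph open/close tokens contributing nothing.
function makeGapText(paragraphs) {
  const spans = [];
  let pos = 0;
  for (const text of paragraphs) {
    pos += 1; // paragraph open token
    spans.push({ start: pos, text });
    pos += text.length + 1; // text plus paragraph close token
  }
  return (from, to) =>
    spans
      .map(({ start, text }) =>
        text.slice(Math.max(0, from - start), Math.max(0, to - start)))
      .join('');
}

// Merge segments that touch, or whose gap carries no text; otherwise
// start a new range so uncommented content stays outside every range.
function mergeSegments(segments, gapText) {
  if (segments.length === 0) return [];
  const sorted = [...segments].sort((a, b) => a.from - b.from);
  const ranges = [];
  let active = { from: sorted[0].from, to: sorted[0].to };
  for (const seg of sorted.slice(1)) {
    const touches = seg.from <= active.to;
    if (touches || gapText(active.to, seg.from) === '') {
      active.to = Math.max(active.to, seg.to);
    } else {
      ranges.push(active);
      active = { from: seg.from, to: seg.to };
    }
  }
  ranges.push(active);
  return ranges;
}

const crossing = mergeSegments(
  [{ from: 1, to: 11 }, { from: 13, to: 24 }],
  makeGapText(['First half', 'Second half']),
);
// crossing -> [{ from: 1, to: 24 }]

const disjoint = mergeSegments(
  [{ from: 1, to: 6 }, { from: 38, to: 43 }],
  makeGapText(['First', 'Uncommented middle paragraph', 'Third']),
);
// disjoint -> [{ from: 1, to: 6 }, { from: 38, to: 43 }]
```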

Two range pairs sharing one id are still non-conformant per spec
(`w:id` should be unique per annotation); remapping to fresh ids on
resolve is the proper long-term fix and is filed as a follow-up.
For now, preserving per-region scope is strictly better than
collapsing it.

Adds the disjoint regression test and tightens the assertions so
it pins the layout, not just marker counts.

* test(behavior): pin one comment range pair per id end-to-end (SD-3028)

Pins the full pipeline (resolveCommentById → prepareCommentsForExport
→ comment-range translator → word/document.xml) at the export
boundary. Builds a multi-segment TextTarget anchored across two
paragraph blocks, posts the comment, resolves it, exports to DOCX,
and asserts every comment id appears exactly once for each marker
type (commentRangeStart / commentRangeEnd / commentReference).
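
The cardinality assertion can be approximated as below. This is a
sketch only: `countCommentMarkers` is a hypothetical helper, the XML
string is a hand-written sample rather than real export output, and
the regex scan is a simplification that a real test may replace with
an XML parser.

```javascript
// Hypothetical helper: tallies comment markers per w:id in the text of
// word/document.xml. Regex matching is a simplification for the sketch.
function countCommentMarkers(documentXml) {
  const counts = new Map();
  const pattern =
    /<w:(commentRangeStart|commentRangeEnd|commentReference)\b[^>]*\bw:id="([^"]*)"/g;
  for (const [, type, id] of documentXml.matchAll(pattern)) {
    const entry =
      counts.get(id) ??
      { commentRangeStart: 0, commentRangeEnd: 0, commentReference: 0 };
    entry[type] += 1;
    counts.set(id, entry);
  }
  return counts;
}

// Hand-written sample: one comment (id 3) crossing a paragraph break.
const sampleXml =
  '<w:p><w:commentRangeStart w:id="3"/><w:r><w:t>First</w:t></w:r></w:p>' +
  '<w:p><w:r><w:t>Second</w:t></w:r><w:commentRangeEnd w:id="3"/>' +
  '<w:r><w:commentReference w:id="3"/></w:r></w:p>';

const markerCounts = countCommentMarkers(sampleXml);
// markerCounts.get('3')
//   -> { commentRangeStart: 1, commentRangeEnd: 1, commentReference: 1 }
```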

Verified the test catches the regression: against the pre-fix
getCommentMarkRangesById it fails with 2/2/2 markers for the
multi-paragraph id; against the fix it returns 1/1/1. 3/3 browsers
green (chromium / firefox / webkit).
diff --git a/packages/super-editor/src/editors/v1/extensions/comment/comments-helpers.js b/packages/super-editor/src/editors/v1/extensions/comment/comments-helpers.js
@@ -106,9 +106,46 @@ const getCommentMarkSegmentsById = (commentId, doc, importedId) => {
 };
 
 /**
- * Convert raw mark segments into merged contiguous ranges.
- * A single commentId can appear in multiple disjoint ranges (e.g. if content is split),
- * so this returns both the raw segments and the merged ranges.
+ * Collapse raw mark segments for a single comment id into anchor ranges.
+ *
+ * Per ECMA-376 §17.13.4.3 / §17.13.4.4 / §17.13.4.5, `w:id` is the
+ * unique identifier for an annotation, and the start / end / reference
+ * triplet appears exactly once per id. A multi-paragraph comment is
+ * still ONE annotation: PM splits it into multiple text-node mark
+ * segments because the paragraph close + open structural delta sits
+ * between them, but the OOXML emission must collapse them back into
+ * a single `(commentRangeStart, commentRangeEnd)` pair covering the
+ * full extent.
+ *
+ * Verified against Word: a comment that crosses a paragraph break
+ * produces one `<w:commentRangeStart w:id="…"/>` at the first
+ * commented position and one `<w:commentRangeEnd w:id="…"/>` after
+ * the last commented position, with the paragraph break sitting
+ * inside the range.
+ *
+ * Two flavors of "multiple segments" need to be told apart:
+ *
+ *   1. Paragraph-crossing: segments separated only by a structural
+ *      boundary (paragraph close + open). No uncommented text
+ *      between them. Logical extent is one contiguous range; merge.
+ *
+ *   2. Truly disjoint: segments separated by uncommented text. Most
+ *      common cause: a user copy-pasted commented content into a new
+ *      location; PM preserves the `commentMark` attrs (the mark has
+ *      no clipboard hook), so the same `commentId` ends up on
+ *      anchored regions that have unrelated content between them.
+ *      The two regions are logically two annotations that happen to
+ *      share an id; merging them into one envelope would expand the
+ *      comment's scope to cover the unrelated content. Keep them as
+ *      separate ranges instead — the resulting OOXML still has a
+ *      duplicate id (which a follow-up should remap to fresh ids),
+ *      but the per-range scope is preserved correctly.
+ *
+ * The previous adjacency-based merge (`seg.from <= active.to`)
+ * conflated paragraph-crossing with disjoint and produced N pairs
+ * for an N-paragraph contiguous comment. The fix walks the doc
+ * between consecutive segments and merges only when the gap carries
+ * no text content.
  *
  * @param {string} commentId The comment ID to match
  * @param {string} [importedId] The imported comment ID to match
@@ -119,33 +156,48 @@ const getCommentMarkRangesById = (commentId, doc, importedId) => {
   const segments = getCommentMarkSegmentsById(commentId, doc, importedId);
   if (!segments.length) return { segments, ranges: [] };
 
+  // Walk segments in document order, merging adjacent ones whenever
+  // the gap between them carries no text content. PM's `textBetween`
+  // walks every text leaf in the range and concatenates the text
+  // (block separators omitted by passing empty strings), so a
+  // paragraph close + open contributes nothing and a paragraph of
+  // uncommented text contributes its full content.
+  const sorted = [...segments].sort((a, b) => a.from - b.from);
   const ranges = [];
-  let active = null;
-
-  segments.forEach((seg) => {
-    if (!active) {
-      active = {
-        from: seg.from,
-        to: seg.to,
-        internal: !!seg.attrs?.internal,
-      };
-      return;
-    }
-
+  let active = {
+    from: sorted[0].from,
+    to: sorted[0].to,
+    internal: !!sorted[0].attrs?.internal,
+  };
+  for (let i = 1; i < sorted.length; i += 1) {
+    const seg = sorted[i];
     if (seg.from <= active.to) {
-      active.to = Math.max(active.to, seg.to);
-      return;
+      // Adjacent or overlapping in PM positions: definitely the
+      // same logical region (e.g. two text nodes split by an inline
+      // mark boundary).
+      if (seg.to > active.to) active.to = seg.to;
+      continue;
     }
-
+    const gapHasText = doc.textBetween(active.to, seg.from, '', '').length > 0;
+    if (!gapHasText) {
+      // Structural boundary only (paragraph break, inline node
+      // boundary). Same logical annotation across a paragraph
+      // crossing — merge.
+      active.to = seg.to;
+      continue;
+    }
+    // Real gap of uncommented content. Two logically distinct
+    // anchored regions sharing an id (paste-preserved, etc.). Keep
+    // them as separate ranges so the resolved range doesn't expand
+    // over unrelated content.
     ranges.push(active);
     active = {
       from: seg.from,
       to: seg.to,
       internal: !!seg.attrs?.internal,
     };
-  });
-
-  if (active) ranges.push(active);
+  }
+  ranges.push(active);
   return { segments, ranges };
 };
 
diff --git a/packages/super-editor/src/editors/v1/extensions/comment/comments-helpers.test.js b/packages/super-editor/src/editors/v1/extensions/comment/comments-helpers.test.js
@@ -1,5 +1,6 @@
 import { Schema } from 'prosemirror-model';
-import { prepareCommentsForImport } from './comments-helpers.js';
+import { EditorState } from 'prosemirror-state';
+import { prepareCommentsForImport, resolveCommentById } from './comments-helpers.js';
 
 vi.mock('./comment-import-helpers.js', () => {
   return {
@@ -49,3 +50,215 @@ describe('prepareCommentsForImport', () => {
     expect(addMarkFn).not.toHaveBeenCalled();
   });
 });
+
+/**
+ * Spec-derived contract for `resolveCommentById`.
+ *
+ * Per ECMA-376 §17.13.4.3 / §17.13.4.4 / §17.13.4.5, a comment's
+ * `w:id` is a "unique identifier for an annotation" and the start /
+ * end / reference triplet for a single annotation appears exactly
+ * once. Verified against Word output (`/tmp/comment-fixture.docx`):
+ * a comment whose anchor crosses a paragraph break still produces one
+ * `commentRangeStart` and one `commentRangeEnd` per id.
+ *
+ * `resolveCommentById` converts a live `commentMark` into anchor
+ * atoms before export. The contract this suite pins: ONE pair per
+ * contiguous region, however many paragraph-split segments PM stores;
+ * same-id regions separated by uncommented text keep separate pairs.
+ */
+describe('resolveCommentById — anchor atom emission', () => {
+  const schema = new Schema({
+    nodes: {
+      doc: { content: 'block+' },
+      paragraph: { group: 'block', content: 'inline*' },
+      commentRangeStart: { group: 'inline', inline: true, attrs: { 'w:id': {}, internal: { default: false } } },
+      commentRangeEnd: { group: 'inline', inline: true, attrs: { 'w:id': {}, internal: { default: false } } },
+      text: { group: 'inline' },
+    },
+    marks: {
+      commentMark: {
+        attrs: { commentId: {}, importedId: { default: null }, internal: { default: false } },
+      },
+    },
+  });
+
+  /** Count atoms by type name. */
+  const countAtoms = (doc, typeName) => {
+    let n = 0;
+    doc.descendants((node) => {
+      if (node.type.name === typeName) n += 1;
+    });
+    return n;
+  };
+
+  /** Count atoms by type name AND `w:id` attribute. */
+  const countByIdAndType = (doc, typeName, wid) => {
+    let n = 0;
+    doc.descendants((node) => {
+      if (node.type.name === typeName && node.attrs?.['w:id'] === wid) n += 1;
+    });
+    return n;
+  };
+
+  const runResolve = (doc, commentId) => {
+    const state = EditorState.create({ doc, schema });
+    const tr = state.tr;
+    let dispatched = false;
+    const ok = resolveCommentById({
+      commentId,
+      state,
+      tr,
+      dispatch: () => {
+        dispatched = true;
+      },
+    });
+    return { ok, dispatched, doc: tr.doc };
+  };
+
+  it('single-paragraph comment: emits one commentRangeStart/End pair', () => {
+    const mark = schema.marks.commentMark.create({ commentId: 'c1', internal: false });
+    const para = schema.nodes.paragraph.create(null, schema.text('Hello world', [mark]));
+    const doc = schema.nodes.doc.create(null, [para]);
+
+    const result = runResolve(doc, 'c1');
+
+    expect(result.ok).toBe(true);
+    expect(countByIdAndType(result.doc, 'commentRangeStart', 'c1')).toBe(1);
+    expect(countByIdAndType(result.doc, 'commentRangeEnd', 'c1')).toBe(1);
+  });
+
+  it('multi-paragraph comment (the SuperDoc(8) regression): one pair, not two', () => {
+    // The exact shape Word produces for a comment that spans two
+    // paragraphs: one `commentRangeStart` at the first commented
+    // position and one `commentRangeEnd` after the last commented
+    // position, with the paragraph break sitting inside the range.
+    const mark = schema.marks.commentMark.create({ commentId: 'c-multi', internal: false });
+    const para1 = schema.nodes.paragraph.create(null, schema.text('First half', [mark]));
+    const para2 = schema.nodes.paragraph.create(null, schema.text('Second half', [mark]));
+    const doc = schema.nodes.doc.create(null, [para1, para2]);
+
+    const result = runResolve(doc, 'c-multi');
+
+    expect(result.ok).toBe(true);
+    expect(countByIdAndType(result.doc, 'commentRangeStart', 'c-multi')).toBe(1);
+    expect(countByIdAndType(result.doc, 'commentRangeEnd', 'c-multi')).toBe(1);
+  });
+
+  it('three-paragraph comment: still one pair', () => {
+    const mark = schema.marks.commentMark.create({ commentId: 'c3p', internal: false });
+    const p1 = schema.nodes.paragraph.create(null, schema.text('Para one', [mark]));
+    const p2 = schema.nodes.paragraph.create(null, schema.text('Para two', [mark]));
+    const p3 = schema.nodes.paragraph.create(null, schema.text('Para three', [mark]));
+    const doc = schema.nodes.doc.create(null, [p1, p2, p3]);
+
+    const result = runResolve(doc, 'c3p');
+
+    expect(result.ok).toBe(true);
+    expect(countByIdAndType(result.doc, 'commentRangeStart', 'c3p')).toBe(1);
+    expect(countByIdAndType(result.doc, 'commentRangeEnd', 'c3p')).toBe(1);
+  });
+
+  it('disjoint same-id (paste-preserved): two ranges, scope of each is preserved', () => {
+    // The user copy-pastes a commented region; PM preserves the
+    // commentMark attrs (no clipboard hook on the mark), so the
+    // same commentId now sits on two non-adjacent regions with
+    // uncommented content between them. They are logically TWO
+    // annotations sharing an id — collapsing them into a single
+    // envelope range would expand the comment to cover the
+    // unrelated middle content.
+    //
+    // The OOXML output is still imperfect (two range pairs sharing
+    // an id is non-conformant per spec; ids should be unique). A
+    // follow-up should remap to fresh ids on resolve. Keeping the
+    // ranges separate is strictly better than collapsing them: the
+    // anchored extent of each region is preserved, matching the
+    // pre-fix behavior for this case while still fixing the
+    // paragraph-crossing case.
+    const mark = schema.marks.commentMark.create({ commentId: 'c-paste', internal: false });
+    const p1 = schema.nodes.paragraph.create(null, schema.text('First', [mark]));
+    const p2 = schema.nodes.paragraph.create(null, schema.text('Uncommented middle paragraph'));
+    const p3 = schema.nodes.paragraph.create(null, schema.text('Third', [mark]));
+    const doc = schema.nodes.doc.create(null, [p1, p2, p3]);
+
+    const result = runResolve(doc, 'c-paste');
+
+    expect(result.ok).toBe(true);
+    // Two pairs, one per anchored region. Scope of each is the
+    // originally-marked text — uncommented middle is NOT inside
+    // either range.
+    expect(countByIdAndType(result.doc, 'commentRangeStart', 'c-paste')).toBe(2);
+    expect(countByIdAndType(result.doc, 'commentRangeEnd', 'c-paste')).toBe(2);
+
+    // Confirm the scope: walk the doc and verify the uncommented
+    // middle paragraph is NOT between any START and END of c-paste.
+    const events = [];
+    result.doc.descendants((node, pos) => {
+      if (node.type.name === 'commentRangeStart' && node.attrs['w:id'] === 'c-paste') {
+        events.push({ kind: 'start', pos });
+      } else if (node.type.name === 'commentRangeEnd' && node.attrs['w:id'] === 'c-paste') {
+        events.push({ kind: 'end', pos });
+      } else if (node.isText) {
+        events.push({ kind: 'text', pos, text: node.text });
+      }
+    });
+    // Expect ordering: start, "First", end, "Uncommented...", start, "Third", end
+    const seq = events.map((e) => (e.kind === 'text' ? `T(${e.text})` : e.kind.toUpperCase()));
+    expect(seq).toEqual(['START', 'T(First)', 'END', 'T(Uncommented middle paragraph)', 'START', 'T(Third)', 'END']);
+  });
+
+  it('two distinct comments side-by-side: two independent pairs, ids unique per annotation', () => {
+    const a = schema.marks.commentMark.create({ commentId: 'cA', internal: false });
+    const b = schema.marks.commentMark.create({ commentId: 'cB', internal: false });
+    const para = schema.nodes.paragraph.create(null, [
+      schema.text('Left', [a]),
+      schema.text(' '),
+      schema.text('Right', [b]),
+    ]);
+    const doc = schema.nodes.doc.create(null, [para]);
+
+    const r1 = runResolve(doc, 'cA');
+    const r2 = runResolve(r1.doc, 'cB');
+
+    expect(countByIdAndType(r2.doc, 'commentRangeStart', 'cA')).toBe(1);
+    expect(countByIdAndType(r2.doc, 'commentRangeEnd', 'cA')).toBe(1);
+    expect(countByIdAndType(r2.doc, 'commentRangeStart', 'cB')).toBe(1);
+    expect(countByIdAndType(r2.doc, 'commentRangeEnd', 'cB')).toBe(1);
+    expect(countAtoms(r2.doc, 'commentRangeStart')).toBe(2);
+    expect(countAtoms(r2.doc, 'commentRangeEnd')).toBe(2);
+  });
+
+  it('overlapping comments (one nested inside another, across paragraphs): one pair per id', () => {
+    // PM allows multiple comment marks on the same node. Resolving
+    // each one independently must still produce one pair per id.
+    const outer = schema.marks.commentMark.create({ commentId: 'outer', internal: false });
+    const inner = schema.marks.commentMark.create({ commentId: 'inner', internal: false });
+    const p1 = schema.nodes.paragraph.create(null, [
+      schema.text('Outside ', [outer]),
+      schema.text('inside both', [outer, inner]),
+    ]);
+    const p2 = schema.nodes.paragraph.create(null, [
+      schema.text('still both', [outer, inner]),
+      schema.text(' just outer', [outer]),
+    ]);
+    const doc = schema.nodes.doc.create(null, [p1, p2]);
+
+    const r1 = runResolve(doc, 'outer');
+    const r2 = runResolve(r1.doc, 'inner');
+
+    expect(countByIdAndType(r2.doc, 'commentRangeStart', 'outer')).toBe(1);
+    expect(countByIdAndType(r2.doc, 'commentRangeEnd', 'outer')).toBe(1);
+    expect(countByIdAndType(r2.doc, 'commentRangeStart', 'inner')).toBe(1);
+    expect(countByIdAndType(r2.doc, 'commentRangeEnd', 'inner')).toBe(1);
+  });
+
+  it('returns false (no-op) when the commentId has no mark in the doc', () => {
+    const para = schema.nodes.paragraph.create(null, schema.text('uncommented'));
+    const doc = schema.nodes.doc.create(null, [para]);
+
+    const result = runResolve(doc, 'nonexistent');
+
+    expect(result.ok).toBe(false);
+    expect(countAtoms(result.doc, 'commentRangeStart')).toBe(0);
+    expect(countAtoms(result.doc, 'commentRangeEnd')).toBe(0);
+  });
+});
diff --git a/packages/super-editor/src/editors/v1/extensions/comment/comments.test.js b/packages/super-editor/src/editors/v1/extensions/comment/comments.test.js
@@ -582,6 +582,50 @@ describe('comments plugin commands', () => {
     ]);
   });
 
+  it('resolves a multi-paragraph comment to one OOXML range pair', () => {
+    const schema = createCommentSchema();
+    const mark = schema.marks[CommentMarkName].create({ commentId: 'comment-1', internal: true });
+    const doc = schema.nodes.doc.create(null, [
+      schema.nodes.paragraph.create(null, schema.text('First paragraph', [mark])),
+      schema.nodes.paragraph.create(null, schema.text('Second paragraph', [mark])),
+    ]);
+    const state = EditorState.create({ schema, doc });
+    const tr = state.tr;
+    const dispatch = vi.fn();
+
+    const result = CommentHelpers.resolveCommentById({
+      commentId: 'comment-1',
+      state,
+      tr,
+      dispatch,
+    });
+
+    expect(result).toBe(true);
+    expect(dispatch).toHaveBeenCalledWith(tr);
+
+    const applied = state.apply(tr);
+    const remainingMarkIds = [];
+    const commentNodes = [];
+
+    applied.doc.descendants((node, pos) => {
+      node.marks.forEach((nodeMark) => {
+        if (nodeMark.type === schema.marks[CommentMarkName]) {
+          remainingMarkIds.push(nodeMark.attrs.commentId);
+        }
+      });
+      if (node.type.name === 'commentRangeStart' || node.type.name === 'commentRangeEnd') {
+        commentNodes.push({ type: node.type.name, id: node.attrs['w:id'], pos });
+      }
+    });
+
+    expect(remainingMarkIds).toEqual([]);
+    expect(commentNodes).toEqual([
+      { type: 'commentRangeStart', id: 'comment-1', pos: expect.any(Number) },
+      { type: 'commentRangeEnd', id: 'comment-1', pos: expect.any(Number) },
+    ]);
+    expect(commentNodes[0].pos).toBeLessThan(commentNodes[1].pos);
+  });
+
   it('reopens a resolved comment by removing range nodes and restoring the mark', () => {
     const { commands, state, schema } = setup();
 
diff --git a/tests/behavior/tests/comments/sd-3028-multi-paragraph-comment-range-cardinality.spec.ts b/tests/behavior/tests/comments/sd-3028-multi-paragraph-comment-range-cardinality.spec.ts