fix(diffing): gate paragraph identity on structural depth (SD-3174)#3355
Merged
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28f88a9866
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Diff replay was overwriting table-cell content with body-paragraph text when a comparison contained both a structural change (table removed) and trailing text changes. DOCX importers renumber w14:paraId values across structural changes, so cell-paragraph IDs in the source can collide with top-level paragraph IDs in the target. The flat sequence diff then paired them as the same paragraph and emitted modification ops anchored inside the table. Gate paragraphComparator and canTreatAsModification on depth equality before any identity signal. Cross-depth content-signature matches were the same hazard for blank or repeated-label paragraphs. Add a fixture pair derived from the IT-1065 reporter's documents and four unit tests covering paraId, content-signature, and similarity cross-depth rejection.
28f88a9 to
ff97a7b
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Diff replay used to overwrite table-cell content with body text whenever a comparison surfaced both a structural change (table add/remove) and trailing paragraph edits. The IT-1065 Google Docs fixtures renumber
w14:paraIdvalues across the structural change, so cell-paragraph IDs in the source collide with top-level paragraph IDs in the target. The flat sequence diff was pairing them as the same paragraph and anchoring modification ops inside the now-deleted table.Reported in #3347 with a before/after fixture pair that reproduces on
main.Fixes #3347
paragraphComparatorandcanTreatAsModificationnow reject pairing across different structural depths before any paraId or content-signature check. Cross-depth content-signature matches were the same hazard for blank paragraphs and short repeated labels, so the guard is hoisted to the top of both functions.diff_before_it1065.docx/diff_after_it1065.docx) and wired into the replayDiffs fixture suite.Review: confirm depth gating is the right granularity vs an ancestor-path key. Depth handles the reporter's fixture and likely sibling cases; a path key would also catch same-depth-different-parent collisions if a fixture ever surfaces one.
Verified:
pnpm exec vitest run --root packages/super-editor src/editors/v1/extensions/diffing→ 247 passed across 26 files. Stashing the source change makes the new fixture test fail with the expected text-content divergence after "Key Milestones", confirming regression coverage.