fix(editor): prevent LaTeX escaping when pasting markdown #14904
ygcaicn wants to merge 2 commits into
📝 Walkthrough
Preprocessing for Markdown now normalizes single-line
Changes
Single-Line LaTeX Block Handling
Sequence Diagram
sequenceDiagram
participant Input as Markdown Input
participant CodePrep as Code Preprocessor
participant LatexPrep as LaTeX Preprocessor
participant Output as Document Model
Input->>CodePrep: Raw markdown containing $$...$$ or \[...\]
Note over CodePrep: Detect/open mathFence ($$ or \[)<br/>Skip code-fence handling while active
CodePrep->>LatexPrep: Pass-through lines with mathFence preserved
Note over LatexPrep: Normalize single-line $$...$$ -> $$\n...\n$$<br/>Protect inline $...$ into <<LATEX_n>> placeholders
LatexPrep->>Output: Emit affine:latex block or inline deltas with exact latex string
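The normalization step in the diagram can be illustrated with a minimal sketch. The function name `normalizeSingleLineBlockMath` and its regex are assumptions made for illustration, not the PR's actual code, and the sketch handles only `$$...$$`, not `\[...\]`.

```typescript
// Hypothetical helper (not from the AFFiNE codebase): rewrites a line that holds
// a complete `$$...$$` block into the three-line form `$$`, body, `$$`.
function normalizeSingleLineBlockMath(markdown: string): string {
  return markdown
    .split('\n')
    .map(line => {
      // Whole line must be: optional indent, `$$`, body, `$$`, optional whitespace.
      const match = /^(\s*)\$\$(.+?)\$\$\s*$/.exec(line);
      if (!match) return line;
      const indent = match[1];
      const body = match[2].trim();
      // Leave lines such as `$$ $$` alone; there is no formula to move.
      if (body.length === 0) return line;
      return [`${indent}$$`, `${indent}${body}`, `${indent}$$`].join('\n');
    })
    .join('\n');
}

const normalized = normalizeSingleLineBlockMath('text\n$$E = mc^2$$\nmore');
console.log(normalized);
```

Lines that contain only inline math such as `a $x$ b` do not match the pattern and pass through unchanged.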
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
c6c15e1 to cddc421
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts`:
- Around line 4355-4357: The test uses rawSliceSnapshot?.content[0].children
which will throw if content is an empty array; update the expressions to safely
optional-chain the array and index access (e.g., use
rawSliceSnapshot?.content?.[0]?.children) wherever
rawSliceSnapshot?.content[0].children appears (including the other occurrences
around the lines referenced), and similarly guard children access before
asserting flavours so the tests fail with a clear Vitest assertion instead of a
TypeError; search for rawSliceSnapshot and replace content[0].children,
content[1].children, etc., with content?.[0]?.children / content?.[1]?.children
as appropriate.
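A minimal sketch of the guarded access described above. The `BlockNode` and `SliceSnapshotLike` types are hypothetical stand-ins for the real snapshot shape, which is not shown in this review.

```typescript
// Hypothetical, simplified snapshot shape, for illustration only.
interface BlockNode {
  flavour: string;
  children: BlockNode[];
}
interface SliceSnapshotLike {
  content: BlockNode[];
}

// Optional chaining on both the array and the index access yields `undefined`
// instead of throwing a TypeError when `content` is empty.
function firstChildren(
  snapshot: SliceSnapshotLike | null | undefined
): BlockNode[] | undefined {
  return snapshot?.content?.[0]?.children;
}

// Flavours of the first block's children, or an empty list if there are none.
function firstChildFlavours(
  snapshot: SliceSnapshotLike | null | undefined
): string[] {
  return (firstChildren(snapshot) ?? []).map(child => child.flavour);
}

console.log(firstChildFlavours({ content: [] }));
```

In a test, asserting on the returned value (for example, that it is defined) then fails with an assertion message rather than an unrelated TypeError.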
In `@blocksuite/affine/blocks/code/src/adapters/markdown/preprocessor.ts`:
- Around line 22-26: The math-fence closing check currently uses
trimmedLine.startsWith(mathFence) which prematurely closes a fence when a
content line begins with the delimiter (e.g., "$$\\vec{v}"); change the guard in
the math-fence handling (the block using the mathFence variable and trimmedLine)
to require the entire line (after trimming trailing whitespace) to equal the
delimiter (e.g., use strict equality against the trimmed/trimEnd version of the
line) before clearing mathFence and returning, so only a line that exactly
matches the fence closes it.
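The difference between the two closing checks can be sketched as follows. Both function names are hypothetical; only the `startsWith` versus strict-equality distinction comes from the comment above.

```typescript
// Loose check (the behaviour the comment warns about): any line that merely
// begins with the delimiter is treated as closing the fence.
function closesFenceLoose(line: string, delimiter: string): boolean {
  return line.trim().startsWith(delimiter);
}

// Strict check: the line, ignoring surrounding whitespace, must be exactly the delimiter.
function closesFenceStrict(line: string, delimiter: string): boolean {
  return line.trim() === delimiter;
}

const contentLine = '$$\\vec{v}'; // a content line that happens to start with `$$`
console.log(closesFenceLoose(contentLine, '$$'), closesFenceStrict(contentLine, '$$'));
```

With the strict form, a content line such as `$$\vec{v}` stays inside the fence, while a line consisting only of `$$` (with optional surrounding whitespace) still closes it.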
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 90a1ac39-7d20-420f-be77-267ec0717c67
📒 Files selected for processing (3)
- blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
- blocksuite/affine/blocks/code/src/adapters/markdown/preprocessor.ts
- blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts
cddc421 to e981bb1
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           canary   #14904      +/-   ##
==========================================
- Coverage   55.40%   53.82%   -1.59%
==========================================
  Files        3006     3006
  Lines      168964   168981      +17
  Branches    24933    24487     -446
==========================================
- Hits        93621    90956    -2665
- Misses      72277    74952    +2675
- Partials     3066     3073       +7
e981bb1 to 2794395
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts`:
- Around line 35-43: protectInlineDollarLatex's regex allows a carriage return
inside the inline-LaTeX body which can leave `\r` in extracted latexExpressions;
update the body character class in the regex used in protectInlineDollarLatex
(the part currently written as [^\n\\$]) to also exclude `\r` (e.g., change to
[^\r\n\\$]) so CRLF line endings won't embed `\r` in captured inline
expressions; keep the rest of the pattern and the push/placeholder logic in the
function unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 15af92c4-db69-4544-867a-32d06a432f62
📒 Files selected for processing (2)
- blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
- blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
function protectInlineDollarLatex(text: string, latexExpressions: string[]) {
  return text.replace(
    /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\\$])*?[^\s\\$])\$(?!\$)/g,
    (_, prefix: string, latex: string) => {
      latexExpressions.push(`$${latex}$`);
      return `${prefix}<<LATEX_${latexExpressions.length - 1}>>`;
    }
  );
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether the test suite exercises CRLF content or raw \r inside inline LaTeX
rg -n $'\r' --type=ts blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts | head -20
rg -n 'protectInlineDollarLatex\|normalizeSingleLineBlock' --type=ts
Repository: toeverything/AFFiNE
Length of output: 45
🏁 Script executed:
fd -type f -name "*.spec.ts" -o -name "*.test.ts" | grep -i markdown | head -10
Repository: toeverything/AFFiNE
Length of output: 234
🏁 Script executed:
rg -l "preprocessLatex|protectInlineDollarLatex" --type=ts | head -20
Repository: toeverything/AFFiNE
Length of output: 132
🏁 Script executed:
# Check the actual file location and content
fd -type f "preprocessor.ts" | grep latex
Repository: toeverything/AFFiNE
Length of output: 234
🏁 Script executed:
fd "\.spec\." blocksuite/affine/blocks/latex | head -20
Repository: toeverything/AFFiNE
Length of output: 45
🏁 Script executed:
rg "preprocessLatex" --type ts | head -30
Repository: toeverything/AFFiNE
Length of output: 282
🏁 Script executed:
# Read the preprocessor file to understand the full context
cat blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts | head -120
Repository: toeverything/AFFiNE
Length of output: 3441
🏁 Script executed:
# Find where preprocessLatex is called and how input is provided
rg "preprocessLatex|LatexMarkdownPreprocessorExtension" -A 3 -B 3 --type ts | head -50
Repository: toeverything/AFFiNE
Length of output: 3886
🏁 Script executed:
# Look for test files related to markdown preprocessing
fd . blocksuite/affine/blocks/latex/src/__tests__ 2>/dev/null | head -20
Repository: toeverything/AFFiNE
Length of output: 45
🏁 Script executed:
# Search for any CRLF-related tests in the repo
rg "\\\\r" --type ts | grep -i test | head -20
Repository: toeverything/AFFiNE
Length of output: 4210
🏁 Script executed:
# Search for where MarkdownPreprocessorExtension or markdown adapters are invoked
rg "MarkdownPreprocessorExtension|preprocess.*content" --type ts -B 3 -A 3 | grep -A 5 -B 5 "content"
Repository: toeverything/AFFiNE
Length of output: 15810
🏁 Script executed:
# Check if there's any line-ending normalization in the adapter pipeline
rg "normalize.*line|CRLF|\\\\r\\\\n" --type ts | head -20
Repository: toeverything/AFFiNE
Length of output: 1753
🏁 Script executed:
# Look for where markdown preprocessing happens in the adapter flow
fd "adapter" blocksuite/affine/shared/src/adapters | head -10
Repository: toeverything/AFFiNE
Length of output: 375
🏁 Script executed:
# Search for MarkdownPreprocessorManager and how it's invoked
rg "MarkdownPreprocessorManager|\.preprocess\(" --type ts -B 2 -A 2 | head -50
Repository: toeverything/AFFiNE
Length of output: 3925
🏁 Script executed:
# Check the markdown adapter flow to see how content reaches preprocessors
cat blocksuite/affine/shared/src/adapters/markdown/preprocessor.ts | head -80
Repository: toeverything/AFFiNE
Length of output: 1134
🏁 Script executed:
# Find where MarkdownPreprocessorManager is used to understand the flow
rg "preprocessorManager\." --type ts -B 3 -A 3 | head -60
Repository: toeverything/AFFiNE
Length of output: 2002
🏁 Script executed:
# Check the markdown adapter implementation to see preprocessing flow
cat blocksuite/affine/shared/src/adapters/markdown/markdown.ts | head -150
Repository: toeverything/AFFiNE
Length of output: 4729
🏁 Script executed:
# Check how payload.file is created and if there's any normalization
rg "ToBlockSnapshotPayload|file:\s" --type ts -B 3 -A 3 | head -50
Repository: toeverything/AFFiNE
Length of output: 3858
🏁 Script executed:
# Look for file reading/loading code that might normalize line endings
rg "readFile|readText|load.*markdown|import.*markdown" --type ts | grep -v node_modules | head -20
Repository: toeverything/AFFiNE
Length of output: 1657
Add \r to the body character class in protectInlineDollarLatex regex to handle CRLF line endings.
The body character class [^\n\\$] allows a carriage return (\r) inside inline-LaTeX bodies. On Windows, or when Markdown is imported from a source that preserves CRLF line endings, a \r can be captured mid-body (e.g., $a\rb$) and stored in latexExpressions, which may cause rendering issues downstream in KaTeX.
The trailing position guard [^\s\\$] already excludes \r, so matches won't end on \r, but the body class itself doesn't exclude it. Since normalizeSingleLineBlockLatex only normalizes block-LaTeX lines (not the entire document), inline-LaTeX lines can retain \r if the input isn't pre-normalized.
Suggested fix
- /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\\$])*?[^\s\\$])\$(?!\$)/g,
+ /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\r\\$])*?[^\s\\$])\$(?!\$)/g,
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- function protectInlineDollarLatex(text: string, latexExpressions: string[]) {
-   return text.replace(
-     /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\\$])*?[^\s\\$])\$(?!\$)/g,
-     (_, prefix: string, latex: string) => {
-       latexExpressions.push(`$${latex}$`);
-       return `${prefix}<<LATEX_${latexExpressions.length - 1}>>`;
-     }
-   );
- }
+ function protectInlineDollarLatex(text: string, latexExpressions: string[]) {
+   return text.replace(
+     /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\r\\$])*?[^\s\\$])\$(?!\$)/g,
+     (_, prefix: string, latex: string) => {
+       latexExpressions.push(`$${latex}$`);
+       return `${prefix}<<LATEX_${latexExpressions.length - 1}>>`;
+     }
+   );
+ }
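To see the effect of the one-character change, the two patterns can be run side by side on the reviewer's `$a\rb$` example. This is a sketch: the wrapper `protectInline` and the constant names are hypothetical, while the two regexes are copied from the suggestion above.

```typescript
// Pattern before the change: the body class [^\n\\$] still admits a carriage return.
const BODY_ALLOWS_CR = /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\\$])*?[^\s\\$])\$(?!\$)/g;
// Pattern after the change: the body class [^\n\r\\$] rejects it.
const BODY_EXCLUDES_CR = /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\r\\$])*?[^\s\\$])\$(?!\$)/g;

// Hypothetical wrapper that mirrors the placeholder logic but takes the pattern as a parameter.
function protectInline(
  text: string,
  pattern: RegExp
): { text: string; expressions: string[] } {
  const expressions: string[] = [];
  const replaced = text.replace(pattern, (_match, prefix: string, latex: string) => {
    expressions.push(`$${latex}$`);
    return `${prefix}<<LATEX_${expressions.length - 1}>>`;
  });
  return { text: replaced, expressions };
}

const input = 'cost $a\rb$ end'; // a lone CR inside the inline body
console.log(protectInline(input, BODY_ALLOWS_CR).expressions.length);
console.log(protectInline(input, BODY_EXCLUDES_CR).expressions.length);
```

With the first pattern the expression is captured with the carriage return embedded; with the second the span is not treated as inline LaTeX at all, so nothing containing `\r` reaches `latexExpressions`. Ordinary input such as `$E=mc^2$` is captured by both.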
Summary
$$...$$ block math before remark parsing
Root Cause
cmd+shift+v uses the plain-text paste path through MixTextAdapter.toSliceSnapshot, which runs slice-level markdown preprocessors. The code preprocessor encoded leading spaces as &nbsp; without excluding math fences, so KaTeX received entity text inside the formula. Markdown file import goes through the doc-level path and did not hit that slice-only preprocessor.
Validation
yarn vitest run blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
yarn prettier --check blocksuite/affine/blocks/code/src/adapters/markdown/preprocessor.ts blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
Summary by CodeRabbit
Tests
Bug Fixes
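The root cause in the PR description can be illustrated with a small sketch of a leading-space encoder that skips math fences. It is a simplified stand-in, not the actual code preprocessor: the function name is hypothetical, it assumes the entity in question is `&nbsp;`, and it only tracks `$$` fences.

```typescript
// Hypothetical sketch: encode leading spaces as `&nbsp;` on ordinary lines,
// but pass lines inside a `$$` math fence through untouched so that the
// formula body never contains entity text.
function encodeLeadingSpacesOutsideMath(markdown: string): string {
  let inMathFence = false;
  return markdown
    .split('\n')
    .map(line => {
      // A line that is exactly `$$` (ignoring whitespace) opens or closes a fence.
      if (line.trim() === '$$') {
        inMathFence = !inMathFence;
        return line;
      }
      if (inMathFence) return line;
      return line.replace(/^ +/, spaces => '&nbsp;'.repeat(spaces.length));
    })
    .join('\n');
}

console.log(encodeLeadingSpacesOutsideMath('a\n  b\n$$\n  x + y\n$$'));
```

Without the `inMathFence` guard, the indented `x + y` line would be rewritten with entities, which is the kind of text the description says KaTeX received.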