fix(editor): prevent LaTeX escaping when pasting markdown #14904

Open
ygcaicn wants to merge 2 commits into toeverything:canary from ygcaicn:codex/fix-markdown-latex-paste

Conversation

@ygcaicn (Contributor) commented May 5, 2026

Summary

  • skip markdown slice code preprocessing inside block math fences so pasted LaTeX indentation is preserved
  • normalize single-line $$...$$ block math before remark parsing
  • cover plain-text paste and markdown adapter LaTeX block cases

Root Cause

cmd+shift+v uses the plain-text paste path through MixTextAdapter.toSliceSnapshot, which runs slice-level markdown preprocessors. The code preprocessor encoded leading spaces as HTML entities without excluding math fences, so KaTeX received entity text inside the formula. Markdown file import goes through the doc-level path and did not hit that slice-only preprocessor.
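
For readers who want to see the shape of the fix, here is a minimal sketch of a line-based preprocessor that encodes leading spaces on ordinary lines but passes lines inside `$$ ... $$` or `\[ ... \]` through untouched. It is not the code in this PR: `encodeLeadingSpaces` is a hypothetical name, and the `&#x20;` entity is only a stand-in, since the description above does not show which entity the real preprocessor emits.

```typescript
// Hypothetical sketch, not the PR's implementation.
// '' means "not inside display math"; otherwise it holds the expected closer.
type MathFence = '$$' | '\\]' | '';

function encodeLeadingSpaces(content: string): string {
  let mathFence: MathFence = '';
  return content
    .split('\n')
    .map(line => {
      const trimmed = line.trim();
      if (mathFence) {
        // Inside display math: never rewrite the line, only watch for the closer.
        if (trimmed === mathFence) mathFence = '';
        return line;
      }
      if (trimmed === '$$') {
        mathFence = '$$';
        return line;
      }
      if (trimmed === '\\[') {
        mathFence = '\\]';
        return line;
      }
      // Assumption: '&#x20;' is illustrative; the real entity may differ.
      return line.replace(/^ +/, spaces => spaces.replace(/ /g, '&#x20;'));
    })
    .join('\n');
}

const pasted = '  text\n$$\n  a &= b\n$$\n  after';
const processed = encodeLeadingSpaces(pasted);
// Only the lines outside the $$ fence have their indentation encoded.
```

The point of the state variable is that the indentation of `  a &= b` reaches the math renderer as plain spaces rather than entity text.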

Validation

  • yarn vitest run blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
  • yarn prettier --check blocksuite/affine/blocks/code/src/adapters/markdown/preprocessor.ts blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts

Summary by CodeRabbit

  • Tests

    • Expanded test coverage for inline and block LaTeX, including dollar-sign inline math starting with digits, CJK/bold/list contexts, single-line $$...$$ blocks, bracket-style blocks, pasted single-line blocks, whitespace preservation, and indented block handling.
  • Bug Fixes

    • Improved markdown preprocessing to reliably detect and normalize single-line and multi-line LaTeX fences so pasted and inline math parse and preserve exact content and spacing.

@coderabbitai Bot (Contributor) commented May 5, 2026

Walkthrough

Preprocessing for Markdown now normalizes single-line $$ ... $$ LaTeX blocks into multi-line form and prevents math display fences ($$ / \[ / \]) from being misinterpreted as code fences. Tests were added to validate parsing and paste behavior for single-line block LaTeX.
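
The normalization step described above could look roughly like the sketch below. This is an illustration under assumptions, not the PR's `normalizeSingleLineBlockLatex`: the function name `normalizeSingleLineBlock`, the exact pattern, and the choice to repeat the indentation on all three output lines are hypothetical.

```typescript
// Hypothetical sketch: rewrite a one-line `$$ ... $$` block into
// a three-line block so a block-math parser sees separate fence lines.
function normalizeSingleLineBlock(text: string): string {
  return text
    .split('\n')
    .map(line => {
      // Optional indentation, opening $$, non-empty body, closing $$.
      const match = /^(\s*)\$\$(.+?)\$\$\s*$/.exec(line);
      if (!match) return line;
      const indent = match[1];
      const body = match[2];
      // Assumption: the body is kept verbatim and the indent is repeated.
      return `${indent}$$\n${indent}${body}\n${indent}$$`;
    })
    .join('\n');
}

const singleLine = normalizeSingleLineBlock('$$a+b$$');
const alreadyMultiLine = normalizeSingleLineBlock('$$\na+b\n$$');
// singleLine is now '$$\na+b\n$$'; alreadyMultiLine is unchanged.
```

A bare `$$` line never matches because the body group requires at least one character, so existing multi-line blocks pass through as-is.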

Changes

Single-Line LaTeX Block Handling

Layer / File(s) Summary
Single-Line Block Normalization
blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts
Adds normalizeSingleLineBlockLatex() to rewrite single-line $$...$$ (with optional indentation) into a newline-wrapped multi-line block and calls it early in preprocessLatex().
Inline-dollar LaTeX Protection
blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts
Adds protectInlineDollarLatex(text, latexExpressions) to extract $...$ inline expressions into indexed <<LATEX_n>> placeholders and collect literal latex strings.
Math Fence Infrastructure
blocksuite/affine/blocks/code/src/adapters/markdown/preprocessor.ts
Introduces a mathFence state ('$$' | '\\]' | '') to track multi-line display math, skip code-fence processing while inside them, and clear the state on matching closing delimiters.
Test Coverage
blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
Adds MixTextAdapter to imports, expands inline latex and latex block parameterized cases to include digit-starting inline math and single-line $$...$$ variants, and adds MixTextAdapter paste/slice tests asserting pasted single-line dollar- and bracket-form blocks convert to affine:latex with exact latex strings and preserved whitespace (no HTML-encoded spaces).

Sequence Diagram

sequenceDiagram
    participant Input as Markdown Input
    participant CodePrep as Code Preprocessor
    participant LatexPrep as LaTeX Preprocessor
    participant Output as Document Model

    Input->>CodePrep: Raw markdown containing $$...$$ or \[...\]
    Note over CodePrep: Detect/open mathFence ($$ or \[)<br/>Skip code-fence handling while active
    CodePrep->>LatexPrep: Pass-through lines with mathFence preserved
    Note over LatexPrep: Normalize single-line $$...$$ -> $$\n...\n$$<br/>Protect inline $...$ into <<LATEX_n>> placeholders
    LatexPrep->>Output: Emit affine:latex block or inline deltas with exact latex string

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main fix: preventing LaTeX escaping issues when pasting markdown content, which aligns with the PR's core objective of skipping code preprocessing inside math fences to preserve LaTeX. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

CLAassistant commented May 5, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added the test (Related to test cases) label May 5, 2026
@ygcaicn ygcaicn force-pushed the codex/fix-markdown-latex-paste branch from c6c15e1 to cddc421 Compare May 5, 2026 04:20
@ygcaicn ygcaicn marked this pull request as ready for review May 5, 2026 04:25
@ygcaicn ygcaicn requested a review from a team as a code owner May 5, 2026 04:25

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts`:
- Around line 4355-4357: The test uses rawSliceSnapshot?.content[0].children
which will throw if content is an empty array; update the expressions to safely
optional-chain the array and index access (e.g., use
rawSliceSnapshot?.content?.[0]?.children) wherever
rawSliceSnapshot?.content[0].children appears (including the other occurrences
around the lines referenced), and similarly guard children access before
asserting flavours so the tests fail with a clear Vitest assertion instead of a
TypeError; search for rawSliceSnapshot and replace content[0].children,
content[1].children, etc., with content?.[0]?.children / content?.[1]?.children
as appropriate.

In `@blocksuite/affine/blocks/code/src/adapters/markdown/preprocessor.ts`:
- Around line 22-26: The math-fence closing check currently uses
trimmedLine.startsWith(mathFence) which prematurely closes a fence when a
content line begins with the delimiter (e.g., "$$\\vec{v}"); change the guard in
the math-fence handling (the block using the mathFence variable and trimmedLine)
to require the entire line (after trimming trailing whitespace) to equal the
delimiter (e.g., use strict equality against the trimmed/trimEnd version of the
line) before clearing mathFence and returning, so only a line that exactly
matches the fence closes it.
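
The second finding is easy to reproduce in isolation. The sketch below compares a prefix-based closing test with the exact-match test the review asks for; `closesByPrefix` and `closesByEquality` are hypothetical helper names, and `indexOf(...) === 0` stands in for `startsWith`.

```typescript
// Hypothetical helpers illustrating the review finding, not the PR's code.

// Prefix test: any content line that begins with the delimiter closes the fence.
function closesByPrefix(line: string, mathFence: string): boolean {
  return line.trim().indexOf(mathFence) === 0; // same result as startsWith
}

// Exact test: only a line that is nothing but the delimiter closes the fence.
function closesByEquality(line: string, mathFence: string): boolean {
  return line.trim() === mathFence;
}

const contentLine = '$$\\vec{v}'; // a content line that starts with the delimiter
const prefixResult = closesByPrefix(contentLine, '$$'); // true: fence closes too early
const exactResult = closesByEquality(contentLine, '$$'); // false: fence stays open
const realCloser = closesByEquality('  $$  ', '$$'); // true: surrounding whitespace is ignored
```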

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 90a1ac39-7d20-420f-be77-267ec0717c67

📥 Commits

Reviewing files that changed from the base of the PR and between 4e169ea and cddc421.

📒 Files selected for processing (3)
  • blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
  • blocksuite/affine/blocks/code/src/adapters/markdown/preprocessor.ts
  • blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts

Comment thread blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts Outdated
Comment thread blocksuite/affine/blocks/code/src/adapters/markdown/preprocessor.ts
@ygcaicn ygcaicn force-pushed the codex/fix-markdown-latex-paste branch from cddc421 to e981bb1 Compare May 5, 2026 04:43

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 53.82%. Comparing base (4e169ea) to head (e981bb1).

Files with missing lines Patch % Lines
.../blocks/code/src/adapters/markdown/preprocessor.ts 92.85% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           canary   #14904      +/-   ##
==========================================
- Coverage   55.40%   53.82%   -1.59%     
==========================================
  Files        3006     3006              
  Lines      168964   168981      +17     
  Branches    24933    24487     -446     
==========================================
- Hits        93621    90956    -2665     
- Misses      72277    74952    +2675     
- Partials     3066     3073       +7     
Flag Coverage Δ
server-test 75.77% <ø> (-2.98%) ⬇️
unittest 29.89% <94.11%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@ygcaicn ygcaicn changed the title from "fix markdown latex paste" to "fix(editor): prevent LaTeX escaping when pasting markdown" May 5, 2026
@ygcaicn ygcaicn force-pushed the codex/fix-markdown-latex-paste branch from e981bb1 to 2794395 Compare May 7, 2026 18:50

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts`:
- Around line 35-43: protectInlineDollarLatex's regex allows a carriage return
inside the inline-LaTeX body which can leave `\r` in extracted latexExpressions;
update the body character class in the regex used in protectInlineDollarLatex
(the part currently written as [^\n\\$]) to also exclude `\r` (e.g., change to
[^\r\n\\$]) so CRLF line endings won't embed `\r` in captured inline
expressions; keep the rest of the pattern and the push/placeholder logic in the
function unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15af92c4-db69-4544-867a-32d06a432f62

📥 Commits

Reviewing files that changed from the base of the PR and between 2794395 and 41a7b95.

📒 Files selected for processing (2)
  • blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts
  • blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts

Comment on lines +35 to +43
function protectInlineDollarLatex(text: string, latexExpressions: string[]) {
  return text.replace(
    /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\\$])*?[^\s\\$])\$(?!\$)/g,
    (_, prefix: string, latex: string) => {
      latexExpressions.push(`$${latex}$`);
      return `${prefix}<<LATEX_${latexExpressions.length - 1}>>`;
    }
  );
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether the test suite exercises CRLF content or raw \r inside inline LaTeX
rg -n $'\r' --type=ts blocksuite/affine/all/src/__tests__/adapters/markdown.unit.spec.ts | head -20
rg -n 'protectInlineDollarLatex\|normalizeSingleLineBlock' --type=ts

Repository: toeverything/AFFiNE

Length of output: 45


🏁 Script executed:

fd -type f -name "*.spec.ts" -o -name "*.test.ts" | grep -i markdown | head -10

Repository: toeverything/AFFiNE

Length of output: 234


🏁 Script executed:

rg -l "preprocessLatex|protectInlineDollarLatex" --type=ts | head -20

Repository: toeverything/AFFiNE

Length of output: 132


🏁 Script executed:

# Check the actual file location and content
fd -type f "preprocessor.ts" | grep latex

Repository: toeverything/AFFiNE

Length of output: 234


🏁 Script executed:

fd "\.spec\." blocksuite/affine/blocks/latex | head -20

Repository: toeverything/AFFiNE

Length of output: 45


🏁 Script executed:

rg "preprocessLatex" --type ts | head -30

Repository: toeverything/AFFiNE

Length of output: 282


🏁 Script executed:

# Read the preprocessor file to understand the full context
cat blocksuite/affine/blocks/latex/src/adapters/markdown/preprocessor.ts | head -120

Repository: toeverything/AFFiNE

Length of output: 3441


🏁 Script executed:

# Find where preprocessLatex is called and how input is provided
rg "preprocessLatex|LatexMarkdownPreprocessorExtension" -A 3 -B 3 --type ts | head -50

Repository: toeverything/AFFiNE

Length of output: 3886


🏁 Script executed:

# Look for test files related to markdown preprocessing
fd . blocksuite/affine/blocks/latex/src/__tests__ 2>/dev/null | head -20

Repository: toeverything/AFFiNE

Length of output: 45


🏁 Script executed:

# Search for any CRLF-related tests in the repo
rg "\\\\r" --type ts | grep -i test | head -20

Repository: toeverything/AFFiNE

Length of output: 4210


🏁 Script executed:

# Search for where MarkdownPreprocessorExtension or markdown adapters are invoked
rg "MarkdownPreprocessorExtension|preprocess.*content" --type ts -B 3 -A 3 | grep -A 5 -B 5 "content"

Repository: toeverything/AFFiNE

Length of output: 15810


🏁 Script executed:

# Check if there's any line-ending normalization in the adapter pipeline
rg "normalize.*line|CRLF|\\\\r\\\\n" --type ts | head -20

Repository: toeverything/AFFiNE

Length of output: 1753


🏁 Script executed:

# Look for where markdown preprocessing happens in the adapter flow
fd "adapter" blocksuite/affine/shared/src/adapters | head -10

Repository: toeverything/AFFiNE

Length of output: 375


🏁 Script executed:

# Search for MarkdownPreprocessorManager and how it's invoked
rg "MarkdownPreprocessorManager|\.preprocess\(" --type ts -B 2 -A 2 | head -50

Repository: toeverything/AFFiNE

Length of output: 3925


🏁 Script executed:

# Check the markdown adapter flow to see how content reaches preprocessors
cat blocksuite/affine/shared/src/adapters/markdown/preprocessor.ts | head -80

Repository: toeverything/AFFiNE

Length of output: 1134


🏁 Script executed:

# Find where MarkdownPreprocessorManager is used to understand the flow
rg "preprocessorManager\." --type ts -B 3 -A 3 | head -60

Repository: toeverything/AFFiNE

Length of output: 2002


🏁 Script executed:

# Check the markdown adapter implementation to see preprocessing flow
cat blocksuite/affine/shared/src/adapters/markdown/markdown.ts | head -150

Repository: toeverything/AFFiNE

Length of output: 4729


🏁 Script executed:

# Check how payload.file is created and if there's any normalization
rg "ToBlockSnapshotPayload|file:\s" --type ts -B 3 -A 3 | head -50

Repository: toeverything/AFFiNE

Length of output: 3858


🏁 Script executed:

# Look for file reading/loading code that might normalize line endings
rg "readFile|readText|load.*markdown|import.*markdown" --type ts | grep -v node_modules | head -20

Repository: toeverything/AFFiNE

Length of output: 1657


Add \r to the body character class in protectInlineDollarLatex regex to handle CRLF line endings.

The body character class [^\n\\$] allows a carriage return (\r) within inline-LaTeX bodies. On Windows systems, or when markdown is imported from sources that preserve CRLF line endings, a \r could be captured mid-body (e.g., $a\rb$) and stored in latexExpressions, potentially causing downstream rendering issues with KaTeX.

The trailing position guard [^\s\\$] already excludes \r, so matches won't end on \r, but the body class itself doesn't exclude it. Since normalizeSingleLineBlockLatex only normalizes block-LaTeX lines (not the entire document), inline-LaTeX lines can retain \r if the input isn't pre-normalized.

Suggested fix
-    /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\\$])*?[^\s\\$])\$(?!\$)/g,
+    /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\r\\$])*?[^\s\\$])\$(?!\$)/g,
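
The difference between the two patterns can be checked with a small stand-alone sketch. The two regular expressions are copied from the diff above; `protectWith` is a hypothetical helper that reuses the replace logic quoted in this comment but takes the pattern as a parameter so both variants can be run side by side.

```typescript
// Pattern before the suggestion: the body class excludes only \n.
const ORIGINAL = /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\\$])*?[^\s\\$])\$(?!\$)/g;
// Pattern after the suggestion: the body class also excludes \r.
const SUGGESTED = /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\r\\$])*?[^\s\\$])\$(?!\$)/g;

// Hypothetical helper: same push/placeholder logic, pattern passed in.
function protectWith(
  pattern: RegExp,
  text: string,
  latexExpressions: string[]
): string {
  return text.replace(pattern, (_match: string, prefix: string, latex: string) => {
    latexExpressions.push(`$${latex}$`);
    return `${prefix}<<LATEX_${latexExpressions.length - 1}>>`;
  });
}

const withCarriageReturn = '$a\rb$';

const capturedBefore: string[] = [];
const outputBefore = protectWith(ORIGINAL, withCarriageReturn, capturedBefore);
// The \r ends up inside the captured expression.

const capturedAfter: string[] = [];
const outputAfter = protectWith(SUGGESTED, withCarriageReturn, capturedAfter);
// No match: the text is returned unchanged and nothing is captured.

const capturedNormal: string[] = [];
const outputNormal = protectWith(SUGGESTED, 'cost $x^2$ ok', capturedNormal);
// Ordinary inline math is still extracted into a placeholder.
```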
📝 Committable suggestion

Suggested change
-function protectInlineDollarLatex(text: string, latexExpressions: string[]) {
-  return text.replace(
-    /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\\$])*?[^\s\\$])\$(?!\$)/g,
-    (_, prefix: string, latex: string) => {
-      latexExpressions.push(`$${latex}$`);
-      return `${prefix}<<LATEX_${latexExpressions.length - 1}>>`;
-    }
-  );
-}
+function protectInlineDollarLatex(text: string, latexExpressions: string[]) {
+  return text.replace(
+    /(^|[^\\$])\$(?!\$|\s)((?:\\.|[^\n\r\\$])*?[^\s\\$])\$(?!\$)/g,
+    (_, prefix: string, latex: string) => {
+      latexExpressions.push(`$${latex}$`);
+      return `${prefix}<<LATEX_${latexExpressions.length - 1}>>`;
+    }
+  );
+}