fix(edit): Fix the loose_escape fallback path writing wrong content when the LLM mis-escapes by fym998 · Pull Request #136 · lessweb/deepcode-cli

fym998 · 2026-05-30T05:24:33Z

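Problem

When handling LaTeX, JSON, Unicode escapes and similar content, the LLM often writes too many or too few backslashes (for example \alpha written as \\alpha), so the exact match fails and the edit falls into the loose_escape fallback path. That path does locate the correct match position, but it then writes the LLM's original newString, which carries the same escaping error, straight into the file. The result is silent data corruption: every replacement of this kind leaves the wrong number of backslashes in the file.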
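To make the failure mode concrete, here is a minimal sketch of how a loose-escape match could recover the position. It assumes that each backslash run in old_string is allowed to match zero or more backslashes in the file; `buildLooseEscapePattern` and `escapeRegExp` are hypothetical names for illustration, not the project's actual `findLooseEscapeMatches` implementation.

```typescript
// Escape regex metacharacters in a literal fragment.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: every backslash run in oldString may match any number of
// backslashes (including none) in the file text.
function buildLooseEscapePattern(oldString: string): RegExp {
  const literalParts = oldString.split(/\\+/).map(escapeRegExp);
  return new RegExp(literalParts.join("\\\\*"));
}

// Raw characters of oldString:  \\alpha + \"x\"
const oldString = '\\\\alpha + \\"x\\"';
// Raw characters of the file:   see \alpha + "x" here
const fileText = 'see \\alpha + "x" here';

const exactIndex = fileText.indexOf(oldString); // -1: the exact match fails
const looseMatch = buildLooseEscapePattern(oldString).exec(fileText);
// looseMatch[0] is the matched file text \alpha + "x", found at index 4.
```

The match position is right, but nothing here touches newString, which is why writing it verbatim carries the over-escaping into the file.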

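Solution

- Add fixNewStringEscaping() as a deterministic correction: compare the size of each backslash run in oldString with the corresponding run in the matched text, and use those ratios to derive the correctly escaped form of newString. It covers uniform ratios (for example LaTeX commands) and zero ratios (an escaped quote matching an unescaped one), among other cases.
- Deterministic correction first, LLM as a fallback, to save cost: run the deterministic algorithm first, and call the LLM only when the escape ratios are inconsistent (mixed ratios, where backslash runs at different positions in the same string are off by different factors). This avoids LLM calls that are both expensive and slow.
- Report an explicit error instead of writing silently when correction is impossible: when deterministic correction fails and no LLM is available, return a clear error message that steers the LLM to re-read the file or switch to the Bash tool.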

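Test coverage (all 36 tests pass)

| Scenario | Path |
| --- | --- |
| LaTeX command over-escaping (uniform ratio) | deterministic correction → `loose_escape` |
| Unicode escape over-escaping (uniform ratio) | deterministic correction → `loose_escape` |
| JSON string escaping | deterministic correction → `loose_escape` |
| Escaped quote → no escape (zero ratio) | deterministic correction → `loose_escape` |
| newString has extra backslash runs (reuse last ratio) | deterministic correction → `loose_escape` |
| LaTeX mixed escaping (mixed ratios) | deterministic failure → LLM correction |
| JS Unicode mixed escaping | deterministic failure → LLM correction |
| JSON mixed escaping | deterministic failure → LLM correction |
| Mixed escaping + no LLM | deterministic failure → returns error `ok: false` |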

When the LLM miscounts backslash escapes (common in LaTeX and nested JSON), the exact match fails and the loose_escape regex recovers the matched position. However, the fallback path was writing the LLM's original newString verbatim, which carried the same escaping errors as old_string. This silently corrupted files by introducing doubled or missing backslashes on every replacement that went through the loose_escape path without a successful llm_escape_correction round-trip.

Add fixNewStringEscaping(), which tokenizes strings into backslash runs and text segments, aligns old_string with the regex-matched text to compute per-run backslash count ratios, and applies the same ratios to newString. When newString has more backslash runs than old_string, the last ratio is reused (the escaping error is typically uniform).

Also update replacementOldString to use the matched text so both sides are consistent, matching the pattern already established by tab_correction.

…back

Add two tests that exercise the loose_escape fallback path without an LLM client (so correctEscapedStringsWithLLM is skipped), confirming that the escaping correction is applied to newString:

- Over-escaped LaTeX commands: \\alpha → \alpha
- Over-escaped LaTeX accent: H\\\"{o}tel → H\"{o}tel (both backslash and quote doubled by LLM)

Both cases verify that the resulting file content uses correctly escaped single-backslash LaTeX, not the LLM's original multi-backslash new_string.

Allow loose_escape newString correction to handle cases where old_string escapes a character but the matched file text has no backslash, such as \" matching a literal quote. Keep the correction aligned with the loose_escape regex semantics and preserve reuse of the last ratio for extra new_string backslash runs.

Add regression coverage for quote escapes collapsing to zero backslashes and for extra new_string backslash runs reusing the last correction ratio.

fym998 · 2026-05-30T05:35:55Z

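The trigger conditions for the bug may be described inaccurately (an AI wrote them). In my own use it sometimes triggers and sometimes does not, so it is hard to reproduce. I will check again.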

fym998 · 2026-05-30T06:58:00Z

I just found an optimization: in many cases the local algorithm can replace LLM correction, which cuts both cost and waiting time.

Run local newString escaping correction before invoking LLM correction for unique loose_escape matches. Only fall back to LLM correction when deterministic ratio inference is ambiguous, such as mixed escaping ratios with extra new_string slash runs. Update edit handler tests to assert deterministic cases avoid LLM calls, and add LLM fallback coverage for mixed escaping in LaTeX, JS unicode escapes, and JSON strings.

…ring when escaping is ambiguous and LLM unavailable When deterministic escape correction fails (mixed/inconsistent ratios) and no LLM is available to disambiguate, return a clear error instead of silently writing the uncorrected (potentially over-escaped) newString to the file. This is safer because the LLM can then re-read the file with exact escaping or fall back to the Bash tool.

fym998 · 2026-05-30T07:23:26Z

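Deterministic correction algorithm `fixNewStringEscaping`

It works in three steps:

Step 1: collect escape ratios with collectLooseEscapeRatios(oldString, matchedText)

Walk oldString and matchedText in lockstep, using non-backslash characters as anchors:

oldString:   \\alpha + \"x\"
matchedText:  \alpha + "x"

Backslash run #1:  old=2, matched=1 → ratio = 1/2 = 0.5
Backslash run #2:  old=1, matched=0 → ratio = 0/1 = 0
Backslash run #3:  old=1, matched=0 → ratio = 0/1 = 0
                   → ratios = [0.5, 0, 0]

- Non-backslash characters must match character for character (they are the anchors); otherwise return null (the structures differ).
- A trailing backslash run must have the same count on both sides; otherwise return null.
- If any characters are left over, also return null.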
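The three rules above can be sketched as a two-cursor walk. This is a minimal illustration written from the description in this comment, not the code in the PR. The helper `countSlashes` is a hypothetical name, and returning null when the matched text has a backslash run where old_string has none is my own assumption, since a ratio cannot be formed in that case.

```typescript
// Count consecutive backslashes in `s` starting at `start`.
function countSlashes(s: string, start: number): number {
  let n = 0;
  while (start + n < s.length && s[start + n] === "\\") n++;
  return n;
}

function collectLooseEscapeRatios(oldString: string, matchedText: string): number[] | null {
  const ratios: number[] = [];
  let i = 0; // cursor in oldString
  let j = 0; // cursor in matchedText
  while (i < oldString.length || j < matchedText.length) {
    const oldRun = countSlashes(oldString, i);
    const matchedRun = countSlashes(matchedText, j);
    i += oldRun;
    j += matchedRun;
    const oldDone = i >= oldString.length;
    const matchedDone = j >= matchedText.length;
    if (oldRun > 0) {
      // A trailing run has no anchor after it, so both counts must agree.
      if (oldDone && matchedDone && oldRun !== matchedRun) return null;
      ratios.push(matchedRun / oldRun);
    } else if (matchedRun > 0) {
      return null; // assumption: no ratio can be formed from a zero-length old run
    }
    if (oldDone || matchedDone) break;
    // Anchor characters must match exactly.
    if (oldString[i] !== matchedText[j]) return null;
    i++;
    j++;
  }
  // Leftover characters on either side mean the structures differ.
  if (i !== oldString.length || j !== matchedText.length) return null;
  return ratios;
}

// Raw characters: oldString \\alpha + \"x\"   matchedText \alpha + "x"
const mixedRatios = collectLooseEscapeRatios('\\\\alpha + \\"x\\"', '\\alpha + "x"');
// Raw characters: oldString \\alpha + \\beta  matchedText \alpha + \beta
const uniformRatios = collectLooseEscapeRatios("\\\\alpha + \\\\beta", "\\alpha + \\beta");
```

Each backslash run in old_string yields one ratio, so `mixedRatios` has three entries and `uniformRatios` has two.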

Step 2: tokenize with tokenizeLooseEscaping(newString)

Split newString into alternating slash tokens (runs of consecutive backslashes) and text tokens (everything else):

newString:  \\beta + \"y\" + \\gamma

→ [slash(2), text("beta + "), slash(1), text('"y'), slash(1), text('" + '), slash(2), text("gamma")]
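A tokenizer of this shape can be sketched in a few lines. The `EscapeToken` type is a hypothetical representation chosen for illustration; the PR's actual token type may look different.

```typescript
type EscapeToken =
  | { kind: "slash"; count: number }
  | { kind: "text"; value: string };

// Split the input into maximal runs of backslashes and maximal runs of
// everything else, preserving order.
function tokenizeLooseEscaping(input: string): EscapeToken[] {
  const tokens: EscapeToken[] = [];
  let i = 0;
  while (i < input.length) {
    const isSlash = input[i] === "\\";
    let end = i;
    while (end < input.length && (input[end] === "\\") === isSlash) end++;
    if (isSlash) {
      tokens.push({ kind: "slash", count: end - i });
    } else {
      tokens.push({ kind: "text", value: input.slice(i, end) });
    }
    i = end;
  }
  return tokens;
}

// Raw characters: \\beta + \"y
const sampleTokens = tokenizeLooseEscaping('\\\\beta + \\"y');
// → slash(2), text(beta + ), slash(1), text("y)
```

Because every character lands in exactly one token, concatenating the tokens back together reproduces the input, which makes the later per-run rewrite lossless for the text parts.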

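Step 3: apply the ratios token by token

Walk the token sequence and scale each slash token by the ratio at the same index:

slash(2) × ratios[0]=0.5 → round(1) → \         (1 backslash)
slash(1) × ratios[1]=0   → round(0) → (empty)   (quote escape collapses to zero)
slash(1) × ratios[2]=0   → round(0) → (empty)
slash(2) × ratios[?]     → past the end of the ratio table...

At this point, check whether the ratios are uniform:
  [0.5, 0, 0] → not uniform → canReuseLastRatio = false
  → return ok: false (hand off to the LLM)

If the ratios are uniform (for example LaTeX commands where every \\\\ → \\, so every ratio is 0.5), then canReuseLastRatio = true and the extra backslash runs can reuse the last ratio:

oldString:  \\alpha + \\beta       ratios = [0.5, 0.5]  ← uniform
newString:  \\delta + \\epsilon + \\gamma

slash(2) × 0.5 → 1 backslash
slash(2) × 0.5 → 1 backslash
slash(2) × 0.5 → 1 backslash (reusing lastRatio)
→ \delta + \epsilon + \gamma  ✓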
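The ratio application can be sketched as a single pass over newString. This is an illustration of the rule described above, under the assumption that ratios are uniform only when all entries are equal. The names `applyLooseEscapeRatios`, `repeatBackslash` and `FixResult` are hypothetical, and the scan is inlined instead of going through a separate tokenizer so the block stays self-contained.

```typescript
type FixResult = { ok: true; value: string } | { ok: false };

function repeatBackslash(count: number): string {
  let out = "";
  for (let k = 0; k < count; k++) out += "\\";
  return out;
}

function applyLooseEscapeRatios(newString: string, ratios: number[]): FixResult {
  // Extra runs may reuse the last ratio only when every observed ratio agrees.
  const canReuseLastRatio = ratios.length > 0 && ratios.every((r) => r === ratios[0]);
  let out = "";
  let runIndex = 0;
  let i = 0;
  while (i < newString.length) {
    if (newString[i] !== "\\") {
      out += newString[i]; // text is copied through unchanged
      i++;
      continue;
    }
    let count = 0;
    while (i < newString.length && newString[i] === "\\") {
      count++;
      i++;
    }
    let ratio: number;
    if (runIndex < ratios.length) {
      ratio = ratios[runIndex];
    } else if (canReuseLastRatio) {
      ratio = ratios[ratios.length - 1];
    } else {
      return { ok: false }; // ambiguous: leave the decision to the LLM fallback
    }
    out += repeatBackslash(Math.round(count * ratio));
    runIndex++;
  }
  return { ok: true, value: out };
}

// Raw newString: \\delta + \\epsilon + \\gamma, ratios from a uniform old_string.
const uniformFix = applyLooseEscapeRatios("\\\\delta + \\\\epsilon + \\\\gamma", [0.5, 0.5]);
// Raw newString: \\beta + \"y\" + \\gamma, ratios from a mixed old_string.
const mixedFix = applyLooseEscapeRatios('\\\\beta + \\"y\\" + \\\\gamma', [0.5, 0, 0]);
```

In this sketch the uniform case produces \delta + \epsilon + \gamma, while the mixed case stops at the fourth backslash run and reports `ok: false` without producing a partially corrected string.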

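Decision chain

findLooseEscapeMatches finds a unique match (score=1)
        │
        ▼
fixNewStringEscaping() deterministic correction
   ├─ success (ok:true)  → write the corrected newString directly, matched_via="loose_escape"
   └─ failure (ok:false, non-uniform ratios or mismatched structure)
        │
        ▼
   correctEscapedStringsWithLLM() LLM-assisted correction
      ├─ success → matched_via="llm_escape_correction"
      └─ failure/unavailable → return an error (no silent write)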
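The chain above could be wired together roughly as follows. This is a sketch with hypothetical types and names (`resolveLooseEscapeEdit`, `DeterministicFix`, `LlmFix`, `EditOutcome`). The two correction steps are passed in as callbacks, and the LLM step is modelled as synchronous to keep the example short, whereas a real LLM call would be asynchronous. The error text is a placeholder, not the message used in the PR.

```typescript
type FixResult = { ok: true; value: string } | { ok: false };

type EditOutcome =
  | { ok: true; newString: string; matched_via: "loose_escape" | "llm_escape_correction" }
  | { ok: false; error: string };

type DeterministicFix = (oldString: string, matchedText: string, newString: string) => FixResult;
// Returns null when the LLM cannot produce a corrected string.
type LlmFix = (oldString: string, matchedText: string, newString: string) => string | null;

function resolveLooseEscapeEdit(
  oldString: string,
  matchedText: string,
  newString: string,
  fix: DeterministicFix,
  llmFix?: LlmFix
): EditOutcome {
  // 1. Try the cheap deterministic correction first.
  const fixed = fix(oldString, matchedText, newString);
  if (fixed.ok) {
    return { ok: true, newString: fixed.value, matched_via: "loose_escape" };
  }
  // 2. Only then consult the LLM, if one is configured.
  const corrected = llmFix ? llmFix(oldString, matchedText, newString) : null;
  if (corrected !== null) {
    return { ok: true, newString: corrected, matched_via: "llm_escape_correction" };
  }
  // 3. Never write the uncorrected newString silently.
  return {
    ok: false,
    error: "Escaping is ambiguous; re-read the file or use the Bash tool (placeholder message).",
  };
}

const alwaysFails: DeterministicFix = () => ({ ok: false });
const noLlmOutcome = resolveLooseEscapeEdit("old", "matched", "new", alwaysFails);
const llmOutcome = resolveLooseEscapeEdit("old", "matched", "new", alwaysFails, () => "from-llm");
```

Keeping the deterministic step ahead of the LLM step means the LLM callback is never invoked when the local correction succeeds, which is the cost saving the PR describes.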

fym998 · 2026-05-30T08:34:21Z

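> The trigger conditions for the bug may be described inaccurately (an AI wrote them). In my own use it sometimes triggers and sometimes does not, so it is hard to reproduce. I will check again.

I still have not worked out why it triggered before: it happens rarely, and there are too many logs to find the occurrence again. It may be that overly complex LaTeX stumped DS and LLM correction could not fix it either, or the conversation may have been interrupted unexpectedly at the time (unverified). Either way, this PR covers those cases.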

fym998 · 2026-05-30T10:20:51Z

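I see now: LLM correction was added after I ran into this problem and before I fixed it. I still think this PR is valuable, though. It improves things in at least three ways: it fixes the residual errors, reduces LLM call overhead, and makes the behavior more deterministic and explainable.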

fym998 added 3 commits May 30, 2026 12:57

fym998 marked this pull request as draft May 30, 2026 06:40

fym998 added 2 commits May 30, 2026 15:08

fym998 changed the title ~~fix(edit): Fix newString escaping not being corrected in the loose_escape fallback path~~ fix(edit): Fix the loose_escape fallback path writing wrong content when the LLM mis-escapes May 30, 2026

fym998 marked this pull request as ready for review May 30, 2026 07:24

refactor(edit): rename escape correction cursors

6fd4e2e

fix(edit): Fix the loose_escape fallback path writing wrong content when the LLM mis-escapes #136
fym998 wants to merge 6 commits into lessweb:main from fym998:fix/loose-escape-newstring-correction
