fix(edit): correct case-insensitive search/replace offsets for non-length-preserving lowercasing#12754

Open
nyxst4ck wants to merge 1 commit into continuedev:main from nyxst4ck:fix-caseinsensitive-match-unicode-offset

nyxst4ck commented Jun 18, 2026

Description

caseInsensitiveMatch (the fallback strategy used by findSearchMatch / findSearchMatches, which back edit_existing_file and the multi-edit tool) located the match by running indexOf on the fully lowercased file content. It then returned that index as startIndex and index + searchContent.length as endIndex, treating both as positions in the original content:

```ts
const lowerFileContent = fileContent.toLowerCase();
const index = lowerFileContent.indexOf(searchContent.toLowerCase());
return { startIndex: index, endIndex: index + searchContent.length };
```

String.prototype.toLowerCase() is not length-preserving for every character. For example "İ" (U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE) lowercases to "i" + a combining dot (two UTF-16 units). When such a character appears before the match, every subsequent index in the lowercased string is shifted relative to the original, so the returned startIndex/endIndex point at the wrong characters. When the character is inside the match, endIndex is wrong too because it is derived from the search length rather than the matched region's actual length.
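Both failure modes can be observed directly by comparing lengths and indexOf results before and after lowercasing. This is a standalone illustration, not code from the repository; all variable names are hypothetical.

```ts
// Standalone illustration: toLowerCase() can lengthen a string.
const original = "\u0130 const"; // "İ const", 7 UTF-16 code units
const lowered = original.toLowerCase(); // "i\u0307 const", 8 code units

// Case 1: a length-changing character BEFORE the match shifts indexOf by one.
const idxInOriginal = original.indexOf("const"); // 2
const idxInLowered = lowered.indexOf("const"); // 3
console.log(original.length, lowered.length, idxInOriginal, idxInLowered);

// Case 2: the length-changing character is INSIDE the match.
const file2 = "prefix \u0130stanbul suffix";
const search2 = "i\u0307stanbul"; // 9 code units; the original text "İstanbul" is 8
const start2 = file2.toLowerCase().indexOf(search2); // 7, no drift before the match
const end2 = start2 + search2.length; // 16, one past the real end (15)
console.log(JSON.stringify(file2.slice(start2, end2))); // trailing space is included
```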

Because these indices feed directly into fileContent.substring(0, start) + new + fileContent.substring(end) in executeFindAndReplace, the replacement lands at the wrong offset and corrupts the file. Concrete example (file has a leading İ in a comment):

File content:

    // İ marker comment
    const value = 1;

oldString: "CONST VALUE = 1;" (case-insensitive)
newString: "const value = 2;"

On main this produces cconst value = 2; (the leading c is duplicated, and one character following the original match, if any, is dropped) instead of const value = 2;.
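The corruption can be reproduced outside the repository in a few lines. This is a minimal sketch, assuming the replacement is a plain substring splice as the description states; buggyCaseInsensitiveMatch and applyReplacement are hypothetical names, not the repository's functions.

```ts
interface MatchRange { startIndex: number; endIndex: number }

// Mirrors the pre-fix logic: indexOf on the lowercased file, but the
// resulting indices are later applied to the ORIGINAL file content.
function buggyCaseInsensitiveMatch(fileContent: string, searchContent: string): MatchRange | null {
  const lowerFileContent = fileContent.toLowerCase();
  const index = lowerFileContent.indexOf(searchContent.toLowerCase());
  if (index === -1) return null;
  return { startIndex: index, endIndex: index + searchContent.length };
}

// Hypothetical splice in the style described for executeFindAndReplace.
function applyReplacement(fileContent: string, match: MatchRange, newString: string): string {
  return fileContent.substring(0, match.startIndex) + newString + fileContent.substring(match.endIndex);
}

const file = "// \u0130 marker comment\nconst value = 1;\n";
const found = buggyCaseInsensitiveMatch(file, "CONST VALUE = 1;");
if (found) {
  // Yields "...comment\ncconst value = 2;": the "c" is duplicated and the
  // trailing newline after the original match is lost.
  console.log(JSON.stringify(applyReplacement(file, found, "const value = 2;")));
}
```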

Fix

Scan the original content and, for each candidate start position, lowercase only the slice that lines up with the lowercased search string, comparing for equality. This keeps both indices anchored to the original string, and the end index comes from the matched slice rather than searchContent.length. Behaviour for ASCII / length-preserving input is unchanged (all 128 existing tests still pass).
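The approach can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: caseInsensitiveMatchSketch is a hypothetical name, and treating an empty search as "no match" is an assumption of the sketch.

```ts
interface MatchRange { startIndex: number; endIndex: number }

// Both indices are positions in the ORIGINAL string; endIndex comes from
// the matched slice, never from searchContent.length.
function caseInsensitiveMatchSketch(fileContent: string, searchContent: string): MatchRange | null {
  const lowerSearch = searchContent.toLowerCase();
  if (lowerSearch.length === 0) return null; // assumption: empty search never matches

  for (let start = 0; start < fileContent.length; start++) {
    // Grow the candidate slice until its lowercased form is at least as long
    // as the lowercased search; only then can the two be equal.
    for (let end = start + 1; end <= fileContent.length; end++) {
      const loweredSlice = fileContent.slice(start, end).toLowerCase();
      if (loweredSlice.length < lowerSearch.length) continue;
      if (loweredSlice === lowerSearch) {
        return { startIndex: start, endIndex: end };
      }
      break; // long enough but different: no match from this start
    }
  }
  return null;
}

const text = "// \u0130 marker comment\nconst value = 1;\n";
const range = caseInsensitiveMatchSketch(text, "CONST VALUE = 1;");
if (range) console.log(text.slice(range.startIndex, range.endIndex));
```

Growing a slice per start position is roughly O(n·m²) in the worst case. That keeps the sketch easy to verify for a fallback path, but the PR's actual loop may be structured differently.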

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created (n/a — internal logic fix)
  • The relevant tests, if any, have been updated or created

Tests

Verified that the new tests fail on main (RED) and pass with the fix (GREEN) by reverting the implementation while keeping the new tests.

  • core/edit/searchAndReplace/findSearchMatch.vitest.ts — two cases: an earlier character that changes length on lowercase, and the matched region itself changing length. Both assert the returned indices slice the correctly-cased original text.
  • core/edit/searchAndReplace/executeFindAndReplace.vitest.ts — end-to-end test asserting the file is not corrupted.
edit/searchAndReplace/findSearchMatch.vitest.ts          (50 tests) ✓
edit/searchAndReplace/executeFindAndReplace.vitest.ts    (30 tests) ✓
edit/searchAndReplace/findAndReplaceUtils.vitest.ts      (25 tests) ✓
edit/searchAndReplace/multiEdit.vitest.ts                (23 tests) ✓
edit/searchAndReplace/findSearchMatches.test.ts          (32 tests, jest) ✓

Prettier is clean on the changed files, and tsc reports no errors in any changed file.


Summary by cubic

Fixes case-insensitive search/replace offsets when lowercasing changes string length, preventing misaligned edits and file corruption. Updates caseInsensitiveMatch to anchor indices to the original content and adds regression tests.

  • Bug Fixes
    • Scan the original string and lowercase only the candidate slice to compare with the lowercased search.
    • Derive endIndex from the matched slice, not searchContent.length, covering cases like "İ" → "i̇".
    • Added unit tests for pre-match and in-match length changes, plus an end-to-end test in executeFindAndReplace.

Written for commit 91d67a4.

…ngth-preserving lowercasing

caseInsensitiveMatch located the match by calling indexOf on the fully
lowercased file content, then returned that index (plus searchContent.length)
as a position into the ORIGINAL content. String.prototype.toLowerCase is not
length-preserving for every character (e.g. "İ" U+0130 lowercases to "i" + a
combining dot, two UTF-16 units), so any such character before or within the
match shifts every subsequent index. The returned slice was misaligned, and an
edit_existing_file / multi-edit replacement landed at the wrong offset and
corrupted the file (e.g. "const value = 1;" became "cconst value = 2;").

Fix: scan the original content and, for each candidate start, lowercase only the
aligned slice, comparing against the lowercased search. Indices stay anchored to
the original string and the end index is derived from the matched slice rather
than the search length.

Adds regression tests in findSearchMatch.vitest.ts (both an earlier character
and the matched region itself changing length) and an end-to-end test in
executeFindAndReplace.vitest.ts.
nyxst4ck requested a review from a team as a code owner June 18, 2026 00:27
nyxst4ck requested review from sestinj and removed request for a team June 18, 2026 00:27
dosubot (bot) added the size:M label Jun 18, 2026 (this PR changes 30-99 lines, ignoring generated files)
github-actions (bot) commented Jun 18, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

cubic-dev-ai (bot) left a comment

No issues found across 3 files

nyxst4ck (author) commented

I have read the CLA Document and I hereby sign the CLA
