Surrogate pair characters (emoji, rare CJK, etc.) break a, r, s, ~, ga commands #9931

@k1832


Describe the bug

When the cursor is on a character encoded as a UTF-16 surrogate pair, such as an emoji (😄), a rare CJK character (𩸽), or a musical symbol (𝄞), several character-level commands produce incorrect results.

| Command | Broken behavior |
| --- | --- |
| a (Append) | Cursor lands before the character instead of after |
| r (Replace) | Only replaces half the character, corrupting the text |
| s (Change char) | Only deletes half the character before entering Insert mode |
| ~ (Toggle case) | Corrupts the character into a lone surrogate |
| ga (Unicode info) | Shows the half-surrogate value instead of the full codepoint |
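For illustration, here is a minimal TypeScript sketch of why operating on a single UTF-16 code unit gives these results. It is not VSCodeVim code; the variable names are made up for the example, and it only uses standard string methods.

```typescript
// Illustration only: plain string operations, not the extension's code.
const line = "\u{1F604}text"; // "😄text"

// JavaScript strings are indexed by UTF-16 code unit, so the emoji has length 2.
const emojiLength = "\u{1F604}".length;

// What a code-unit lookup sees (similar to the `ga` symptom): the high surrogate only.
const codeUnit = line.charCodeAt(0);
// What a code-point lookup sees: the full character.
const codePoint = line.codePointAt(0) ?? 0;

// Replacing one code unit (similar to the `r` symptom) leaves the low surrogate behind.
const brokenReplace = "x" + line.slice(1);

// Replacing the whole code point removes both halves.
const width = String.fromCodePoint(codePoint).length;
const fixedReplace = "x" + line.slice(width);

console.log(emojiLength, codeUnit.toString(16), codePoint.toString(16));
console.log(JSON.stringify(brokenReplace), JSON.stringify(fixedReplace));
```

The lone `\uDE04` left in `brokenReplace` is the kind of corruption described for r, s, and ~.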

To Reproduce

  1. Open a file containing 😄text
  2. Place cursor on 😄
  3. Press a (append), type !, press Esc
  4. See !😄text: ! is inserted before the emoji instead of after

Expected behavior

😄!text: ! should be inserted after the emoji.

Environment (please complete the following information):

  • Extension (VsCodeVim) version: 1.32.4
  • VSCode version: 1.109.0
  • OS: Ubuntu 24.04

Additional context

position.getRight() increments by 1 UTF-16 code unit, but these characters are 2 code units (a surrogate pair). Moving by 1 lands between the pair, and VSCode's validatePosition clamps it back to the start.
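A rough sketch of a surrogate-aware "move right" follows, working on plain string offsets. The helpers `rightOffset` and `appendAt` are hypothetical names for this example and are not part of the extension's Position API; a real fix would have to go through the extension's own position handling.

```typescript
// Hypothetical helpers for illustration; not the extension's actual API.
function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Offset just past the character starting at `offset`,
// stepping over both halves of a surrogate pair.
function rightOffset(line: string, offset: number): number {
  if (offset >= line.length) {
    return line.length;
  }
  const isPair =
    isHighSurrogate(line.charCodeAt(offset)) &&
    offset + 1 < line.length &&
    isLowSurrogate(line.charCodeAt(offset + 1));
  return offset + (isPair ? 2 : 1);
}

// Insert `text` after the character under the cursor, as `a` is expected to.
function appendAt(line: string, cursor: number, text: string): string {
  const at = rightOffset(line, cursor);
  return line.slice(0, at) + text + line.slice(at);
}

console.log(appendAt("\u{1F604}text", 0, "!")); // 😄!text
```

With a step of 2 the insert position is a valid boundary, so there is nothing for validatePosition to clamp.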

x/X, l/h, and y already have surrogate boundary correction and work correctly.
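I have not checked how those commands implement their correction, so the following is only a guess at the general shape: snap any offset that falls between the two halves of a pair back to the start of the character. `snapToCharStart` is a hypothetical name for this example.

```typescript
// Hypothetical boundary correction; the extension's real logic may differ.
function snapToCharStart(line: string, offset: number): number {
  if (offset <= 0 || offset >= line.length) {
    return offset;
  }
  const prev = line.charCodeAt(offset - 1);
  const curr = line.charCodeAt(offset);
  const insidePair =
    prev >= 0xd800 && prev <= 0xdbff && curr >= 0xdc00 && curr <= 0xdfff;
  // If the offset points at a low surrogate that follows a high surrogate,
  // move back one code unit to the start of the pair.
  return insidePair ? offset - 1 : offset;
}

console.log(snapToCharStart("a\u{1F604}b", 2)); // 1
```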
