Commit 8bc1877
Fix byte/character offset confusion in formatter for multi-byte UTF-8 (#105)
Formatting proto files with multi-byte UTF-8 characters (Cyrillic, etc.)
was non-idempotent, adding empty `// ` comment lines on each format
operation.
## Root Cause
`offset_to_position` in `src/formatter/clang.rs` converted
clang-format's byte offsets to LSP positions using byte arithmetic:
```rust
let character = offset - last_newline; // treats byte offset as character offset
```
This fails for multi-byte UTF-8, where one character can occupy several
bytes. Example: for byte offset 134 in the Cyrillic sample, byte
arithmetic yields character position 119, but the correct LSP position
is 77 UTF-16 code units.
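The mismatch can be reproduced in isolation. The sketch below uses a hypothetical helper, `columns_at` (an illustration only, not a function in protols), to compare the byte length of a line prefix with its length in UTF-16 code units:

```rust
/// Hypothetical helper for illustration; not part of protols.
/// Returns (byte-based column, UTF-16 column) for a byte index into one line.
/// Panics if `byte_index` is out of range or not on a char boundary.
fn columns_at(line: &str, byte_index: usize) -> (usize, usize) {
    let prefix = &line[..byte_index];
    // `len()` counts bytes; `encode_utf16().count()` counts UTF-16 code units.
    (prefix.len(), prefix.encode_utf16().count())
}
```

For the prefix `// если` (3 ASCII bytes plus 4 Cyrillic letters at 2 bytes each), `columns_at("// если", 11)` gives `(11, 7)`: the two columns drift apart by one unit per Cyrillic letter.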
## Changes
- **Fixed offset calculation**: Count UTF-16 code units from last
newline instead of byte arithmetic
```rust
let text_after_newline = &up_to_offset[last_newline..];
let character = text_after_newline.encode_utf16().count();
```
- **Added tests**: `test_offset_to_position_cyrillic` (unit) and
`test_textedit_from_clang_output_cyrillic` (integration) with multi-byte
UTF-8 input
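
The fix can be sketched as a standalone function. This is a minimal approximation, not the code in `src/formatter/clang.rs`: the signature, the `(line, character)` tuple return type, and the `Option` wrapper are assumptions made so the example is self-contained.

```rust
/// Sketch only: converts a byte offset in `text` into a zero-based
/// (line, character) pair, with `character` in UTF-16 code units.
/// The signature is an assumption; the real function may differ.
fn offset_to_position(text: &str, offset: usize) -> Option<(u32, u32)> {
    // `get` returns None if `offset` is out of range or splits a character.
    let up_to_offset = text.get(..offset)?;
    // The line number is the count of newlines before the offset.
    let line = up_to_offset.matches('\n').count() as u32;
    // Byte index just past the last newline, or 0 on the first line.
    let last_newline = up_to_offset.rfind('\n').map_or(0, |i| i + 1);
    // Count UTF-16 code units, not bytes, from the start of the line.
    let text_after_newline = &up_to_offset[last_newline..];
    let character = text_after_newline.encode_utf16().count() as u32;
    Some((line, character))
}
```

Assuming the sample from the issue is indented with two spaces, byte offset 134 falls on the second line just after the standalone `и`, and this sketch returns `Some((1, 77))`, where byte arithmetic would have produced 119.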
<!-- START COPILOT ORIGINAL PROMPT -->
<details>
<summary>Original prompt</summary>
>
> ----
>
> *This section details the original issue you should resolve*
>
> <issue_title>Formatting inserts a new empty // comment line every time (non-idempotent formatting)</issue_title>
> <issue_description>Hi! I just happened to stumble upon this tricky formatting bug.
> ## What happened
> I have this example:
>
>
>
> ```proto
> message Test {
> // Обратная совместимость: если true, применяет фильтры enabled_not_false и removed_not_true.
> int32 x = 1;
> }
> ```
>
> When formatting is applied, it seems to try to split the comment across two lines, but in the end it just adds a new line with an empty comment:
> ```proto
> message Test {
> // Обратная совместимость: если true, применяет фильтры enabled_not_false и removed_not_true.
> //
> int32 x = 1;
> }
> ```
>
> Each further formatting pass just adds another empty comment line.
> ```proto
> message Test {
> // Обратная совместимость: если true, применяет фильтры enabled_not_false и removed_not_true.
> //
> //
> //
> int32 x = 1;
> }
> ```
>
> ### Environment
>
> - OS: Linux fedora 42
> - Neovim: NVIM v0.11.1
> - protols: 0.13.2
> - clang-format: clang-format version 20.1.8 (Fedora 20.1.8-4.fc42)
> - Formatting trigger: Neovim LSP (`vim.lsp.buf.format()`), also happens on `:w` (format-on-save enabled)
>
> ### Video Example
>
> https://github.com/user-attachments/assets/f33be03e-78e6-45db-8c83-89ae15a31d0b
> </issue_description>
>
> ## Comments on the Issue (you are @copilot in this section)
>
> <comments>
> </comments>
>
</details>
<!-- START COPILOT CODING AGENT SUFFIX -->
- Fixes #104
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: asharkhan3101 <140482588+asharkhan3101@users.noreply.github.com>
1 parent f1ce7e5 commit 8bc1877
5 files changed
Lines changed: 73 additions & 1 deletion
Lines changed: 8 additions & 0 deletions
Lines changed: 8 additions & 0 deletions
Lines changed: 8 additions & 0 deletions