Commit ea8901d
fix: hoist sanitize mapping to module constant, fix U+001E em-dash mismap
- Hoisted the replacements dict into the module-level constant _CU_REPLACEMENTS
to avoid per-call allocation overhead.
- Added _CU_BAD_CHARS set so text with no corrupted chars can exit early, using O(1) per-character membership checks.
- Fixed U+001E mapping: was incorrectly mapped to U+2014 (em dash),
now correctly maps to U+201E (double low-9 quotation mark) per the
high-byte stripping model.
- Added U+0014 -> U+2014 mapping for actual em dash corruption.
- Reworded docstring to clarify no-op semantics.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 12d5053
1 file changed
Lines changed: 21 additions & 18 deletions