Commit 5662a32
fix: add charset_version to Hindi/Arabic tokenizers for backward compatibility
Introduce a charset_version parameter in HindiCharsTokenizer and
ArabicCharsTokenizer so old models (v1: case='mixed') keep working
while new models train with the corrected charset (v2: case='upper').
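
A minimal sketch of why the version matters, under loud assumptions: `build_vocab`, `CHARSET_VERSION_TO_CASE`, and the rule that `case` only controls which Latin letters join the vocabulary are all hypothetical, and the value `2` for `DEFAULT_CHARSET_VERSION` is a guess (the commit names the constant but does not show its value). The real tokenizers may build their charset differently.

```python
import string

# Hypothetical mapping; the commit message only states v1 -> 'mixed', v2 -> 'upper'.
CHARSET_VERSION_TO_CASE = {1: "mixed", 2: "upper"}
DEFAULT_CHARSET_VERSION = 2  # assumed value for newly trained models


def build_vocab(script_chars, charset_version=DEFAULT_CHARSET_VERSION):
    """Return an ordered token list: script characters plus Latin letters.

    Illustrative only: 'mixed' adds upper- and lowercase ASCII letters,
    'upper' adds uppercase letters only.
    """
    if charset_version not in CHARSET_VERSION_TO_CASE:
        raise ValueError(f"Unsupported charset_version: {charset_version}")
    case = CHARSET_VERSION_TO_CASE[charset_version]
    latin = string.ascii_uppercase
    if case == "mixed":
        latin += string.ascii_lowercase
    return list(script_chars) + list(latin)


v1 = build_vocab("अआइ", charset_version=1)
v2 = build_vocab("अआइ", charset_version=2)
print(len(v1), len(v2))  # the two versions yield different vocabulary sizes
```

In this toy setup the two versions produce vocabularies of different sizes, so a checkpoint trained against one cannot safely be loaded with the other. That is the mismatch the version field is meant to make explicit.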
- Define CASELESS_SCRIPT_TOKENIZER_TARGETS and DEFAULT_CHARSET_VERSION
  constants in tts_tokenizers.py
- Persist charset_version into the OmegaConf config during training
  (setup_tokenizers) so .nemo archives record which version was used
- Add _migrate_charset_version() helper in magpietts inference utils
  to pin charset_version=1 for old checkpoints that lack the field,
  preventing a silent vocabulary mismatch at inference time
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
1 parent 4762ab3 commit 5662a32
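
The migration step described in the commit message can be sketched roughly as follows. This is not the actual `_migrate_charset_version()` implementation: the function name `migrate_charset_version`, the contents of `CASELESS_SCRIPT_TOKENIZER_TARGETS`, the `LEGACY_CHARSET_VERSION` name, the `example.tokenizers` module path, and the use of plain dicts instead of an OmegaConf config are all assumptions made for illustration.

```python
# Assumed contents; the commit names this constant but does not show its values.
CASELESS_SCRIPT_TOKENIZER_TARGETS = (
    "HindiCharsTokenizer",
    "ArabicCharsTokenizer",
)
LEGACY_CHARSET_VERSION = 1  # hypothetical name for the pre-fix charset version


def migrate_charset_version(tokenizer_cfgs):
    """Pin charset_version=1 on affected tokenizer configs that lack the field.

    Mutates the configs in place and returns the names of migrated entries.
    """
    migrated = []
    for name, cfg in tokenizer_cfgs.items():
        target = cfg.get("_target_", "")
        # str.endswith accepts a tuple of suffixes.
        is_affected = target.endswith(CASELESS_SCRIPT_TOKENIZER_TARGETS)
        if is_affected and "charset_version" not in cfg:
            cfg["charset_version"] = LEGACY_CHARSET_VERSION
            migrated.append(name)
    return migrated


# An old checkpoint config: the Hindi entry predates the charset_version field.
old_cfg = {
    "hindi": {"_target_": "example.tokenizers.HindiCharsTokenizer"},
    "english": {"_target_": "example.tokenizers.EnglishCharsTokenizer"},
    "arabic": {
        "_target_": "example.tokenizers.ArabicCharsTokenizer",
        "charset_version": 2,
    },
}
migrated = migrate_charset_version(old_cfg)
print(migrated)
```

Only entries that are both affected and missing the field are touched, so a checkpoint that already records its version keeps it, and unrelated tokenizers are left alone.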
3 files changed
Lines changed: 49 additions & 2 deletions
File tree
- nemo/collections
  - common/tokenizers/text_to_speech
  - tts
    - data
    - modules/magpietts_inference
Lines changed: 9 additions & 0 deletions
- Added: new lines 45-53
Lines changed: 19 additions & 2 deletions
- Changed: line 24
- Changed: old line 27 replaced by new lines 27-32
- Added: new lines 60-71
Lines changed: 21 additions & 0 deletions
- Added: new line 31
- Added: new lines 153-170
- Added: new lines 245-246