Commit 52ba833
Add hi-IN, ko-KR and pt-BR IPA tokenizer support (#15567)
* feat(tts): extend IPA tokenizer with hi-IN/en code-switching and pt-BR
* feat(tts): remove ar-MSA locale as out of scope for this PR
Signed-off-by: quanpham <youngkwan199@gmail.com>
* Apply isort and black reformatting
Signed-off-by: quapham <quapham@users.noreply.github.com>
* Add Korean IPA support
Signed-off-by: quanpham <youngkwan199@gmail.com>
* Fix leftover merge markers in Korean IPA support
Signed-off-by: quanpham <youngkwan199@gmail.com>
* Apply isort and black reformatting
Signed-off-by: quapham <quapham@users.noreply.github.com>
* fix: add KOREAN_CHARS import
Signed-off-by: quanpham <youngkwan199@gmail.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Update tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* WIP: save local changes before rebasing
* fix: update IPAG2p typing and docs
Signed-off-by: quanpham <youngkwan199@gmail.com>
* Apply isort and black reformatting
Signed-off-by: quapham <quapham@users.noreply.github.com>
* bugfix: unit test of Hindi
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* refactor: introduce a combined constant WORD_CHARS_ALL.
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* fix: backward-compatible punctuation for pt-BR and hi-IN tokenizers
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* bugfix: L2_TTS_Fast_dev_runs_Magpietts_OnlineCFGDistillation.sh
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
---------
Signed-off-by: quanpham <youngkwan199@gmail.com>
Signed-off-by: quapham <quapham@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: quapham <quapham@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 87ccac8 commit 52ba833
11 files changed
Lines changed: 1496984 additions & 53 deletions
File tree
- nemo/collections
  - common/tokenizers/text_to_speech
  - tts
    - data
    - g2p/models
    - modules/magpietts_inference
- scripts/tts_dataset_files
  - hi_IN
  - ko_KR
  - pt_BR
- tests
  - collections/common/tokenizers/text_to_speech
  - functional_tests
Lines changed: 46 additions & 2 deletions
Lines changed: 25 additions & 1 deletion
Lines changed: 40 additions & 31 deletions
Lines changed: 23 additions & 1 deletion