## Description
Introduces major changes to the text-to-speech module based on the Kokoro
model, including:
- **Multilingual text-to-speech** - a set of complete pipelines & voices
for different languages. The list of currently supported languages can be
found below.
- **Improved phonemization & speech quality** - using a neural
phonemization model as a fallback for the old lexicon-based phonemization
significantly improves speech quality, particularly for non-standard,
out-of-dictionary words.
- **Timestamp-based audio cutting** - an improved postprocessing
algorithm eliminates artifacts introduced by the .pte model, resulting in
cleaner, more natural speech.
- **API changes** - prepared for voice cloning & custom, fine-tuned
versions of the Kokoro model.
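The lexicon-first lookup with a neural fallback can be pictured roughly as follows. This is only an illustrative sketch, not the code from this PR: `makeLexiconFirstPhonemizer`, `phonemizeText`, and the `Phonemizer` type are hypothetical names, and the neural model is represented by a plain callback.

```typescript
// Hypothetical sketch: names and types are assumptions, not the PR's actual API.
type Phonemizer = (word: string) => string;

// Build a word-level phonemizer that prefers the lexicon and only calls the
// (assumed) neural model for words the lexicon does not contain.
function makeLexiconFirstPhonemizer(
  lexicon: Map<string, string>,
  neuralFallback: Phonemizer,
): Phonemizer {
  return (word: string): string => {
    const hit = lexicon.get(word.toLowerCase());
    return hit !== undefined ? hit : neuralFallback(word);
  };
}

// Phonemize whitespace-separated text word by word.
function phonemizeText(text: string, phonemizeWord: Phonemizer): string {
  return text
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map(phonemizeWord)
    .join(' ');
}
```

In this arrangement the dictionary stays authoritative for known words, and the fallback is only consulted for out-of-dictionary input.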
Current status of supported languages:
- 🇺🇸 American English: ✅
- 🇬🇧 British English: ✅
- 🇫🇷 French: ✅
- 🇪🇸 Spanish: ✅
- 🇵🇹/🇧🇷 Portuguese: ✅
- 🇮🇹 Italian: ✅
- 🇵🇱 Polish: ✅
- 🇩🇪 German: ✅
- 🇮🇳 Hindi: ✅
- 🇯🇵 Japanese: ❌ (coming soon)
- 🇨🇳 Mandarin Chinese: ❌ (coming soon)
### Introduces a breaking change?
- [x] Yes
- [ ] No
There are 2 major breaking changes introduced by this PR:
- Changed the **"synthesis from phonemes"** API.
Old API:
```
const audioData = await tts.forwardFromPhonemes({
  phonemes:
    'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
});
```
New API:
```
const audioData = await tts.forward({
  text:
    'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
  phonemize: false, // Disables phonemization and treats text as phonemes
});
```
- Changed the predefined model/voice setups. Model files and
voice/phonemization files are now bundled together, because languages such
as Polish and German have fine-tuned model weights.
Old API:
```
const model = useTextToSpeech({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});
```
New API:
```
const model = useTextToSpeech(KOKORO_AMERICAN_ENGLISH_FEMALE_HEART);
```
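For call sites that still pass phonemes directly, a thin compatibility wrapper may ease migration. The sketch below is an assumption-laden illustration: `ForwardInput`, `TtsLike`, and `forwardFromPhonemesCompat` are hypothetical names, not exports of the library. Only the `forward({ text, phonemize: false })` call shape is taken from this PR.

```typescript
// Hypothetical types mirroring the call shape shown in this PR; not library exports.
interface ForwardInput {
  text: string;
  phonemize?: boolean;
}

interface TtsLike<A> {
  forward(input: ForwardInput): Promise<A>;
}

// Accepts the old `{ phonemes }` argument and forwards it through the new API,
// disabling phonemization so the text is treated as phonemes.
function forwardFromPhonemesCompat<A>(
  tts: TtsLike<A>,
  input: { phonemes: string },
): Promise<A> {
  return tts.forward({ text: input.phonemes, phonemize: false });
}
```

A wrapper like this keeps old call sites compiling while they are moved to `tts.forward` one by one.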
### Type of change
- [ ] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing
documentation)
- [x] Other (chores, tests, code style improvements etc.)
### Tested on
- [x] iOS
- [x] Android
### Testing instructions
Play around with the demo speech apps.
Unit tests for RNE-specific code will be added later on.
The Phonemis package has its own wide range of unit tests (see the
[Phonemis repo](https://github.com/IgorSwat/Phonemis)).
### Screenshots
<!-- Add screenshots here, if applicable -->
### Related issues
#712
### Checklist
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [ ] My changes generate no new warnings
### Additional notes
---------
Co-authored-by: Bartosz Hanc <bartosz.hanc02@gmail.com>
Co-authored-by: Mateusz Słuszniak <mateusz.sluszniak@swmansion.com>