Commit 9f752b6

IgorSwat, barhanc, and msluszniak authored
feat!: multilingual text-to-speech (#1134)
## Description

Introduces major changes to the text-to-speech module based on the Kokoro model, including:

- **Multilingual text-to-speech** - a set of complete pipelines and voices for different languages. The list of currently supported languages is given below.
- **Improved phonemization & speech quality** - using a neural phonemization model as a fallback for the old lexicon-based phonemization significantly improves speech quality, particularly for non-standard, out-of-dictionary words.
- **Timestamp-based audio cutting** - an improved postprocessing algorithm that eliminates artifacts introduced by the .pte model, resulting in cleaner, more natural speech.
- **API changes** - prepared for voice cloning and custom, fine-tuned versions of the Kokoro model.

Current status of supported languages:

- 🇺🇸 American English: ✅
- 🇬🇧 British English: ✅
- 🇫🇷 French: ✅
- 🇪🇸 Spanish: ✅
- 🇵🇹/🇧🇷 Portuguese: ✅
- 🇮🇹 Italian: ✅
- 🇵🇱 Polish: ✅
- 🇩🇪 German: ✅
- 🇮🇳 Hindi: ✅
- 🇯🇵 Japanese: ❌ (coming soon)
- 🇨🇳 Mandarin Chinese: ❌ (coming soon)

### Introduces a breaking change?

- [x] Yes
- [ ] No

There are 2 major breaking changes introduced by this PR:

- Changed the **"synthesis from phonemes"** API.

  Old API:

  ```
  const audioData = await tts.forwardFromPhonemes({
    phonemes: 'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
  });
  ```

  New API:

  ```
  const audioData = await tts.forward({
    text: 'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
    phonemize: false, // Disables phonemization and treats text as phonemes
  });
  ```

- Changed the predefined model and voice setups. Model files and voice/phonemization files are now bundled together, because languages such as Polish or German have fine-tuned model weights.

  Old API:

  ```
  const model = useTextToSpeech({
    model: KOKORO_MEDIUM,
    voice: KOKORO_VOICE_AF_HEART,
  });
  ```

  New API:

  ```
  const model = useTextToSpeech(KOKORO_AMERICAN_ENGLISH_FEMALE_HEART);
  ```

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [x] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Testing instructions

Play around with the demo speech apps. Unit tests for RNE-specific code will be added later on. The Phonemis package has its own wide range of unit tests (see the [Phonemis repo](https://github.com/IgorSwat/Phonemis)).

### Screenshots

<!-- Add screenshots here, if applicable -->

### Related issues

#712

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [ ] My changes generate no new warnings

### Additional notes

---------

Co-authored-by: Bartosz Hanc <bartosz.hanc02@gmail.com>
Co-authored-by: Mateusz Słuszniak <mateusz.sluszniak@swmansion.com>
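
The "lexicon first, neural model as fallback" idea from the description can be illustrated with a minimal sketch. This is not the Phonemis or react-native-executorch implementation: `Lexicon`, `FallbackPhonemizer`, and `phonemizeWithFallback` are hypothetical names, the fallback is a stub standing in for a neural model, and whitespace tokenization is a simplifying assumption.

```typescript
// Hypothetical types: neither name comes from the PR or from Phonemis.
type Lexicon = ReadonlyMap<string, string>;
type FallbackPhonemizer = (word: string) => string;

// Phonemize word by word: use the lexicon entry when one exists,
// otherwise defer to the fallback (a neural model in the real pipeline).
function phonemizeWithFallback(
  text: string,
  lexicon: Lexicon,
  fallback: FallbackPhonemizer
): string {
  return text
    .toLowerCase()
    .split(/\s+/) // naive whitespace tokenization, an assumption of this sketch
    .filter((word) => word.length > 0)
    .map((word) => lexicon.get(word) ?? fallback(word))
    .join(' ');
}

// Toy data for illustration only.
const lexicon: Lexicon = new Map([
  ['a', 'ɐ'],
  ['man', 'mˈæn'],
]);

// Stub fallback that records which words missed the lexicon.
const calls: string[] = [];
const stubFallback: FallbackPhonemizer = (word) => {
  calls.push(word);
  return `<${word}>`;
};

const result = phonemizeWithFallback('A man zorp', lexicon, stubFallback);
```

In this sketch only the out-of-dictionary word `zorp` reaches the fallback, which is the property the description attributes to the new phonemization path.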
1 parent 8fc7683 commit 9f752b6

60 files changed

Lines changed: 1411 additions & 1506 deletions

.cspell-wordlist.txt

Lines changed: 6 additions & 0 deletions
@@ -206,3 +206,9 @@ Deinitialize
 fastsam
 promptable
 topk
+phonemize
+phonemization
+Siwis
+SIWIS
+Mateusz
+MATEUSZ

.gitmodules

Lines changed: 4 additions & 0 deletions
@@ -4,3 +4,7 @@
 [submodule "third-party/googletest"]
     path = third-party/googletest
     url = https://github.com/google/googletest.git
+[submodule "packages/react-native-executorch/third-party/common/phonemis"]
+    path = packages/react-native-executorch/third-party/common/phonemis
+    url = https://github.com/IgorSwat/Phonemis
+    branch = main

apps/speech/components/ModelPicker.tsx

Lines changed: 62 additions & 47 deletions
@@ -1,10 +1,12 @@
 import React, { useEffect, useRef, useState } from 'react';
 import {
   Dimensions,
+  Modal,
   ScrollView,
   StyleSheet,
   Text,
   TouchableOpacity,
+  TouchableWithoutFeedback,
   View,
 } from 'react-native';
 
@@ -21,7 +23,7 @@ type Props<T> = {
   disabled?: boolean;
 };
 
-const DROPDOWN_MAX_HEIGHT = 200;
+const DROPDOWN_MAX_HEIGHT = 300;
 
 export function ModelPicker<T>({
   models,
@@ -31,8 +33,11 @@ export function ModelPicker<T>({
   disabled,
 }: Props<T>) {
   const [open, setOpen] = useState(false);
-  const [triggerHeight, setTriggerHeight] = useState(0);
-  const [expandUp, setExpandUp] = useState(false);
+  const [dropdownLayout, setDropdownLayout] = useState({
+    x: 0,
+    y: 0,
+    width: 0,
+  });
   const triggerRef = useRef<React.ComponentRef<typeof TouchableOpacity>>(null);
   const selected = models.find((m) => m.value === selectedModel);
 
@@ -50,23 +55,22 @@ export function ModelPicker<T>({
       (
         _x: number,
         _y: number,
-        _width: number,
+        width: number,
         height: number,
-        _pageX: number,
+        pageX: number,
         pageY: number
       ) => {
-        setTriggerHeight(height);
         const spaceBelow = Dimensions.get('window').height - (pageY + height);
-        setExpandUp(spaceBelow < DROPDOWN_MAX_HEIGHT);
+        const y =
+          spaceBelow >= DROPDOWN_MAX_HEIGHT
+            ? pageY + height + 2
+            : pageY - Math.min(DROPDOWN_MAX_HEIGHT, models.length * 42) - 2;
+        setDropdownLayout({ x: pageX, y, width });
         setOpen(true);
       }
     );
   };
 
-  const dropdownPosition = expandUp
-    ? { bottom: triggerHeight + 2 }
-    : { top: triggerHeight + 2 };
-
   return (
     <View style={styles.container}>
       <TouchableOpacity
@@ -80,36 +84,51 @@ export function ModelPicker<T>({
         <Text style={styles.chevron}>{open ? '▲' : '▼'}</Text>
       </TouchableOpacity>
 
-      {open && (
-        <ScrollView
-          style={[styles.dropdown, dropdownPosition]}
-          nestedScrollEnabled
-          keyboardShouldPersistTaps="handled"
-        >
-          {models.map((item) => {
-            const isSelected = item.value === selectedModel;
-            return (
-              <TouchableOpacity
-                key={item.label}
-                style={[styles.option, isSelected && styles.optionSelected]}
-                onPress={() => {
-                  onSelect(item.value);
-                  setOpen(false);
-                }}
-              >
-                <Text
-                  style={[
-                    styles.optionText,
-                    isSelected && styles.optionTextSelected,
-                  ]}
-                >
-                  {item.label}
-                </Text>
-              </TouchableOpacity>
-            );
-          })}
-        </ScrollView>
-      )}
+      <Modal
+        visible={open}
+        transparent
+        animationType="none"
+        onRequestClose={() => setOpen(false)}
+      >
+        <TouchableWithoutFeedback onPress={() => setOpen(false)}>
+          <View style={StyleSheet.absoluteFill}>
+            <ScrollView
+              style={[
+                styles.dropdown,
+                {
+                  top: dropdownLayout.y,
+                  left: dropdownLayout.x,
+                  width: dropdownLayout.width,
+                },
+              ]}
+              keyboardShouldPersistTaps="handled"
+            >
+              {models.map((item) => {
+                const isSelected = item.value === selectedModel;
+                return (
+                  <TouchableOpacity
+                    key={item.label}
+                    style={[styles.option, isSelected && styles.optionSelected]}
+                    onPress={() => {
+                      onSelect(item.value);
+                      setOpen(false);
+                    }}
+                  >
+                    <Text
+                      style={[
+                        styles.optionText,
+                        isSelected && styles.optionTextSelected,
+                      ]}
+                    >
+                      {item.label}
+                    </Text>
+                  </TouchableOpacity>
+                );
+              })}
+            </ScrollView>
+          </View>
+        </TouchableWithoutFeedback>
+      </Modal>
     </View>
   );
 }
@@ -119,7 +138,6 @@ const styles = StyleSheet.create({
     marginHorizontal: 12,
     marginVertical: 4,
     alignSelf: 'stretch',
-    zIndex: 100,
   },
   trigger: {
     flexDirection: 'row',
@@ -152,18 +170,15 @@ const styles = StyleSheet.create({
   },
   dropdown: {
     position: 'absolute',
-    left: 0,
-    right: 0,
     borderWidth: 1,
     borderColor: '#C1C6E5',
     borderRadius: 8,
     backgroundColor: '#fff',
     maxHeight: DROPDOWN_MAX_HEIGHT,
-    zIndex: 100,
-    elevation: 4,
+    elevation: 8,
     shadowColor: '#000',
     shadowOffset: { width: 0, height: 2 },
-    shadowOpacity: 0.1,
+    shadowOpacity: 0.15,
     shadowRadius: 4,
   },
   option: {
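
The placement rule that this diff introduces in the `measure` callback can be read more easily as a pure function. The sketch below restates that arithmetic outside React Native so it can be checked in isolation. `computeDropdownTop`, `TriggerRect`, `ROW_HEIGHT`, and `GAP` are names invented here; the diff uses the literals `42` and `2` inline, and reading `42` as an approximate option-row height is an assumption.

```typescript
// Constants mirror the diff. ROW_HEIGHT and GAP name the literals 42 and 2
// used there; treating 42 as an approximate row height is an assumption.
const DROPDOWN_MAX_HEIGHT = 300;
const ROW_HEIGHT = 42;
const GAP = 2;

// Hypothetical shape holding the two measure() values the rule depends on.
interface TriggerRect {
  pageY: number; // top edge of the trigger in window coordinates
  height: number; // height of the trigger
}

// Returns the window-relative top coordinate for the dropdown:
// just below the trigger when a full-height dropdown fits there,
// otherwise above it, offset by the estimated dropdown height.
function computeDropdownTop(
  trigger: TriggerRect,
  windowHeight: number,
  itemCount: number
): number {
  const spaceBelow = windowHeight - (trigger.pageY + trigger.height);
  if (spaceBelow >= DROPDOWN_MAX_HEIGHT) {
    return trigger.pageY + trigger.height + GAP;
  }
  const estimatedHeight = Math.min(DROPDOWN_MAX_HEIGHT, itemCount * ROW_HEIGHT);
  return trigger.pageY - estimatedHeight - GAP;
}

// Trigger near the top of an 800-pt window: dropdown opens downward.
const below = computeDropdownTop({ pageY: 100, height: 40 }, 800, 5);
// Trigger near the bottom: dropdown opens upward, sized for 5 rows.
const above = computeDropdownTop({ pageY: 700, height: 40 }, 800, 5);
```

Because the dropdown now renders inside a `Modal`, the coordinates are window-relative (`pageX`/`pageY`) rather than relative to the trigger, which is presumably why the old `top`/`bottom` offsets and `zIndex` styles are removed in this diff.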

apps/speech/package.json

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
     "metro-config": "^0.83.0",
     "react": "19.2.5",
     "react-native": "0.83.4",
-    "react-native-audio-api": "0.12.0",
+    "react-native-audio-api": "0.12.2",
     "react-native-device-info": "^15.0.2",
     "react-native-executorch": "workspace:*",
     "react-native-executorch-expo-resource-fetcher": "workspace:*",

apps/speech/screens/Quiz.tsx

Lines changed: 2 additions & 6 deletions
@@ -18,8 +18,7 @@ import Animated, {
 } from 'react-native-reanimated';
 import { SafeAreaProvider, SafeAreaView } from 'react-native-safe-area-context';
 import {
-  KOKORO_MEDIUM,
-  KOKORO_VOICE_AM_SANTA,
+  KOKORO_AMERICAN_ENGLISH_MALE_SANTA,
   useTextToSpeech,
 } from 'react-native-executorch';
 import {
@@ -60,10 +59,7 @@ const createAudioBufferFromVector = (
 
 export const Quiz = ({ onBack }: { onBack: () => void }) => {
   // --- Hooks & State ---
-  const model = useTextToSpeech({
-    model: KOKORO_MEDIUM,
-    voice: KOKORO_VOICE_AM_SANTA,
-  });
+  const model = useTextToSpeech(KOKORO_AMERICAN_ENGLISH_MALE_SANTA);
 
   const [shuffledQuestions] = useState(() => shuffleArray(QUESTIONS));
   const [currentIndex, setCurrentIndex] = useState(0);

apps/speech/screens/TextToSpeechLLMScreen.tsx

Lines changed: 2 additions & 6 deletions
@@ -12,8 +12,7 @@ import SWMIcon from '../assets/swm_icon.svg';
 import {
   useLLM,
   useTextToSpeech,
-  KOKORO_MEDIUM,
-  KOKORO_VOICE_AF_HEART,
+  KOKORO_AMERICAN_ENGLISH_FEMALE_HEART,
   LLAMA3_2_1B_QLORA,
 } from 'react-native-executorch';
 import {
@@ -54,10 +53,7 @@ export const TextToSpeechLLMScreen = ({ onBack }: TextToSpeechLLMProps) => {
   const [displayText, setDisplayText] = useState('');
   const [isTtsStreaming, setIsTtsStreaming] = useState(false);
   const llm = useLLM({ model: LLAMA3_2_1B_QLORA });
-  const tts = useTextToSpeech({
-    model: KOKORO_MEDIUM,
-    voice: KOKORO_VOICE_AF_HEART,
-  });
+  const tts = useTextToSpeech(KOKORO_AMERICAN_ENGLISH_FEMALE_HEART);
 
   const processedLengthRef = useRef(0);
   const audioContextRef = useRef<AudioContext | null>(null);
