Commit 377f659

committed
Update docs & formatting
1 parent 5a125c7 commit 377f659

4 files changed

Lines changed: 101 additions & 14 deletions

File tree

.cspell-wordlist.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -127,3 +127,4 @@ detr
127127
metaprogramming
128128
ktlint
129129
lefthook
130+
espeak

docs/docs/03-hooks/01-natural-language-processing/useTextToSpeech.md

Lines changed: 55 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -82,17 +82,24 @@ You need more details? Check the following resources:
8282

8383
## Running the model
8484

85-
The module provides two ways to generate speech:
85+
The module can generate speech from either raw text or pre-generated phonemes:
8686

87-
1. [**`forward(text, speed)`**](../../06-api-reference/interfaces/TextToSpeechType.md#forward): Generates the complete audio waveform at once. Returns a promise resolving to a `Float32Array`.
87+
### Using Text
88+
89+
1. [**`forward({ text, speed })`**](../../06-api-reference/interfaces/TextToSpeechType.md#forward): Generates the complete audio waveform at once. Returns a promise resolving to a `Float32Array`.
90+
2. [**`stream({ text, speed, onNext, ... })`**](../../06-api-reference/interfaces/TextToSpeechType.md#stream): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences.
91+
92+
### Using Phonemes
93+
94+
If you have pre-computed phonemes (e.g., from an external dictionary or a custom G2P model), you can skip the internal phoneme generation step:
95+
96+
1. [**`forwardFromPhonemes({ phonemes, speed })`**](../../06-api-reference/interfaces/TextToSpeechType.md#forwardfromphonemes): Generates the complete audio waveform from a phoneme string.
97+
2. [**`streamFromPhonemes({ phonemes, speed, onNext, ... })`**](../../06-api-reference/interfaces/TextToSpeechType.md#streamfromphonemes): Streams audio chunks generated from a phoneme string.
8898

8999
:::note
90-
Since it processes the entire text at once, it might take a significant amount of time to produce an audio for long text inputs.
100+
Since `forward` and `forwardFromPhonemes` process the entire input at once, they might take a significant amount of time to produce audio for long inputs.
91101
:::
92102

93-
2. [**`stream({ text, speed })`**](../../06-api-reference/interfaces/TextToSpeechType.md#stream): An async generator that yields chunks of audio as they are computed.
94-
This is ideal for reducing the "time to first audio" for long sentences.
95-
96103
## Example
97104

98105
### Speech Synthesis
@@ -185,6 +192,48 @@ export default function App() {
185192
}
186193
```
187194

195+
### Synthesis from Phonemes
196+
197+
If you already have a phoneme string obtained from an external source (e.g. the Python `phonemizer` library,
198+
`espeak-ng`, or any custom phonemizer), you can use `forwardFromPhonemes` or `streamFromPhonemes` to synthesize audio directly, skipping the phoneme generation stage.
199+
200+
```tsx
201+
import React from 'react';
202+
import { Button, View } from 'react-native';
203+
import {
204+
useTextToSpeech,
205+
KOKORO_MEDIUM,
206+
KOKORO_VOICE_AF_HEART,
207+
} from 'react-native-executorch';
208+
209+
export default function App() {
210+
const tts = useTextToSpeech({
211+
model: KOKORO_MEDIUM,
212+
voice: KOKORO_VOICE_AF_HEART,
213+
});
214+
215+
const synthesizePhonemes = async () => {
216+
// Example phonemes for "A man who doesn't trust himself, can never really trust anyone else."
217+
const audioData = await tts.forwardFromPhonemes({
218+
phonemes:
219+
'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
220+
});
221+
222+
// ... process or play audioData ...
223+
};
224+
225+
return (
226+
<View style={{ flex: 1, justifyContent: 'center', alignItems: 'center' }}>
227+
<Button
228+
title="Synthesize Phonemes"
229+
onPress={synthesizePhonemes}
230+
disabled={!tts.isReady}
231+
/>
232+
</View>
233+
);
234+
}
235+
```
236+
188237
## Supported models
189238

190239
| Model | Language |

docs/docs/04-typescript-api/01-natural-language-processing/TextToSpeechModule.md

Lines changed: 43 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -53,16 +53,24 @@ For more information on resource sources, see [loading models](../../01-fundamen
5353

5454
## Running the model
5555

56-
The module provides two ways to generate speech:
56+
The module can generate speech from either raw text or pre-generated phonemes:
57+
58+
### Using Text
5759

5860
1. [**`forward(text, speed)`**](../../06-api-reference/classes/TextToSpeechModule.md#forward): Generates the complete audio waveform at once. Returns a promise resolving to a `Float32Array`.
61+
2. [**`stream({ text, speed })`**](../../06-api-reference/classes/TextToSpeechModule.md#stream): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences.
62+
63+
### Using Phonemes
64+
65+
If you have pre-computed phonemes (e.g., from an external dictionary or a custom G2P model), you can skip the internal phoneme generation step:
66+
67+
1. [**`forwardFromPhonemes(phonemes, speed)`**](../../06-api-reference/classes/TextToSpeechModule.md#forwardfromphonemes): Generates the complete audio waveform from a phoneme string.
68+
2. [**`streamFromPhonemes({ phonemes, speed })`**](../../06-api-reference/classes/TextToSpeechModule.md#streamfromphonemes): Streams audio chunks generated from a phoneme string.
5969

6070
:::note
61-
Since it processes the entire text at once, it might take a significant amount of time to produce an audio for long text inputs.
71+
Since `forward` and `forwardFromPhonemes` process the entire input at once, they might take a significant amount of time to produce audio for long inputs.
6272
:::
6373

64-
2. [**`stream({ text, speed })`**](../../06-api-reference/classes/TextToSpeechModule.md#stream): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences.
65-
6674
## Example
6775

6876
### Speech Synthesis
@@ -135,3 +143,34 @@ try {
135143
console.error('Streaming failed:', error);
136144
}
137145
```
146+
147+
### Synthesis from Phonemes
148+
149+
If you already have a phoneme string (e.g., from an external library), you can use `forwardFromPhonemes` or `streamFromPhonemes` to synthesize audio directly, skipping the internal phonemizer stage.
150+
151+
```typescript
152+
import {
153+
TextToSpeechModule,
154+
KOKORO_MEDIUM,
155+
KOKORO_VOICE_AF_HEART,
156+
} from 'react-native-executorch';
157+
158+
const tts = new TextToSpeechModule();
159+
160+
await tts.load({
161+
model: KOKORO_MEDIUM,
162+
voice: KOKORO_VOICE_AF_HEART,
163+
});
164+
165+
// Example phonemes for "Hello world!"
166+
const waveform = await tts.forwardFromPhonemes('həlˈO wˈɜɹld!', 1.0);
167+
168+
// Or stream from phonemes
169+
for await (const chunk of tts.streamFromPhonemes({
170+
phonemes:
171+
'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
172+
speed: 1.0,
173+
})) {
174+
// ... process chunk ...
175+
}
176+
```
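
The streaming loop in the example above leaves chunk handling open. A minimal sketch of one option follows: buffering the streamed chunks and joining them into a single waveform. It assumes each chunk is a `Float32Array`, which the docs state for `forward` but not explicitly for the streaming methods. `concatChunks`, `collectChunks`, and `fakeStream` are hypothetical helpers, not part of `react-native-executorch`.

```typescript
// Hypothetical helper: joins audio chunks into one contiguous buffer.
function concatChunks(parts: Float32Array[]): Float32Array {
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const part of parts) {
    merged.set(part, offset); // copy this chunk at the current write position
    offset += part.length;
  }
  return merged;
}

// Hypothetical helper: drains any async iterable of chunks, then joins them.
// Assumption: the streaming generator yields Float32Array chunks.
async function collectChunks(
  chunks: AsyncIterable<Float32Array>
): Promise<Float32Array> {
  const parts: Float32Array[] = [];
  for await (const chunk of chunks) {
    parts.push(chunk);
  }
  return concatChunks(parts);
}

// Stand-in for a streaming call, used only to exercise the helpers.
async function* fakeStream(): AsyncGenerator<Float32Array> {
  yield new Float32Array([0.5, 0.25]);
  yield new Float32Array([0.125]);
}
```

Buffering everything gives up the latency benefit of streaming, so this pattern fits cases such as saving the result to a file after playback rather than the playback path itself.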

packages/react-native-executorch/src/types/tts.ts

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -203,8 +203,7 @@ export interface TextToSpeechStreamingCallbacks {
203203
* @category Types
204204
*/
205205
export interface TextToSpeechStreamingInput
206-
extends TextToSpeechInput,
207-
TextToSpeechStreamingCallbacks {}
206+
extends TextToSpeechInput, TextToSpeechStreamingCallbacks {}
208207

209208
/**
210209
* Streaming input definition for pre-computed phonemes.
@@ -213,5 +212,4 @@ export interface TextToSpeechStreamingInput
213212
* @category Types
214213
*/
215214
export interface TextToSpeechStreamingPhonemeInput
216-
extends TextToSpeechPhonemeInput,
217-
TextToSpeechStreamingCallbacks {}
215+
extends TextToSpeechPhonemeInput, TextToSpeechStreamingCallbacks {}
