docs/docs/03-hooks/01-natural-language-processing/useTextToSpeech.md
+55 -6 (55 additions & 6 deletions)
@@ -82,17 +82,24 @@ You need more details? Check the following resources:

 ## Running the model

-The module provides two ways to generate speech:
+The module provides two ways to generate speech using either raw text or pre-generated phonemes:

-1. [**`forward(text, speed)`**](../../06-api-reference/interfaces/TextToSpeechType.md#forward): Generates the complete audio waveform at once. Returns a promise resolving to a `Float32Array`.
+### Using Text
+
+1. [**`forward({ text, speed })`**](../../06-api-reference/interfaces/TextToSpeechType.md#forward): Generates the complete audio waveform at once. Returns a promise resolving to a `Float32Array`.
+2. [**`stream({ text, speed, onNext, ... })`**](../../06-api-reference/interfaces/TextToSpeechType.md#stream): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences.
+
+### Using Phonemes
+
+If you have pre-computed phonemes (e.g., from an external dictionary or a custom G2P model), you can skip the internal phoneme generation step:
+
+1. [**`forwardFromPhonemes({ phonemes, speed })`**](../../06-api-reference/interfaces/TextToSpeechType.md#forwardfromphonemes): Generates the complete audio waveform from a phoneme string.
+2. [**`streamFromPhonemes({ phonemes, speed, onNext, ... })`**](../../06-api-reference/interfaces/TextToSpeechType.md#streamfromphonemes): Streams audio chunks generated from a phoneme string.

 :::note
-Since it processes the entire text at once, it might take a significant amount of time to produce an audio for long text inputs.
+Since `forward` and `forwardFromPhonemes` process the entire input at once, they might take a significant amount of time to produce audio for long inputs.
 :::

-2. [**`stream({ text, speed })`**](../../06-api-reference/interfaces/TextToSpeechType.md#stream): An async generator that yields chunks of audio as they are computed.
-This is ideal for reducing the "time to first audio" for long sentences.
-
 ## Example

 ### Speech Synthesis
@@ -185,6 +192,48 @@ export default function App() {
 }
 ```

+### Synthesis from Phonemes
+
+If you already have a phoneme string obtained from an external source (e.g. the Python `phonemizer` library,
+`espeak-ng`, or any custom phonemizer), you can use `forwardFromPhonemes` or `streamFromPhonemes` to synthesize audio directly, skipping the phoneme generation stage.
docs/docs/04-typescript-api/01-natural-language-processing/TextToSpeechModule.md
+43 -4 (43 additions & 4 deletions)
@@ -53,16 +53,24 @@ For more information on resource sources, see [loading models](../../01-fundamen

 ## Running the model

-The module provides two ways to generate speech:
+The module provides two ways to generate speech using either raw text or pre-generated phonemes:
+
+### Using Text

 1. [**`forward(text, speed)`**](../../06-api-reference/classes/TextToSpeechModule.md#forward): Generates the complete audio waveform at once. Returns a promise resolving to a `Float32Array`.
+2. [**`stream({ text, speed })`**](../../06-api-reference/classes/TextToSpeechModule.md#stream): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences.
+
+### Using Phonemes
+
+If you have pre-computed phonemes (e.g., from an external dictionary or a custom G2P model), you can skip the internal phoneme generation step:
+
+1. [**`forwardFromPhonemes(phonemes, speed)`**](../../06-api-reference/classes/TextToSpeechModule.md#forwardfromphonemes): Generates the complete audio waveform from a phoneme string.
+2. [**`streamFromPhonemes({ phonemes, speed })`**](../../06-api-reference/classes/TextToSpeechModule.md#streamfromphonemes): Streams audio chunks generated from a phoneme string.

 :::note
-Since it processes the entire text at once, it might take a significant amount of time to produce an audio for long text inputs.
+Since `forward` and `forwardFromPhonemes` process the entire input at once, they might take a significant amount of time to produce audio for long inputs.
 :::

-2. [**`stream({ text, speed })`**](../../06-api-reference/classes/TextToSpeechModule.md#stream): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences.
-
 ## Example

 ### Speech Synthesis
@@ -135,3 +143,34 @@ try {
 console.error('Streaming failed:', error);
 }
 ```
+
+### Synthesis from Phonemes
+
+If you already have a phoneme string (e.g., from an external library), you can use `forwardFromPhonemes` or `streamFromPhonemes` to synthesize audio directly, skipping the internal phonemizer stage.
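
The added lines shown for this hunk stop at the paragraph above, so here is a minimal sketch of how the two phoneme entry points might be used together. It is not the example from the PR. The method names and argument shapes (`forwardFromPhonemes(phonemes, speed)` and `streamFromPhonemes({ phonemes, speed })`) come from the module docs in this diff. Everything else is an assumption made for illustration: the `PhonemeSynthesizer` interface, the `StubSynthesizer` stand-in, the `collectStream` helper, and the idea that `streamFromPhonemes` is an async generator yielding `Float32Array` chunks (the docs state this only for `stream`).

```typescript
// Hypothetical structural type covering only the two methods named in the diff.
// The real TextToSpeechModule class may differ in types and options.
interface PhonemeSynthesizer {
  forwardFromPhonemes(phonemes: string, speed: number): Promise<Float32Array>;
  streamFromPhonemes(input: {
    phonemes: string;
    speed: number;
  }): AsyncGenerator<Float32Array>;
}

// Hypothetical helper: drains the stream and joins the chunks into one waveform.
// Useful when you want streaming latency for playback but also need the full
// buffer afterwards (for example to save it).
async function collectStream(
  tts: PhonemeSynthesizer,
  phonemes: string,
  speed: number = 1.0
): Promise<Float32Array> {
  const chunks: Float32Array[] = [];
  let total = 0;
  for await (const chunk of tts.streamFromPhonemes({ phonemes, speed })) {
    chunks.push(chunk);
    total += chunk.length;
  }
  // Copy each chunk into its slot in a single contiguous buffer.
  const waveform = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    waveform.set(chunk, offset);
    offset += chunk.length;
  }
  return waveform;
}

// Stand-in so the sketch runs without a model. It produces fake "audio":
// one constant sample per character. It ignores speed entirely.
class StubSynthesizer implements PhonemeSynthesizer {
  async forwardFromPhonemes(
    phonemes: string,
    _speed: number
  ): Promise<Float32Array> {
    return new Float32Array(phonemes.length).fill(0.25);
  }

  async *streamFromPhonemes(input: {
    phonemes: string;
    speed: number;
  }): AsyncGenerator<Float32Array> {
    // One chunk per space-separated group of phonemes.
    for (const group of input.phonemes.split(' ')) {
      yield new Float32Array(group.length).fill(0.25);
    }
  }
}

// Usage: both calls take the same phoneme string; only the delivery differs.
async function demo(): Promise<{ whole: Float32Array; joined: Float32Array }> {
  const tts: PhonemeSynthesizer = new StubSynthesizer();
  const phonemes = 'həˈloʊ wɜːld'; // e.g. produced by an external phonemizer
  const whole = await tts.forwardFromPhonemes(phonemes, 1.0);
  const joined = await collectStream(tts, phonemes, 1.0);
  return { whole, joined };
}
```

Typing the helper against a small interface rather than the concrete module class keeps it testable with a stub and avoids committing to details the diff does not show.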