Commit 04a71b4

Update docs & API reference
1 parent 081aea0 commit 04a71b4

43 files changed: 444 additions and 152 deletions.

docs/docs/03-hooks/01-natural-language-processing/useSpeechToText.md

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ Since speech-to-text models can only process audio segments up to 30 seconds lon

 `useSpeechToText` takes [`SpeechToTextProps`](../../06-api-reference/interfaces/SpeechToTextProps.md) that consists of:

-- `model` of type [`SpeechToTextConfig`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md), containing the [`isMultilingual` flag](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#ismultilingual), [tokenizer source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#tokenizersource), [encoder source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#encodersource), and [decoder source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#decodersource).
+- `model` of type [`SpeechToTextConfig`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md), containing the [`isMultilingual` flag](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#ismultilingual), [tokenizer source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#tokenizersource) and [model source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#modelsource).
 - An optional flag [`preventLoad`](../../06-api-reference/interfaces/SpeechToTextProps.md#preventload) which prevents auto-loading of the model.

 You need more details? Check the following resources:

docs/docs/04-typescript-api/01-natural-language-processing/SpeechToTextModule.md

Lines changed: 1 addition & 3 deletions
@@ -45,9 +45,7 @@ Create an instance of [`SpeechToTextModule`](../../06-api-reference/classes/Spee
 - [`model`](../../06-api-reference/classes/SpeechToTextModule.md#model) - Object containing:
 - [`isMultilingual`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#ismultilingual) - Flag indicating if model is multilingual.

-- [`encoderSource`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#encodersource) - The location of the used encoder.
-
-- [`decoderSource`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#decodersource) - The location of the used decoder.
+- [`modelSource`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#modelsource) - The location of the used model (bundled encoder + decoder functionality).

 - [`tokenizerSource`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#tokenizersource) - The location of the used tokenizer.
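To make the config change in this hunk concrete, here is a minimal sketch of moving from the old shape (separate `encoderSource` and `decoderSource`) to the new one (a single `modelSource`). The interface names, the `ResourceSource` stand-in, the `migrateSttConfig` helper and the example URLs are all hypothetical and only mirror the fields named in the diff; they are not imported from the library.

```typescript
// Hypothetical stand-in for the library's ResourceSource alias; the real
// definition lives in the package and may differ.
type ResourceSource = string | number | object;

// Shape described by the removed lines: separate encoder and decoder files.
interface LegacySttModelConfig {
  isMultilingual: boolean;
  encoderSource: ResourceSource;
  decoderSource: ResourceSource;
  tokenizerSource: ResourceSource;
}

// Shape described by the added lines: one bundled model file.
interface BundledSttModelConfig {
  isMultilingual: boolean;
  modelSource: ResourceSource;
  tokenizerSource: ResourceSource;
}

// Hypothetical helper, not part of the library. The bundled .pte file cannot
// be derived from the old encoder/decoder files, so the caller must supply
// its location explicitly.
function migrateSttConfig(
  legacy: LegacySttModelConfig,
  bundledModelSource: ResourceSource
): BundledSttModelConfig {
  return {
    isMultilingual: legacy.isMultilingual,
    modelSource: bundledModelSource,
    tokenizerSource: legacy.tokenizerSource,
  };
}

// Example with placeholder URLs (assumptions, not real asset locations).
const migrated = migrateSttConfig(
  {
    isMultilingual: false,
    encoderSource: 'https://example.com/encoder.pte',
    decoderSource: 'https://example.com/decoder.pte',
    tokenizerSource: 'https://example.com/tokenizer.json',
  },
  'https://example.com/model.pte'
);
```

The helper drops the two old fields rather than keeping them around, so a stale `encoderSource` cannot silently survive in the migrated object.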

docs/docs/06-api-reference/classes/SpeechToTextModule.md

Lines changed: 9 additions & 9 deletions
@@ -1,6 +1,6 @@
 # Class: SpeechToTextModule

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:16](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L16)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:15](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L15)

 Module for Speech to Text (STT) functionalities.

@@ -20,7 +20,7 @@ Module for Speech to Text (STT) functionalities.

 > **decode**(`tokens`, `encoderOutput`): `Promise`\<`Float32Array`\<`ArrayBufferLike`\>\>

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:91](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L91)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:83](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L83)

 Runs the decoder of the model.

@@ -50,7 +50,7 @@ Decoded output.

 > **delete**(): `void`

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:69](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L69)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:60](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L60)

 Unloads the model from memory.

@@ -64,7 +64,7 @@ Unloads the model from memory.

 > **encode**(`waveform`): `Promise`\<`Float32Array`\<`ArrayBufferLike`\>\>

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:80](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L80)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:71](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L71)

 Runs the encoding part of the model on the provided waveform.
 Returns the encoded waveform as a Float32Array.
@@ -89,7 +89,7 @@ The encoded output.

 > **load**(`model`, `onDownloadProgressCallback?`): `Promise`\<`void`\>

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:27](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L27)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:26](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L26)

 Loads the model specified by the config object.
 `onDownloadProgressCallback` allows you to monitor the current progress of the model download.
@@ -118,7 +118,7 @@ Optional callback to monitor download progress.

 > **stream**(`options?`): `AsyncGenerator`\<\{ `committed`: [`TranscriptionResult`](../interfaces/TranscriptionResult.md); `nonCommitted`: [`TranscriptionResult`](../interfaces/TranscriptionResult.md); \}\>

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:133](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L133)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:124](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L124)

 Starts a streaming transcription session.
 Yields objects with `committed` and `nonCommitted` transcriptions.
@@ -148,7 +148,7 @@ An async generator yielding transcription updates.

 > **streamInsert**(`waveform`): `void`

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:206](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L206)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:197](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L197)

 Inserts a new audio chunk into the streaming transcription session.

@@ -170,7 +170,7 @@ The audio chunk to insert.

 > **streamStop**(): `void`

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:213](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L213)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:204](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L204)

 Stops the current streaming transcription session.

@@ -184,7 +184,7 @@ Stops the current streaming transcription session.

 > **transcribe**(`waveform`, `options?`): `Promise`\<[`TranscriptionResult`](../interfaces/TranscriptionResult.md)\>

-Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:109](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L109)
+Defined in: [modules/natural_language_processing/SpeechToTextModule.ts:100](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts#L100)

 Starts a transcription process for a given input array (16kHz waveform).
 For multilingual models, specify the language in `options`.
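The docs touched by this commit mention two numbers: a 16 kHz input waveform and a limit of about 30 seconds per processed audio segment. The sketch below only illustrates the arithmetic behind those numbers by naively cutting a waveform into segments of at most 30 s. `splitWaveform` is a hypothetical helper, not a library function, and the library may well segment long audio internally in a different (for example overlapping) way.

```typescript
// Values taken from the docs in this diff: 16 kHz input, ~30 s per segment.
const SAMPLE_RATE_HZ = 16000;
const MAX_SEGMENT_SECONDS = 30;

// Hypothetical helper: naive, non-overlapping split into fixed-size segments.
// Each segment is a view (subarray) on the original buffer, so no samples
// are copied.
function splitWaveform(
  waveform: Float32Array,
  maxSamples: number = SAMPLE_RATE_HZ * MAX_SEGMENT_SECONDS
): Float32Array[] {
  if (!Number.isInteger(maxSamples) || maxSamples <= 0) {
    throw new RangeError('maxSamples must be a positive integer');
  }
  const segments: Float32Array[] = [];
  for (let start = 0; start < waveform.length; start += maxSamples) {
    const end = Math.min(start + maxSamples, waveform.length);
    segments.push(waveform.subarray(start, end));
  }
  return segments;
}

// 70 seconds of silence: two full 30 s segments plus a 10 s remainder.
const seventySeconds = new Float32Array(SAMPLE_RATE_HZ * 70);
const segments = splitWaveform(seventySeconds);
```

A hard cut like this can split a word in half at a segment boundary, which is one reason real pipelines tend to prefer overlapping windows or silence-based cut points.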

docs/docs/06-api-reference/classes/TextToSpeechModule.md

Lines changed: 64 additions & 7 deletions
@@ -1,6 +1,6 @@
 # Class: TextToSpeechModule

-Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:17](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L17)
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:18](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L18)

 Module for Text to Speech (TTS) functionalities.

@@ -20,7 +20,7 @@ Module for Text to Speech (TTS) functionalities.

 > **nativeModule**: `any` = `null`

-Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:21](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L21)
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:22](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L22)

 Native module instance

@@ -30,7 +30,7 @@ Native module instance

 > **delete**(): `void`

-Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:182](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L182)
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:229](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L229)

 Unloads the model from memory.

@@ -44,7 +44,7 @@ Unloads the model from memory.

 > **forward**(`text`, `speed?`): `Promise`\<`Float32Array`\<`ArrayBufferLike`\>\>

-Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:109](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L109)
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:118](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L118)

 Synthesizes the provided text into speech.
 Returns a promise that resolves to the full audio waveform as a `Float32Array`.
@@ -71,11 +71,43 @@ A promise resolving to the synthesized audio waveform.

 ---

+### forwardFromPhonemes()
+
+> **forwardFromPhonemes**(`phonemes`, `speed?`): `Promise`\<`Float32Array`\<`ArrayBufferLike`\>\>
+
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:135](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L135)
+
+Synthesizes pre-computed phonemes into speech, bypassing the built-in phonemizer.
+This allows using an external G2P system (e.g. the Python `phonemizer` library,
+espeak-ng, or any custom phonemizer).
+
+#### Parameters
+
+##### phonemes
+
+`string`
+
+The pre-computed IPA phoneme string.
+
+##### speed?
+
+`number` = `1.0`
+
+Optional speed multiplier for the speech synthesis (default is 1.0).
+
+#### Returns
+
+`Promise`\<`Float32Array`\<`ArrayBufferLike`\>\>
+
+A promise resolving to the synthesized audio waveform.
+
+---
+
 ### load()

 > **load**(`config`, `onDownloadProgressCallback?`): `Promise`\<`void`\>

-Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:30](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L30)
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:31](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L31)

 Loads the model and voice assets specified by the config object.
 `onDownloadProgressCallback` allows you to monitor the current progress.
@@ -104,7 +136,7 @@ Optional callback to monitor download progress.

 > **stream**(`input`): `AsyncGenerator`\<`Float32Array`\<`ArrayBufferLike`\>\>

-Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:127](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L127)
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:196](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L196)

 Starts a streaming synthesis session. Yields audio chunks as they are generated.

@@ -124,11 +156,36 @@ An async generator yielding Float32Array audio chunks.

 ---

+### streamFromPhonemes()
+
+> **streamFromPhonemes**(`input`): `AsyncGenerator`\<`Float32Array`\<`ArrayBufferLike`\>\>
+
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:210](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L210)
+
+Starts a streaming synthesis session from pre-computed phonemes.
+Bypasses the built-in phonemizer, allowing use of external G2P systems.
+
+#### Parameters
+
+##### input
+
+[`TextToSpeechStreamingPhonemeInput`](../interfaces/TextToSpeechStreamingPhonemeInput.md)
+
+Input object containing phonemes and optional speed.
+
+#### Returns
+
+`AsyncGenerator`\<`Float32Array`\<`ArrayBufferLike`\>\>
+
+An async generator yielding Float32Array audio chunks.
+
+---
+
 ### streamStop()

 > **streamStop**(): `void`

-Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:175](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L175)
+Defined in: [modules/natural_language_processing/TextToSpeechModule.ts:222](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/modules/natural_language_processing/TextToSpeechModule.ts#L222)

 Stops the streaming process if there is any ongoing.
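The new `streamFromPhonemes()` method yields audio in `Float32Array` chunks, so a caller who wants one waveform has to collect and join them. Below is a minimal sketch of that consumption pattern. The `PhonemeStreamInput` field names (`phonemes`, `speed`) are an assumption based on the description "phonemes and optional speed", and `PhonemeStreamer`, `concatChunks`, `synthesizeFromPhonemes` and `fakeTts` are hypothetical names used only for illustration; nothing here is imported from the library.

```typescript
// Assumed input shape: the diff only states that the input contains
// "phonemes and optional speed", so these field names are a guess.
interface PhonemeStreamInput {
  phonemes: string;
  speed?: number;
}

// Minimal structural type covering just the documented method signature.
interface PhonemeStreamer {
  streamFromPhonemes(input: PhonemeStreamInput): AsyncGenerator<Float32Array>;
}

// Joins audio chunks into one contiguous waveform.
function concatChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Drains the async generator and returns the full waveform.
async function synthesizeFromPhonemes(
  tts: PhonemeStreamer,
  phonemes: string,
  speed: number = 1.0
): Promise<Float32Array> {
  const chunks: Float32Array[] = [];
  for await (const chunk of tts.streamFromPhonemes({ phonemes, speed })) {
    chunks.push(chunk);
  }
  return concatChunks(chunks);
}

// Fake streamer for local experiments; a real module instance would go here.
const fakeTts: PhonemeStreamer = {
  async *streamFromPhonemes(_input: PhonemeStreamInput) {
    yield new Float32Array([1, 2]);
    yield new Float32Array([3]);
  },
};

// Usage: synthesizeFromPhonemes(fakeTts, 'həˈloʊ').then((audio) => { /* play or save */ });
```

In a playback scenario it is usually better to hand each chunk to the audio output as it arrives; joining is shown here because it is the simplest way to end up with the same kind of result that `forwardFromPhonemes()` is documented to return.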

docs/docs/06-api-reference/functions/useTextToSpeech.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 > **useTextToSpeech**(`TextToSpeechProps`): [`TextToSpeechType`](../interfaces/TextToSpeechType.md)

-Defined in: [hooks/natural_language_processing/useTextToSpeech.ts:19](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/hooks/natural_language_processing/useTextToSpeech.ts#L19)
+Defined in: [hooks/natural_language_processing/useTextToSpeech.ts:22](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/hooks/natural_language_processing/useTextToSpeech.ts#L22)

 React hook for managing Text to Speech instance.

docs/docs/06-api-reference/index.md

Lines changed: 3 additions & 1 deletion
@@ -101,7 +101,6 @@
 - [WHISPER_SMALL_EN](variables/WHISPER_SMALL_EN.md)
 - [WHISPER_TINY](variables/WHISPER_TINY.md)
 - [WHISPER_TINY_EN](variables/WHISPER_TINY_EN.md)
-- [WHISPER_TINY_EN_QUANTIZED](variables/WHISPER_TINY_EN_QUANTIZED.md)

 ## Models - Style Transfer

@@ -262,8 +261,11 @@
 - [TextToImageType](interfaces/TextToImageType.md)
 - [TextToSpeechConfig](interfaces/TextToSpeechConfig.md)
 - [TextToSpeechInput](interfaces/TextToSpeechInput.md)
+- [TextToSpeechPhonemeInput](interfaces/TextToSpeechPhonemeInput.md)
 - [TextToSpeechProps](interfaces/TextToSpeechProps.md)
+- [TextToSpeechStreamingCallbacks](interfaces/TextToSpeechStreamingCallbacks.md)
 - [TextToSpeechStreamingInput](interfaces/TextToSpeechStreamingInput.md)
+- [TextToSpeechStreamingPhonemeInput](interfaces/TextToSpeechStreamingPhonemeInput.md)
 - [TextToSpeechType](interfaces/TextToSpeechType.md)
 - [TokenizerProps](interfaces/TokenizerProps.md)
 - [TokenizerType](interfaces/TokenizerType.md)

docs/docs/06-api-reference/interfaces/SpeechToTextModelConfig.md

Lines changed: 17 additions & 17 deletions
@@ -6,40 +6,40 @@ Configuration for Speech to Text model.

 ## Properties

-### decoderSource
+### isMultilingual

-> **decoderSource**: [`ResourceSource`](../type-aliases/ResourceSource.md)
+> **isMultilingual**: `boolean`

-Defined in: [types/stt.ts:277](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/stt.ts#L277)
+Defined in: [types/stt.ts:269](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/stt.ts#L269)

-A string that specifies the location of a `.pte` file for the decoder.
+A boolean flag indicating whether the model supports multiple languages.

 ---

-### encoderSource
+### modelSource
+
+> **modelSource**: [`ResourceSource`](../type-aliases/ResourceSource.md)

-> **encoderSource**: [`ResourceSource`](../type-aliases/ResourceSource.md)
+Defined in: [types/stt.ts:276](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/stt.ts#L276)

-Defined in: [types/stt.ts:272](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/stt.ts#L272)
+A string that specifies the location of a `.pte` file for the model.

-A string that specifies the location of a `.pte` file for the encoder.
+We expect the model to have 2 bundled methods: 'decode' and 'encode'.

 ---

-### isMultilingual
+### tokenizerSource

-> **isMultilingual**: `boolean`
+> **tokenizerSource**: [`ResourceSource`](../type-aliases/ResourceSource.md)

-Defined in: [types/stt.ts:267](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/stt.ts#L267)
+Defined in: [types/stt.ts:281](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/stt.ts#L281)

-A boolean flag indicating whether the model supports multiple languages.
+A string that specifies the location to the tokenizer for the model.

 ---

-### tokenizerSource
+### type

-> **tokenizerSource**: [`ResourceSource`](../type-aliases/ResourceSource.md)
-
-Defined in: [types/stt.ts:282](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/stt.ts#L282)
+> **type**: `"whisper"`

-A string that specifies the location to the tokenizer for the model.
+Defined in: [types/stt.ts:264](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/stt.ts#L264)
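After this hunk the interface documents four properties: `type` (the literal `"whisper"`), `isMultilingual`, `modelSource` and `tokenizerSource`. As a rough illustration of that shape, the sketch below defines a local copy of the interface and a runtime guard that rejects objects still using the removed `encoderSource`/`decoderSource` fields. `SttModelConfigSketch`, `isSttModelConfig` and the `ResourceSource` stand-in are hypothetical and written for this example; the library's own types remain the source of truth.

```typescript
// Hypothetical stand-in for the library's ResourceSource alias.
type ResourceSource = string | number | object;

// Local copy of the four properties documented after this change.
interface SttModelConfigSketch {
  type: 'whisper';
  isMultilingual: boolean;
  modelSource: ResourceSource;
  tokenizerSource: ResourceSource;
}

function isResourceSource(value: unknown): value is ResourceSource {
  return (
    typeof value === 'string' ||
    typeof value === 'number' ||
    (typeof value === 'object' && value !== null)
  );
}

// Hypothetical runtime guard: checks the discriminant and the required
// fields, so configs written for the old encoder/decoder layout fail early.
function isSttModelConfig(value: unknown): value is SttModelConfigSketch {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return (
    candidate.type === 'whisper' &&
    typeof candidate.isMultilingual === 'boolean' &&
    isResourceSource(candidate.modelSource) &&
    isResourceSource(candidate.tokenizerSource)
  );
}

// Placeholder URLs, not real asset locations.
const newStyle = {
  type: 'whisper',
  isMultilingual: true,
  modelSource: 'https://example.com/model.pte',
  tokenizerSource: 'https://example.com/tokenizer.json',
};
const oldStyle = {
  isMultilingual: true,
  encoderSource: 'https://example.com/encoder.pte',
  decoderSource: 'https://example.com/decoder.pte',
  tokenizerSource: 'https://example.com/tokenizer.json',
};
```

A guard like this is mostly useful for configs that arrive as untyped data (for example from JSON), where the compiler cannot flag the missing `modelSource` on its own.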
