docs/docs/03-hooks/01-natural-language-processing/useSpeechToText.md
+18 -32 (18 additions & 32 deletions)
@@ -49,7 +49,7 @@ import { AudioContext } from 'react-native-audio-api';
 import * as FileSystem from 'expo-file-system';
 
 const model = useSpeechToText({
-  model: models.speech_to_text.whisper_tiny_en(),
+  model: models.speech_to_text.whisper_tiny_en(), // Use whisper_tiny_en for English or whisper_tiny for multilingual support
 });
 
 // 1. Get audio file
@@ -89,8 +89,13 @@ The `stream()` function accepts several optional parameters:
 
 - `language`: The language code (e.g., `'es'`, `'fr'`). Required for multilingual models.
 - `verbose`: If `true`, includes word-level timestamps and segment metadata in the result objects.
+- `useVAD`: Enable the Voice Activity Detection submodule (if configured in `useSpeechToText` props) to optimize performance by filtering silence. Defaults to `false`.
 - `timeout`: (Advanced) The interval (in milliseconds) between processing consecutive audio chunks in streaming mode. Lower values provide more frequent updates and lower latency, while higher values reduce CPU consumption. Defaults to `100`.
-- `useVAD`: Enable the Voice Activity Detection submodule (if configured in `useSpeechToText` props) to optimize performance by filtering silence.
+- `vadDetectionMargin`: (Advanced) The duration of silence (in milliseconds) required after speech is detected before "committing" a segment. Defaults to `500`. Only active when VAD module is used.
+
+### Voice Activity Detection (VAD)
+
+Integrating a VAD submodule is highly recommended for streaming. It improves performance by automatically removing silence, which reduces CPU usage, saves battery, and prevents the model from "hallucinating" text during silent periods.
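Review note: the "commit after a period of silence" behaviour that `vadDetectionMargin` describes can be pictured with a small, self-contained sketch. This is an illustration only, not the library's implementation; `VadFrame`, `SpeechSegment`, and `commitSegments` are hypothetical names introduced here, and the per-frame speech decisions are assumed to come from some VAD model.

```typescript
// Illustrative sketch only: these types and this function are hypothetical
// and are not part of react-native-executorch.
interface VadFrame {
  timeMs: number; // end time of the analysed frame, in milliseconds
  isSpeech: boolean; // assumed per-frame decision from a VAD model
}

interface SpeechSegment {
  startMs: number;
  endMs: number;
}

// Close ("commit") an open segment only once silence has lasted
// at least `marginMs` since the last frame classified as speech.
function commitSegments(frames: VadFrame[], marginMs = 500): SpeechSegment[] {
  const committed: SpeechSegment[] = [];
  let startMs: number | null = null; // start of the open segment, if any
  let lastSpeechMs = 0; // time of the most recent speech frame

  for (const frame of frames) {
    if (frame.isSpeech) {
      if (startMs === null) startMs = frame.timeMs;
      lastSpeechMs = frame.timeMs;
    } else if (startMs !== null && frame.timeMs - lastSpeechMs >= marginMs) {
      committed.push({ startMs, endMs: lastSpeechMs });
      startMs = null;
    }
  }
  return committed;
}
```

Under these assumptions, a larger margin merges speech separated by short pauses into one segment, while a smaller margin commits sooner and splits the same audio into more segments.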
@@ -158,39 +168,15 @@ export default function LiveTranscriber() {
 
 ## Advanced Features
 
-### VAD Integration (Recommended for Live)
-
-Integrating **Voice Activity Detection (VAD)** as a submodule improves streaming performance by automatically removing silence. This reduces CPU usage, saves battery, and prevents hallucinations during silent periods.
-
-To use it, provide the `vad` model in the hook props and enable `useVAD` in the stream options:
-
-```typescript
-import {
-  useSpeechToText,
-  WHISPER_TINY_EN,
-  FSMN_VAD,
-} from 'react-native-executorch';
-
-const model = useSpeechToText({
-  model: WHISPER_TINY_EN,
-  vad: FSMN_VAD, // Integrating VAD submodule
-});
-
-const startLiveStreaming = async () => {
-  const streamIter = model.stream({
-    useVAD: true, // Enable VAD logic in the stream context
-    vadDetectionMargin: 500, // Wait for 500ms of silence before committing (for stability)
-  });
-};
-```
-
 ### Multilingual Transcription
 
 To transcribe languages other than English, use a multilingual model (e.g., `models.speech_to_text.whisper_tiny()`) and specify the corresponding language code:
 
 ```typescript
 // Transcribe in Spanish
-const model = useSpeechToText({ model: WHISPER_TINY });
+const model = useSpeechToText({
+  model: models.speech_to_text.whisper_tiny(),
+});
 const result = await model.transcribe(spanishAudio, { language: 'es' });
docs/docs/03-hooks/01-natural-language-processing/useVAD.md
+4 -4 (4 additions & 4 deletions)
@@ -18,9 +18,9 @@ It is recommended to use models provided by us, which are available at our [Hugg
 This mode is best suited for processing pre-recorded audio files or existing buffers. You provide a full waveform to the `forward` method, which returns an array of detected speech segments.
const model = useVAD({ model: models.vad.fsmn_vad() });
 
 // ... obtain audioBuffer (Float32Array) at 16kHz ...
 
@@ -55,9 +55,9 @@ You can fine-tune the streaming behavior via the `options` object:
 - **`detectionMargin`** (default: `100`ms): Specifies the maximum allowed gap between the last detected speech segment and the current time to still consider the speech as "ongoing." This value determines how much silence is tolerated before `onSpeechEnd` is triggered.
docs/docs/04-typescript-api/01-natural-language-processing/VADModule.md
+1 -1 (1 addition & 1 deletion)
@@ -40,7 +40,7 @@ For more information on loading resources, take a look at the [loading models](.
 
 ## Running the model
 
-### Batch Processing
+### File Processing
 
 To process a full audio buffer at once, use the [`forward`](../../06-api-reference/classes/VADModule.md#forward) method. Before calling [`forward`](../../06-api-reference/classes/VADModule.md#forward), ensure you have the audio waveform sampled at 16 kHz. Pass the waveform as an argument; the method returns a promise that resolves to an array of detected speech segments.
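Review note: the docs require a 16 kHz waveform but do not show how to obtain one from audio recorded at another rate. One possible approach, sketched below, is plain linear-interpolation resampling. `resampleLinear` is a hypothetical helper written for this note, not a function of the library, and it applies no anti-aliasing filter.

```typescript
// Illustrative sketch only: `resampleLinear` is a hypothetical helper,
// not part of react-native-executorch.
// Converts a mono waveform from `fromRate` to `toRate` (default 16 kHz)
// by linearly interpolating between neighbouring input samples.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate = 16000
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError('Sample rates must be positive');
  }
  // Nothing to convert: return a copy so the caller's buffer is never aliased.
  if (fromRate === toRate || input.length === 0) return input.slice();

  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

Because this sketch skips low-pass filtering, downsampling audio with content above 8 kHz can introduce aliasing; a dedicated resampler is likely a better choice where transcription or detection quality matters.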
packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp