Commit 5022de3

docs: fix docs related to speech models (#1177)
## Description

Fixes some code snippets in the speech model docs that used the old API.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [ ] iOS
- [ ] Android

### Testing instructions

<!-- Provide step-by-step instructions on how to test your changes. Include setup details if necessary. -->

N/A

### Screenshots

<!-- Add screenshots here, if applicable -->

### Related issues

<!-- Link related issues here using #issue-number -->

### Checklist

- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings

### Additional notes

<!-- Include any additional information, assumptions, or context that reviewers might need to understand this PR. -->

1 parent c1e8233 commit 5022de3

3 files changed

Lines changed: 32 additions & 22 deletions

docs/docs/03-hooks/01-natural-language-processing/useSpeechToText.md

Lines changed: 12 additions & 8 deletions
@@ -200,14 +200,18 @@ const result = await model.transcribe(audioBuffer, { verbose: true });
 
 ### Returns
 
-The hook returns an object with:
-
-- `transcribe(audio, options)`: One-shot transcription.
-- `stream(options)`: Async generator for streaming results.
-- `streamInsert(audio)`: Push audio to the stream buffer.
-- `streamStop()`: Finish the current stream.
-- `isGenerating`: Boolean indicating if the model is busy.
-- `loading`: Boolean indicating if the model is being loaded.
+The hook returns a [`SpeechToTextType`](../../06-api-reference/interfaces/SpeechToTextType.md) object containing:
+
+- `error`: `null | RnExecutorchError` - Contains the error message if the model failed to load.
+- `isReady`: `boolean` - Indicates whether the model has successfully loaded and is ready for inference.
+- `isGenerating`: `boolean` - Indicates whether the model is currently processing an inference.
+- `downloadProgress`: `number` - Tracks the progress of the model download process as a value between `0` and `1`.
+- `transcribe(audio, options)`: Starts a transcription process for a given input array, which should be a waveform at 16kHz. Returns a promise resolving to a [`TranscriptionResult`](../../06-api-reference/interfaces/TranscriptionResult.md).
+- `stream(options)`: Starts a streaming transcription process. Asynchronous generator that yields objects containing `committed` and `nonCommitted` transcriptions, both of type [`TranscriptionResult`](../../06-api-reference/interfaces/TranscriptionResult.md).
+- `streamInsert(audio)`: Inserts a chunk of audio data (sampled at 16kHz) into the ongoing streaming transcription.
+- `streamStop()`: Stops the ongoing streaming transcription process.
+- `encode(audio)`: Runs the encoding part of the model on the provided waveform. Returns a promise resolving to the encoded `Float32Array`.
+- `decode(tokens, encoderOutput)`: Runs the decoder of the model with the given tokens (`Int32Array`) and encoder output (`Float32Array`). Returns a promise resolving to the decoded `Float32Array`.
 
 ## Supported models
 
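
The state fields added to the Returns list (`error`, `isReady`, `isGenerating`, `downloadProgress`) can be folded into a single status label for the UI. The following is a minimal sketch, not code from the library: `SpeechHookState` and `describeState` are hypothetical names, and `error` is modeled as an object with a `message` string as a stand-in for `RnExecutorchError`, whose actual shape is not shown in this commit.

```typescript
// Hypothetical stand-in for the state fields listed in the updated docs.
// The real hook returns a SpeechToTextType; only the fields used here are modeled.
interface SpeechHookState {
  error: { message: string } | null; // assumption: stand-in for RnExecutorchError
  isReady: boolean;
  isGenerating: boolean;
  downloadProgress: number; // documented as a value between 0 and 1
}

// Hypothetical helper: maps the state to a short label for display.
function describeState(state: SpeechHookState): string {
  if (state.error !== null) {
    return `Error: ${state.error.message}`;
  }
  if (!state.isReady) {
    const percent = Math.round(state.downloadProgress * 100);
    return `Loading model (${percent}%)`;
  }
  return state.isGenerating ? 'Transcribing' : 'Ready';
}
```

Checking `error` first means a failed load is never reported as an in-progress download.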

docs/docs/03-hooks/01-natural-language-processing/useVAD.md

Lines changed: 11 additions & 5 deletions
@@ -56,8 +56,10 @@ You can fine-tune the streaming behavior via the `options` object:
 
 ```tsx
 import { useVAD, models } from 'react-native-executorch';
+import { AudioRecorder } from 'react-native-audio-api';
 
 const model = useVAD({ model: models.vad.fsmn_vad() });
+const recorder = new AudioRecorder();
 
 const startLiveVAD = async () => {
   // Start the continuous streaming listener
@@ -70,21 +72,25 @@ const startLiveVAD = async () => {
     },
   });
 
-  // Example: Hook into your audio recorder's data event
-  audioRecorder.on('data', (chunk: Float32Array) => {
-    model.streamInsert(chunk);
-  });
+  // Capture microphone input at 16kHz
+  recorder.onAudioReady(
+    { sampleRate: 16000, bufferLength: 1600, channelCount: 1 },
+    (chunk) => model.streamInsert(chunk.buffer.getChannelData(0))
+  );
+
+  await recorder.start();
 };
 
 const stopLiveVAD = () => {
+  recorder.stop();
   model.streamStop();
 };
 ```
 
 ### Arguments & Returns
 
 - **Arguments**: `useVAD` takes a [`VADProps`](../../06-api-reference/interfaces/VADProps.md) object containing the `model` and an optional `preventLoad` flag.
-- **Returns**: A [`VADType`](../../06-api-reference/interfaces/VADType.md) object providing `forward`, `stream`, `streamInsert`, and `streamStop` methods, along with `isReady` and `error` states.
+- **Returns**: A [`VADType`](../../06-api-reference/interfaces/VADType.md) object providing `forward`, `stream`, `streamInsert`, and `streamStop` methods, along with `error`, `isReady`, `isGenerating`, and `downloadProgress` states.
 
 ## Supported models
 
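
The recorder options in the updated snippet determine how much audio each callback delivers. A minimal sketch of that arithmetic follows, assuming `bufferLength` counts frames per channel (an assumption about `react-native-audio-api` that this commit does not confirm). `chunkDurationMs` is a hypothetical helper, not part of either library.

```typescript
// Hypothetical helper, not part of react-native-audio-api or react-native-executorch.
// Assumes bufferLength is the number of frames per channel in each callback.
function chunkDurationMs(sampleRate: number, bufferLength: number): number {
  if (sampleRate <= 0 || bufferLength <= 0) {
    throw new RangeError('sampleRate and bufferLength must be positive');
  }
  // Multiply before dividing to keep the arithmetic exact for integer inputs.
  return (bufferLength * 1000) / sampleRate;
}

// Options copied from the updated useVAD example.
const recorderOptions = { sampleRate: 16000, bufferLength: 1600, channelCount: 1 };
const durationMs = chunkDurationMs(
  recorderOptions.sampleRate,
  recorderOptions.bufferLength
);
```

Under that assumption, 1600 frames at 16000 Hz is 100 ms per chunk, so `streamInsert` would be called about ten times per second.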

docs/docs/04-typescript-api/01-natural-language-processing/SpeechToTextModule.md

Lines changed: 9 additions & 9 deletions
@@ -98,20 +98,20 @@ const model = await SpeechToTextModule.fromModelName(
 AudioManager.setAudioSessionOptions({
   iosCategory: 'playAndRecord',
   iosMode: 'spokenAudio',
-  iosOptions: ['allowBluetooth', 'defaultToSpeaker'],
+  iosOptions: ['allowBluetoothHFP', 'defaultToSpeaker'],
 });
 await AudioManager.requestRecordingPermissions();
 
 // 2. Setup Audio Recorder
-const recorder = new AudioRecorder({
-  sampleRate: 16000,
-  channelCount: 1,
-});
+const recorder = new AudioRecorder();
 
-recorder.onAudioReady((chunk) => {
-  // Feed chunks directly into the model's buffer
-  model.streamInsert(chunk.buffer.getChannelData(0));
-});
+recorder.onAudioReady(
+  { sampleRate: 16000, bufferLength: 1600, channelCount: 1 },
+  (chunk) => {
+    // Feed chunks directly into the model's buffer
+    model.streamInsert(chunk.buffer.getChannelData(0));
+  }
+);
 
 await recorder.start();
 
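
The `streamInsert` pattern in this hunk can also be exercised without a microphone by slicing an in-memory waveform into chunks of the same size. This is a minimal sketch that assumes a 16 kHz mono `Float32Array`. `StreamSink` and `feedInChunks` are hypothetical names, and the sink is a local stub rather than the real `SpeechToTextModule`.

```typescript
// Hypothetical minimal interface: models only the streamInsert call used in the diff.
interface StreamSink {
  streamInsert(audio: Float32Array): void;
}

// Hypothetical helper: pushes a waveform into the sink in fixed-size chunks.
// Returns the number of chunks inserted; the last chunk may be shorter.
function feedInChunks(
  sink: StreamSink,
  waveform: Float32Array,
  chunkSize = 1600
): number {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  let count = 0;
  for (let start = 0; start < waveform.length; start += chunkSize) {
    // subarray returns a view on the same memory, so no samples are copied.
    sink.streamInsert(waveform.subarray(start, start + chunkSize));
    count += 1;
  }
  return count;
}

// Local stub that records chunk lengths, standing in for a real model.
const received: number[] = [];
const stubSink: StreamSink = {
  streamInsert: (audio) => {
    received.push(audio.length);
  },
};
const inserted = feedInChunks(stubSink, new Float32Array(4000));
```

The default of 1600 samples mirrors the `bufferLength` passed to `onAudioReady` in the updated snippet.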
