
Commit 194450b

feat: add s2t unload and make streamInsert and streamStop sync (#594)
## Description

Changes:

- add speech-to-text (s2t) unload
- make `streamInsert` and `streamStop` synchronous

### Introduces a breaking change?

- [x] Yes
- [ ] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings
1 parent 5b4725c commit 194450b

11 files changed

Lines changed: 50 additions & 60 deletions
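The description's first item adds an unload path; the updated `SpeechToTextModule` table documents it as `delete(): void`, which "Unloads the model from memory." The TypeScript sketch below shows one way to use it in a load, transcribe, unload lifecycle, calling `delete()` in a `finally` block so the model is released even if transcription fails. It does not import the library: `SpeechToTextLike`, `ModelConfig`, `transcribeOnce`, and `FakeModel` are hypothetical stand-ins that only mirror the documented `load`, `transcribe`, and `delete` signatures (with `SpeechToTextModelConfig` reduced to a placeholder and `DecodingOptions` omitted), so read it as an illustration of the calling pattern, not the library's implementation.

```typescript
// Hypothetical stand-ins: these mirror a subset of the documented method table
// (load, transcribe, delete) and are NOT imported from any library.
type ModelConfig = { name: string }; // placeholder for SpeechToTextModelConfig

interface SpeechToTextLike {
  load(
    model: ModelConfig,
    onDownloadProgressCallback?: (progress: number) => void,
  ): Promise<void>;
  transcribe(waveform: Float32Array): Promise<string>;
  delete(): void; // synchronous, matching the documented `(): void`
}

// Loads the model, runs one transcription, and always unloads it afterwards,
// even if transcribe() rejects.
async function transcribeOnce(
  model: SpeechToTextLike,
  config: ModelConfig,
  waveform: Float32Array,
): Promise<string> {
  await model.load(config);
  try {
    return await model.transcribe(waveform);
  } finally {
    model.delete();
  }
}

// Fake model used only to exercise the lifecycle without native code.
class FakeModel implements SpeechToTextLike {
  loaded = false;

  async load(
    _model: ModelConfig,
    onDownloadProgressCallback?: (progress: number) => void,
  ): Promise<void> {
    onDownloadProgressCallback?.(1); // pretend the download finished instantly
    this.loaded = true;
  }

  async transcribe(waveform: Float32Array): Promise<string> {
    if (!this.loaded) throw new Error('model not loaded');
    // Report the duration, assuming 16 kHz input as the docs require.
    return `${waveform.length / 16000}s of audio`;
  }

  delete(): void {
    this.loaded = false;
  }
}
```

For example, `transcribeOnce(new FakeModel(), { name: 'example' }, new Float32Array(16000))` resolves to `'1s of audio'` and leaves the fake model unloaded. Because `delete()` returns `void`, the `finally` block needs no `await`.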

apps/llm/app/voice_chat/index.tsx

Lines changed: 3 additions & 3 deletions
@@ -68,11 +68,11 @@ function VoiceChatScreen() {
 setIsRecording(false);
 recorder.stop();
 messageRecorded.current = true;
-await speechToText.streamStop();
+speechToText.streamStop();
 } else {
 setIsRecording(true);
-recorder.onAudioReady(async ({ buffer }) => {
-await speechToText.streamInsert(buffer.getChannelData(0));
+recorder.onAudioReady(({ buffer }) => {
+speechToText.streamInsert(buffer.getChannelData(0));
 });
 recorder.start();
 const transcription = await speechToText.stream();

apps/speech-to-text/screens/SpeechToTextScreen.tsx

Lines changed: 4 additions & 4 deletions
@@ -77,8 +77,8 @@ export const SpeechToTextScreen = () => {
 const handleStartTranscribeFromMicrophone = async () => {
 setLiveTranscribing(true);
 setTranscription('');
-recorder.onAudioReady(async ({ buffer }) => {
-await model.streamInsert(buffer.getChannelData(0));
+recorder.onAudioReady(({ buffer }) => {
+model.streamInsert(buffer.getChannelData(0));
 });
 recorder.start();
 
@@ -89,9 +89,9 @@ export const SpeechToTextScreen = () => {
 }
 };
 
-const handleStopTranscribeFromMicrophone = async () => {
+const handleStopTranscribeFromMicrophone = () => {
 recorder.stop();
-await model.streamStop();
+model.streamStop();
 console.log('Live transcription stopped');
 setLiveTranscribing(false);
 };
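With `streamInsert` and `streamStop` now returning `void`, the recorder callback and the stop handler in the hunks above no longer need `async`/`await`. One way a synchronous insert can coexist with an asynchronous `stream()` is for `streamInsert` to only enqueue the chunk while `stream()` drains that queue on its own schedule. The TypeScript sketch below illustrates that producer/consumer pattern using the `AsyncGenerator<{ committed: string; nonCommitted: string }>` shape documented for `SpeechToTextModule.stream`. `FakeStreamingModel` and `demo` are hypothetical; the "transcription" is just a running sample count, and nothing here describes how the library actually schedules inference.

```typescript
// Hypothetical stand-in mirroring the documented shapes:
//   streamInsert(waveform): void, streamStop(): void,
//   stream(): AsyncGenerator<{ committed: string; nonCommitted: string }>.
type StreamUpdate = { committed: string; nonCommitted: string };

class FakeStreamingModel {
  private queue: Float32Array[] = [];
  private stopped = false;
  private wake: (() => void) | null = null;

  // Synchronous: enqueue the chunk and wake the consumer if it is waiting.
  streamInsert(waveform: Float32Array): void {
    this.queue.push(waveform);
    this.wake?.();
  }

  // Synchronous: mark the session finished; stream() ends once the queue drains.
  streamStop(): void {
    this.stopped = true;
    this.wake?.();
  }

  async *stream(): AsyncGenerator<StreamUpdate> {
    let samples = 0;
    while (true) {
      const chunk = this.queue.shift();
      if (chunk === undefined) {
        if (this.stopped) return;
        // Park until streamInsert() or streamStop() calls wake().
        await new Promise<void>((resolve) => {
          this.wake = () => resolve();
        });
        this.wake = null;
        continue;
      }
      samples += chunk.length;
      // Placeholder "transcription": the number of samples consumed so far.
      yield { committed: `${samples} samples`, nonCommitted: '' };
    }
  }
}

// Mirrors the updated handlers: the audio callback is a plain function.
async function demo(): Promise<string[]> {
  const model = new FakeStreamingModel();
  const updates: string[] = [];
  const consumer = (async () => {
    for await (const { committed } of model.stream()) updates.push(committed);
  })();

  const onAudioReady = (chunk: Float32Array) => model.streamInsert(chunk);
  onAudioReady(new Float32Array(1600)); // 1600 samples per buffer, as in the docs
  onAudioReady(new Float32Array(1600));
  model.streamStop(); // no await needed; the consumer drains what is queued

  await consumer;
  return updates;
}
```

Calling `demo()` resolves to `['1600 samples', '3200 samples']`: both chunks inserted before `streamStop()` are still consumed, because in this sketch stopping only ends the loop once the queue is empty.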

docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md

Lines changed: 6 additions & 6 deletions
@@ -79,8 +79,8 @@ For more information on loading resources, take a look at [loading models](../..
 | --------------------------- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `transcribe` | `(waveform: Float32Array \| number[], options?: DecodingOptions \| undefined) => Promise<string>` | Starts a transcription process for a given input array, which should be a waveform at 16kHz. The second argument is an options object, e.g. `{ language: 'es' }` for multilingual models. Resolves a promise with the output transcription when the model is finished. Passing `number[]` is deprecated. |
 | `stream` | `() => Promise<string>` | Starts a streaming transcription process. Use in combination with `streamInsert` to feed audio chunks and `streamStop` to end the stream. Updates `committedTranscription` and `nonCommittedTranscription` as transcription progresses. |
-| `streamInsert` | `(waveform: Float32Array \| number[]) => Promise<void>` | Inserts a chunk of audio data (sampled at 16kHz) into the ongoing streaming transcription. Call this repeatedly as new audio data becomes available. Passing `number[]` is deprecated. |
-| `streamStop` | `() => Promise<void>` | Stops the ongoing streaming transcription process. |
+| `streamInsert` | `(waveform: Float32Array \| number[]) => void` | Inserts a chunk of audio data (sampled at 16kHz) into the ongoing streaming transcription. Call this repeatedly as new audio data becomes available. Passing `number[]` is deprecated. |
+| `streamStop` | `() => void` | Stops the ongoing streaming transcription process. |
 | `encode` | `(waveform: Float32Array \| number[]) => Promise<Float32Array>` | Runs the encoding part of the model on the provided waveform. Passing `number[]` is deprecated. |
 | `decode` | `(tokens: number[] \| Int32Array, encoderOutput: Float32Array \| number[]) => Promise<Float32Array>` | Runs the decoder of the model. Passing `number[]` is deprecated. |
 | `committedTranscription` | `string` | Contains the part of the transcription that is finalized and will not change. Useful for displaying stable results during streaming. |
@@ -279,8 +279,8 @@ function App() {
 }, []);
 
 const handleStartStreamingTranscribe = async () => {
-recorder.onAudioReady(async ({ buffer }) => {
-await model.streamInsert(buffer.getChannelData(0));
+recorder.onAudioReady(({ buffer }) => {
+model.streamInsert(buffer.getChannelData(0));
 });
 recorder.start();
 
@@ -291,9 +291,9 @@ function App() {
 }
 };
 
-const handleStopStreamingTranscribe = async () => {
+const handleStopStreamingTranscribe = () => {
 recorder.stop();
-await model.streamStop();
+model.streamStop();
 };
 
 return (

docs/docs/03-typescript-api/01-natural-language-processing/SpeechToTextModule.md

Lines changed: 6 additions & 5 deletions
@@ -22,12 +22,13 @@ await model.transcribe(waveform);
 | Method | Type | Description |
 | -------------- | ---------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `load` | `(model: SpeechToTextModelConfig, onDownloadProgressCallback?: (progress: number) => void): Promise<void>` | Loads the model specified by the config object. `onDownloadProgressCallback` allows you to monitor the current progress of the model download. |
+| `delete` | `(): void` | Unloads the model from memory. |
 | `encode` | `(waveform: Float32Array \| number[]): Promise<Float32Array>` | Runs the encoding part of the model on the provided waveform. Returns the encoded waveform as a Float32Array. Passing `number[]` is deprecated. |
 | `decode` | `(tokens: number[] \| Int32Array, encoderOutput: Float32Array \| number[]): Promise<Float32Array>` | Runs the decoder of the model. Passing `number[]` is deprecated. |
 | `transcribe` | `(waveform: Float32Array \| number[], options?: DecodingOptions): Promise<string>` | Starts a transcription process for a given input array (16kHz waveform). For multilingual models, specify the language in `options`. Returns the transcription as a string. Passing `number[]` is deprecated. |
 | `stream` | `(options?: DecodingOptions): AsyncGenerator<{ committed: string; nonCommitted: string }>` | Starts a streaming transcription session. Yields objects with `committed` and `nonCommitted` transcriptions. Use with `streamInsert` and `streamStop` to control the stream. |
-| `streamStop` | `(): Promise<void>` | Stops the current streaming transcription session. |
-| `streamInsert` | `(waveform: Float32Array \| number[]): Promise<void>` | Inserts a new audio chunk into the streaming transcription session. Passing `number[]` is deprecated. |
+| `streamStop` | `(): void` | Stops the current streaming transcription session. |
+| `streamInsert` | `(waveform: Float32Array \| number[]): void` | Inserts a new audio chunk into the streaming transcription session. Passing `number[]` is deprecated. |
 
 :::info
 
@@ -227,9 +228,9 @@ const recorder = new AudioRecorder({
 sampleRate: 16000,
 bufferLengthInSamples: 1600,
 });
-recorder.onAudioReady(async ({ buffer }) => {
+recorder.onAudioReady(({ buffer }) => {
 // Insert the audio into the streaming transcription
-await model.streamInsert(buffer.getChannelData(0));
+model.streamInsert(buffer.getChannelData(0));
 });
 recorder.start();
 
@@ -246,6 +247,6 @@ try {
 }
 
 // Stop streaming transcription
-await model.streamStop();
+model.streamStop();
 recorder.stop();
 ```

docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md

Lines changed: 4 additions & 4 deletions
@@ -79,8 +79,8 @@ For more information on loading resources, take a look at [loading models](../..
 | --------------------------- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `transcribe` | `(waveform: Float32Array \| number[], options?: DecodingOptions \| undefined) => Promise<string>` | Starts a transcription process for a given input array, which should be a waveform at 16kHz. The second argument is an options object, e.g. `{ language: 'es' }` for multilingual models. Resolves a promise with the output transcription when the model is finished. Passing `number[]` is deprecated. |
 | `stream` | `() => Promise<string>` | Starts a streaming transcription process. Use in combination with `streamInsert` to feed audio chunks and `streamStop` to end the stream. Updates `committedTranscription` and `nonCommittedTranscription` as transcription progresses. |
-| `streamInsert` | `(waveform: Float32Array \| number[]) => Promise<void>` | Inserts a chunk of audio data (sampled at 16kHz) into the ongoing streaming transcription. Call this repeatedly as new audio data becomes available. Passing `number[]` is deprecated. |
-| `streamStop` | `() => Promise<void>` | Stops the ongoing streaming transcription process. |
+| `streamInsert` | `(waveform: Float32Array \| number[]) => void` | Inserts a chunk of audio data (sampled at 16kHz) into the ongoing streaming transcription. Call this repeatedly as new audio data becomes available. Passing `number[]` is deprecated. |
+| `streamStop` | `() => void` | Stops the ongoing streaming transcription process. |
 | `encode` | `(waveform: Float32Array \| number[]) => Promise<Float32Array>` | Runs the encoding part of the model on the provided waveform. Passing `number[]` is deprecated. |
 | `decode` | `(tokens: number[] \| Int32Array, encoderOutput: Float32Array \| number[]) => Promise<Float32Array>` | Runs the decoder of the model. Passing `number[]` is deprecated. |
 | `committedTranscription` | `string` | Contains the part of the transcription that is finalized and will not change. Useful for displaying stable results during streaming. |
@@ -279,7 +279,7 @@ function App() {
 }, []);
 
 const handleStartStreamingTranscribe = async () => {
-recorder.onAudioReady(async ({ buffer }) => {
+recorder.onAudioReady(({ buffer }) => {
 model.streamInsert(buffer.getChannelData(0));
 });
 recorder.start();
@@ -291,7 +291,7 @@ function App() {
 }
 };
 
-const handleStopStreamingTranscribe = async () => {
+const handleStopStreamingTranscribe = () => {
 recorder.stop();
 model.streamStop();
 };

docs/versioned_docs/version-0.5.x/03-typescript-api/01-natural-language-processing/SpeechToTextModule.md

Lines changed: 2 additions & 1 deletion
@@ -22,6 +22,7 @@ await model.transcribe(waveform);
 | Method | Type | Description |
 | -------------- | ---------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `load` | `(model: SpeechToTextModelConfig, onDownloadProgressCallback?: (progress: number) => void): Promise<void>` | Loads the model specified by the config object. `onDownloadProgressCallback` allows you to monitor the current progress of the model download. |
+| `delete` | `(): void` | Unloads the model from memory. |
 | `encode` | `(waveform: Float32Array \| number[]): Promise<Float32Array>` | Runs the encoding part of the model on the provided waveform. Returns the encoded waveform as a Float32Array. Passing `number[]` is deprecated. |
 | `decode` | `(tokens: number[] \| Int32Array, encoderOutput: Float32Array \| number[]): Promise<Float32Array>` | Runs the decoder of the model. Passing `number[]` is deprecated. |
 | `transcribe` | `(waveform: Float32Array \| number[], options?: DecodingOptions): Promise<string>` | Starts a transcription process for a given input array (16kHz waveform). For multilingual models, specify the language in `options`. Returns the transcription as a string. Passing `number[]` is deprecated. |
@@ -227,7 +228,7 @@ const recorder = new AudioRecorder({
 sampleRate: 16000,
 bufferLengthInSamples: 1600,
 });
-recorder.onAudioReady(async ({ buffer }) => {
+recorder.onAudioReady(({ buffer }) => {
 // Insert the audio into the streaming transcription
 model.streamInsert(buffer.getChannelData(0));
 });
