Commit 9e9969c

jakmro authored and a-szymanska committed
@jakmro/s2t c++ (#580)
## Description

Move Speech To Text implementation to c++

### Introduces a breaking change?

- [x] Yes
- [ ] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [x] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

1 parent ab5cb0b commit 9e9969c

42 files changed, +1147 -857 lines changed

apps/llm/app/voice_chat/index.tsx

Lines changed: 1 addition & 1 deletion

@@ -77,7 +77,7 @@ function VoiceChatScreen() {
 
   const onChunk = (data: string) => {
     const float32Chunk = float32ArrayFromPCMBinaryBuffer(data);
-    speechToText.streamInsert(Array.from(float32Chunk));
+    speechToText.streamInsert(float32Chunk);
   };
 
   const handleRecordPress = async () => {

apps/speech-to-text/screens/SpeechToTextScreen.tsx

Lines changed: 3 additions & 5 deletions

@@ -66,8 +66,7 @@ export const SpeechToTextScreen = () => {
     try {
       const decodedAudioData = await audioContext.decodeAudioDataSource(uri);
       const audioBuffer = decodedAudioData.getChannelData(0);
-      const audioArray = Array.from(audioBuffer);
-      setTranscription(await model.transcribe(audioArray));
+      setTranscription(await model.transcribe(audioBuffer));
     } catch (error) {
       console.error('Error decoding audio data', error);
       console.warn('Note: Supported file formats: mp3, wav, flac');
@@ -79,8 +78,7 @@ export const SpeechToTextScreen = () => {
     setLiveTranscribing(true);
     setTranscription('');
     recorder.onAudioReady(async ({ buffer }) => {
-      const bufferArray = Array.from(buffer.getChannelData(0));
-      model.streamInsert(bufferArray);
+      await model.streamInsert(buffer.getChannelData(0));
     });
     recorder.start();
 
@@ -93,7 +91,7 @@ export const SpeechToTextScreen = () => {
 
   const handleStopTranscribeFromMicrophone = async () => {
     recorder.stop();
-    model.streamStop();
+    await model.streamStop();
     console.log('Live transcription stopped');
     setLiveTranscribing(false);
   };
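
Reviewer note: the hunks above start awaiting `streamInsert` and `streamStop`, which matches the `Promise<void>` return types in the updated docs table. The sketch below shows one way to feed chunks in order and stop afterwards. `StreamingModel`, `RecordingModel`, and `feedChunks` are hypothetical names introduced only for this illustration; they are not part of the library, and the fake model merely records calls instead of running inference.

```typescript
// Hypothetical stand-in for the streaming part of the hook's return value.
// The two signatures follow the updated docs table in this commit.
interface StreamingModel {
  streamInsert(waveform: Float32Array | number[]): Promise<void>;
  streamStop(): Promise<void>;
}

// Fake model (assumption, for illustration only) that records the calls it receives.
class RecordingModel implements StreamingModel {
  readonly calls: string[] = [];

  async streamInsert(waveform: Float32Array | number[]): Promise<void> {
    this.calls.push(`insert:${waveform.length}`);
  }

  async streamStop(): Promise<void> {
    this.calls.push('stop');
  }
}

// Feed chunks one at a time, awaiting each insert so that a rejection
// surfaces in the caller, then stop the stream.
async function feedChunks(
  model: StreamingModel,
  chunks: Float32Array[]
): Promise<void> {
  for (const chunk of chunks) {
    await model.streamInsert(chunk);
  }
  await model.streamStop();
}
```

Awaiting each call keeps errors from being silently dropped, which is presumably why the example screens were updated along with the signatures.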

docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md

Lines changed: 18 additions & 20 deletions

@@ -44,10 +44,9 @@ const { uri } = await FileSystem.downloadAsync(
 const audioContext = new AudioContext({ sampleRate: 16000 });
 const decodedAudioData = await audioContext.decodeAudioDataSource(uri);
 const audioBuffer = decodedAudioData.getChannelData(0);
-const audioArray = Array.from(audioBuffer);
 
 try {
-  const transcription = await model.transcribe(audioArray);
+  const transcription = await model.transcribe(audioBuffer);
   console.log(transcription);
 } catch (error) {
   console.error('Error during audio transcription', error);
@@ -76,20 +75,20 @@ For more information on loading resources, take a look at [loading models](../..
 
 ### Returns
 
-| Field | Type | Description |
-| --------------------------- | --------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `transcribe` | `(waveform: number[], options?: DecodingOptions \| undefined) => Promise<string>` | Starts a transcription process for a given input array, which should be a waveform at 16kHz. The second argument is an options object, e.g. `{ language: 'es' }` for multilingual models. Resolves a promise with the output transcription when the model is finished. |
-| `stream` | `() => Promise<string>` | Starts a streaming transcription process. Use in combination with `streamInsert` to feed audio chunks and `streamStop` to end the stream. Updates `committedTranscription` and `nonCommittedTranscription` as transcription progresses. |
-| `streamInsert` | `(waveform: number[]) => void` | Inserts a chunk of audio data (sampled at 16kHz) into the ongoing streaming transcription. Call this repeatedly as new audio data becomes available. |
-| `streamStop` | `() => void` | Stops the ongoing streaming transcription process. |
-| `encode` | `(waveform: Float32Array) => Promise<void>` | Runs the encoding part of the model on the provided waveform. Stores the result internally. |
-| `decode` | `(tokens: number[]) => Promise<Float32Array>` | Runs the decoder of the model. Returns the decoded waveform as a Float32Array. |
-| `committedTranscription` | `string` | Contains the part of the transcription that is finalized and will not change. Useful for displaying stable results during streaming. |
-| `nonCommittedTranscription` | `string` | Contains the part of the transcription that is still being processed and may change. Useful for displaying live, partial results during streaming. |
-| `error` | `string \| null` | Contains the error message if the model failed to load. |
-| `isGenerating` | `boolean` | Indicates whether the model is currently processing an inference. |
-| `isReady` | `boolean` | Indicates whether the model has successfully loaded and is ready for inference. |
-| `downloadProgress` | `number` | Tracks the progress of the model download process. |
+| Field | Type | Description |
+| --------------------------- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `transcribe` | `(waveform: Float32Array \| number[], options?: DecodingOptions \| undefined) => Promise<string>` | Starts a transcription process for a given input array, which should be a waveform at 16kHz. The second argument is an options object, e.g. `{ language: 'es' }` for multilingual models. Resolves a promise with the output transcription when the model is finished. Passing `number[]` is deprecated. |
+| `stream` | `() => Promise<string>` | Starts a streaming transcription process. Use in combination with `streamInsert` to feed audio chunks and `streamStop` to end the stream. Updates `committedTranscription` and `nonCommittedTranscription` as transcription progresses. |
+| `streamInsert` | `(waveform: Float32Array \| number[]) => Promise<void>` | Inserts a chunk of audio data (sampled at 16kHz) into the ongoing streaming transcription. Call this repeatedly as new audio data becomes available. Passing `number[]` is deprecated. |
+| `streamStop` | `() => Promise<void>` | Stops the ongoing streaming transcription process. |
+| `encode` | `(waveform: Float32Array \| number[]) => Promise<Float32Array>` | Runs the encoding part of the model on the provided waveform. Passing `number[]` is deprecated. |
+| `decode` | `(tokens: number[] \| Int32Array, encoderOutput: Float32Array \| number[]) => Promise<Float32Array>` | Runs the decoder of the model. Passing `number[]` is deprecated. |
+| `committedTranscription` | `string` | Contains the part of the transcription that is finalized and will not change. Useful for displaying stable results during streaming. |
+| `nonCommittedTranscription` | `string` | Contains the part of the transcription that is still being processed and may change. Useful for displaying live, partial results during streaming. |
+| `error` | `string \| null` | Contains the error message if the model failed to load. |
+| `isGenerating` | `boolean` | Indicates whether the model is currently processing an inference. |
+| `isReady` | `boolean` | Indicates whether the model has successfully loaded and is ready for inference. |
+| `downloadProgress` | `number` | Tracks the progress of the model download process. |
 
 <details>
 <summary>Type definitions</summary>
@@ -231,7 +230,7 @@ function App() {
     const decodedAudioData = await audioContext.decodeAudioDataSource(uri);
     const audioBuffer = decodedAudioData.getChannelData(0);
 
-    return Array.from(audioBuffer);
+    return audioBuffer;
   };
 
   const handleTranscribe = async () => {
@@ -281,8 +280,7 @@ function App() {
 
   const handleStartStreamingTranscribe = async () => {
     recorder.onAudioReady(async ({ buffer }) => {
-      const bufferArray = Array.from(buffer.getChannelData(0));
-      model.streamInsert(bufferArray);
+      await model.streamInsert(buffer.getChannelData(0));
     });
     recorder.start();
 
@@ -295,7 +293,7 @@ function App() {
 
   const handleStopStreamingTranscribe = async () => {
     recorder.stop();
-    model.streamStop();
+    await model.streamStop();
   };
 
   return (
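
Migration note: the updated docs table marks `number[]` inputs as deprecated in favour of `Float32Array`. Callers that still hold plain arrays could normalise them at the boundary, as in the minimal sketch below. `Waveform` and `toFloat32Array` are hypothetical helpers written for this note, not part of the library's API.

```typescript
// Input type accepted by transcribe/streamInsert/encode after this commit,
// as listed in the docs table (number[] is marked deprecated there).
type Waveform = Float32Array | number[];

// Hypothetical helper: return a Float32Array without copying when the input
// is already one, otherwise convert the plain array.
function toFloat32Array(waveform: Waveform): Float32Array {
  return waveform instanceof Float32Array
    ? waveform
    : Float32Array.from(waveform);
}

// Example: legacy code that produced number[] can be adapted in one place.
const legacySamples: number[] = [0.25, -0.5, 0.75];
const samples: Float32Array = toFloat32Array(legacySamples);
```

Passing the typed array straight through, as the diffs in this commit do, also avoids the extra copy that `Array.from` made on every chunk.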
