Description
Hi everyone,
I’ve started integrating react-native-executorch and using the useSpeechToText hook. For testing, I recorded a valid 3-second WAV file (mono, 16kHz, 16-bit) and placed it in the assets folder.
My goal is to support multiple languages, so I tried using whisperMultilingual. Unfortunately, it doesn’t seem to work at all: no output sequences are returned. I then switched to the moonshine model, and it did recognize my speech, but it interpreted it as English instead of German. Switching back to whisper.en or whisperMultilingual again results in no output.
It seems like whisper.en and whisperMultilingual might not be functioning correctly, or perhaps I’ve misunderstood the usage.
Could this be a bug, or am I doing something wrong?
Thanks in advance!
Here’s an example of what I’ve implemented:
App.js
import { StatusBar } from "expo-status-bar";
import { StyleSheet, Text, View, ScrollView } from "react-native";
import { useSpeechToText, SpeechToTextLanguage } from "react-native-executorch";
import { loadWavAsFloat32Array } from "./wav";

export default function App() {
  const {
    isGenerating,
    isReady,
    downloadProgress,
    sequence,
    error,
    transcribe,
  } = useSpeechToText({
    modelName: "whisperMultilingual",
    streamingConfig: "balanced",
  });

  return (
    <View style={styles.container}>
      <StatusBar style="auto" />
      <ScrollView contentContainerStyle={styles.container}>
        <Text style={styles.label}>
          isGenerating: <Text style={styles.value}>{String(isGenerating)}</Text>
        </Text>
        <Text style={styles.label}>
          isReady: <Text style={styles.value}>{String(isReady)}</Text>
        </Text>
        <Text style={styles.label}>
          downloadProgress: <Text style={styles.value}>{downloadProgress}</Text>
        </Text>
        <Text style={styles.label}>
          sequence: <Text style={styles.value}>{JSON.stringify(sequence)}</Text>
        </Text>
        <Text style={styles.label}>
          error:{" "}
          <Text style={styles.value}>{error ? error.toString() : "None"}</Text>
        </Text>
        <Text style={styles.label}>
          <Text
            style={{ color: "blue", textDecorationLine: "underline" }}
            onPress={async () => {
              try {
                const float32Array = await loadWavAsFloat32Array(
                  require("./assets/audio.wav")
                );
                const result = await transcribe(
                  Array.from(float32Array),
                  SpeechToTextLanguage.German
                );
                console.log("Transcription result:", result);
              } catch (e) {
                console.error("Transcription failed:", e);
              }
            }}
          >
            Transcribe WAV
          </Text>
        </Text>
      </ScrollView>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flexGrow: 1,
    padding: 24,
    backgroundColor: "#fff",
  },
  label: {
    fontWeight: "bold",
    marginBottom: 8,
  },
  value: {
    fontWeight: "normal",
    color: "#333",
  },
});
wav.js:
import * as FileSystem from "expo-file-system";
import { Asset } from "expo-asset";
import { Buffer } from "buffer";

// Find the start of the PCM samples by locating the "data" chunk.
function getPCMStart(buffer) {
  for (let i = 0; i < buffer.length - 4; i++) {
    if (
      buffer[i] === 0x64 && // 'd'
      buffer[i + 1] === 0x61 && // 'a'
      buffer[i + 2] === 0x74 && // 't'
      buffer[i + 3] === 0x61 // 'a'
    ) {
      return i + 8; // skip the 4-byte chunk id and 4-byte chunk size
    }
  }
  return 44; // fall back to the canonical 44-byte WAV header
}

const float32ArrayFromPCMBinaryBuffer = (b64EncodedBuffer) => {
  const b64DecodedChunk = Buffer.from(b64EncodedBuffer, "base64");
  const pcmStart = getPCMStart(b64DecodedChunk);
  const pcmBuffer = b64DecodedChunk.subarray(pcmStart);
  const int16Array = new Int16Array(
    pcmBuffer.buffer,
    pcmBuffer.byteOffset,
    Math.floor(pcmBuffer.length / 2)
  );
  // Normalize 16-bit PCM to [-1, 1].
  const float32Array = new Float32Array(int16Array.length);
  for (let i = 0; i < int16Array.length; i++) {
    float32Array[i] = Math.max(-1, Math.min(1, int16Array[i] / 32768));
  }
  return float32Array;
};

export async function loadWavAsFloat32Array(assetModule) {
  const asset = Asset.fromModule(assetModule);
  await asset.downloadAsync();
  const fileUri = asset.localUri || asset.uri;
  const fileBuffer = await FileSystem.readAsStringAsync(fileUri, {
    encoding: FileSystem.EncodingType.Base64,
  });
  return float32ArrayFromPCMBinaryBuffer(fileBuffer);
}
metro.config.js:
...
config.resolver.assetExts.push("wav");
...
Steps to reproduce
- Copy and paste my App.js and wav.js
- Use a German WAV file
- Observe that sequence stays empty
Snack or a link to a repository
No response
React Native Executorch version
0.4.3
React Native version
0.79.3
Platforms
Android
JavaScript runtime
V8
Workflow
Expo Dev Client
Architecture
Fabric (New Architecture)
Build type
Debug mode
Device
Android emulator
Device model
No response
AI model
https://huggingface.co/software-mansion/react-native-executorch-whisper-tiny
Performance logs
No response
Acknowledgements
Yes