Possible Bug in whisper and/or whisperMultilingual #398

@eriCCsan

Description

Hi everyone,

I’ve started integrating react-native-executorch and am using the useSpeechToText hook. For testing, I recorded a valid 3-second WAV file (mono, 16 kHz, 16-bit) and placed it in the assets folder.
My goal is to support multiple languages, so I tried whisperMultilingual. Unfortunately, it doesn’t seem to work at all: no output sequences are returned. I then switched to the moonshine model, which did recognize my speech but interpreted it as English instead of German. Switching back to whisper.en or whisperMultilingual again results in no output.

It seems like whisper.en and whisperMultilingual might not be functioning correctly, or perhaps I’ve misunderstood the usage.

Could this be a bug, or am I doing something wrong?

Thanks in advance!

Here’s an example of what I’ve implemented:

App.js

import { StatusBar } from "expo-status-bar";
import { StyleSheet, Text, View, ScrollView } from "react-native";
import { useSpeechToText, SpeechToTextLanguage } from "react-native-executorch";
import { loadWavAsFloat32Array } from "./wav";

export default function App() {
  const {
    isGenerating,
    isReady,
    downloadProgress,
    sequence,
    error,
    transcribe,
  } = useSpeechToText({
    modelName: "whisperMultilingual",
    streamingConfig: "balanced",
  });

  return (
    <View style={styles.container}>
      <StatusBar style="auto" />
      <ScrollView contentContainerStyle={styles.container}>
        <Text style={styles.label}>
          isGenerating: <Text style={styles.value}>{String(isGenerating)}</Text>
        </Text>
        <Text style={styles.label}>
          isReady: <Text style={styles.value}>{String(isReady)}</Text>
        </Text>
        <Text style={styles.label}>
          downloadProgress: <Text style={styles.value}>{downloadProgress}</Text>
        </Text>
        <Text style={styles.label}>
          sequence: <Text style={styles.value}>{JSON.stringify(sequence)}</Text>
        </Text>
        <Text style={styles.label}>
          error:{" "}
          <Text style={styles.value}>{error ? error.toString() : "None"}</Text>
        </Text>
        <Text style={styles.label}>
          <Text
            style={{ color: "blue", textDecorationLine: "underline" }}
            onPress={async () => {
              try {
                const float32Array = await loadWavAsFloat32Array(
                  require("./assets/audio.wav")
                );
                const result = await transcribe(
                  Array.from(float32Array),
                  SpeechToTextLanguage.German
                );
                console.log("Transcription result:", result);
              } catch (e) {
                console.error("Transcription failed:", e);
              }
            }}
          >
            Transcribe WAV
          </Text>
        </Text>
      </ScrollView>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flexGrow: 1,
    padding: 24,
    backgroundColor: "#fff",
  },
  label: {
    fontWeight: "bold",
    marginBottom: 8,
  },
  value: {
    fontWeight: "normal",
    color: "#333",
  },
});

wav.js:

import * as FileSystem from "expo-file-system";
import { Asset } from "expo-asset";
import { Buffer } from 'buffer';

// Find the start of the PCM samples: locate the "data" chunk id and skip
// its 4-byte id plus the 4-byte chunk-size field.
function getPCMStart(buffer) {
  for (let i = 0; i < buffer.length - 4; i++) {
    if (
      buffer[i] === 0x64 && // 'd'
      buffer[i + 1] === 0x61 && // 'a'
      buffer[i + 2] === 0x74 && // 't'
      buffer[i + 3] === 0x61 // 'a'
    ) {
      return i + 8;
    }
  }
  return 44; // fall back to the canonical 44-byte WAV header
}

// Decode the base64 file contents, skip the WAV header, and convert the
// 16-bit PCM samples to Float32 values normalized to [-1, 1].
const float32ArrayFromPCMBinaryBuffer = (b64EncodedBuffer) => {
  const b64DecodedChunk = Buffer.from(b64EncodedBuffer, "base64");
  const pcmStart = getPCMStart(b64DecodedChunk);
  const pcmBuffer = b64DecodedChunk.subarray(pcmStart);
  const int16Array = new Int16Array(
    pcmBuffer.buffer,
    pcmBuffer.byteOffset,
    Math.floor(pcmBuffer.length / 2)
  );
  const float32Array = new Float32Array(int16Array.length);
  for (let i = 0; i < int16Array.length; i++) {
    float32Array[i] = Math.max(-1, Math.min(1, int16Array[i] / 32768));
  }
  return float32Array;
};

export async function loadWavAsFloat32Array(assetModule) {
  // Resolve the bundled asset to a local file and read it as base64.
  const asset = Asset.fromModule(assetModule);
  await asset.downloadAsync();
  const fileUri = asset.localUri || asset.uri;
  const fileBuffer = await FileSystem.readAsStringAsync(fileUri, {
    encoding: FileSystem.EncodingType.Base64,
  });
  return float32ArrayFromPCMBinaryBuffer(fileBuffer);
}

metro.config.js:

...
config.resolver.assetExts.push("wav");
...

Steps to reproduce

  1. Copy and paste my App.js and wav.js.
  2. Use a German WAV file.
  3. Observe that sequence stays empty.

Snack or a link to a repository

No response

React Native Executorch version

0.4.3

React Native version

0.79.3

Platforms

Android

JavaScript runtime

V8

Workflow

Expo Dev Client

Architecture

Fabric (New Architecture)

Build type

Debug mode

Device

Android emulator

Device model

No response

AI model

https://huggingface.co/software-mansion/react-native-executorch-whisper-tiny

Performance logs

No response

Acknowledgements

Yes
