0% live recognition (while works with wave files) #2026

@philippeflorent

Hi,

I'm trying to get vosk working on an RPi 5, but live recognition just produces garbage.

  • I'm trying to recognize the speech from a video: https://www.youtube.com/watch?v=M2iX6HQOoLg
  • I tested the mic with arecord/aplay and it works fine, with not too much noise
  • I tested recognition directly on the recorded file and it works fine, but real-time recognition produces garbage

pre-recorded (arecord -D hw:2,0 -f cd -c 1 -t wav -d 20 test_mic_rec.wav)

import os
import vosk
import wave
import json

# Path to your model
model_path = os.path.join(os.getcwd(), "vosk-model-small-en-us-0.15")
model = vosk.Model(model_path)

# Path to WAV file
wav_path = "test_mic_rec.wav"

# Open the WAV file
wf = wave.open(wav_path, "rb")
if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
    print("Warning: WAV file should be mono 16-bit")
if wf.getframerate() not in [8000, 16000, 44100]:
    print(f"Note: WAV file has unusual samplerate: {wf.getframerate()} Hz")

rec = vosk.KaldiRecognizer(model, wf.getframerate())

print("Recognizing from WAV file...")

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        result = json.loads(rec.Result())
        text = result.get("text", "")
        if text:
            print("Recognized:", text)
    else:
        partial = json.loads(rec.PartialResult()).get("partial", "")
        if partial:
            print("Partial:", partial)

# Print final result
final_result = json.loads(rec.FinalResult())
if final_result.get("text", ""):
    print("Final recognized text:", final_result["text"])
Partial: you know it's a little bit similar to like why do we need temperature and pressure and physics know those don't exist if we just look at the microscopic but it's only by by course screaming and looking at the larger scale that you can understand thelma dynamics and in the same way it's only by zooming out and looking at the singer genesis the you can understand the dynamics of transitions of phase
Final recognized text: you know it's a little bit similar to like why do we need temperature and pressure and physics know those don't exist if we just look at the microscopic but it's only by by course screaming and looking at the larger scale that you can understand thermal dynamics and in the same way it's only by zooming out and looking at the singer genesis the you can understand the dynamics of transitions of phase transitions

live recognition

import sounddevice as sd
import vosk
import queue
import json
import os

usb_index = 0  # confirmed from arecord -l

device_info = sd.query_devices(usb_index)
samplerate = int(device_info['default_samplerate'])  # 44100
channels = 1

print("Using device:", device_info['name'], "Samplerate:", samplerate)

q = queue.Queue()
model = vosk.Model(os.path.join(os.getcwd(), "vosk-model-small-en-us-0.15"))

def callback(indata, frames, time, status):
    q.put(bytes(indata))

with sd.RawInputStream(samplerate=samplerate, blocksize=8000, dtype='int16',
                       channels=channels, callback=callback, device=usb_index):
    rec = vosk.KaldiRecognizer(model, samplerate)
    print("Start speaking...")
    while True:
        data = q.get()
        if rec.AcceptWaveform(data):
            print(json.loads(rec.Result())['text'])
        else:
            print(json.loads(rec.PartialResult())['partial'])
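
The callback in the live script receives a `status` argument and ignores it, so the script cannot tell whether the stream dropped audio before it reached the recognizer. A minimal sketch of a callback that records those flags, assuming the same `sounddevice` callback signature used above; `make_callback` and `problems` are hypothetical names introduced here for illustration and are not part of vosk or sounddevice.

```python
import queue
import sys


def make_callback(q, problems):
    """Build a stream callback that queues audio and records status flags.

    q        -- queue.Queue that receives each raw audio chunk as bytes
    problems -- list that collects the text of every non-empty status
    (both names are illustrative, not part of any library)
    """
    def callback(indata, frames, time, status):
        # A non-empty status means the audio backend reported a problem
        # for this block (for example, samples that were dropped).
        if status:
            problems.append(str(status))
            print("stream status:", status, file=sys.stderr)
        q.put(bytes(indata))

    return callback
```

It would be passed as `callback=make_callback(q, problems)` in place of the plain `callback`. If `problems` fills up while the script runs, the recognizer is being fed incomplete audio, which would be one possible explanation for the difference from the WAV-file run.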

The live script struggles to find any words and produces gibberish:

listen
listen
listen looking at
listen looking at the
listen looking at the and
listen looking at the and
listen looking at the end of
listen looking at the and of
listen looking at the and of trans transformative
listen looking at the and of trans transformative
listen looking at the and of trans fats are
listen looking at the and of trans fats feel or
listen looking at the and of trans fats feel or
listen looking at the and of trans fats feel or
listen looking at the and of trans fats feel or intimate
listen looking at the and of trans fats feel or intimate but
listen looking at the and of trans fats feel or intimate but where
listen looking at the and of trans fats feel or intimate but where
listen looking at the and of trans fats feel or intimate but where thing them
listen looking at the and of trans fats feel or intimate but where thing them or
listen looking at the and of trans fats feel or intimate but where thing them are
listen looking at the and of trans fats feel or intimate but where thing them are
listen looking at the and of trans fats feel or intimate but where thing them are
listen looking at the and of trans fats feel or intimate but where thing them are a lot
listen looking at the and of trans fats feel or intimate but where thing them are you up being a
listen looking at the and of trans fats feel or intimate but where thing them are you up being a little
listen looking at the and of trans fats feel or intimate but where thing them are you up being a little
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with political
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with political
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with political
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with political

What can I check? Have I missed something?
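
One way to narrow this down is to save exactly what the live script captures and run it through the WAV-file script that already works. If the saved file is also recognized badly, the problem is in the capture path rather than in vosk. The sketch below uses only the standard `wave` module; `save_chunks_as_wav` is a hypothetical helper, and the defaults (mono, 16-bit) are assumptions taken from the `channels=1` and `dtype='int16'` settings in the live script.

```python
import wave


def save_chunks_as_wav(chunks, path, samplerate, channels=1, sampwidth=2):
    """Write raw PCM chunks (bytes, as queued by the callback) to a WAV file.

    Returns the number of frames written. The helper name and the default
    mono / 16-bit layout are assumptions matching the live script above.
    """
    total_bytes = 0
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(samplerate)
        for chunk in chunks:
            wf.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes // (channels * sampwidth)
```

In the live script, the chunks taken from `q` over about 20 seconds could be collected in a list and passed to this helper together with `samplerate`. It may also be worth comparing devices: the file was recorded from `hw:2,0`, while the live script opens device index 0, and the indices printed by `sd.query_devices()` need not match ALSA card numbers.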
