Hi,
I'm trying to get Vosk working on an RPi 5, and live recognition just pukes garbage.
- I'm trying to recognize the speech from this video: https://www.youtube.com/watch?v=M2iX6HQOoLg
- I tested the mic with arecord/aplay and it works fine, without too much noise.
- I tested recognition directly on the recorded file and it works fine, but as soon as I do real-time recognition it produces garbage.
Pre-recorded (arecord -D hw:2,0 -f cd -c 1 -t wav -d 20 test_mic_rec.wav):
import os
import vosk
import wave
import json

# Path to your model
model_path = os.path.join(os.getcwd(), "vosk-model-small-en-us-0.15")
model = vosk.Model(model_path)

# Path to WAV file
wav_path = "test_mic_rec.wav"

# Open the WAV file
wf = wave.open(wav_path, "rb")
if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
    print("Warning: WAV file should be mono 16-bit")
if wf.getframerate() not in [8000, 16000, 44100]:
    print(f"Note: WAV file has unusual samplerate: {wf.getframerate()} Hz")

rec = vosk.KaldiRecognizer(model, wf.getframerate())
print("Recognizing from WAV file...")
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        result = json.loads(rec.Result())
        text = result.get("text", "")
        if text:
            print("Recognized:", text)
    else:
        partial = json.loads(rec.PartialResult()).get("partial", "")
        if partial:
            print("Partial:", partial)

# Print final result
final_result = json.loads(rec.FinalResult())
if final_result.get("text", ""):
    print("Final recognized text:", final_result["text"])
Partial: you know it's a little bit similar to like why do we need temperature and pressure and physics know those don't exist if we just look at the microscopic but it's only by by course screaming and looking at the larger scale that you can understand thelma dynamics and in the same way it's only by zooming out and looking at the singer genesis the you can understand the dynamics of transitions of phase
Final recognized text: you know it's a little bit similar to like why do we need temperature and pressure and physics know those don't exist if we just look at the microscopic but it's only by by course screaming and looking at the larger scale that you can understand thermal dynamics and in the same way it's only by zooming out and looking at the singer genesis the you can understand the dynamics of transitions of phase transitions
Live recognition:
import sounddevice as sd
import vosk
import queue
import json
import os

usb_index = 0  # confirmed from arecord -l
device_info = sd.query_devices(usb_index)
samplerate = int(device_info['default_samplerate'])  # 44100
channels = 1
print("Using device:", device_info['name'], "Samplerate:", samplerate)

q = queue.Queue()
model = vosk.Model(os.path.join(os.getcwd(), "vosk-model-small-en-us-0.15"))

def callback(indata, frames, time, status):
    q.put(bytes(indata))

with sd.RawInputStream(samplerate=samplerate, blocksize=8000, dtype='int16',
                       channels=channels, callback=callback, device=usb_index):
    rec = vosk.KaldiRecognizer(model, samplerate)
    print("Start speaking...")
    while True:
        data = q.get()
        if rec.AcceptWaveform(data):
            print(json.loads(rec.Result())['text'])
        else:
            print(json.loads(rec.PartialResult())['partial'])
It struggles to find any words and produces gibberish:
listen
listen
listen looking at
listen looking at the
listen looking at the and
listen looking at the and
listen looking at the end of
listen looking at the and of
listen looking at the and of trans transformative
listen looking at the and of trans transformative
listen looking at the and of trans fats are
listen looking at the and of trans fats feel or
listen looking at the and of trans fats feel or
listen looking at the and of trans fats feel or
listen looking at the and of trans fats feel or intimate
listen looking at the and of trans fats feel or intimate but
listen looking at the and of trans fats feel or intimate but where
listen looking at the and of trans fats feel or intimate but where
listen looking at the and of trans fats feel or intimate but where thing them
listen looking at the and of trans fats feel or intimate but where thing them or
listen looking at the and of trans fats feel or intimate but where thing them are
listen looking at the and of trans fats feel or intimate but where thing them are
listen looking at the and of trans fats feel or intimate but where thing them are
listen looking at the and of trans fats feel or intimate but where thing them are a lot
listen looking at the and of trans fats feel or intimate but where thing them are you up being a
listen looking at the and of trans fats feel or intimate but where thing them are you up being a little
listen looking at the and of trans fats feel or intimate but where thing them are you up being a little
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with political
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with political
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with political
listen looking at the and of trans fats feel or intimate but where thing them are you up being alone is this one with political
What can I check? Have I missed something?
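
One check that might narrow this down is to dump exactly what the live stream captures into a WAV file and run that file through the file-based script. This is only a minimal sketch using the standard library: `save_chunks_to_wav` and the output path are hypothetical names, and the chunks are assumed to be the raw 16-bit mono bytes that the callback puts on the queue.

```python
import wave

def save_chunks_to_wav(chunks, path, samplerate, channels=1, sampwidth=2):
    """Write raw PCM chunks (bytes objects) to a WAV file; return the frame count."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(samplerate)
        for chunk in chunks:
            wf.writeframes(chunk)
    # Re-open the file to confirm how many frames were actually stored
    with wave.open(path, "rb") as wf:
        return wf.getnframes()
```

In the live script, collecting a few seconds of `q.get()` results into a list and passing them here with the same `samplerate` would show whether the captured audio itself is already bad (for example a different device or rate than the one arecord used) or whether the problem is in the recognition loop.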