Audio: MFCC: Add Voice Activity Detection based on Mel spectrum
This patch adds a new mfcc_vad module. It operates on the Mel
log spectrum values produced by the MFCC component. The VAD is
simple and not very selective between voice and other signals,
but the continuously updated background noise estimate prevents
stationary noise from triggering it.
The algorithm tracks a per-bin noise floor (instant-down, slow-rise)
and computes an A-weighted energy delta. The weighting emphasizes
speech frequencies. Speech is declared when the delta exceeds a
threshold (0.35 in Q9.23) with a 20-frame hangover to prevent rapid
toggling.
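
The detection logic described above can be sketched as follows. This is
an illustration only, not the code in the patch: the class name MelVad,
the rise rate (RISE_SHIFT), the uniform example weights, and the order
of floor update versus delta computation are all assumptions. Only the
0.35 threshold in Q9.23 and the 20-frame hangover come from this commit
message.

```python
Q23 = 1 << 23                      # Q9.23 has 23 fractional bits
THRESHOLD_Q23 = round(0.35 * Q23)  # 0.35 threshold from the commit message
HANGOVER_FRAMES = 20               # hangover length from the commit message
RISE_SHIFT = 6                     # assumption: floor rises by 1/64 of the gap per frame


class MelVad:
    """Hypothetical sketch of a noise-floor-tracking VAD on Mel log values."""

    def __init__(self, weights_q23):
        # assumption: one Q23 weight per Mel bin, standing in for the
        # speech-emphasizing weighting mentioned in the commit message
        self.weights_q23 = list(weights_q23)
        self.noise_floor = None
        self.hangover = 0

    def process(self, mel_log_q23):
        """Return True if the frame is flagged as speech."""
        if self.noise_floor is None:
            # Start the floor at the first frame so steady input never triggers
            self.noise_floor = list(mel_log_q23)
        delta = 0
        for i, x in enumerate(mel_log_q23):
            floor = self.noise_floor[i]
            if x < floor:
                floor = x                           # instant-down
            else:
                floor += (x - floor) >> RISE_SHIFT  # slow-rise
            self.noise_floor[i] = floor
            # Weighted excess over the floor, Q23 * Q23 -> Q23
            delta += (self.weights_q23[i] * (x - floor)) >> 23
        if delta > THRESHOLD_Q23:
            self.hangover = HANGOVER_FRAMES
            return True
        if self.hangover > 0:
            self.hangover -= 1                      # hold the flag after speech ends
            return True
        return False
```

With a constant input level the floor equals the input from the first
frame, so the delta stays at zero; a sudden rise above the floor sets
the flag, and the hangover keeps it set for 20 further frames.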
The VAD flag is inserted into the output stream as the first value
after the magic header word in all format paths (S16, S24, S32).
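
A consumer of the stream could separate the flag from the Mel data
roughly as shown below. The sketch assumes a single-word magic value
followed by the VAD flag and a fixed number of Mel values, and treats
any nonzero flag as speech. The function name split_frames and its
parameters are hypothetical; the actual magic word, frame length, and
per-format packing are not stated in this message.

```python
def split_frames(samples, magic, num_mel_bins):
    """Split a flat sample list into (vad_flag, mel_values) tuples.

    Assumed layout per frame: [magic, vad_flag, mel_0 .. mel_{N-1}].
    Samples before the first magic word, and any incomplete frame at
    the end, are skipped.
    """
    frames = []
    frame_len = 2 + num_mel_bins       # magic + flag + Mel values
    i = 0
    while i + frame_len <= len(samples):
        if samples[i] != magic:
            i += 1                     # resynchronize on the next word
            continue
        vad_flag = samples[i + 1] != 0  # assumption: nonzero means speech
        mel_values = samples[i + 2:i + frame_len]
        frames.append((vad_flag, mel_values))
        i += frame_len
    return frames
```

A real parser would also need to handle Mel data that happens to equal
the magic value, which this sketch does not attempt.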
A new Kconfig option CONFIG_COMP_MFCC_VAD (depends on COMP_MFCC,
default y) gates compilation of the VAD code and the stream format
change.
The README.txt file is updated to show how to run the example
Python script sof_mel_to_text_live_dsp_vad.py. The script uses the
MFCC Mel spectrum data and the VAD flag stream as audio features
for the Whisper speech-to-text model. The README formatting is
changed to Markdown.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>