Commit 8375e50

author
linxiaodong
committed
addon.node : VAD-aligned (faster-whisper-like) timeline with real gaps
With VAD enabled, whisper.cpp concatenates all detected speech into a
single stream, so addon.node returned a gap-less timeline where every
segment end equals the next segment start. Add an alignment layer that
puts timestamps back on the original timeline with real silence gaps,
controlled by three new params:

  align_mode        "hybrid" (default) | "run" | "word" | "legacy"
  vad_merge_gap_ms  adjacent VAD segments whose silence gap is <= this
                    (ms) merge into one run; a larger gap becomes a real
                    gap (default 2000; negative disables aligned mode)
  word_gap_ms       word/hybrid: start a new segment when the gap between
                    two consecutive words exceeds this (default 500)

- hybrid (C): VAD-grouped runs, each run sliced into its own buffer and
  re-segmented by word-level gaps with every segment end clamped to its
  last word. Best boundary accuracy; default.
- run (A): per-run decode emitting whisper's own segments; gaps between
  runs.
- word (B): single decode pass + word-gap re-segmentation; uses core VAD
  via the new whisper_full_get_token_t0/t1 mapping when a VAD model is
  given.
- legacy: original continuous single pass.

Each run is decoded from a physically sliced buffer rather than via
offset_ms/duration_ms (which only bound the outer seek loop and let
neighbouring speech bleed into short runs); slice-relative timestamps are
shifted back with a per-run base offset. Progress is rescaled across runs
so the JS callback still sees a single monotonic 0..100.
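
For illustration only, the two grouping rules described above (merging VAD segments into runs by vad_merge_gap_ms, then splitting a run's words by word_gap_ms and shifting them back by the run's base offset) could be sketched roughly as follows. This is not the code from this commit: the Span type and the merge_runs and segment_words helpers are hypothetical names, all times are assumed to be in milliseconds, and the "negative disables aligned mode" case is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for this sketch; not a struct from addon.node.
struct Span {
    int64_t t0_ms;  // start time
    int64_t t1_ms;  // end time
};

// Merge adjacent VAD segments whose silence gap is <= merge_gap_ms into one
// run. A larger gap starts a new run, so it survives as a real gap.
// Assumes `vad` is sorted by start time and non-overlapping.
std::vector<Span> merge_runs(const std::vector<Span>& vad, int64_t merge_gap_ms) {
    std::vector<Span> runs;
    for (const Span& s : vad) {
        if (!runs.empty() && s.t0_ms - runs.back().t1_ms <= merge_gap_ms) {
            runs.back().t1_ms = s.t1_ms;  // small gap: extend the current run
        } else {
            runs.push_back(s);            // first segment or large gap: new run
        }
    }
    return runs;
}

// Re-segment one run's words. `words` carry slice-relative times; a new
// segment starts whenever the gap to the previous word exceeds word_gap_ms.
// Each segment end is the end of its last word, and every timestamp is
// shifted by base_ms (the run's start on the original timeline).
std::vector<Span> segment_words(const std::vector<Span>& words,
                                int64_t word_gap_ms, int64_t base_ms) {
    std::vector<Span> segs;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const Span& w = words[i];
        // Short-circuit keeps words[i - 1] from being read when i == 0.
        const bool start_new =
            segs.empty() || (w.t0_ms - words[i - 1].t1_ms > word_gap_ms);
        if (start_new) {
            segs.push_back({w.t0_ms + base_ms, w.t1_ms + base_ms});
        } else {
            segs.back().t1_ms = w.t1_ms + base_ms;  // clamp end to last word
        }
    }
    return segs;
}
```

With the defaults named in the message (2000 and 500), VAD segments at 0..1000, 1500..3000 and 6000..7000 would collapse into two runs in this sketch, 0..3000 and 6000..7000, leaving a real 3000 ms gap between them. Words inside the second run would then be split wherever they are more than 500 ms apart and reported relative to 6000.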
1 parent 8e3c93a commit 8375e50

1 file changed

Lines changed: 384 additions & 22 deletions

0 commit comments