whisper : expose internal VAD speech segments #3916

Merged: danbev merged 1 commit into ggml-org:master from buxuku:pr/vad-speech-segments on Jul 1, 2026

buxuku commented Jun 27, 2026

Contributor

When transcribing with params.vad = true, whisper already detects the speech segments internally and keeps them in the state, but there's no way to read them back. This exposes them through whisper_full_n_vad_segments(), whisper_full_get_vad_segment_t0() and whisper_full_get_vad_segment_t1() (with _from_state variants). The times are on the original audio timeline in centiseconds, and the count is 0 when VAD wasn't used.
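Since the segment times are reported in centiseconds (1 cs = 10 ms), callers will usually need to convert them before display. The following is a minimal sketch of that conversion, not code from this PR: the helper names `cs_to_seconds` and `cs_to_srt_timestamp` are hypothetical, and the integer type used for the times is an assumption (the getter signatures are not shown in this conversation).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: convert a time in centiseconds to seconds.
inline double cs_to_seconds(int64_t cs) {
    return static_cast<double>(cs) / 100.0;
}

// Hypothetical helper: format a non-negative centisecond time as an
// SRT-style timestamp "HH:MM:SS,mmm".
inline std::string cs_to_srt_timestamp(int64_t cs) {
    const int64_t ms_total = cs * 10;               // centiseconds -> milliseconds
    const int64_t ms  = ms_total % 1000;
    const int64_t sec = (ms_total / 1000) % 60;
    const int64_t min = (ms_total / 60000) % 60;
    const int64_t hr  = ms_total / 3600000;

    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld,%03lld",
                  (long long) hr, (long long) min, (long long) sec, (long long) ms);
    return std::string(buf);
}
```

In real use the input would be the value read back for each VAD segment; here it is just a plain integer so the sketch stays independent of the library.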

The point is to let callers reuse whisper's own speech boundaries — for instance to clip or align subtitles to speech — instead of running a separate VAD pass over the same audio.
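As an illustration of the subtitle use case, a cue can be clipped to the speech it overlaps. This is a sketch under stated assumptions rather than anything in the PR: `speech_segment` and `clip_to_speech` are hypothetical names, the segments are assumed to be sorted by start time and non-overlapping, and in practice the vector would be filled from the new accessors.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical container for one speech segment; times in centiseconds.
struct speech_segment {
    int64_t t0; // start
    int64_t t1; // end
};

// Clip `cue` in place to the span of speech it overlaps.
// Returns false (leaving `cue` unchanged) when the cue overlaps no speech.
// Assumes `speech` is sorted by start time and non-overlapping.
inline bool clip_to_speech(const std::vector<speech_segment> & speech,
                           speech_segment & cue) {
    bool found = false;
    speech_segment out = cue;
    for (const speech_segment & s : speech) {
        if (s.t1 <= cue.t0) continue; // segment ends before the cue starts
        if (s.t0 >= cue.t1) break;    // segment starts after the cue ends
        if (!found) {
            out.t0 = std::max(s.t0, cue.t0); // first overlapping segment sets the start
            found  = true;
        }
        out.t1 = std::min(s.t1, cue.t1);     // last overlapping segment sets the end
    }
    if (found) {
        cue = out;
    }
    return found;
}
```

A cue that spans several speech segments keeps the silence between them; only leading and trailing silence is trimmed. Splitting a cue per segment would be a different design choice.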

Extended tests/test-vad-full.cpp to check the segments come back non-empty and in order. Built and ran it on macOS.
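The kind of check described above could look roughly like the sketch below. It is not the code in tests/test-vad-full.cpp: the struct and function names are hypothetical, and "in order" is interpreted here, as an assumption, to mean that each segment has start <= end and starts no earlier than the previous one ends.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one speech segment; times in centiseconds.
struct speech_segment {
    int64_t t0; // start
    int64_t t1; // end
};

// Returns true when there is at least one segment, every segment has
// t0 <= t1, and each segment starts at or after the end of the previous one.
inline bool segments_ordered_and_non_empty(const std::vector<speech_segment> & segs) {
    if (segs.empty()) {
        return false;
    }
    for (std::size_t i = 0; i < segs.size(); ++i) {
        if (segs[i].t0 > segs[i].t1) {
            return false; // segment runs backwards
        }
        if (i > 0 && segs[i].t0 < segs[i - 1].t1) {
            return false; // segment starts before the previous one ends
        }
    }
    return true;
}
```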

danbev commented Jul 1, 2026

Member

@buxuku Could you take a look at the conflicts and resolve them?
And if you rebase on or merge upstream/master, the two failing CI runs should be fixed now. Thanks!

When transcribing with params.vad = true, whisper already computes the speech
segments and keeps them in the state. Expose them so callers can reuse those
boundaries (for example to align or clip subtitles to speech) instead of running
a second, separate VAD pass.

Times are on the original audio timeline in centiseconds; the count is 0 when VAD
was not used. test-vad-full.cpp checks the segments are ordered and non-empty.

buxuku force-pushed the pr/vad-speech-segments branch from 99d2d51 to 6809480 on July 1, 2026 08:32

buxuku commented Jul 1, 2026

Contributor Author

> @buxuku Could you take a look at the conflicts and resolve them. And if you rebase or merge upstream/master the two failing CI runs should be fixed now. Thanks!

Done! The conflicts were just my changes sitting right next to the #3910 token_t0/t1 changes, so I kept both. Rebased on master, so CI should be happy now.

danbev merged commit 6fc7c33 into ggml-org:master on Jul 1, 2026
49 checks passed