whisper : expose internal VAD speech segments by buxuku · Pull Request #3916 · ggml-org/whisper.cpp

buxuku · 2026-06-27T13:32:24Z

When transcribing with vad = true, whisper already detects the speech segments internally and keeps them in the state, but there's no way to read them back. This exposes them through whisper_full_n_vad_segments() and whisper_full_get_vad_segment_t0/t1() (with _from_state variants). The times are on the original audio timeline in centiseconds, and the count is 0 when VAD wasn't used.

The point is to let callers reuse whisper's own speech boundaries — for instance to clip or align subtitles to speech — instead of running a separate VAD pass over the same audio.
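As a rough illustration of the subtitle-clipping use case, here is a minimal sketch that works on plain `{t0, t1}` pairs in centiseconds. The `speech_segment` struct and both helper functions are hypothetical, not part of whisper.cpp. The comment showing how the pairs might be read from the new accessors assumes a `(ctx, index)` call shape that this description does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for one VAD speech segment.
// Times are in centiseconds (1 cs = 10 ms) on the original audio timeline.
struct speech_segment {
    int64_t t0;
    int64_t t1;
};

// Assumed (not confirmed here) way to fill the vector from whisper.cpp:
//   const int n = whisper_full_n_vad_segments(ctx);
//   for (int i = 0; i < n; ++i) {
//       segs.push_back({ whisper_full_get_vad_segment_t0(ctx, i),
//                        whisper_full_get_vad_segment_t1(ctx, i) });
//   }

// Format centiseconds as an SRT-style timestamp "HH:MM:SS,mmm".
static std::string cs_to_timestamp(int64_t cs) {
    const int64_t ms = cs * 10;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld,%03lld",
                  (long long) (ms / 3600000),
                  (long long) ((ms / 60000) % 60),
                  (long long) ((ms / 1000) % 60),
                  (long long) (ms % 1000));
    return buf;
}

// Shrink the subtitle interval [t0, t1] to the speech it overlaps:
// from the start of the first overlapping segment to the end of the last one,
// never extending beyond the original interval.
// Assumes segs is sorted by time. Returns false (and leaves t0/t1 untouched)
// when no speech overlaps the interval.
static bool clip_to_speech(const std::vector<speech_segment> & segs,
                           int64_t & t0, int64_t & t1) {
    assert(t0 <= t1);
    bool    found = false;
    int64_t lo = 0;
    int64_t hi = 0;
    for (const auto & s : segs) {
        if (s.t1 <= t0 || s.t0 >= t1) {
            continue; // no overlap with the subtitle interval
        }
        if (!found) {
            lo    = std::max(s.t0, t0);
            found = true;
        }
        hi = std::min(s.t1, t1);
    }
    if (!found) {
        return false;
    }
    t0 = lo;
    t1 = hi;
    return true;
}
```

A caller would run `clip_to_speech` once per subtitle cue and print the result with `cs_to_timestamp`. Silence between two overlapping segments is kept inside the clipped cue, which is a design choice of this sketch rather than something the PR prescribes.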

Extended tests/test-vad-full.cpp to check the segments come back non-empty and in order. Built and ran it on macOS.
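For reference, one plausible reading of "non-empty and in order" is sketched below. This is not the code in tests/test-vad-full.cpp. The `vad_span` struct and `spans_valid` helper are hypothetical, and the check assumes non-negative times with each segment starting no earlier than the previous one ends.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical holder for one segment's start/end in centiseconds.
struct vad_span {
    int64_t t0;
    int64_t t1;
};

// True when there is at least one span, every span has t0 <= t1,
// and each span starts at or after the end of the previous one.
static bool spans_valid(const std::vector<vad_span> & spans) {
    if (spans.empty()) {
        return false;
    }
    int64_t prev_end = 0; // assumes times are non-negative
    for (const auto & s : spans) {
        if (s.t0 < prev_end || s.t1 < s.t0) {
            return false;
        }
        prev_end = s.t1;
    }
    return true;
}
```

In a test this would typically be wrapped as `assert(spans_valid(spans));` after collecting the spans from the state.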

danbev · 2026-07-01T07:13:02Z

@buxuku Could you take a look at the conflicts and resolve them?
Also, if you rebase on or merge upstream/master, the two failing CI runs should be fixed now. Thanks!

When transcribing with params.vad = true, whisper already computes the speech segments and keeps them in the state. Expose them so callers can reuse those boundaries (for example to align or clip subtitles to speech) instead of running a second, separate VAD pass. Times are on the original audio timeline in centiseconds; the count is 0 when VAD was not used. test-vad-full.cpp checks the segments are ordered and non-empty.

buxuku · 2026-07-01T08:40:39Z

> @buxuku Could you take a look at the conflicts and resolve them. And if you rebase or merge upstream/master the two failing CI runs should be fixed now. Thanks!

Done! The conflicts were just my changes sitting right next to the #3910 token_t0/t1 changes, so I kept both. Rebased on master, so CI should be happy now.

buxuku force-pushed the pr/vad-speech-segments branch from 99d2d51 to 6809480 Compare July 1, 2026 08:32

danbev approved these changes Jul 1, 2026


danbev merged commit 6fc7c33 into ggml-org:master Jul 1, 2026
49 checks passed
