Feature: Add streaming support for parakeet by justynleung · Pull Request #3900 · ggml-org/whisper.cpp

justynleung · 2026-06-22T01:21:02Z

Thanks @danbev again for the amazing parakeet support (#3735).

This pull request aims to add streaming support for Parakeet models, as seen in the Hugging Face and NVIDIA NeMo ASR examples.

(AI disclosure) Streaming mode graphical explanation:

Full audio timeline:
0s -------------------------------------------------------------------------------------- n_samples
                                                                                  
                buffer_start         chunk_start         chunk_end        buffer_end
                     |                    |                  |                |
                     v                    v                  v                v
---------------------[====================[==================]================]----
                      <------ left ctx ---><------ chunk -----><-- right ctx -->

                      <--------------------- ENCODED -------------------------->
                                            <---- DECODED ---->

A buffer window slides across the full audio. The whole window is encoded, but only the middle chunk is decoded.

In streaming mode, the predictor state is reused for the next chunk until the full audio has been processed.
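
The window layout above can be sketched in a few lines of C++. This is a minimal illustration and not the code from this PR: `stream_window`, `plan_windows`, and `ms_is_valid` are hypothetical names, a 16 kHz mono input is assumed, and the 80 ms frame stride mentioned in the comments is an assumption (a 10 ms mel hop times the subsampling factor of 8).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type: sample offsets of one sliding window.
struct stream_window {
    int64_t buffer_start; // first sample fed to the encoder
    int64_t chunk_start;  // first sample whose tokens are kept
    int64_t chunk_end;    // one past the last sample whose tokens are kept
    int64_t buffer_end;   // one past the last sample fed to the encoder
};

// Assumption: 16 kHz audio, so 1 ms corresponds to 16 samples.
static const int64_t SAMPLES_PER_MS = 16;

// Hypothetical check mirroring the commit message "positive values and
// multiple of frame_stride_ms". A stride of 80 ms is an assumption.
static bool ms_is_valid(int64_t value_ms, int64_t frame_stride_ms) {
    return frame_stride_ms > 0 && value_ms > 0 && value_ms % frame_stride_ms == 0;
}

// Splits [0, n_samples) into consecutive chunks and pads each chunk with
// left and right context, clamped to the bounds of the audio.
static std::vector<stream_window> plan_windows(int64_t n_samples,
                                               int64_t left_ms,
                                               int64_t chunk_ms,
                                               int64_t right_ms) {
    assert(chunk_ms > 0 && left_ms >= 0 && right_ms >= 0);

    const int64_t left  = left_ms  * SAMPLES_PER_MS;
    const int64_t chunk = chunk_ms * SAMPLES_PER_MS;
    const int64_t right = right_ms * SAMPLES_PER_MS;

    std::vector<stream_window> windows;
    for (int64_t chunk_start = 0; chunk_start < n_samples; chunk_start += chunk) {
        stream_window w;
        w.chunk_start  = chunk_start;
        w.chunk_end    = std::min(chunk_start + chunk, n_samples);
        w.buffer_start = std::max<int64_t>(chunk_start - left, 0);
        w.buffer_end   = std::min(w.chunk_end + right, n_samples);
        windows.push_back(w);
    }
    return windows;
}
```

Under these assumptions, `plan_windows(200 * 1000 * 16, 10000, 2000, 2000)` yields 100 windows, and each window encodes at most 14 s of audio in order to decode 2 s of it. That would be consistent with the 100 encoder runs in the streaming log below if the sample is roughly 200 s long, which the thread does not state.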

Disclosure:

[v ] I have read the contribution guidelines
[v ] I have searched for existing PRs to prevent duplicating efforts

AI disclosure:
AI was used for C++11 syntax assistance and for researching the related issue #3735. All logic has been manually verified and tested. Test results are in the second comment for readability.

justynleung · 2026-06-22T01:21:19Z

Tested on Ubuntu 24.04:

Streaming

$ ./build/bin/parakeet-cli -m models/ggml-parakeet-tdt-0.6b-v3-f16.bin \
   -f samples/George_W_Bush_Columbia_FINAL.ogg --stream --left-context-ms 10000 \
   --chunk-ms 2000 --right-context-ms 2000

Loading Parakeet model from: models/ggml-parakeet-tdt-0.6b-v3-f16.bin
parakeet_init_from_file_with_params_no_state: loading model from 'models/ggml-parakeet-tdt-0.6b-v3-f16.bin'
parakeet_init_with_params_no_state: use gpu    = 1
parakeet_init_with_params_no_state: gpu_device = 0
parakeet_init_with_params_no_state: devices    = 1
parakeet_init_with_params_no_state: backends   = 1
parakeet_model_load: loading model
parakeet_model_load: arch                   = Parakeet TDT
parakeet_model_load: n_vocab                = 8192
parakeet_model_load: n_audio_ctx            = 5000
parakeet_model_load: n_audio_state          = 1024
parakeet_model_load: n_audio_head           = 8
parakeet_model_load: n_audio_layer          = 24
parakeet_model_load: n_mels                 = 128
parakeet_model_load: n_fft                  = 512
parakeet_model_load: eps                    = 0.000010
parakeet_model_load: ftype                  = 1
parakeet_model_load: qntvr                  = 0
parakeet_model_load: subsampling_factor     = 8
parakeet_model_load: n_subsampling_channels = 256
parakeet_model_load: n_conv_kernel          = 9
parakeet_model_load: n_pred_dim             = 640
parakeet_model_load: n_pred_layers          = 2
parakeet_model_load: n_tdt_durations        = 5
parakeet_model_load: n_max_tokens           = 10
parakeet_model_load: loaded window function with 400 samples
parakeet_model_load: loaded tdt_durations: [0 1 2 3 4 ]
parakeet_model_load: loaded vocab with 8192 tokens (blank_id=8192, unk=0, bos=4, eos=3)
parakeet_model_load:          CPU total size =  1255.64 MB
parakeet_model_load: model size    = 1255.64 MB
parakeet_init_with_params_no_state: initialized mel cache with n_fft = 512
parakeet_backend_init_gpu: device 0: CPU (type: 0)
parakeet_backend_init_gpu: no GPU found
parakeet_init_state: enc_out state:    0.00 MB (meta) +    2.44 MB (data)
parakeet_init_state: lstm state:    0.00 MB (meta) +    0.01 MB (data)
parakeet_init_state: pred state:    0.00 MB (meta) +    0.00 MB (data)
parakeet_init_state: compute buffer (encode) =  340.67 MB
parakeet_init_state: compute buffer (decode) =   67.12 MB
Successfully loaded Parakeet model
system_info: n_threads = 4 / 8 | PARAKEET : CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | OPENMP = 1 | REPACK = 1 | 

Processing file: samples/George_W_Bush_Columbia_FINAL.ogg
read_audio_data: reading audio data from 'samples/George_W_Bush_Columbia_FINAL.ogg' ...
read_audio_data: trying to decode with miniaudio
My fellow Americans, this day has brought terrible news and great sadness to our country. At nine o'clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors. On board was a crew of seven. Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain David Brown, Commander William McCool, Dr. Kulpna Chavla, and Ilan Ramon, a colonel in the Israeli Air Force. These men and women assumed great risk in the service to all humanity. In an age when space flight has come to seem almost routine, it is easy to overlook the dangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere of the Earth. These astronauts knew the dangers, and they faced them willingly, knowing they had a high and noble purpose in life. Because of their courage and daring and idealism, we will miss them all the more. All Americans today are thinking as well of the families of these men and women who have been given this sudden shock and grief. You're not alone. Our entire nation grieves with you. The cause in which they died will continue. Mankind is led into the darkness beyond our world by the inspiration of discovery and the longing to understand. Our journey into space will go on. In the skies today, we saw destruction and tragedy. Yet farther than we can see, there is comfort and hope. In the words of the prophet Isaiah, lift your eyes and look to the heavens. Who created all these? He who brings out the starry hosts one by one and calls them each by name, because of his great power and mighty strength, not one of them is missing. The same Creator who names the stars also knows the names of the seven souls we mourn today. The crew of the shuttle Columbia did not return safely to Earth. Yet we can pray that all are safely home. 
May God bless the grieving families, and may God may God continue to bless America.

parakeet_print_timings:     load time =   436.86 ms
parakeet_print_timings:     fallbacks =   0 p /   0 h
parakeet_print_timings:      mel time =  1370.00 ms
parakeet_print_timings:   sample time =     5.51 ms /   686 runs (     0.01 ms per run)
parakeet_print_timings:   encode time = 169671.31 ms /   100 runs (  1696.71 ms per run)
parakeet_print_timings:   decode time =   532.28 ms /   902 runs (     0.59 ms per run)
parakeet_print_timings:  predict time =   493.24 ms /   687 runs (     0.72 ms per run)
parakeet_print_timings:    - build     =     9.07 ms /   687 runs (     0.01 ms per run)
parakeet_print_timings:    - alloc     =    17.45 ms /   687 runs (     0.03 ms per run)
parakeet_print_timings:    - compute   =   466.50 ms /   687 runs (     0.68 ms per run)
parakeet_print_timings:    total time = 172764.50 ms

Non-streaming #3735

$ ./build/bin/parakeet-cli -m models/ggml-parakeet-tdt-0.6b-v3-f16.bin \
   -f samples/George_W_Bush_Columbia_FINAL.ogg

Loading Parakeet model from: models/ggml-parakeet-tdt-0.6b-v3-f16.bin
parakeet_init_from_file_with_params_no_state: loading model from 'models/ggml-parakeet-tdt-0.6b-v3-f16.bin'
parakeet_init_with_params_no_state: use gpu    = 1
parakeet_init_with_params_no_state: gpu_device = 0
parakeet_init_with_params_no_state: devices    = 1
parakeet_init_with_params_no_state: backends   = 1
parakeet_model_load: loading model
parakeet_model_load: arch                   = Parakeet TDT
parakeet_model_load: n_vocab                = 8192
parakeet_model_load: n_audio_ctx            = 5000
parakeet_model_load: n_audio_state          = 1024
parakeet_model_load: n_audio_head           = 8
parakeet_model_load: n_audio_layer          = 24
parakeet_model_load: n_mels                 = 128
parakeet_model_load: n_fft                  = 512
parakeet_model_load: eps                    = 0.000010
parakeet_model_load: ftype                  = 1
parakeet_model_load: qntvr                  = 0
parakeet_model_load: subsampling_factor     = 8
parakeet_model_load: n_subsampling_channels = 256
parakeet_model_load: n_conv_kernel          = 9
parakeet_model_load: n_pred_dim             = 640
parakeet_model_load: n_pred_layers          = 2
parakeet_model_load: n_tdt_durations        = 5
parakeet_model_load: n_max_tokens           = 10
parakeet_model_load: loaded window function with 400 samples
parakeet_model_load: loaded tdt_durations: [0 1 2 3 4 ]
parakeet_model_load: loaded vocab with 8192 tokens (blank_id=8192, unk=0, bos=4, eos=3)
parakeet_model_load:          CPU total size =  1255.64 MB
parakeet_model_load: model size    = 1255.64 MB
parakeet_init_with_params_no_state: initialized mel cache with n_fft = 512
parakeet_backend_init_gpu: device 0: CPU (type: 0)
parakeet_backend_init_gpu: no GPU found
parakeet_init_state: enc_out state:    0.00 MB (meta) +    2.44 MB (data)
parakeet_init_state: lstm state:    0.00 MB (meta) +    0.01 MB (data)
parakeet_init_state: pred state:    0.00 MB (meta) +    0.00 MB (data)
parakeet_init_state: compute buffer (encode) =  340.67 MB
parakeet_init_state: compute buffer (decode) =   67.12 MB
Successfully loaded Parakeet model
system_info: n_threads = 4 / 8 | PARAKEET : CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | OPENMP = 1 | REPACK = 1 | 

Processing file: samples/George_W_Bush_Columbia_FINAL.ogg
read_audio_data: reading audio data from 'samples/George_W_Bush_Columbia_FINAL.ogg' ...
read_audio_data: trying to decode with miniaudio
My fellow Americans, this day has brought terrible news and great sadness to our country. At 9 o'clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors. On board was a crew of seven. Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain David Brown, Commander William McCool, Dr. Kulpna Shavla, and Ilan Ramon, a colonel in the Israeli Air Force. These men and women assumed great risk in the service to all humanity. In an age when space flight has come to seem almost routine. It is easy to overlook the dangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere of the earth. These astronauts knew the dangers and they faced them willingly, knowing they had a high and noble purpose in life. Because of their courage and daring and idealism, we will miss them all the more. And those you loved will always have the respect and gratitude of this country. The cause in which they died will continue. Mankind is led into the darkness beyond our world by the inspiration of discovery and the longing to understand. Our journey into space will go on. In the skies today, we saw destruction and tragedy. Yet farther than we can see, there is comfort and hope. In the words of the prophet Isaiah, lift your eyes and look to the heavens. Who created all these? He who brings out the starry hosts one by one and calls them each by name. Because of his great power and mighty strength, not one of them is missing. The same creator who names the stars also knows the names of the seven souls we mourn today. The crew of the shuttle Columbia did not return safely to Earth. Yet we can pray that all are safely home. May God bless the grieving families and make out, may God continue to bless America.

parakeet_print_timings:     load time =   452.80 ms
parakeet_print_timings:     fallbacks =   0 p /   0 h
parakeet_print_timings:      mel time =   155.98 ms
parakeet_print_timings:   sample time =     4.99 ms /   655 runs (     0.01 ms per run)
parakeet_print_timings:   encode time = 44032.14 ms /     1 runs ( 44032.14 ms per run)
parakeet_print_timings:   decode time =   455.96 ms /   848 runs (     0.54 ms per run)
parakeet_print_timings:  predict time =   428.28 ms /   656 runs (     0.65 ms per run)
parakeet_print_timings:    - build     =     7.45 ms /   656 runs (     0.01 ms per run)
parakeet_print_timings:    - alloc     =    14.46 ms /   656 runs (     0.02 ms per run)
parakeet_print_timings:    - compute   =   406.17 ms /   656 runs (     0.62 ms per run)
parakeet_print_timings:    total time = 45747.02 ms

justynleung · 2026-06-22T01:26:23Z

I would add that there is a typo in /example/parakeet-cli/README.md: the example command is missing the ggml- prefix in the model file name.

- $ ./build/bin/parakeet-cli -m models/parakeet-tdt-0.6b-v3-f16.bin -f samples/jfk.wav
+ $ ./build/bin/parakeet-cli -m models/ggml-parakeet-tdt-0.6b-v3-f16.bin -f samples/jfk.wav

I am happy to include this fix in the current pull request.

justynleung · 2026-06-22T01:31:26Z

I've left the indentation in examples/parakeet-cli/parakeet-cli.cpp as-is for now because it makes this PR easier to review.

danbev · 2026-06-23T07:05:02Z

> I would add that there is a typo in /example/parakeet-cli/README.md, the example cli is missing ggml- prefix.

Thanks, I've opened #3906 to fix this.

danbev · 2026-06-24T06:08:50Z

@justynleung Thanks for your effort on this!

I also went down this route in the original parakeet PR, using an implementation similar to yours, but reverted the changes because I could not get it to work correctly. I ran into issues like the repeated phrase in your streaming output above, for example:

> May God bless the grieving families, and may God may God continue to bless America.

My understanding is that the TDT (Token-and-Duration Transducer) will not work well for streaming because of the emitted token durations. A different, dedicated streaming model, such as parakeet_realtime_eou_120m-v1 or nemotron-3.5-asr-streaming-0.6b, should be used instead. Depending on the interest from the community, this is something we could look into supporting in the future.
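
To make the role of token durations concrete, here is a minimal sketch of a greedy TDT-style loop over one chunk of encoder frames. It is an illustration under stated assumptions, not code from whisper.cpp: `tdt_step` and `decode_chunk` are hypothetical names, and the joint network is replaced by a scripted list of (token, duration) pairs. The thread does not spell out the exact failure mechanism, so the overshoot shown here is only one plausible way durations could interact with chunk boundaries.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one joint-network decision.
struct tdt_step {
    int token;    // vocabulary id, or blank_id for "no token"
    int duration; // number of encoder frames to skip, e.g. one of {0,1,2,3,4}
};

// Consumes scripted steps while the frame index stays inside [t_begin, t_end).
// Non-blank tokens are appended to `tokens`. Returns by how many frames the
// last jump lands past t_end (0 if it stops exactly at or before the boundary).
static int decode_chunk(const std::vector<tdt_step> & steps,
                        int blank_id,
                        int t_begin,
                        int t_end,
                        std::vector<int> & tokens) {
    int t = t_begin;
    for (size_t i = 0; i < steps.size() && t < t_end; ++i) {
        const tdt_step & s = steps[i];
        if (s.token != blank_id) {
            tokens.push_back(s.token);
        }
        int skip = s.duration;
        // A blank with duration 0 would never advance, so force one frame.
        if (s.token == blank_id && skip == 0) {
            skip = 1;
        }
        t += skip; // the predicted duration, not a fixed step, moves the frame index
    }
    return t > t_end ? t - t_end : 0;
}
```

With an 8-frame chunk and the scripted steps `{5,2}`, `{blank,0}`, `{7,4}`, `{9,3}`, the frame index moves 0, 2, 3, 7, 10, so the last token's duration reaches 2 frames into the next chunk. A chunked decoder that restarts at the next chunk's first frame would revisit frames the model already treated as covered, which might contribute to repeated text; a dedicated streaming model avoids relying on this kind of bookkeeping.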

justynleung · 2026-06-26T00:38:31Z

Thank you for the feedback. I am glad that this approach had already been considered.

I ran some tests again and can confirm that the same issue arises.

I agree that other models will be better suited to streaming transcription. I will close the pull request.

justynleung added 13 commits June 20, 2026 16:21

feat: parakeet streaming support

882e2d5

refrac: accept ms only for user input, remove unused parse function

b0513f6

restore upstream mel_frams const (unused)

b22793c

move stream_params before the file loop

3f35a94

rewite comment with reference to Nvidia Nemo parakeet streaming example

9d466dc

add internal function for parakeet_decode() and parakeet_decode_stream()

6a6e193

Add TODO: preserve RNN-T predictor state

54ec91c

tighten stream params to only accpet positive values and multiple of frame_stride_ms

460f885

Update README and --help description

da3b0dc

add comment to explain the code

0acf320

Add comments and clarify code logic

de77be7

reuse predictor state from prior chunk in streaming

30a7bf2

Update readme to use default stream params

0bc538a

justynleung closed this Jun 26, 2026

justynleung deleted the parakeet-stream branch June 26, 2026 00:38

justynleung wants to merge 13 commits into ggml-org:master from justynleung:parakeet-stream
