
Feature: Add streaming support for parakeet #3900

Closed
justynleung wants to merge 13 commits into ggml-org:master from justynleung:parakeet-stream
justynleung commented Jun 22, 2026

Thanks @danbev again for the amazing parakeet support (#3735).

This pull request aims to add streaming support for Parakeet models, as seen in the Hugging Face and NVIDIA NeMo ASR examples.

(AI disclosure) Streaming mode graphical explanation:

Full audio timeline:
0s -------------------------------------------------------------------------------------- n_samples
                                                                                  
                buffer_start         chunk_start         chunk_end        buffer_end
                     |                    |                  |                |
                     v                    v                  v                v
---------------------[====================[==================]================]----
                      <------ left ctx ---><------ chunk -----><-- right ctx -->

                      <--------------------- ENCODED -------------------------->
                                            <---- DECODED ---->                                         

A buffer window slides across the full audio. The full window is encoded, but only the middle chunk is decoded.

Streaming mode reuses the predictor state for the next chunk until the full audio has been processed.
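
The window layout in the diagram can be sketched as follows. This is only an illustration of the idea, not the code from this PR: `stream_window` and `plan_stream_windows` are hypothetical names, the 16 kHz sample rate is an assumption, and clamping the context at the start and end of the audio is one possible edge policy among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One position of the sliding buffer, in sample indices (half-open ranges).
struct stream_window {
    int64_t buffer_start; // first encoded sample (start of left context)
    int64_t chunk_start;  // first sample whose tokens are decoded
    int64_t chunk_end;    // one past the last decoded sample
    int64_t buffer_end;   // one past the last encoded sample (end of right context)
};

// Hypothetical helper: lists every window needed to cover n_samples of audio.
// The chunk advances by chunk_ms each step; the context is clamped to the audio.
static std::vector<stream_window> plan_stream_windows(
        int64_t n_samples, int sample_rate, int left_ms, int chunk_ms, int right_ms) {
    const int64_t left  = (int64_t) left_ms  * sample_rate / 1000;
    const int64_t chunk = (int64_t) chunk_ms * sample_rate / 1000;
    const int64_t right = (int64_t) right_ms * sample_rate / 1000;
    assert(chunk > 0); // a zero-length chunk would never advance

    std::vector<stream_window> windows;
    for (int64_t chunk_start = 0; chunk_start < n_samples; chunk_start += chunk) {
        stream_window w;
        w.chunk_start  = chunk_start;
        w.chunk_end    = std::min(chunk_start + chunk, n_samples);
        w.buffer_start = std::max<int64_t>(chunk_start - left, 0);
        w.buffer_end   = std::min(w.chunk_end + right, n_samples);
        windows.push_back(w);
    }
    return windows;
}
```

With a 10000 ms left context, a 2000 ms chunk and a 2000 ms right context, a window away from the edges covers 14 s of audio, of which only the middle 2 s are decoded.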

Disclosure:

  • [v] I have read the contribution guidelines
  • [v] I have searched for existing PRs to prevent duplicating effort

AI disclosure:
AI was used for C++11 syntax assistance and for researching the related issue #3735. All logic has been manually verified and tested. Test results are in the second comment for readability.

justynleung commented Jun 22, 2026

Author

Tested on Ubuntu 24.04:

Streaming
$ ./build/bin/parakeet-cli -m models/ggml-parakeet-tdt-0.6b-v3-f16.bin \
   -f samples/George_W_Bush_Columbia_FINAL.ogg --stream --left-context-ms 10000 \
   --chunk-ms 2000 --right-context-ms 2000

Loading Parakeet model from: models/ggml-parakeet-tdt-0.6b-v3-f16.bin
parakeet_init_from_file_with_params_no_state: loading model from 'models/ggml-parakeet-tdt-0.6b-v3-f16.bin'
parakeet_init_with_params_no_state: use gpu    = 1
parakeet_init_with_params_no_state: gpu_device = 0
parakeet_init_with_params_no_state: devices    = 1
parakeet_init_with_params_no_state: backends   = 1
parakeet_model_load: loading model
parakeet_model_load: arch                   = Parakeet TDT
parakeet_model_load: n_vocab                = 8192
parakeet_model_load: n_audio_ctx            = 5000
parakeet_model_load: n_audio_state          = 1024
parakeet_model_load: n_audio_head           = 8
parakeet_model_load: n_audio_layer          = 24
parakeet_model_load: n_mels                 = 128
parakeet_model_load: n_fft                  = 512
parakeet_model_load: eps                    = 0.000010
parakeet_model_load: ftype                  = 1
parakeet_model_load: qntvr                  = 0
parakeet_model_load: subsampling_factor     = 8
parakeet_model_load: n_subsampling_channels = 256
parakeet_model_load: n_conv_kernel          = 9
parakeet_model_load: n_pred_dim             = 640
parakeet_model_load: n_pred_layers          = 2
parakeet_model_load: n_tdt_durations        = 5
parakeet_model_load: n_max_tokens           = 10
parakeet_model_load: loaded window function with 400 samples
parakeet_model_load: loaded tdt_durations: [0 1 2 3 4 ]
parakeet_model_load: loaded vocab with 8192 tokens (blank_id=8192, unk=0, bos=4, eos=3)
parakeet_model_load:          CPU total size =  1255.64 MB
parakeet_model_load: model size    = 1255.64 MB
parakeet_init_with_params_no_state: initialized mel cache with n_fft = 512
parakeet_backend_init_gpu: device 0: CPU (type: 0)
parakeet_backend_init_gpu: no GPU found
parakeet_init_state: enc_out state:    0.00 MB (meta) +    2.44 MB (data)
parakeet_init_state: lstm state:    0.00 MB (meta) +    0.01 MB (data)
parakeet_init_state: pred state:    0.00 MB (meta) +    0.00 MB (data)
parakeet_init_state: compute buffer (encode) =  340.67 MB
parakeet_init_state: compute buffer (decode) =   67.12 MB
Successfully loaded Parakeet model
system_info: n_threads = 4 / 8 | PARAKEET : CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | OPENMP = 1 | REPACK = 1 | 

Processing file: samples/George_W_Bush_Columbia_FINAL.ogg
read_audio_data: reading audio data from 'samples/George_W_Bush_Columbia_FINAL.ogg' ...
read_audio_data: trying to decode with miniaudio
My fellow Americans, this day has brought terrible news and great sadness to our country. At nine o'clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors. On board was a crew of seven. Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain David Brown, Commander William McCool, Dr. Kulpna Chavla, and Ilan Ramon, a colonel in the Israeli Air Force. These men and women assumed great risk in the service to all humanity. In an age when space flight has come to seem almost routine, it is easy to overlook the dangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere of the Earth. These astronauts knew the dangers, and they faced them willingly, knowing they had a high and noble purpose in life. Because of their courage and daring and idealism, we will miss them all the more. All Americans today are thinking as well of the families of these men and women who have been given this sudden shock and grief. You're not alone. Our entire nation grieves with you. The cause in which they died will continue. Mankind is led into the darkness beyond our world by the inspiration of discovery and the longing to understand. Our journey into space will go on. In the skies today, we saw destruction and tragedy. Yet farther than we can see, there is comfort and hope. In the words of the prophet Isaiah, lift your eyes and look to the heavens. Who created all these? He who brings out the starry hosts one by one and calls them each by name, because of his great power and mighty strength, not one of them is missing. The same Creator who names the stars also knows the names of the seven souls we mourn today. The crew of the shuttle Columbia did not return safely to Earth. Yet we can pray that all are safely home. 
May God bless the grieving families, and may God may God continue to bless America.

parakeet_print_timings:     load time =   436.86 ms
parakeet_print_timings:     fallbacks =   0 p /   0 h
parakeet_print_timings:      mel time =  1370.00 ms
parakeet_print_timings:   sample time =     5.51 ms /   686 runs (     0.01 ms per run)
parakeet_print_timings:   encode time = 169671.31 ms /   100 runs (  1696.71 ms per run)
parakeet_print_timings:   decode time =   532.28 ms /   902 runs (     0.59 ms per run)
parakeet_print_timings:  predict time =   493.24 ms /   687 runs (     0.72 ms per run)
parakeet_print_timings:    - build     =     9.07 ms /   687 runs (     0.01 ms per run)
parakeet_print_timings:    - alloc     =    17.45 ms /   687 runs (     0.03 ms per run)
parakeet_print_timings:    - compute   =   466.50 ms /   687 runs (     0.68 ms per run)
parakeet_print_timings:    total time = 172764.50 ms

Non-streaming #3735
$ ./build/bin/parakeet-cli -m models/ggml-parakeet-tdt-0.6b-v3-f16.bin \
   -f samples/George_W_Bush_Columbia_FINAL.ogg

Loading Parakeet model from: models/ggml-parakeet-tdt-0.6b-v3-f16.bin
parakeet_init_from_file_with_params_no_state: loading model from 'models/ggml-parakeet-tdt-0.6b-v3-f16.bin'
parakeet_init_with_params_no_state: use gpu    = 1
parakeet_init_with_params_no_state: gpu_device = 0
parakeet_init_with_params_no_state: devices    = 1
parakeet_init_with_params_no_state: backends   = 1
parakeet_model_load: loading model
parakeet_model_load: arch                   = Parakeet TDT
parakeet_model_load: n_vocab                = 8192
parakeet_model_load: n_audio_ctx            = 5000
parakeet_model_load: n_audio_state          = 1024
parakeet_model_load: n_audio_head           = 8
parakeet_model_load: n_audio_layer          = 24
parakeet_model_load: n_mels                 = 128
parakeet_model_load: n_fft                  = 512
parakeet_model_load: eps                    = 0.000010
parakeet_model_load: ftype                  = 1
parakeet_model_load: qntvr                  = 0
parakeet_model_load: subsampling_factor     = 8
parakeet_model_load: n_subsampling_channels = 256
parakeet_model_load: n_conv_kernel          = 9
parakeet_model_load: n_pred_dim             = 640
parakeet_model_load: n_pred_layers          = 2
parakeet_model_load: n_tdt_durations        = 5
parakeet_model_load: n_max_tokens           = 10
parakeet_model_load: loaded window function with 400 samples
parakeet_model_load: loaded tdt_durations: [0 1 2 3 4 ]
parakeet_model_load: loaded vocab with 8192 tokens (blank_id=8192, unk=0, bos=4, eos=3)
parakeet_model_load:          CPU total size =  1255.64 MB
parakeet_model_load: model size    = 1255.64 MB
parakeet_init_with_params_no_state: initialized mel cache with n_fft = 512
parakeet_backend_init_gpu: device 0: CPU (type: 0)
parakeet_backend_init_gpu: no GPU found
parakeet_init_state: enc_out state:    0.00 MB (meta) +    2.44 MB (data)
parakeet_init_state: lstm state:    0.00 MB (meta) +    0.01 MB (data)
parakeet_init_state: pred state:    0.00 MB (meta) +    0.00 MB (data)
parakeet_init_state: compute buffer (encode) =  340.67 MB
parakeet_init_state: compute buffer (decode) =   67.12 MB
Successfully loaded Parakeet model
system_info: n_threads = 4 / 8 | PARAKEET : CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | OPENMP = 1 | REPACK = 1 | 

Processing file: samples/George_W_Bush_Columbia_FINAL.ogg
read_audio_data: reading audio data from 'samples/George_W_Bush_Columbia_FINAL.ogg' ...
read_audio_data: trying to decode with miniaudio
My fellow Americans, this day has brought terrible news and great sadness to our country. At 9 o'clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors. On board was a crew of seven. Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain David Brown, Commander William McCool, Dr. Kulpna Shavla, and Ilan Ramon, a colonel in the Israeli Air Force. These men and women assumed great risk in the service to all humanity. In an age when space flight has come to seem almost routine. It is easy to overlook the dangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere of the earth. These astronauts knew the dangers and they faced them willingly, knowing they had a high and noble purpose in life. Because of their courage and daring and idealism, we will miss them all the more. And those you loved will always have the respect and gratitude of this country. The cause in which they died will continue. Mankind is led into the darkness beyond our world by the inspiration of discovery and the longing to understand. Our journey into space will go on. In the skies today, we saw destruction and tragedy. Yet farther than we can see, there is comfort and hope. In the words of the prophet Isaiah, lift your eyes and look to the heavens. Who created all these? He who brings out the starry hosts one by one and calls them each by name. Because of his great power and mighty strength, not one of them is missing. The same creator who names the stars also knows the names of the seven souls we mourn today. The crew of the shuttle Columbia did not return safely to Earth. Yet we can pray that all are safely home. May God bless the grieving families and make out, may God continue to bless America.

parakeet_print_timings:     load time =   452.80 ms
parakeet_print_timings:     fallbacks =   0 p /   0 h
parakeet_print_timings:      mel time =   155.98 ms
parakeet_print_timings:   sample time =     4.99 ms /   655 runs (     0.01 ms per run)
parakeet_print_timings:   encode time = 44032.14 ms /     1 runs ( 44032.14 ms per run)
parakeet_print_timings:   decode time =   455.96 ms /   848 runs (     0.54 ms per run)
parakeet_print_timings:  predict time =   428.28 ms /   656 runs (     0.65 ms per run)
parakeet_print_timings:    - build     =     7.45 ms /   656 runs (     0.01 ms per run)
parakeet_print_timings:    - alloc     =    14.46 ms /   656 runs (     0.02 ms per run)
parakeet_print_timings:    - compute   =   406.17 ms /   656 runs (     0.62 ms per run)
parakeet_print_timings:    total time = 45747.02 ms
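
To put the two timing summaries side by side, here is a small sketch of the arithmetic. The helper names `encode_redundancy` and `slowdown` are made up for illustration; the inputs are the CLI flags and the timings printed above.

```cpp
#include <cassert>

// Ratio of encoded audio to decoded audio for one window away from the
// edges of the file (where the left and right context are not clipped).
static double encode_redundancy(int left_ms, int chunk_ms, int right_ms) {
    assert(chunk_ms > 0);
    return double(left_ms + chunk_ms + right_ms) / chunk_ms;
}

// Plain ratio of a streaming timing to the matching non-streaming timing.
static double slowdown(double streaming_ms, double baseline_ms) {
    assert(baseline_ms > 0.0);
    return streaming_ms / baseline_ms;
}
```

With `--left-context-ms 10000 --chunk-ms 2000 --right-context-ms 2000`, `encode_redundancy` gives 7, so each decoded second is encoded about seven times. The measured cost is lower than that: `slowdown(169671.31, 44032.14)` is roughly 3.85 for encode time, and `slowdown(172764.50, 45747.02)` is roughly 3.78 for total time on this CPU-only run.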

justynleung commented Jun 22, 2026

Author

I would add that there is a typo in /example/parakeet-cli/README.md: the example CLI command is missing the ggml- prefix.

- $ ./build/bin/parakeet-cli -m models/parakeet-tdt-0.6b-v3-f16.bin -f samples/jfk.wav
+ $ ./build/bin/parakeet-cli -m models/ggml-parakeet-tdt-0.6b-v3-f16.bin -f samples/jfk.wav

I am happy to include this fix in the current merge request.

@justynleung

Author

I've left the indentation in examples/parakeet-cli/parakeet-cli.cpp as-is for now because it makes this PR easier to review.

danbev commented Jun 23, 2026

Member

I would add that there is a typo in /example/parakeet-cli/README.md: the example CLI command is missing the ggml- prefix.

Thanks, I've opened #3906 to fix this.

danbev commented Jun 24, 2026

Member

@justynleung Thanks for your effort on this!

I also went down this route in the original parakeet PR, using an implementation similar to yours, but reverted the changes because I could not get it to work correctly. I ran into issues like the one in your streaming output above, for example:

May God bless the grieving families, and may God may God continue to bless America.

My understanding is that the TDT (Token-and-Duration Transducer) will not work well for streaming because of the emitted token durations. A different, dedicated streaming model, such as parakeet_realtime_eou_120m-v1 or nemotron-3.5-asr-streaming-0.6b, should be used instead. Depending on interest from the community, this is something we could look into supporting in the future.
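
To make the duration point a bit more concrete, here is a toy sketch. It assumes the usual TDT greedy loop, in which every decoding step advances the encoder frame index by a predicted duration (the model above loads durations 0 to 4). `chunk_walk` and `walk_chunk` are hypothetical names, and the frame counts are illustrative. The sketch only shows that a skip can land past the end of the decoded chunk; it does not establish that this is what produces the repeated "may God" in the streaming output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of walking one chunk's frames with TDT-style duration skips.
struct chunk_walk {
    int steps;     // decoding steps taken inside the chunk
    int overshoot; // frames by which the last skip passed chunk_end (0 if none)
};

// Toy model: `durations` stands in for the per-step duration predictions.
// A duration of 0 means "another token on the same frame".
static chunk_walk walk_chunk(int chunk_start, int chunk_end,
                             const std::vector<int> & durations) {
    assert(chunk_start <= chunk_end);
    int t = chunk_start;
    int steps = 0;
    for (std::size_t i = 0; i < durations.size() && t < chunk_end; ++i) {
        t += durations[i]; // jump ahead by the predicted duration
        ++steps;
    }
    chunk_walk r;
    r.steps     = steps;
    r.overshoot = t > chunk_end ? t - chunk_end : 0;
    return r;
}
```

For a 25-frame chunk and seven steps of duration 4, the walk ends at frame 28, three frames into what the next window treats as its own chunk. A streaming decoder has to decide whether to carry that offset over or restart at the boundary, and either choice can disagree with what the full-audio pass would have done.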

@justynleung

Author

Thank you for the feedback. I am glad that this approach had already been considered.

I ran some tests again and can confirm that the same issue arises.

I agree that other models will be better at streaming transcription. I will close the pull request.

justynleung deleted the parakeet-stream branch June 26, 2026 00:38