parakeet : add support for NVIDIA Parakeet by danbev · Pull Request #3735 · ggml-org/whisper.cpp

danbev · 2026-04-01T10:56:01Z

This is a work in progress to support the Parakeet model.

Usage instructions can be found in examples/parakeet-cli.

danbev · 2026-04-04T12:31:09Z

ffmpeg --enable-parakeet instructions

To try this out, we first need to check out this PR's branch:

$ git clone -b parakeet-support https://github.com/danbev/whisper.cpp.git

Then we build and install the parakeet library to a directory named build-install:

$ cat build-install.sh 
#!/bin/bash

set -e

build_dir=build
install_dir=build-install

rm -rf ${install_dir}
mkdir -p ${install_dir}

cmake -S . -B ${build_dir} -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/home/danbev/work/ai/whisper-work/${install_dir} \
    -DGGML_BACKEND_DIR=/home/danbev/work/ai/whisper-work/${install_dir}/lib \
    -DBUILD_SHARED_LIBS=ON \
    -DGGML_USE_CPU=ON \
    -DGGML_CPU_ALL_VARIANTS=ON \
    -DWHISPER_ALL_WARNINGS=ON \
    -DWHISPER_FATAL_WARNINGS=ON \
    -DGGML_BACKEND_DL=ON \
    -DGGML_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES="89-real" \
    -DGGML_CPU_AARCH64=OFF \
    -DGGML_CUDA_F16=ON

cmake --build ${build_dir} -j 8
cmake --install ${build_dir} --prefix ${install_dir}

Then we need to check out the following FFmpeg branch:

$ git clone -b parakeet.cpp https://code.ffmpeg.org/danbev/FFmpeg.git

Then build FFmpeg with the following configuration options. We explicitly
set PKG_CONFIG_PATH to point to the pkgconfig directory of the local
installation above:

$ export PKG_CONFIG_PATH="/home/danbev/work/ai/whisper-work/build-install/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

$ ./configure --prefix=/usr --enable-version3 --disable-shared --enable-gpl \
  --enable-nonfree --enable-static --enable-pthreads --enable-filters \
  --enable-openssl --enable-runtime-cpudetect --enable-libvpx --enable-libx264 \
  --enable-libx265 --enable-libspeex --enable-libfreetype --enable-fontconfig \
  --enable-libzimg --enable-libvorbis --enable-libwebp --enable-libfribidi \
  --enable-libharfbuzz --enable-libass --enable-whisper --enable-parakeet

$ make

To run, we need to set LD_LIBRARY_PATH to point to the lib directory of the local installation above so that the backends can be found at runtime. On macOS this would instead be DYLD_LIBRARY_PATH:

$ export LD_LIBRARY_PATH=/home/danbev/work/ai/whisper-work/build-install/lib/:$LD_LIBRARY_PATH
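For scripted runs, the same choice of variable can be made programmatically. The following is a minimal sketch only: the helper name `library_path_env` is hypothetical, and the install path is simply the one used above.

```python
import os
import platform


def library_path_env(lib_dir, system=None, current=None):
    """Return (variable_name, value) for prepending lib_dir to the loader path.

    Hypothetical helper: picks DYLD_LIBRARY_PATH on macOS ("Darwin") and
    LD_LIBRARY_PATH elsewhere, mirroring the note above.
    """
    system = system or platform.system()
    var = "DYLD_LIBRARY_PATH" if system == "Darwin" else "LD_LIBRARY_PATH"
    if current is None:
        current = os.environ.get(var, "")
    # Only add the separator when there is an existing value to keep.
    value = lib_dir if not current else lib_dir + os.pathsep + current
    return var, value


if __name__ == "__main__":
    var, value = library_path_env(
        "/home/danbev/work/ai/whisper-work/build-install/lib")
    print(f"export {var}={value}")
```

Passing `system` and `current` explicitly keeps the helper easy to test without touching the real environment.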

After that it should be possible to run using the following command:

$ ./ffmpeg -i gb1.wav -loglevel quiet -af parakeet=model=ggml-parakeet-tdt-0.6b-v3.bin:use_gpu=1:destination=- -f null -
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 11903 MiB):
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes, VRAM: 11903 MiB
load_backend: loaded CUDA backend from /home/danbev/work/ai/whisper-work/build-install/lib/libggml-cuda.so
load_backend: loaded CPU backend from /home/danbev/work/ai/whisper-work/build-install/lib/libggml-cpu-alderlake.so
My fellow Americans, this day has brought terrible news and great sadness to our country. At nine o'clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors. On board was a crew of seven Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain David Brown, Commander William McCool, Dr. Kulpna Shavla, and Ilan Ramon, a colonel in the Israeli Air Force. These men and women assumed great risk in the service to all humanity. In an age when spaceflight has come to seem almost routine, it is easy to overlook the dangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere of the Earth. Because of their courage and daring and idealism, we will miss them all the more. All Americans today are thinking as well of the families of these men and women who have been given this sudden shock and grief. You're not alone. Our entire nation grieves with you, and those you love will always have the respect and gratitude of this country. The cause in which they died will continue. Mankind is led into the darkness beyond our world by the inspiration of discovery and the longing to understand. Our journey into space will go on. In the skies today, we saw destruction and tragedy. Yet farther than we can see, there is comfort and hope. In the words of the prophet Isaiah, lift your eyes and look to the heavens. Who created all these? He who brings out the starry hosts one by one and calls them each by name, because of his great power and mighty strength, not one of them is missing. The crew of the shuttle Columbia did not return safely to Earth. Yet we can pray that all are safely home. May God bless the grieving families, and may God continue to bless America.

#23517 has been opened for this integration.

ramkrishna2910 · 2026-04-21T21:33:38Z

This would be a great addition! Looking forward to it!

…[no ci] This commit removes the generation of the relative positional tensor in the model conversion script and instead computes it in the encoder graph. This is only done for the window of positions required for the current audio sample. This was suggested in the mtmd integration of parakeet and the same approach is used there.
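As an illustration of building the relative positional table only for the length that is actually needed, here is a minimal pure-Python sketch. It assumes the sinusoidal relative positional encoding commonly used in Conformer-style encoders, with positions running from `n_frames - 1` down to `-(n_frames - 1)`. The function name, argument names, and row layout are assumptions for illustration and are not taken from the conversion script or from `src/parakeet.cpp`.

```python
import math


def rel_pos_encoding(n_frames, d_model):
    """Sinusoidal table for relative positions n_frames-1 ... -(n_frames-1).

    Returns 2*n_frames-1 rows of d_model values; even columns hold sin and
    odd columns hold cos of (position * inverse frequency).
    """
    assert d_model % 2 == 0, "d_model is assumed to be even"
    inv_freq = [math.exp(-math.log(10000.0) * (2 * i) / d_model)
                for i in range(d_model // 2)]
    table = []
    for pos in range(n_frames - 1, -n_frames, -1):
        row = []
        for f in inv_freq:
            row.append(math.sin(pos * f))
            row.append(math.cos(pos * f))
        table.append(row)
    return table


table = rel_pos_encoding(n_frames=4, d_model=8)
print(len(table), len(table[0]))  # 7 rows (2*4-1), 8 columns
```

Because such a table depends only on the number of frames and the model width, it can be generated for exactly the length in use rather than stored in the model file for some maximum length.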

This is to enable librispeech testing which will be enabled in a follow up commit.

The result from running the tests was:

```console
$ cat parakeet-tdt-0.6b-v3.txt
WER: 1.96%
```
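For context, word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a minimal, self-contained illustration of that definition. It is not the benchmark script from this PR, and it skips the text normalization that a real LibriSpeech evaluation would typically apply.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row dynamic programming: prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```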

…no ci] Remove hardcoded build-cuda-89-release and just use build like whisper.cpp does.

This commit updates the parakeet requirements, which were out of date because I've been using a virtual environment on Linux/macOS that already contains torch and numpy. It also fixes the reading of the model configuration, which was failing on Windows.

SuperPauly · 2026-05-07T00:21:55Z

LGTM, Is anyone about to review and merge?

danbev · 2026-05-07T03:45:24Z

> LGTM, Is anyone about to review and merge?

Thanks for the review. I still have a few things to sort out, but I hope to be able to merge this early next week. In hindsight, I was a bit quick to move this out of draft.

This commit adds a function to reset the parakeet state that can be reused instead of duplicating code. It also resets the LSTM state, which was not done by parakeet_full, leading to incorrect transcriptions when called multiple times.
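The effect of stale recurrent state can be shown with a toy model. This is a sketch under stated assumptions: `ToyRecurrentModel`, its `full` method, and the arithmetic are all hypothetical stand-ins and do not reflect the actual parakeet API or LSTM math.

```python
class ToyRecurrentModel:
    """Stand-in for a model with recurrent (LSTM-like) state; not the Parakeet API."""

    def __init__(self):
        self.reset_state()

    def reset_state(self):
        # Single place that restores every piece of per-run state.
        self.hidden = 0.0
        self.tokens = []

    def full(self, samples, reset=True):
        if reset:
            self.reset_state()
        for x in samples:
            # The output depends on the carried hidden value, like a recurrence.
            self.hidden = 0.5 * self.hidden + x
            self.tokens.append(round(self.hidden, 3))
        return list(self.tokens)


model = ToyRecurrentModel()
first = model.full([1.0, 2.0])
second = model.full([1.0, 2.0])               # state reset: same result
stale = model.full([1.0, 2.0], reset=False)   # state carried over: differs
print(first == second, first == stale)
```

Keeping all of the resets in one function means a newly added piece of state only has to be cleared in one place.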

danbev · 2026-06-10T13:32:22Z

Running ./tests/run-tests.sh parakeet-f16 I noticed the following:

There is a complete sentence missing. I thought this was a bug in our implementation, but the original Python model produces the same output as we do.

I double-checked this by cloning https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3 again, to be sure I was not using a stale version or hitting an issue with my local environment, but its output does not include this sentence either. I also tried https://github.com/mudler/parakeet.cpp/ and it produces the same output as the original model (so all three implementations produce the same output for this audio sample).

Looking into this a little more it looks like this is caused by the greedy selection, and perhaps using beam search would fix this. I'll take a closer look at that.
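To illustrate the general difference between greedy selection and beam search, here is a toy sketch. The probability table is entirely hypothetical and this is not the TDT decoder. It only shows how a locally best first choice can lead to a lower-scoring sequence overall, and it does not demonstrate that beam search recovers the missing sentence.

```python
# Hypothetical next-token probabilities keyed by the prefix decoded so far.
PROBS = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def greedy(steps):
    """Always take the single most probable next token."""
    prefix, score = (), 1.0
    for _ in range(steps):
        token, p = max(PROBS[prefix].items(), key=lambda kv: kv[1])
        prefix, score = prefix + (token,), score * p
    return prefix, score


def beam_search(steps, beam_width):
    """Track several partial sequences and return the best complete one."""
    beams = [((), 1.0)]
    for _ in range(steps):
        candidates = [(prefix + (tok,), score * p)
                      for prefix, score in beams
                      for tok, p in PROBS[prefix].items()]
        # Keep only the beam_width highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]


print(greedy(2))
print(beam_search(2, beam_width=2))
```

In this table, greedy commits to "a" at the first step, while a beam of width 2 keeps "b" alive long enough to find the higher-scoring continuation.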

Currently this is using a WHISPER_DEBUG macro instead of PARAKEET_DEBUG.

This should perhaps be integrated into scripts/quantize.all, but if needed I'll do that in a follow-up PR.

This also removed the '$' prompt from the examples so the commands can just be copied in the UI.

This commit adds a persistent tensor for the encoder's output and performs a ggml_cpy into it, instead of marking a tensor as a graph output, copying it to the host, and later setting a slice of it as input to the joint network. The motivation for this change is to avoid a D2H copy of one frame from enc_out followed by an H2D copy of that frame into the joint graph's encoder input. Now the joint graph can simply use a view into the persistent tensor that holds the encoder's output.

This commit removes the setting of token_embd as a graph input. The motivation is that the flag caused the scheduler to place it on the last (CPU) backend regardless of where its weights live, which split the prediction graph across CPU and CUDA on every call. Removing the flag lets the GET_ROWS op run on CUDA with the rest of the graph (n_splits 2 -> 1), cutting predict time from ~207ms to ~70ms. I've kept the timing for now as there might be other improvements that could now make a difference.

This commit folds the input-to-hidden and hidden-to-hidden bias tensors into a single tensor at model conversion time. This enables us to perform a single ggml_add operation instead of two in the LSTM layers. It also merges the LSTM gates (input, forget, cell, and output) into a single tensor at model conversion time. The idea is the same as above: to reduce the number of sigmoid operations in the LSTM layers to one instead of three.
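The folding can be checked numerically with a small pure-Python sketch. The packing order used here (input, forget, output, then cell, so that the three sigmoid gates are contiguous) is an assumption for illustration and may differ from the layout the conversion script actually writes. All function names and weight values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step_unfolded(x, h, c, W_ih, W_hh, b_ih, b_hh):
    """Reference step: per-gate tensors, two bias terms, three sigmoid passes."""
    pre = {}
    for name in "ifog":
        ih = matvec(W_ih[name], x)
        hh = matvec(W_hh[name], h)
        pre[name] = [a + b + p + q
                     for a, b, p, q in zip(ih, hh, b_ih[name], b_hh[name])]
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    o = [sigmoid(v) for v in pre["o"]]
    g = [math.tanh(v) for v in pre["g"]]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def fold(W_ih, W_hh, b_ih, b_hh, order="ifog"):
    """Conversion-time packing: stack the gates and sum the two bias vectors."""
    W_ih_p = [row for name in order for row in W_ih[name]]
    W_hh_p = [row for name in order for row in W_hh[name]]
    b_p = [p + q for name in order for p, q in zip(b_ih[name], b_hh[name])]
    return W_ih_p, W_hh_p, b_p


def lstm_step_folded(x, h, c, W_ih_p, W_hh_p, b_p):
    """Packed step: one bias add, one sigmoid pass over the i/f/o slice."""
    n = len(c)
    z = [a + b + bias
         for a, b, bias in zip(matvec(W_ih_p, x), matvec(W_hh_p, h), b_p)]
    s = [sigmoid(v) for v in z[:3 * n]]   # i, f, o in a single pass
    g = [math.tanh(v) for v in z[3 * n:]]
    i, f, o = s[:n], s[n:2 * n], s[2 * n:]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


# Small made-up weights: input size 2, hidden size 2.
W_ih = {n: [[0.1 * (k + 1), -0.2], [0.3, 0.05 * (k + 1)]] for k, n in enumerate("ifog")}
W_hh = {n: [[0.2, 0.1 * k], [-0.1, 0.4]] for k, n in enumerate("ifog")}
b_ih = {n: [0.01 * k, 0.02] for k, n in enumerate("ifog")}
b_hh = {n: [0.03, -0.01 * k] for k, n in enumerate("ifog")}
x, h, c = [1.0, -0.5], [0.2, 0.1], [0.0, 0.3]

ref = lstm_step_unfolded(x, h, c, W_ih, W_hh, b_ih, b_hh)
packed = lstm_step_folded(x, h, c, *fold(W_ih, W_hh, b_ih, b_hh))
max_diff = max(abs(a - b) for r, p in zip(ref, packed) for a, b in zip(r, p))
print(max_diff < 1e-12)
```

The two steps agree up to floating-point rounding, since summing the biases ahead of time only changes the order of the additions.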

This commit updates the dummy test model with the folding of tensors which I forgot about in the previous commit.

TomTheWise · 2026-06-18T19:36:50Z

Hi, is a server planned too, or can it already be wrapped by the existing whisper-server for a /v1/ API over HTTP?

danbev · 2026-06-22T11:31:50Z

> Hi, is a server planned too, or can it already be wrapped by the existing whisper-server for a /v1/ API over HTTP?

We have not added support to whisper-server or provided a separate implementation. While there is some overlap there are also many differences and perhaps a separate parakeet-server would be better (and extract common functionality into a server-common.{h, cpp} perhaps). But there is nothing planned at the moment.

…lbacks to the class to manage | Removed parakeetcpp since it was merged to master branch of whispercpp (see ggml-org/whisper.cpp#3735) | added cpm usage using FetchCPM.cmake to avoid the same dependency being fetched multiple times - Does cause problems with version-control, but this is the least I can do for now

danbev force-pushed the parakeet-support branch from 7a8fa90 to 9e9c5a9 on April 8, 2026 08:45

parakeet : add support for NVIDIA Parakeet

ad6274f

danbev force-pushed the parakeet-support branch from 3d04340 to ad6274f on April 16, 2026 12:24

danbev marked this pull request as ready for review April 16, 2026 12:39

danbev added 2 commits April 20, 2026 12:40

ci : add -DGGML_NATIVE=OFF to windows job

661c9e2

ci : add GGML_BMI2=OFF to window job

1e0c5dd

bhargav191098 reviewed Apr 21, 2026

Comment thread examples/parakeet-cli/README.md Outdated

examples : set flash attention to false for parakeet-cli [no ci]

1ed636a

danbev added 5 commits April 22, 2026 14:19

parakeet : initialize ggml_tensors pointers to nullptr [no ci]

ac39626

parakeet : remove non relavant fields in parakeet_state [no ci]

05ffa91

parakeet : fix indentation in default params [no ci]

b899ce0

parakeet : check and free ggml_backend_sched_t [no-ci]

9fe3b5c

parakeet : group related types and helpers [no ci]

ceaa8bb

danbev mentioned this pull request Apr 29, 2026

mtmd : add Nemotron 3 Nano Omni support (parakeet) ggml-org/llama.cpp#22520

Open

danbev added 10 commits April 30, 2026 16:03

parakeet : remove parakeet_full_parallel() API and implementation [no…

6e61988

… ci]

parakeet : remove unused timeing fields [no ci]

66e6b09

examples : print system info and timings for parakeet-cli [no ci]

42dcf19

parakeet : add missing free of batch.i_time [no ci]

2f9216f

examples : add --output-txt and --output-file to parakeet-cli [no ci]

27cc5d7

examples : add --no-prints to parakeet-cli

93806f4

tests : add script to benchmark parakeet.cpp on LibriSpeech [no ci]

3f6b17c

squash! tests : add script to benchmark parakeet.cpp on LibriSpeech […

d8b74e4

parakeet : enable model conversion on win [no ci]

3fa1f69

danbev added 2 commits May 7, 2026 15:01

parakeet : add parakeet_reset_state function [no ci]

0c4f4ba

examples : reuse context in parakeet-cli [no ci]

cb611a4

tests : update run-tests.sh to support parakeet models [no ci]

f55f7eb

danbev added 6 commits June 10, 2026 15:59

parakeet : fix parakeet_log_callback_default [no ci]

c1d57fa

parakeet : clean up of local attention [no ci]

94c226b

parakeet : clarify stride-shift code comment [no ci]

210c05e

scripts : add quantize-parakeet.sh script [no ci]

804771c

parakeet : fix skip list in parakeet-quantize [no ci]

b0946f4

scripts : use Q8_0 as the example in parakeet model card [no ci]

9fef588

ggerganov reviewed Jun 12, 2026

Comment thread tests/run-tests.sh

danbev added 2 commits June 12, 2026 10:45

scripts : add ggml- prefix to HF models [no ci]

81d1469

parakeet : add prediction timing and fix sample time [no ci]

572a798

ggerganov approved these changes Jun 12, 2026

Comment thread src/parakeet.cpp

Comment thread src/parakeet.cpp

danbev added 6 commits June 14, 2026 05:49

parakeet : rename whisper_backend_init_gpu

eb8f824

parakeet : avoid copying pred_h as input to joint network

80e68a8

parakeet : update for-tests-ggml-parakeet-dtd.bin

7c06be7

ggerganov reviewed Jun 16, 2026

Comment thread src/parakeet.cpp Outdated

danbev added 4 commits June 16, 2026 11:48

parakeet : correct packed gates comment [no ci]

e11a11e

parakeet : use ggml-org HF org [no ci]

b05d775

Merge remote-tracking branch 'upstream/master' into parakeet-support

17888ae

Merge remote-tracking branch 'upstream/master' into parakeet-support

8cc7881

danbev merged commit 9efddaf into ggml-org:master Jun 16, 2026
46 checks passed

justynleung mentioned this pull request Jun 22, 2026

Feature: Add streaming support for parakeet #3900

Closed

freddy311082 mentioned this pull request Jun 30, 2026

QVAC-21582 — Pull latest from upstream whisper.cpp (v1.9.1) tetherto/qvac-ext-lib-whisper.cpp#73

Open

parakeet : add support for NVIDIA Parakeet #3735
danbev merged 80 commits into ggml-org:master from danbev:parakeet-support

8 participants
