parakeet : add support for NVIDIA Parakeet #3735

Merged
danbev merged 80 commits into ggml-org:master from danbev:parakeet-support
Jun 16, 2026
Conversation

@danbev

@danbev danbev commented Apr 1, 2026

Member

This is a work in progress to support the Parakeet model.


Usage instructions can be found in examples/parakeet-cli.

@danbev

danbev commented Apr 4, 2026

Member Author
ffmpeg --enable-parakeet instructions

To try this out, we first need to check out this PR's branch:

$ git clone -b parakeet-support https://github.com/danbev/whisper.cpp.git

Then we build and install the parakeet library to a directory named build-install:

$ cat build-install.sh 
#!/bin/bash

set -e

build_dir=build
install_dir=build-install

rm -rf ${install_dir}
mkdir -p ${install_dir}

cmake -S . -B ${build_dir} -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/home/danbev/work/ai/whisper-work/${install_dir} \
    -DGGML_BACKEND_DIR=/home/danbev/work/ai/whisper-work/${install_dir}/lib \
    -DBUILD_SHARED_LIBS=ON \
    -DGGML_USE_CPU=ON \
    -DGGML_CPU_ALL_VARIANTS=ON \
    -DWHISPER_ALL_WARNINGS=ON \
    -DWHISPER_FATAL_WARNINGS=ON \
    -DGGML_BACKEND_DL=ON \
    -DGGML_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES="89-real" \
    -DGGML_CPU_AARCH64=OFF \
    -DGGML_CUDA_F16=ON

cmake --build ${build_dir} -j 8
cmake --install ${build_dir} --prefix ${install_dir}

Then we need to check out the following FFmpeg branch:

$ git clone -b parakeet.cpp https://code.ffmpeg.org/danbev/FFmpeg.git

Then build FFmpeg with the following configuration options, explicitly
setting PKG_CONFIG_PATH to point to the pkgconfig directory of the local
installation above:

$ export PKG_CONFIG_PATH="/home/danbev/work/ai/whisper-work/build-install/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

$ ./configure --prefix=/usr --enable-version3 --disable-shared --enable-gpl \
  --enable-nonfree --enable-static --enable-pthreads --enable-filters \
  --enable-openssl --enable-runtime-cpudetect --enable-libvpx --enable-libx264 \
  --enable-libx265 --enable-libspeex --enable-libfreetype --enable-fontconfig \
  --enable-libzimg --enable-libvorbis --enable-libwebp --enable-libfribidi \
  --enable-libharfbuzz --enable-libass --enable-whisper --enable-parakeet

$ make

To run, we need to set LD_LIBRARY_PATH to point to the lib directory of the local installation above so that the backends can be found at runtime. On macOS this would instead be DYLD_LIBRARY_PATH:

$ export LD_LIBRARY_PATH=/home/danbev/work/ai/whisper-work/build-install/lib/:$LD_LIBRARY_PATH

After that it should be possible to run using the following command:

$ ./ffmpeg -i gb1.wav -loglevel quiet -af parakeet=model=ggml-parakeet-tdt-0.6b-v3.bin:use_gpu=1:destination=- -f null -
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 11903 MiB):
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes, VRAM: 11903 MiB
load_backend: loaded CUDA backend from /home/danbev/work/ai/whisper-work/build-install/lib/libggml-cuda.so
load_backend: loaded CPU backend from /home/danbev/work/ai/whisper-work/build-install/lib/libggml-cpu-alderlake.so
My fellow Americans, this day has brought terrible news and great sadness to our country. At nine o'clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors. On board was a crew of seven Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain David Brown, Commander William McCool, Dr. Kulpna Shavla, and Ilan Ramon, a colonel in the Israeli Air Force. These men and women assumed great risk in the service to all humanity. In an age when spaceflight has come to seem almost routine, it is easy to overlook the dangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere of the Earth. Because of their courage and daring and idealism, we will miss them all the more. All Americans today are thinking as well of the families of these men and women who have been given this sudden shock and grief. You're not alone. Our entire nation grieves with you, and those you love will always have the respect and gratitude of this country. The cause in which they died will continue. Mankind is led into the darkness beyond our world by the inspiration of discovery and the longing to understand. Our journey into space will go on. In the skies today, we saw destruction and tragedy. Yet farther than we can see, there is comfort and hope. In the words of the prophet Isaiah, lift your eyes and look to the heavens. Who created all these? He who brings out the starry hosts one by one and calls them each by name, because of his great power and mighty strength, not one of them is missing. The crew of the shuttle Columbia did not return safely to Earth. Yet we can pray that all are safely home. May God bless the grieving families, and may God continue to bless America.

#23517 has been opened for this integration.

@danbev danbev force-pushed the parakeet-support branch from 7a8fa90 to 9e9c5a9 Compare April 8, 2026 08:45
@danbev danbev force-pushed the parakeet-support branch from 3d04340 to ad6274f Compare April 16, 2026 12:24
@danbev danbev marked this pull request as ready for review April 16, 2026 12:39
Comment thread examples/parakeet-cli/README.md Outdated
@ramkrishna2910


This would be a great addition! Looking forward to it!

danbev added 10 commits April 30, 2026 16:03
…[no ci]

This commit removes the generation of the relative positional tensor in
the model conversion script and instead computes it in the encoder
graph. This is only done for the window of positions required for the
current audio sample.

This was suggested in the mtmd integration of parakeet and the same
approach is used there.
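
As a rough illustration of what "only the window of positions required" means, here is a minimal Python sketch. It assumes the standard sinusoidal formulation for relative positions; the function name `rel_pos_encoding` is hypothetical, `d_model` is assumed to be even, and the real encoder graph builds this with ggml ops and may lay the tensor out differently.

```python
import math

def rel_pos_encoding(n_frames: int, d_model: int) -> list[list[float]]:
    """Sinusoidal embeddings for relative positions n_frames-1 ... -(n_frames-1).

    Assumption: sin on even columns, cos on odd columns, d_model even.
    """
    # One frequency per sin/cos pair, as in the Transformer positional encoding.
    inv_freq = [math.exp(-math.log(10000.0) * i / d_model)
                for i in range(0, d_model, 2)]
    table = []
    # Only the 2 * n_frames - 1 offsets that can occur for this input are built.
    for pos in range(n_frames - 1, -n_frames, -1):
        row = [0.0] * d_model
        for k, freq in enumerate(inv_freq):
            row[2 * k] = math.sin(pos * freq)
            row[2 * k + 1] = math.cos(pos * freq)
        table.append(row)
    return table

pe = rel_pos_encoding(n_frames=4, d_model=8)
print(len(pe), len(pe[0]))
```

The table size depends only on the number of encoder frames in the current sample, so nothing has to be precomputed for a maximum length and stored in the model file.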
This is to enable librispeech testing which will be enabled in a follow
up commit.
The result from running the tests was:
```console
$ cat parakeet-tdt-0.6b-v3.txt
WER: 1.96%
```
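
For reference, WER is the word-level edit distance between the reference and the hypothesis divided by the number of reference words. The following is a minimal sketch of that definition. The function name is hypothetical, it assumes a non-empty reference, and it does not apply the text normalization (casing, punctuation) that a test harness would normally do before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat", "the dog sat"))
```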
…no ci]

Remove hardcoded build-cuda-89-release and just use build like
whisper.cpp does.
This commit updates the parakeet requirements, which were out of date
because I've been using a virtual environment on Linux/macOS that already
contains torch and numpy.

This also fixes the reading of the model configuration, which was failing
on Windows.
@SuperPauly


LGTM, Is anyone about to review and merge?

@danbev

danbev commented May 7, 2026

Member Author

> LGTM, Is anyone about to review and merge?

Thanks for the review. I still have a few things to sort out, but I hope to be able to merge this early next week. In hindsight, I was a bit quick to move this out of draft.

danbev added 2 commits May 7, 2026 15:01
This commit adds a function to reset the parakeet state that can be
reused instead of duplicating code.

It also resets the LSTM state, which was not done by parakeet_full,
leading to incorrect transcriptions when it was called multiple times.
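
To show why stale recurrent state matters, here is a toy Python sketch. `ToyDecoder` is a hypothetical stand-in and not the parakeet API. Its single float plays the role of the LSTM hidden and cell state.

```python
class ToyDecoder:
    """Hypothetical recurrent decoder with one scalar of state."""

    def __init__(self) -> None:
        self.hidden = 0.0  # stands in for the LSTM hidden/cell state

    def reset_state(self) -> None:
        self.hidden = 0.0

    def step(self, x: float) -> float:
        # The output depends on the input and on whatever state is left over.
        self.hidden = 0.5 * self.hidden + x
        return self.hidden

    def full(self, inputs: list[float], reset: bool = True) -> list[float]:
        if reset:
            self.reset_state()
        return [self.step(x) for x in inputs]

dec = ToyDecoder()
first = dec.full([1.0, 1.0])
stale = dec.full([1.0, 1.0], reset=False)  # second call without a reset
fresh = dec.full([1.0, 1.0])               # second call with a reset
print(first, stale, fresh)
```

Without the reset, the same input produces a different result on the second call, which is the kind of behaviour the commit above describes.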
@danbev

danbev commented Jun 10, 2026

Member Author

Running ./tests/run-tests.sh parakeet-f16 I noticed the following:
[Image: gb1-transcription]

There is a complete sentence missing. I thought this was a bug in our implementation but the original python model also produces the same output as we do.

I've double-checked this by cloning https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3 again, to be sure I was not using a stale version or hitting an issue with my local environment, but it does not include this sentence either. I also tried https://github.com/mudler/parakeet.cpp/ and it produces the same output as the original model (so all three implementations produce the same output for this audio sample).

Looking into this a little more, it looks like this is caused by the greedy selection, and perhaps using beam search would fix this. I'll take a closer look at that.
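
As a generic illustration of how greedy selection can lose to beam search, here is a small Python sketch. The probability tables are made up, and the real transducer decoding (which works on frames, tokens and durations) is more involved, so this only shows the general effect and not the actual cause of the missing sentence.

```python
# Hypothetical probabilities: first token, then second token given the first.
FIRST = {"A": 0.6, "B": 0.4}
SECOND = {"A": {"x": 0.55, "y": 0.45}, "B": {"x": 0.9, "y": 0.1}}

def greedy():
    """Pick the locally best token at every step."""
    t1 = max(FIRST, key=FIRST.get)
    t2 = max(SECOND[t1], key=SECOND[t1].get)
    return [t1, t2], FIRST[t1] * SECOND[t1][t2]

def beam_search(width: int):
    """Keep the `width` best prefixes, then return the best full sequence."""
    beams = sorted(FIRST.items(), key=lambda kv: kv[1], reverse=True)[:width]
    candidates = [([t1, t2], p1 * p2)
                  for t1, p1 in beams
                  for t2, p2 in SECOND[t1].items()]
    return max(candidates, key=lambda cand: cand[1])

print(greedy())
print(beam_search(2))
```

Greedy commits to "A" because it is the best first token, while a beam of width 2 also keeps "B" alive and ends up with a sequence that has a higher overall probability.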

danbev added 6 commits June 10, 2026 15:59
Currently this is using a WHISPER_DEBUG macro instead of PARAKEET_DEBUG.
This should perhaps be integrated into scripts/quantize.all, but if needed
I'll do that in a follow-up PR.
This also removed the '$' prompt from the examples so the commands can
just be copied in the UI.
Comment thread tests/run-tests.sh
Comment thread src/parakeet.cpp
Comment thread src/parakeet.cpp
danbev added 6 commits June 14, 2026 05:49
This commit adds a persistent tensor for the encoder's output and
performs a ggml_cpy into it, instead of setting a tensor as output,
copying that to the host, and later setting a slice of it as input to
the joint network.

The motivation for this change is to avoid a D2H copy of one frame from
enc_out, followed by an H2D copy of that frame into the joint graph's
encoder input. Now the joint graph can simply use a view into the
encoder's persistent output tensor.
This commit removes the setting of token_embd as a graph input.

The motivation for this is that this caused the scheduler to place it
on the last (CPU) backend regardless of where its weights live. This
split the prediction graph across CPU/CUDA on every call. Removing the
flag lets the GET_ROWS op run on CUDA with the rest of the graph
(n_splits 2 -> 1), cutting predict time from ~207ms to ~70ms.

I've kept the timing for now, as there might be other improvements that
could now make a difference.
This commit folds the input-to-hidden and hidden-to-hidden bias tensors
into a single tensor at model conversion time. This enables us to
perform a single ggml_add operation instead of two in the LSTM layers.

It also merges the LSTM gates (input, forget, cell, and output) into a
single tensor at model conversion time. As above, the idea is to reduce
the number of sigmoid operations in the LSTM layers from three to one.
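
The folding can be sketched in plain Python as follows. This is only an illustration of the idea. The helper names, the tiny weights and the gate order (input, forget, output, cell, chosen so the three sigmoid gates are contiguous) are assumptions, and the conversion script may order and store the tensors differently.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step_separate(x, h, c, W_ih, W_hh, b_ih, b_hh):
    """Reference step: per-gate tensors, two bias adds, three sigmoid passes."""
    pre = {}
    for gate in "ifog":
        wx = [a + b for a, b in zip(matvec(W_ih[gate], x), b_ih[gate])]
        wh = [a + b for a, b in zip(matvec(W_hh[gate], h), b_hh[gate])]
        pre[gate] = [a + b for a, b in zip(wx, wh)]
    i = [sigmoid(z) for z in pre["i"]]
    f = [sigmoid(z) for z in pre["f"]]
    o = [sigmoid(z) for z in pre["o"]]
    g = [math.tanh(z) for z in pre["g"]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def fold(W_ih, W_hh, b_ih, b_hh):
    """Conversion-time step: stack the gates and sum the two bias vectors."""
    W_x = [row for gate in "ifog" for row in W_ih[gate]]
    W_h = [row for gate in "ifog" for row in W_hh[gate]]
    bias = [p + q for gate in "ifog" for p, q in zip(b_ih[gate], b_hh[gate])]
    return W_x, W_h, bias

def step_merged(x, h, c, W_x, W_h, bias):
    """Same step with one bias add and one sigmoid pass over the i, f, o block."""
    n = len(h)
    pre = [a + b + k for a, b, k in zip(matvec(W_x, x), matvec(W_h, h), bias)]
    s = [sigmoid(z) for z in pre[:3 * n]]
    g = [math.tanh(z) for z in pre[3 * n:]]
    i, f, o = s[:n], s[n:2 * n], s[2 * n:]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

# Small made-up weights: input size 2, hidden size 2.
W_ih = {gate: [[0.1 * (k + 1), -0.2], [0.3, 0.05 * (k + 1)]] for k, gate in enumerate("ifog")}
W_hh = {gate: [[0.2, 0.1 * (k + 1)], [-0.1, 0.4]] for k, gate in enumerate("ifog")}
b_ih = {gate: [0.01 * (k + 1), -0.02] for k, gate in enumerate("ifog")}
b_hh = {gate: [0.03, 0.02 * (k + 1)] for k, gate in enumerate("ifog")}
x, h, c = [1.0, -0.5], [0.2, 0.1], [0.0, 0.3]

ref = step_separate(x, h, c, W_ih, W_hh, b_ih, b_hh)
merged = step_merged(x, h, c, *fold(W_ih, W_hh, b_ih, b_hh))
print(ref)
print(merged)
```

Both versions give the same hidden and cell state up to floating-point rounding, since the folded bias only changes the order in which the same terms are summed.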
This commit updates the dummy test model with the folding of tensors
which I forgot about in the previous commit.
Comment thread src/parakeet.cpp Outdated
@danbev danbev merged commit 9efddaf into ggml-org:master Jun 16, 2026
46 checks passed
@TomTheWise


Hi, is a server planned too, or can it already be wrapped by the existing whisper-server for a /v1/ API over HTTP?

@danbev

danbev commented Jun 22, 2026

Member Author

> Hi, is a server planned too, or can it already be wrapped by the existing whisper-server for a /v1/ API over HTTP?

We have not added support to whisper-server or provided a separate implementation. While there is some overlap there are also many differences and perhaps a separate parakeet-server would be better (and extract common functionality into a server-common.{h, cpp} perhaps). But there is nothing planned at the moment.

inonitz added a commit to inonitz/sttserv that referenced this pull request Jun 27, 2026
…lbacks to the class to manage | Removed parakeetcpp since it was merged to master branch of whispercpp (see ggml-org/whisper.cpp#3735) | added cpm usage using FetchCPM.cmake to avoid the same dependency being fetched multiple times - Does cause problems with version-control, but this is the least I can do for now

8 participants