parakeet : add support for NVIDIA Parakeet #3735
ffmpeg --enable-parakeet instructions

To try this out, we first need to check out this PR's branch:
```console
$ git clone -b parakeet-support https://github.com/danbev/whisper.cpp.git
```
Then we build and install the parakeet library to a directory named `build-install`:
```console
$ cat build-install.sh
#!/bin/bash
set -e
build_dir=build
install_dir=build-install
rm -rf ${install_dir}
mkdir -p ${install_dir}
# GGML_BACKEND_DL builds the backends as loadable modules, and GGML_CPU_ALL_VARIANTS
# builds one CPU backend per instruction-set variant (the best one is loaded at runtime).
# "89-real" targets compute capability 8.9 (the RTX 4070 below); adjust it for your GPU.
cmake -S . -B ${build_dir} -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/home/danbev/work/ai/whisper-work/${install_dir} \
    -DGGML_BACKEND_DIR=/home/danbev/work/ai/whisper-work/${install_dir}/lib \
    -DBUILD_SHARED_LIBS=ON \
    -DGGML_USE_CPU=ON \
    -DGGML_CPU_ALL_VARIANTS=ON \
    -DWHISPER_ALL_WARNINGS=ON \
    -DWHISPER_FATAL_WARNINGS=ON \
    -DGGML_BACKEND_DL=ON \
    -DGGML_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES="89-real" \
    -DGGML_CPU_AARCH64=OFF \
    -DGGML_CUDA_F16=ON
cmake --build ${build_dir} -j 8
cmake --install ${build_dir} --prefix ${install_dir}
```
Then we need to check out the following FFmpeg branch:
```console
$ git clone -b parakeet.cpp https://code.ffmpeg.org/danbev/FFmpeg.git
```
And then build FFmpeg using the following configuration options, explicitly setting `PKG_CONFIG_PATH` so that `configure` can find the installed library:
```console
$ export PKG_CONFIG_PATH="/home/danbev/work/ai/whisper-work/build-install/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
$ ./configure --prefix=/usr --enable-version3 --disable-shared --enable-gpl \
    --enable-nonfree --enable-static --enable-pthreads --enable-filters \
    --enable-openssl --enable-runtime-cpudetect --enable-libvpx --enable-libx264 \
    --enable-libx265 --enable-libspeex --enable-libfreetype --enable-fontconfig \
    --enable-libzimg --enable-libvorbis --enable-libwebp --enable-libfribidi \
    --enable-libharfbuzz --enable-libass --enable-whisper --enable-parakeet
$ make
```
To run it, we need to set `LD_LIBRARY_PATH`:
```console
$ export LD_LIBRARY_PATH="/home/danbev/work/ai/whisper-work/build-install/lib/${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```
After that, it should be possible to run it using the following command:
```console
$ ./ffmpeg -i gb1.wav -loglevel quiet -af parakeet=model=ggml-parakeet-tdt-0.6b-v3.bin:use_gpu=1:destination=- -f null -
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 11903 MiB):
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes, VRAM: 11903 MiB
load_backend: loaded CUDA backend from /home/danbev/work/ai/whisper-work/build-install/lib/libggml-cuda.so
load_backend: loaded CPU backend from /home/danbev/work/ai/whisper-work/build-install/lib/libggml-cpu-alderlake.so
My fellow Americans, this day has brought terrible news and great sadness to our country. At nine o'clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors. On board was a crew of seven Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark, Captain David Brown, Commander William McCool, Dr. Kulpna Shavla, and Ilan Ramon, a colonel in the Israeli Air Force. These men and women assumed great risk in the service to all humanity. In an age when spaceflight has come to seem almost routine, it is easy to overlook the dangers of travel by rocket and the difficulties of navigating the fierce outer atmosphere of the Earth. Because of their courage and daring and idealism, we will miss them all the more. All Americans today are thinking as well of the families of these men and women who have been given this sudden shock and grief. You're not alone. Our entire nation grieves with you, and those you love will always have the respect and gratitude of this country. The cause in which they died will continue. Mankind is led into the darkness beyond our world by the inspiration of discovery and the longing to understand. Our journey into space will go on. In the skies today, we saw destruction and tragedy. Yet farther than we can see, there is comfort and hope. In the words of the prophet Isaiah, lift your eyes and look to the heavens. Who created all these? He who brings out the starry hosts one by one and calls them each by name, because of his great power and mighty strength, not one of them is missing. The crew of the shuttle Columbia did not return safely to Earth. Yet we can pray that all are safely home. May God bless the grieving families, and may God continue to bless America.
```
#23517 has been opened for this integration.
This would be a great addition! Looking forward to it!
…[no ci] This commit removes the generation of the relative positional tensor from the model conversion script and instead computes it in the encoder graph, only for the window of positions required by the current audio sample. This was suggested during the mtmd integration of parakeet, which uses the same approach.
This prepares for LibriSpeech testing, which will be enabled in a follow-up commit.
The result from running the tests was:
```console
$ cat parakeet-tdt-0.6b-v3.txt
WER: 1.96%
```
…no ci] Remove the hardcoded `build-cuda-89-release` directory and just use `build`, as whisper.cpp does.
This commit updates the parakeet requirements, which were out of date because I've been using a virtual environment on Linux/macOS that already contains torch and numpy. It also fixes the reading of the model configuration, which was failing on Windows.
LGTM. Is anyone about to review and merge?
Thanks for the review. I still have a few things to sort out, but I hope to merge this early next week. In hindsight, I was a bit quick to move this out of draft.
This commit adds a function to reset the parakeet state that can be reused instead of duplicating code. It also resets the LSTM state, which `parakeet_full` did not do, leading to incorrect transcriptions when it was called multiple times.
Running …, a complete sentence is missing from the output. I thought this was a bug in our implementation, but the original Python model produces the same output as we do. I double-checked this by cloning https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3 again, to rule out a stale version or an issue with my local environment, but its output does not include this sentence either. I also tried https://github.com/mudler/parakeet.cpp/, and it produces the same output as the original model (so all three implementations produce the same output for this audio sample). Looking into this a little more, it seems to be caused by greedy selection, and perhaps using beam search would fix it. I'll take a closer look at that.
Currently this is using a `WHISPER_DEBUG` macro instead of `PARAKEET_DEBUG`.
This should perhaps be integrated into `scripts/quantize.all`, but if needed I'll do that in a follow-up PR.
This also removes the `$` prompt from the examples so the commands can be copied directly from the UI.
This commit adds a persistent tensor for the encoder's output and performs a `ggml_cpy` into it, instead of marking a tensor as an output, copying it to the host, and later setting a slice of it as input to the joint network. The motivation is to avoid a D2H (device-to-host) copy of one frame from `enc_out`, followed by an H2D copy of that frame into the joint graph's encoder input. Now the joint graph can simply use a view into the encoder output's persistent tensor.
This commit removes the setting of `token_embd` as a graph input. Marking it as an input caused the scheduler to place it on the last (CPU) backend regardless of where its weights live, which split the prediction graph across CPU/CUDA on every call. Removing the flag lets the `GET_ROWS` op run on CUDA with the rest of the graph (n_splits 2 -> 1), cutting predict time from ~207 ms to ~70 ms. I've kept the timing for now, as there might be other improvements that could now make a difference.
This commit folds the input-to-hidden and hidden-to-hidden bias tensors into a single tensor at model conversion time, which lets us perform a single `ggml_add` operation instead of two in the LSTM layers. It also merges the LSTM gates (input, forget, cell, and output) into a single tensor at model conversion time. The idea is the same: reduce the number of sigmoid operations in the LSTM layers from three to one.
This commit updates the dummy test model with the tensor folding, which I forgot in the previous commit.
Hi, is a server planned too, or can it already be wrapped by the existing whisper-server to expose the /v1/ API over HTTP?
We have not added support to whisper-server, nor provided a separate implementation. While there is some overlap, there are also many differences, so perhaps a separate parakeet-server would be better (possibly extracting common functionality into server-common.{h,cpp}). But nothing is planned at the moment.
…lbacks to the class to manage | Removed parakeetcpp since it was merged into the master branch of whisper.cpp (see ggml-org/whisper.cpp#3735) | Added CPM usage via FetchCPM.cmake to avoid fetching the same dependency multiple times. This does cause problems with version control, but it is the least I can do for now.

This is a work in progress to support the Parakeet model.
Usage instructions can be found in `examples/parakeet-cli`.