mtmd : add Nemotron 3 Nano Omni support (parakeet) by danbev · Pull Request #22520 · ggml-org/llama.cpp

danbev · 2026-04-29T14:36:59Z

Overview

This commit adds support for the subsampling and encoder parts of the Nemotron 3 Nano Omni model.

Additional information

The Parakeet subsampling/encoder code was taken from parakeet.cpp, which is currently a pull request against whisper.cpp. I've tried to copy the code as closely as possible, to hopefully enable easy patching between these two projects later.

Refs: ggml-org/whisper.cpp#3735

I have read and agree with the contributing guidelines
AI usage disclosure: No

For testing, a converted model can be found here and can be run using the following command:

llama-mtmd-cli -hf danbev/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16-mtmd-GGUF --no-warmup --audio jfk.wav -p "Transcribe this audio clip, only the transcription and nothing else."

ngxson

looks good, I'm leaving some early-review comments

This commit removes the generation of the relative positional tensor in the model conversion script and instead computes it in the encoder graph. This is only done for the window of positions required for the current audio sample.
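As a rough illustration of computing a relative positional table only for the positions a given input needs, the sketch below builds a sinusoidal table for the relative offsets (n_frames - 1) down to -(n_frames - 1). It assumes a Transformer-XL-style sinusoidal formulation. The function name rel_pos_table, the row layout, and the parameters are hypothetical and are not taken from this PR or from parakeet.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sinusoidal table for the relative offsets
// (n_frames - 1) ... -(n_frames - 1), i.e. 2 * n_frames - 1 rows of d_model
// values each. Row r corresponds to offset (n_frames - 1 - r).
static std::vector<float> rel_pos_table(int n_frames, int d_model) {
    assert(n_frames > 0);
    assert(d_model > 0 && d_model % 2 == 0); // sin/cos pairs need an even width

    const int n_pos = 2 * n_frames - 1;
    std::vector<float> table((std::size_t) n_pos * d_model);

    for (int r = 0; r < n_pos; ++r) {
        const float pos = (float) (n_frames - 1 - r);
        for (int i = 0; i < d_model; i += 2) {
            // Even index holds the sine term, odd index the cosine term.
            const float inv_freq = std::pow(10000.0f, -(float) i / (float) d_model);
            table[(std::size_t) r * d_model + i]     = std::sin(pos * inv_freq);
            table[(std::size_t) r * d_model + i + 1] = std::cos(pos * inv_freq);
        }
    }
    return table;
}
```

The table size depends only on the number of frames in the current sample, so a short clip needs only a small table rather than one precomputed for a maximum length.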

This commit adds a function, clip_get_model, that gives access to the clip_model. It also removes the two functions clip_get_mel_filter_tensor and clip_get_window_tensor; their callers can now use clip_get_model to access the model tensors they need.

ngxson

looking good so far

This commit updates the parakeet code in mtmd to reflect the latest updates to parakeet.cpp in whisper.cpp. A follow-up commit will address the currently hardcoded dw_pad and see if we can add n_conv_kernel as a model metadata field.

This commit updates the model conversion to read the conv_kernel_size field from the sound_config section of the model's config.json file. It then uses this field instead of the hardcoded values in parakeet.cpp.
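To illustrate why a kernel size read from model metadata can stand in for a hardcoded padding value: for a stride-1 convolution with an odd kernel size k, padding by (k - 1) / 2 on each side keeps the output the same length as the input. The sketch below assumes that relationship. Whether dw_pad in parakeet.cpp is derived exactly this way is an assumption, and the names same_padding and conv1d_same are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: symmetric padding that keeps the output length equal
// to the input length for a stride-1 convolution with an odd kernel size.
static int same_padding(int kernel_size) {
    assert(kernel_size > 0 && kernel_size % 2 == 1);
    return (kernel_size - 1) / 2;
}

// Naive single-channel 1D convolution with zero padding derived from the
// kernel size, rather than from a hardcoded constant.
static std::vector<float> conv1d_same(const std::vector<float> & x, const std::vector<float> & w) {
    const int k   = (int) w.size();
    const int pad = same_padding(k);
    const int n   = (int) x.size();

    std::vector<float> y(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < k; ++j) {
            const int src = i + j - pad;
            if (src >= 0 && src < n) { // out-of-range taps read the zero padding
                y[i] += w[j] * x[src];
            }
        }
    }
    return y;
}
```

With this shape, changing the kernel size in the model metadata changes the padding with it, and no second constant has to be kept in sync.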

ngxson · 2026-06-18T13:56:40Z

side question: is it possible to use the 4th dim of the input as the batch dim? (provided that all inputs are the same size, so no padding or masking is needed)

that may allow batching support in the future, but it's optional

I'm not sure but I'll take a look 👍

correction: I meant 3rd dim; input tensor shape is: [nx, ny, n_batch]

I've looked into this, and it should be possible with some changes. I'd be happy to make these changes in a follow-up PR when we add batch support, and I'll also try to update parakeet.cpp then to keep both as aligned as possible.
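For illustration, a batch of equally sized inputs can be packed into one contiguous buffer with shape [nx, ny, n_batch], where the first dimension varies fastest (the same ordering ggml uses for its ne array). The sketch below is standalone and does not use the ggml API; the names pack_batch and at are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pack n_batch inputs, each holding nx * ny floats, into
// one buffer laid out as [nx, ny, n_batch] with nx varying fastest.
static std::vector<float> pack_batch(const std::vector<std::vector<float>> & inputs,
                                     std::size_t nx, std::size_t ny) {
    std::vector<float> out;
    out.reserve(inputs.size() * nx * ny);
    for (const auto & in : inputs) {
        // Every input has the same size, so no padding or masking is needed.
        assert(in.size() == nx * ny);
        out.insert(out.end(), in.begin(), in.end());
    }
    return out;
}

// Flat index of element (x, y) of batch entry b in the packed buffer.
static std::size_t at(std::size_t x, std::size_t y, std::size_t b,
                      std::size_t nx, std::size_t ny) {
    return x + nx * (y + ny * b);
}
```

Because each batch entry occupies its own contiguous nx * ny slice, a single input is just the n_batch = 1 case of the same layout.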

CISC · 2026-06-18T18:48:56Z

@danbev Happy Midsummer! :)

danbev · 2026-06-19T04:16:27Z

Happy Midsummer! :)

@CISC Thanks! 😃

ngxson reviewed Apr 29, 2026

Comment thread convert_hf_to_gguf.py Outdated

Comment thread tools/mtmd/mtmd-audio.cpp Outdated

github-actions bot added the examples and python labels Apr 29, 2026

ngxson reviewed Apr 30, 2026

Comment thread tools/mtmd/clip.h Outdated

ngxson reviewed Apr 30, 2026

Comment thread tools/mtmd/mtmd-audio.cpp Outdated

danbev added 2 commits April 30, 2026 14:50

mtmd : read mel_filters and window into hparams

8e279f4

ngxson reviewed Apr 30, 2026

Comment thread tools/mtmd/mtmd-audio.cpp Outdated

Comment thread tools/mtmd/clip.cpp Outdated

danbev added 14 commits May 1, 2026 10:16

mtmd : use set_input_f32 lambda [no ci]

ffd1b99

mtmd : add better asserts for mel_filters and hann window [no ci]

8af100f

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

b5a35e0

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

7ed9294

mtmd : add missing size_t cast

49658ba

mtmd : change type of pad to size_t

9a8398e

mtmd : zero initialize samples_padded

6ba52fc

mtmd : remove unsued ctx member from parakeet preprocessor

385b2d4

mtmd : make log_mel_spectrogram_parakeet_worker_thread private static

cef7ff7

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

681a199

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

44cb51f

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

0cd9e16

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio [no ci]

78e28f4

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

96b1326

danbev self-assigned this Jun 2, 2026

danbev added 5 commits June 4, 2026 12:50

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

656437b

Merge branch 'upstream/master' into nemotron-3-omni-mtmd-audio

79e1dba

mtmd : add audio_conv_kernel_size to model conversion

5b741d1

This commit updates the model conversion to read the conv_kernel_size field from the sound_config section of the model's config.json file. It then uses this field instead of the hardcoded values in parakeet.cpp.

mtmd : cleanup [no ci]

4f8882b

danbev marked this pull request as ready for review June 18, 2026 03:41

danbev requested review from a team and CISC as code owners June 18, 2026 03:41

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

01a1f58

CISC approved these changes Jun 18, 2026

Comment thread conversion/nemotron.py Outdated

danbev and others added 3 commits June 18, 2026 15:14

conversion : call super().filter_tensors [no ci]

c1d465f

Merge remote-tracking branch 'upstream/master' into nemotron-3-omni-mtmd-audio

e49c091

do not discard result of super filter_tensors

3378340

ngxson reviewed Jun 18, 2026

danbev added 3 commits June 18, 2026 19:50

mtmd : use build_mm instead of ggml_mul_mat

816d776

mtmd : use build_ffn

882c9b7

mtmd : move and reuse get_vector lambda

79baf6c

ngxson reviewed Jun 18, 2026

Comment thread tools/mtmd/clip.cpp Outdated

danbev added 4 commits June 22, 2026 07:07

mtmd : use build_inp_raw for parakeet

8835abe

mtmd : throw exception in get_scalar instead of assert

602c218

mtmd : fix std::min call

2ab3beb

mtmt : use .c_str in throw clause in get_vector

1337d74

3 participants
