Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling by lashahub · Pull Request #37643 · vllm-project/vllm

lashahub · 2026-03-20T06:14:49Z

Summary

This PR fixes and completes the AudioFlamingo3 (AF3) / MusicFlamingo (MF) implementation in vLLM so it aligns closely with the current Hugging Face reference behavior; AF3 was originally introduced in vLLM in #30539.

Revisits #35522.

Why this PR is needed

The current MF support in vLLM came from two earlier PRs:

#32696 ([Model][Multimodal] Add explicit MusicFlamingo adapter) added an explicit MusicFlamingo adapter
#35535 ([Bugfix] Fix loading Music Flamingo) fixed the loading issue reported in #35522 ([Bug]: Music Flamingo ValueError: Following weights were not initialized from checkpoint: {'audio_tower.pos_emb.freqs'})

Those changes helped partially, but they did not fully implement MusicFlamingo correctly.

The core problem is that MusicFlamingo was still effectively routed through the AF3 path. That was enough to make the model appear supported, and later to make it load more cleanly, but not enough to match the actual HF processor/model semantics.

What was incomplete before

#32696

#32696 introduced a thin wrapper-style MusicFlamingo adapter, but it did not implement the MF-specific behavior that exists in the HF reference, including:

rote_timestamps
MF-specific prompt expansion
audio BOS/EOS boundary tokens
rotary time embedding (RoTE) application in the MF model path
MF-specific rope_parameters / head_dim handling

It also added a dummy audio_tower.pos_emb.freqs compatibility parameter to make MF checkpoints load, which was a workaround rather than a faithful implementation.
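To make the RoTE item concrete, here is a minimal sketch of a rotary time embedding applied to per-frame audio features. It is an illustration under stated assumptions, not the HF or vLLM implementation: the function name `rotary_time_embed`, the interleaved (even, odd) pairing, the base of 10000, and the use of plain Python lists are all assumptions made for readability.

```python
import math


def rotary_time_embed(features, timestamps, base=10000.0):
    """Rotate each frame's feature pairs by an angle proportional to its timestamp.

    features:   list of frames, each a list of floats with even length
    timestamps: one time value per frame (unit is an assumption, e.g. seconds)
    """
    dim = len(features[0])
    if dim % 2 != 0:
        raise ValueError("feature dimension must be even")
    # One inverse frequency per (even, odd) pair, as in standard rotary embeddings.
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    out = []
    for frame, t in zip(features, timestamps):
        rotated = []
        for i, w in enumerate(inv_freq):
            x, y = frame[2 * i], frame[2 * i + 1]
            angle = t * w
            c, s = math.cos(angle), math.sin(angle)
            # 2-D rotation of the pair (x, y) by `angle`.
            rotated.extend([x * c - y * s, x * s + y * c])
        out.append(rotated)
    return out
```

The point of the sketch is that the rotation depends on the timestamps, so a path that ignores `rote_timestamps` produces different features from the reference even when every weight loads.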

#35522 / #35535

#35522 reported the loading failure:

ValueError: Following weights were not initialized from checkpoint: {'audio_tower.pos_emb.freqs'}

That issue was real, but it was a symptom of the earlier wrapper design.

#35535 removed that immediate loading failure, which was useful, but it still did not implement the actual MF semantics. It improved loading, not full correctness.
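The failure mode can be modeled with a toy version of a "were all parameters initialized" check. This is a sketch only, not vLLM's loader code: `check_all_initialized` and every parameter name other than `audio_tower.pos_emb.freqs` are hypothetical, and the checkpoint contents are assumed for illustration.

```python
def check_all_initialized(model_param_names, checkpoint_names):
    """Raise if the model declares parameters the checkpoint never filled in."""
    missing = set(model_param_names) - set(checkpoint_names)
    if missing:
        raise ValueError(
            f"Following weights were not initialized from checkpoint: {missing}"
        )


# Hypothetical names, except for the one quoted in the issue.
model_params = ["audio_tower.conv1.weight", "audio_tower.pos_emb.freqs"]
checkpoint = ["audio_tower.conv1.weight"]

try:
    check_all_initialized(model_params, checkpoint)
except ValueError as err:
    message = str(err)
```

In this toy model, declaring a placeholder parameter only moves the problem: the check passes once the name is dropped or filled, but nothing has yet used the value for RoTE.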

What this PR changes

AudioFlamingo3

fixes placeholder/token-length derivation to match HF behavior
removes stale compatibility logic tied to the old MF wrapper approach
keeps the shared audio processing path cleaner and reusable

MusicFlamingo

replaces the thin wrapper approach with a real MF implementation
uses rote_timestamps correctly
applies RoTE in the MF model path after the shared audio encoder and before projection
uses MF BOS/EOS audio boundary tokens in prompt replacement
handles MF rotary config via rope_parameters and head_dim
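The prompt-replacement item can be sketched as follows: each audio placeholder in the prompt is expanded to one token per audio embedding and wrapped in boundary tokens. The token strings `<sound>`, `<|sound_bos|>`, and `<|sound_eos|>` and the helper name are hypothetical stand-ins; the real values come from the MF processor and tokenizer.

```python
AUDIO_TOKEN = "<sound>"      # hypothetical placeholder string
AUDIO_BOS = "<|sound_bos|>"  # hypothetical boundary token
AUDIO_EOS = "<|sound_eos|>"  # hypothetical boundary token


def expand_audio_placeholders(prompt, tokens_per_audio):
    """Replace the i-th placeholder with BOS + n_i placeholders + EOS."""
    parts = prompt.split(AUDIO_TOKEN)
    if len(parts) - 1 != len(tokens_per_audio):
        raise ValueError("placeholder count does not match number of audios")
    out = [parts[0]]
    for n, tail in zip(tokens_per_audio, parts[1:]):
        out.append(AUDIO_BOS + AUDIO_TOKEN * n + AUDIO_EOS)
        out.append(tail)
    return "".join(out)


expanded = expand_audio_placeholders("Describe <sound> briefly.", [3])
```

Only the repeated placeholder positions are meant to be overwritten with audio embeddings in this sketch; the boundary tokens stay ordinary text tokens.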

Validation

This PR adds and runs the missing regression coverage:

AF3 processing tests
MF processing tests
HF-parity numerical checks for AF3 and MF audio-feature handling
AF3 generation tests against nvidia/audio-flamingo-3-hf
MF generation tests against nvidia/music-flamingo-2601-hf
single-vs-batched generation consistency checks for both models
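The shape of the parity and consistency checks can be sketched with two small helpers. These are illustrative only and are not the tests added in this PR: the helper names, the tolerance, and the toy `generate` callable are assumptions.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def single_vs_batched_mismatches(generate, prompts):
    """Indices where one-at-a-time output differs from batched output.

    `generate` takes a list of prompts and returns a list of outputs.
    """
    batched = generate(prompts)
    single = [generate([p])[0] for p in prompts]
    return [i for i, (s, b) in enumerate(zip(single, batched)) if s != b]


# Toy stand-in for a model: deterministic and independent per prompt.
def toy_generate(prompts):
    return [p.upper() for p in prompts]


hf_features = [0.10, -0.25, 0.40]
vllm_features = [0.1001, -0.2499, 0.4000]
within_tolerance = max_abs_diff(hf_features, vllm_features) < 1e-3
mismatches = single_vs_batched_mismatches(toy_generate, ["a clip", "another clip"])
```

A numerical check of this kind catches feature-level drift from the reference, while the single-vs-batched check catches padding or placeholder bugs that only appear when requests are grouped.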

Upstream alignment

MusicFlamingo is currently in the final stages of being merged into Transformers in huggingface/transformers#43538.

Small follow-up refinements may still be needed if upstream lands minor changes before final merge, but this PR is already very close to the current HF reference and is intended to be the correct baseline for AF3/MF support in vLLM.

Takeaway

#32696 and #35535 addressed parts of the problem, but they did not fully implement MusicFlamingo correctly.

This PR replaces that partial state with:

a correct AF3 implementation
a real MusicFlamingo implementation instead of a thin AF3 alias
regression coverage to keep both models correct going forward


Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>


mergify · 2026-03-20T06:15:28Z

Documentation preview: https://vllm--37643.org.readthedocs.build/en/37643/

gemini-code-assist

Code Review

This pull request introduces comprehensive support for the MusicFlamingo model, including its integration into the vLLM framework, updated documentation, and dedicated test suites. The changes involve adding MusicFlamingoForConditionalGeneration to the list of supported models, implementing its specific multimodal processing logic (including rotary embeddings and audio token handling), and creating new test fixtures and generation tests to validate its functionality. Additionally, the AudioFlamingo3 model's processing and testing were refined, with updates to expected outputs, dummy data generation, and the introduction of helper functions for output assertion, ensuring consistency and correctness across both audio models. The min_transformers_version for MusicFlamingo was also updated to reflect its requirements.

DarkLight1337

Thanks, left some initial comments

DarkLight1337 · 2026-03-21T05:07:55Z

Thanks, I don't have any more concerns. Can you fix pre-commit though?

lashahub · 2026-03-22T01:24:30Z

> Thanks, I don't have any more concerns. Can you fix pre-commit though?

I reran pre-commit run -a on the current branch locally and it passes cleanly now.

DarkLight1337 · 2026-03-22T04:33:08Z

Sorry it still fails


lashahub · 2026-03-22T05:12:21Z

DCO passes now.
pre-commit / pre-run-check fails due to Error: PR must have the 'ready' label or the author must have at least 4 merged PRs (found 1).

DarkLight1337 · 2026-03-22T05:24:52Z

Sorry for the confusion, let's see if it passes after I add ready label


khluu · 2026-04-11T01:03:49Z

Hi there, just wondering: why does this model require minimum transformers 5.3.0? https://github.com/vllm-project/vllm/pull/37643/changes#diff-c2cd72327248d1c1aa3d4b29ec9e47314d9893bfeff94e927841cd640fac84c1R755

lashahub · 2026-04-11T01:10:55Z

Hi @khluu, Music Flamingo was actually released with transformers v5.5.0, after the vLLM integration. The minimum version will be updated in #39011.

khluu · 2026-04-11T01:12:27Z

Thanks! There's a function call in a test that is missing input_ids, which seems to be a new argument introduced in transformers v5.5.0. I'm skipping the test for now, but let me know once you merge the change so I can add it back:
#30566 (comment)


lashahub added 4 commits January 12, 2026 11:18

Add MusicFlamingo (bb8965b)
Update MusicFlamingo (60470ca)
Add MusicFlamingo (3602e48)
Restore AF3 and add proper MusicFlamingo support (044596e)

Each commit: Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>

lashahub requested review from DarkLight1337 and ywang96 as code owners March 20, 2026 06:14

mergify Bot added the documentation (Improvements or additions to documentation) and multi-modality (Related to multi-modality (#4194)) labels Mar 20, 2026

gemini-code-assist Bot reviewed Mar 20, 2026
