
Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling#37643

Merged
DarkLight1337 merged 11 commits into vllm-project:main from lashahub:mf on Mar 23, 2026

@lashahub (Contributor) commented Mar 20, 2026

Summary

This PR fixes and completes the AudioFlamingo3 (AF3) / MusicFlamingo (MF) implementation in vLLM so it aligns closely with the current Hugging Face reference behavior; AF3 was originally introduced in vLLM in #30539.

Revisits #35522.

Why this PR is needed

The current MF support in vLLM came from two earlier PRs, #32696 and #35535.

Those changes helped, but only partially: they did not fully implement MusicFlamingo correctly.

The core problem is that MusicFlamingo was still effectively routed through the AF3 path. That was enough to make the model appear supported, and later to make it load more cleanly, but not enough to match the actual HF processor/model semantics.

What was incomplete before

#32696

#32696 introduced a thin wrapper-style MusicFlamingo adapter, but it did not implement the MF-specific behavior that exists in the HF reference, including:

  • rote_timestamps
  • MF-specific prompt expansion
  • audio BOS/EOS boundary tokens
  • rotary time embedding (RoTE) application in the MF model path
  • MF-specific rope_parameters / head_dim handling

It also added a dummy audio_tower.pos_emb.freqs compatibility parameter to make MF checkpoints load, which was a workaround rather than a faithful implementation.
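To illustrate what "MF-specific prompt expansion" with audio boundary tokens means in general, the sketch below replaces each audio placeholder in a prompt with a BOS token, one audio token per audio embedding, and an EOS token. The token strings `<sound>`, `<|sound_bos|>` and `<|sound_eos|>` and the function name are hypothetical placeholders; the real tokens and replacement logic are defined by the HF processor and vLLM's musicflamingo.py.

```python
# Hypothetical token strings for illustration; the real MF tokenizer defines its own.
AUDIO_TOKEN = "<sound>"
AUDIO_BOS = "<|sound_bos|>"
AUDIO_EOS = "<|sound_eos|>"


def expand_audio_placeholders(prompt: str, num_tokens_per_audio: list[int]) -> str:
    """Replace each audio placeholder with BOS + N audio tokens + EOS.

    Sketch assumption: one placeholder per audio clip, in the same order as
    num_tokens_per_audio.
    """
    parts = prompt.split(AUDIO_TOKEN)
    if len(parts) - 1 != len(num_tokens_per_audio):
        raise ValueError(
            f"prompt has {len(parts) - 1} audio placeholders but "
            f"{len(num_tokens_per_audio)} audio lengths were given"
        )
    out = [parts[0]]
    for n, tail in zip(num_tokens_per_audio, parts[1:]):
        # Wrap the expanded audio span in explicit boundary tokens.
        out.append(AUDIO_BOS + AUDIO_TOKEN * n + AUDIO_EOS)
        out.append(tail)
    return "".join(out)


print(expand_audio_placeholders("Describe this track: <sound>", [3]))
# Describe this track: <|sound_bos|><sound><sound><sound><|sound_eos|>
```

Splitting on the placeholder (rather than repeated string replacement) avoids re-matching the audio tokens that the expansion itself inserts.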

#35522 / #35535

#35522 reported the loading failure:

ValueError: Following weights were not initialized from checkpoint: {'audio_tower.pos_emb.freqs'}

That issue was real, but it was a symptom of the earlier wrapper design.

#35535 removed that immediate loading failure, which was useful, but it still did not implement the actual MF semantics. It improved loading, not full correctness.
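The error quoted above reads like the output of a strict post-load check that compares the parameters a model declares with the names actually filled from the checkpoint. The sketch below shows that kind of check; the helper name is hypothetical and only the message text is taken from #35522. Under such a check, a parameter that exists only as a compatibility shim can become a hard failure whenever a checkpoint does not provide it.

```python
def check_all_weights_loaded(declared: set[str], loaded: set[str]) -> None:
    """Raise if any declared parameter was never filled from the checkpoint.

    Hypothetical helper; the message mirrors the error reported in #35522.
    """
    missing = declared - loaded
    if missing:
        raise ValueError(
            f"Following weights were not initialized from checkpoint: {missing}"
        )


# A model that declares an extra compatibility parameter the checkpoint lacks.
declared = {"audio_tower.conv1.weight", "audio_tower.pos_emb.freqs"}
loaded = {"audio_tower.conv1.weight"}
try:
    check_all_weights_loaded(declared, loaded)
except ValueError as err:
    print(err)
# Following weights were not initialized from checkpoint: {'audio_tower.pos_emb.freqs'}
```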

What this PR changes

AudioFlamingo3

  • fixes placeholder/token-length derivation to match HF behavior
  • removes stale compatibility logic tied to the old MF wrapper approach
  • keeps the shared audio processing path cleaner and reusable
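As a rough illustration of what placeholder/token-length derivation involves, the sketch below computes how many audio tokens a clip produces from its mel-frame count. It assumes a Whisper-style encoder with a stride-2 convolution followed by stride-2 average pooling (mirroring the downsampling arithmetic of Qwen2-Audio-style encoders) and 30 s windows of 3000 frames, with a shorter final window. The function names and windowing scheme are assumptions; the exact rule this PR matches is the one in the HF AF3 processor.

```python
FRAMES_PER_WINDOW = 3000  # assumed: 30 s at 100 mel frames/s (Whisper-style)


def tokens_for_frames(num_frames: int) -> int:
    """Audio tokens for one window of num_frames mel frames (assumed formula)."""
    after_conv = (num_frames - 1) // 2 + 1  # stride-2 conv downsampling
    return (after_conv - 2) // 2 + 1  # stride-2 average pooling


def tokens_for_clip(num_frames: int) -> int:
    """Sum tokens over 30 s windows; the last window may be shorter (assumption)."""
    total = 0
    remaining = num_frames
    while remaining > 0:
        window = min(remaining, FRAMES_PER_WINDOW)
        total += tokens_for_frames(window)
        remaining -= window
    return total


print(tokens_for_frames(3000))  # 750
print(tokens_for_clip(4500))  # 1125 (750 + 375)
```

If the placeholder count in the prompt and the number of embeddings the encoder actually emits disagree, prompt replacement fails or misaligns, which is why this derivation has to match the reference exactly.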

MusicFlamingo

  • replaces the thin wrapper approach with a real MF implementation
  • uses rote_timestamps correctly
  • applies RoTE in the MF model path after the shared audio encoder and before projection
  • uses MF BOS/EOS audio boundary tokens in prompt replacement
  • handles MF rotary config via rope_parameters and head_dim
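To make the RoTE idea concrete, here is a minimal sketch of a rotary time embedding: the usual RoPE pairwise rotation, but with the angle driven by each audio frame's timestamp in seconds (as the name rote_timestamps suggests) instead of its integer position, applied to encoder output vectors as described above. The `base` value, the rotate-half channel pairing, and the function name are assumptions for illustration, not MusicFlamingo's actual parameters, which come from the checkpoint's rope_parameters / head_dim.

```python
import math


def apply_rote(features: list[list[float]], timestamps: list[float],
               base: float = 10000.0) -> list[list[float]]:
    """Rotate each feature vector by angles proportional to its timestamp.

    Sketch assumptions: even feature dim, rotate-half layout (channel i paired
    with channel i + d/2), inverse frequencies base**(-2i/d) as in standard RoPE.
    """
    if len(features) != len(timestamps):
        raise ValueError("need one timestamp per feature vector")
    out = []
    for vec, t in zip(features, timestamps):
        d = len(vec)
        if d % 2:
            raise ValueError("feature dimension must be even")
        half = d // 2
        rotated = [0.0] * d
        for i in range(half):
            inv_freq = base ** (-2 * i / d)
            theta = t * inv_freq  # continuous time instead of integer position
            c, s = math.cos(theta), math.sin(theta)
            x1, x2 = vec[i], vec[i + half]
            rotated[i] = x1 * c - x2 * s
            rotated[i + half] = x1 * s + x2 * c
        out.append(rotated)
    return out
```

Because each pair is rotated rather than shifted, vector norms are preserved and a frame at t = 0 is left unchanged; only the relative phase between frames encodes their time offset.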

Validation

This PR adds and runs the missing regression coverage:

  • AF3 processing tests
  • MF processing tests
  • HF-parity numerical checks for AF3 and MF audio-feature handling
  • AF3 generation tests against nvidia/audio-flamingo-3-hf
  • MF generation tests against nvidia/music-flamingo-2601-hf
  • single-vs-batched generation consistency checks for both models
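The single-vs-batched consistency checks can be pictured with a small, framework-agnostic helper: generate for each prompt on its own, generate for all prompts in one batch, and require identical greedy outputs. `generate_fn` is a hypothetical stand-in for whatever the real test harness calls (for example a vLLM runner); this is not the helper added in this PR.

```python
from typing import Callable, Sequence

GenerateFn = Callable[[Sequence[str]], list[str]]


def assert_single_matches_batched(generate_fn: GenerateFn,
                                  prompts: Sequence[str]) -> None:
    """Fail if batching changes any (greedy) output.

    Assumes generate_fn is deterministic (e.g. temperature=0) and returns one
    output string per prompt, in input order.
    """
    batched = generate_fn(prompts)
    if len(batched) != len(prompts):
        raise AssertionError("generate_fn returned the wrong number of outputs")
    for i, prompt in enumerate(prompts):
        single = generate_fn([prompt])[0]
        if single != batched[i]:
            raise AssertionError(
                f"prompt {i}: single={single!r} vs batched={batched[i]!r}"
            )


# Toy deterministic generator used only to exercise the helper.
def fake_generate(prompts: Sequence[str]) -> list[str]:
    return [p.upper() for p in prompts]


assert_single_matches_batched(fake_generate, ["a song", "a speech clip"])
```

A mismatch here typically points at per-item state leaking across the batch, such as audio lengths or placeholder counts being computed for the wrong item.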

Upstream alignment

MusicFlamingo is currently in the final stages of being merged into Transformers in huggingface/transformers#43538.

Small follow-up refinements may still be needed if upstream lands minor changes before the final merge, but this PR already closely matches the current HF reference and is intended to serve as the correct baseline for AF3/MF support in vLLM.

Takeaway

#32696 and #35535 addressed parts of the problem, but they did not fully implement MusicFlamingo correctly.

This PR replaces that partial state with:

  • a correct AF3 implementation
  • a real MusicFlamingo implementation instead of a thin AF3 alias
  • regression coverage to keep both models correct going forward


Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify bot (Contributor) commented Mar 20, 2026

Documentation preview: https://vllm--37643.org.readthedocs.build/en/37643/

@mergify added the documentation and multi-modality labels Mar 20, 2026

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces comprehensive support for the MusicFlamingo model, including its integration into the vLLM framework, updated documentation, and dedicated test suites. The changes involve adding MusicFlamingoForConditionalGeneration to the list of supported models, implementing its specific multimodal processing logic (including rotary embeddings and audio token handling), and creating new test fixtures and generation tests to validate its functionality. Additionally, the AudioFlamingo3 model's processing and testing were refined, with updates to expected outputs, dummy data generation, and the introduction of helper functions for output assertion, ensuring consistency and correctness across both audio models. The min_transformers_version for MusicFlamingo was also updated to reflect its requirements.

Review comment threads:
  • tests/models/multimodal/generation/test_audioflamingo3.py
  • tests/models/multimodal/processing/test_musicflamingo.py (outdated)
  • vllm/model_executor/models/audioflamingo3.py (outdated)
  • vllm/model_executor/models/audioflamingo3.py
  • vllm/model_executor/models/musicflamingo.py (outdated)

@DarkLight1337 (Member) left a comment

Thanks, left some initial comments

@DarkLight1337 (Member)

Thanks, I don't have any more concerns. Can you fix pre-commit though?

@lashahub (Contributor, Author)

I reran pre-commit run -a on the current branch locally and it passes cleanly now.

@DarkLight1337 (Member)

Sorry, it still fails.

Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
@lashahub (Contributor, Author)

DCO passes now.
The pre-commit / pre-run-check job fails with: "Error: PR must have the 'ready' label or the author must have at least 4 merged PRs (found 1)."

@DarkLight1337 added the ready label Mar 22, 2026
@DarkLight1337 (Member)

Sorry for the confusion; let's see if it passes after I add the ready label.

Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
@DarkLight1337 merged commit e7767ec into vllm-project:main Mar 23, 2026
58 checks passed
@lashahub deleted the mf branch April 4, 2026 22:34
@khluu (Member) commented Apr 11, 2026

Hi there, just wondering: why does this model require a minimum transformers version of 5.3.0? https://github.com/vllm-project/vllm/pull/37643/changes#diff-c2cd72327248d1c1aa3d4b29ec9e47314d9893bfeff94e927841cd640fac84c1R755

@lashahub (Contributor, Author)

Hi @khluu, Music Flamingo was actually released in transformers v5.5.0, after the vLLM integration. The minimum version will be updated in #39011.

@khluu (Member) commented Apr 11, 2026

Thanks! A function call in one of the tests is missing input_ids, which seems to be a new argument introduced in transformers v5.5.0. I'm skipping that test for now, but let me know once you merge the change so it can be added back:
#30566 (comment)


Labels: documentation, multi-modality, ready


3 participants