[ROCm][CI] Gate incompatible HF references on Transformers v5 by AndreasKaratzas · Pull Request #41532 · vllm-project/vllm

AndreasKaratzas · 2026-05-03T04:14:56Z

This updates generation-test metadata for models whose HF reference path is incompatible with the installed Transformers v5 runtime.

Test groups fixed:

mi355_1: Language Models Tests (Standard) for MiniCPM4.
mi355_1: Language Models Test (Extended Generation) for HyperCLOVAX.

MiniCPM4:

Fails in HF remote code because it imports is_torch_fx_available, removed from the installed Transformers v5 runtime.
Links the upstream compatibility request, which is closed as not planned: Removal of is_torch_fx_available in v5.0 breaks trust_remote_code models huggingface/transformers#44561

HyperCLOVAX:

The model's remote code imports ROPE_INIT_FUNCTIONS and unconditionally indexes ROPE_INIT_FUNCTIONS[self.rope_type]; for the default case, self.rope_type is "default": https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-14B/blob/main/modeling_hyperclovax.py
Transformers v5 docs still list "default" as a valid RoPE type. The incompatibility is narrower: v5 no longer exposes a "default" entry in ROPE_INIT_FUNCTIONS, while v4.57 did. In v5's own Llama code, default RoPE is handled by compute_default_rope_parameters; ROPE_INIT_FUNCTIONS is consulted only when rope_type != "default".
The v5 migration guide also notes that rotary embedding configuration now lives under config.rope_parameters, which matches the new v5 model path.
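To make the HyperCLOVAX failure mode concrete, here is a minimal pure-Python sketch of the guard pattern described for v5's Llama code: handle `"default"` directly and only consult the registry for other RoPE types. `ROPE_REGISTRY`, `default_rope_inv_freq`, `linear_scaled_inv_freq`, and `init_rope` are hypothetical stand-ins, not Transformers APIs. The registry deliberately has no `"default"` key, which mimics the v5 situation.

```python
from typing import Callable, Dict, List


def default_rope_inv_freq(dim: int, base: float = 10000.0) -> List[float]:
    """Standard RoPE inverse frequencies: 1 / base**(i / dim) for even i in [0, dim)."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


def linear_scaled_inv_freq(dim: int, base: float = 10000.0, factor: float = 2.0) -> List[float]:
    """Example non-default variant: linear position scaling folded into the frequencies."""
    return [f / factor for f in default_rope_inv_freq(dim, base)]


# Hypothetical registry shaped like the v5 situation: no "default" entry.
ROPE_REGISTRY: Dict[str, Callable[..., List[float]]] = {"linear": linear_scaled_inv_freq}


def init_rope(rope_type: str, dim: int) -> List[float]:
    """Resolve RoPE frequencies without assuming the registry knows "default"."""
    if rope_type == "default":
        # Handle the default case directly instead of indexing the registry.
        return default_rope_inv_freq(dim)
    # Only non-default types go through the registry lookup.
    return ROPE_REGISTRY[rope_type](dim)


# The remote-code pattern ROPE_REGISTRY[self.rope_type] with rope_type == "default"
# would raise KeyError against this registry; init_rope("default", ...) does not.
```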

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request updates the model testing framework to include model revisions and trust_remote_code flags. It also introduces version constraints for specific models in the registry. Feedback was provided regarding a logic bug where adding a version reason causes tests to be skipped regardless of version validity, and a concern that the specified Transformers version range for MiniCPM4 appears to be incorrect or overly restrictive.


AndreasKaratzas · 2026-05-04T00:08:05Z

cc @charlifu


AndreasKaratzas · 2026-05-05T00:07:21Z

Notes on the AudioFlamingo3 / MusicFlamingo changes:

Transformers 5.5 native processors now require a 1:1 mapping between text entries and audio entries. vLLM's processor path supports multiple audio items inside one prompt, so the AudioFlamingo3 multimodal processor now performs the HF processor's windowing, feature extraction, prompt expansion, and tokenization locally for this single-prompt multi-audio case.
MusicFlamingo's native processor no longer returns rote_timestamps. vLLM now derives the same rotary-time timestamps from chunk_counts and the encoded audio sequence length inside the model path, while still accepting rote_timestamps if an older processor provides them.
MusicFlamingo prompt expansion now includes the audio BOS/EOS tokens expected by its prompt replacement metadata.
transformers_version_reason now only applies when the installed version falls outside the configured min/max bounds. This keeps the AudioFlamingo3/MusicFlamingo "Needs PR 43538" explanation for versions below 5.3.0 without skipping supported 5.5.3 runs.
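A minimal sketch of the corrected gating rule, assuming plain numeric version strings: the configured reason only decorates a skip that the min/max bounds already require, so an in-range version never skips. `version_skip_reason` and `parse_version` are hypothetical helpers, not vLLM's registry code; a real implementation would more likely compare versions with `packaging.version.Version`.

```python
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse a plain release string like '5.5.3' into a comparable tuple.

    Assumption: numeric components only (no rc/dev suffixes); short forms are zero-padded.
    """
    parts = [int(p) for p in v.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def version_skip_reason(
    installed: str,
    min_version: Optional[str] = None,
    max_version: Optional[str] = None,
    version_reason: Optional[str] = None,
) -> Optional[str]:
    """Return a skip message only if `installed` is outside [min_version, max_version]."""
    current = parse_version(installed)
    too_old = min_version is not None and current < parse_version(min_version)
    too_new = max_version is not None and current > parse_version(max_version)
    if not (too_old or too_new):
        # In range: run the test even when a version_reason is configured.
        return None
    bound = f">= {min_version}" if too_old else f"<= {max_version}"
    message = f"transformers {installed} does not satisfy {bound}"
    # The configured reason only annotates an out-of-range skip.
    return f"{message}: {version_reason}" if version_reason else message
```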


misunderstanding

eustlb

transformers releases
AudioFlamingo3: v5.0.0
MusicFlamingo3: v5.5.0

There should be no reason not to use super()._call_hf_processor. It looks like much of the confusion here comes from the vLLM PR being merged before the Transformers PR it depended on, which then continued to evolve after the vLLM PR landed.

Such fixes should be addressed in the work already started in #39011, but I'm happy to help if needed. @lashahub, looping you in here.



AndreasKaratzas · 2026-05-21T22:18:11Z

I pushed some more changes (quite a lot of them, actually). The main intent is to stop carrying vLLM-local copies of preprocessing that now belongs in the upstream HF processors. Previously, vLLM had its own chunking, feature extraction, and audio-token expansion logic. With AudioFlamingo3 available since Transformers v5.0.0 and MusicFlamingo since v5.5.0, that logic should live in HF, while vLLM keeps only the adapter bookkeeping needed for batching, placeholder matching, and model execution.

In audioflamingo3.py, _call_hf_processor now calls super()._call_hf_processor(...). The deleted code on the active path duplicated HF behavior: it manually split audio into chunks, called the feature extractor directly, normalized masks, expanded <sound> tokens, and then called the tokenizer. The current HF AudioFlamingo3Processor.__call__ already performs each of these steps.

The remaining vLLM logic after super() is:

Copy mm_data first so we do not mutate the caller's mapping with pop.
Map vLLM's "audios" alias to HF's "audio" argument.
Rename HF's input_features_mask to vLLM's feature_attention_mask.
Compute chunk_counts, because HF returns flattened audio chunks but vLLM still needs to regroup chunk embeddings back per original audio item.
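The adapter steps listed above could look roughly like the following self-contained sketch. `HFProcessorStub`, `AudioFlamingo3LikeAdapter`, `num_chunks`, and `CHUNK_SAMPLES` are hypothetical stand-ins: the stub only fakes the flattened per-chunk outputs an HF processor returns, and the chunk length is an arbitrary illustrative value, not the real AudioFlamingo3 window.

```python
import math
from typing import Any, Dict, List, Mapping

CHUNK_SAMPLES = 4  # hypothetical window length in samples, for illustration only


def num_chunks(audio: List[float]) -> int:
    """Number of fixed-size windows an audio item is split into (at least one)."""
    return max(1, math.ceil(len(audio) / CHUNK_SAMPLES))


class HFProcessorStub:
    """Hypothetical stand-in for the upstream HF processor call."""

    def _call_hf_processor(self, prompt: str, mm_data: Mapping[str, Any]) -> Dict[str, Any]:
        # HF returns features flattened over all chunks of all audio items.
        total = sum(num_chunks(a) for a in mm_data.get("audio", []))
        return {
            "input_features": [[0.0] * CHUNK_SAMPLES for _ in range(total)],
            "input_features_mask": [[1] * CHUNK_SAMPLES for _ in range(total)],
        }


class AudioFlamingo3LikeAdapter(HFProcessorStub):
    """vLLM-side bookkeeping around the inherited HF call."""

    def _call_hf_processor(self, prompt: str, mm_data: Mapping[str, Any]) -> Dict[str, Any]:
        mm_data = dict(mm_data)  # copy so pop() does not mutate the caller's mapping
        if "audios" in mm_data:
            mm_data["audio"] = mm_data.pop("audios")  # vLLM alias -> HF argument name
        outputs = super()._call_hf_processor(prompt, mm_data)
        if "input_features_mask" in outputs:
            outputs["feature_attention_mask"] = outputs.pop("input_features_mask")
        # Batching metadata: flattened chunks per original audio item.
        outputs["chunk_counts"] = [num_chunks(a) for a in mm_data.get("audio", [])]
        return outputs
```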

So chunk_counts is not a replacement for HF preprocessing. It is vLLM batching metadata.
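A minimal sketch of how chunk_counts is consumed as batching metadata: flattened per-chunk outputs are split back into one group per original audio item. `regroup_by_chunk_counts` is a hypothetical helper on plain lists; with tensors, `torch.split(embeddings, chunk_counts)` performs the same split along dim 0.

```python
from itertools import accumulate
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def regroup_by_chunk_counts(flat: Sequence[T], chunk_counts: Sequence[int]) -> List[List[T]]:
    """Split flattened per-chunk items back into one list per original audio item."""
    if sum(chunk_counts) != len(flat):
        raise ValueError("chunk_counts must sum to the number of flattened chunks")
    ends = list(accumulate(chunk_counts))
    starts = [0] + ends[:-1]
    return [list(flat[s:e]) for s, e in zip(starts, ends)]
```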

I also removed MusicFlamingo's _call_hf_processor because after the AudioFlamingo3 parent method was fixed, the MusicFlamingo override only duplicated parent behavior and recomputed chunk_counts. MusicFlamingo can inherit the shared AudioFlamingo3 adapter path.

As for MusicFlamingo's _expand_audio_tokens, I believe the current HF MusicFlamingoProcessor expands audio placeholders itself, including the MusicFlamingo-specific BOS/audio/EOS pattern. Keeping a second vLLM implementation risks drifting from HF again. vLLM still keeps MusicFlamingo prompt replacement in musicflamingo.py, because it needs to match the generated placeholder token IDs to multimodal embedding slots.
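To illustrate the split of responsibilities, the sketch below shows the kind of placeholder expansion HF now owns (one `<sound>` becomes BOS, N audio tokens, EOS) next to the vLLM-side step of locating embedding slots. All token IDs and function names are hypothetical, and the sketch assumes only the inner audio tokens receive audio embeddings while BOS/EOS stay ordinary tokens; the real MusicFlamingo prompt-replacement metadata may differ.

```python
from typing import List, Sequence

# Hypothetical token IDs, for illustration only.
SOUND_ID, AUDIO_BOS_ID, AUDIO_ID, AUDIO_EOS_ID = 1000, 1001, 1002, 1003


def expand_sound_tokens(ids: Sequence[int], num_audio_tokens: Sequence[int]) -> List[int]:
    """HF-side step: replace each <sound> with BOS + N audio tokens + EOS."""
    counts = iter(num_audio_tokens)
    out: List[int] = []
    for tok in ids:
        if tok == SOUND_ID:
            out += [AUDIO_BOS_ID] + [AUDIO_ID] * next(counts) + [AUDIO_EOS_ID]
        else:
            out.append(tok)
    return out


def embedding_slots(ids: Sequence[int]) -> List[int]:
    """vLLM-side step: positions whose embeddings come from the audio encoder."""
    return [i for i, tok in enumerate(ids) if tok == AUDIO_ID]
```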

The partial_rotary_factor change in musicflamingo.py aligns with current Transformers. The previous code used the full head dimension, but HF computes dim = int(head_dim * partial_rotary_factor), so I think vLLM was rotating the wrong number of dimensions.
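The effect of partial_rotary_factor on the rotary dimensionality can be sketched in pure Python. The helper name, the head dimension of 64, and the base of 10000 are assumptions for illustration; only the formula dim = int(head_dim * partial_rotary_factor) comes from the description.

```python
from typing import List


def rope_inv_freq(head_dim: int, partial_rotary_factor: float = 1.0, base: float = 10000.0) -> List[float]:
    """Inverse frequencies over the rotated slice only: dim = int(head_dim * partial_rotary_factor)."""
    dim = int(head_dim * partial_rotary_factor)
    # One frequency per rotated channel pair; the other head_dim - dim channels are not rotated.
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


full = rope_inv_freq(64)          # previous behavior: all 64 channels rotated
partial = rope_inv_freq(64, 0.5)  # partial rotary: only 32 channels rotated
```

Besides rotating fewer channels, the partial variant also changes every frequency after the first, because the exponent is normalized by the smaller dim.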

The RoPE forward change from flattened batch_positions to timestamp-derived window_positions also matches current HF. The old implementation made the rotary window axis depend on flattened chunk order. Current HF uses the actual window start timestamp, computes the audio window duration, then derives the window position from that. That matters for chunked audio and for matching single vs batched behavior.
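A hedged sketch of why timestamp-derived positions matter for batching: with flattened chunk order, the second audio's windows get positions that depend on what else is in the batch, while start-time-derived positions restart per audio. The 30-second window, the rounding rule, and all function names are hypothetical; this does not reproduce HF's exact computation.

```python
from typing import List


def window_starts(chunk_counts: List[int], window_s: float) -> List[float]:
    """Start time (seconds) of each window, restarting at 0 for every original audio item."""
    return [k * window_s for count in chunk_counts for k in range(count)]


def flattened_positions(chunk_counts: List[int]) -> List[int]:
    """Old behavior as described: position follows flattened chunk order across the batch."""
    return list(range(sum(chunk_counts)))


def timestamp_positions(starts_s: List[float], window_s: float) -> List[int]:
    """Derive each window's position from its own start timestamp and the window duration."""
    return [round(start / window_s) for start in starts_s]


WINDOW_S = 30.0  # hypothetical window duration
batched = timestamp_positions(window_starts([3, 2], WINDOW_S), WINDOW_S)
single = timestamp_positions(window_starts([2], WINDOW_S), WINDOW_S)
```

Here the second audio gets positions [0, 1] whether it runs alone or batched after a three-window audio, whereas flattened order would give it [3, 4] in the batch.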

The fp32 RoPE buffer restoration is there because vLLM's model dtype/device application can cast non-persistent buffers to bf16. In the HF construction path we compare against, inv_freq and position_angles are built from torch.float, so they remain fp32 under the tested bf16 load path. These buffers feed trigonometric functions, and bf16 precision there is enough to cause exact-output drift. Overriding _apply lets the module move to the right device and then recreates those RoPE buffers in fp32 on that device.
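The precision argument can be checked numerically without PyTorch. This sketch emulates bf16 storage by rounding a float32 to its upper 16 bits (round-to-nearest-even), then measures how far one RoPE angle moves. The head dimension (64), base (10000), and position (1000) are hypothetical values; the sketch illustrates the drift only and is not the `_apply` override itself.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even on the float32 bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    lsb = (bits >> 16) & 1
    bits = ((bits + 0x7FFF + lsb) & 0xFFFFFFFF) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def rope_angle_error(position: int, inv_freq: float) -> float:
    """Absolute rotary-angle error (radians) when inv_freq is stored in bf16 instead of fp32."""
    return abs(position * inv_freq - position * to_bf16(inv_freq))


inv_freq = 1.0 / (10000.0 ** (2 / 64))  # second frequency for a hypothetical head_dim of 64
position = 1000  # hypothetical position
angle_err = rope_angle_error(position, inv_freq)
cos_err = abs(math.cos(position * inv_freq) - math.cos(position * to_bf16(inv_freq)))
```

Because the angle is position * inv_freq, the error grows linearly with position, so later positions drift the most.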

The cast in _encode_audio_features fixes the ROCm failure where processor output is fp32 but the audio tower conv weights/bias are bf16. ROCm/PyTorch requires input and bias dtypes to match for that conv path. Casting at model ingress is the right boundary: HF processors naturally produce float features, while the vLLM model owns execution dtype/device.

Also, on ROCm, MusicFlamingo produced a wrong transcription, yielding an odd "four-on-the-floor" output. Direct HF generation matched the original fixture, while vLLM's URL audio path produced a different transcription. The root cause was audio resampling: HF/librosa uses soxr-style resampling, while vLLM was using the default parser resampler. Switching only MusicFlamingo's parser to audio_resample_method="soxr" makes the input features match HF more closely without changing the global default for other models.

The MusicFlamingo generation test no longer skips missing fixtures because missing committed fixtures should be a test error, not a skip. The small warmup in test_musicflamingo.py is there because MI300/ROCm showed a first-inference exact-output drift during decoder kernel compilation; after a one-token warmup, the steady-state deterministic output matches.

cc @eustlb @hmellor Lmk what you guys think.

mergify · 2026-05-21T22:18:56Z

Hi @AndreasKaratzas, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10


lashahub · 2026-05-22T20:20:20Z

Thanks for looping me in.

AF-Next should not be added as a separate vLLM architecture anymore. The current HF checkpoints use the existing MusicFlamingo architecture (model_type=musicflamingo, MusicFlamingoForConditionalGeneration).

I’ll wait for this PR to settle/land, then rebase #39011 on top and re-scope it to AF-Next checkpoint coverage through MusicFlamingo only. That should avoid duplicating the Flamingo processor/RoTE fixes here.

[ROCm][CI] Gate incompatible HF references on Transformers v5

009c271

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas marked this pull request as ready for review May 3, 2026 04:15

AndreasKaratzas requested review from DarkLight1337 and ywang96 as code owners May 3, 2026 04:15

claude Bot reviewed May 3, 2026


AndreasKaratzas added the rocm Related to AMD ROCm label May 3, 2026

github-project-automation Bot added this to AMD May 3, 2026

github-project-automation Bot moved this to Todo in AMD May 3, 2026

gemini-code-assist Bot reviewed May 3, 2026


Comment thread tests/models/registry.py

Comment thread tests/models/registry.py

[Bugfix] Fix transformers version reason gating logic

aa01552

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

This was referenced May 4, 2026

[CI Failure]: mi325_1: Language Models Test (Extended Generation) #41584

Open

[CI Failure]: mi355_1: Language Models Tests (Standard) #40645

Open

tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label May 4, 2026

AndreasKaratzas added 3 commits May 4, 2026 13:08

Merge remote-tracking branch 'origin/main' into akaratza_lang_mod_hf

8bbaf9b

Merge remote-tracking branch 'origin/main' into akaratza_lang_mod_hf

cddee18

[CI][Bugfix] Fix Flamingo processors with Transformers 5.5

dee7089

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

mergify Bot added the multi-modality Related to multi-modality (#4194) label May 5, 2026

AndreasKaratzas added 4 commits May 4, 2026 21:41

Merge remote-tracking branch 'origin/main' into akaratza_lang_mod_hf

fd065b0

[CI][Bugfix] Fix Flamingo processors with Transformers 5.5

27ce761

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Merge remote-tracking branch 'origin/main' into akaratza_lang_mod_hf

03e00e4

Merge remote-tracking branch 'origin/main' into akaratza_lang_mod_hf

eaad92c

This comment was marked as outdated.


eustlb reviewed May 20, 2026


AndreasKaratzas added 2 commits May 21, 2026 16:51

Merge remote-tracking branch 'origin/main' into akaratza_lang_mod_hf

0307fc6

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

[CI] Fix Flamingo3 HF processor alignment and MusicFlamingo audio pre…

37e7865

…processing Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas requested review from NickLucche and tjtanaa as code owners May 21, 2026 22:09

mergify Bot added the ci/build label May 21, 2026

AndreasKaratzas added 3 commits May 21, 2026 17:20

Merge remote-tracking branch 'origin/main' into akaratza_lang_mod_hf

94168b7

Merge remote-tracking branch 'origin/main' into akaratza_lang_mod_hf

1427df0

Align Flamingo3 processing with HF and fix audio limits

c319e8c

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas wants to merge 14 commits into vllm-project:main from ROCm:akaratza_lang_mod_hf

5 participants
