fix: skip qwen3_5_text checkpoint remap for nested VL language_model #45256

Open
zozo123 wants to merge 2 commits into huggingface:main from zozo123:fix/qwen3-5-save-pretrained-prefix


zozo123 commented Apr 5, 2026

Summary

When saving a Qwen3.5 VL model via save_pretrained, the revert_weight_conversion for qwen3_5_text replaces a leading model. segment. This wrongly matches keys that already start with model.language_model. on composite VL models, duplicating the language_model prefix in the saved safetensors keys.
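To make the failure mode concrete, here is a minimal sketch of how a leading-prefix remap can duplicate a segment. The pattern and replacement below are assumptions chosen for illustration; they are not copied from the actual qwen3_5_text conversion mapping, and `revert_key` is a hypothetical helper.

```python
import re

# Hypothetical revert rule (an assumption, not the real transformers mapping):
# rewrite a leading "model." segment to "model.language_model.".
REVERT_PATTERN = re.compile(r"^model\.")
REVERT_REPLACEMENT = "model.language_model."


def revert_key(key: str) -> str:
    """Apply the illustrative revert rule to a single state-dict key."""
    return REVERT_PATTERN.sub(REVERT_REPLACEMENT, key, count=1)


# Key as seen by a standalone text model: the rule does what it is meant to.
text_only_key = "model.layers.0.mlp.up_proj.weight"
# Key as seen from the composite VL model: the prefix is already present,
# but the rule still matches the leading "model." and inserts it again.
vl_key = "model.language_model.layers.0.mlp.up_proj.weight"

print(revert_key(text_only_key))
print(revert_key(vl_key))
```

Because the rule only anchors on the leading `model.` segment, it cannot distinguish a bare text-model key from one that is already nested under the VL model's `language_model` trunk.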

Fixes #45216

Changes

  • In get_model_conversion_mapping(), detect when a qwen3_5_text submodule is the nested model.language_model trunk in a VL model
  • Skip the text remap for that submodule to prevent prefix duplication during revert_weight_conversion
  • For qwen3_5_moe_text inside MoE VL models, apply only qwen2_moe conversions
  • Add regression test verifying save_pretrained does not produce keys with triple-nested language_model segments
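The selection logic described in the list above could look roughly like the sketch below. Everything here is hypothetical: `conversions_for_submodule`, its parameters, and the `"text_prefix_remap"` label are invented for illustration and do not reflect the real signature or internals of get_model_conversion_mapping().

```python
def conversions_for_submodule(
    model_type: str, module_path: str, parent_is_vl: bool
) -> list[str]:
    """Return labels of the conversion groups to apply (illustrative only).

    Assumption: the nested trunk is identified by its module path inside
    a composite VL model. The real detection may use different signals.
    """
    nested_trunk = parent_is_vl and module_path == "model.language_model"

    if model_type == "qwen3_5_text":
        # Skip the text remap entirely for the nested trunk.
        return [] if nested_trunk else ["text_prefix_remap"]

    if model_type == "qwen3_5_moe_text":
        # Inside a MoE VL model, keep only the qwen2_moe conversions.
        if nested_trunk:
            return ["qwen2_moe"]
        return ["qwen2_moe", "text_prefix_remap"]

    # Other model types are out of scope for this sketch.
    return []
```

The point of the sketch is that the decision depends on where the submodule sits in the parent model, not only on its model type.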

Testing

  • New test: test_save_pretrained_no_triple_nested_language_model_prefix in test_modeling_qwen3_5.py
  • Saves a Qwen3_5ForConditionalGeneration model and asserts no key contains language_model.language_model.language_model
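The core assertion of such a test can be sketched without loading a model. The helper below is hypothetical and operates on a plain list of key names; in the actual test the keys would come from the checkpoint written by save_pretrained.

```python
# The forbidden segment named in the test description above.
FORBIDDEN_SEGMENT = "language_model.language_model.language_model"


def find_triple_nested_keys(keys):
    """Return every key that contains the triple-nested segment."""
    return [key for key in keys if FORBIDDEN_SEGMENT in key]


# Example key names, invented for illustration.
good_keys = [
    "model.language_model.layers.0.mlp.up_proj.weight",
    "model.visual.blocks.0.attn.qkv.weight",
]
bad_keys = good_keys + [
    "model.language_model.language_model.language_model.layers.0.mlp.up_proj.weight",
]

assert find_triple_nested_keys(good_keys) == []
assert len(find_triple_nested_keys(bad_keys)) == 1
```

Collecting the offending keys rather than failing on the first match makes the assertion message more useful when the regression reappears.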

Built autonomously by islo.dev

…trunk

When saving a Qwen3.5 VL model via save_pretrained, the
revert_weight_conversion for qwen3_5_text replaces a leading "model."
segment. This wrongly matches keys that already start with
"model.language_model." on composite VL models, duplicating the
language_model prefix in the saved safetensors keys.

Skip the qwen3_5_text remap when the submodule is the nested
model.language_model trunk. For qwen3_5_moe_text inside MoE VL models,
apply only the qwen2_moe conversions.

Add a regression test verifying save_pretrained does not produce keys
with triple-nested language_model segments.

Fixes huggingface#45216

github-actions bot commented Apr 5, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_5


github-actions bot commented Apr 5, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45256&sha=ec19be

Development

Successfully merging this pull request may close these issues.

[Regression] Qwen3.5 saved checkpoint is not correct with save_pretrained API since version 5.4.0

2 participants