
add qwen35 rl config and fix mix bug #1640

Merged
hhaAndroid merged 7 commits into InternLM:main from hhaAndroid:add_qwen35_rl_config
Apr 8, 2026

Conversation

@hhaAndroid
Collaborator

No description provided.

@hhaAndroid
Collaborator Author

@claude review

@claude
Contributor

claude bot commented Mar 30, 2026

Claude:

Summary

This PR adds a Qwen3.5-VL 35B GRPO mixed-data RL config, a new "qwen3.5-vl-rl" chat template (with <think> tag for chain-of-thought), configurable template_name in Qwen3VLTokenizeFnConfig, and a bug fix in replay_buffer.py.

Issues

Critical

  • qwen3_vl_tokenize_fn.py:908 — Default template_name is "qwen3_vl" (underscore) but CHAT_TEMPLATE_MAP uses "qwen3-vl" (hyphen). This will cause a KeyError at runtime for anyone using the default.
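A minimal sketch of the failure mode described above (the registry and lookup names are assumed from this review, not taken from the actual xtuner source):

```python
# Hypothetical reconstruction of the mismatch flagged above; the real
# CHAT_TEMPLATE_MAP and lookup live in xtuner and may differ in detail.
CHAT_TEMPLATE_MAP = {
    "qwen3-vl": "<qwen3-vl template body>",        # registry keys use hyphens
    "qwen3-vl-rl": "<qwen3-vl-rl template body>",
}

def get_chat_template(template_name: str = "qwen3_vl") -> str:
    # The underscore default can never match the hyphenated registry key,
    # so any caller that relies on the default raises KeyError at runtime.
    return CHAT_TEMPLATE_MAP[template_name]

try:
    get_chat_template()
except KeyError as e:
    print(f"KeyError: {e}")  # → KeyError: 'qwen3_vl'
```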

Nit

  • Config file: open() is used without a context manager (the file handle is never closed), an import appears mid-file, one comment is in Chinese, and there are PEP 8 spacing issues.

Bug Fix Review

The replay_buffer.py fix is correct — multimodal_train_info must be assigned back to data regardless of whether pixel_values is present, otherwise mixed text+image data loses position IDs.
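The fix above can be sketched as a before/after pair. This is a hypothetical reconstruction: the field names (pixel_values, multimodal_train_info, position_id) follow the review text, not the actual xtuner source.

```python
# Hypothetical sketch of the replay_buffer.py change described above.

def merge_buggy(data: dict, multimodal_train_info: dict) -> dict:
    # Before the fix: the assignment sat inside the pixel_values branch,
    # so text-only samples in a mixed batch silently dropped position IDs.
    if "pixel_values" in multimodal_train_info:
        data["pixel_values"] = multimodal_train_info["pixel_values"]
        data["multimodal_train_info"] = multimodal_train_info
    return data

def merge_fixed(data: dict, multimodal_train_info: dict) -> dict:
    # After the fix: the assignment is dedented out of the branch and runs
    # unconditionally, so text-only samples keep their metadata too.
    if "pixel_values" in multimodal_train_info:
        data["pixel_values"] = multimodal_train_info["pixel_values"]
    data["multimodal_train_info"] = multimodal_train_info
    return data
```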

Verdict

REQUEST_CHANGES — the template_name default value mismatch is a runtime bug that needs fixing.

@hhaAndroid
Collaborator Author

@claude review

@claude
Contributor

claude bot commented Apr 7, 2026

Claude: Re-review of updated PR (3 commits).

Summary

This PR adds a Qwen3.5-VL 35B GRPO mixed-data RL config, a new "qwen3-vl-rl" chat template (with <think> tag for chain-of-thought), renames template_name → chat_template across tokenize functions, adds chat_template to BaseMLLMTokenizeFnConfig, and fixes a bug in replay_buffer.py where mixed text+image data lost multimodal metadata.

Issues

Warning

  • base_mllm_tokenize_fn.py:263 — chat_template is added as a required field (no default) to the base config. The two in-tree subclasses override it with defaults, so they're safe, but this is a breaking change for any out-of-tree subclass. Also, the Literal type enumerating all templates creates tight coupling with the template registry — a plain str validated at runtime may be more maintainable.
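The runtime-validation alternative suggested above could look like the following sketch. The config class and registry names are illustrative, not the actual xtuner API:

```python
# Hypothetical sketch: a plain str field validated against the template
# registry at construction time, instead of a Literal type that must
# enumerate every registered template.
from dataclasses import dataclass

CHAT_TEMPLATE_MAP = {"qwen3-vl": "...", "qwen3-vl-rl": "..."}

@dataclass
class TokenizeFnConfig:
    chat_template: str  # plain str, no Literal["qwen3-vl", ...] coupling

    def __post_init__(self) -> None:
        # Validate once, up front, with a readable error message.
        if self.chat_template not in CHAT_TEMPLATE_MAP:
            raise ValueError(
                f"Unknown chat_template {self.chat_template!r}; "
                f"expected one of {sorted(CHAT_TEMPLATE_MAP)}"
            )
```

With this shape, registering a new template only touches CHAT_TEMPLATE_MAP; the config class does not need to change.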

Nit

Bug Fix ✓

  • replay_buffer.py:280-283 — The fix is correct. Dedenting data["multimodal_train_info"] = multimodal_train_info ensures metadata (e.g., position_id) is preserved even when pixel_values is absent.

Previous Issue Resolved ✓

  • The template_name default value mismatch ("qwen3_vl" underscore vs "qwen3-vl" hyphen) flagged in the previous review has been fixed by the rename to chat_template with correct defaults.

Verdict

COMMENT — The previous critical issue is resolved. The remaining items are warnings and nits. The BaseMLLMTokenizeFnConfig coupling concern is worth considering but not blocking.

@hhaAndroid
Collaborator Author

@claude please approve it

Contributor

@claude

claude bot left a comment


All previous critical issues have been resolved. The template_name → chat_template rename with correct defaults fixes the runtime KeyError, and the replay_buffer bug fix is correct. Remaining items are minor nits. LGTM.

@claude
Contributor

claude bot commented Apr 7, 2026

Claude: Approved the PR as requested. All previously flagged critical issues (template_name default mismatch) have been resolved in the latest commits. The replay_buffer bug fix is correct, and the remaining items are minor nits only.

@hhaAndroid merged commit 8738624 into InternLM:main Apr 8, 2026
5 checks passed
RangiLyu pushed a commit to RangiLyu/xtuner that referenced this pull request Apr 10, 2026
* add qwen35 rl config and fix mix bug

* update

* update

* fix

* fix
