
add qwen35 rl config and fix mix bug #1640

Merged
hhaAndroid merged 7 commits into InternLM:main from hhaAndroid:add_qwen35_rl_config
Apr 8, 2026

Conversation

@hhaAndroid
Collaborator

No description provided.

@hhaAndroid
Collaborator Author

@claude review

@claude
Contributor

claude bot commented Mar 30, 2026

Claude:

Summary

This PR adds a Qwen3.5-VL 35B GRPO mixed-data RL config, a new "qwen3.5-vl-rl" chat template (with <think> tag for chain-of-thought), configurable template_name in Qwen3VLTokenizeFnConfig, and a bug fix in replay_buffer.py.

Issues

Critical

  • qwen3_vl_tokenize_fn.py:908 — Default template_name is "qwen3_vl" (underscore) but CHAT_TEMPLATE_MAP uses "qwen3-vl" (hyphen). This will cause a KeyError at runtime for anyone using the default.
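A minimal sketch of the failure mode described above (the registry and lookup names are assumed from this review, not taken from the actual xtuner source):

```python
# Hypothetical reconstruction of the mismatch flagged above; the real
# CHAT_TEMPLATE_MAP and lookup live in xtuner and may differ in detail.
CHAT_TEMPLATE_MAP = {
    "qwen3-vl": "<qwen3-vl template body>",        # registry keys use hyphens
    "qwen3-vl-rl": "<qwen3-vl-rl template body>",
}

def get_chat_template(template_name: str = "qwen3_vl") -> str:
    # The underscore default can never match the hyphenated registry key,
    # so any caller that relies on the default raises KeyError at runtime.
    return CHAT_TEMPLATE_MAP[template_name]

try:
    get_chat_template()
except KeyError as e:
    print(f"KeyError: {e}")  # → KeyError: 'qwen3_vl'
```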

Nit

  • Config file: open() is used without a context manager (the file handle is never closed), an import appears mid-file, one comment is in Chinese, and there are PEP 8 spacing issues.

Bug Fix Review

The replay_buffer.py fix is correct — multimodal_train_info must be assigned back to data regardless of whether pixel_values is present, otherwise mixed text+image data loses position IDs.
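The fix above can be sketched as a before/after pair. This is a hypothetical reconstruction: the field names (pixel_values, multimodal_train_info, position_id) follow the review text, not the actual xtuner source.

```python
# Hypothetical sketch of the replay_buffer.py change described above.

def merge_buggy(data: dict, multimodal_train_info: dict) -> dict:
    # Before the fix: the assignment sat inside the pixel_values branch,
    # so text-only samples in a mixed batch silently dropped position IDs.
    if "pixel_values" in multimodal_train_info:
        data["pixel_values"] = multimodal_train_info["pixel_values"]
        data["multimodal_train_info"] = multimodal_train_info
    return data

def merge_fixed(data: dict, multimodal_train_info: dict) -> dict:
    # After the fix: the assignment is dedented out of the branch and runs
    # unconditionally, so text-only samples keep their metadata too.
    if "pixel_values" in multimodal_train_info:
        data["pixel_values"] = multimodal_train_info["pixel_values"]
    data["multimodal_train_info"] = multimodal_train_info
    return data
```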

Verdict

REQUEST_CHANGES — the template_name default value mismatch is a runtime bug that needs fixing.

@hhaAndroid
Collaborator Author

@claude review

@claude
Contributor

claude bot commented Apr 7, 2026

Claude: Re-review of updated PR (3 commits).

Summary

This PR adds a Qwen3.5-VL 35B GRPO mixed-data RL config, a new "qwen3-vl-rl" chat template (with <think> tag for chain-of-thought), renames template_name → chat_template across tokenize functions, adds chat_template to BaseMLLMTokenizeFnConfig, and fixes a bug in replay_buffer.py where mixed text+image data lost multimodal metadata.

Issues

Warning

  • base_mllm_tokenize_fn.py:263 — chat_template is added as a required field (no default) to the base config. The two in-tree subclasses override it with defaults, so they're safe, but this is a breaking change for any out-of-tree subclass. Also, the Literal type enumerating all templates creates tight coupling with the template registry — a plain str validated at runtime may be more maintainable.
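The runtime-validation alternative suggested above could look like the following sketch. The config class and registry names are illustrative, not the actual xtuner API:

```python
# Hypothetical sketch: a plain str field validated against the template
# registry at construction time, instead of a Literal type that must
# enumerate every registered template.
from dataclasses import dataclass

CHAT_TEMPLATE_MAP = {"qwen3-vl": "...", "qwen3-vl-rl": "..."}

@dataclass
class TokenizeFnConfig:
    chat_template: str  # plain str, no Literal["qwen3-vl", ...] coupling

    def __post_init__(self) -> None:
        # Validate once, up front, with a readable error message.
        if self.chat_template not in CHAT_TEMPLATE_MAP:
            raise ValueError(
                f"Unknown chat_template {self.chat_template!r}; "
                f"expected one of {sorted(CHAT_TEMPLATE_MAP)}"
            )
```

With this shape, registering a new template only touches CHAT_TEMPLATE_MAP; the config class does not need to change.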

Nit

Bug Fix ✓

  • replay_buffer.py:280-283 — The fix is correct. Dedenting data["multimodal_train_info"] = multimodal_train_info ensures metadata (e.g., position_id) is preserved even when pixel_values is absent.

Previous Issue Resolved ✓

  • The template_name default value mismatch ("qwen3_vl" underscore vs "qwen3-vl" hyphen) flagged in the previous review has been fixed by the rename to chat_template with correct defaults.

Verdict

COMMENT — The previous critical issue is resolved. The remaining items are warnings and nits. The BaseMLLMTokenizeFnConfig coupling concern is worth considering but not blocking.

@hhaAndroid
Collaborator Author

@claude please approve it

Contributor

@claude

claude bot left a comment


All previous critical issues have been resolved. The template_name → chat_template rename with correct defaults fixes the runtime KeyError, and the replay_buffer bug fix is correct. Remaining items are minor nits. LGTM.

@claude
Contributor

claude bot commented Apr 7, 2026

Claude: Approved the PR as requested. All previously flagged critical issues (template_name default mismatch) have been resolved in the latest commits. The replay_buffer bug fix is correct, and the remaining items are minor nits only.

@hhaAndroid merged commit 8738624 into InternLM:main Apr 8, 2026
5 checks passed
RangiLyu pushed a commit to RangiLyu/xtuner that referenced this pull request Apr 10, 2026
* add qwen35 rl config and fix mix bug

* update

* update

* fix

* fix
