[fix] remove dead code, fix typos, and eliminate unnecessary GPU-CPU syncs by choiszt · Pull Request #150 · EvolvingLMMs-Lab/lmms-engine

choiszt · 2026-04-01T04:10:57Z

Motivation

During code review, identified several issues that hurt training performance and code quality:

Unnecessary .item() calls causing GPU→CPU synchronization barriers in the forward pass
Dead code (unused variables, duplicate function definitions, redundant assignments)
A typo (torch.torch.int32) and a suboptimal comparison pattern

Modifications

Eliminate unnecessary GPU-CPU syncs

qwen2_5_vl_liger.py: Removed 4 unused variables (tokens_count, n_image_tokens, n_video_tokens, visual_tokens)
that each called .item(), causing GPU-CPU sync every forward pass
qwen2_5_vl_ops.py: Removed 3 unused variables (bsz, q_len, kv_seq_len) in attn_forward, where
torch.max(position_ids).item() also triggered GPU-CPU sync

Remove dead code

qwen3_vl_liger.py: Removed redundant hidden_states = outputs[0] (already assigned on the line above)
sequence_packing_utils.py: Removed duplicate _get_unpad_data function definition (identical copy)

Fix typo and optimize comparison

sequence_packing_utils.py: Fixed torch.torch.int32 → torch.int32 in _unpad_input
qwen3_vl_ops.py: Changed layer_idx in range(len(deepstack_visual_embeds)) → layer_idx < len(deepstack_visual_embeds)
(logically equivalent, avoids creating a range object each iteration)

Verification

Verified with actual GPU training on 2x A100 (FSDP2 + Liger + rmpad):

Model	Size	Steps	Result
Qwen2.5-VL-3B	3.75B	10	✅ Training completed without errors
Qwen3-VL-4B	4.44B	10	✅ Training completed without errors

## Checklist

Follow commit message convention
Format code with black and isort
Verified with GPU training (Qwen2.5-VL + Qwen3-VL)
No functional changes — only dead code removal, typo fix, and perf improvement

…yncs

Copilot

Pull request overview

This PR improves training performance and code quality by removing dead code and avoiding unnecessary GPU→CPU synchronization points, while also fixing a dtype typo and simplifying a small comparison.

Changes:

Removed unused variables (and associated .item() calls) that introduced GPU→CPU syncs in Qwen2.5-VL and Qwen2.5-VL attention/forward paths.
Removed redundant/dead assignments and a duplicate helper function definition.
Fixed an invalid dtype reference (torch.torch.int32 → torch.int32) and simplified a layer-index bounds check.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
`src/lmms_engine/models/sequence_packing_utils.py`	Removes duplicate `_get_unpad_data` definition and fixes `torch.int32` dtype usage in `_unpad_input`.
`src/lmms_engine/models/qwen3_vl/qwen3_vl_ops.py`	Simplifies deepstack layer gating condition to avoid `range(...)` membership check.
`src/lmms_engine/models/qwen3_vl/qwen3_vl_liger.py`	Removes a redundant `hidden_states = outputs[0]` assignment.
`src/lmms_engine/models/qwen2_5_vl/qwen2_5_vl_ops.py`	Removes unused variables that computed `torch.max(position_ids).item()` (sync point) inside `attn_forward`.
`src/lmms_engine/models/qwen2_5_vl/qwen2_5_vl_liger.py`	Removes unused token counting variables that called `.item()` in `lce_forward`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

fix: remove dead code, fix typos, and eliminate unnecessary GPU-CPU s…

61b8b3a

…yncs

choiszt requested review from Copilot and kcz358 April 1, 2026 04:12

Copilot started reviewing on behalf of choiszt April 1, 2026 04:12 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

kcz358 approved these changes Apr 1, 2026

View reviewed changes

kcz358 merged commit c10dbb9 into main Apr 1, 2026
7 checks passed

kcz358 deleted the dev/shuai branch April 1, 2026 06:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix] remove dead code, fix typos, and eliminate unnecessary GPU-CPU syncs#150

[fix] remove dead code, fix typos, and eliminate unnecessary GPU-CPU syncs#150
kcz358 merged 1 commit intomainfrom
dev/shuai

choiszt commented Apr 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

choiszt commented Apr 1, 2026

Motivation

Modifications

Eliminate unnecessary GPU-CPU syncs

Remove dead code

Fix typo and optimize comparison

Verification

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants