[fix] remove dead code, fix typos, and eliminate unnecessary GPU-CPU syncs#150

Merged
kcz358 merged 1 commit into main from dev/shuai
Apr 1, 2026
Conversation

@choiszt
Contributor

@choiszt choiszt commented Apr 1, 2026

Motivation

During code review, I identified several issues that hurt training performance and code quality:

  • Unnecessary .item() calls causing GPU→CPU synchronization barriers in the forward pass
  • Dead code (unused variables, duplicate function definitions, redundant assignments)
  • A typo (torch.torch.int32) and a suboptimal comparison pattern

Modifications

Eliminate unnecessary GPU-CPU syncs

  • qwen2_5_vl_liger.py: Removed 4 unused variables (tokens_count, n_image_tokens, n_video_tokens, visual_tokens), each of which called .item() and forced a GPU-CPU sync on every forward pass
  • qwen2_5_vl_ops.py: Removed 3 unused variables (bsz, q_len, kv_seq_len) in attn_forward, where torch.max(position_ids).item() also triggered a GPU-CPU sync
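As a sketch of why these removals matter (variable names echo the PR description, not the repo's exact code): `.item()` copies a scalar from device to host, which forces the CPU to block until the GPU finishes all queued work. When the value is never used, the sync is pure overhead.

```python
import torch

# Illustrative only; on a GPU run position_ids would be a CUDA tensor.
position_ids = torch.arange(8)

# Before: the result was never used downstream, but .item() still
# triggered a device-to-host copy and a full synchronization.
kv_seq_len = torch.max(position_ids).item()  # GPU -> CPU sync point

# After: the line is simply deleted. If a later step did need the max,
# keeping it as a tensor avoids the host round-trip entirely.
max_pos = torch.max(position_ids)  # stays on device, no sync
```

The general rule: keep scalars as 0-d tensors until a host-side value (logging, Python control flow) is genuinely required.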

Remove dead code

  • qwen3_vl_liger.py: Removed redundant hidden_states = outputs[0] (already assigned on the line above)
  • sequence_packing_utils.py: Removed duplicate _get_unpad_data function definition (identical copy)

Fix typo and optimize comparison

  • sequence_packing_utils.py: Fixed torch.torch.int32 → torch.int32 in _unpad_input
  • qwen3_vl_ops.py: Changed layer_idx in range(len(deepstack_visual_embeds)) → layer_idx < len(deepstack_visual_embeds)
    (logically equivalent, avoids creating a range object each iteration)
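The comparison change can be sketched as follows (a hypothetical stand-in; `deepstack_visual_embeds` here is a plain list, not the real container from qwen3_vl_ops.py):

```python
deepstack_visual_embeds = ["emb0", "emb1", "emb2"]

def gate_before(layer_idx):
    # Old form: builds a range object on every call and tests membership.
    return layer_idx in range(len(deepstack_visual_embeds))

def gate_after(layer_idx):
    # New form: a single integer comparison, equivalent for the
    # non-negative layer indices produced by the decoder loop.
    return layer_idx < len(deepstack_visual_embeds)
```

The two forms agree for all non-negative indices; the simpler comparison also reads more directly as a bounds check.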

Verification

Verified with actual GPU training on 2x A100 (FSDP2 + Liger + rmpad):

| Model | Size | Steps | Result |
| --- | --- | --- | --- |
| Qwen2.5-VL-3B | 3.75B | 10 | ✅ Training completed without errors |
| Qwen3-VL-4B | 4.44B | 10 | ✅ Training completed without errors |

Checklist

  • Follow commit message convention
  • Format code with black and isort
  • Verified with GPU training (Qwen2.5-VL + Qwen3-VL)
  • No functional changes — only dead code removal, typo fix, and perf improvement

Contributor

Copilot AI left a comment

Pull request overview

This PR improves training performance and code quality by removing dead code and avoiding unnecessary GPU→CPU synchronization points, while also fixing a dtype typo and simplifying a small comparison.

Changes:

  • Removed unused variables (and associated .item() calls) that introduced GPU→CPU syncs in the Qwen2.5-VL attention and forward paths.
  • Removed redundant/dead assignments and a duplicate helper function definition.
  • Fixed an invalid dtype reference (torch.torch.int32 → torch.int32) and simplified a layer-index bounds check.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/lmms_engine/models/sequence_packing_utils.py Removes duplicate _get_unpad_data definition and fixes torch.int32 dtype usage in _unpad_input.
src/lmms_engine/models/qwen3_vl/qwen3_vl_ops.py Simplifies deepstack layer gating condition to avoid range(...) membership check.
src/lmms_engine/models/qwen3_vl/qwen3_vl_liger.py Removes a redundant hidden_states = outputs[0] assignment.
src/lmms_engine/models/qwen2_5_vl/qwen2_5_vl_ops.py Removes unused variables that computed torch.max(position_ids).item() (sync point) inside attn_forward.
src/lmms_engine/models/qwen2_5_vl/qwen2_5_vl_liger.py Removes unused token counting variables that called .item() in lce_forward.

@kcz358 kcz358 merged commit c10dbb9 into main Apr 1, 2026
7 checks passed
@kcz358 kcz358 deleted the dev/shuai branch April 1, 2026 06:30
