Commit 749eeef

Authored by kcz358, github-actions[bot], KemingWu, charlesswu, and mwxely

[WIP] feat: Transformers 5.0 compatibility (#142)
* feat(models): add transformers 5.0 compatibility

  Conditionally import models incompatible with transformers >= 5.0:
  - dream_dllm, qwen3_dllm, llada_dllm require transformers < 5.0
  - llava_onevision1_5 requires transformers < 5.0
  - Dynamically update __all__ based on the transformers version
  - Prevents ImportError when using transformers 5.0+

* fix(train): add group_by_length for backward compatibility

  Add the group_by_length parameter to TrainingArguments to maintain compatibility with existing training configurations.

* feat(deps): allow transformers >= 4.57.1

  Update the transformers dependency from an exact pin to a minimum version to support transformers 5.0+ while maintaining backward compatibility.

* style: auto-fix lint (black + isort)

* refactor(processor): replace additional_special_tokens with all_special_tokens

  Use all_special_tokens for transformers >= 5.0 compatibility while maintaining backward compatibility with transformers < 5.0. Changes:
  - Add a special_tokens property to all processor classes
  - Use all_special_tokens if available (transformers >= 5.0)
  - Fall back to additional_special_tokens (transformers < 5.0)
  - Add <|im_start|> and <|im_end|> tokens to the special_tokens list
  - Cache special_tokens as an instance attribute for performance

  Affected processors:
  - AeroDataProcessor (base class)
  - BaseQwen2_5_DataProcessor (inherits from AeroDataProcessor)
  - Qwen2VLDataProcessor
  - Qwen2DataProcessor
  - LLaVADataProcessor
  - LLaVAVideoDataProcessor (inherits from LLaVADataProcessor)
  - NanovlmDataProcessor
  - Qwen3_VLDataProcessor (inherits from BaseQwen2_5_DataProcessor)

* style: auto-fix lint (black + isort)

* refactor(processor): unify apply_chat_template usage

  Use processor.apply_chat_template with tokenize=True consistently across all processors instead of mixing it with processor.tokenizer calls.
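The special_tokens fallback described above can be sketched as a small helper. This is a hypothetical mirror of the behavior the commit describes, not the actual `lmms_engine` implementation: prefer `all_special_tokens` when the tokenizer exposes it, fall back to `additional_special_tokens`, and append the chat-marker tokens.

```python
def get_special_tokens(tokenizer, extra_tokens=None):
    """Collect special tokens in a transformers-version-agnostic way (sketch)."""
    if hasattr(tokenizer, "all_special_tokens"):
        tokens = list(tokenizer.all_special_tokens)  # transformers >= 5.0 path
    else:
        tokens = list(tokenizer.additional_special_tokens)  # transformers < 5.0 path
    # Append chat markers such as <|im_start|> / <|im_end|> without duplicating
    for tok in extra_tokens or []:
        if tok not in tokens:
            tokens.append(tok)
    return tokens
```

Caching the result on the instance (as the commit does via a `special_tokens` property) avoids recomputing the list on every sample.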
  Changes:
  - aero_processor: use processor.apply_chat_template(tokenize=True)[0]
  - base_qwen2_5_processor: use processor.apply_chat_template(tokenize=True)[0]
  - qwen2_vl_processor: use processor.apply_chat_template(tokenize=True)
  - qwen3_vl_processor: use processor.apply_chat_template(tokenize=True)[0]

  This ensures all processors return token IDs directly during data preparation, improving consistency and reducing confusion.

* feat(models): add common_ops for transformer-agnostic rope index

  Extract the rope index calculation functions into common_ops/rope.py to ensure consistent behavior across transformers versions. Changes:
  - Add common_ops/rope.py with qwen2_5_vl_rope_index and qwen3_vl_get_rope_index
  - Update qwen2_5_vl_ops.py to use qwen2_5_vl_rope_index
  - Update qwen3_vl_ops.py to use qwen3_vl_get_rope_index
  - Update qwen3_vl_moe_ops.py to use qwen3_vl_get_rope_index

  This keeps rope index calculations stable even when transformers internal implementations change.

* fix(utils): add B200/B300 GPU FLOPS support

  Add NVIDIA B200/B300 GPU FLOPS (2.25e15) to get_device_flops() to fix the MFU calculation returning 0 on B200 GPUs. Previously, unknown GPU types returned inf FLOPS, causing MFU to always be 0.
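The FLOPS-lookup fix above can be sketched as a name-based table. The B200/B300 value comes from the commit message; the other entries and the function shape are illustrative assumptions, not the actual `get_device_flops()` implementation:

```python
def get_device_flops(device_name: str) -> float:
    """Peak FLOPS per GPU family (sketch). Unknown devices return inf,
    which makes MFU = achieved / peak degrade to 0 rather than crash."""
    peak_flops = {
        "B200": 2.25e15,  # value from the commit message
        "B300": 2.25e15,  # value from the commit message
        "H100": 9.89e14,  # illustrative BF16 dense figure
        "A100": 3.12e14,  # illustrative BF16 dense figure
    }
    for key, flops in peak_flops.items():
        if key in device_name:  # substring match on e.g. "NVIDIA B200"
            return flops
    return float("inf")
```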
* Lint

* fix(models): qwen2_5_vl transformers 5.0 compatibility
  - Fix the vision_model variable reference in the liger kernel patch
  - Support nested text_config in lce_forward
  - Handle rope_scaling/rope_parameters for transformers 5.0+
  - Add qwen2_5_vl to the FlopsCounter model type mapping

* refactor(processor): use DataUtilities.apply_chat_template for transformers 5.0 compatibility
  - Add an apply_chat_template utility method to DataUtilities
  - Handle dict-like return values (BatchEncoding) with a use_key param
  - Handle nested list wrapping from some processors
  - Update all processors to use the unified method

* feat(launch): add filter_training_args for transformers 5.0 compatibility

  Filter unsupported TrainingArguments parameters by inspecting the transformers.TrainingArguments.__init__ signature, avoiding errors from deprecated or removed parameters in newer versions.

* fix(models): add parse_visual_output for transformers 5.0 compatibility

  Visual model methods (get_image_features, get_video_features, visual()) may return tuples OR dataclass objects (BaseModelOutputWithPooling, BaseModelOutputWithDeepstackFeatures) in transformers 5.0+. Add parse_visual_output() to transparently handle both return types.
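The filter_training_args approach mentioned above (drop config keys that the installed TrainingArguments no longer accepts, by inspecting its `__init__` signature) can be sketched as follows. The function body is an assumption; only the technique is taken from the commit message:

```python
import inspect

def filter_training_args(args_cls, config: dict) -> dict:
    """Keep only the keys that args_cls.__init__ actually accepts (sketch)."""
    accepted = set(inspect.signature(args_cls.__init__).parameters)
    accepted.discard("self")
    return {k: v for k, v in config.items() if k in accepted}
```

Because `transformers.TrainingArguments` is a dataclass, `inspect.signature` on its `__init__` lists every accepted field, so parameters removed in a newer release are silently dropped instead of raising `TypeError`.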
* [feat] Support Qwen3.5 Training (#143)
  * [feat] Support Qwen3_5 Training
  * style: auto-fix lint (black + isort)
  * [feat] Support Qwen3.5 Training
  * Optimize the Qwen3.5 dataset processing logic
  * Leave the flop function empty

  Co-authored-by: charlesswu <charlesswu@tencent.com>
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix(processor): remove duplicate special_tokens property in qwen2_vl_processor

* fix(models): remove duplicate .to() calls in qwen2_5_omni_liger

* fix(models): define input_ids_rmpad in the inputs_embeds branch to avoid a NameError

* refactor(models): extract parse_visual_output to common_ops/visual.py

* refactor(processor): extract the special_tokens logic to DataUtilities.get_special_tokens

* style: auto-fix lint (black + isort)

* docs: add Transformers 5.0 migration guide

  Add a comprehensive migration guide for transformers 5.0 compatibility, including a compatibility matrix, installation instructions, and troubleshooting for Qwen3.5 (requires >= 5.3.0) and legacy models (requires < 5.0.0).

---

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: wukeming <108406625+KemingWu@users.noreply.github.com>
Co-authored-by: charlesswu <charlesswu@tencent.com>
Co-authored-by: mwxely <yang0756@e.ntu.edu.sg>
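The parse_visual_output() idea referenced in the commit message can be sketched like this. Visual towers may return a plain tuple (older transformers) or an output dataclass such as BaseModelOutputWithPooling (transformers 5.0+); the field name used here is an assumption for illustration, not the actual `lmms_engine` code:

```python
def parse_visual_output(output):
    """Return the feature tensor regardless of the container type (sketch)."""
    if isinstance(output, tuple):
        return output[0]                 # tuple return: features come first
    if hasattr(output, "last_hidden_state"):
        return output.last_hidden_state  # dataclass-style model output
    return output                        # already a bare tensor
```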
1 parent 5ff50c0 commit 749eeef

31 files changed

Lines changed: 1359 additions & 73 deletions

docs/index.rst

Lines changed: 6 additions & 0 deletions

```diff
@@ -62,6 +62,12 @@ Welcome to the LMMs Engine documentation! LMMs Engine is a flexible and extensib
    models/qwen3_moe
    models/qwen3_omni_moe
 
+.. toctree::
+   :maxdepth: 1
+   :caption: Troubleshooting
+
+   troubleshoot/index
+
 Indices and tables
 ==================
```
docs/troubleshoot/index.rst

Lines changed: 9 additions & 0 deletions

```diff
@@ -0,0 +1,9 @@
+Troubleshooting
+===============
+
+Common issues and solutions for LMMs Engine.
+
+.. toctree::
+   :maxdepth: 2
+
+   transformers_5_migration
```
docs/troubleshoot/transformers_5_migration.md

Lines changed: 129 additions & 0 deletions
# Transformers 5.0 Migration Guide

This guide helps you migrate to transformers 5.0 while maintaining backward compatibility with older models.

## Overview

LMMs Engine now supports `transformers >= 5.0` while maintaining backward compatibility with `transformers 4.x`. This enables training with the latest models like Qwen3.5 while preserving support for existing models.

## Compatibility Matrix

| Model Family | transformers < 5.0 | transformers >= 5.0 | Minimum Version |
|-------------|-------------------|---------------------|-----------------|
| Qwen2.5-VL | ✅ | ✅ | - |
| Qwen3-VL | ✅ | ✅ | - |
| Qwen3 | ✅ | ✅ | - |
| **Qwen3.5** | ❌ | ✅ | **>= 5.3.0** |
| LLaVA-OneVision1.5 | ✅ | ❌ | < 5.0.0 |
| DLLM models (DreamDLLM, Qwen3DLLM, LLaDADLLM) | ✅ | ❌ | < 5.0.0 |
## Installation

### For Qwen3.5 Training (New Feature)

Qwen3.5 requires transformers 5.3.0 or higher:

```bash
pip install "transformers>=5.3.0"
```

Or with uv:

```bash
uv pip install "transformers>=5.3.0"
```

### For Legacy Models (LLaVA-OneVision1.5, DLLM)

If you need to use LLaVA-OneVision1.5 or DLLM models, install transformers 4.x:

```bash
pip install "transformers<5.0.0"
```

Or with uv:

```bash
uv pip install "transformers<5.0.0"
```
## Verified Compatibilities

The following models have been tested and verified:

### Tested with transformers >= 5.0

- **Qwen2.5-VL** - Fully compatible
- **Qwen3-VL** - Fully compatible
- **Qwen3** - Fully compatible

### Tested with transformers < 5.0

- **Qwen2.5-VL** - Fully compatible
- **Qwen3-VL** - Fully compatible
- **Qwen3** - Fully compatible
- **LLaVA-OneVision1.5** - Only compatible with < 5.0
- **DLLM models** - Only compatible with < 5.0
## How It Works

LMMs Engine automatically detects your transformers version and:

1. **With transformers >= 5.0**: Loads Qwen3.5 and all compatible models. Legacy models (LLaVA-OneVision1.5, DLLM) are excluded from imports.

2. **With transformers < 5.0**: Loads all legacy models. Qwen3.5 is not available.

The version check is performed at import time using `is_transformers_version_greater_or_equal_to()`.
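A plausible implementation of such a version gate is shown below. The real helper lives in `lmms_engine.utils.import_utils`; this sketch is an assumption, including the optional `installed` parameter (added here for testability) and the simplification that versions are plain `X.Y.Z` strings:

```python
from importlib.metadata import PackageNotFoundError, version

def is_transformers_version_greater_or_equal_to(target, installed=None):
    """Sketch of an import-time version gate. `installed` overrides the
    detected version (an assumption made here to keep the sketch testable)."""
    if installed is None:
        try:
            installed = version("transformers")
        except PackageNotFoundError:
            return False
    # Compare numerically, assuming plain dotted release strings like "4.57.1"
    parse = lambda v: tuple(int(p) for p in v.split(".")[:3])
    return parse(installed) >= parse(target)
```

Comparing parsed tuples rather than raw strings matters: string comparison would rank "4.57.1" above "5.0.0" because "5" < "57" lexicographically in the second component.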
## Troubleshooting

### Error: "Module not found" for Qwen3.5

**Symptom**: Trying to use Qwen3.5 but getting import errors.

**Solution**: Qwen3.5 requires transformers >= 5.3.0. Install the correct version:

```bash
pip install "transformers>=5.3.0"
```

### Error: "Module not found" for LLaVA-OneVision1.5 or DLLM

**Symptom**: Trying to use LLaVA-OneVision1.5 or DLLM models but they're not available.

**Solution**: These models are incompatible with transformers >= 5.0. Downgrade to transformers 4.x:

```bash
pip install "transformers<5.0.0"
```

### Error: ImportError when importing models

**Symptom**: `ImportError` or `ModuleNotFoundError` when importing specific models.

**Solution**: Check your transformers version and consult the compatibility matrix above. Ensure you're using the correct transformers version for your target model.
## Implementation Details

The compatibility is implemented through conditional imports in `src/lmms_engine/models/__init__.py`:

```python
from lmms_engine.utils.import_utils import is_transformers_version_greater_or_equal_to

is_transformers_5 = is_transformers_version_greater_or_equal_to("5.0.0")

# Models that work with both versions are always imported
from .qwen2_5_vl import apply_liger_kernel_to_qwen2_5_vl
from .qwen3_vl import apply_liger_kernel_to_qwen3_vl
from .qwen3 import apply_liger_kernel_to_qwen3

# Models only compatible with transformers < 5.0
if not is_transformers_5:
    from .llava_onevision1_5 import LLaVAOneVision1_5_ForConditionalGeneration
    from .dream_dllm import DreamDLLMForMaskedLM
    # ... other legacy models
```
## Related Resources

- [Qwen-VL Training Guide](../models/qwenvl.md)
- [Data Preparation Guide](../user_guide/data_prep.md)
- [Training Configuration](../getting_started/train.md)
Lines changed: 166 additions & 0 deletions

```yaml
trainer_type: fsdp2_trainer
dataset_config:
  extra_kwargs: {}
  dataset_type: qwen3_vl_iterable
  dataset_format: yaml
  processor_config:
    processor_name: Qwen/Qwen3-VL-8B-Instruct
    processor_type: qwen3_vl
  dataset_path: data/video/debug.yaml
  datasets: null
  shuffle: true
  eval_dataset_path: null
  object_storage: none
  bucket_name: null
  packing: false
  packing_strategy: first_fit
  packing_length: 51200
  filter_overlong: true
  filter_overlong_workers: 8
  max_length: null
  video_sampling_strategy: fps
  video_max_pixels: 50176
  video_max_frames: 512
  frame_num: 64
  fps: 1
  video_backend: qwen_vl_utils
trainer_args:
  output_dir: ./output/qwen3_5_training
  do_train: false
  do_eval: false
  do_predict: false
  eval_strategy: 'no'
  prediction_loss_only: false
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 1
  eval_accumulation_steps: null
  eval_delay: 0
  torch_empty_cache_steps: null
  learning_rate: 0.0002
  weight_decay: 0.0
  adam_beta1: 0.9
  adam_beta2: 0.999
  adam_epsilon: 1.0e-08
  max_grad_norm: 1.0
  num_train_epochs: 1
  max_steps: 1000
  lr_scheduler_type: cosine
  lr_scheduler_kwargs: {}
  warmup_ratio: 0.1
  warmup_steps: 0
  log_level: passive
  log_level_replica: warning
  log_on_each_node: true
  logging_dir: ./output/qwen3_5_training/runs
  logging_strategy: steps
  logging_first_step: false
  logging_steps: 1
  logging_nan_inf_filter: true
  save_strategy: steps
  save_steps: 1000
  save_total_limit: 1
  save_on_each_node: false
  save_only_model: false
  restore_callback_states_from_checkpoint: false
  use_cpu: false
  seed: 42
  data_seed: null
  bf16: true
  fp16: false
  bf16_full_eval: false
  fp16_full_eval: false
  tf32: null
  local_rank: 0
  ddp_backend: null
  debug: []
  dataloader_drop_last: false
  eval_steps: null
  dataloader_num_workers: 0
  dataloader_prefetch_factor: null
  run_name: qwen3_5_debug
  disable_tqdm: false
  remove_unused_columns: true
  label_names: null
  load_best_model_at_end: false
  metric_for_best_model: null
  greater_is_better: null
  ignore_data_skip: false
  fsdp: []
  fsdp_config:
    transformer_layer_cls_to_wrap:
    - Qwen3_5DecoderLayer
    reshard_after_forward: false
    min_num_params: 0
    xla: false
    xla_fsdp_v2: false
    xla_fsdp_grad_ckpt: false
  accelerator_config:
    split_batches: false
    dispatch_batches: null
    even_batches: true
    use_seedable_sampler: true
    non_blocking: false
    gradient_accumulation_kwargs: null
  parallelism_config: null
  deepspeed: null
  label_smoothing_factor: 0.0
  optim: adamw_torch_fused
  optim_args: null
  length_column_name: length
  report_to: []
  project: huggingface
  trackio_space_id: trackio
  ddp_find_unused_parameters: null
  ddp_bucket_cap_mb: null
  ddp_broadcast_buffers: null
  dataloader_pin_memory: true
  dataloader_persistent_workers: false
  skip_memory_metrics: true
  push_to_hub: false
  resume_from_checkpoint: null
  hub_model_id: null
  hub_strategy: every_save
  hub_token: <HUB_TOKEN>
  hub_private_repo: null
  hub_always_push: false
  hub_revision: null
  gradient_checkpointing: true
  gradient_checkpointing_kwargs: null
  include_for_metrics: []
  eval_do_concat_batches: true
  auto_find_batch_size: false
  full_determinism: false
  ddp_timeout: 1800
  torch_compile: false
  torch_compile_backend: null
  torch_compile_mode: null
  include_num_input_tokens_seen: 'no'
  neftune_noise_alpha: null
  optim_target_modules: null
  batch_eval_metrics: false
  eval_on_start: false
  use_liger_kernel: true
  liger_kernel_config: null
  eval_use_gather_object: false
  average_tokens_across_devices: true
  use_muon: false
  freeze_modules: null
  use_rmpad: true
  fsdp2: true
  sp_ulysses_degree: 1
  reduce_dtype: bfloat16
  output_dtype: bfloat16
  print_batch_input_steps: 5
  enable_profiler: false
  profiler_config:
    start_step: 1
    end_step: 3
model_config:
  extra_kwargs: {}
  load_from_pretrained_path: Qwen/Qwen3.5-VL-8B-Instruct
  load_from_config: null
  attn_implementation: flash_attention_2
  overwrite_config: null
  monkey_patch_kwargs: null
extra_kwargs: null
```

pyproject.toml

Lines changed: 1 addition & 1 deletion

```diff
@@ -16,7 +16,7 @@ license = { text = "Apache-2.0" }
 dependencies = [
     "datasets",
     "hf_transfer",
-    "transformers==4.57.1",
+    "transformers>=4.57.1",
     "accelerate",
     "pillow",
     "peft",
```

src/lmms_engine/datasets/processor/aero_processor.py

Lines changed: 14 additions & 6 deletions

```diff
@@ -5,6 +5,7 @@
 from PIL import Image
 
 from lmms_engine.mapping_func import register_processor
+from lmms_engine.utils import DataUtilities
 
 from ...models.aero.processing_aero import AeroProcessor, AeroProcessorKwargs
 from .config import ProcessorConfig
@@ -19,6 +20,14 @@ def build(self):
         self.processor = self._build_processor()
         self.processor.chat_template = self.chat_template_no_system
 
+    @property
+    def special_tokens(self):
+        if not hasattr(self, "_special_tokens"):
+            self._special_tokens = DataUtilities.get_special_tokens(
+                self.processor.tokenizer, extra_tokens=["<|im_start|>", "<|im_end|>"]
+            )
+        return self._special_tokens
+
     def _build_processor(self):
         processor = AeroProcessor.from_pretrained(self.config.processor_name)
         return processor
@@ -88,9 +97,7 @@ def get_qwen_template_labels(
         system_message: str = "You are a helpful assistant",
         add_system_prompt: bool = True,
     ):
-        special_tokens = self.processor.tokenizer.additional_special_tokens
-        special_tokens.extend(["<|im_start|>", "<|im_end|>"])
-        unmask_tokens_idx = [self.processor.tokenizer.convert_tokens_to_ids(t) for t in special_tokens]
+        unmask_tokens_idx = [self.processor.tokenizer.convert_tokens_to_ids(t) for t in self.special_tokens]
         input_id, target = [], []
         # The purpose of start from is to record which mm token we are at. Supposing the format is interleaved
         # Then we need to record this so that the mm token can be expanded correctly per conversation
@@ -100,12 +107,13 @@ def get_qwen_template_labels(
         video_start_from = 0
 
         if add_system_prompt and hf_messages[0]["role"] != "system":
-            input_id += self.processor.tokenizer.apply_chat_template([{"role": "system", "content": system_message}])
+            input_id += DataUtilities.apply_chat_template(
+                self.processor, [{"role": "system", "content": [{"type": "text", "text": system_message}]}]
+            )
             target += [-100] * len(input_id)
         for message in hf_messages:
             role = message["role"]
-            # Cautions, qwen2_5 vl tokenizer wrap into a list
-            encode_id = self.processor.apply_chat_template([message], tokenize=True)[0]
+            encode_id = DataUtilities.apply_chat_template(self.processor, [message])
             if self.audio_token_id in encode_id:
                 encode_id, used_audio = self._expand_encode_id_audio_tokens(
                     encode_id, num_audio_tokens, audio_start_from
```