To run training, prepare a YAML config. Below is an up-to-date example that you can use as a template:
```yaml
trainer_type: fsdp2_trainer

# Dataset configuration - includes the actual dataset definitions
dataset_config:
  dataset_type: vision
  dataset_format: yaml  # 'yaml' format supports both external files and inline definitions
  # Inline dataset definitions (no dataset_path needed)
  datasets:
    - path: data/open_thoughts_debug
      data_folder: ""
      data_type: arrow
  # Processor configuration
  processor_config:
    processor_name: "Qwen/Qwen2.5-VL-7B-Instruct"
    processor_type: "qwen2_5_vl"
  # Packing configuration
  packing: true
  packing_strategy: first_fit
  packing_length: 16384

# Model configuration
model_config:
  load_from_pretrained_path: "Qwen/Qwen2.5-VL-7B-Instruct"
  attn_implementation: "flash_attention_2"

# Training arguments, mostly compatible with the HuggingFace Trainer
trainer_args:
  per_device_train_batch_size: 1
  learning_rate: 1.0e-06  # write 1.0e-06 (not 1e-06) so YAML parses it as a float
  weight_decay: 0.0
  gradient_accumulation_steps: 1
  gradient_checkpointing: true
  num_train_epochs: 1
  save_steps: 100
  save_total_limit: 1
  report_to: "wandb"
  output_dir: "./output/debug"
  warmup_ratio: 0.0
  run_name: "qwen2_5_vl_config"
  eval_strategy: "no"
  logging_steps: 1
  group_by_length: true
  dataloader_num_workers: 8
  bf16: true
  lr_scheduler_type: "cosine"
  freeze_modules: ["visual"]
  use_liger_kernel: true
  use_rmpad: true
  fsdp2: true
  fsdp_config:
    transformer_layer_cls_to_wrap: ["Qwen2_5_VLDecoderLayer"]
    reshard_after_forward: false
```

You can visit the `config.py` file under each subfolder to see which parameters are configurable.
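The float-notation caveat on `learning_rate` matters because PyYAML's YAML 1.1 resolver only treats scientific notation as a float when the mantissa contains a decimal point. A quick check (assuming PyYAML is installed):

```python
import yaml

# PyYAML resolves "1.0e-06" as a float, but "1e-06", with no decimal
# point in the mantissa, falls through to a plain string.
good = yaml.safe_load("learning_rate: 1.0e-06")
bad = yaml.safe_load("learning_rate: 1e-06")
print(type(good["learning_rate"]).__name__)  # float
print(type(bad["learning_rate"]).__name__)   # str
```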
- `trainer_type`: Use `hf_trainer` for the standard HF Trainer or `fsdp2_trainer` for PyTorch FSDP2.
- `dataset_config.dataset_format`: `yaml`. You can either set `dataset_path` to an external YAML file, or embed datasets inline via `datasets`.
- `datasets`: Each entry defines `path`, an optional `data_folder`, and `data_type` (e.g., `arrow`, `parquet`).
- `processor_config`: Set `processor_name` (e.g., a Hugging Face model ID) and `processor_type` (e.g., `qwen2_5_vl`).
- `packing`: Enable sequence packing with `packing: true`, and adjust `packing_strategy` and `packing_length`. Use `filter_overlong` to drop samples exceeding the length limit.
- Video options: `video_backend`, `video_sampling_strategy`, `video_max_pixels`, and `video_max_frames` control video preprocessing.
- `model_config`: Prefer `load_from_pretrained_path` and set `attn_implementation` (e.g., `flash_attention_2`).
- `freeze_modules`: List of submodules (e.g., `visual`) to freeze during training.
- `use_liger_kernel` / `use_rmpad`: Performance optimizations. Keep them enabled if supported on your stack.
- `fsdp2` / `fsdp_config`: Enable FSDP2 sharding and wrap transformer layer classes via `transformer_layer_cls_to_wrap`. Tune `reshard_after_forward` for memory/performance trade-offs.
- EMA (Exponential Moving Average): Enable EMA with `ema_enabled: true`. Configure `ema_decay` (default 0.9999), `ema_update_every`, `ema_start_step`, and optionally filter parameters via `ema_param_filter`. EMA checkpoints are saved alongside regular checkpoints and can be merged using `merge_fsdp.py` with `--state_dict_dirname pytorch_ema_model_fsdp_0`.
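To make the packing options concrete, here is a minimal sketch of what a first-fit strategy does: each sample goes into the first bin that still has room under `packing_length`. The function name and details are illustrative, not the trainer's actual implementation:

```python
# Illustrative first-fit packing: place each sample's token length into
# the first bin whose total stays within packing_length.
def first_fit_pack(lengths, packing_length):
    bins = []  # each bin is a list of sample lengths
    for n in lengths:
        for b in bins:
            if sum(b) + n <= packing_length:
                b.append(n)
                break
        else:
            bins.append([n])  # no bin had room: open a new one
    return bins

print(first_fit_pack([9000, 5000, 7000, 4000], packing_length=16384))
# [[9000, 5000], [7000, 4000]]
```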
Example launch command:
```bash
export NCCL_BLOCKING_WAIT=0
export TOKENIZERS_PARALLELISM=false

# Hugging Face setup (optional)
export HF_TOKEN="<YOUR HF_TOKEN>"
export HF_HOME="$HOME/.cache/huggingface"
export HF_HUB_ENABLE_HF_TRANSFER="1"

export NCCL_DEBUG=INFO

CONFIG=$1 # path to your YAML config

torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="8000" \
    -m lmms_engine.launch.cli config_yaml=${CONFIG}
```

Instead of using a YAML config file, you can pass configuration directly via Hydra overrides on the command line. This is useful for quick experiments and parameter tuning.
Use the `key=value` format to override any configuration parameter. Hydra automatically creates the nested structure:
```bash
torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="8000" \
    -m lmms_engine.launch.cli \
    trainer_type=fsdp2_trainer \
    dataset_config.dataset_path=/path/to/video_dataset.yaml \
    dataset_config.dataset_format=yaml \
    dataset_config.dataset_type=qwen3_vl_iterable \
    dataset_config.processor_config.processor_name="Qwen/Qwen3-VL-8B-Instruct" \
    dataset_config.processor_config.processor_type=qwen3_vl \
    model_config.load_from_pretrained_path="Qwen/Qwen3-VL-8B-Instruct" \
    model_config.attn_implementation=flash_attention_2 \
    trainer_args.per_device_train_batch_size=1 \
    trainer_args.learning_rate=2.0e-04 \
    trainer_args.num_train_epochs=1 \
    trainer_args.output_dir=./output/debug \
    trainer_args.bf16=true
```

Here are frequently used parameters you can override:
Dataset Configuration:
- `dataset_config.dataset_path`: Path to your YAML dataset config
- `dataset_config.dataset_format`: Format type (e.g., `yaml`, `json`)
- `dataset_config.dataset_type`: Dataset type (e.g., `vision`, `qwen3_vl_iterable`)
- `dataset_config.processor_config.processor_name`: Model name for the processor
- `dataset_config.processor_config.processor_type`: Processor type to use
- `dataset_config.packing`: Enable/disable sequence packing (e.g., `packing=true`)
- `dataset_config.packing_length`: Max sequence length for packing
- `dataset_config.video_backend`: Video processing backend (e.g., `qwen_vl_utils`)
- `dataset_config.video_sampling_strategy`: Video sampling method (e.g., `fps`)
- `dataset_config.video_max_frames`: Maximum frames per video
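As an illustration of how an `fps` sampling strategy and `video_max_frames` interact, the sketch below subsamples frame indices at a target rate and caps the count. The function and its logic are assumptions for explanation, not the engine's actual code:

```python
# Illustrative only: pick frame indices at roughly `target_fps`, then cap
# the result at `max_frames` (mirroring video_max_frames in the config).
def sample_frame_indices(total_frames, video_fps, target_fps, max_frames):
    step = video_fps / target_fps     # keep one frame every `step` frames
    count = int(total_frames / step)  # frames available at the target rate
    indices = [round(i * step) for i in range(count)]
    return indices[:max_frames]

# A 10-second clip at 30 fps, sampled at 2 fps, capped at 16 frames:
print(sample_frame_indices(300, video_fps=30, target_fps=2, max_frames=16))
```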
Model Configuration:
- `model_config.load_from_pretrained_path`: Path or HF model ID to load from
- `model_config.attn_implementation`: Attention implementation (e.g., `flash_attention_2`)
Training Arguments:
- `trainer_args.per_device_train_batch_size`: Batch size per device
- `trainer_args.learning_rate`: Learning rate (use float notation like `2.0e-04`)
- `trainer_args.num_train_epochs`: Number of training epochs
- `trainer_args.max_steps`: Maximum training steps
- `trainer_args.gradient_accumulation_steps`: Gradient accumulation steps
- `trainer_args.gradient_checkpointing`: Enable gradient checkpointing
- `trainer_args.output_dir`: Output directory for checkpoints
- `trainer_args.run_name`: Name for this training run
- `trainer_args.bf16`: Use bfloat16 precision
- `trainer_args.fsdp2`: Enable FSDP2 distributed training
- `trainer_args.use_liger_kernel`: Enable Liger kernel optimizations
- `trainer_args.use_rmpad`: Enable padding removal optimization
- `trainer_args.ema_enabled`: Enable EMA (default: `false`)
- `trainer_args.ema_decay`: EMA decay rate (default: `0.9999`)
- `trainer_args.ema_update_every`: Update EMA every N steps (default: `1`)
- `trainer_args.ema_start_step`: Start EMA from step N (default: `0`)
- `trainer_args.ema_requires_grad_only`: Only apply EMA to trainable parameters (default: `true`)
- `trainer_args.ema_param_filter`: Filter parameters by name (supports `mode`, `include`, `exclude`)
- `trainer_args.ema_resume_from_ema`: Resume training from EMA weights (default: `false`)
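The EMA options above follow the standard exponential-moving-average update; the sketch below shows the rule that `ema_decay` controls. This is a toy illustration, not the trainer's code:

```python
# ema <- decay * ema + (1 - decay) * param, applied every
# ema_update_every steps once ema_start_step is reached.
def ema_update(ema_params, model_params, decay=0.9999):
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params

# Toy example with decay=0.5 so the smoothing effect is visible:
ema = {"w": 1.0}
for w in [2.0, 3.0]:
    ema = ema_update(ema, {"w": w}, decay=0.5)
print(ema["w"])  # 0.5 * 1.5 + 0.5 * 3.0 = 2.25
```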
See `examples/qwen3_vl/qwen3_vl_8b_train.sh` for a complete training script that uses Hydra overrides with comprehensive parameter configuration for multi-GPU training.
You can use a YAML config file as a base and override specific parameters via the CLI using Hydra's `--config-path` and `--config-name`:
```bash
torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="8000" \
    -m lmms_engine.launch.cli \
    --config-path /path/to/config_yaml/directory \
    --config-name qwen2_5_vl_dp \
    trainer_args.max_steps=100
```

This loads all settings from `qwen2_5_vl_dp.yaml` in the specified directory and overrides only the specified parameters (CLI overrides take precedence).
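The precedence rule can be pictured as a deep merge in which CLI values win. This is only a conceptual sketch; Hydra's real composition is richer:

```python
# Conceptual sketch: overrides replace base values key by key, recursing
# into nested sections, so untouched base settings survive.
def deep_merge(base, override):
    merged = dict(base)
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(merged.get(k), dict):
            merged[k] = deep_merge(merged[k], v)
        else:
            merged[k] = v
    return merged

base = {"trainer_args": {"max_steps": 1000, "bf16": True}}
cli = {"trainer_args": {"max_steps": 100}}
print(deep_merge(base, cli))
# {'trainer_args': {'max_steps': 100, 'bf16': True}}
```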
- Use quotes for string values: `processor_name="Qwen/Qwen2.5-VL-7B-Instruct"`
- Use dot notation for nested configs: `trainer_args.learning_rate=1.0e-06`
- Boolean values: `packing=true` or `packing=false`
- For complex values (lists/arrays), use Hydra's syntax: `trainer_args.fsdp_config.transformer_layer_cls_to_wrap=["Qwen2_5_VLDecoderLayer"]`
- Add new parameters with `+`: `+dataset_config.extra_kwargs.image_max_pixels=4194304`
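The dot-notation rules above can be sketched as follows. Hydra (via OmegaConf) does this for real; the parser below is only a simplified illustration that leaves all values as strings:

```python
def apply_overrides(pairs):
    """Turn dotted key=value strings into a nested dict (simplified)."""
    cfg = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        node = cfg
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value  # real Hydra also converts types (bool, float, ...)
    return cfg

cfg = apply_overrides([
    "trainer_args.learning_rate=1.0e-06",
    "dataset_config.packing=true",
])
print(cfg)
# {'trainer_args': {'learning_rate': '1.0e-06'}, 'dataset_config': {'packing': 'true'}}
```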