Commit c96502e

feat(cookbook): update transformers model configuration for Qwen3.5
- Replace the generic `TransformersModel` loading path with an explicit `Qwen3_5ForConditionalGeneration` class
- Set a custom `_no_split_modules` of `{'Qwen3_5DecoderLayer'}` for FSDP compatibility
- Use the specific model ID `ms://Qwen/Qwen3.5-4B` instead of the generic `MODEL_ID`
- Remove the explicit `strategy` parameter, as it is now handled by the model configuration
1 parent 7816375 commit c96502e

1 file changed: cookbook/transformers/sp_fsdp_dense.py (3 additions, 6 deletions)
```diff
@@ -62,12 +62,9 @@ def train():
         batch_size=8,
         device_mesh=device_mesh,
     )
-
-    model = TransformersModel(
-        model_id=MODEL_ID,
-        device_mesh=device_mesh,
-        strategy='native_fsdp',
-    )
+    from transformers.models.qwen3_5.modeling_qwen3_5 import Qwen3_5ForConditionalGeneration
+    model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B', model_cls=Qwen3_5ForConditionalGeneration)
+    model.model._no_split_modules = {'Qwen3_5DecoderLayer'}
 
     lora_config = LoraConfig(target_modules='all-linear')
     model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=1)
```
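The point of overriding `_no_split_modules` is that FSDP-style auto-wrap policies use it to decide which submodules must be sharded as a single unit, so a `Qwen3_5DecoderLayer`'s parameters are never split across two FSDP units. A minimal sketch of that decision logic, assuming a name-based wrap check (the classes below are stand-ins, not the real transformers modules):

```python
# Sketch (assumption): how a trainer could turn a model's
# `_no_split_modules` set into an FSDP auto-wrap decision.
NO_SPLIT_MODULES = {'Qwen3_5DecoderLayer'}

class Qwen3_5DecoderLayer:
    """Stand-in for one decoder block that FSDP should shard as a unit."""

class Embedding:
    """Stand-in for a module left outside the wrap policy."""

def should_wrap(module: object) -> bool:
    # Wrap each module whose class name appears in `_no_split_modules`,
    # keeping all of that module's parameters inside one FSDP unit.
    return type(module).__name__ in NO_SPLIT_MODULES

print(should_wrap(Qwen3_5DecoderLayer()))  # True
print(should_wrap(Embedding()))            # False
```

In the real stack, PyTorch's `transformer_auto_wrap_policy` does the equivalent check against module classes rather than names; the commit's override simply tells that machinery which layer type is the atomic sharding unit for this model.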
