README.md: 9 additions & 6 deletions
@@ -855,6 +855,9 @@ Notes:
 - When a boolean is passed, the expert parallel degree defaults to 1 and further the behaviour would be as follows:
 - if True, it is Scatter MoE Kernels with experts sharded based on the top level sharding protocol (e.g. FSDP).
 - if False, Scatter MoE Kernels with complete replication of experts across ranks.
+- FSDP must be used when lora tuning with `--fast_moe`
+- lora tuning with ScatterMoE is supported, but because of inference restrictions on vLLM/vanilla PEFT, the expert layers and router linear layer should not be trained as `target_modules` for models being tuned with ScatterMoE. Users have control over which `target_modules` they wish to train:
+- At this time, only attention layers are trainable when using LoRA with scatterMoE. Until support for the router linear layer is added in, target modules must be specified explicitly (i.e `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`) instead of passing `target_modules: ["all-linear"]`.
 - `world_size` must be divisible by the `ep_degree`
 - `number of experts` in the MoE module must be divisible by the `ep_degree`
 - Running fast moe modifies the state dict of the model, and must be post-processed which happens automatically and the converted checkpoint can be found at `hf_converted_checkpoint` folder within every saved checkpoint directory. Alternatively, we can perform similar option manually through [checkpoint utils](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/accelerated-moe/src/fms_acceleration_moe/utils/checkpoint_utils.py) script.
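
The constraints listed in this hunk (divisibility by `ep_degree`, and explicit attention-only `target_modules` for LoRA) could be checked before launching a run. The following is a minimal sketch, not part of the repository: `check_fast_moe_settings` and `ATTENTION_MODULES` are hypothetical names, and the allowed module set is assumed from the example given in the added note.

```python
from typing import List, Optional, Sequence

# Assumption: the attention projection names from the README example.
# Real models may name these layers differently.
ATTENTION_MODULES = {"q_proj", "k_proj", "v_proj", "o_proj"}


def check_fast_moe_settings(
    world_size: int,
    num_experts: int,
    ep_degree: int,
    target_modules: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []

    if ep_degree < 1:
        problems.append("ep_degree must be a positive integer")
    else:
        # Both counts must split evenly across the expert-parallel groups.
        if world_size % ep_degree != 0:
            problems.append("world_size must be divisible by ep_degree")
        if num_experts % ep_degree != 0:
            problems.append("number of experts must be divisible by ep_degree")

    if target_modules is not None:
        # "all-linear" would also select the router and expert layers.
        if "all-linear" in target_modules:
            problems.append("list target_modules explicitly instead of 'all-linear'")
        extras = sorted(set(target_modules) - ATTENTION_MODULES - {"all-linear"})
        if extras:
            problems.append("non-attention target_modules: " + ", ".join(extras))

    return problems


if __name__ == "__main__":
    for message in check_fast_moe_settings(8, 6, 4, ["all-linear"]):
        print(message)
```

Returning a list of messages rather than raising on the first failure lets a caller report every violated constraint at once.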
@@ -916,12 +919,12 @@ For information on supported dataset formats and how to tune a vision-language m
 ? May be supported, but not tested

-Model Name & Size | Model Architecture | Full Finetuning |
 Additionally, once the offline data processing is complete, users can leverage the shards stored in `output_dir` for tuning by passing it through the `--training_data_path` flag or passing it via `data_paths` argument in data config yaml, provided they find the sharded datasets beneficial for training.

+**NOTE**: The offline data preprocessing script is not compatible with processing image datasets for vision models.