
Commit 3c25265

explicitly don't support router layer
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
1 parent 8449659 commit 3c25265

2 files changed: 20 additions & 6 deletions


README.md

Lines changed: 3 additions & 6 deletions
@@ -902,12 +902,9 @@ Notes:
 - When a boolean is passed, the expert parallel degree defaults to 1, and the behaviour is as follows:
   - if True, Scatter MoE kernels are used with experts sharded based on the top-level sharding protocol (e.g. FSDP).
   - if False, Scatter MoE kernels are used with complete replication of experts across ranks.
-- LoRA tuning with ScatterMoE is supported, but because of inference restrictions on vLLM/vanilla PEFT, experts should not be trained as `target_modules` for models being tuned with ScatterMoE. Users have control over which `target_modules` they wish to train:
-  - Passing `all-linear` to adapter layers will include the router, which is a linear layer, and all attention layers. This **will not** train the expert layers.
-  - To train only attention layers, specify the target modules explicitly (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`).
-  - To train expert layers, specify `input_linear` and `output_linear` in the target modules along with `router` (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not possible**.
-- When LoRA tuning with ScatterMoE, the values `--fast_moe 1` or `--fast_moe True` are not expected to work, as FSDP must be enabled when LoRA tuning. Run either `--fast_moe False` or `--fast_moe x>1`.
-- When LoRA tuning with ScatterMoE, `--r` must be set to 16 or greater.
+- LoRA tuning with ScatterMoE is supported, but because of inference restrictions on vLLM/vanilla PEFT, the expert layers and the router linear layer should not be trained as `target_modules` for models being tuned with ScatterMoE. Users have control over which `target_modules` they wish to train:
+  - At this time, only attention layers are trainable when using LoRA with ScatterMoE. Until support for the router linear layer is added, target modules must be specified explicitly (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`) instead of passing `target_modules: ["all-linear"]`.
+- When LoRA tuning with ScatterMoE, the value `--fast_moe True` is not expected to work, as FSDP must be enabled when LoRA tuning. Run either `--fast_moe False` or `--fast_moe x>=1`.
 - `world_size` must be divisible by the `--ep_degree`
 - `number of experts` in the MoE module must be divisible by the `ep_degree`
 - Running fast moe modifies the state dict of the model and must be post-processed; this happens automatically, and the converted checkpoint can be found in the `hf_converted_checkpoint` folder within every saved checkpoint directory. Alternatively, the same conversion can be performed manually through the [checkpoint utils](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/accelerated-moe/src/fms_acceleration_moe/utils/checkpoint_utils.py) script.
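For concreteness, a minimal sketch of a LoRA adapter config that follows the updated guidance, assuming the standard `peft.LoraConfig` API; the specific values such as `r=16` and `lora_alpha=32` are illustrative, not requirements:

```python
# Minimal sketch: restrict LoRA to attention projections so the adapter
# stays compatible with ScatterMoE and with vLLM / vanilla PEFT inference.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,              # illustrative rank, not a stated requirement
    lora_alpha=32,
    lora_dropout=0.05,
    # Explicit attention-only targets; never "all-linear", "router",
    # "input_linear", or "output_linear" when tuning with ScatterMoE.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```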

tuning/sft_trainer.py

Lines changed: 17 additions & 0 deletions
@@ -155,6 +155,23 @@ def train(
             "Trainer should not perform packing when using `--padding_free`"
         )

+    if fast_moe_config is not None:
+        # Check for unsupported target modules with Scatter MoE for LoRA
+        restricted_modules = ["all-linear", "output_linear", "input_linear", "router"]
+        if (
+            peft_config is not None
+            and hasattr(peft_config, "target_modules")
+            and any(
+                module in (peft_config.target_modules or [])
+                for module in restricted_modules
+            )
+        ):
+            raise ValueError(
+                "`--fast_moe` with LoRA does not currently support `all-linear`, "
+                "`router`, `input_linear` or `output_linear` as target modules. "
+                "Please explicitly specify target modules when using `--fast_moe` with LoRA."
+            )
+
     task_type = "CAUSAL_LM"
     additional_metrics = {}
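To illustrate the guard's behaviour in isolation, here is a self-contained sketch; `DummyPeftConfig` and `check_fast_moe_lora` are hypothetical stand-ins for this example only (in the real code path, `peft_config` and `fast_moe_config` come from the tuning config dataclasses):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DummyPeftConfig:
    # Hypothetical stand-in for the real LoRA config object.
    target_modules: Optional[List[str]] = None

RESTRICTED_MODULES = ["all-linear", "output_linear", "input_linear", "router"]

def check_fast_moe_lora(fast_moe_enabled: bool, peft_config) -> None:
    """Mirror the new validation: reject restricted LoRA targets under fast_moe."""
    if not fast_moe_enabled:
        return
    if (
        peft_config is not None
        and hasattr(peft_config, "target_modules")
        and any(m in (peft_config.target_modules or []) for m in RESTRICTED_MODULES)
    ):
        raise ValueError(
            "`--fast_moe` with LoRA does not support `all-linear`, `router`, "
            "`input_linear` or `output_linear` as target modules."
        )

# Attention-only targets pass silently; "all-linear" raises.
check_fast_moe_lora(True, DummyPeftConfig(["q_proj", "v_proj", "o_proj", "k_proj"]))
try:
    check_fast_moe_lora(True, DummyPeftConfig(["all-linear"]))
except ValueError as err:
    print(err)
```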
