Commit 2c56c30

doc: How to tune LoRA lm_head (#305)
* doc: How to tune LoRA lm_head
  Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* doc: Move lm_head into a collapsible section
  Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Fix grammar errors
  Co-authored-by: Anh Uong <anhuong4444@gmail.com>
  Signed-off-by: Angel Luu <an317gel@gmail.com>
  Signed-off-by: Anh Uong <anh.uong@ibm.com>

* Some rewording
  Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

---------

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>
1 parent a313983

1 file changed

Lines changed: 45 additions & 11 deletions

File tree

README.md

````diff
@@ -280,15 +280,15 @@ Set `peft_method` to `"lora"`. You can additionally pass any arguments from [Lor
     r: int =8
     lora_alpha: int = 32
     target_modules: List[str] = field(
-        default_factory=lambda: ["q_proj", "v_proj"],
-        metadata={
-            "help": "The names of the modules to apply LORA to. LORA selects modules which either \
-                completely match or "
-            'end with one of the strings. If the value is ["all-linear"], \
-                then LORA selects all linear and Conv1D '
-            "modules except for the output layer."
-        },
-    )
+        default=None,
+        metadata={
+            "help": "The names of the modules to apply LORA to. LORA selects modules which either \
+                completely match or "
+            'end with one of the strings. If the value is ["all-linear"], \
+                then LORA selects all linear and Conv1D '
+            "modules except for the output layer."
+        },
+    )
     bias = "none"
     lora_dropout: float = 0.05
 ```
````
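The effect of the change above (switching `target_modules` from a `["q_proj", "v_proj"]` default to `default=None`) can be illustrated with a minimal stand-alone sketch. The field names and defaults come from the config shown in the diff; the class here is hypothetical and far simpler than the real `LoraConfig`:

```python
from dataclasses import dataclass
from typing import List, Optional

# Minimal sketch of the config fields shown in the diff above.
# Not the library's actual LoraConfig, which has more fields.
@dataclass
class LoraConfigSketch:
    r: int = 8
    lora_alpha: int = 32
    # After this change the default is None, so module selection
    # falls back to the model-architecture defaults instead of
    # always using ["q_proj", "v_proj"].
    target_modules: Optional[List[str]] = None
    bias: str = "none"
    lora_dropout: float = 0.05

cfg = LoraConfigSketch()
print(cfg.target_modules)  # None -> architecture defaults apply

custom = LoraConfigSketch(target_modules=["q_proj", "v_proj"])
print(custom.target_modules)  # explicit list overrides the default
```

With `None` as the default, a user who passes no `target_modules` gets the per-architecture defaults described below, rather than a hardcoded list.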
````diff
@@ -331,8 +331,11 @@ Equally you can pass in a JSON configuration for running tuning. See [build doc]
 }
 ```
 
-Notice the `target_modules` that are set are the default values. `target_modules` are the names of the modules to apply the adapter to. If this is specified, only the modules with the specified names will be replaced. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings. If this is specified as `all-linear`, then all linear/Conv1D modules are chosen, excluding the output layer. If this is not specified, modules will be chosen according to the model architecture. If the architecture is not known, an error will be raised — in this case, you should specify the target modules manually. See [HuggingFace docs](https://huggingface.co/docs/peft/en/package_reference/lora#peft.LoraConfig) for more details.
+Notice the `target_modules` are the names of the modules to apply the adapter to.
+- If this is specified, only the modules with the specified names will be replaced. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings. If this is specified as `all-linear`, then all linear/Conv1D modules are chosen, excluding the output layer. If this is specified as `lm_head` which is an output layer, the `lm_head` layer will be chosen. See the Note of this [section](#recommended-target-modules-per-model-architecture) on recommended target modules by model architecture.
+- If this is not specified, modules will be chosen according to the model architecture. If the architecture is not known, an error will be raised — in this case, you should specify the target modules manually. See [HuggingFace docs](https://huggingface.co/docs/peft/en/package_reference/lora#peft.LoraConfig) for more details.
 
+#### How to get list of LoRA target_modules of a model
 For each model, the `target_modules` will depend on the type of model architecture. You can specify linear or attention layers to `target_modules`. To obtain list of `target_modules` for a model:
 
 ```py
````
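The matching rule described above (a module is selected on an exact name match, or when its name ends with one of the passed strings) can be sketched in a few lines of plain Python. This is a toy illustration of the documented behavior, not PEFT's actual implementation, and the module names below are made up for the example:

```python
# Toy version of the documented target_modules matching rule:
# a module is selected if its full name exactly matches an entry
# or ends with one of the entries. Not PEFT's real code.
def matches_target(module_name: str, target_modules: list) -> bool:
    return any(
        module_name == t or module_name.endswith(t)
        for t in target_modules
    )

# Hypothetical module names, shaped like a decoder-only transformer's.
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.gate_proj",
    "lm_head",
]
selected = [n for n in names if matches_target(n, ["q_proj", "v_proj"])]
print(selected)  # only the two attention projections are selected
```

This is why the short names `"q_proj"` and `"v_proj"` are enough to catch every layer's projection modules: the suffix match applies at every depth of the module tree.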
````diff
@@ -387,7 +390,38 @@ For example for LLaMA model the modules look like:
 You can specify attention or linear layers. With the CLI, you can specify layers with `--target_modules "q_proj" "v_proj" "k_proj" "o_proj"` or `--target_modules "all-linear"`.
 
 #### Recommended target modules per model architecture
-As per [LoRA paper](https://arxiv.org/pdf/2106.09685), section 4.2 , by using the query and value projection matrices, we can achieve reasonable quality with efficient GPU utilization. Hence, while thinking about what LoRA adapters to specify, we recommend starting with query and value matrices. You could also refer to the defaults specified by PEFT library for popular model architectures in section [TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING](https://github.com/huggingface/peft/blob/7b1c08d2b5e13d3c99b7d6ee83eab90e1216d4ba/src/peft/utils/constants.py#L70) as a good starting point.
+As per the [LoRA paper](https://arxiv.org/pdf/2106.09685), section 4.2, by using the query and value projection matrices, we can achieve reasonable quality with efficient GPU utilization. Hence, when deciding which LoRA adapters to specify, we recommend starting with the query and value matrices. You could also refer to the defaults specified by the PEFT library for popular model architectures in [TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING](https://github.com/huggingface/peft/blob/7b1c08d2b5e13d3c99b7d6ee83eab90e1216d4ba/src/peft/utils/constants.py#L70) as a good starting point.
+
+<details>
+
+<summary>How to specify lm_head as a target module</summary>
+
+Since `lm_head` is an output layer, it will **not** be included as a target module if you specify `all-linear`. You can, however, apply the LoRA adapter to the `lm_head` layer by explicitly naming it in the `target_modules` arg.
+
+**NOTE**: Specifying `["lm_head", "all-linear"]` will not tune the `lm_head` layer, but will run the equivalent of `["all-linear"]`. To include `lm_head`, you must explicitly specify all of the layers to tune on. Using the example of the Llama model above, you would need to list `"q_proj" "v_proj" "k_proj" "o_proj" "lm_head"` to tune all the linear layers including `lm_head`. These 5 layers will be produced in the LoRA adapter.
+
+Example 1:
+```json
+{
+  "target_modules": ["lm_head"] // this produces lm_head layer only
+}
+```
+
+Example 2:
+```json
+{
+  "target_modules": ["lm_head", "c_proj", "c_attn", "c_fc"] // this produces lm_head, c_proj, c_attn and c_fc layers
+}
+```
+
+Example 3:
+```json
+{
+  "target_modules": ["lm_head", "all-linear"] // this produces the equivalent of all-linear only, no lm_head
+}
+```
+
+</details>
 
 _________________________
 
````
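The NOTE in the collapsible section above, that `["lm_head", "all-linear"]` behaves like `["all-linear"]` alone, can be mimicked with a toy selector. This is a sketch of the documented behavior, not the library's code, and the layer list is just the five Llama layers named in that NOTE:

```python
# Toy sketch of the documented selection behavior (not the library's
# implementation): "all-linear" selects every linear module except the
# output layer, and its presence overrides an explicit "lm_head" entry
# in the same list.
def select_modules(linear_modules, output_layer, target_modules):
    if "all-linear" in target_modules:
        return [m for m in linear_modules if m != output_layer]
    return [
        m for m in linear_modules
        if any(m == t or m.endswith(t) for t in target_modules)
    ]

# The five Llama layers named in the NOTE above.
llama_linear = ["q_proj", "v_proj", "k_proj", "o_proj", "lm_head"]

print(select_modules(llama_linear, "lm_head", ["lm_head", "all-linear"]))
# equivalent of ["all-linear"]: lm_head is dropped

print(select_modules(llama_linear, "lm_head",
                     ["q_proj", "v_proj", "k_proj", "o_proj", "lm_head"]))
# all five layers, including lm_head
```

The takeaway matches the examples in the section: to tune `lm_head`, list every layer explicitly rather than mixing it with `all-linear`.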
