Commit c0ab0e6

Merge branch 'main' into doc-edit
2 parents 2679fb9 + 4e385b5 commit c0ab0e6

23 files changed

Lines changed: 232 additions & 50 deletions

README.md

Lines changed: 13 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -21,7 +21,7 @@
2121
[![Doc](https://img.shields.io/badge/Website-Doc-ff69b4.svg)](https://optimalscale.github.io/LMFlow/)
2222
[![Embark](https://img.shields.io/badge/Discord-LMFlow-%237289da.svg?logo=discord)](https://discord.gg/u9VJNpzhvA)
2323
[![slack badge](https://img.shields.io/badge/Slack-Join-blueviolet?logo=slack&amp)](https://join.slack.com/t/lmflow/shared_invite/zt-1wju9nicy-woXbNtS~5MavHSAtiMxmxQ)
24-
[![WeChat badge](https://img.shields.io/badge/WeChat-Join-brightgreen?logo=wechat&amp)](https://i.imgloc.com/2023/07/13/VgJyaZ.jpeg)
24+
[![WeChat badge](https://img.shields.io/badge/WeChat-Join-brightgreen?logo=wechat&amp)](https://s1.ax1x.com/2023/08/06/pPAQTPI.jpg)
2525

2626
An extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community.
2727

@@ -33,6 +33,7 @@ Large Model for All.
3333

3434

3535
## Latest News
36+
* [2023-08-07] Support [Flash Attention-2](https://crfm.stanford.edu/2023/07/17/flash2.html). Check out [flash_attention](https://github.com/OptimalScale/LMFlow/blob/main/readme/flash_attn2.md) for more details.
3637
* [2023-08-02] Support [Llama2](https://ai.meta.com/llama/), [ChatGLM2](https://huggingface.co/THUDM/chatglm2-6b), and [Baichuan](https://huggingface.co/baichuan-inc/Baichuan-7B) models.
3738
* [2023-07-23] :rocket: [LMFlow multimodal chatbot](https://github.com/OptimalScale/LMFlow/blob/main/scripts/run_vis_chatbot_gradio_minigpt4.sh) is now available! Support multimodal inputs of images and texts. [Online Demo](http://multimodal.lmflow.online) is also provided (We hold the service on a single GPU, hence one may experience "queuing" or "application busy" sometimes when multiple users are accessing at the same time, please wait and attempt again later when such event happens) :rocket: ![image](https://github.com/OptimalScale/LMFlow/blob/rpan-vision-encoder/assets/multimodal-chatbot-demo.gif)
3839
* [2023-06-22] [LMFlow paper](https://arxiv.org/abs/2306.12420) is out! Check out our implementation details at https://arxiv.org/abs/2306.12420
@@ -213,7 +214,7 @@ cd LMFlow
213214
conda create -n lmflow python=3.9 -y
214215
conda activate lmflow
215216
conda install mpi4py
216-
pip install -e .
217+
./install.sh
217218
```
218219

219220
## 2.Prepare Dataset
@@ -336,6 +337,16 @@ You can config the deepspeed under configs. Details can be referred at [DeepSpee
336337

337338
Thanks to the great efforts of [llama.cpp](https://github.com/ggerganov/llama.cpp). It is possible for everyone to run their LLaMA models on CPU by 4-bit quantization. We provide a script to convert LLaMA LoRA weights to `.pt` files. You only need to use `convert-pth-to-ggml.py` in llama.cpp to perform quantization.
338339

340+
### 4.4 Vocabulary List Extension
341+
342+
Now you can train your own sentencepiece tokenizer and merge it with the model's original Hugging Face tokenizer. Check out [vocab_extension](https://github.com/OptimalScale/LMFlow/blob/main/scripts/vocab_extension) for more details.
343+
344+
### 4.5 Position Interpolation for LLaMA Models
345+
Now LMFlow supports the latest Linear & NTK (Neural Tangent Kernel) scaling techniques for LLaMA models. Check out [position_interpolation](
346+
https://github.com/OptimalScale/LMFlow/blob/main/readme/Position_Interpolation.md) for more details.
347+
348+
### 4.6 FlashAttention-2
349+
Now LMFlow supports the latest [FlashAttention-2](https://crfm.stanford.edu/2023/07/17/flash2.html). Check out [flash_attention](https://github.com/OptimalScale/LMFlow/blob/main/readme/flash_attn2.md) for more details.
339350

340351
## 5. Model Release
341352

@@ -385,7 +396,6 @@ Then you can check the model performance at our [Doc](https://optimalscale.githu
385396
Please refer to our [Documentation](https://optimalscale.github.io/LMFlow/) for more API reference and experimental results.
386397

387398

388-
389399
## Acknowledgement
390400
LMFlow draws inspiration from various studies, including but not limited to:
391401
- Alpaca: https://github.com/tatsu-lab/stanford_alpaca

install.sh

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
#!/bin/bash
2+
3+
pip install -e .
4+
5+
gpu_state="$(nvidia-smi --query-gpu=name --format=csv,noheader)"
6+
if [[ "${gpu_state}" == *"A100"* || "${gpu_state}" == *"A40"* ]]; then
7+
pip install flash-attn==2.0.2
8+
fi
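
The GPU check in `install.sh` above assumes `nvidia-smi` is on the `PATH`. A minimal sketch of a more defensive variant follows; the helper names `needs_flash_attn` and `detect_gpu_names` are hypothetical and not part of LMFlow, and the matching rule is copied unchanged from the script.

```shell
#!/bin/bash
# Sketch only: a defensive variant of the GPU check in install.sh.
# `needs_flash_attn` and `detect_gpu_names` are hypothetical helper names.

# Return 0 when the given GPU name list contains "A100" or "A40"
# (the same substring match that install.sh uses).
needs_flash_attn() {
  local gpu_names="$1"
  [[ "${gpu_names}" == *"A100"* || "${gpu_names}" == *"A40"* ]]
}

# Print the GPU names, or nothing when nvidia-smi is missing or fails.
detect_gpu_names() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null || true
  fi
}

# Example wiring (commented out so that sourcing this sketch installs nothing):
# if needs_flash_attn "$(detect_gpu_names)"; then
#   pip install flash-attn==2.0.2
# fi
```

Because the substring match is kept as in the original, a name such as `NVIDIA RTX A4000` also matches `*"A40"*`.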

readme/Position_Interpolation.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
# Position Interpolation
2+
Now LMFlow supports the latest Linear & NTK (Neural Tangent Kernel) scaling techniques for LLaMA models. \
3+
For more details on these techniques, check out the links below:
4+
* Linear scaling: \
5+
https://arxiv.org/abs/2306.15595
6+
* NTK scaling: \
7+
https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
8+
## Usage
9+
To use the Position Interpolation Techniques, you need to set the following options:
10+
```
11+
--truncate_to_model_max_length False
12+
--do_rope_scaling True
13+
```
14+
For linear scaling, set the extending ratio by:
15+
```
16+
--rope_pi_ratio 4
17+
```
18+
For NTK scaling, set the extending ratio by:
19+
```
20+
--rope_ntk_ratio 4
21+
```
22+
Here is an example of evaluation bash code:
23+
```
24+
#!/bin/bash
25+
26+
CUDA_VISIBLE_DEVICES=0 \
27+
deepspeed examples/evaluation.py \
28+
--answer_type text \
29+
--model_name_or_path pinkmanlove/llama-7b-hf \
30+
--dataset_path data/wiki_en_eval \
31+
--deepspeed examples/ds_config.json \
32+
--inference_batch_size_per_device 1 \
33+
--truncate_to_model_max_length False \
34+
--block_size 4096 \
35+
--use_flash_attention True \
36+
--do_rope_scaling True \
37+
--rope_pi_ratio 2 \
38+
--rope_ntk_ratio 4 \
39+
--metric ppl
40+
```
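
For orientation, the two scaling options in `readme/Position_Interpolation.md` can be written against the standard RoPE definition. This is a sketch based on the two linked references, not a statement of how LMFlow implements them. The symbols are introduced here for illustration: $d$ is the attention head dimension, $b$ the RoPE base, $m$ the token position, and $s$ the scaling ratio, which is assumed to correspond to `--rope_pi_ratio` or `--rope_ntk_ratio`.

```latex
\begin{align*}
\text{RoPE:} \quad & \theta_i = b^{-2i/d}, \qquad \phi_i(m) = m\,\theta_i,
  \qquad i = 0, \dots, \tfrac{d}{2} - 1 \\
\text{Linear scaling:} \quad & \phi_i(m) = \frac{m}{s}\,\theta_i \\
\text{NTK scaling:} \quad & b' = b \cdot s^{\,d/(d-2)}, \qquad
  \theta'_i = (b')^{-2i/d}, \qquad \phi_i(m) = m\,\theta'_i
\end{align*}
```

Under these definitions, linear scaling compresses every frequency by the same factor $s$, whereas NTK scaling leaves the highest frequency ($i = 0$) unchanged and scales the lowest one ($i = d/2 - 1$) by exactly $1/s$.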

readme/flash_attn2.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
# Flash Attention 2.0
2+
We're thrilled to announce that LMFlow now supports training and inference using **FlashAttention-2**! This cutting-edge feature will take your language modeling to the next level. To use it, simply add ``` --use_flash_attention True ``` to the corresponding bash script.
3+
Here is an example of how to use it:
4+
```
5+
#!/bin/bash
6+
pip install flash_attn==2.0.2
7+
8+
deepspeed --master_port=11000 \
9+
examples/chatbot.py \
10+
--deepspeed configs/ds_config_chatbot.json \
11+
--model_name_or_path LMFlow/Full-Robin-7b-v2 \
12+
--max_new_tokens 1024 \
13+
--prompt_structure "###Human: {input_text}###Assistant:" \
14+
--end_string "#" \
15+
--use_flash_attention True
16+
```
17+
18+
Upgrade to LMFlow now and experience the future of language modeling!

scripts/run_evaluation.sh

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
11
#!/bin/bash
22

3+
if [ ! -d data/MedQA-USMLE ]; then
4+
cd data && ./download.sh MedQA-USMLE && cd -
5+
fi
6+
37
CUDA_VISIBLE_DEVICES=0 \
48
deepspeed examples/evaluation.py \
59
--answer_type medmcqa \

scripts/run_evaluation_accelerator.sh

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
11
#!/bin/bash
22

3+
if [ ! -d data/MedQA-USMLE ]; then
4+
cd data && ./download.sh MedQA-USMLE && cd -
5+
fi
6+
37
CUDA_VISIBLE_DEVICES=0 accelerate launch --config_file configs/accelerator_singlegpu_config.yaml examples/evaluation.py \
48
--answer_type usmle \
59
--model_name_or_path gpt2-large \

scripts/run_evaluation_with_lora.sh

Lines changed: 5 additions & 0 deletions
@@ -3,6 +3,11 @@
33
# --model_name_or_path specifies the original huggingface model
44
# --lora_model_path specifies the model difference introduced by finetuning,
55
# i.e. the one saved by ./scripts/run_finetune_with_lora.sh
6+
7+
if [ ! -d data/alpaca ]; then
8+
cd data && ./download.sh alpaca && cd -
9+
fi
10+
611
CUDA_VISIBLE_DEVICES=0 \
712
deepspeed examples/evaluation.py \
813
--answer_type text \

scripts/run_finetune.sh

Lines changed: 4 additions & 1 deletion
@@ -14,6 +14,9 @@ output_dir=${project_dir}/output_models/${exp_id}
1414
log_dir=${project_dir}/log/${exp_id}
1515

1616
dataset_path=${project_dir}/data/alpaca/train
17+
if [ ! -d ${dataset_path} ]; then
18+
cd data && ./download.sh alpaca && cd -
19+
fi
1720

1821
mkdir -p ${output_dir} ${log_dir}
1922

@@ -27,7 +30,7 @@ deepspeed ${deepspeed_args} \
2730
--block_size 512 \
2831
--per_device_train_batch_size 1 \
2932
--deepspeed configs/ds_config_zero3.json \
30-
--bf16 \
33+
--fp16 \
3134
--run_name finetune \
3235
--validation_split_percentage 0 \
3336
--logging_steps 20 \

scripts/run_finetune_with_lora.sh

Lines changed: 4 additions & 1 deletion
@@ -12,6 +12,9 @@ output_dir=${project_dir}/output_models/${exp_id}
1212
log_dir=${project_dir}/log/${exp_id}
1313

1414
dataset_path=${project_dir}/data/alpaca/train
15+
if [ ! -d ${dataset_path} ]; then
16+
cd data && ./download.sh alpaca && cd -
17+
fi
1518

1619
mkdir -p ${output_dir} ${log_dir}
1720

@@ -28,7 +31,7 @@ deepspeed ${deepspeed_args} \
2831
--lora_r 8 \
2932
--save_aggregated_lora 0\
3033
--deepspeed configs/ds_config_zero2.json \
31-
--bf16 \
34+
--fp16 \
3235
--run_name finetune_with_lora \
3336
--validation_split_percentage 0 \
3437
--logging_steps 20 \

scripts/run_finetune_with_lora_save_aggregated_weights.sh

Lines changed: 4 additions & 1 deletion
@@ -13,6 +13,9 @@ log_dir=${project_dir}/log/${exp_id}
1313

1414
dataset_path=${project_dir}/data/alpaca/train
1515
eval_dataset_path=${project_dir}/data/alpaca/test
16+
if [ ! -d ${dataset_path} ]; then
17+
cd data && ./download.sh alpaca && cd -
18+
fi
1619

1720
mkdir -p ${output_dir} ${log_dir}
1821

@@ -29,7 +32,7 @@ deepspeed ${deepspeed_args} \
2932
--lora_r 8 \
3033
--save_aggregated_lora 1\
3134
--deepspeed configs/ds_config_zero2.json \
32-
--bf16 \
35+
--fp16 \
3336
--run_name finetune_with_lora \
3437
--validation_split_percentage 0 \
3538
--logging_steps 20 \
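
Several scripts in this commit add the same download guard. In the finetuning scripts it tests the absolute `${dataset_path}` but then runs `cd data` relative to the current directory, and `${dataset_path}` is unquoted. A minimal sketch of a shared helper that avoids both issues follows. `ensure_dataset` is a hypothetical name, not part of LMFlow, and it assumes that `download.sh <name>` creates a directory `<name>` next to itself, as the guards imply.

```shell
#!/bin/bash
# Sketch only: `ensure_dataset` is a hypothetical helper, not part of LMFlow.
# Assumes <data_dir>/download.sh accepts a dataset name and creates
# <data_dir>/<name>, as the guards in this commit imply.

ensure_dataset() {
  local data_dir="$1"
  local name="$2"
  if [ ! -d "${data_dir}/${name}" ]; then
    # The subshell keeps the caller's working directory unchanged,
    # even when the download fails.
    (cd "${data_dir}" && ./download.sh "${name}")
  fi
}

# Example (commented out; requires the LMFlow repository layout):
# ensure_dataset "${project_dir}/data" alpaca
```

The subshell replaces the `cd data && ... && cd -` pattern, so a failed download cannot leave the calling script in the wrong directory.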
