An extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community.
## Latest News
* [2023-08-07] Support [FlashAttention-2](https://crfm.stanford.edu/2023/07/17/flash2.html). Check out [flash_attention](https://github.com/OptimalScale/LMFlow/blob/main/readme/flash_attn2.md) for more details.
* [2023-08-02] Support [Llama2](https://ai.meta.com/llama/), [ChatGLM2](https://huggingface.co/THUDM/chatglm2-6b), and [Baichuan](https://huggingface.co/baichuan-inc/Baichuan-7B) models.
* [2023-07-23] :rocket: [LMFlow multimodal chatbot](https://github.com/OptimalScale/LMFlow/blob/main/scripts/run_vis_chatbot_gradio_minigpt4.sh) is now available! It supports multimodal inputs of images and text. An [online demo](http://multimodal.lmflow.online) is also provided. The service runs on a single GPU, so you may occasionally see "queuing" or "application busy" when several users access it at the same time; if that happens, please wait and try again later. :rocket:
* [2023-06-22] The [LMFlow paper](https://arxiv.org/abs/2306.12420) is out! Check it out for our implementation details.
conda create -n lmflow python=3.9 -y
conda activate lmflow
conda install mpi4py
./install.sh
```
## 2. Prepare Dataset
Thanks to the great efforts of [llama.cpp](https://github.com/ggerganov/llama.cpp), it is possible for everyone to run their LLaMA models on CPU with 4-bit quantization. We provide a script to convert LLaMA LoRA weights to `.pt` files; you then only need to run `convert-pth-to-ggml.py` from llama.cpp to perform the quantization.
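As a rough illustration of why 4-bit quantization makes CPU inference practical, the arithmetic below compares weight storage at 16-bit and 4-bit precision for a model with about 7 billion parameters. This is a back-of-the-envelope estimate under stated assumptions: it counts weights only and ignores the per-block scale factors that llama.cpp's quantized formats also store, as well as activations and the KV cache.

```math
\begin{aligned}
\text{FP16:}\quad & 7\times10^{9}\ \text{params} \times 2\ \text{bytes} = 14\times10^{9}\ \text{bytes} \approx 14\ \text{GB} \\
\text{4-bit:}\quad & 7\times10^{9}\ \text{params} \times 0.5\ \text{bytes} = 3.5\times10^{9}\ \text{bytes} \approx 3.5\ \text{GB}
\end{aligned}
```

The roughly 4x reduction is what brings a 7B model within the RAM of a typical desktop machine.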
### 4.4 Vocabulary List Extension
You can now train your own SentencePiece tokenizer and merge it with the model's original Hugging Face tokenizer. Check out [vocab_extension](https://github.com/OptimalScale/LMFlow/blob/main/scripts/vocab_extension) for more details.
### 4.5 Position Interpolation for LLaMA Models
LMFlow now supports the latest Linear and NTK (Neural Tangent Kernel) scaling techniques for LLaMA models. Check out [position_interpolation](https://github.com/OptimalScale/LMFlow/blob/main/readme/Position_Interpolation.md) for more details.
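For reference, the two scaling schemes are commonly written as follows for rotary position embeddings (RoPE). Here $d$ is the head dimension, $b$ the RoPE base (10000 in LLaMA), $m$ the token position, $\phi_{m,i}$ the rotation angle of dimension pair $i$, and $s = L'/L$ the ratio of the target context length $L'$ to the original one $L$. These are the standard formulations from the literature, given as a sketch in our own notation; LMFlow's exact implementation and configuration options are described in the linked document.

```math
\begin{aligned}
\text{Frequencies:}\quad & \theta_i = b^{-2i/d}, \qquad i = 0, 1, \dots, \tfrac{d}{2} - 1 \\
\text{Original:}\quad & \phi_{m,i} = m\,\theta_i \\
\text{Linear:}\quad & \phi_{m,i} = \frac{m}{s}\,\theta_i \\
\text{NTK-aware:}\quad & \phi_{m,i} = m\,(b')^{-2i/d}, \qquad b' = b\, s^{d/(d-2)}
\end{aligned}
```

Linear scaling compresses every position by the same factor $s$. The NTK-aware choice of $b'$ instead leaves the highest frequency ($i = 0$) unchanged and divides the lowest frequency ($i = d/2 - 1$) by exactly $s$, since $(b')^{-(d-2)/d} = b^{-(d-2)/d}\, s^{-1}$.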
### 4.6 FlashAttention-2
LMFlow now supports the latest [FlashAttention-2](https://crfm.stanford.edu/2023/07/17/flash2.html). Check out [flash_attention](https://github.com/OptimalScale/LMFlow/blob/main/readme/flash_attn2.md) for more details.
## 5. Model Release
Please refer to our [Documentation](https://optimalscale.github.io/LMFlow/) for the API reference and more experimental results.
## Acknowledgement
LMFlow draws inspiration from various studies, including but not limited to:
We're thrilled to announce that LMFlow now supports training and inference using **FlashAttention-2**, which computes exact attention faster and with less memory than a standard attention implementation. To use it, simply add `--use_flash_attention True` to the corresponding bash script.