| 1 | | -# Model2vec |
| 1 | +# Model2Vec: Distill a Small Fast Model from any Sentence Transformer |
| 2 | + |
| 3 | +**Model2Vec** is a method to distill a small, fast model from any Sentence Transformer model. |
| 4 | + |
| 5 | +## Table of Contents |
| 6 | +- [Main Features](#main-features) |
| 7 | +- [Quickstart](#quickstart) |
| 8 | +- [What is Model2Vec?](#what-is-model2vec) |
| 9 | +- [Who is this for?](#who-is-this-for) |
| 10 | +- [Usage](#usage) |
| 11 | + - [Distilling a Model2Vec model](#distilling-a-model2vec-model) |
| 12 | + - [Inferencing a Model2Vec model](#inferencing-a-model2vec-model) |
| 13 | + - [Evaluating a Model2Vec model](#evaluating-a-model2vec-model) |
| 14 | +- [Results](#results) |
| 15 | +- [Citing](#citing) |
| 16 | + |
| 17 | +## Main Features |
| 18 | +- **Small**: Model2Vec can reduce the size of a Sentence Transformer model by a factor of 15 *. |
| 19 | +- **Fast distillation**: Model2Vec can distill a Sentence Transformer model in ~5 minutes on CPU *. |
| 20 | +- **Fast inference**: Model2Vec creates static embeddings that are up to 500 times * faster than the original model. |
| 21 | +- **State-of-the-art static embedding performance**: Model2Vec outperforms traditional static embeddings by a large margin on a number of benchmarks. |
| 22 | +- **No data needed**: Distillation happens directly at the token level, so no dataset is needed. |
| 23 | +- **Simple to use**: Model2Vec provides an easy-to-use interface for distilling and inferencing Model2Vec models. |
| 24 | +- **Bring your own model**: Model2Vec can be applied to any Sentence Transformer model. |
| 25 | +- **Bring your own vocabulary**: Model2Vec can be applied to any vocabulary, allowing you to use your own domain-specific vocabulary. |
| 26 | +- **Multi-lingual**: Model2Vec can easily be applied to any language. |
| 27 | +- **Tightly integrated with HuggingFace hub**: Model2Vec models can be easily shared and loaded from the HuggingFace hub. Our models can be found [here](https://huggingface.co/minishlab). |
| 28 | +- **Easy Evaluation**: Model2Vec comes with a set of evaluation tasks to measure the performance of the distilled model. |
| 29 | + |
| 30 | +\* Based on the [bge-base-en-v1.5 model](https://huggingface.co/BAAI/bge-base-en-v1.5). |
| 31 | + |
| 32 | + |
| 33 | +## Quickstart |
| 34 | + |
| 35 | +Install the package with: |
| 36 | +```bash |
| 37 | +pip install model2vec |
| 38 | +``` |
| 39 | + |
| 40 | +The easiest way to get started with Model2Vec is to download one of our [flagship models from the HuggingFace hub](https://huggingface.co/minishlab). These models are pre-trained and ready to use. The following code snippet shows how to load a model and make embeddings: |
| 41 | +```python |
| 42 | +from model2vec import StaticModel |
| 43 | + |
| 44 | +# Load a model from the HuggingFace hub (in this case the M2V_base_output model) |
| 45 | +model_name = "minishlab/M2V_base_output" |
| 46 | +model = StaticModel.from_pretrained(model_name) |
| 47 | + |
| 48 | +# Make embeddings |
| 49 | +embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everyone."]) |
| 50 | +``` |
| 51 | + |
| 52 | +Alternatively, you can distill your own Model2Vec model from a Sentence Transformer model. The following code snippet shows how to distill a model: |
| 53 | +```python |
| 54 | +from model2vec.distill import distill |
| 55 | + |
| 56 | +# Choose a Sentence Transformer model |
| 57 | +model_name = "BAAI/bge-base-en-v1.5" |
| 58 | + |
| 59 | +# Distill the model |
| 60 | +m2v_model = distill(model_name=model_name, pca_dims=256) |
| 61 | + |
| 62 | +# Save the model |
| 63 | +m2v_model.save_pretrained("m2v_model") |
| 64 | +``` |
| 65 | + |
| 66 | +## What is Model2Vec? |
| 67 | +Model2Vec is a simple and effective method to distill any Sentence Transformer into static embeddings. It works by running a vocabulary through the specified Sentence Transformer model, reducing the dimensionality of the resulting embeddings with PCA, weighting them using Zipf weighting, and storing them in a static format. |
| 68 | + |
| 69 | +This technique creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on a number of relevant tasks. It is also much faster to create than traditional static embedding models such as GloVe, and it does not need a dataset. |
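
The steps described above (embed a vocabulary, reduce with PCA, apply Zipf weighting, store a static table) can be sketched with NumPy alone. This is a rough illustration under stated assumptions, not the library's implementation: random vectors stand in for Sentence Transformer outputs, row order is treated as frequency rank, the weight `log(1 + rank)` is an assumed form of Zipf weighting, and mean pooling is an assumed way to turn token vectors into a sentence vector. The helper names are hypothetical.

```python
import numpy as np


def pca_reduce(embeddings, dims):
    """Project embeddings onto their top `dims` principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dims].T


def zipf_weight(embeddings):
    """Scale each row by log(1 + rank); assumes row 0 is the most frequent token."""
    ranks = np.arange(1, embeddings.shape[0] + 1)
    weights = np.log(1 + ranks)  # assumed weighting formula
    return embeddings * weights[:, None]


def encode(token_ids, table):
    """Mean-pool the static vectors of the given token ids (assumed pooling)."""
    return table[token_ids].mean(axis=0)


rng = np.random.default_rng(0)
vocab = ["the", "a", "model", "vector", "distill", "static"]
# Stand-in for the Sentence Transformer output of each vocabulary item.
raw = rng.normal(size=(len(vocab), 16))

static_table = zipf_weight(pca_reduce(raw, dims=4))
sentence_vector = encode([0, 2, 5], static_table)
```

At inference time nothing but a table lookup and an average is needed, which is where the speed of static embeddings comes from.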
| 70 | + |
| 71 | + |
| 72 | +## Who is this for? |
| 73 | +Model2Vec allows anyone to create their own static embeddings from any Sentence Transformer model in minutes. It can easily be applied to other languages by using a language-specific Sentence Transformer model and vocabulary. Similarly, it can be applied to specific domains by using a domain-specific model, vocabulary, or both. This makes it an ideal tool for fast prototyping, research, and production use cases where speed and size are more important than raw task performance. |
| 74 | + |
| 75 | + |
| 76 | + |
| 77 | + |
| 78 | +## Usage |
| 79 | + |
| 80 | +### Distilling a Model2Vec model |
| 81 | + |
| 82 | +Distilling a model from the output embeddings of a Sentence Transformer model: |
| 83 | +```python |
| 84 | +from model2vec.distill import distill |
| 85 | + |
| 86 | +# Choose a Sentence Transformer model |
| 87 | +model_name = "BAAI/bge-base-en-v1.5" |
| 88 | + |
| 89 | +# Distill the model |
| 90 | +m2v_model = distill(model_name=model_name, pca_dims=256) |
| 91 | + |
| 92 | +# Save the model |
| 93 | +m2v_model.save_pretrained("m2v_model") |
| 94 | + |
| 95 | +``` |
| 96 | + |
| 97 | +Distilling with a custom vocabulary: |
| 98 | +```python |
| 99 | +from model2vec.distill import distill |
| 100 | + |
| 101 | +# Load a vocabulary as a list of strings |
| 102 | +vocabulary = ["word1", "word2", "word3"] |
| 103 | +# Choose a Sentence Transformer model |
| 104 | +model_name = "BAAI/bge-base-en-v1.5" |
| 105 | + |
| 106 | +# Distill the model with the custom vocabulary |
| 107 | +m2v_model = distill(model_name=model_name, vocabulary=vocabulary, pca_dims=256) |
| 108 | + |
| 109 | +# Save the model |
| 110 | +m2v_model.save_pretrained("m2v_model") |
| 111 | +``` |
| 112 | + |
| 113 | +Alternatively, the command line interface can be used to distill a model: |
| 114 | +```bash |
| 115 | +python3 -m model2vec.distill --model-name BAAI/bge-base-en-v1.5 --vocabulary-path vocab.txt --device mps --save-path model2vec_model |
| 116 | +``` |
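
The custom vocabulary passed to `distill` above is a plain list of strings. One possible way to build such a list from a domain corpus, sketched with the standard library only, is shown here. The `build_vocabulary` helper, its parameters, and the regex tokenizer are hypothetical and chosen for illustration; they are not part of Model2Vec.

```python
import re
from collections import Counter


def build_vocabulary(corpus, max_size=1000, min_count=1):
    """Return the most frequent lowercase word tokens in `corpus` as a list of strings."""
    counts = Counter()
    for document in corpus:
        # Simple regex tokenizer: runs of letters, digits, and apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", document.lower()))
    # Keep at most `max_size` words, dropping those seen fewer than `min_count` times.
    return [word for word, count in counts.most_common(max_size) if count >= min_count]


corpus = [
    "The patient was given aspirin.",
    "Aspirin reduced the fever of the patient.",
]
vocabulary = build_vocabulary(corpus, max_size=5, min_count=2)
```

The resulting list could then be passed as `vocabulary=vocabulary` in the custom-vocabulary example above.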
| 117 | + |
| 118 | +### Inferencing a Model2Vec model |
| 119 | +Inferencing with one of our flagship Model2Vec models: |
| 120 | +```python |
| 121 | +from model2vec import StaticModel |
| 122 | + |
| 123 | +# Load a model from the HuggingFace hub |
| 124 | +model_name = "minishlab/M2V_base_output" |
| 125 | +model = StaticModel.from_pretrained(model_name) |
| 126 | + |
| 127 | +# Make embeddings |
| 128 | +embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everyone."]) |
| 129 | +``` |
| 130 | + |
| 131 | + |
| 132 | +Inferencing with a saved Model2Vec model: |
| 133 | +```python |
| 134 | +from model2vec import StaticModel |
| 135 | + |
| 136 | +# Load a saved model |
| 137 | +model_name = "m2v_model" |
| 138 | +model = StaticModel.from_pretrained(model_name) |
| 139 | + |
| 140 | +# Make embeddings |
| 141 | +embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everyone."]) |
| 142 | +``` |
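
A common next step is comparing two embeddings. The sketch below computes cosine similarity in plain Python. It assumes that `encode` returns one vector per input sentence, which the snippets above suggest but do not state, and it uses stand-in vectors so it runs without downloading a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (any sequence of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice these would be two rows of `embeddings`.
first = [1.0, 2.0, 3.0]
second = [2.0, 4.0, 6.0]
similarity = cosine_similarity(first, second)
```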
| 143 | + |
| 144 | +### Evaluating a Model2Vec model |
| 145 | + |
| 146 | +Model2Vec models can be evaluated using our [evaluation package](https://github.com/MinishLab/evaluation). To run this, first install the optional evaluation package: |
| 147 | +```bash |
| 148 | +pip install "model2vec[evaluation]" |
| 149 | +``` |
| 150 | + |
| 151 | +Then, the following code snippet shows how to evaluate a Model2Vec model: |
| 152 | +```python |
| 153 | +from model2vec import StaticModel |
| 154 | + |
| 155 | +from evaluation import CustomMTEB, get_tasks, parse_mteb_results, make_leaderboard, summarize_results |
| 156 | +from mteb import ModelMeta |
| 157 | + |
| 158 | +# Get all available tasks |
| 159 | +tasks = get_tasks() |
| 160 | +# Define the CustomMTEB object with the specified tasks |
| 161 | +evaluation = CustomMTEB(tasks=tasks) |
| 162 | + |
| 163 | +# Load the model |
| 164 | +model_name = "m2v_model" |
| 165 | +model = StaticModel.from_pretrained(model_name) |
| 166 | + |
| 167 | +# Optionally, add model metadata in MTEB format |
| 168 | +model.mteb_model_meta = ModelMeta( |
| 169 | +    name=model_name, revision="no_revision_available", release_date=None, languages=None |
| 170 | +) |
| 171 | + |
| 172 | +# Run the evaluation |
| 173 | +results = evaluation.run(model, eval_splits=["test"], output_folder=f"results/{model_name}") |
| 174 | + |
| 175 | +# Parse the results and summarize them |
| 176 | +parsed_results = parse_mteb_results(mteb_results=results, model_name=model_name) |
| 177 | +task_scores = summarize_results(parsed_results) |
| 178 | +# Print the results in a leaderboard format |
| 179 | +print(make_leaderboard(task_scores)) |
| 180 | +``` |
| 181 | + |
| 182 | +## Results |
| 183 | + |
| 184 | +### Main Results |
| 185 | + |
| 186 | +Model2Vec is evaluated on MTEB, as well as two additional tasks: PEARL (a phrase representation task) and WordSim (a word similarity task). The results are shown in the table below. |
| 187 | + |
| 188 | + |
| 189 | + |
| 190 | + |
| 191 | +| Model | Avg (All) | Avg (MTEB) | Class | Clust | PairClass | Rank | Ret | STS | Sum | PEARL | WordSim | |
| 192 | +|------------------|-------------|------------|-------|-------|-----------|-------|-------|-------|-------|-------|---------| |
| 193 | +| all-MiniLM-L6-v2 | 56.08 | 56.09 | 62.62 | 41.94 | 82.37 | 58.04 | 41.95 | 78.90 | 30.81 | 60.83 | 49.91 | |
| 194 | +| M2V_base_glove | 48.58 | 47.60 | 61.35 | 30.52 | 75.34 | 48.50 | 29.26 | 70.31 | 31.50 | 50.28 | 54.29 | |
| 195 | +| M2V_base_output | 46.79 | 45.34 | 61.25 | 25.58 | 74.90 | 47.63 | 26.14 | 68.58 | 29.20 | 54.02 | 49.18 | |
| 196 | +| GloVe_300d | 42.84 | 42.36 | 57.31 | 27.66 | 72.48 | 43.30 | 22.78 | 61.90 | 28.81 | 45.65 | 43.05 | |
| 197 | +| WL256* | 48.88 | 49.36 | 58.98 | 33.34 | 74.00 | 52.03 | 33.12 | 73.34 | 29.05 | 48.81 | 45.16 | |
| 198 | + |
| 199 | +<details> |
| 200 | + <summary> Task Abbreviations </summary> |
| 201 | + |
| 202 | +For readability, the MTEB task names are abbreviated as follows: |
| 203 | +- Class: Classification |
| 204 | +- Clust: Clustering |
| 205 | +- PairClass: PairClassification |
| 206 | +- Rank: Reranking |
| 207 | +- Ret: Retrieval |
| 208 | +- STS: Semantic Textual Similarity |
| 209 | +- Sum: Summarization |
| 210 | +</details> |
| 211 | + |
| 212 | +\ |
| 213 | +\* WL256, introduced in the [WordLlama](https://github.com/dleemiller/WordLlama/tree/main) package, is included for comparison due to its similarities to Model2Vec. However, we believe it is heavily overfit to the MTEB dataset, since it is trained on datasets used in MTEB itself. This can be seen from the fact that the WL256 model performs worse than our models on the non-MTEB tasks (PEARL and WordSim). The results shown in the [Classification and Speed Benchmarks](#classification-and-speed-benchmarks) further support this. |
| 214 | + |
| 215 | +### Classification and Speed Benchmarks |
| 216 | + |
| 217 | +In addition to the MTEB evaluation, Model2Vec is evaluated on a number of classification datasets. These are used as additional analysis to avoid overfitting to the MTEB dataset and to benchmark the speed of the model. The results are shown in the table below. |
| 218 | + |
| 219 | +| Model | Average | sst2 | imdb | trec | ag_news | |
| 220 | +|:-----------------|----------:|---------:|-------:|---------:|----------:| |
| 221 | +| bge-base-en-v1.5 | 90.00 | 91.54 | 91.88 | 85.16 | 91.45 | |
| 222 | +| all-MiniLM-L6-v2 | 84.10 | 83.95 | 81.36 | 81.31 | 89.77 | |
| 223 | +| M2V_base_output | 82.23 | 80.92 | 84.56 | 75.27 | 88.17 | |
| 224 | +| M2V_base_glove | 80.76 | 83.07 | 85.24 | 66.12 | 88.61 | |
| 225 | +| WL256 | 78.48 | 76.88 | 80.12 | 69.23 | 87.68 | |
| 226 | +| GloVe_300d | 77.77 | 81.68 | 84.00 | 55.67 | 89.71 | |
| 227 | + |
| 228 | +As can be seen, the Model2Vec models outperform the GloVe and WL256 models on average, though not on every individual dataset, and are competitive with the all-MiniLM-L6-v2 model while being much faster. |
| 229 | + |
| 230 | +The scatterplot below shows the relationship between the number of sentences per second and the average classification score. The bubble sizes correspond to the number of parameters in the models (larger = more parameters), and the colors correspond to the sentences per second (greener = more sentences per second). This plot shows that the Model2Vec models are much faster than the other models, while still being competitive in terms of classification performance with the all-MiniLM-L6-v2 model. |
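
The text above does not describe how the sentences-per-second figures were measured. As a rough illustration only, and not the authors' benchmark, throughput for any `encode`-style callable could be estimated as follows. The `sentences_per_second` helper and the `dummy_encode` stand-in are hypothetical.

```python
import time


def sentences_per_second(encode, sentences, repeats=3):
    """Return the best observed throughput of `encode` over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(sentences)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a very fast call.
    return len(sentences) / max(best, 1e-9)


def dummy_encode(sentences):
    # Hypothetical stand-in for a real model's encode method.
    return [[float(len(s))] for s in sentences]


throughput = sentences_per_second(dummy_encode, ["a short sentence"] * 1000)
```

Taking the best of several runs reduces noise from warm-up and other processes; a real comparison would also fix the hardware, batch size, and input lengths across models.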
| 231 | + |
| 232 | + |
| 233 | + |
| 234 | +## Citing |
| 235 | + |
| 236 | +If you use Model2Vec in your research, please cite the following: |
| 237 | +```bibtex |
| 238 | +@software{minishlab2024model2vec, |
| 239 | + author = {Stephan Tulkens and Thomas van Dongen}, |
| 240 | + title = {Model2Vec: Turn any Sentence Transformer into a Small Fast Model}, |
| 241 | + year = {2024}, |
| 242 | + url = {https://github.com/MinishLab/model2vec}, |
| 243 | +} |
| 244 | +``` |