|
2 | 2 |
|
3 | 3 | **Model2Vec** is a method to distill a small, fast model from any Sentence Transformer model. |
4 | 4 |
|
| 5 | +|  | |
| 6 | +|:--:| |
| 7 | +|*Model2Vec allows you to create small, very fast models that still perform well.*| |
| 8 | + |
| 9 | + |
5 | 10 | ## Table of Contents |
6 | | -- [Main Features](#main-features) |
7 | 11 | - [Quickstart](#quickstart) |
8 | 12 | - [What is Model2Vec?](#what-is-model2vec) |
| 13 | +- [Main Features](#main-features) |
9 | 14 | - [Who is this for?](#who-is-this-for) |
10 | 15 | - [Usage](#usage) |
11 | 16 | - [Distilling a Model2Vec model](#distilling-a-model2vec-model) |
|
15 | 20 | - [Results](#results) |
16 | 21 | - [Citing](#citing) |
17 | 22 |
|
18 | | -## Main Features |
19 | | -- **Small**: Model2Vec can reduce the size of a Sentence Transformer model by a factor of 15 *. |
20 | | -- **Fast distillation**: Model2Vec can distill a Sentence Transformer model in ~5 minutes on CPU *. |
21 | | -- **Fast inference**: Model2Vec creates static embeddings that are up to 500 times * faster than the original model. |
22 | | -- **State-of-the-art static embedding performance**: Model2Vec outperforms traditional static embeddings by a large margin on a number of benchmarks. |
23 | | -- **No data needed**: Distillation happens directly on the token level, so no dataset is needed. |
24 | | -- **Simple to use**: Model2Vec provides an easy to use interface for distilling and inferencing Model2Vec models. |
25 | | -- **Bring your own model**: Model2Vec can be applied to any Sentence Transformer model. |
26 | | -- **Bring your own vocabulary**: Model2Vec can be applied to any vocabulary, allowing you to use your own domain-specific vocabulary. |
27 | | -- **Multi-lingual**: Model2Vec can easily be applied to any language. |
28 | | -- **Tightly integrated with HuggingFace hub**: Model2Vec models can be easily shared and loaded from the HuggingFace hub. Our models can be found [here](https://huggingface.co/minishlab). |
29 | | -- **Easy Evaluation**: Model2Vec comes with a set of evaluation tasks to measure the performance of the distilled model. |
30 | | - |
31 | | -\* Based on the [bge-base-en-v1.5 model](https://huggingface.co/BAAI/bge-base-en-v1.5). |
32 | | - |
33 | | - |
34 | 23 | ## Quickstart |
35 | 24 |
|
36 | 25 | Install the package with: |
@@ -69,6 +58,20 @@ Model2Vec is a simple and effective method to distill any sentence transformer i |
69 | 58 |
|
70 | 59 | This technique creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on a number of relevant tasks, while being much faster to create than traditional static embedding models such as GloVe and requiring no dataset. |
71 | 60 |
|
| 61 | +## Main Features |
| 62 | +- **Small**: Model2Vec can reduce the size of a Sentence Transformer model by a factor of 15 *. |
| 63 | +- **Fast distillation**: Model2Vec can distill a Sentence Transformer model in ~5 minutes on CPU *. |
| 64 | +- **Fast inference**: Model2Vec creates static embeddings that are up to 500 times * faster than the original model. |
| 65 | +- **State-of-the-art static embedding performance**: Model2Vec outperforms traditional static embeddings by a large margin on a number of benchmarks. |
| 66 | +- **No data needed**: Distillation happens directly on the token level, so no dataset is needed. |
| 67 | +- **Simple to use**: Model2Vec provides an easy-to-use interface for distilling Model2Vec models and running inference with them. |
| 68 | +- **Bring your own model**: Model2Vec can be applied to any Sentence Transformer model. |
| 69 | +- **Bring your own vocabulary**: Model2Vec can be applied to any vocabulary, allowing you to use your own domain-specific vocabulary. |
| 70 | +- **Multi-lingual**: Model2Vec can easily be applied to any language. |
| 71 | +- **Tightly integrated with HuggingFace hub**: Model2Vec models can be easily shared and loaded from the HuggingFace hub. Our models can be found [here](https://huggingface.co/minishlab). |
| 72 | +- **Easy Evaluation**: Model2Vec comes with a set of evaluation tasks to measure the performance of the distilled model. |
| 73 | + |
| 74 | +\* Based on the [bge-base-en-v1.5 model](https://huggingface.co/BAAI/bge-base-en-v1.5). |
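
As a rough illustration of why static embeddings are fast at inference time: encoding a sentence reduces to one table lookup per token followed by pooling, with no forward pass through a transformer. The sketch below is a toy under stated assumptions, not the Model2Vec implementation. It assumes mean pooling, whitespace tokenization, and a made-up three-word vocabulary; `TOKEN_VECTORS`, `DIM`, and `embed` are hypothetical names that do not come from the Model2Vec API.

```python
from typing import Dict, List

# Hypothetical precomputed per-token vectors. In a real setting these would
# come out of the distillation step; the values here are invented.
TOKEN_VECTORS: Dict[str, List[float]] = {
    "fast": [1.0, 0.0, 0.0],
    "small": [0.0, 1.0, 0.0],
    "model": [0.0, 0.0, 1.0],
}
DIM = 3


def embed(sentence: str) -> List[float]:
    """Embed a sentence by averaging the static vectors of its known tokens."""
    # Assumption: lowercase whitespace splitting stands in for a real tokenizer.
    tokens = sentence.lower().split()
    vectors = [TOKEN_VECTORS[t] for t in tokens if t in TOKEN_VECTORS]
    if not vectors:
        # No known tokens: fall back to a zero vector.
        return [0.0] * DIM
    # Mean pooling: average each dimension across the token vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    print(embed("fast model"))
```

Because every token vector is fixed ahead of time, the cost per sentence grows only with the number of tokens, which is consistent with the speed-up described in the feature list.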
72 | 75 |
|
73 | 76 | ## Who is this for? |
74 | 77 | Model2Vec allows anyone to create their own static embeddings from any Sentence Transformer model in minutes. It can easily be applied to other languages by using a language-specific Sentence Transformer model and vocabulary. Similarly, it can be applied to specific domains by using a domain-specific model, vocabulary, or both. This makes it an ideal tool for fast prototyping, research, and production use cases where speed and size matter more than peak task performance. |
@@ -238,7 +241,9 @@ As can be seen, the Model2Vec models outperform the GloVe and WL256 models on a |
238 | 241 |
|
239 | 242 | The scatterplot below shows the relationship between the number of sentences per second and the average classification score. The bubble sizes correspond to the number of parameters in the models (larger = more parameters), and the colors correspond to the sentences per second (greener = more sentences per second). This plot shows that the Model2Vec models are much faster than the other models, while still being competitive in terms of classification performance with the all-MiniLM-L6-v2 model. |
240 | 243 |
|
241 | | - |
| 244 | +|  | |
| 245 | +|:--:| |
| 246 | +|*Figure: The average accuracy over all classification datasets plotted against sentences per second. The circle size indicates model size.*| |
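
The x-axis quantity in the plot, sentences per second, can be estimated with a small timing harness. The following is a minimal sketch, assuming the encoder is any callable that takes a batch of strings. `sentences_per_second` and `dummy_encode` are hypothetical helpers written for this example, and this is not necessarily how the reported numbers were measured.

```python
import time
from typing import Callable, List, Sequence


def sentences_per_second(
    encode: Callable[[Sequence[str]], object],
    sentences: List[str],
    repeats: int = 3,
) -> float:
    """Return the best observed throughput of `encode` over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(sentences)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on very coarse clocks.
    return len(sentences) / max(best, 1e-9)


def dummy_encode(batch: Sequence[str]) -> List[List[float]]:
    """Stand-in encoder for demonstration; replace with a real model's encode."""
    return [[float(len(s))] for s in batch]


if __name__ == "__main__":
    rate = sentences_per_second(dummy_encode, ["an example sentence"] * 1000)
    print(f"{rate:.0f} sentences/second")
```

Taking the best of several runs reduces the influence of warm-up and scheduling noise; for a fair comparison, every model should be timed on the same sentences and the same hardware.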
242 | 247 |
|
243 | 248 | ## Citing |
244 | 249 |
|
|