Model2Vec is a simple and effective method to distill any Sentence Transformer into static embeddings. It works by passing a vocabulary through the specified Sentence Transformer model to obtain one embedding per vocabulary item, reducing the dimensionality of these embeddings with PCA, weighting them with Zipf weighting, and storing the result in a static format. When a vocabulary is passed, a word-level tokenizer is created on the fly from that vocabulary; when output embeddings are used, the Sentence Transformer's own subword tokenizer is used instead.

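The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not Model2Vec's implementation: `fake_encode` is a hypothetical stand-in for the Sentence Transformer (it returns random vectors), the vocabulary is assumed to be sorted from most to least frequent, and the weight `log(1 + rank)` is one plausible form of Zipf weighting rather than the library's exact formula.

```python
import numpy as np


def fake_encode(tokens, dim=32, seed=0):
    """Hypothetical stand-in for a Sentence Transformer: one random vector per token."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(tokens), dim))


def pca_reduce(embeddings, n_components):
    """Project embeddings onto their top `n_components` principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def zipf_weights(n_tokens):
    """Assumed Zipf-style weight log(1 + rank); rank 1 is the most frequent token."""
    ranks = np.arange(1, n_tokens + 1)
    return np.log1p(ranks)


def distill(vocab, n_components=8):
    """Embed each vocabulary item, reduce with PCA, reweight, and build a lookup table."""
    raw = fake_encode(vocab)
    reduced = pca_reduce(raw, n_components)
    weighted = reduced * zipf_weights(len(vocab))[:, None]
    # "Static format" here is simply a token -> vector dictionary (an assumption).
    return {token: weighted[i] for i, token in enumerate(vocab)}


# Vocabulary assumed to be sorted from most to least frequent.
vocab = ["the", "of", "and", "to", "a", "in", "is", "dangerous", "alone", "secret"]
table = distill(vocab, n_components=8)
print(len(table), table["alone"].shape)  # 10 (8,)
```

Under Zipf's law a token's frequency falls roughly as 1/rank, so a weight that grows with rank down-weights very frequent, less informative tokens such as "the" relative to rarer ones such as "dangerous".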
This technique creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on a number of relevant tasks, while being much faster to create than traditional static embedding models such as GloVe, and without the need for a dataset.
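Inference with a static model needs no forward pass through the Sentence Transformer, only tokenization, a table lookup and pooling, which is largely why such models are fast. The pure-Python sketch below illustrates the word-level tokenizer path described above. It is hypothetical: `build_word_tokenizer` and `encode` are not Model2Vec APIs, the toy vectors are made up, and both mean pooling and silently dropping out-of-vocabulary words are assumptions.

```python
import re
from typing import Callable, List


def build_word_tokenizer(vocab: List[str]) -> Callable[[str], List[int]]:
    """Hypothetical word-level tokenizer created on the fly from a vocabulary."""
    index = {word: i for i, word in enumerate(vocab)}

    def tokenize(text: str) -> List[int]:
        # Lowercase and split on word characters. Out-of-vocabulary words are
        # dropped here (an assumption; a real tokenizer may map them to an UNK token).
        words = re.findall(r"\w+", text.lower())
        return [index[w] for w in words if w in index]

    return tokenize


def encode(text: str, tokenize: Callable[[str], List[int]],
           table: List[List[float]]) -> List[float]:
    """Embed a sentence as the mean of its tokens' static vectors (assumed mean pooling)."""
    ids = tokenize(text)
    dim = len(table[0])
    if not ids:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    summed = [0.0] * dim
    for i in ids:
        for d, value in enumerate(table[i]):
            summed[d] += value
    return [s / len(ids) for s in summed]


vocab = ["it", "dangerous", "to", "go", "alone", "secret"]
# Toy 2-d static vectors, one row per vocabulary entry (made-up values).
table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0], [0.0, 0.0], [2.0, 2.0]]
tokenize = build_word_tokenizer(vocab)
print(tokenize("It's dangerous to go alone!"))  # [0, 1, 2, 3, 4]
print(encode("It's dangerous to go alone!", tokenize, table))  # [0.5, 0.5]
```

The regex splits "It's" into "it" and "s"; "s" is not in the toy vocabulary and is dropped, which is the kind of edge case a real word-level tokenizer has to handle explicitly.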
### Evaluating a Model2Vec model
Model2Vec models can be evaluated using our [evaluation package](https://github.com/MinishLab/evaluation). To run this, first install the optional evaluation package:

| Model | Language | Description | Vocab | Sentence Transformer | Params |
|---|---|---|---|---|---|
|[M2V_base_glove](https://huggingface.co/minishlab/M2V_base_glove)| English | Flagship embedding model based on GloVe vocab. | GloVe |[bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)| 102M |
|[M2V_base_output](https://huggingface.co/minishlab/M2V_base_output)| English | Flagship embedding model based on bge-base-en-v1.5 vocab. Uses a subword tokenizer. | Output |[bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)| 7.5M |