Commit f4157f1

docs: Added model list (#16)
* Added model list
* Updated word
* Added description about tokenizer
* Small update to evaluation example
* Updated docs
1 parent d5cd938 commit f4157f1

1 file changed

README.md

Lines changed: 12 additions & 3 deletions
@@ -11,6 +11,7 @@
 - [Distilling a Model2Vec model](#distilling-a-model2vec-model)
 - [Inferencing a Model2Vec model](#inferencing-a-model2vec-model)
 - [Evaluating a Model2Vec model](#evaluating-a-model2vec-model)
+- [Model List](#model-list)
 - [Results](#results)
 - [Citing](#citing)
 
@@ -64,7 +65,7 @@ m2v_model.save_pretrained("m2v_model")
 ```
 
 ## What is Model2Vec?
-Model2Vec is a simple and effective method to distill any sentence transformer into static embeddings. It works by inferencing a vocabulary with the specified Sentence Transformer model, reducing the dimensionality of the embeddings using PCA, weighting the embeddings using zipf weighting, and storing the embeddings in a static format.
+Model2Vec is a simple and effective method to distill any sentence transformer into static embeddings. It works by inferencing a vocabulary with the specified Sentence Transformer model, reducing the dimensionality of the embeddings using PCA, weighting the embeddings using zipf weighting, and storing the embeddings in a static format. When a vocabulary is passed, a word-level tokenizer is created on the fly based on the vocabulary. When output embeddings are used, the subword tokenizer from the Sentence Transformer is used.
 
 This technique creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on a a number of relevent tasks, while being much faster to create than traditional static embedding models such as GloVe, without need for a dataset.
 
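The PCA and Zipf-weighting steps named in the "What is Model2Vec?" paragraph above can be sketched roughly as follows. This is an illustrative NumPy sketch under stated assumptions, not Model2Vec's actual implementation: the helper names `pca_reduce` and `zipf_weight` are hypothetical, the random matrix stands in for token embeddings produced by a Sentence Transformer, and the weighting formula `log(1 + rank)` assumes the vocabulary rows are sorted from most to least frequent.

```python
import numpy as np


def pca_reduce(embeddings: np.ndarray, dims: int) -> np.ndarray:
    """Project embeddings onto their top `dims` principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dims].T


def zipf_weight(embeddings: np.ndarray) -> np.ndarray:
    """Scale row i by log(1 + rank), with rank starting at 1.

    Assumption: rows are ordered from most to least frequent token, so
    frequent tokens receive smaller weights than rare ones.
    """
    ranks = np.arange(1, embeddings.shape[0] + 1)
    return embeddings * np.log(1 + ranks)[:, None]


# Stand-in for per-token output of a Sentence Transformer (1000 tokens, 64 dims).
rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(1000, 64))

reduced = pca_reduce(token_vectors, dims=16)
static_table = zipf_weight(reduced)  # the "static" embedding table
```

The resulting `static_table` is a plain lookup matrix with one row per vocabulary entry, which is what makes the distilled model cheap to store and query.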
@@ -143,7 +144,7 @@ embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to ever
 
 ### Evaluating a Model2Vec model
 
-Model2Vec models can be evaluated using our [evaluation package](https://github.com/MinishLab/evaluation). To run this, first install the optionall evaluation package:
+Model2Vec models can be evaluated using our [evaluation package](https://github.com/MinishLab/evaluation). To run this, first install the optional evaluation package:
 ```bash
 pip install evaluation@git+https://github.com/MinishLab/evaluation@main
 ```
@@ -170,15 +171,23 @@ model.mteb_model_meta = ModelMeta(
 )
 
 # Run the evaluation
-results = evaluation.run(model, eval_splits=["test"], output_folder=f"results/{model_name}")
+results = evaluation.run(model, eval_splits=["test"], output_folder=f"results")
 
 # Parse the results and summarize them
 parsed_results = parse_mteb_results(mteb_results=results, model_name=model_name)
 task_scores = summarize_results(parsed_results)
+
 # Print the results in a leaderboard format
 print(make_leaderboard(task_scores))
 ```
 
+## Model List
+
+
+| Model | Language | Description | Vocab | Sentence Transformer | Params |
+|------------------------|-------------|-----------------------------------------------------------------------|----------------|-----------------------|--------------|
+| [M2V_base_glove](https://huggingface.co/minishlab/M2V_base_glove) | English | Flagship embedding model based on GloVe vocab. | GloVe | [bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 102M |
+| [M2V_base_output](https://huggingface.co/minishlab/M2V_base_output) | English | Flagship embedding model based on bge-base-en-v1.5 vocab. Uses a subword tokenizer. | Output | [bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 7.5M |
 ## Results
 
 ### Main Results