add: ogma model results #525
Renames each axiotic__ogma-* sha subdir to the HF revision now pinned by mteb PR #4620 (commit 1887fc06), which switched the wrapper to use OgmaTokenizerFast for canonical special-token handling:

- ogma-micro: d9d323709a... -> c9a793dacd...
- ogma-mini: 300b6184ef... -> 580266301b...
- ogma-small: 9c3f997130... -> 761deba3f4...
- ogma-base: 7524c6e1b2... -> 6c9cd11d41...

Updates the revision field in each model_meta.json to match, and harmonizes n_parameters with the mteb wrapper (off-by-N counting was the only delta: ogma-small 8596544 -> 8596352, etc).

JSON task scores are unchanged: the encoding fix in mteb produces bit-identical output to the canonical model.embed() API on the new revisions, so the existing scores remain the canonical numbers.
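The rename-and-repin step described above could be scripted roughly as follows. This is a minimal sketch, not the script used for this PR: it assumes a layout of `<model_dir>/<revision>/model_meta.json` with `revision` and `n_parameters` keys, and `repin_revision` is a hypothetical helper name. The full revision hashes are abbreviated in the description, so they are passed in as arguments rather than hard-coded.

```python
import json
from pathlib import Path
from typing import Optional


def repin_revision(model_dir: Path, old_rev: str, new_rev: str,
                   n_parameters: Optional[int] = None) -> Path:
    """Rename <model_dir>/<old_rev> to <model_dir>/<new_rev> and update model_meta.json.

    Hypothetical helper: the directory layout and JSON keys are assumptions.
    Task score JSONs inside the directory are moved along but never modified.
    """
    src = model_dir / old_rev
    dst = model_dir / new_rev
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    src.rename(dst)

    meta_path = dst / "model_meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    meta["revision"] = new_rev
    if n_parameters is not None:
        # Only overwrite the parameter count when a corrected value is supplied.
        meta["n_parameters"] = n_parameters
    meta_path.write_text(json.dumps(meta, indent=2) + "\n", encoding="utf-8")
    return dst
```

Called once per model directory with the old and new full revision hashes, this leaves every task JSON byte-for-byte as it was, which is consistent with the claim that the scores are unchanged.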
Model Results Comparison
Reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large
Results for axiotic/ogma-base
| task_name | axiotic/ogma-base | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AmazonCounterfactualClassification | 0.7025 | 0.9200 | nan | 0.9903 | Bytedance/Seed1.6-embedding-1215 | False |
| AmazonPolarityClassification | 0.7985 | nan | 0.9326 | 0.9774 | nvidia/NV-Embed-v2 | False |
| AmazonReviewsClassification | 0.3943 | nan | nan | 0.6880 | TencentBAC/Conan-embedding-v2 | False |
| ArXivHierarchicalClusteringP2P | 0.5583 | 0.6492 | 0.5569 | 0.6869 | NovaSearch/jasper_en_vision_language_v1 | False |
| ArXivHierarchicalClusteringS2S | 0.5273 | 0.6384 | 0.5367 | 0.6548 | Qwen/Qwen3-Embedding-8B | False |
| ArguAna | 0.45 | 0.8644 | 0.5436 | 0.8979 | voyageai/voyage-3-m-exp | False |
| AskUbuntuDupQuestions | 0.5676 | 0.6424 | 0.5924 | 0.7528 | IEITYuan/Yuan-embedding-2.0-en | False |
| BIOSSES | 0.8415 | 0.8897 | 0.8457 | 0.9692 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| Banking77Classification | 0.7856 | 0.9427 | 0.7492 | 0.9427 | google/gemini-embedding-001 | False |
| BiorxivClusteringP2P | 0.3411 | nan | 0.355 | 0.5522 | TencentBAC/Conan-embedding-v2 | False |
| BiorxivClusteringS2S | 0.2634 | nan | 0.333 | 0.5092 | TencentBAC/Conan-embedding-v2 | False |
| CQADupstackAndroidRetrieval | 0.3728 | nan | 0.4904 | 0.7426 | voyageai/voyage-3-m-exp | False |
| CQADupstackEnglishRetrieval | 0.3491 | nan | 0.4581 | 0.6998 | voyageai/voyage-3-m-exp | False |
| CQADupstackGamingRetrieval | 0.4491 | 0.7068 | 0.587 | 0.8161 | IEITYuan/Yuan-embedding-2.0-en | False |
| CQADupstackGisRetrieval | 0.2965 | nan | 0.3695 | 0.6340 | voyageai/voyage-3-m-exp | False |
| CQADupstackMathematicaRetrieval | 0.249 | nan | 0.2818 | 0.6948 | voyageai/voyage-3-m-exp | False |
| CQADupstackPhysicsRetrieval | 0.3423 | nan | 0.4366 | 0.7371 | voyageai/voyage-3-m-exp | False |
| CQADupstackProgrammersRetrieval | 0.3353 | nan | 0.416 | 0.6587 | voyageai/voyage-3-m-exp | False |
| CQADupstackRetrieval | 0.3105 | nan | 0.3967 | 0.6830 | voyageai/voyage-3-m-exp | False |
| CQADupstackStatsRetrieval | 0.2666 | nan | 0.3238 | 0.6242 | voyageai/voyage-3-m-exp | False |
| CQADupstackTexRetrieval | 0.2177 | nan | 0.2836 | 0.6295 | voyageai/voyage-3-m-exp | False |
| CQADupstackUnixRetrieval | 0.2957 | 0.5369 | 0.3988 | 0.7198 | voyageai/voyage-3-m-exp | False |
| CQADupstackWebmastersRetrieval | 0.3133 | nan | 0.3988 | 0.6835 | voyageai/voyage-3-m-exp | False |
| CQADupstackWordpressRetrieval | 0.2386 | nan | 0.3164 | 0.5862 | voyageai/voyage-3-m-exp | False |
| ClimateFEVER | 0.2851 | nan | 0.2573 | 0.5693 | voyageai/voyage-3-m-exp | False |
| DBPedia | 0.3632 | nan | 0.413 | 0.5350 | nvidia/NV-Embed-v2 | False |
| EmotionClassification | 0.4769 | nan | 0.4758 | 0.9387 | TencentBAC/Conan-embedding-v2 | False |
| FEVER | 0.6027 | nan | 0.8279 | 0.9628 | voyageai/voyage-3-m-exp | False |
| FiQA2018 | 0.3259 | 0.6178 | 0.4381 | 0.8206 | ai-sage/Giga-Embeddings-instruct | False |
| HotpotQA | 0.5243 | nan | 0.7122 | 0.8696 | voyageai/voyage-3-m-exp | False |
| ImdbClassification | 0.7359 | 0.9498 | 0.8867 | 0.9737 | Qwen/Qwen3-Embedding-8B | False |
| MSMARCO | 0.3586 | nan | 0.437 | 0.4812 | TencentBAC/Conan-embedding-v2 | False |
| MTOPDomainClassification | 0.9047 | 0.9926 | 0.9097 | 0.9995 | voyageai/voyage-3-m-exp | False |
| MTOPIntentClassification | 0.6314 | nan | nan | 0.9551 | BAAI/bge-multilingual-gemma2 | False |
| MassiveIntentClassification | 0.6838 | 0.8871 | 0.6843 | 0.9194 | voyageai/voyage-3-m-exp | False |
| MassiveScenarioClassification | 0.7315 | 0.9191 | 0.7146 | 0.9930 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringP2P | 0.3202 | nan | 0.317 | 0.5153 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringS2S | 0.2922 | nan | 0.2976 | 0.4969 | TencentBAC/Conan-embedding-v2 | False |
| MindSmallReranking | 0.3062 | 0.3295 | 0.3024 | 0.3437 | Kingsoft-LLM/QZhou-Embedding | False |
| NFCorpus | 0.3035 | nan | 0.3398 | 0.5575 | TencentBAC/Conan-embedding-v2 | False |
| NQ | 0.5071 | nan | 0.6403 | 0.8248 | voyageai/voyage-3-m-exp | False |
| QuoraRetrieval | 0.6088 | nan | 0.8926 | 0.9235 | TencentBAC/Conan-embedding-v2 | False |
| RedditClustering | 0.4467 | nan | 0.4691 | 0.7716 | voyageai/voyage-3-m-exp | False |
| RedditClusteringP2P | 0.5367 | nan | 0.63 | 0.7527 | NovaSearch/stella_en_1.5B_v5 | False |
| SCIDOCS | 0.1637 | 0.2515 | 0.1745 | 0.5986 | IEITYuan/Yuan-embedding-2.0-en | False |
| SICK-R | 0.7981 | 0.8275 | 0.8023 | 0.9465 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS12 | 0.7603 | 0.8155 | 0.8002 | 0.9546 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS13 | 0.8505 | 0.8989 | 0.8155 | 0.9776 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS14 | 0.8097 | 0.8541 | 0.7772 | 0.9753 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS15 | 0.8688 | 0.9044 | 0.8931 | 0.9811 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS16 | 0.833 | nan | 0.8579 | 0.9763 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STSBenchmark | 0.8649 | 0.8908 | 0.8729 | 0.9504 | Kingsoft-LLM/QZhou-Embedding | False |
| SciDocsRR | 0.741 | nan | 0.8422 | 0.9114 | TencentBAC/Conan-embedding-v2 | False |
| SciFact | 0.5942 | nan | 0.702 | 0.8660 | openbmb/MiniCPM-Embedding | False |
| SprintDuplicateQuestions | 0.9491 | 0.9690 | 0.9314 | 0.9838 | Kingsoft-LLM/QZhou-Embedding | False |
| StackExchangeClustering | 0.5204 | nan | 0.5837 | 0.8395 | TencentBAC/Conan-embedding-v2 | False |
| StackExchangeClusteringP2P | 0.3414 | nan | 0.329 | 0.5157 | TencentBAC/Conan-embedding-v2 | False |
| StackOverflowDupQuestions | 0.4353 | nan | 0.5014 | 0.5904 | Qwen/Qwen3-Embedding-8B | False |
| SummEval | 0.2973 | nan | 0.2964 | 0.3360 | bigscience/sgpt-bloom-7b1-msmarco | False |
| TRECCOVID | 0.6701 | 0.8631 | 0.7115 | 0.9833 | IEITYuan/Yuan-embedding-2.0-en | False |
| Touche2020 | 0.2858 | nan | 0.2313 | 0.3939 | voyageai/voyage-3-m-exp | False |
| ToxicConversationsClassification | 0.6623 | 0.8875 | 0.6601 | 0.9759 | voyageai/voyage-3-m-exp | False |
| TweetSentimentExtractionClassification | 0.6204 | 0.6988 | 0.628 | 0.8823 | voyageai/voyage-3-m-exp | False |
| TwentyNewsgroupsClustering | 0.4163 | nan | 0.394 | 0.8349 | voyageai/voyage-3-m-exp | False |
| TwitterSemEval2015 | 0.7079 | 0.7917 | 0.7528 | 0.8946 | voyageai/voyage-large-2-instruct | False |
| TwitterURLCorpus | 0.855 | 0.8705 | 0.8583 | 0.9571 | TencentBAC/Conan-embedding-v2 | False |
| Average | 0.5191 | 0.7861 | 0.5661 | 0.7736 | nan | - |
Results for axiotic/ogma-micro
| task_name | axiotic/ogma-micro | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AmazonCounterfactualClassification | 0.6485 | 0.9289 | nan | 0.9893 | Bytedance/Seed1.6-embedding-1215 | False |
| AmazonPolarityClassification | 0.6763 | nan | 0.9326 | 0.9774 | nvidia/NV-Embed-v2 | False |
| AmazonReviewsClassification | 0.3523 | nan | nan | 0.6880 | TencentBAC/Conan-embedding-v2 | False |
| ArXivHierarchicalClusteringP2P | 0.5505 | 0.6492 | 0.5569 | 0.6869 | NovaSearch/jasper_en_vision_language_v1 | False |
| ArXivHierarchicalClusteringS2S | 0.5036 | 0.6384 | 0.5367 | 0.6548 | Qwen/Qwen3-Embedding-8B | False |
| ArguAna | 0.4194 | 0.8644 | 0.5436 | 0.8979 | voyageai/voyage-3-m-exp | False |
| AskUbuntuDupQuestions | 0.5594 | 0.6424 | 0.5924 | 0.7528 | IEITYuan/Yuan-embedding-2.0-en | False |
| BIOSSES | 0.7885 | 0.8897 | 0.8457 | 0.9692 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| Banking77Classification | 0.7003 | 0.9427 | 0.7492 | 0.9427 | google/gemini-embedding-001 | False |
| BiorxivClusteringP2P | 0.3105 | nan | 0.355 | 0.5522 | TencentBAC/Conan-embedding-v2 | False |
| BiorxivClusteringS2S | 0.202 | nan | 0.333 | 0.5092 | TencentBAC/Conan-embedding-v2 | False |
| CQADupstackAndroidRetrieval | 0.2614 | nan | 0.4904 | 0.7426 | voyageai/voyage-3-m-exp | False |
| CQADupstackEnglishRetrieval | 0.1982 | nan | 0.4581 | 0.6998 | voyageai/voyage-3-m-exp | False |
| CQADupstackGamingRetrieval | 0.3592 | 0.7068 | 0.587 | 0.8161 | IEITYuan/Yuan-embedding-2.0-en | False |
| CQADupstackGisRetrieval | 0.213 | nan | 0.3695 | 0.6340 | voyageai/voyage-3-m-exp | False |
| CQADupstackMathematicaRetrieval | 0.1454 | nan | 0.2818 | 0.6948 | voyageai/voyage-3-m-exp | False |
| CQADupstackPhysicsRetrieval | 0.2806 | nan | 0.4366 | 0.7371 | voyageai/voyage-3-m-exp | False |
| CQADupstackProgrammersRetrieval | 0.2433 | nan | 0.416 | 0.6587 | voyageai/voyage-3-m-exp | False |
| CQADupstackRetrieval | 0.2234 | nan | 0.3967 | 0.6830 | voyageai/voyage-3-m-exp | False |
| CQADupstackStatsRetrieval | 0.2158 | nan | 0.3238 | 0.6242 | voyageai/voyage-3-m-exp | False |
| CQADupstackTexRetrieval | 0.1504 | nan | 0.2836 | 0.6295 | voyageai/voyage-3-m-exp | False |
| CQADupstackUnixRetrieval | 0.2012 | 0.5369 | 0.3988 | 0.7198 | voyageai/voyage-3-m-exp | False |
| CQADupstackWebmastersRetrieval | 0.2343 | nan | 0.3988 | 0.6835 | voyageai/voyage-3-m-exp | False |
| CQADupstackWordpressRetrieval | 0.1779 | nan | 0.3164 | 0.5862 | voyageai/voyage-3-m-exp | False |
| ClimateFEVER | 0.206 | nan | 0.2573 | 0.5693 | voyageai/voyage-3-m-exp | False |
| DBPedia | 0.2727 | nan | 0.413 | 0.5350 | nvidia/NV-Embed-v2 | False |
| EmotionClassification | 0.3598 | nan | 0.4758 | 0.9387 | TencentBAC/Conan-embedding-v2 | False |
| FEVER | 0.6289 | nan | 0.8279 | 0.9628 | voyageai/voyage-3-m-exp | False |
| FiQA2018 | 0.1779 | 0.6178 | 0.4381 | 0.8206 | ai-sage/Giga-Embeddings-instruct | False |
| HotpotQA | 0.3875 | nan | 0.7122 | 0.8696 | voyageai/voyage-3-m-exp | False |
| ImdbClassification | 0.6525 | 0.9498 | 0.8867 | 0.9737 | Qwen/Qwen3-Embedding-8B | False |
| MSMARCO | 0.2178 | nan | 0.437 | 0.4812 | TencentBAC/Conan-embedding-v2 | False |
| MTOPDomainClassification | 0.8345 | 0.9927 | 0.9097 | 0.9995 | voyageai/voyage-3-m-exp | False |
| MTOPIntentClassification | 0.5172 | nan | nan | 0.9551 | BAAI/bge-multilingual-gemma2 | False |
| MassiveIntentClassification | 0.5875 | 0.8846 | 0.6804 | 0.9194 | voyageai/voyage-3-m-exp | False |
| MassiveScenarioClassification | 0.6684 | 0.9208 | 0.7178 | 0.9930 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringP2P | 0.3043 | nan | 0.317 | 0.5153 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringS2S | 0.2515 | nan | 0.2976 | 0.4969 | TencentBAC/Conan-embedding-v2 | False |
| MindSmallReranking | 0.301 | 0.3295 | 0.3024 | 0.3437 | Kingsoft-LLM/QZhou-Embedding | False |
| NFCorpus | 0.2383 | nan | 0.3398 | 0.5575 | TencentBAC/Conan-embedding-v2 | False |
| NQ | 0.2935 | nan | 0.6403 | 0.8248 | voyageai/voyage-3-m-exp | False |
| QuoraRetrieval | 0.4713 | nan | 0.8926 | 0.9235 | TencentBAC/Conan-embedding-v2 | False |
| RedditClustering | 0.3783 | nan | 0.4691 | 0.7716 | voyageai/voyage-3-m-exp | False |
| RedditClusteringP2P | 0.4691 | nan | 0.63 | 0.7527 | NovaSearch/stella_en_1.5B_v5 | False |
| SCIDOCS | 0.1197 | 0.2515 | 0.1745 | 0.5986 | IEITYuan/Yuan-embedding-2.0-en | False |
| SICK-R | 0.6997 | 0.8275 | 0.8023 | 0.9465 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS12 | 0.6762 | 0.8155 | 0.8002 | 0.9546 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS13 | 0.7693 | 0.8989 | 0.8155 | 0.9776 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS14 | 0.7441 | 0.8541 | 0.7772 | 0.9753 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS15 | 0.8184 | 0.9044 | 0.8931 | 0.9811 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS16 | 0.7759 | nan | 0.8579 | 0.9763 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STSBenchmark | 0.7782 | 0.8908 | 0.8729 | 0.9504 | Kingsoft-LLM/QZhou-Embedding | False |
| SciDocsRR | 0.7162 | nan | 0.8422 | 0.9114 | TencentBAC/Conan-embedding-v2 | False |
| SciFact | 0.4796 | nan | 0.702 | 0.8660 | openbmb/MiniCPM-Embedding | False |
| SprintDuplicateQuestions | 0.9348 | 0.9690 | 0.9314 | 0.9838 | Kingsoft-LLM/QZhou-Embedding | False |
| StackExchangeClustering | 0.4363 | nan | 0.5837 | 0.8395 | TencentBAC/Conan-embedding-v2 | False |
| StackExchangeClusteringP2P | 0.3344 | nan | 0.329 | 0.5157 | TencentBAC/Conan-embedding-v2 | False |
| StackOverflowDupQuestions | 0.4129 | nan | 0.5014 | 0.5904 | Qwen/Qwen3-Embedding-8B | False |
| SummEval | 0.3177 | nan | 0.2964 | 0.3360 | bigscience/sgpt-bloom-7b1-msmarco | False |
| TRECCOVID | 0.5952 | 0.8631 | 0.7115 | 0.9833 | IEITYuan/Yuan-embedding-2.0-en | False |
| Touche2020 | 0.2328 | nan | 0.2313 | 0.3939 | voyageai/voyage-3-m-exp | False |
| ToxicConversationsClassification | 0.6013 | 0.8875 | 0.6601 | 0.9759 | voyageai/voyage-3-m-exp | False |
| TweetSentimentExtractionClassification | 0.5453 | 0.6988 | 0.628 | 0.8823 | voyageai/voyage-3-m-exp | False |
| TwentyNewsgroupsClustering | 0.3159 | nan | 0.394 | 0.8349 | voyageai/voyage-3-m-exp | False |
| TwitterSemEval2015 | 0.6003 | 0.7917 | 0.7528 | 0.8946 | voyageai/voyage-large-2-instruct | False |
| TwitterURLCorpus | 0.8236 | 0.8705 | 0.8583 | 0.9571 | TencentBAC/Conan-embedding-v2 | False |
| Average | 0.4479 | 0.7864 | 0.5661 | 0.7736 | nan | - |
Results for axiotic/ogma-mini
| task_name | axiotic/ogma-mini | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AmazonCounterfactualClassification | 0.6501 | 0.9200 | nan | 0.9903 | Bytedance/Seed1.6-embedding-1215 | False |
| AmazonPolarityClassification | 0.7044 | nan | 0.9326 | 0.9774 | nvidia/NV-Embed-v2 | False |
| AmazonReviewsClassification | 0.3722 | nan | nan | 0.6880 | TencentBAC/Conan-embedding-v2 | False |
| ArXivHierarchicalClusteringP2P | 0.5434 | 0.6492 | 0.5569 | 0.6869 | NovaSearch/jasper_en_vision_language_v1 | False |
| ArXivHierarchicalClusteringS2S | 0.4988 | 0.6384 | 0.5367 | 0.6548 | Qwen/Qwen3-Embedding-8B | False |
| ArguAna | 0.4072 | 0.8644 | 0.5436 | 0.8979 | voyageai/voyage-3-m-exp | False |
| AskUbuntuDupQuestions | 0.5213 | 0.6424 | 0.5924 | 0.7528 | IEITYuan/Yuan-embedding-2.0-en | False |
| BIOSSES | 0.8 | 0.8897 | 0.8457 | 0.9692 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| Banking77Classification | 0.7293 | 0.9427 | 0.7492 | 0.9427 | google/gemini-embedding-001 | False |
| BiorxivClusteringP2P | 0.3011 | nan | 0.355 | 0.5522 | TencentBAC/Conan-embedding-v2 | False |
| BiorxivClusteringS2S | 0.2036 | nan | 0.333 | 0.5092 | TencentBAC/Conan-embedding-v2 | False |
| CQADupstackAndroidRetrieval | 0.3175 | nan | 0.4904 | 0.7426 | voyageai/voyage-3-m-exp | False |
| CQADupstackEnglishRetrieval | 0.211 | nan | 0.4581 | 0.6998 | voyageai/voyage-3-m-exp | False |
| CQADupstackGamingRetrieval | 0.4044 | 0.7068 | 0.587 | 0.8161 | IEITYuan/Yuan-embedding-2.0-en | False |
| CQADupstackGisRetrieval | 0.2396 | nan | 0.3695 | 0.6340 | voyageai/voyage-3-m-exp | False |
| CQADupstackMathematicaRetrieval | 0.1698 | nan | 0.2818 | 0.6948 | voyageai/voyage-3-m-exp | False |
| CQADupstackPhysicsRetrieval | 0.2932 | nan | 0.4366 | 0.7371 | voyageai/voyage-3-m-exp | False |
| CQADupstackProgrammersRetrieval | 0.2728 | nan | 0.416 | 0.6587 | voyageai/voyage-3-m-exp | False |
| CQADupstackRetrieval | 0.2482 | nan | 0.3967 | 0.6830 | voyageai/voyage-3-m-exp | False |
| CQADupstackStatsRetrieval | 0.2259 | nan | 0.3238 | 0.6242 | voyageai/voyage-3-m-exp | False |
| CQADupstackTexRetrieval | 0.1705 | nan | 0.2836 | 0.6295 | voyageai/voyage-3-m-exp | False |
| CQADupstackUnixRetrieval | 0.2314 | 0.5369 | 0.3988 | 0.7198 | voyageai/voyage-3-m-exp | False |
| CQADupstackWebmastersRetrieval | 0.2497 | nan | 0.3988 | 0.6835 | voyageai/voyage-3-m-exp | False |
| CQADupstackWordpressRetrieval | 0.193 | nan | 0.3164 | 0.5862 | voyageai/voyage-3-m-exp | False |
| ClimateFEVER | 0.2461 | nan | 0.2573 | 0.5693 | voyageai/voyage-3-m-exp | False |
| DBPedia | 0.2958 | nan | 0.413 | 0.5350 | nvidia/NV-Embed-v2 | False |
| EmotionClassification | 0.3907 | nan | 0.4758 | 0.9387 | TencentBAC/Conan-embedding-v2 | False |
| FEVER | 0.6983 | nan | 0.8279 | 0.9628 | voyageai/voyage-3-m-exp | False |
| FiQA2018 | 0.2072 | 0.6178 | 0.4381 | 0.8206 | ai-sage/Giga-Embeddings-instruct | False |
| HotpotQA | 0.4357 | nan | 0.7122 | 0.8696 | voyageai/voyage-3-m-exp | False |
| ImdbClassification | 0.6725 | 0.9498 | 0.8867 | 0.9737 | Qwen/Qwen3-Embedding-8B | False |
| MSMARCO | 0.2573 | nan | 0.437 | 0.4812 | TencentBAC/Conan-embedding-v2 | False |
| MTOPDomainClassification | 0.8533 | 0.9926 | 0.9097 | 0.9995 | voyageai/voyage-3-m-exp | False |
| MTOPIntentClassification | 0.5477 | nan | nan | 0.9551 | BAAI/bge-multilingual-gemma2 | False |
| MassiveIntentClassification | 0.6104 | 0.8871 | 0.6843 | 0.9194 | voyageai/voyage-3-m-exp | False |
| MassiveScenarioClassification | 0.6948 | 0.9191 | 0.7146 | 0.9930 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringP2P | 0.3088 | nan | 0.317 | 0.5153 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringS2S | 0.2535 | nan | 0.2976 | 0.4969 | TencentBAC/Conan-embedding-v2 | False |
| MindSmallReranking | 0.2968 | 0.3295 | 0.3024 | 0.3437 | Kingsoft-LLM/QZhou-Embedding | False |
| NFCorpus | 0.2554 | nan | 0.3398 | 0.5575 | TencentBAC/Conan-embedding-v2 | False |
| NQ | 0.3393 | nan | 0.6403 | 0.8248 | voyageai/voyage-3-m-exp | False |
| QuoraRetrieval | 0.5177 | nan | 0.8926 | 0.9235 | TencentBAC/Conan-embedding-v2 | False |
| RedditClustering | 0.395 | nan | 0.4691 | 0.7716 | voyageai/voyage-3-m-exp | False |
| RedditClusteringP2P | 0.4908 | nan | 0.63 | 0.7527 | NovaSearch/stella_en_1.5B_v5 | False |
| SCIDOCS | 0.138 | 0.2515 | 0.1745 | 0.5986 | IEITYuan/Yuan-embedding-2.0-en | False |
| SICK-R | 0.7183 | 0.8275 | 0.8023 | 0.9465 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS12 | 0.7193 | 0.8155 | 0.8002 | 0.9546 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS13 | 0.7927 | 0.8989 | 0.8155 | 0.9776 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS14 | 0.7596 | 0.8541 | 0.7772 | 0.9753 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS15 | 0.8306 | 0.9044 | 0.8931 | 0.9811 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS16 | 0.7904 | nan | 0.8579 | 0.9763 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STSBenchmark | 0.8057 | 0.8908 | 0.8729 | 0.9504 | Kingsoft-LLM/QZhou-Embedding | False |
| SciDocsRR | 0.6965 | nan | 0.8422 | 0.9114 | TencentBAC/Conan-embedding-v2 | False |
| SciFact | 0.5301 | nan | 0.702 | 0.8660 | openbmb/MiniCPM-Embedding | False |
| SprintDuplicateQuestions | 0.9496 | 0.9690 | 0.9314 | 0.9838 | Kingsoft-LLM/QZhou-Embedding | False |
| StackExchangeClustering | 0.4464 | nan | 0.5837 | 0.8395 | TencentBAC/Conan-embedding-v2 | False |
| StackExchangeClusteringP2P | 0.3341 | nan | 0.329 | 0.5157 | TencentBAC/Conan-embedding-v2 | False |
| StackOverflowDupQuestions | 0.3813 | nan | 0.5014 | 0.5904 | Qwen/Qwen3-Embedding-8B | False |
| SummEval | 0.3133 | nan | 0.2964 | 0.3360 | bigscience/sgpt-bloom-7b1-msmarco | False |
| TRECCOVID | 0.6142 | 0.8631 | 0.7115 | 0.9833 | IEITYuan/Yuan-embedding-2.0-en | False |
| Touche2020 | 0.2409 | nan | 0.2313 | 0.3939 | voyageai/voyage-3-m-exp | False |
| ToxicConversationsClassification | 0.6123 | 0.8875 | 0.6601 | 0.9759 | voyageai/voyage-3-m-exp | False |
| TweetSentimentExtractionClassification | 0.5713 | 0.6988 | 0.628 | 0.8823 | voyageai/voyage-3-m-exp | False |
| TwentyNewsgroupsClustering | 0.3364 | nan | 0.394 | 0.8349 | voyageai/voyage-3-m-exp | False |
| TwitterSemEval2015 | 0.6068 | 0.7917 | 0.7528 | 0.8946 | voyageai/voyage-large-2-instruct | False |
| TwitterURLCorpus | 0.8333 | 0.8705 | 0.8583 | 0.9571 | TencentBAC/Conan-embedding-v2 | False |
| Average | 0.4659 | 0.7861 | 0.5661 | 0.7736 | nan | - |
Results for axiotic/ogma-small
| task_name | axiotic/ogma-small | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AmazonCounterfactualClassification | 0.6964 | 0.9289 | nan | 0.9893 | Bytedance/Seed1.6-embedding-1215 | False |
| AmazonPolarityClassification | 0.7672 | nan | 0.9326 | 0.9774 | nvidia/NV-Embed-v2 | False |
| AmazonReviewsClassification | 0.39 | nan | nan | 0.6880 | TencentBAC/Conan-embedding-v2 | False |
| ArXivHierarchicalClusteringP2P | 0.5554 | 0.6492 | 0.5569 | 0.6869 | NovaSearch/jasper_en_vision_language_v1 | False |
| ArXivHierarchicalClusteringS2S | 0.5212 | 0.6384 | 0.5367 | 0.6548 | Qwen/Qwen3-Embedding-8B | False |
| ArguAna | 0.4232 | 0.8644 | 0.5436 | 0.8979 | voyageai/voyage-3-m-exp | False |
| AskUbuntuDupQuestions | 0.5508 | 0.6424 | 0.5924 | 0.7528 | IEITYuan/Yuan-embedding-2.0-en | False |
| BIOSSES | 0.8381 | 0.8897 | 0.8457 | 0.9692 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| Banking77Classification | 0.7738 | 0.9427 | 0.7492 | 0.9427 | google/gemini-embedding-001 | False |
| BiorxivClusteringP2P | 0.3328 | nan | 0.355 | 0.5522 | TencentBAC/Conan-embedding-v2 | False |
| BiorxivClusteringS2S | 0.2541 | nan | 0.333 | 0.5092 | TencentBAC/Conan-embedding-v2 | False |
| CQADupstackAndroidRetrieval | 0.368 | nan | 0.4904 | 0.7426 | voyageai/voyage-3-m-exp | False |
| CQADupstackEnglishRetrieval | 0.3301 | nan | 0.4581 | 0.6998 | voyageai/voyage-3-m-exp | False |
| CQADupstackGamingRetrieval | 0.4501 | 0.7068 | 0.587 | 0.8161 | IEITYuan/Yuan-embedding-2.0-en | False |
| CQADupstackGisRetrieval | 0.2819 | nan | 0.3695 | 0.6340 | voyageai/voyage-3-m-exp | False |
| CQADupstackMathematicaRetrieval | 0.2165 | nan | 0.2818 | 0.6948 | voyageai/voyage-3-m-exp | False |
| CQADupstackPhysicsRetrieval | 0.3383 | nan | 0.4366 | 0.7371 | voyageai/voyage-3-m-exp | False |
| CQADupstackProgrammersRetrieval | 0.3307 | nan | 0.416 | 0.6587 | voyageai/voyage-3-m-exp | False |
| CQADupstackRetrieval | 0.3004 | nan | 0.3967 | 0.6830 | voyageai/voyage-3-m-exp | False |
| CQADupstackStatsRetrieval | 0.263 | nan | 0.3238 | 0.6242 | voyageai/voyage-3-m-exp | False |
| CQADupstackTexRetrieval | 0.2056 | nan | 0.2836 | 0.6295 | voyageai/voyage-3-m-exp | False |
| CQADupstackUnixRetrieval | 0.2873 | 0.5369 | 0.3988 | 0.7198 | voyageai/voyage-3-m-exp | False |
| CQADupstackWebmastersRetrieval | 0.3025 | nan | 0.3988 | 0.6835 | voyageai/voyage-3-m-exp | False |
| CQADupstackWordpressRetrieval | 0.2309 | nan | 0.3164 | 0.5862 | voyageai/voyage-3-m-exp | False |
| ClimateFEVER | 0.2861 | nan | 0.2573 | 0.5693 | voyageai/voyage-3-m-exp | False |
| DBPedia | 0.3594 | nan | 0.413 | 0.5350 | nvidia/NV-Embed-v2 | False |
| EmotionClassification | 0.4522 | nan | 0.4758 | 0.9387 | TencentBAC/Conan-embedding-v2 | False |
| FEVER | 0.688 | nan | 0.8279 | 0.9628 | voyageai/voyage-3-m-exp | False |
| FiQA2018 | 0.3005 | 0.6178 | 0.4381 | 0.8206 | ai-sage/Giga-Embeddings-instruct | False |
| HotpotQA | 0.5157 | nan | 0.7122 | 0.8696 | voyageai/voyage-3-m-exp | False |
| ImdbClassification | 0.7249 | 0.9498 | 0.8867 | 0.9737 | Qwen/Qwen3-Embedding-8B | False |
| MSMARCO | 0.3431 | nan | 0.437 | 0.4812 | TencentBAC/Conan-embedding-v2 | False |
| MTOPDomainClassification | 0.9065 | 0.9927 | 0.9097 | 0.9995 | voyageai/voyage-3-m-exp | False |
| MTOPIntentClassification | 0.6081 | nan | nan | 0.9551 | BAAI/bge-multilingual-gemma2 | False |
| MassiveIntentClassification | 0.6636 | 0.8846 | 0.6804 | 0.9194 | voyageai/voyage-3-m-exp | False |
| MassiveScenarioClassification | 0.7278 | 0.9208 | 0.7178 | 0.9930 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringP2P | 0.3196 | nan | 0.317 | 0.5153 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringS2S | 0.2859 | nan | 0.2976 | 0.4969 | TencentBAC/Conan-embedding-v2 | False |
| MindSmallReranking | 0.3055 | 0.3295 | 0.3024 | 0.3437 | Kingsoft-LLM/QZhou-Embedding | False |
| NFCorpus | 0.3012 | nan | 0.3398 | 0.5575 | TencentBAC/Conan-embedding-v2 | False |
| NQ | 0.4672 | nan | 0.6403 | 0.8248 | voyageai/voyage-3-m-exp | False |
| QuoraRetrieval | 0.6055 | nan | 0.8926 | 0.9235 | TencentBAC/Conan-embedding-v2 | False |
| RedditClustering | 0.4394 | nan | 0.4691 | 0.7716 | voyageai/voyage-3-m-exp | False |
| RedditClusteringP2P | 0.526 | nan | 0.63 | 0.7527 | NovaSearch/stella_en_1.5B_v5 | False |
| SCIDOCS | 0.1587 | 0.2515 | 0.1745 | 0.5986 | IEITYuan/Yuan-embedding-2.0-en | False |
| SICK-R | 0.7875 | 0.8275 | 0.8023 | 0.9465 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS12 | 0.7562 | 0.8155 | 0.8002 | 0.9546 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS13 | 0.8404 | 0.8989 | 0.8155 | 0.9776 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS14 | 0.7994 | 0.8541 | 0.7772 | 0.9753 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS15 | 0.8577 | 0.9044 | 0.8931 | 0.9811 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS16 | 0.8252 | nan | 0.8579 | 0.9763 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STSBenchmark | 0.8554 | 0.8908 | 0.8729 | 0.9504 | Kingsoft-LLM/QZhou-Embedding | False |
| SciDocsRR | 0.7355 | nan | 0.8422 | 0.9114 | TencentBAC/Conan-embedding-v2 | False |
| SciFact | 0.6004 | nan | 0.702 | 0.8660 | openbmb/MiniCPM-Embedding | False |
| SprintDuplicateQuestions | 0.953 | 0.9690 | 0.9314 | 0.9838 | Kingsoft-LLM/QZhou-Embedding | False |
| StackExchangeClustering | 0.5022 | nan | 0.5837 | 0.8395 | TencentBAC/Conan-embedding-v2 | False |
| StackExchangeClusteringP2P | 0.3408 | nan | 0.329 | 0.5157 | TencentBAC/Conan-embedding-v2 | False |
| StackOverflowDupQuestions | 0.4285 | nan | 0.5014 | 0.5904 | Qwen/Qwen3-Embedding-8B | False |
| SummEval | 0.2959 | nan | 0.2964 | 0.3360 | bigscience/sgpt-bloom-7b1-msmarco | False |
| TRECCOVID | 0.6905 | 0.8631 | 0.7115 | 0.9833 | IEITYuan/Yuan-embedding-2.0-en | False |
| Touche2020 | 0.2676 | nan | 0.2313 | 0.3939 | voyageai/voyage-3-m-exp | False |
| ToxicConversationsClassification | 0.6558 | 0.8875 | 0.6601 | 0.9759 | voyageai/voyage-3-m-exp | False |
| TweetSentimentExtractionClassification | 0.6119 | 0.6988 | 0.628 | 0.8823 | voyageai/voyage-3-m-exp | False |
| TwentyNewsgroupsClustering | 0.3986 | nan | 0.394 | 0.8349 | voyageai/voyage-3-m-exp | False |
| TwitterSemEval2015 | 0.6849 | 0.7917 | 0.7528 | 0.8946 | voyageai/voyage-large-2-instruct | False |
| TwitterURLCorpus | 0.8494 | 0.8705 | 0.8583 | 0.9571 | TencentBAC/Conan-embedding-v2 | False |
| Average | 0.5111 | 0.7864 | 0.5661 | 0.7736 | nan | - |
Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
@sam-at-axiotic I tried to run your model, but got an error. It seems your model doesn't have
Thanks again @Samoed. The mteb fix is in embeddings-benchmark/mteb#4670. OgmaWrapper now inherits from AbsEncoder, which provides the default similarity / similarity_pairwise driven by ModelMeta.similarity_fn_name (cosine for all Ogma models). Verified locally with
@sam-at-axiotic I tried to run
Hi @Samoed, I have reproduced our score from a clean checkout (mteb 2.12.30 + PR #4670 applied as written, axiotic/ogma-base @ 6c9cd11d, CPU, torch 2.12, transformers 5.8.1): ArguAna nDCG@10 = 0.45 via mteb.evaluate(meta, get_task("ArguAna")). This matches our score JSONs. Can you share your pip freeze output and your torch/transformers versions, and confirm that ~/.cache/huggingface/hub/models--axiotic--ogma-base/snapshots/ contains only 6c9cd11d…? It is also worth running rm -rf ~/.cache/mteb and re-running: 0.296 isn't reachable by any (task-token × self-mask × dataset) combination we tested. Many thanks!
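For the snapshot check requested above, a small sketch along these lines may help. It assumes the standard Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>`) under the default `~/.cache/huggingface/hub` location; if `HF_HOME` or `HF_HUB_CACHE` is set, the directory will differ. `list_snapshots` and `only_revision_cached` are hypothetical helper names.

```python
from pathlib import Path


def list_snapshots(repo_id: str, hub_dir: Path) -> list[str]:
    """Return the revision names cached for repo_id under hub_dir (sorted)."""
    repo_dir = hub_dir / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p.name for p in snapshots.iterdir() if p.is_dir())


def only_revision_cached(repo_id: str, hub_dir: Path, rev_prefix: str) -> bool:
    """True when exactly one snapshot is cached and it starts with rev_prefix."""
    revisions = list_snapshots(repo_id, hub_dir)
    return len(revisions) == 1 and revisions[0].startswith(rev_prefix)


if __name__ == "__main__":
    # Assumed default cache location; adjust if HF_HOME / HF_HUB_CACHE is set.
    hub = Path.home() / ".cache" / "huggingface" / "hub"
    print(list_snapshots("axiotic/ogma-base", hub))
    print(only_revision_cached("axiotic/ogma-base", hub, "6c9cd11d"))
```

If more than one revision is listed, a stale snapshot could be what is being loaded, which would be one possible explanation for a score mismatch.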
Yes, I reran it and got
OgmaWrapper was declared as a bare class, so isinstance(model, EncoderProtocol) returns False and RetrievalEvaluator rejects it with TypeError. Inheriting from AbsEncoder picks up the default similarity / similarity_pairwise implementations driven by ModelMeta.similarity_fn_name (already set to COSINE for all Ogma models). No change to encoding behaviour. Reported by @Samoed on embeddings-benchmark/results#525.
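The failure mode described above can be illustrated with a self-contained sketch. It assumes the protocol check behaves like a `typing.runtime_checkable` Protocol; `EncoderLike`, `DefaultSimilarityBase`, `BareWrapper` and `FixedWrapper` are hypothetical stand-ins, not mteb's `EncoderProtocol`, `AbsEncoder` or `OgmaWrapper`, and the toy `encode` is only there to make the example runnable.

```python
import math
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class EncoderLike(Protocol):
    """Stand-in protocol: isinstance() passes only if all three methods exist."""

    def encode(self, texts: Sequence[str]) -> list[list[float]]: ...
    def similarity(self, a: list[list[float]], b: list[list[float]]) -> list[list[float]]: ...
    def similarity_pairwise(self, a: list[list[float]], b: list[list[float]]) -> list[float]: ...


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


class DefaultSimilarityBase:
    """Stand-in base class supplying default cosine similarity methods."""

    def similarity(self, a, b):
        return [[_cosine(u, v) for v in b] for u in a]

    def similarity_pairwise(self, a, b):
        return [_cosine(u, v) for u, v in zip(a, b)]


class BareWrapper:
    """Declares encode() only, like a wrapper written as a bare class."""

    def encode(self, texts):
        return [[float(len(t)), 1.0] for t in texts]


class FixedWrapper(DefaultSimilarityBase):
    """Same encode(), but inherits the default similarity methods."""

    def encode(self, texts):
        return [[float(len(t)), 1.0] for t in texts]


if __name__ == "__main__":
    print(isinstance(BareWrapper(), EncoderLike))   # missing similarity methods
    print(isinstance(FixedWrapper(), EncoderLike))  # all protocol methods present
```

The two wrappers share an identical `encode`, so inheriting the defaults changes only whether the protocol check passes, not the embeddings produced.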
Adds MTEB result JSONs for the Axiotic Ogma models.
AI assistance disclosure: initial file organization was prepared with AI assistance; I reviewed the submitted files and take responsibility for this PR.
I confirm the result-submission integrity declaration is made by me as the human submitter.
Checklist
mteb/models/model_implementations/, this can be as an API. Instruction on how to add a model can be found here