More F2LLM-v2 results #527
Model Results Comparison
Reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large
Results for codefuse-ai/F2LLM-v2-0.6B

| task_name | codefuse-ai/F2LLM-v2-0.6B | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.4109 | 0.4833 | 0.2643 | 0.6560 | Octen/Octen-Embedding-8B-INT8 | False |
| ARCChallenge | 0.1862 | nan | 0.1083 | 0.2668 | GritLM/GritLM-7B | False |
| AlphaNLI | 0.2736 | nan | 0.1359 | 0.4393 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| AmazonPolarityClassification | 0.9675 | nan | 0.9326 | 0.9774 | nvidia/NV-Embed-v2 | True |
| AmazonReviewsClassification | 0.5718 | nan | 0.4312 | 0.6880 | TencentBAC/Conan-embedding-v2 | False |
| ArxivClusteringP2P | 0.5396 | nan | 0.4473 | 0.6092 | TencentBAC/Conan-embedding-v2 | True |
| ArxivClusteringS2S | 0.4805 | nan | 0.3871 | 0.5520 | TencentBAC/Conan-embedding-v2 | True |
| BiorxivClusteringP2P | 0.5862 | nan | 0.355 | 0.5522 | TencentBAC/Conan-embedding-v2 | True |
| BiorxivClusteringS2S | 0.5363 | nan | 0.333 | 0.5092 | TencentBAC/Conan-embedding-v2 | True |
| BrightRetrieval | 0.1326 | nan | nan | 0.2720 | ByteDance-Seed/Seed1.5-Embedding | False |
| BuiltBenchClusteringP2P | 0.6025 | nan | 0.4869 | 0.6767 | Alibaba-NLP/gte-Qwen2-1.5B-instruct | False |
| BuiltBenchClusteringS2S | 0.4772 | nan | 0.3909 | 0.5766 | Salesforce/SFR-Embedding-2_R | False |
| BuiltBenchReranking | 0.6268 | nan | 0.6236 | 0.7653 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| BuiltBenchRetrieval | 0.6487 | nan | 0.6308 | 0.7687 | Linq-AI-Research/Linq-Embed-Mistral | False |
| CQADupstackAndroidRetrieval | 0.5421 | nan | 0.4904 | 0.7426 | voyageai/voyage-3-m-exp | False |
| CQADupstackEnglishRetrieval | 0.5125 | nan | 0.4581 | 0.6998 | voyageai/voyage-3-m-exp | False |
| CQADupstackGisRetrieval | 0.4365 | nan | 0.3695 | 0.6340 | voyageai/voyage-3-m-exp | False |
| CQADupstackMathematicaRetrieval | 0.3691 | nan | 0.2818 | 0.6948 | voyageai/voyage-3-m-exp | False |
| CQADupstackPhysicsRetrieval | 0.5350 | nan | 0.4366 | 0.7371 | voyageai/voyage-3-m-exp | False |
| CQADupstackProgrammersRetrieval | 0.4723 | nan | 0.416 | 0.6587 | voyageai/voyage-3-m-exp | False |
| CQADupstackRetrieval | 0.4573 | nan | 0.3967 | 0.6830 | voyageai/voyage-3-m-exp | False |
| CQADupstackStatsRetrieval | 0.4061 | nan | 0.3238 | 0.6242 | voyageai/voyage-3-m-exp | False |
| CQADupstackTexRetrieval | 0.3384 | nan | 0.2836 | 0.6295 | voyageai/voyage-3-m-exp | False |
| CQADupstackWebmastersRetrieval | 0.4398 | nan | 0.3988 | 0.6835 | voyageai/voyage-3-m-exp | False |
| CQADupstackWordpressRetrieval | 0.3654 | nan | 0.3164 | 0.5862 | voyageai/voyage-3-m-exp | False |
| ChatDoctorRetrieval | 0.7649 | 0.7352 | 0.5687 | 0.7722 | voyageai/voyage-4-large (embed_dim=2048) | False |
| ChemHotpotQARetrieval | 0.8069 | nan | 0.7979 | 0.9531 | infly/inf-retriever-v1 | False |
| ChemNQRetrieval | 0.5939 | nan | 0.6617 | 0.7046 | intfloat/multilingual-e5-small | False |
| ClimateFEVER | 0.4162 | nan | 0.2573 | 0.5693 | voyageai/voyage-3-m-exp | False |
| DBPedia | 0.4142 | nan | 0.413 | 0.5350 | nvidia/NV-Embed-v2 | True |
| DS1000Retrieval | 0.6507 | 0.6870 | nan | 0.7149 | google/gemini-embedding-2-preview | False |
| EmotionClassification | 0.9216 | nan | 0.4758 | 0.9387 | TencentBAC/Conan-embedding-v2 | True |
| FEVER | 0.9075 | nan | 0.8279 | 0.9628 | voyageai/voyage-3-m-exp | True |
| FinQARetrieval | 0.5181 | 0.6464 | nan | 0.8897 | voyageai/voyage-4-large (embed_dim=2048) | False |
| FinanceBenchRetrieval | 0.7673 | 0.9157 | nan | 0.9459 | Octen/Octen-Embedding-8B | False |
| FreshStackRetrieval | 0.3519 | 0.3979 | 0.2519 | 0.5776 | Octen/Octen-Embedding-8B | False |
| GerDaLIRSmall | 0.3084 | nan | 0.1572 | 0.5944 | mteb/baseline-bm25s | False |
| HC3FinanceRetrieval | 0.6081 | 0.7758 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HellaSwag | 0.2999 | nan | 0.2735 | 0.3966 | infly/inf-retriever-v1 | False |
| HotpotQA | 0.6522 | nan | 0.7122 | 0.8696 | voyageai/voyage-3-m-exp | True |
| HumanEvalRetrieval | 0.9623 | 0.9910 | nan | 1.0000 | google/gemini-embedding-2-preview | False |
| LEMBNarrativeQARetrieval | 0.5195 | nan | 0.2422 | 0.7690 | lightonai/GTE-ModernColBERT-v1 | False |
| LEMBNeedleRetrieval | 0.5875 | nan | 0.28 | 0.9325 | mteb/baseline-bm25s | False |
| LEMBQMSumRetrieval | 0.4522 | nan | 0.2426 | 0.8323 | mteb/baseline-bm25s | False |
| LEMBSummScreenFDRetrieval | 0.9691 | nan | 0.7112 | 0.9784 | mteb/baseline-bm25s | False |
| LEMBWikimQARetrieval | 0.8976 | nan | 0.568 | 0.9988 | lightonai/GTE-ModernColBERT-v1 | False |
| LeCaRDv2 | 0.7076 | nan | 0.5583 | 0.7777 | Mira190/Euler-Legal-Embedding-V1 | False |
| LegalBenchConsumerContractsQA | 0.7601 | nan | 0.733 | 0.8675 | voyageai/voyage-3 | False |
| LegalSummarization | 0.6391 | 0.7122 | 0.621 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.8848 | 0.9416 | nan | 0.9608 | voyageai/voyage-4-large (embed_dim=2048) | False |
| MSMARCO | 0.4134 | nan | 0.437 | 0.4812 | TencentBAC/Conan-embedding-v2 | True |
| MTOPIntentClassification | 0.9379 | nan | 0.672 | 0.9429 | BAAI/bge-multilingual-gemma2 | True |
| MedrxivClusteringP2P | 0.4821 | nan | 0.317 | 0.5153 | voyageai/voyage-3-m-exp | True |
| MedrxivClusteringS2S | 0.4556 | nan | 0.2976 | 0.4969 | TencentBAC/Conan-embedding-v2 | True |
| NQ | 0.6090 | nan | 0.6403 | 0.8248 | voyageai/voyage-3-m-exp | True |
| NanoArguAnaRetrieval | 0.5816 | nan | nan | 0.7739 | infly/inf-retriever-v1-1.5b | True |
| NanoClimateFeverRetrieval | 0.4831 | nan | nan | 0.4667 | infly/inf-retriever-v1-1.5b | False |
| NanoDBPediaRetrieval | 0.6176 | nan | nan | 0.7345 | infly/inf-retriever-v1 | True |
| NanoFEVERRetrieval | 0.9528 | nan | nan | 0.9759 | infly/inf-retriever-v1 | True |
| NanoFiQA2018Retrieval | 0.5823 | nan | nan | 0.6972 | infly/inf-retriever-v1 | True |
| NanoHotpotQARetrieval | 0.7897 | nan | nan | 0.9095 | infly/inf-retriever-v1 | True |
| NanoMSMARCORetrieval | 0.6912 | nan | nan | 0.7006 | infly/inf-retriever-v1 | True |
| NanoNFCorpusRetrieval | 0.3777 | nan | nan | 0.4710 | infly/inf-retriever-v1 | True |
| NanoNQRetrieval | 0.7045 | nan | nan | 0.7831 | infly/inf-retriever-v1 | True |
| NanoQuoraRetrieval | 0.9609 | nan | nan | 0.9728 | intfloat/multilingual-e5-small | False |
| NanoSCIDOCSRetrieval | 0.4215 | nan | nan | 0.5333 | infly/inf-retriever-v1 | False |
| NanoSciFactRetrieval | 0.7561 | nan | nan | 0.8632 | infly/inf-retriever-v1 | True |
| NanoTouche2020Retrieval | 0.5108 | nan | nan | 0.6953 | mteb/baseline-bm25s | False |
| PIQA | 0.3234 | nan | 0.2882 | 0.4544 | nvidia/NV-Embed-v2 | False |
| PubChemAISentenceParaphrasePC | 0.9492 | nan | 0.9664 | 0.9748 | sentence-transformers/multi-qa-mpnet-base-dot-v1 | False |
| PubChemSMILESBitextMining | 0.0067 | nan | 0.0021 | 0.0074 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| PubChemSMILESPC | 0.1373 | nan | 0.1077 | 0.1612 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| PubChemSynonymPC | 0.7339 | nan | 0.6396 | 0.7352 | openai/text-embedding-3-large | False |
| PubChemWikiPairClassification | 0.9639 | nan | 0.9452 | 0.9641 | bedrock/amazon-titan-embed-text-v2 | False |
| PubChemWikiParagraphsPC | 0.4636 | nan | 0.192 | 0.5127 | openai/text-embedding-3-large | False |
| Quail | 0.1592 | nan | 0.0485 | 0.2657 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| QuoraRetrieval | 0.8890 | nan | 0.8926 | 0.9235 | TencentBAC/Conan-embedding-v2 | False |
| RARbCode | 0.6932 | nan | 0.5891 | 0.9049 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| RARbMath | 0.9489 | nan | 0.6732 | 0.9420 | voyageai/voyage-3.5 | False |
| RedditClustering | 0.6033 | nan | 0.4691 | 0.7716 | voyageai/voyage-3-m-exp | True |
| RedditClusteringP2P | 0.6635 | nan | 0.63 | 0.7527 | NovaSearch/stella_en_1.5B_v5 | True |
| SDSEyeProtectionClassification | 0.7621 | nan | 0.7115 | 0.8299 | minishlab/potion-multilingual-128M | False |
| SDSGlovesClassification | 0.7382 | nan | 0.6371 | 0.7533 | sentence-transformers/static-similarity-mrl-multilingual-v1 | False |
| SIQA | 0.0442 | nan | 0.0536 | 0.0836 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| STS16 | 0.8504 | nan | 0.8579 | 0.9763 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS22 | 0.6644 | 0.7176 | 0.6365 | 0.8314 | OrdalieTech/Solon-embeddings-mini-beta-1.1 | True |
| SciDocsRR | 0.8510 | nan | 0.8422 | 0.9114 | TencentBAC/Conan-embedding-v2 | False |
| StackExchangeClustering | 0.7389 | nan | 0.5837 | 0.8395 | TencentBAC/Conan-embedding-v2 | True |
| StackExchangeClusteringP2P | 0.4449 | nan | 0.329 | 0.5157 | TencentBAC/Conan-embedding-v2 | True |
| StackOverflowDupQuestions | 0.4713 | nan | 0.5014 | 0.5904 | Qwen/Qwen3-Embedding-8B | True |
| SummEval | 0.3224 | nan | 0.2964 | 0.3360 | bigscience/sgpt-bloom-7b1-msmarco | False |
| TempReasonL2Context | 0.3071 | nan | 0.2975 | 0.6405 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| TempReasonL2Fact | 0.3512 | nan | 0.4296 | 0.6412 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| TempReasonL2Pure | 0.0265 | nan | 0.0205 | 0.1420 | GritLM/GritLM-8x7B | False |
| TempReasonL3Context | 0.2354 | nan | 0.2551 | 0.4766 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| TempReasonL3Fact | 0.2774 | nan | 0.3821 | 0.4739 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| TempReasonL3Pure | 0.0617 | nan | 0.0831 | 0.1666 | Linq-AI-Research/Linq-Embed-Mistral | False |
| Touche2020 | 0.2660 | nan | 0.2313 | 0.3939 | voyageai/voyage-3-m-exp | False |
| TwentyNewsgroupsClustering | 0.5468 | nan | 0.394 | 0.8349 | voyageai/voyage-3-m-exp | True |
| WikiSQLRetrieval | 0.9834 | 0.8814 | nan | 0.9892 | Octen/Octen-Embedding-8B | False |
| WikipediaBioMetChemClassification | 0.9894 | nan | 0.9877 | 0.9980 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| WikipediaBiolumNeurochemClassification | 0.9153 | nan | 0.9571 | 0.9847 | openai/text-embedding-3-large | False |
| WikipediaChemEngSpecialtiesClassification | 0.6524 | nan | 0.3202 | 0.7976 | bedrock/cohere-embed-english-v3 | False |
| WikipediaChemFieldsClassification | 0.5144 | nan | 0.4876 | 0.6020 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| WikipediaChemistryTopicsClassification | 0.6884 | nan | 0.8463 | 0.9366 | openai/text-embedding-3-large | False |
| WikipediaChemistryTopicsClustering | 0.3952 | nan | 0.652 | 0.7900 | openai/text-embedding-3-large | False |
| WikipediaCompChemSpectroscopyClassification | 0.7448 | nan | 0.7466 | 0.8258 | VPLabs/SearchMap_Preview | False |
| WikipediaCryobiologySeparationClassification | 0.8026 | nan | 0.9197 | 0.9631 | bedrock/amazon-titan-embed-text-v1 | False |
| WikipediaCrystallographyAnalyticalClassification | 0.9316 | nan | 0.9296 | 0.9842 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| WikipediaGreenhouseEnantiopureClassification | 0.9596 | nan | 0.9737 | 0.9890 | VPLabs/SearchMap_Preview | False |
| WikipediaIsotopesFissionClassification | 0.8476 | nan | 0.9071 | 0.9333 | openai/text-embedding-3-large | False |
| WikipediaLuminescenceClassification | 0.9000 | nan | 0.8793 | 0.9341 | bedrock/amazon-titan-embed-text-v1 | False |
| WikipediaOrganicInorganicClassification | 0.8266 | nan | 0.8856 | 0.9205 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| WikipediaSaltsSemiconductorsClassification | 0.8152 | nan | 0.8545 | 0.9242 | VPLabs/SearchMap_Preview | False |
| WikipediaSolidStateColloidalClassification | 0.7489 | nan | 0.7872 | 0.8550 | bedrock/amazon-titan-embed-text-v1 | False |
| WikipediaSpecialtiesInChemistryClustering | 0.2173 | nan | 0.0065 | 0.4695 | VPLabs/SearchMap_Preview | False |
| WikipediaTheoreticalAppliedClassification | 0.6869 | nan | 0.6316 | 0.6978 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| Average | 0.5873 | 0.7404 | 0.5018 | 0.7101 | nan | - |
The model scores above the listed max result on these tasks: RARbMath, BiorxivClusteringP2P, BiorxivClusteringS2S, NanoClimateFeverRetrieval
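
The task list above appears to be the set of rows where the submitted model's score is higher than the "Max result" column. A minimal sketch of that check follows, using a few rows copied from the 0.6B table. The helper names `parse_rows` and `tasks_above_max` are hypothetical and are not part of the report tooling, and the sketch assumes that "Max result" excludes the model under test.

```python
import math

# A few rows copied verbatim from the F2LLM-v2-0.6B table above.
# Columns: task, model score, gemini-embedding-001, multilingual-e5-large,
# max result, model with max result, in training data.
ROWS = """
| RARbMath | 0.9489 | nan | 0.6732 | 0.9420 | voyageai/voyage-3.5 | False |
| BiorxivClusteringP2P | 0.5862 | nan | 0.355 | 0.5522 | TencentBAC/Conan-embedding-v2 | True |
| NanoClimateFeverRetrieval | 0.4831 | nan | nan | 0.4667 | infly/inf-retriever-v1-1.5b | False |
| STS16 | 0.8504 | nan | 0.8579 | 0.9763 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
"""


def parse_rows(table: str) -> list[dict]:
    """Parse markdown table rows into dicts (hypothetical helper)."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        rows.append(
            {
                "task": cells[0],
                "model": float(cells[1]),  # float("nan") parses fine
                "max": float(cells[4]),
            }
        )
    return rows


def tasks_above_max(rows: list[dict]) -> list[str]:
    """Return tasks where the model score is strictly above the max result."""
    return [
        r["task"]
        for r in rows
        if not math.isnan(r["model"]) and r["model"] > r["max"]
    ]


beaten = tasks_above_max(parse_rows(ROWS))
print(beaten)
```

Comparisons against `nan` evaluate to false, so rows with a missing max result are never reported.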
Training datasets: ANLI, AmazonCounterfactualClassification, AmazonCounterfactualVNClassification, AmazonPolarityClassification, AmazonPolarityClassification.v2, AmazonPolarityVNClassification, AmazonQA, AmazonReviewClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, ArguAna-Fa, ArguAna-Fa.v2, ArguAna-NL, ArguAna-NL.v2, ArguAna-PL, ArguAna-VN, ArxivClusteringP2P, ArxivClusteringP2P.v2, ArxivClusteringS2S, Aya, BQ, BactrianXLanguageClassification, BactrianXTranslation, Banking77Classification, Banking77Classification.v2, Banking77VNClassification, BioASQ, BiorxivClusteringP2P, BiorxivClusteringP2P.v2, BiorxivClusteringS2S, BiorxivClusteringS2S.v2, CEDR, CLIRMatrix, CMCQA, CMNLI, CNNDM, COIG, COLIEE, CORD19, CSL, CoLA, CodeFeedbackMT, CodeFeedbackST, CodeSearchNet, CodeSearchNetCCR, CosQA, DBPedia, DBPedia-Fa, DBPedia-NL, DBPedia-PL, DBPedia-PLHardNegatives, DBPedia-VN, DBPediaHardNegatives, DBPediaHardNegatives.v2, DuReader, ELI5, ESCI, EmotionClassification, EmotionClassification.v2, EmotionVNClassification, Europarl, FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, FEVERNL, FiQA-PL, FiQA2018, FiQA2018-Fa, FiQA2018-Fa.v2, FiQA2018-NL, FiQA2018-VN, GooAQ, HUMEArxivClusteringP2P, HUMEEmotionClassification, HUMERedditClusteringP2P, HUMESTS12, HUMESTS22, HUMESTSBenchmark, HUMEToxicConversationsClassification, HUMETweetSentimentExtractionClassification, HealthCareMagic, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, HotpotQANL, HuatuoEncQA, HuatuoKGQA, ImdbClassification, ImdbClassification.v2, ImdbVNClassification, InfinityInstruct, KoAlpaca, KoAlpacaRealQA, KoMagpie, LCQMC, LCSTS, LLMRetrievalData, Lawzhidao, M2Lingual, MEDI2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MKQA, MLDR, MLSUMClustering, 
MLSUMRetrieval, MMARCO, MNLI, MQA, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MSciNLI, MTOPDomainClassification, MTOPDomainVNClassification, MTOPIntentClassification, MTOPIntentVNClassification, MURI, MailruQA, MassiveIntentClassification, MassiveIntentVNClassification, MassiveScenarioClassification, MassiveScenarioVNClassification, MedInstruct, MedMCQA, MedQA, MedQuAD, MedicalFlashcards, MedicalInstruction, MedicalQARu, MedrxivClusteringP2P, MedrxivClusteringP2P.v2, MedrxivClusteringS2S, MedrxivClusteringS2S.v2, MrTidyRetrieval, MrTyDiJaRetrievalLite, MultiAlpaca, MultiCPRECom, MultiCPRMedical, NFCorpus, NFCorpus-Fa, NFCorpus-NL, NFCorpus-NL.v2, NFCorpus-PL, NFCorpus-VN, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoArguAnaRetrieval, NanoDBPedia-VN, NanoDBPediaRetrieval, NanoFEVER-VN, NanoFEVERRetrieval, NanoFiQA2018Retrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNFCorpusRetrieval, NanoNQ-VN, NanoNQRetrieval, NanoSciFactRetrieval, NaturalReasoning, NordicClassification, NordicRetrieval, NordicSTS, NordicTextMatching, OASST2, OCNLI, OpenCodeGeneticInstruct, OpenCodeReasoning2, OpenOrca, PAQ, PQuAD, ParSQuAD, ParaCrawl, PawsX, PersianQA, ProCQA, PubMedQA, QBQTC, QQP, RedditClustering, RedditClustering-VN, RedditClustering.v2, RedditClusteringP2P, RedditClusteringP2P-VN, RedditClusteringP2P.v2, RefGPT, RuInstruct, RuSentimentClustering, S2ORC, SIB200, SNLI, SPECTER, SQuAD, STS12, STS22, STS22.v2, STSBenchmark, STSBenchmark-VN, SciFact, SciFact-Fa, SciFact-Fa.v2, SciFact-NL, SciFact-NL.v2, SciFact-PL, SciFact-VN, SentenceCompression, SiberianDataset, SimCLUE, StackExchange, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, StackExchangeClusteringP2P, StackExchangeClusteringP2P-VN, StackExchangeClusteringP2P.v2, StackExchangeDupQuestions, StackOverflowDupQuestions, 
StackOverflowDupQuestions-VN, StackOverflowQA, SyntheticText2SQL, T2Ranking, THUCNews, TNews, TNews.v2, ToxicConversationsClassification, ToxicConversationsClassification.v2, ToxicConversationsVNClassification, TriviaQA, TweetSentimentExtractionClassification, TweetSentimentExtractionClassification.v2, TweetSentimentExtractionVNClassification, TwentyNewsgroupsClustering, TwentyNewsgroupsClustering-VN, TwentyNewsgroupsClustering.v2, UNPC, Waimai, Waimai.v2, WebFAQ, WikiOmnia, WildChat, XCodeEvalCodeToCode, XCodeEvalNLToCode, XCodeEvalTranslation, XNLI, XSum, YahooAnswers, cMedQAv2, webMedQA
Results for codefuse-ai/F2LLM-v2-1.7B
| task_name | codefuse-ai/F2LLM-v2-1.7B | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AILACasedocs | 0.4182 | 0.4833 | 0.2643 | 0.6560 | Octen/Octen-Embedding-8B-INT8 | False |
| ARCChallenge | 0.2371 | nan | 0.1083 | 0.2668 | GritLM/GritLM-7B | False |
| AlphaNLI | 0.3063 | nan | 0.1359 | 0.4393 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| AmazonPolarityClassification | 0.9715 | nan | 0.9326 | 0.9774 | nvidia/NV-Embed-v2 | True |
| AmazonReviewsClassification | 0.5915 | nan | 0.4312 | 0.6880 | TencentBAC/Conan-embedding-v2 | False |
| ArxivClusteringP2P | 0.5515 | nan | 0.4473 | 0.6092 | TencentBAC/Conan-embedding-v2 | True |
| ArxivClusteringS2S | 0.5034 | nan | 0.3871 | 0.5520 | TencentBAC/Conan-embedding-v2 | True |
| BiorxivClusteringP2P | 0.6454 | nan | 0.355 | 0.5522 | TencentBAC/Conan-embedding-v2 | True |
| BiorxivClusteringS2S | 0.6117 | nan | 0.333 | 0.5092 | TencentBAC/Conan-embedding-v2 | True |
| BrightRetrieval | 0.1538 | nan | nan | 0.2720 | ByteDance-Seed/Seed1.5-Embedding | False |
| BuiltBenchClusteringP2P | 0.6701 | nan | 0.4869 | 0.6767 | Alibaba-NLP/gte-Qwen2-1.5B-instruct | False |
| BuiltBenchClusteringS2S | 0.5148 | nan | 0.3909 | 0.5766 | Salesforce/SFR-Embedding-2_R | False |
| BuiltBenchReranking | 0.6618 | nan | 0.6236 | 0.7653 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| BuiltBenchRetrieval | 0.6961 | nan | 0.6308 | 0.7687 | Linq-AI-Research/Linq-Embed-Mistral | False |
| CQADupstackAndroidRetrieval | 0.5648 | nan | 0.4904 | 0.7426 | voyageai/voyage-3-m-exp | False |
| CQADupstackEnglishRetrieval | 0.5435 | nan | 0.4581 | 0.6998 | voyageai/voyage-3-m-exp | False |
| CQADupstackGisRetrieval | 0.4570 | nan | 0.3695 | 0.6340 | voyageai/voyage-3-m-exp | False |
| CQADupstackMathematicaRetrieval | 0.4065 | nan | 0.2818 | 0.6948 | voyageai/voyage-3-m-exp | False |
| CQADupstackPhysicsRetrieval | 0.5687 | nan | 0.4366 | 0.7371 | voyageai/voyage-3-m-exp | False |
| CQADupstackProgrammersRetrieval | 0.4948 | nan | 0.416 | 0.6587 | voyageai/voyage-3-m-exp | False |
| CQADupstackRetrieval | 0.4855 | nan | 0.3967 | 0.6830 | voyageai/voyage-3-m-exp | False |
| CQADupstackStatsRetrieval | 0.4264 | nan | 0.3238 | 0.6242 | voyageai/voyage-3-m-exp | False |
| CQADupstackTexRetrieval | 0.3681 | nan | 0.2836 | 0.6295 | voyageai/voyage-3-m-exp | False |
| CQADupstackWebmastersRetrieval | 0.4651 | nan | 0.3988 | 0.6835 | voyageai/voyage-3-m-exp | False |
| CQADupstackWordpressRetrieval | 0.3876 | nan | 0.3164 | 0.5862 | voyageai/voyage-3-m-exp | False |
| ChatDoctorRetrieval | 0.7860 | 0.7352 | 0.5687 | 0.7722 | voyageai/voyage-4-large (embed_dim=2048) | False |
| ChemHotpotQARetrieval | 0.8439 | nan | 0.7979 | 0.9531 | infly/inf-retriever-v1 | False |
| ChemNQRetrieval | 0.6560 | nan | 0.6617 | 0.7046 | intfloat/multilingual-e5-small | False |
| ClimateFEVER | 0.4327 | nan | 0.2573 | 0.5693 | voyageai/voyage-3-m-exp | False |
| DBPedia | 0.4274 | nan | 0.413 | 0.5350 | nvidia/NV-Embed-v2 | True |
| DS1000Retrieval | 0.6646 | 0.6870 | nan | 0.7149 | google/gemini-embedding-2-preview | False |
| EmotionClassification | 0.9169 | nan | 0.4758 | 0.9387 | TencentBAC/Conan-embedding-v2 | True |
| FEVER | 0.9107 | nan | 0.8279 | 0.9628 | voyageai/voyage-3-m-exp | True |
| FinQARetrieval | 0.5648 | 0.6464 | nan | 0.8897 | voyageai/voyage-4-large (embed_dim=2048) | False |
| FinanceBenchRetrieval | 0.8155 | 0.9157 | nan | 0.9459 | Octen/Octen-Embedding-8B | False |
| FreshStackRetrieval | 0.3664 | 0.3979 | 0.2519 | 0.5776 | Octen/Octen-Embedding-8B | False |
| GerDaLIRSmall | 0.3898 | nan | 0.1572 | 0.5944 | mteb/baseline-bm25s | False |
| HC3FinanceRetrieval | 0.6862 | 0.7758 | nan | 0.8242 | nvidia/NV-Embed-v2 | False |
| HellaSwag | 0.3173 | nan | 0.2735 | 0.3966 | infly/inf-retriever-v1 | False |
| HotpotQA | 0.6789 | nan | 0.7122 | 0.8696 | voyageai/voyage-3-m-exp | True |
| HumanEvalRetrieval | 0.9797 | 0.9910 | nan | 1.0000 | google/gemini-embedding-2-preview | False |
| LEMBNarrativeQARetrieval | 0.5908 | nan | 0.2422 | 0.7690 | lightonai/GTE-ModernColBERT-v1 | False |
| LEMBNeedleRetrieval | 0.4700 | nan | 0.28 | 0.9325 | mteb/baseline-bm25s | False |
| LEMBQMSumRetrieval | 0.4829 | nan | 0.2426 | 0.8323 | mteb/baseline-bm25s | False |
| LEMBSummScreenFDRetrieval | 0.9776 | nan | 0.7112 | 0.9784 | mteb/baseline-bm25s | False |
| LEMBWikimQARetrieval | 0.9147 | nan | 0.568 | 0.9988 | lightonai/GTE-ModernColBERT-v1 | False |
| LeCaRDv2 | 0.7177 | nan | 0.5583 | 0.7777 | Mira190/Euler-Legal-Embedding-V1 | False |
| LegalBenchConsumerContractsQA | 0.7823 | nan | 0.733 | 0.8675 | voyageai/voyage-3 | False |
| LegalSummarization | 0.6614 | 0.7122 | 0.621 | 0.7921 | voyageai/voyage-3.5 | False |
| MBPPRetrieval | 0.9022 | 0.9416 | nan | 0.9608 | voyageai/voyage-4-large (embed_dim=2048) | False |
| MSMARCO | 0.4265 | nan | 0.437 | 0.4812 | TencentBAC/Conan-embedding-v2 | True |
| MTOPIntentClassification | 0.9459 | nan | 0.672 | 0.9429 | BAAI/bge-multilingual-gemma2 | True |
| MedrxivClusteringP2P | 0.5232 | nan | 0.317 | 0.5153 | voyageai/voyage-3-m-exp | True |
| MedrxivClusteringS2S | 0.4972 | nan | 0.2976 | 0.4969 | TencentBAC/Conan-embedding-v2 | True |
| NQ | 0.6436 | nan | 0.6403 | 0.8248 | voyageai/voyage-3-m-exp | True |
| NanoArguAnaRetrieval | 0.5546 | nan | nan | 0.7739 | infly/inf-retriever-v1-1.5b | True |
| NanoClimateFeverRetrieval | 0.4509 | nan | nan | 0.4667 | infly/inf-retriever-v1-1.5b | False |
| NanoDBPediaRetrieval | 0.6457 | nan | nan | 0.7345 | infly/inf-retriever-v1 | True |
| NanoFEVERRetrieval | 0.9352 | nan | nan | 0.9759 | infly/inf-retriever-v1 | True |
| NanoFiQA2018Retrieval | 0.6678 | nan | nan | 0.6972 | infly/inf-retriever-v1 | True |
| NanoHotpotQARetrieval | 0.8199 | nan | nan | 0.9095 | infly/inf-retriever-v1 | True |
| NanoMSMARCORetrieval | 0.6664 | nan | nan | 0.7006 | infly/inf-retriever-v1 | True |
| NanoNFCorpusRetrieval | 0.3622 | nan | nan | 0.4710 | infly/inf-retriever-v1 | True |
| NanoNQRetrieval | 0.7307 | nan | nan | 0.7831 | infly/inf-retriever-v1 | True |
| NanoQuoraRetrieval | 0.9670 | nan | nan | 0.9728 | intfloat/multilingual-e5-small | False |
| NanoSCIDOCSRetrieval | 0.4474 | nan | nan | 0.5333 | infly/inf-retriever-v1 | False |
| NanoSciFactRetrieval | 0.8154 | nan | nan | 0.8632 | infly/inf-retriever-v1 | True |
| NanoTouche2020Retrieval | 0.5169 | nan | nan | 0.6953 | mteb/baseline-bm25s | False |
| PIQA | 0.3478 | nan | 0.2882 | 0.4544 | nvidia/NV-Embed-v2 | False |
| PubChemAISentenceParaphrasePC | 0.9511 | nan | 0.9664 | 0.9748 | sentence-transformers/multi-qa-mpnet-base-dot-v1 | False |
| PubChemSMILESBitextMining | 0.0085 | nan | 0.0021 | 0.0074 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| PubChemSMILESPC | 0.1760 | nan | 0.1077 | 0.1612 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| PubChemSynonymPC | 0.7425 | nan | 0.6396 | 0.7352 | openai/text-embedding-3-large | False |
| PubChemWikiPairClassification | 0.9720 | nan | 0.9452 | 0.9641 | bedrock/amazon-titan-embed-text-v2 | False |
| PubChemWikiParagraphsPC | 0.5686 | nan | 0.192 | 0.5127 | openai/text-embedding-3-large | False |
| Quail | 0.1875 | nan | 0.0485 | 0.2657 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| QuoraRetrieval | 0.8939 | nan | 0.8926 | 0.9235 | TencentBAC/Conan-embedding-v2 | False |
| RARbCode | 0.7251 | nan | 0.5891 | 0.9049 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| RARbMath | 0.9645 | nan | 0.6732 | 0.9420 | voyageai/voyage-3.5 | False |
| RedditClustering | 0.6556 | nan | 0.4691 | 0.7716 | voyageai/voyage-3-m-exp | True |
| RedditClusteringP2P | 0.6867 | nan | 0.63 | 0.7527 | NovaSearch/stella_en_1.5B_v5 | True |
| SDSEyeProtectionClassification | 0.8195 | nan | 0.7115 | 0.8299 | minishlab/potion-multilingual-128M | False |
| SDSGlovesClassification | 0.7723 | nan | 0.6371 | 0.7533 | sentence-transformers/static-similarity-mrl-multilingual-v1 | False |
| SIQA | 0.0594 | nan | 0.0536 | 0.0836 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| STS16 | 0.8520 | nan | 0.8579 | 0.9763 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS22 | 0.6670 | 0.7176 | 0.6365 | 0.8314 | OrdalieTech/Solon-embeddings-mini-beta-1.1 | True |
| SciDocsRR | 0.8636 | nan | 0.8422 | 0.9114 | TencentBAC/Conan-embedding-v2 | False |
| StackExchangeClustering | 0.7764 | nan | 0.5837 | 0.8395 | TencentBAC/Conan-embedding-v2 | True |
| StackExchangeClusteringP2P | 0.4471 | nan | 0.329 | 0.5157 | TencentBAC/Conan-embedding-v2 | True |
| StackOverflowDupQuestions | 0.4912 | nan | 0.5014 | 0.5904 | Qwen/Qwen3-Embedding-8B | True |
| SummEval | 0.3114 | nan | 0.2964 | 0.3360 | bigscience/sgpt-bloom-7b1-msmarco | False |
| TempReasonL2Context | 0.3784 | nan | 0.2975 | 0.6405 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| TempReasonL2Fact | 0.4287 | nan | 0.4296 | 0.6412 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| TempReasonL2Pure | 0.0425 | nan | 0.0205 | 0.1420 | GritLM/GritLM-8x7B | False |
| TempReasonL3Context | 0.2727 | nan | 0.2551 | 0.4766 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| TempReasonL3Fact | 0.3234 | nan | 0.3821 | 0.4739 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| TempReasonL3Pure | 0.0839 | nan | 0.0831 | 0.1666 | Linq-AI-Research/Linq-Embed-Mistral | False |
| Touche2020 | 0.2684 | nan | 0.2313 | 0.3939 | voyageai/voyage-3-m-exp | False |
| TwentyNewsgroupsClustering | 0.5927 | nan | 0.394 | 0.8349 | voyageai/voyage-3-m-exp | True |
| WikiSQLRetrieval | 0.9884 | 0.8814 | nan | 0.9892 | Octen/Octen-Embedding-8B | False |
| WikipediaBioMetChemClassification | 0.9928 | nan | 0.9877 | 0.9980 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| WikipediaBiolumNeurochemClassification | 0.9204 | nan | 0.9571 | 0.9847 | openai/text-embedding-3-large | False |
| WikipediaChemEngSpecialtiesClassification | 0.7065 | nan | 0.3202 | 0.7976 | bedrock/cohere-embed-english-v3 | False |
| WikipediaChemFieldsClassification | 0.5514 | nan | 0.4876 | 0.6020 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| WikipediaChemistryTopicsClassification | 0.7677 | nan | 0.8463 | 0.9366 | openai/text-embedding-3-large | False |
| WikipediaChemistryTopicsClustering | 0.4075 | nan | 0.652 | 0.7900 | openai/text-embedding-3-large | False |
| WikipediaCompChemSpectroscopyClassificat |
Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
Checklist
mteb/models/model_implementations/, this can be as an API. Instructions on how to add a model can be found here.
Add results on MTEB(Law, v1), ChemTEB, RTEB, RAR-b, LongEmbed, NanoBEIR, and BuiltBench(eng). Relevant prompts are added to the implementation in embeddings-benchmark/mteb#4643.