Add a mixture of results #529

Draft
KennethEnevoldsen wants to merge 6 commits into main from add-more-more-res

KennethEnevoldsen (Contributor) commented May 9, 2026

This PR adds various results.

The target is

The goal is not to finish everything, but simply to add a large part of the results in one go.

Continuation of #528

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

Kenneth added 6 commits May 9, 2026 15:11

github-actions Bot commented May 9, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5, Qwen/Qwen3-Embedding-0.6B, google/gemini-embedding-2-preview, google/siglip-base-patch16-224, manveertamber/cadet-embed-base-v1
Tasks: AfriSentiClassification, BUCC.v2, BibleNLPBitextMining, BornholmBitextMining, BrightAopsRetrieval, BrightBiologyLongRetrieval, BrightBiologyRetrieval, BrightEarthScienceLongRetrieval, BrightEarthScienceRetrieval, BrightEconomicsLongRetrieval, BrightEconomicsRetrieval, BrightLeetcodeRetrieval, BrightPonyLongRetrieval, BrightPonyRetrieval, BrightPsychologyLongRetrieval, BrightPsychologyRetrieval, BrightRoboticsLongRetrieval, BrightRoboticsRetrieval, BrightStackoverflowLongRetrieval, BrightStackoverflowRetrieval, BrightSustainableLivingLongRetrieval, BrightSustainableLivingRetrieval, BrightTheoremQAQuestionsRetrieval, BrightTheoremQATheoremsRetrieval, BulgarianStoreReviewSentimentClassfication, CzechProductReviewSentimentClassification, DBpediaClassification, DiaBlaBitextMining, EstonianValenceClassification, FEVERHardNegatives, FilipinoShopeeReviewsClassification, FinancialPhrasebankClassification, FloresBitextMining, GreekLegalCodeClassification, GujaratiNewsClassification, IN22GenBitextMining, IndicGenBenchFloresBitextMining, IndonesianIdClickbaitClassification, ItaCaseholdClassification, KorSarcasmClassification, KurdishSentimentClassification, LccSentimentClassification, MacedonianTweetSentimentClassification, NTREXBitextMining, NollySentiBitextMining, NorwegianCourtsBitextMining, NusaTranslationBitextMining, NusaXBitextMining, PoemSentimentClassification, SentimentAnalysisHindi, Tatoeba, ToxicConversationsClassification, TweetSentimentClassification, TweetTopicSingleClassification

Results for KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5

| task_name | KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| TweetSentimentClassification | 0.4935 | 0.503 | 0.6570 | codefuse-ai/F2LLM-v2-14B | False |
| Average | 0.4935 | 0.503 | 0.6570 | nan | - |

Training datasets: ATEC, AmazonCounterfactualClassification, AmazonCounterfactualVNClassification, AmazonPolarityClassification, AmazonPolarityClassification.v2, AmazonPolarityVNClassification, AmazonReviewsClassification, AmazonReviewsVNClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArxivClusteringP2P, ArxivClusteringP2P.v2, ArxivClusteringS2S, BQ, Banking77Classification, Banking77Classification.v2, Banking77VNClassification, BiorxivClusteringP2P, BiorxivClusteringP2P.v2, BiorxivClusteringS2S, BiorxivClusteringS2S.v2, CQADupstack, CodeFeedbackMT, CodeFeedbackST, ContractNLIConfidentialityOfAgreementLegalBenchClassification, ContractNLIExplicitIdentificationLegalBenchClassification, ContractNLIInclusionOfVerballyConveyedInformationLegalBenchClassification, ContractNLILimitedUseLegalBenchClassification, ContractNLINoLicensingLegalBenchClassification, ContractNLINoticeOnCompelledDisclosureLegalBenchClassification, ContractNLIPermissibleAcquirementOfSimilarInformationLegalBenchClassification, ContractNLIPermissibleCopyLegalBenchClassification, ContractNLIPermissibleDevelopmentOfSimilarInformationLegalBenchClassification, ContractNLIPermissiblePostAgreementPossessionLegalBenchClassification, ContractNLIReturnOfConfidentialInformationLegalBenchClassification, ContractNLISharingWithEmployeesLegalBenchClassification, ContractNLISharingWithThirdPartiesLegalBenchClassification, ContractNLISurvivalOfObligationsLegalBenchClassification, DBPedia, DBPedia-Fa, DBPedia-NL, DBPedia-PL, DBPedia-PLHardNegatives, DBPedia-VN, DBPediaHardNegatives, DBPediaHardNegatives.v2, ESCIReranking, EmotionClassification, EmotionClassification.v2, EmotionVNClassification, FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, FiQA-PL, FiQA2018, FiQA2018-Fa, FiQA2018-Fa.v2, FiQA2018-NL, FiQA2018-VN, HUMEArxivClusteringP2P, HUMEEmotionClassification, HUMEToxicConversationsClassification, 
HUMETweetSentimentExtractionClassification, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, ImdbClassification, ImdbClassification.v2, ImdbVNClassification, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MTOPDomainClassification, MTOPDomainVNClassification, MTOPIntentClassification, MTOPIntentVNClassification, MassiveIntentClassification, MassiveIntentVNClassification, MassiveScenarioClassification, MassiveScenarioVNClassification, MedrxivClusteringP2P, MedrxivClusteringP2P.v2, MedrxivClusteringS2S, MedrxivClusteringS2S.v2, MrTidyRetrieval, MrTyDiJaRetrievalLite, MultiLongDocReranking, MultiLongDocRetrieval, MultilingualSentiment, MultilingualSentiment.v2, NFCorpus, NFCorpus-Fa, NFCorpus-NL, NFCorpus-NL.v2, NFCorpus-PL, NFCorpus-VN, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoDBPedia-VN, NanoDBPediaRetrieval, NanoFEVER-VN, NanoFEVERRetrieval, NanoFiQA2018Retrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNFCorpusRetrieval, NanoNQ-VN, NanoNQRetrieval, NanoQuoraRetrieval, NanoSciFactRetrieval, PawsXPairClassification, Quora-NL, Quora-PL, Quora-PLHardNegatives, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, QuoraRetrievalHardNegatives, QuoraRetrievalHardNegatives.v2, Reddit-Clustering, Reddit-Clustering-P2P, SciFact, SciFact-Fa, SciFact-Fa.v2, SciFact-NL, SciFact-NL.v2, SciFact-PL, SciFact-VN, Stackexchange-Clustering, Stackexchange-Clustering-P2P, TRECCOVID, TRECCOVID-Fa, TRECCOVID-Fa.v2, TRECCOVID-NL, TRECCOVID-PL, TRECCOVID-VN, ToxicConversationsClassification, ToxicConversationsClassification.v2, ToxicConversationsVNClassification, 
TweetSentimentExtractionClassification, TweetSentimentExtractionClassification.v2, TweetSentimentExtractionVNClassification, TwentyNewsgroups-Clustering, YahooAnswersTopicsClassification, YahooAnswersTopicsClassification.v2, mMARCO-NL


Results for Qwen/Qwen3-Embedding-0.6B

| task_name | Qwen/Qwen3-Embedding-0.6B | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| FEVERHardNegatives | 0.8726 | 0.8898 | 0.8379 | 0.9453 | ByteDance-Seed/Seed1.5-Embedding | True |
| TweetSentimentClassification | 0.4813 | nan | 0.503 | 0.6570 | codefuse-ai/F2LLM-v2-14B | False |
| Average | 0.677 | 0.8898 | 0.6704 | 0.8012 | nan | - |
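
The comment does not say how the Average row is computed. The numbers in this table are consistent with a per-column mean that skips missing (nan) entries, so the sketch below reproduces them under that assumption. The helper name `nan_mean` is hypothetical and is not taken from the reporting workflow.

```python
import math


def nan_mean(values):
    """Mean of the values that are not NaN; NaN if nothing is present."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


# Scores copied from the Qwen/Qwen3-Embedding-0.6B table
# (row order: FEVERHardNegatives, TweetSentimentClassification).
columns = {
    "Qwen/Qwen3-Embedding-0.6B": [0.8726, 0.4813],
    "google/gemini-embedding-001": [0.8898, math.nan],
    "intfloat/multilingual-e5-large": [0.8379, 0.503],
    "Max result": [0.9453, 0.6570],
}

# Assumption: each average is taken only over the tasks that have a score.
averages = {name: nan_mean(scores) for name, scores in columns.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.4f}")
```

If that assumption holds, the 0.8898 average for google/gemini-embedding-001 covers a single task, so it is not directly comparable with the averages in the other columns.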

Training datasets: CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Retrieval
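
The "In Training Data" column is not explained in the comment. The values shown are consistent with an exact-name lookup of each evaluated task in the model's "Training datasets" list. A minimal sketch under that assumption follows; the variable names are hypothetical and only a few list entries are reproduced.

```python
# A few entries copied from the "Training datasets" line above (not the full list).
training_datasets = {"FEVER", "FEVERHardNegatives", "HotpotQA", "MSMARCO", "NQ"}

# Tasks evaluated for Qwen/Qwen3-Embedding-0.6B in this comment.
evaluated_tasks = ["FEVERHardNegatives", "TweetSentimentClassification"]

# Assumption: a task counts as "in training data" only on an exact name match.
in_training_data = {task: task in training_datasets for task in evaluated_tasks}
print(in_training_data)
```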


Results for google/gemini-embedding-2-preview

| task_name | google/gemini-embedding-001 | google/gemini-embedding-2-preview | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| BUCC.v2 | 0.9899 | 1.0000 | 0.9878 | 0.9905 | codefuse-ai/F2LLM-v2-8B | False |
| BibleNLPBitextMining | 0.2072 | 0.5298 | 0.1665 | 0.9899 | deepvk/USER-bge-m3 | False |
| BornholmBitextMining | 0.5169 | 0.9167 | 0.4416 | 0.7798 | jinaai/jina-embeddings-v5-text-small | False |
| BulgarianStoreReviewSentimentClassfication | 0.7813 | 0.7319 | 0.6385 | 0.8159 | microsoft/harrier-oss-v1-27b | False |
| CzechProductReviewSentimentClassification | 0.6816 | 0.6435 | 0.5714 | 0.7667 | Bytedance/Seed1.6-embedding-1215 | False |
| DBpediaClassification | 0.9476 | 0.8865 | 0.8828 | 0.9926 | Qwen/Qwen3-Embedding-4B | False |
| DiaBlaBitextMining | 0.8723 | 0.9963 | 0.8483 | 0.8882 | codefuse-ai/F2LLM-v2-14B | False |
| EstonianValenceClassification | 0.5352 | 0.4581 | 0.4289 | 0.6764 | microsoft/harrier-oss-v1-27b | False |
| FilipinoShopeeReviewsClassification | 0.4845 | 0.4243 | 0.3527 | 0.5279 | microsoft/harrier-oss-v1-27b | False |
| FinancialPhrasebankClassification | 0.8864 | 0.8447 | 0.8394 | 0.9519 | microsoft/harrier-oss-v1-0.6b | False |
| FloresBitextMining | 0.8371 | 0.9824 | 0.8108 | 0.9087 | SamilPwC-AXNode-GenAI/PwC-Embedding_expr | False |
| GreekLegalCodeClassification | 0.4376 | 0.3386 | 0.3713 | 0.8052 | Bytedance/Seed1.6-embedding-1215 | False |
| GujaratiNewsClassification | 0.9205 | 0.9010 | 0.7674 | 0.9343 | Bytedance/Seed1.6-embedding-1215 | False |
| IN22GenBitextMining | 0.9375 | 0.9953 | 0.7675 | 0.9375 | google/gemini-embedding-001 | False |
| IndicGenBenchFloresBitextMining | 0.9677 | 0.9690 | 0.8875 | 0.9881 | Sailesh97/Hinvec | False |
| IndonesianIdClickbaitClassification | 0.67 | 0.6256 | 0.6122 | 0.7560 | nvidia/llama-embed-nemotron-8b | False |
| LccSentimentClassification | 0.6993 | 0.5933 | 0.594 | 0.7687 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| NTREXBitextMining | 0.9364 | 0.9881 | 0.914 | 0.9592 | microsoft/harrier-oss-v1-27b | False |
| NollySentiBitextMining | 0.6871 | 0.7837 | 0.675 | 0.8376 | microsoft/harrier-oss-v1-27b | False |
| NorwegianCourtsBitextMining | 0.9342 | 1.0000 | 0.9404 | 0.9481 | jinaai/jina-embeddings-v5-text-nano | False |
| NusaTranslationBitextMining | 0.7752 | 0.9316 | 0.672 | 0.9222 | Qwen/Qwen3-Embedding-8B | False |
| NusaXBitextMining | 0.8252 | 0.8393 | 0.7267 | 0.9056 | Bytedance/Seed1.6-embedding-1215 | False |
| PoemSentimentClassification | 0.5966 | 0.4756 | 0.5067 | 0.8642 | Bytedance/Seed1.6-embedding-1215 | False |
| SentimentAnalysisHindi | 0.7606 | 0.5818 | 0.642 | 0.8070 | microsoft/harrier-oss-v1-27b | False |
| Tatoeba | 0.8197 | 0.9947 | 0.7573 | 0.9659 | SamilPwC-AXNode-GenAI/PwC-Embedding_expr | False |
| ToxicConversationsClassification | 0.8875 | 0.7352 | 0.6601 | 0.9759 | voyageai/voyage-3-m-exp | False |
| TweetTopicSingleClassification | 0.7111 | 0.6699 | 0.6532 | 0.8631 | jinaai/jina-embeddings-v5-text-small | False |
| Average | 0.7521 | 0.7717 | 0.671 | 0.8714 | nan | - |

Model has high performance on these tasks: BUCC.v2, Tatoeba, NTREXBitextMining, NorwegianCourtsBitextMining, IN22GenBitextMining, NusaTranslationBitextMining, FloresBitextMining, DiaBlaBitextMining, BornholmBitextMining
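
The rule behind the "high performance" list is not stated in this comment. The listed tasks are exactly those where the new model's score is above the "Max result" column, so the sketch below assumes a strict comparison against the previous maximum. The function name `exceeds_previous_max` is hypothetical and only four rows are reproduced.

```python
# (task, new model score, previous max result), copied from the
# google/gemini-embedding-2-preview table above.
rows = [
    ("BUCC.v2", 1.0000, 0.9905),
    ("IndicGenBenchFloresBitextMining", 0.9690, 0.9881),
    ("NusaTranslationBitextMining", 0.9316, 0.9222),
    ("NusaXBitextMining", 0.8393, 0.9056),
]


def exceeds_previous_max(rows):
    """Return the tasks whose new score is strictly above the previous maximum."""
    return [task for task, new_score, max_result in rows if new_score > max_result]


# Assumption: a task is flagged when the new model beats every earlier result.
flagged = exceeds_previous_max(rows)
print(flagged)
```

Under that reading, a flagged task is one where the submitted score would become the new best result, which is a natural prompt to double-check the run before merging.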

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval


Results for google/siglip-base-patch16-224

| task_name | google/gemini-embedding-001 | google/siglip-base-patch16-224 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AfriSentiClassification | 0.5356 | 0.5124 | 0.455 | 0.5688 | tencent/KaLM-Embedding-Gemma3-12B-2511 | False |
| BUCC.v2 | 0.9899 | 1.0000 | 0.9878 | 0.9905 | codefuse-ai/F2LLM-v2-8B | False |
| BibleNLPBitextMining | 0.2072 | 0.5298 | 0.1665 | 0.9899 | deepvk/USER-bge-m3 | False |
| BornholmBitextMining | 0.5169 | 0.9167 | 0.4416 | 0.7798 | jinaai/jina-embeddings-v5-text-small | False |
| BulgarianStoreReviewSentimentClassfication | 0.7813 | 0.7319 | 0.6385 | 0.8159 | microsoft/harrier-oss-v1-27b | False |
| CzechProductReviewSentimentClassification | 0.6816 | 0.6435 | 0.5714 | 0.7667 | Bytedance/Seed1.6-embedding-1215 | False |
| DBpediaClassification | 0.9476 | 0.8865 | 0.8828 | 0.9926 | Qwen/Qwen3-Embedding-4B | False |
| DiaBlaBitextMining | 0.8723 | 0.9963 | 0.8483 | 0.8882 | codefuse-ai/F2LLM-v2-14B | False |
| EstonianValenceClassification | 0.5352 | 0.4581 | 0.4289 | 0.6764 | microsoft/harrier-oss-v1-27b | False |
| FilipinoShopeeReviewsClassification | 0.4845 | 0.4243 | 0.3527 | 0.5279 | microsoft/harrier-oss-v1-27b | False |
| FinancialPhrasebankClassification | 0.8864 | 0.8447 | 0.8394 | 0.9519 | microsoft/harrier-oss-v1-0.6b | False |
| FloresBitextMining | 0.8371 | 0.9824 | 0.8108 | 0.9087 | SamilPwC-AXNode-GenAI/PwC-Embedding_expr | False |
| GreekLegalCodeClassification | 0.4376 | 0.3386 | 0.3713 | 0.8052 | Bytedance/Seed1.6-embedding-1215 | False |
| GujaratiNewsClassification | 0.9205 | 0.9010 | 0.7674 | 0.9343 | Bytedance/Seed1.6-embedding-1215 | False |
| IN22GenBitextMining | 0.9375 | 0.9953 | 0.7675 | 0.9375 | google/gemini-embedding-001 | False |
| IndicGenBenchFloresBitextMining | 0.9677 | 0.9690 | 0.8875 | 0.9881 | Sailesh97/Hinvec | False |
| IndonesianIdClickbaitClassification | 0.67 | 0.6256 | 0.6122 | 0.7560 | nvidia/llama-embed-nemotron-8b | False |
| ItaCaseholdClassification | 0.733 | 0.5127 | 0.6679 | 0.9439 | bigscience/sgpt-bloom-7b1-msmarco | False |
| KorSarcasmClassification | 0.6051 | 0.6358 | 0.5679 | 0.8190 | ICT-TIME-and-Querit/BOOM_4B_v1 | False |
| KurdishSentimentClassification | 0.8639 | 0.7964 | 0.7708 | 0.9403 | Bytedance/Seed1.6-embedding-1215 | False |
| LccSentimentClassification | 0.6993 | 0.5933 | 0.594 | 0.7687 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| MacedonianTweetSentimentClassification | 0.7183 | 0.6325 | 0.6192 | 0.7547 | Qwen/Qwen3-Embedding-4B | False |
| NTREXBitextMining | 0.9364 | 0.9881 | 0.914 | 0.9592 | microsoft/harrier-oss-v1-27b | False |
| NollySentiBitextMining | 0.6871 | 0.7837 | 0.675 | 0.8376 | microsoft/harrier-oss-v1-27b | False |
| NorwegianCourtsBitextMining | 0.9342 | 1.0000 | 0.9404 | 0.9481 | jinaai/jina-embeddings-v5-text-nano | False |
| NusaTranslationBitextMining | 0.7752 | 0.9316 | 0.672 | 0.9222 | Qwen/Qwen3-Embedding-8B | False |
| NusaXBitextMining | 0.8252 | 0.8393 | 0.7267 | 0.9056 | Bytedance/Seed1.6-embedding-1215 | False |
| PoemSentimentClassification | 0.5966 | 0.4756 | 0.5067 | 0.8642 | Bytedance/Seed1.6-embedding-1215 | False |
| SentimentAnalysisHindi | 0.7606 | 0.5818 | 0.642 | 0.8070 | microsoft/harrier-oss-v1-27b | False |
| Tatoeba | 0.8197 | 0.9947 | 0.7573 | 0.9659 | SamilPwC-AXNode-GenAI/PwC-Embedding_expr | False |
| ToxicConversationsClassification | 0.8875 | 0.7352 | 0.6601 | 0.9759 | voyageai/voyage-3-m-exp | False |
| TweetTopicSingleClassification | 0.7111 | 0.6699 | 0.6532 | 0.8631 | jinaai/jina-embeddings-v5-text-small | False |
| Average | 0.7426 | 0.7477 | 0.6624 | 0.8611 | nan | - |

Model has high performance on these tasks: BUCC.v2, Tatoeba, NTREXBitextMining, NorwegianCourtsBitextMining, IN22GenBitextMining, NusaTranslationBitextMining, FloresBitextMining, DiaBlaBitextMining, BornholmBitextMining


Results for manveertamber/cadet-embed-base-v1

| task_name | intfloat/multilingual-e5-large | manveertamber/cadet-embed-base-v1 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| BrightAopsRetrieval | 0.0722 | 0.0755 | 0.0825 | lightonai/Reason-ModernColBERT | False |
| BrightBiologyLongRetrieval | 0.0194 | 0.2532 | 0.2557 | sentence-transformers/all-mpnet-base-v2 | False |
| BrightBiologyRetrieval | 0.0174 | 0.2129 | 0.3387 | lightonai/Reason-ModernColBERT | False |
| BrightEarthScienceLongRetrieval | 0.2155 | 0.3348 | 0.3405 | sentence-transformers/all-mpnet-base-v2 | False |
| BrightEarthScienceRetrieval | 0.1506 | 0.3452 | 0.4170 | lightonai/Reason-ModernColBERT | False |
| BrightEconomicsLongRetrieval | 0.1359 | 0.1408 | 0.2087 | BAAI/bge-large-en-v1.5 | False |
| BrightEconomicsRetrieval | 0.0706 | 0.1912 | 0.2455 | lightonai/Reason-ModernColBERT | False |
| BrightLeetcodeRetrieval | 0.2787 | 0.2793 | 0.3086 | lightonai/Reason-ModernColBERT | False |
| BrightPonyLongRetrieval | 0.0234 | 0.0284 | 0.0338 | minishlab/potion-multilingual-128M | False |
| BrightPonyRetrieval | 0.1302 | 0.0747 | 0.1517 | BAAI/bge-m3 | False |
| BrightPsychologyLongRetrieval | 0.0594 | 0.1555 | 0.1931 | BAAI/bge-m3 | False |
| BrightPsychologyRetrieval | 0.0879 | 0.2123 | 0.3104 | lightonai/Reason-ModernColBERT | False |
| BrightRoboticsLongRetrieval | 0.0792 | 0.1287 | 0.1238 | BAAI/bge-m3 | False |
| BrightRoboticsRetrieval | 0.1112 | 0.1547 | 0.2181 | lightonai/Reason-ModernColBERT | False |
| BrightStackoverflowLongRetrieval | 0.1581 | 0.1282 | 0.2350 | mteb/baseline-bm25s | False |
| BrightStackoverflowRetrieval | 0.0694 | 0.1365 | 0.2425 | lightonai/Reason-ModernColBERT | False |
| BrightSustainableLivingLongRetrieval | 0.081 | 0.1847 | 0.1852 | mteb/baseline-bm25s | False |
| BrightSustainableLivingRetrieval | 0.0961 | 0.1515 | 0.2021 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQAQuestionsRetrieval | 0.1296 | 0.1526 | 0.2004 | sentence-transformers/all-mpnet-base-v2 | False |
| BrightTheoremQATheoremsRetrieval | 0.0549 | 0.0762 | 0.1078 | sentence-transformers/all-mpnet-base-v2 | False |
| Average | 0.102 | 0.1708 | 0.2201 | nan | - |

Model has high performance on these tasks: BrightRoboticsLongRetrieval

Training datasets: ArguAna, ArguAna-Fa, ArguAna-Fa.v2, ArguAna-NL, ArguAna-NL.v2, ArguAna-PL, ArguAna-VN, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DBPedia, DBPedia-Fa, DBPedia-NL, DBPedia-PL, DBPedia-PLHardNegatives, DBPedia-VN, DBPediaHardNegatives, DBPediaHardNegatives.v2, DuRetrieval, FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoArguAnaRetrieval, NanoDBPedia-VN, NanoDBPediaRetrieval, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL



Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
