Add a mixture of results by KennethEnevoldsen · Pull Request #529 · embeddings-benchmark/results

KennethEnevoldsen · 2026-05-09T16:03:27Z

This PR adds various results.

The targets are:

- BRIGHT v1.1 (Switch to brightv1.1 in leaderboard mteb#4340)
- running gemini 2 (Eval Gemini Embedding 2 mteb#4260)
- adding e5 multilingual base, large and large-instruct as well as random-encoder to the reference models
- a few others that I noted were missing.

The goal is not to finish everything, but simply to add a large part of the results in one go.

Continuation of #528

Checklist

My model has a model sheet, report, or similar
My model has a reference implementation in mteb/models/model_implementations/, this can be as an API. Instructions on how to add a model can be found here
- No, but there is an existing PR ___
The results submitted are obtained using the reference implementation
My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

This is for multiple PRs, e.g. BRIGHT v1.1, running gemini 2, and a few others that I noted were missing.


github-actions · 2026-05-09T16:52:53Z

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5, Qwen/Qwen3-Embedding-0.6B, google/gemini-embedding-2-preview, google/siglip-base-patch16-224, manveertamber/cadet-embed-base-v1
Tasks: AfriSentiClassification, BUCC.v2, BibleNLPBitextMining, BornholmBitextMining, BrightAopsRetrieval, BrightBiologyLongRetrieval, BrightBiologyRetrieval, BrightEarthScienceLongRetrieval, BrightEarthScienceRetrieval, BrightEconomicsLongRetrieval, BrightEconomicsRetrieval, BrightLeetcodeRetrieval, BrightPonyLongRetrieval, BrightPonyRetrieval, BrightPsychologyLongRetrieval, BrightPsychologyRetrieval, BrightRoboticsLongRetrieval, BrightRoboticsRetrieval, BrightStackoverflowLongRetrieval, BrightStackoverflowRetrieval, BrightSustainableLivingLongRetrieval, BrightSustainableLivingRetrieval, BrightTheoremQAQuestionsRetrieval, BrightTheoremQATheoremsRetrieval, BulgarianStoreReviewSentimentClassfication, CzechProductReviewSentimentClassification, DBpediaClassification, DiaBlaBitextMining, EstonianValenceClassification, FEVERHardNegatives, FilipinoShopeeReviewsClassification, FinancialPhrasebankClassification, FloresBitextMining, GreekLegalCodeClassification, GujaratiNewsClassification, IN22GenBitextMining, IndicGenBenchFloresBitextMining, IndonesianIdClickbaitClassification, ItaCaseholdClassification, KorSarcasmClassification, KurdishSentimentClassification, LccSentimentClassification, MacedonianTweetSentimentClassification, NTREXBitextMining, NollySentiBitextMining, NorwegianCourtsBitextMining, NusaTranslationBitextMining, NusaXBitextMining, PoemSentimentClassification, SentimentAnalysisHindi, Tatoeba, ToxicConversationsClassification, TweetSentimentClassification, TweetTopicSingleClassification

Results for `KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5`

task_name	KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
TweetSentimentClassification	0.4935	0.503	0.6570	codefuse-ai/F2LLM-v2-14B	False
Average	0.4935	0.503	0.6570	nan	-

Training datasets: ATEC, AmazonCounterfactualClassification, AmazonCounterfactualVNClassification, AmazonPolarityClassification, AmazonPolarityClassification.v2, AmazonPolarityVNClassification, AmazonReviewsClassification, AmazonReviewsVNClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArxivClusteringP2P, ArxivClusteringP2P.v2, ArxivClusteringS2S, BQ, Banking77Classification, Banking77Classification.v2, Banking77VNClassification, BiorxivClusteringP2P, BiorxivClusteringP2P.v2, BiorxivClusteringS2S, BiorxivClusteringS2S.v2, CQADupstack, CodeFeedbackMT, CodeFeedbackST, ContractNLIConfidentialityOfAgreementLegalBenchClassification, ContractNLIExplicitIdentificationLegalBenchClassification, ContractNLIInclusionOfVerballyConveyedInformationLegalBenchClassification, ContractNLILimitedUseLegalBenchClassification, ContractNLINoLicensingLegalBenchClassification, ContractNLINoticeOnCompelledDisclosureLegalBenchClassification, ContractNLIPermissibleAcquirementOfSimilarInformationLegalBenchClassification, ContractNLIPermissibleCopyLegalBenchClassification, ContractNLIPermissibleDevelopmentOfSimilarInformationLegalBenchClassification, ContractNLIPermissiblePostAgreementPossessionLegalBenchClassification, ContractNLIReturnOfConfidentialInformationLegalBenchClassification, ContractNLISharingWithEmployeesLegalBenchClassification, ContractNLISharingWithThirdPartiesLegalBenchClassification, ContractNLISurvivalOfObligationsLegalBenchClassification, DBPedia, DBPedia-Fa, DBPedia-NL, DBPedia-PL, DBPedia-PLHardNegatives, DBPedia-VN, DBPediaHardNegatives, DBPediaHardNegatives.v2, ESCIReranking, EmotionClassification, EmotionClassification.v2, EmotionVNClassification, FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, FiQA-PL, FiQA2018, FiQA2018-Fa, FiQA2018-Fa.v2, FiQA2018-NL, FiQA2018-VN, HUMEArxivClusteringP2P, HUMEEmotionClassification, HUMEToxicConversationsClassification, 
HUMETweetSentimentExtractionClassification, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, ImdbClassification, ImdbClassification.v2, ImdbVNClassification, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MTOPDomainClassification, MTOPDomainVNClassification, MTOPIntentClassification, MTOPIntentVNClassification, MassiveIntentClassification, MassiveIntentVNClassification, MassiveScenarioClassification, MassiveScenarioVNClassification, MedrxivClusteringP2P, MedrxivClusteringP2P.v2, MedrxivClusteringS2S, MedrxivClusteringS2S.v2, MrTidyRetrieval, MrTyDiJaRetrievalLite, MultiLongDocReranking, MultiLongDocRetrieval, MultilingualSentiment, MultilingualSentiment.v2, NFCorpus, NFCorpus-Fa, NFCorpus-NL, NFCorpus-NL.v2, NFCorpus-PL, NFCorpus-VN, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoDBPedia-VN, NanoDBPediaRetrieval, NanoFEVER-VN, NanoFEVERRetrieval, NanoFiQA2018Retrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNFCorpusRetrieval, NanoNQ-VN, NanoNQRetrieval, NanoQuoraRetrieval, NanoSciFactRetrieval, PawsXPairClassification, Quora-NL, Quora-PL, Quora-PLHardNegatives, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, QuoraRetrievalHardNegatives, QuoraRetrievalHardNegatives.v2, Reddit-Clustering, Reddit-Clustering-P2P, SciFact, SciFact-Fa, SciFact-Fa.v2, SciFact-NL, SciFact-NL.v2, SciFact-PL, SciFact-VN, Stackexchange-Clustering, Stackexchange-Clustering-P2P, TRECCOVID, TRECCOVID-Fa, TRECCOVID-Fa.v2, TRECCOVID-NL, TRECCOVID-PL, TRECCOVID-VN, ToxicConversationsClassification, ToxicConversationsClassification.v2, ToxicConversationsVNClassification, 
TweetSentimentExtractionClassification, TweetSentimentExtractionClassification.v2, TweetSentimentExtractionVNClassification, TwentyNewsgroups-Clustering, YahooAnswersTopicsClassification, YahooAnswersTopicsClassification.v2, mMARCO-NL

Results for `Qwen/Qwen3-Embedding-0.6B`

task_name	Qwen/Qwen3-Embedding-0.6B	google/gemini-embedding-001	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
FEVERHardNegatives	0.8726	0.8898	0.8379	0.9453	ByteDance-Seed/Seed1.5-Embedding	True
TweetSentimentClassification	0.4813	nan	0.503	0.6570	codefuse-ai/F2LLM-v2-14B	False
Average	0.677	0.8898	0.6704	0.8012	nan	-
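
The `Average` row above looks consistent with a mean that skips `nan` entries: `google/gemini-embedding-001` has no score for TweetSentimentClassification, yet its average equals its single available score. A minimal sketch of that calculation, assuming NaN-skipping is what the workflow does (the helper name `nan_mean` is hypothetical and not taken from the workflow):

```python
import math

def nan_mean(values):
    """Mean of the values that are not NaN; NaN if nothing remains."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan

# Scores copied from the Qwen/Qwen3-Embedding-0.6B table above
# (FEVERHardNegatives, TweetSentimentClassification).
scores = {
    "Qwen/Qwen3-Embedding-0.6B": [0.8726, 0.4813],
    "google/gemini-embedding-001": [0.8898, math.nan],
    "Max result": [0.9453, 0.6570],
}

# Assumption: each column is averaged independently over its available tasks.
averages = {name: nan_mean(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.4f}")
```

Because each column is averaged over a different set of tasks when scores are missing, the averages in this row are not directly comparable across models.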

Training datasets: CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Retrieval
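
The `In Training Data` column in the table above appears to follow from an exact-name lookup against this list: FEVERHardNegatives is listed and marked `True`, while TweetSentimentClassification is not listed and marked `False`. A small sketch of that check, assuming exact string matching (the function name `in_training_data` is hypothetical, and the set below is only an excerpt of the list):

```python
# Excerpt of the training datasets listed for Qwen/Qwen3-Embedding-0.6B.
training_datasets = {
    "FEVER",
    "FEVERHardNegatives",
    "FEVERHardNegatives.v2",
    "HotpotQA",
    "MSMARCO",
    "NQ",
}

def in_training_data(task_name, datasets):
    """Assumed rule: a task is flagged only on an exact name match."""
    return task_name in datasets

flags = {
    task: in_training_data(task, training_datasets)
    for task in ["FEVERHardNegatives", "TweetSentimentClassification"]
}
print(flags)
```

Under exact matching, a task whose name merely resembles a listed dataset is not flagged, so closely related datasets still need a manual check.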

Results for `google/gemini-embedding-2-preview`

task_name	google/gemini-embedding-001	google/gemini-embedding-2-preview	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
BUCC.v2	0.9899	1.0000	0.9878	0.9905	codefuse-ai/F2LLM-v2-8B	False
BibleNLPBitextMining	0.2072	0.5298	0.1665	0.9899	deepvk/USER-bge-m3	False
BornholmBitextMining	0.5169	0.9167	0.4416	0.7798	jinaai/jina-embeddings-v5-text-small	False
BulgarianStoreReviewSentimentClassfication	0.7813	0.7319	0.6385	0.8159	microsoft/harrier-oss-v1-27b	False
CzechProductReviewSentimentClassification	0.6816	0.6435	0.5714	0.7667	Bytedance/Seed1.6-embedding-1215	False
DBpediaClassification	0.9476	0.8865	0.8828	0.9926	Qwen/Qwen3-Embedding-4B	False
DiaBlaBitextMining	0.8723	0.9963	0.8483	0.8882	codefuse-ai/F2LLM-v2-14B	False
EstonianValenceClassification	0.5352	0.4581	0.4289	0.6764	microsoft/harrier-oss-v1-27b	False
FilipinoShopeeReviewsClassification	0.4845	0.4243	0.3527	0.5279	microsoft/harrier-oss-v1-27b	False
FinancialPhrasebankClassification	0.8864	0.8447	0.8394	0.9519	microsoft/harrier-oss-v1-0.6b	False
FloresBitextMining	0.8371	0.9824	0.8108	0.9087	SamilPwC-AXNode-GenAI/PwC-Embedding_expr	False
GreekLegalCodeClassification	0.4376	0.3386	0.3713	0.8052	Bytedance/Seed1.6-embedding-1215	False
GujaratiNewsClassification	0.9205	0.9010	0.7674	0.9343	Bytedance/Seed1.6-embedding-1215	False
IN22GenBitextMining	0.9375	0.9953	0.7675	0.9375	google/gemini-embedding-001	False
IndicGenBenchFloresBitextMining	0.9677	0.9690	0.8875	0.9881	Sailesh97/Hinvec	False
IndonesianIdClickbaitClassification	0.67	0.6256	0.6122	0.7560	nvidia/llama-embed-nemotron-8b	False
LccSentimentClassification	0.6993	0.5933	0.594	0.7687	Alibaba-NLP/gte-Qwen2-7B-instruct	False
NTREXBitextMining	0.9364	0.9881	0.914	0.9592	microsoft/harrier-oss-v1-27b	False
NollySentiBitextMining	0.6871	0.7837	0.675	0.8376	microsoft/harrier-oss-v1-27b	False
NorwegianCourtsBitextMining	0.9342	1.0000	0.9404	0.9481	jinaai/jina-embeddings-v5-text-nano	False
NusaTranslationBitextMining	0.7752	0.9316	0.672	0.9222	Qwen/Qwen3-Embedding-8B	False
NusaXBitextMining	0.8252	0.8393	0.7267	0.9056	Bytedance/Seed1.6-embedding-1215	False
PoemSentimentClassification	0.5966	0.4756	0.5067	0.8642	Bytedance/Seed1.6-embedding-1215	False
SentimentAnalysisHindi	0.7606	0.5818	0.642	0.8070	microsoft/harrier-oss-v1-27b	False
Tatoeba	0.8197	0.9947	0.7573	0.9659	SamilPwC-AXNode-GenAI/PwC-Embedding_expr	False
ToxicConversationsClassification	0.8875	0.7352	0.6601	0.9759	voyageai/voyage-3-m-exp	False
TweetTopicSingleClassification	0.7111	0.6699	0.6532	0.8631	jinaai/jina-embeddings-v5-text-small	False
Average	0.7521	0.7717	0.671	0.8714	nan	-

Model has high performance on these tasks: BUCC.v2, Tatoeba, NTREXBitextMining, NorwegianCourtsBitextMining, IN22GenBitextMining, NusaTranslationBitextMining, FloresBitextMining, DiaBlaBitextMining, BornholmBitextMining
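
The flagged tasks match the rows of the table above where the new model's score is higher than the previous `Max result`. The sketch below reproduces that on a few rows, assuming a strict greater-than comparison; the actual workflow may use a margin or a different rule.

```python
# (task, new model score, previous max result), copied from a few rows of the
# google/gemini-embedding-2-preview table above.
rows = [
    ("BUCC.v2", 1.0000, 0.9905),
    ("BibleNLPBitextMining", 0.5298, 0.9899),
    ("BornholmBitextMining", 0.9167, 0.7798),
    ("IndicGenBenchFloresBitextMining", 0.9690, 0.9881),
    ("NusaTranslationBitextMining", 0.9316, 0.9222),
]

# Assumed rule: flag a task when the new score beats the best existing score.
flagged = [task for task, new_score, prev_max in rows if new_score > prev_max]
print(flagged)
```

Scores above the previous maximum are worth a second look before merging, since they can indicate either a real improvement or an evaluation problem.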

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval

Results for `google/siglip-base-patch16-224`

task_name	google/gemini-embedding-001	google/siglip-base-patch16-224	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
AfriSentiClassification	0.5356	0.5124	0.455	0.5688	tencent/KaLM-Embedding-Gemma3-12B-2511	False
BUCC.v2	0.9899	1.0000	0.9878	0.9905	codefuse-ai/F2LLM-v2-8B	False
BibleNLPBitextMining	0.2072	0.5298	0.1665	0.9899	deepvk/USER-bge-m3	False
BornholmBitextMining	0.5169	0.9167	0.4416	0.7798	jinaai/jina-embeddings-v5-text-small	False
BulgarianStoreReviewSentimentClassfication	0.7813	0.7319	0.6385	0.8159	microsoft/harrier-oss-v1-27b	False
CzechProductReviewSentimentClassification	0.6816	0.6435	0.5714	0.7667	Bytedance/Seed1.6-embedding-1215	False
DBpediaClassification	0.9476	0.8865	0.8828	0.9926	Qwen/Qwen3-Embedding-4B	False
DiaBlaBitextMining	0.8723	0.9963	0.8483	0.8882	codefuse-ai/F2LLM-v2-14B	False
EstonianValenceClassification	0.5352	0.4581	0.4289	0.6764	microsoft/harrier-oss-v1-27b	False
FilipinoShopeeReviewsClassification	0.4845	0.4243	0.3527	0.5279	microsoft/harrier-oss-v1-27b	False
FinancialPhrasebankClassification	0.8864	0.8447	0.8394	0.9519	microsoft/harrier-oss-v1-0.6b	False
FloresBitextMining	0.8371	0.9824	0.8108	0.9087	SamilPwC-AXNode-GenAI/PwC-Embedding_expr	False
GreekLegalCodeClassification	0.4376	0.3386	0.3713	0.8052	Bytedance/Seed1.6-embedding-1215	False
GujaratiNewsClassification	0.9205	0.9010	0.7674	0.9343	Bytedance/Seed1.6-embedding-1215	False
IN22GenBitextMining	0.9375	0.9953	0.7675	0.9375	google/gemini-embedding-001	False
IndicGenBenchFloresBitextMining	0.9677	0.9690	0.8875	0.9881	Sailesh97/Hinvec	False
IndonesianIdClickbaitClassification	0.67	0.6256	0.6122	0.7560	nvidia/llama-embed-nemotron-8b	False
ItaCaseholdClassification	0.733	0.5127	0.6679	0.9439	bigscience/sgpt-bloom-7b1-msmarco	False
KorSarcasmClassification	0.6051	0.6358	0.5679	0.8190	ICT-TIME-and-Querit/BOOM_4B_v1	False
KurdishSentimentClassification	0.8639	0.7964	0.7708	0.9403	Bytedance/Seed1.6-embedding-1215	False
LccSentimentClassification	0.6993	0.5933	0.594	0.7687	Alibaba-NLP/gte-Qwen2-7B-instruct	False
MacedonianTweetSentimentClassification	0.7183	0.6325	0.6192	0.7547	Qwen/Qwen3-Embedding-4B	False
NTREXBitextMining	0.9364	0.9881	0.914	0.9592	microsoft/harrier-oss-v1-27b	False
NollySentiBitextMining	0.6871	0.7837	0.675	0.8376	microsoft/harrier-oss-v1-27b	False
NorwegianCourtsBitextMining	0.9342	1.0000	0.9404	0.9481	jinaai/jina-embeddings-v5-text-nano	False
NusaTranslationBitextMining	0.7752	0.9316	0.672	0.9222	Qwen/Qwen3-Embedding-8B	False
NusaXBitextMining	0.8252	0.8393	0.7267	0.9056	Bytedance/Seed1.6-embedding-1215	False
PoemSentimentClassification	0.5966	0.4756	0.5067	0.8642	Bytedance/Seed1.6-embedding-1215	False
SentimentAnalysisHindi	0.7606	0.5818	0.642	0.8070	microsoft/harrier-oss-v1-27b	False
Tatoeba	0.8197	0.9947	0.7573	0.9659	SamilPwC-AXNode-GenAI/PwC-Embedding_expr	False
ToxicConversationsClassification	0.8875	0.7352	0.6601	0.9759	voyageai/voyage-3-m-exp	False
TweetTopicSingleClassification	0.7111	0.6699	0.6532	0.8631	jinaai/jina-embeddings-v5-text-small	False
Average	0.7426	0.7477	0.6624	0.8611	nan	-

Model has high performance on these tasks: BUCC.v2, Tatoeba, NTREXBitextMining, NorwegianCourtsBitextMining, IN22GenBitextMining, NusaTranslationBitextMining, FloresBitextMining, DiaBlaBitextMining, BornholmBitextMining

Results for `manveertamber/cadet-embed-base-v1`

task_name	intfloat/multilingual-e5-large	manveertamber/cadet-embed-base-v1	Max result	Model with max result	In Training Data
BrightAopsRetrieval	0.0722	0.0755	0.0825	lightonai/Reason-ModernColBERT	False
BrightBiologyLongRetrieval	0.0194	0.2532	0.2557	sentence-transformers/all-mpnet-base-v2	False
BrightBiologyRetrieval	0.0174	0.2129	0.3387	lightonai/Reason-ModernColBERT	False
BrightEarthScienceLongRetrieval	0.2155	0.3348	0.3405	sentence-transformers/all-mpnet-base-v2	False
BrightEarthScienceRetrieval	0.1506	0.3452	0.4170	lightonai/Reason-ModernColBERT	False
BrightEconomicsLongRetrieval	0.1359	0.1408	0.2087	BAAI/bge-large-en-v1.5	False
BrightEconomicsRetrieval	0.0706	0.1912	0.2455	lightonai/Reason-ModernColBERT	False
BrightLeetcodeRetrieval	0.2787	0.2793	0.3086	lightonai/Reason-ModernColBERT	False
BrightPonyLongRetrieval	0.0234	0.0284	0.0338	minishlab/potion-multilingual-128M	False
BrightPonyRetrieval	0.1302	0.0747	0.1517	BAAI/bge-m3	False
BrightPsychologyLongRetrieval	0.0594	0.1555	0.1931	BAAI/bge-m3	False
BrightPsychologyRetrieval	0.0879	0.2123	0.3104	lightonai/Reason-ModernColBERT	False
BrightRoboticsLongRetrieval	0.0792	0.1287	0.1238	BAAI/bge-m3	False
BrightRoboticsRetrieval	0.1112	0.1547	0.2181	lightonai/Reason-ModernColBERT	False
BrightStackoverflowLongRetrieval	0.1581	0.1282	0.2350	mteb/baseline-bm25s	False
BrightStackoverflowRetrieval	0.0694	0.1365	0.2425	lightonai/Reason-ModernColBERT	False
BrightSustainableLivingLongRetrieval	0.081	0.1847	0.1852	mteb/baseline-bm25s	False
BrightSustainableLivingRetrieval	0.0961	0.1515	0.2021	lightonai/Reason-ModernColBERT	False
BrightTheoremQAQuestionsRetrieval	0.1296	0.1526	0.2004	sentence-transformers/all-mpnet-base-v2	False
BrightTheoremQATheoremsRetrieval	0.0549	0.0762	0.1078	sentence-transformers/all-mpnet-base-v2	False
Average	0.102	0.1708	0.2201	nan	-

Model has high performance on these tasks: BrightRoboticsLongRetrieval

Training datasets: ArguAna, ArguAna-Fa, ArguAna-Fa.v2, ArguAna-NL, ArguAna-NL.v2, ArguAna-PL, ArguAna-VN, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DBPedia, DBPedia-Fa, DBPedia-NL, DBPedia-PL, DBPedia-PLHardNegatives, DBPedia-VN, DBPediaHardNegatives, DBPediaHardNegatives.v2, DuRetrieval, FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoArguAnaRetrieval, NanoDBPedia-VN, NanoDBPediaRetrieval, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL

Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.

Kenneth added 6 commits May 9, 2026 15:11

Add various results

f1093ea

This is for multiple PRs, e.g. BRIGHT v1.1, running gemini 2, and a few others that I noted were missing.

Merge branch 'main' of https://github.com/embeddings-benchmark/results …

e14ed70

…into add-res

reduce size

38023f4

add more results

8b1167e

add res

ed4dab6

Merge branch 'add-res' into add-more-more-res

89d4d71

KennethEnevoldsen wants to merge 6 commits into main from add-more-more-res
