model: Add LanguageBind video and audio model wrapper #4557
myang333 wants to merge 16 commits into embeddings-benchmark:main
Conversation
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
AdnanElAssadi56 left a comment:
We can remove uv.lock, I guess.
@Michelleyyy333 How do you install languagebind? I don't see it in the logs.
No, we shouldn't delete it.
Hey! LanguageBind doesn't actually have a setup.py or pyproject.toml in its repo, so it can't be pip-installed. When I was testing it on GPU, I just cloned the repo and added it to PYTHONPATH.
Yes. Can you add setup instructions to the class docstring?
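A docstring along these lines could cover the setup (the class name is illustrative, not the PR's actual code; the GitHub URL is the LanguageBind upstream repo):

```python
class LanguageBindWrapper:
    """Wrapper for the LanguageBind video/audio embedding models.

    Setup: LanguageBind is not pip-installable (no setup.py or
    pyproject.toml), so clone the repo and add it to PYTHONPATH::

        git clone https://github.com/PKU-YuanGroup/LanguageBind
        export PYTHONPATH="$PYTHONPATH:/path/to/LanguageBind"

    After that, the ``languagebind`` package can be imported from the
    cloned checkout.
    """
```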
@myang333 Can you run these models on some tasks to see the scores?
CI tests are failing due to a timeout in the setup step (Launchpad PPA connection timeout), which doesn't seem related to my code changes. Should I retrigger the workflow, or will it resolve on its own?
I think we need to wait a bit, and then I'll restart CI.
@myang333 Can you run these models on the tasks from the paper and try to reproduce their results?
Sure! I'll check the paper for their reported benchmarks and try to reproduce them.
device: str = "cuda" if torch.cuda.is_available() else "cpu",
fps: float | None = None,
max_frames: int | None = None,
num_frames: int | None = 8,
Did you verify this is the number it expects (used in training, or recommended)?
if has_text:
    text_emb = self.video_model.get_text_embeddings(
        inputs, prompt_type=prompt_type, **kwargs
    )
    embeddings = text_emb if embeddings is None else embeddings + text_emb
if has_image:
    image_emb = self.image_model.get_image_embeddings(inputs, **kwargs)
    embeddings = image_emb if embeddings is None else embeddings + image_emb
if has_audio:
    audio_emb = self.audio_model.get_audio_embeddings(inputs, **kwargs)
    embeddings = audio_emb if embeddings is None else embeddings + audio_emb
if has_video:
    video_emb = self.video_model.get_video_embeddings(inputs, **kwargs)
    embeddings = video_emb if embeddings is None else embeddings + video_emb
Is this how they fuse modalities in the original implementation?
No, they're separate models. I implemented it similarly to how we implemented SpeechT5.
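The accumulation pattern being discussed can be sketched standalone (toy NumPy vectors, not the real model calls): each modality that is present contributes an embedding, and when more than one is present the embeddings are summed.

```python
import numpy as np


def fuse_embeddings(text_emb=None, image_emb=None, audio_emb=None, video_emb=None):
    """Sum whichever per-modality embeddings are present (None = absent).

    Mirrors the accumulate-by-addition pattern in the diff above;
    this is an illustrative sketch, not the PR's actual code.
    """
    embeddings = None
    for emb in (text_emb, image_emb, audio_emb, video_emb):
        if emb is None:
            continue
        embeddings = emb if embeddings is None else embeddings + emb
    return embeddings


text = np.array([1.0, 2.0])
video = np.array([0.5, 0.5])
print(fuse_embeddings(text_emb=text, video_emb=video))  # [1.5 2.5]
```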

Hey! This adds model integration for LanguageBind (ICLR 2024) — a multimodal embedding model that aligns video, audio, and text into a shared embedding space.
Models added:
- LanguageBind/LanguageBind_Video_FT (video + text, MIT license)
- LanguageBind/LanguageBind_Audio_FT (audio + text, MIT license)

Implementation notes:
- LanguageBind uses its own library (needs to be cloned from GitHub; built on OpenCLIP)
- Video and audio models are loaded separately but share the same text encoder
- Embedding dim: 768
Results:
- Ran evaluation on the VideoRetrieval task; results JSON attached
Refs: Paper | HuggingFace | Related to #4130
Model checklist:
- I have filled out the ModelMeta object to the extent possible
- I have ensured that my model can be loaded using mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)
- I have tested that the implementation works on a representative set of tasks
- The model is public, i.e., it is available either as an API or the weights are publicly available to download
- I reproduced results from the original paper (if applicable) on at least one benchmark, and I am including the results in the PR description

(Note on the last item: LanguageBind's original paper doesn't include MTEB benchmarks, so there are no paper results to reproduce against.)
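The loading check from the checklist could be scripted roughly as below. This is a sketch: it assumes an mteb version where the model is registered, the revision is left as a caller-supplied argument, and the import is deferred inside the function so the file itself doesn't require mteb to be installed.

```python
def load_languagebind_video(revision=None):
    """Load the video+text LanguageBind model via mteb's model registry.

    Sketch only; the model name comes from this PR, and passing
    revision=None falls back to the registered default revision.
    """
    import mteb  # deferred so this module imports without mteb installed

    meta = mteb.get_model_meta("LanguageBind/LanguageBind_Video_FT", revision)
    model = mteb.get_model("LanguageBind/LanguageBind_Video_FT", revision)
    return meta, model
```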