Commit 0a9cd51

docs: add MOVA Benchmark for Arena to README (#50)
Release the 732-sample evaluation benchmark on Hugging Face, covering MOVA-Bench (132 samples, 7 categories) and bilingual VerseBench (600 samples). Add a News entry and an Evaluation subsection with the dataset link, and mark the Arena Benchmark item as done in the TODO list.
1 parent 529ea03 commit 0a9cd51

1 file changed

Lines changed: 15 additions & 0 deletions

README.md

@@ -27,6 +27,7 @@ We introduce **MOVA** (**MO**SS **V**ideo and **A**udio), a foundation model des
- **Asymmetric Dual-Tower Architecture**: Leverages the power of pre-trained video and audio towers, fused via a bidirectional cross-attention mechanism for rich modality interaction.

## 🔥News!!!
- 2026/03/14: 🎉We released the **MOVA Benchmark for Arena** (732 samples) on [🤗 Hugging Face](https://huggingface.co/datasets/zhiyuzhang-0212/MOVA_benchmark_for_arena) for reproducible evaluation.
- 2026/03/09: 🎉**MOVA API** is now available! Apply for your API key at [studio.mosi.cn](https://studio.mosi.cn/docs/models/mova?src=github) to start generating videos programmatically.
- 2026/03/09: 🎉**ComfyUI support** is here! Thanks to [@richservo](https://github.com/richservo), you can now use MOVA in ComfyUI at low cost via [comfyui-mova](https://github.com/richservo/comfyui-mova).
- 2026/02/10: 🎉We released the **MOVA** [technical report](https://arxiv.org/abs/2602.08794) and updated the [inference workflow](https://github.com/OpenMOSS/MOVA/pull/29).
@@ -153,6 +154,19 @@ Below are the Elo scores and win rates comparing MOVA to existing open-source mo
<img src="./assets/winrate.png" alt="Win rate comparison" width="100%"/>
</p>

### MOVA Benchmark for Arena

We release the **MOVA Benchmark for Arena** on Hugging Face for reproducible subjective evaluation. The benchmark contains **732 samples** organized into two subsets:

| Subset | Samples | Description |
|--------|---------|-------------|
| MOVA-Bench | 132 | Real-world scenarios across 7 categories: multi-speaker (27), movie (12), sports (20), games (20), shot-effect (30), anime (20), and others (3) |
| VerseBench (Bilingual) | 600 | Bilingual English-Chinese speech data adapted from [VerseBench](https://huggingface.co/datasets/dorni/Verse-Bench), split into set1 (205), set2 (295), and set3 (100) |

Each sample includes a **first-frame image** and a **prompt** (rewritten using the workflow introduced in the technical report) for joint image-text-to-video-audio generation.

🤗 **Download**: [zhiyuzhang-0212/MOVA_benchmark_for_arena](https://huggingface.co/datasets/zhiyuzhang-0212/MOVA_benchmark_for_arena)
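The composition in the table above can be cross-checked, and the files fetched locally, with a short Python sketch. It assumes the `huggingface_hub` package is installed and that network access is available for the download step. The helper names `subset_totals` and `fetch_benchmark` are hypothetical, and because the file layout inside the dataset repository is not described here, the sketch stops at downloading a snapshot rather than parsing individual samples.

```python
"""Sanity-check the MOVA Benchmark for Arena composition and (optionally) download it.

Sample counts are copied from the README table; fetch_benchmark is a
hypothetical convenience wrapper around huggingface_hub.snapshot_download.
"""

REPO_ID = "zhiyuzhang-0212/MOVA_benchmark_for_arena"

# Per-category counts for MOVA-Bench, as listed in the README table.
MOVA_BENCH = {
    "multi-speaker": 27,
    "movie": 12,
    "sports": 20,
    "games": 20,
    "shot-effect": 30,
    "anime": 20,
    "others": 3,
}

# Per-split counts for the bilingual VerseBench subset, as listed in the table.
VERSEBENCH_BILINGUAL = {"set1": 205, "set2": 295, "set3": 100}


def subset_totals() -> dict:
    """Return the per-subset and overall sample counts."""
    mova = sum(MOVA_BENCH.values())
    verse = sum(VERSEBENCH_BILINGUAL.values())
    return {"MOVA-Bench": mova, "VerseBench (Bilingual)": verse, "total": mova + verse}


def fetch_benchmark(local_dir=None) -> str:
    """Download a snapshot of the dataset repo and return its local path.

    Hypothetical helper; requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so the counting code above runs without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir)


if __name__ == "__main__":
    print(subset_totals())
    # -> {'MOVA-Bench': 132, 'VerseBench (Bilingual)': 600, 'total': 732}
    print(len(MOVA_BENCH), "MOVA-Bench categories")
```

Taking the counts from the table rather than from the downloaded files keeps the check usable offline; once the repository layout is known, the same totals could be recomputed from the actual sample files.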
## SGLang Integration

[SGLang](https://github.com/sgl-project/sglang) provides Day-0 support for MOVA. You can use the latest SGLang release and the examples below for high-throughput inference.
@@ -309,6 +323,7 @@ All peak usage numbers below are measured on **360p, 8-second** video training s
- [x] Technical Report
- [x] API Access
- [x] ComfyUI Integration
- [x] Arena Benchmark
- [ ] Diffusers Integration

## Citation
