Commit 8f142bc
[Feat] v0.5 Release Pack (#846)
* add scibench task (full) and change medqa (#840)
* add scibench task (full) and change medqa
* run precommit
---------
Co-authored-by: pbcong <congphamba2005@gmail.com>
* add csbench (#841)
* add csbench
* run precommit
---------
Co-authored-by: pbcong <congphamba2005@gmail.com>
* fix linting (#842)
* [Feature] Add WenetSpeech Dataset (#837)
* [fix] batch size in openai compatible endpoint (#835)
* more
* [Feature] Add WenetSpeech Dataset
* add lmms-eval-0.5 doc's 1st draft
* remove unnecessary parts in lmms-eval-0.5.md
---------
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
* This commit documents the official release of **LMMS-Eval v0.5: Multimodal Expansion**, detailing significant new features including:
* A comprehensive **audio evaluation suite** (Step2 Audio Paralinguistic, VoiceBench, WenetSpeech).
* A production-ready **response caching system**.
* Integration of **five new models** (e.g., GPT-4o Audio Preview, Gemma-3).
* Addition of **numerous new benchmarks** across vision, coding, and STEM domains.
* Support for the **Model Context Protocol (MCP)** and improvements to **Async OpenAI integration**.
* This commit formally announces and documents the **LMMS-Eval v0.5: Multimodal Expansion** release, updating the `README.md` and refining the `v0.5` release notes with improved structure and reproducibility validation for new benchmarks.
* Updates the status legend for reproducibility validation in the LMMS-Eval v0.5 release notes, changing '†' to '+-'.
* Revise metrics and model integration in lmms-eval doc
Updated metrics and model integration details in the documentation.
* Fix model name in LMMs-Eval v0.5 announcement
Corrected the name of the model 'GPT-4o Audio' to 'GPT-4o Audio Preview' in the announcement section.
---------
Co-authored-by: Do Duc Anh (Erwin) <104162175+KelvinDo183@users.noreply.github.com>
Co-authored-by: pbcong <congphamba2005@gmail.com>
Co-authored-by: Cong <101887866+pbcong@users.noreply.github.com>
Co-authored-by: JAM_Yichen <110095482+YichenG170@users.noreply.github.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
1 parent 36dcfbb commit 8f142bc
20 files changed
Lines changed: 1184 additions & 151 deletions
File tree
- docs
- lmms_eval/tasks
- csbench
- lemonade
- medqa
- scibench
- wenet_speech