Commit c0f81b9

[Move DISCO queue to core]:
- Update dependencies
1 parent 304e54c commit c0f81b9

2 files changed

Lines changed: 22 additions & 6 deletions

docs/benchmark/mmlu.md

Lines changed: 9 additions & 3 deletions
@@ -18,16 +18,22 @@ Check out the [BENCHMARKS.md](https://github.com/parameterlab/MASEval/blob/main/
 
 ## Installation
 
-MMLU has an optional dependency extra (currently empty, as core MMLU requires no additional packages):
+Install MMLU with all dependencies needed to run the HuggingFace benchmark and example script:
 
 ```bash
 pip install maseval[mmlu]
 ```
 
-For the HuggingFace implementation, also install transformers:
+Or with uv:
 
 ```bash
-pip install maseval[mmlu,transformers]
+uv sync --extra mmlu
+```
+
+This installs `transformers`, `torch`, `numpy`, and `huggingface_hub` (the latter two via `transformers`). You can then run the example:
+
+```bash
+python examples/mmlu_benchmark/mmlu_benchmark.py --model_id alignment-handbook/zephyr-7b-sft-full
 ```
 
 For DISCO prediction support:
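
The added documentation sentence names four packages that the `mmlu` extra is said to install, while the `mmlu` extra in the `pyproject.toml` hunk lists `transformers`, `numpy`, `aiohttp`, and `lm-eval` but not `torch`. A quick way to see what actually ended up in an environment could look like the sketch below. The helper name `check_modules` is hypothetical and not part of MASEval; it only asks whether each top-level module can be found, without importing it.

```python
from importlib.util import find_spec

# Module names taken from the sentence added in docs/benchmark/mmlu.md.
# Whether the extra really pulls in each one is exactly what this checks.
EXPECTED_MODULES = ["transformers", "torch", "numpy", "huggingface_hub"]


def check_modules(names):
    """Map each top-level module name to True if it is findable, else False."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in check_modules(EXPECTED_MODULES).items():
        print(f"{name}: {'ok' if present else 'MISSING'}")
```

Running this right after `pip install maseval[mmlu]` in a fresh virtual environment shows whether every package named in the docs was installed by the extra alone.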

pyproject.toml

Lines changed: 13 additions & 3 deletions
@@ -82,10 +82,20 @@ multiagentbench = [
 ]
 tau2 = ["docstring-parser>=0.16", "addict>=2.4.0"]
 converse = []
-mmlu = []
+# HuggingFace model + tokenizer, default dataset download; numpy for example script and anchor-point loading;
+# lm-eval for --use_lmeval_batching (exact lm-evaluation-harness reproduction); aiohttp required by lm_eval.models.api_models
+mmlu = [
+"transformers>=4.37.0",
+"numpy>=1.20.0",
+"aiohttp>=3.9.0",
+"lm-eval @ git+https://github.com/arubique/lm-evaluation-harness.git@main",
+]
 
-# LM Evaluation Harness (for HuggingFaceMMLUBenchmark.precompute_all_logprobs_lmeval)
-lm-eval = ["lm-eval @ git+https://github.com/arubique/lm-evaluation-harness.git@main"]
+# LM Evaluation Harness (same as in mmlu; aiohttp required by lm_eval.models.api_models)
+lm-eval = [
+"aiohttp>=3.9.0",
+"lm-eval @ git+https://github.com/arubique/lm-evaluation-harness.git@main",
+]
 
 # DISCO prediction (for MMLU benchmark example)
 disco = [
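
The new comment on the `lm-eval` extra says it is the "same as in mmlu", which means the two lists have to be kept in sync by hand. A minimal sketch of a consistency check follows, assuming the requirement strings are copied verbatim from the added lines of this hunk. The helpers `requirement_name` and `names` are hypothetical, and the name parsing is a simplification rather than full PEP 508 parsing.

```python
import re

# Extras copied from the added lines of the pyproject.toml hunk.
EXTRAS = {
    "mmlu": [
        "transformers>=4.37.0",
        "numpy>=1.20.0",
        "aiohttp>=3.9.0",
        "lm-eval @ git+https://github.com/arubique/lm-evaluation-harness.git@main",
    ],
    "lm-eval": [
        "aiohttp>=3.9.0",
        "lm-eval @ git+https://github.com/arubique/lm-evaluation-harness.git@main",
    ],
}


def requirement_name(requirement):
    """Extract the leading distribution name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return match.group(1).lower()


def names(extra):
    """Return the set of distribution names listed under one extra."""
    return {requirement_name(r) for r in EXTRAS[extra]}


# Anything listed under lm-eval but absent from mmlu would contradict the comment.
missing = names("lm-eval") - names("mmlu")
print("lm-eval requirements missing from mmlu:", sorted(missing))
```

A check like this could run in CI against the parsed `pyproject.toml` so that a future change to one extra without the other is caught early.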
