Commit 6c3cb22

[Create PR]:
- Add default value to --data_path
1 parent 3b122bb commit 6c3cb22

2 files changed

Lines changed: 15 additions & 11 deletions

examples/mmlu_benchmark/README.md

Lines changed: 8 additions & 2 deletions
@@ -43,7 +43,13 @@ The benchmark expects a JSON file in the `mmlu_prompts_examples.json` format:

 ### Basic Evaluation

-Evaluate a model on all MMLU tasks:
+Evaluate a model on all MMLU tasks (uses `arubique/flattened-MMLU` by default):
+
+```bash
+python mmlu_benchmark.py --model_id "meta-llama/Llama-2-7b-hf"
+```
+
+To use a local JSON file or another Hugging Face dataset:

 ```bash
 python mmlu_benchmark.py \
@@ -137,7 +143,7 @@ python mmlu_benchmark.py \
 | Argument | Description | Default |
 |----------|-------------|---------|
 | `--model_id` | HuggingFace model identifier (required) | - |
-| `--data_path` | Path to MMLU prompts JSON file (required) | - |
+| `--data_path` | Path to MMLU prompts JSON file or Hugging Face dataset repo id | `arubique/flattened-MMLU` |
 | `--anchor_points_path` | Path to anchor points pickle file | None |
 | `--output_dir` | Directory to save results | `./results` |
 | `--predictions_path` | Path to save predictions pickle (for DISCO) | None |
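
Reviewer note: `--data_path` now accepts either a local JSON file or a Hugging Face dataset repo id, but this diff does not show how the script tells the two apart. The sketch below shows one plausible way to do it, assuming that any value naming an existing file is treated as local JSON and anything else as a repo id. The helper `classify_data_path` and its return labels are hypothetical and do not come from the repository.

```python
import os
import tempfile


def classify_data_path(data_path: str) -> str:
    """Return "local_json" if data_path names an existing file, else "hub_repo".

    Hypothetical helper: the real detection logic is not part of this diff.
    """
    if os.path.isfile(data_path):
        return "local_json"
    return "hub_repo"


# Create a throwaway JSON file so the local-file branch can be exercised.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
    tmp.write(b"[]")
    local_file = tmp.name

local_kind = classify_data_path(local_file)
hub_kind = classify_data_path("arubique/flattened-MMLU")

# Clean up the temporary file.
os.remove(local_file)

print(local_kind, hub_kind)
```

An existence check like this means a mistyped local path is silently treated as a repo id, so the real script may prefer a stricter rule or a clearer error message.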

examples/mmlu_benchmark/mmlu_benchmark.py

Lines changed: 7 additions & 9 deletions
@@ -14,29 +14,27 @@
 --use_full_prompt

 Usage:
-# Run with default settings (evaluates on all tasks)
-python mmlu_benchmark.py --model_id "meta-llama/Llama-2-7b-hf" --data_path /path/to/mmlu_prompts_examples.json
+# Run with default settings (evaluates on all tasks; uses arubique/flattened-MMLU by default)
+python mmlu_benchmark.py --model_id "meta-llama/Llama-2-7b-hf"

 # Run with anchor points filtering (for DISCO prediction)
 python mmlu_benchmark.py \\
 --model_id "alignment-handbook/zephyr-7b-sft-full" \\
---data_path /path/to/mmlu_prompts_examples.json \\
 --anchor_points_path /path/to/anchor_points_disagreement.pkl

 # Run with DISCO prediction (passing --disco_model_path enables it)
 python mmlu_benchmark.py \\
 --model_id "alignment-handbook/zephyr-7b-sft-full" \\
---data_path /path/to/mmlu_prompts_examples.json \\
 --anchor_points_path /path/to/anchor_points_disagreement.pkl \\
 --disco_model_path /path/to/fitted_weights.pkl \\
 --disco_transform_path /path/to/transform.pkl \\
 --pca 256

 # Run on a subset of tasks for testing
-python mmlu_benchmark.py \\
---model_id "meta-llama/Llama-2-7b-hf" \\
---data_path /path/to/mmlu_prompts_examples.json \\
---limit 10
+python mmlu_benchmark.py --model_id "meta-llama/Llama-2-7b-hf" --limit 10
+
+# Override data source (path to JSON or Hugging Face repo id)
+python mmlu_benchmark.py --model_id "meta-llama/Llama-2-7b-hf" --data_path /path/to/mmlu_prompts_examples.json
 """

 import argparse
@@ -90,7 +88,7 @@ def parse_args():
 parser.add_argument(
 "--data_path",
 type=str,
-required=True,
+default="arubique/flattened-MMLU",
 help="Path to MMLU prompts JSON file, or Hugging Face dataset repo id (e.g. username/mmlu-prompts-examples)",
 )
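
Reviewer note: the behavioral change here is that `--data_path` goes from `required=True` to having a default. The minimal sketch below reproduces only that part of the parser to show the effect. `build_parser` is a hypothetical stand-in for the script's `parse_args`, trimmed to the two arguments discussed in this commit; the real parser defines more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, trimmed-down parser with only --model_id and --data_path."""
    parser = argparse.ArgumentParser(description="MMLU benchmark (sketch)")
    parser.add_argument("--model_id", type=str, required=True)
    parser.add_argument(
        "--data_path",
        type=str,
        # After this commit the flag is optional and falls back to this value.
        default="arubique/flattened-MMLU",
    )
    return parser


# Omitting --data_path no longer raises an error; the default is used.
default_args = build_parser().parse_args(
    ["--model_id", "meta-llama/Llama-2-7b-hf"]
)

# Passing --data_path explicitly still overrides the default.
override_args = build_parser().parse_args(
    [
        "--model_id", "meta-llama/Llama-2-7b-hf",
        "--data_path", "/path/to/mmlu_prompts_examples.json",
    ]
)

print(default_args.data_path)
print(override_args.data_path)
```

The `help` string in the diff still uses `username/mmlu-prompts-examples` as its example repo id, so it may be worth mentioning the new default there as well.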
