| Model | Method |
|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | Best-of-N w/ original decoding |
| DeepSeek-R1-Distill-Qwen-1.5B | Best-of-N w/ CyclicReflex |
| DeepSeek-R1-Distill-Qwen-1.5B | Beam search w/ original decoding |
| DeepSeek-R1-Distill-Qwen-1.5B | Beam search w/ CyclicReflex |

Each approach can be launched by specifying the associated YAML file, for example:

    export CONFIG=recipes/DeepSeek-R1-Distill-Qwen-1.5B/best_of_n_cyclical.yaml
    python scripts/test_time_compute.py $CONFIG --dataset_name=HuggingFaceH4/MATH-500 --dataset_split=train

To get the final evaluation numbers, we use a fork of the Qwen2.5-Math evaluation repo. Please follow the installation and usage instructions in our fork to obtain accuracies on MATH-500.
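
To run several approaches back to back, the launch command can be wrapped in a loop over the recipe directory. The following is only a sketch: it assumes every `*.yaml` file under `recipes/DeepSeek-R1-Distill-Qwen-1.5B/` is a recipe for `scripts/test_time_compute.py`, and the `RECIPE_DIR` and `DRY_RUN` variables and the `build_cmd` helper are hypothetical names introduced here, not part of the repository. The dataset arguments are copied from the example command.

```shell
#!/bin/sh
# Sketch: launch every recipe found for one model, one after another.
# Assumption: each *.yaml under RECIPE_DIR is a test-time-compute recipe.
set -eu

RECIPE_DIR="${RECIPE_DIR:-recipes/DeepSeek-R1-Distill-Qwen-1.5B}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints commands, 0 runs them

# Print the launch command for a single config file.
build_cmd() {
  printf 'python scripts/test_time_compute.py %s --dataset_name=HuggingFaceH4/MATH-500 --dataset_split=train\n' "$1"
}

for config in "$RECIPE_DIR"/*.yaml; do
  # Skip the literal pattern when the glob matches nothing.
  [ -e "$config" ] || continue
  if [ "$DRY_RUN" = "1" ]; then
    build_cmd "$config"
  else
    python scripts/test_time_compute.py "$config" \
      --dataset_name=HuggingFaceH4/MATH-500 --dataset_split=train
  fi
done
```

By default the script only prints the commands it would run; setting `DRY_RUN=0` executes them from the repository root.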