Make sure you have Python 3.12+ and uv installed. uv is a faster alternative to pip and virtualenv.
# Install uv if not already available
pip install uv
# Create a virtual environment using uv
uv venv -p python3.12 .venv
source .venv/bin/activate
# Install project dependencies from pyproject.toml
uv pip install -r pyproject.toml

🔐 Environment Configuration
## .env file example
API_KEY = "sk-or-v1-more_characters"
BASE_URL = "https://openrouter.ai/api/v1/chat/completions"
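How `main.py` reads these values is not shown here. As a minimal sketch, assuming the file is a plain `.env` in the working directory with `KEY = "value"` lines like the example above, the settings could be loaded with the standard library alone. The helper names `load_env` and `require` are hypothetical and not part of the project.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY = "value" lines (hypothetical helper, not the project's code)."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def require(config: dict[str, str], name: str) -> str:
    """Return a setting, preferring a real environment variable over the file."""
    value = os.environ.get(name) or config.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env")
    return value
```

Failing early on a missing `API_KEY` avoids starting a long run that would only error out at the first request.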

# Run with the default settings
python main.py
# Run with specific datasets
python main.py --datasets mmlu tau_cqa
# Run with specific models
python main.py --models gpt_4o
# Run with specific models and datasets
python main.py --models gpt_4o --datasets mmlu
# Change number of runs and questions
python main.py --runs 10 --questions 20
# Combine all options
python main.py --models llama_33_70b_instruct --datasets ai2_arc --runs 3 --questions 10

To fully reproduce the results, simply run:
python main.py

To reduce runtime and cost:
- Run experiments on fewer models or datasets.
- Comment out steps you can run once (e.g., Hugging Face dataset downloads).
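
Instead of commenting a one-time step in and out by hand, its result can be cached on disk. The sketch below is one possible approach, not the project's implementation: `cached` is a hypothetical helper, and it assumes the step's output is JSON-serializable (for example, a list of questions).

```python
import json
from pathlib import Path
from typing import Callable


def cached(path: str, produce: Callable[[], list]) -> list:
    """Return data stored at `path` if present; otherwise call `produce` once and save it."""
    cache_file = Path(path)
    if cache_file.exists():
        # A previous run already did the expensive work; reuse its output.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = produce()  # e.g. a wrapper around a dataset download (hypothetical)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```

With this pattern the download runs on the first execution only, and deleting the cache file forces a refresh.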
Finally, be aware that a full run is costly, so double-check your models, datasets, and run counts before executing the code.
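
To get a feel for how the flags drive cost, the rough sketch below parses the same four options and multiplies them out. It assumes one API request per question, per run, per model, per dataset, which may not match what `main.py` actually does. The real defaults are not shown in this README, so the sketch requires all four flags. `build_parser` and `estimate_requests` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the flags shown above; all are required here because defaults are unknown."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--models", nargs="+", required=True)
    parser.add_argument("--datasets", nargs="+", required=True)
    parser.add_argument("--runs", type=int, required=True)
    parser.add_argument("--questions", type=int, required=True)
    return parser


def estimate_requests(args: argparse.Namespace) -> int:
    """Assumed cost model: models x datasets x runs x questions."""
    return len(args.models) * len(args.datasets) * args.runs * args.questions


# Flags taken from the "combine all options" example.
args = build_parser().parse_args(
    "--models llama_33_70b_instruct --datasets ai2_arc --runs 3 --questions 10".split()
)
print(estimate_requests(args))  # 30
```

Under this assumption the request count grows multiplicatively, so trimming any one flag reduces the total proportionally.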