
Commit a609a0f

abrichr and claude authored
feat: add systematic model comparison framework (#188)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 2263fa6 commit a609a0f

2 files changed

Lines changed: 964 additions & 0 deletions


Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# Example comparison config: API models on WAA tasks
+# Usage:
+#   python scripts/compare_models.py --config example_comparisons/api_models.yaml
+#   python scripts/compare_models.py --config example_comparisons/api_models.yaml --manage-vm --setup-tunnels
+
+name: "API Model Comparison"
+description: "Compare GPT-5.4-mini vs GPT-4o-mini as unified desktop agents"
+
+tasks:
+  - example_tasks/notepad-hello.yaml
+  - example_tasks/clear-browsing-data-chrome.yaml
+
+models:
+  - name: gpt-5.4-mini
+    provider: openai
+    type: unified
+
+  - name: gpt-4o-mini
+    provider: openai
+    type: unified
+
+server_url: http://localhost:5001
+max_steps: 10
+runs_per_config: 1
+save_screenshots: true
+output_dir: comparison_results/
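
The config above implies a run matrix of tasks × models × `runs_per_config`. The sketch below shows one way that matrix could be enumerated. It is an illustration only: `scripts/compare_models.py` is not shown in this diff, so `expand_runs` and the shape of each run record are hypothetical, and the config is written as a plain dict (what a YAML loader would plausibly return) to keep the example dependency-free.

```python
from itertools import product

# Plain-dict mirror of the YAML config in this diff (assumed parse result).
config = {
    "name": "API Model Comparison",
    "tasks": [
        "example_tasks/notepad-hello.yaml",
        "example_tasks/clear-browsing-data-chrome.yaml",
    ],
    "models": [
        {"name": "gpt-5.4-mini", "provider": "openai", "type": "unified"},
        {"name": "gpt-4o-mini", "provider": "openai", "type": "unified"},
    ],
    "server_url": "http://localhost:5001",
    "max_steps": 10,
    "runs_per_config": 1,
    "save_screenshots": True,
    "output_dir": "comparison_results/",
}


def expand_runs(cfg):
    """Hypothetical helper: one record per (task, model, repetition)."""
    runs = []
    repetitions = range(cfg["runs_per_config"])
    # product() iterates tasks in the outer loop, then models, then repetitions.
    for task, model, rep in product(cfg["tasks"], cfg["models"], repetitions):
        runs.append(
            {
                "task": task,
                "model": model["name"],
                "provider": model["provider"],
                "agent_type": model["type"],
                "run_index": rep,
                "max_steps": cfg["max_steps"],
            }
        )
    return runs


runs = expand_runs(config)
for run in runs:
    print(f'{run["model"]:>14}  {run["task"]}  (run {run["run_index"]})')
```

With two tasks, two models, and `runs_per_config: 1`, this yields four runs; raising `runs_per_config` multiplies the total accordingly.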

0 commit comments