Hi, thank you so much for releasing code for the inspiring SWE-Bench-Pro benchmarks.
We're trying to reproduce the results on the official leaderboard. Could you kindly share the following to help us match them:
- YAML configs (including hyperparameters, cost cap, etc.) to be used by this repo
- Per-instance trajectories (similar to these trajectories, but for the more recent models on the leaderboard, which would help us debug closely)
for the following models:
claude-opus-4-5-20251101, claude-4-5-Sonnet, claude-4-5-haiku, qwen3-coder-480b-a35b, minimax-2.1
so that we can better reproduce their results on the leaderboard.
Thank you very much for your time and help! @jeff-da @18vijayb