Questions on reproducing SWE-Bench-Pro Leaderboard results #97

Hi, thank you so much for releasing the code for the inspiring SWE-Bench-Pro benchmark.

We're trying to reproduce the results on the [official leaderboard](https://labs.scale.com/leaderboard/swe_bench_pro_public), and we're wondering if you could kindly share the following to help us match them:

- YAML configs (including hyperparameters, cost cap, etc.) to be used with this repo

- Per-instance trajectories (similar to these [trajectories](https://docent.transluce.org/dashboard/032fb63d-4992-4bfc-911d-3b7dafcb931f/agent_run), but for the more recent models on the leaderboard, which would help us debug more closely)

for the following models:
`claude-opus-4-5-20251101`, `claude-4-5-Sonnet`, `claude-4-5-haiku`, `qwen3-coder-480b-a35b`, `minimax-2.1`

so that we can better reproduce their results on the [leaderboard](https://labs.scale.com/leaderboard/swe_bench_pro_public).

Thank you very much for your time and help! @jeff-da @18vijayb 
