Questions on reproducing SWE-Bench-Pro Leaderboard results #97

@LeoXinhaoLee

Hi, thank you so much for releasing the code for the inspiring SWE-Bench-Pro benchmark.

We're trying to reproduce the results on the official leaderboard, and I'm wondering if you could kindly share the following to help us match them:

  • YAML configs (including hyperparameters, cost cap, etc.) for use with this repo

  • Per-instance trajectories (similar to these trajectories, but for the more recent models on the leaderboard, which would help us debug closely)

for the following models:
claude-opus-4-5-20251101, claude-4-5-Sonnet, claude-4-5-haiku, qwen3-coder-480b-a35b, minimax-2.1

so that we can better reproduce their leaderboard results.

Thank you very much for your time and help! @jeff-da @18vijayb
