General question/idea: Leaderboard or performance roundup for most popular models #267

bkutasi · 2026-03-13T12:34:56Z

I noticed that models differ wildly in their ability to use OpenAgentsControl. For example, MiMo-V2-Flash and MiniMax M2.5 can't really delegate unless prompted directly, while Qwen3.5-397B-A17B is pretty good at managing a coding session. Strangely, MiniMax M2.5 passes more "golden" tests than Qwen3.5, which in my experience is because it doesn't ask for approval.
Obviously, none of these are as good as Sonnet/Opus.
I think it would be a good idea to round up the past month's top models from the OpenRouter rankings, run some simple viability tests on them, and possibly include the results in the README.

https://openrouter.ai/rankings?view=month

Replies: 0 comments

Select a reply

Uh oh!

General quesiton/idea: Leaderboard or performance roundup for most popular models #267

Uh oh!

bkutasi Mar 13, 2026

Replies: 0 comments

bkutasi
Mar 13, 2026