I noticed that models differ wildly in how well they can use OpenAgentsControl. For example, MiMo-V2-Flash and MiniMax M2.5 can't really delegate unless prompted directly, while Qwen3.5-397B-A17B is pretty good at managing a coding session. Strangely, MiniMax M2.5 passes more "golden" tests than Qwen3.5; in my experience that's because it doesn't stop to ask for approval.
Obviously, none of these are as good as Sonnet/Opus.
I think it would be a good idea to round up the past month's top models from the OpenRouter rankings, run some simple viability tests on them, and possibly include the results in the readme.
https://openrouter.ai/rankings?view=month
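To make the readme idea a bit more concrete, here is a rough sketch of how per-model results could be turned into a table. Everything in it is an assumption on my part: `ViabilityResult`, `render_readme_table`, the column names, and the placeholder models and numbers are hypothetical and not part of OpenAgentsControl or its test suite.

```python
from dataclasses import dataclass


@dataclass
class ViabilityResult:
    """One model's outcome on a (hypothetical) viability test run."""
    model: str
    passed: int                 # number of "golden" tests passed
    total: int                  # number of "golden" tests run
    delegates_unprompted: bool  # did it delegate without being told to?


def pass_rate(result: ViabilityResult) -> float:
    # Guard against an empty test run so sorting never divides by zero.
    return result.passed / result.total if result.total else 0.0


def render_readme_table(results: list[ViabilityResult]) -> str:
    """Render results as a Markdown table, best pass rate first."""
    lines = [
        "| Model | Golden tests passed | Delegates unprompted |",
        "| --- | --- | --- |",
    ]
    for r in sorted(results, key=pass_rate, reverse=True):
        delegates = "yes" if r.delegates_unprompted else "no"
        lines.append(f"| {r.model} | {r.passed}/{r.total} | {delegates} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data only; these are not real measurements.
    demo = [
        ViabilityResult("model-a", 3, 10, False),
        ViabilityResult("model-b", 7, 10, True),
    ]
    print(render_readme_table(demo))
```

Keeping delegation as its own column matters here, since a model can pass more golden tests while still being worse at delegating.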