So many new models are coming out, including Qwen3, Kimi K2, updated DeepSeek versions, GLM4.5, and most recently GPT-OSS!
Considering how suspiciously high OpenAI ranks on instruction-following benchmarks relative to math and reasoning benchmarks, it would be good to test all the SOTA models: https://artificialanalysis.ai/evaluations/ifbench