Updating the leaderboard

So many new models are coming out, including Qwen3, Kimi K2, updated DeepSeek versions, GLM4.5, and most recently GPT-OSS!

OpenAI's models rank suspiciously high on instruction-following benchmarks relative to math and reasoning benchmarks (see https://artificialanalysis.ai/evaluations/ifbench), so it would be good to test all the SOTA models.