Updating the leaderboard #18

@BradKML

A lot of new models have come out, including Qwen3, Kimi K2, updated versions of DeepSeek, GLM4.5, and most recently GPT-OSS!

Considering how OpenAI's models rank suspiciously high on instruction-following benchmarks relative to their math and reasoning results, it would be good to test all the current SOTA models. See https://artificialanalysis.ai/evaluations/ifbench
