# Please keep this list sorted in the following order:
# 1. By provider, in this order:
#    - OpenAI
#    - Anthropic
#    - Google
#    - xAI
#    - Other providers
# 2. Within each provider group, alphabetically by model name.
# OpenAI
- name: GPT-4.1
  task_area: General-purpose coding and writing
  excels_at: Fast, accurate code completions and explanations
  further_reading: '[GPT-4.1 model card](https://openai.com/index/gpt-4-1/)'
- name: GPT-5 mini
  task_area: General-purpose coding and writing
  excels_at: Fast, accurate code completions and explanations
  further_reading: '[GPT-5 mini model card](https://cdn.openai.com/gpt-5-system-card.pdf)'
- name: GPT-5.1
  task_area: Deep reasoning and debugging
  excels_at: Multi-step problem solving and architecture-level code analysis
  further_reading: '[GPT-5.1 model card](https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf)'
- name: GPT-5.1-Codex
  task_area: Deep reasoning and debugging
  excels_at: Multi-step problem solving and architecture-level code analysis
  further_reading: 'Not available'
- name: GPT-5.1-Codex-Max
  task_area: Agentic software development
  excels_at: Agentic tasks
  further_reading: '[GPT-5.1-Codex-Max model card](https://cdn.openai.com/pdf/2a7d98b1-57e5-4147-8d0e-683894d782ae/5p1_codex_max_card_03.pdf)'
- name: GPT-5.1-Codex-Mini
  task_area: Deep reasoning and debugging
  excels_at: Multi-step problem solving and architecture-level code analysis
  further_reading: 'Not available'
- name: GPT-5.2
  task_area: Deep reasoning and debugging
  excels_at: Multi-step problem solving and architecture-level code analysis
  further_reading: '[GPT-5.2 model card](https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf)'
- name: GPT-5.2-Codex
  task_area: Agentic software development
  excels_at: Agentic tasks
  further_reading: '[GPT-5.2-Codex model card](https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf)'
- name: GPT-5.3-Codex
  task_area: Agentic software development
  excels_at: Agentic tasks
  further_reading: '[GPT-5.3-Codex model card](https://deploymentsafety.openai.com/gpt-5-3-codex)'
- name: GPT-5.4
  task_area: Deep reasoning and debugging
  excels_at: Multi-step problem solving and architecture-level code analysis
  further_reading: '[GPT-5.4 model card](https://deploymentsafety.openai.com/gpt-5-4-thinking/introduction)'
- name: GPT-5.4 mini
  task_area: Agentic software development
  excels_at: Codebase exploration, especially when using grep-style tools
  further_reading: 'Not available'
# Anthropic
- name: Claude Haiku 4.5
  task_area: Fast help with simple or repetitive tasks
  excels_at: Fast, reliable answers to lightweight coding questions
  further_reading: '[Claude Haiku 4.5 model card](https://assets.anthropic.com/m/99128ddd009bdcb/Claude-Haiku-4-5-System-Card.pdf)'
- name: Claude Opus 4.5
  task_area: Deep reasoning and debugging
  excels_at: Complex problem-solving challenges, sophisticated reasoning
  further_reading: '[Claude Opus 4.5 model card](https://assets.anthropic.com/m/64823ba7485345a7/Claude-Opus-4-5-System-Card.pdf)'
- name: Claude Opus 4.6
  task_area: Deep reasoning and debugging
  excels_at: Complex problem-solving challenges, sophisticated reasoning
  further_reading: '[Claude Opus 4.6 model card](https://www-cdn.anthropic.com/14e4fb01875d2a69f646fa5e574dea2b1c0ff7b5.pdf)'
- name: Claude Opus 4.6 (fast mode) (preview)
  task_area: Deep reasoning and debugging
  excels_at: Complex problem-solving challenges, sophisticated reasoning
  further_reading: 'Not available'
- name: Claude Sonnet 4.0
  task_area: Deep reasoning and debugging
  excels_at: Balanced performance and practicality for coding workflows
  further_reading: '[Claude Sonnet 4.0 model card](https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf)'
- name: Claude Sonnet 4.5
  task_area: General-purpose coding and agent tasks
  excels_at: Complex problem-solving challenges, sophisticated reasoning
  further_reading: '[Claude Sonnet 4.5 model card](https://assets.anthropic.com/m/12f214efcc2f457a/original/Claude-Sonnet-4-5-System-Card.pdf)'
- name: Claude Sonnet 4.6
  task_area: General-purpose coding and agent tasks
  excels_at: Complex problem-solving challenges, sophisticated reasoning
  further_reading: '[Claude Sonnet 4.6 model card](https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7a0b4484f84.pdf)'
# Google
- name: Gemini 2.5 Pro
  task_area: Deep reasoning and debugging
  excels_at: Complex code generation, debugging, and research workflows
  further_reading: '[Gemini 2.5 Pro model card](https://storage.googleapis.com/model-cards/documents/gemini-2.5-pro.pdf)'
- name: Gemini 3 Flash
  task_area: Fast help with simple or repetitive tasks
  excels_at: Fast, reliable answers to lightweight coding questions
  further_reading: '[Gemini 3 Flash model card](https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Flash-Model-Card.pdf)'
- name: Gemini 3 Pro
  task_area: Deep reasoning and debugging
  excels_at: Complex code generation, debugging, and research workflows
  further_reading: '[Gemini 3 Pro model card](https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf)'
- name: Gemini 3.1 Pro
  task_area: Deep reasoning and debugging
  excels_at: Effective and efficient edit-then-test loops with high tool precision
  further_reading: 'Not available'
# xAI
- name: Grok Code Fast 1
  task_area: General-purpose coding and writing
  excels_at: Fast, accurate code completions and explanations
  further_reading: '[Grok Code Fast 1 model card](https://data.x.ai/2025-08-20-grok-4-model-card.pdf)'
# Other providers (alphabetized by model name)
- name: Qwen2.5
  task_area: General-purpose coding and writing
  excels_at: Code generation, reasoning, and code repair / debugging
  further_reading: '[Qwen2.5 model card](https://arxiv.org/pdf/2409.12186)'
- name: Raptor mini
  task_area: General-purpose coding and writing
  excels_at: Fast, accurate code completions and explanations
  further_reading: 'Coming soon'