Commit 777689a
chore(ci): upgrade e2e Ollama model from qwen2.5:3b to qwen3:1.7b (#2147)
qwen2.5:3b scores 0.670 on tool-calling benchmarks with restraint 0.500
and 1 wrong tool call, causing flaky Playground e2e tests.
qwen3:1.7b scores 0.960 with perfect restraint (1.000) and zero wrong
tool calls. Half the size, more reliable tool calling.
Benchmark: https://github.com/MikeVeerman/tool-calling-benchmark
1 parent 40989ba commit 777689a
1 file changed
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 4 | 4 | |
| 5 | 5 | |
| 6 | 6 | |
| 7 | | - |
| | 7 | + |
| 8 | 8 | |
| 9 | 9 | |
| 10 | 10 | |