Skip to content

Commit 777689a

Browse files
authored
chore(ci): upgrade e2e Ollama model from qwen2.5:3b to qwen3:1.7b (#2147)
qwen2.5:3b scores 0.670 on tool-calling benchmarks with restraint 0.500 and 1 wrong tool call, causing flaky Playground e2e tests. qwen3:1.7b scores 0.960 with perfect restraint (1.000) and zero wrong tool calls. Half the size, more reliable tool calling. Benchmark: https://github.com/MikeVeerman/tool-calling-benchmark
1 parent 40989ba commit 777689a

1 file changed

Lines changed: 1 addition & 1 deletion

File tree

.github/workflows/_e2e.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ on:
44

55
env:
66
OLLAMA_BASE_URL: http://localhost:11434
7-
OLLAMA_MODEL: qwen2.5:3b
7+
OLLAMA_MODEL: qwen3:1.7b
88
OLLAMA_VERSION: 0.18.2
99

1010
jobs:

0 commit comments

Comments
 (0)