chore(ci): upgrade e2e Ollama model from qwen2.5:3b to qwen3:1.7b (#2147)

peppescg · web-flow · commit 777689ad0f13 · 2026-04-29T17:43:49.000+02:00
qwen2.5:3b scores 0.670 on the tool-calling benchmark, with a restraint score of 0.500 and 1 wrong tool call, which causes flaky Playground e2e tests. qwen3:1.7b scores 0.960 on the same benchmark, with perfect restraint (1.000) and zero wrong tool calls. The new model is roughly half the size and calls tools more reliably.

Benchmark: https://github.com/MikeVeerman/tool-calling-benchmark
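
For reference, a minimal sketch of how a workflow step could consume the updated variable. The rest of `_e2e.yml` is not shown in this diff, so the step name and its placement are hypothetical. It assumes the `ollama` CLI is already installed and the server is listening on `OLLAMA_BASE_URL`:

```yaml
# Hypothetical step, not taken from the actual workflow.
# Workflow-level `env` values are exposed to `run` steps as shell variables.
- name: Pull e2e model
  run: |
    ollama pull "$OLLAMA_MODEL"   # resolves to qwen3:1.7b after this change
    ollama list                   # confirm the model is available locally
```

Because the model name is read from `OLLAMA_MODEL`, a step written this way would pick up the new model without further edits.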
diff --git a/.github/workflows/_e2e.yml b/.github/workflows/_e2e.yml
@@ -4,7 +4,7 @@ on:
 
 env:
   OLLAMA_BASE_URL: http://localhost:11434
-  OLLAMA_MODEL: qwen2.5:3b
+  OLLAMA_MODEL: qwen3:1.7b
   OLLAMA_VERSION: 0.18.2
 
 jobs: