Increase ReasoningBudgetTest N_PREDICT to reduce CI flakiness (#164)

bernardladenthin · claude · web-flow · commit f8c6afcf8747 · 2026-05-18T16:12:34.000+02:00
Qwen3-0.6B at Q4 occasionally consumes the full 500-token budget while
still inside the <think> block on slow CI runners (2-thread x86_64
hosted), leaving content empty and failing
testThinkingDefault_reasoningContentAndAnswerPresent. Bumping the
budget to 1500 gives comfortable headroom for thinking plus the
answer without changing the test's intent.

Co-authored-by: Claude <noreply@anthropic.com>
diff --git a/src/test/java/net/ladenthin/llama/ReasoningBudgetTest.java b/src/test/java/net/ladenthin/llama/ReasoningBudgetTest.java
@@ -56,10 +56,14 @@
 public class ReasoningBudgetTest {
 
     /**
-     * Generous token budget: Qwen3-0.6B spends up to ~200 tokens thinking before answering.
-     * 500 is enough for thinking + a short answer on all tested platforms.
+     * Generous token budget: Qwen3-0.6B typically spends ~200 tokens thinking before
+     * answering, but on slow/contended CI runners (e.g. 2-thread GitHub-hosted x86_64)
+     * the model occasionally rambles past 500 tokens while still inside the
+     * {@code <think>} block, leaving {@code content} empty and failing
+     * {@link #testThinkingDefault_reasoningContentAndAnswerPresent}. 1500 leaves
+     * comfortable headroom for thinking + a short answer across all tested platforms.
      */
-    private static final int N_PREDICT = 500;
+    private static final int N_PREDICT = 1500;
 
     private static LlamaModel model;
     private final ChatResponseParser parser = new ChatResponseParser();