Commit f8c6afc

Increase ReasoningBudgetTest N_PREDICT to reduce CI flakiness (#164)
Qwen3-0.6B at Q4 occasionally consumes the full 500-token budget while still inside the <think> block on slow CI runners (2-thread x86_64 hosted), leaving content empty and failing testThinkingDefault_reasoningContentAndAnswerPresent. Bumping the budget to 1500 gives comfortable headroom for thinking plus the answer without changing the test's intent.

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 45f1a56 commit f8c6afc

1 file changed

Lines changed: 7 additions & 3 deletions

src/test/java/net/ladenthin/llama/ReasoningBudgetTest.java

@@ -56,10 +56,14 @@
 public class ReasoningBudgetTest {
 
     /**
-     * Generous token budget: Qwen3-0.6B spends up to ~200 tokens thinking before answering.
-     * 500 is enough for thinking + a short answer on all tested platforms.
+     * Generous token budget: Qwen3-0.6B typically spends ~200 tokens thinking before
+     * answering, but on slow/contended CI runners (e.g. 2-thread GitHub-hosted x86_64)
+     * the model occasionally rambles past 500 tokens while still inside the
+     * {@code <think>} block, leaving {@code content} empty and failing
+     * {@link #testThinkingDefault_reasoningContentAndAnswerPresent}. 1500 leaves
+     * comfortable headroom for thinking + a short answer across all tested platforms.
      */
-    private static final int N_PREDICT = 500;
+    private static final int N_PREDICT = 1500;
 
     private static LlamaModel model;
     private final ChatResponseParser parser = new ChatResponseParser();
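
The failure mode described in the commit message can be illustrated with a small, self-contained sketch. It is not the project's code: `ThinkBudgetSketch`, `split`, and `truncate` are hypothetical names, the splitter is a stand-in for whatever `ChatResponseParser` actually does, and whitespace-separated words are used as a rough proxy for model tokens. Under those assumptions, it shows why a budget that runs out before `</think>` is emitted leaves the answer part empty.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the repository under test.
class ThinkBudgetSketch {

    private static final String OPEN = "<think>";
    private static final String CLOSE = "</think>";

    /**
     * Splits raw model output into {reasoning, content}.
     * If the closing tag is missing, everything after the opening tag is
     * treated as reasoning and the content is the empty string.
     */
    static String[] split(String raw) {
        int start = raw.indexOf(OPEN);
        if (start < 0) {
            // No thinking block at all: the whole output is the answer.
            return new String[] {"", raw.trim()};
        }
        int end = raw.indexOf(CLOSE, start);
        if (end < 0) {
            // Generation stopped inside the thinking block.
            return new String[] {raw.substring(start + OPEN.length()).trim(), ""};
        }
        return new String[] {
            raw.substring(start + OPEN.length(), end).trim(),
            raw.substring(end + CLOSE.length()).trim()
        };
    }

    /** Crude stand-in for a token budget: keeps the first `budget` words. */
    static String truncate(String raw, int budget) {
        String[] words = raw.trim().split("\\s+");
        int kept = Math.min(budget, words.length);
        return String.join(" ", Arrays.copyOf(words, kept));
    }

    public static void main(String[] args) {
        String full = "<think> two plus two </think> The answer is 4.";
        for (int budget : new int[] {4, 10}) {
            String[] parts = split(truncate(full, budget));
            System.out.println("budget=" + budget
                    + " reasoning=[" + parts[0] + "] content=[" + parts[1] + "]");
        }
    }
}
```

With the small budget the output is cut before the closing tag, so the reasoning is present but the content is empty, which is the condition the commit message says made the test fail. Raising the budget does not change what the test checks; it only makes it less likely that generation stops inside the thinking block.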
