chore(flows): switch default QA LLM from qwen36-fast (4B) to qwen36-deep (27B)
The smaller qwen36-fast was the previous default for OBOL_LLM_MODEL across
release-smoke and flow-{03,04,11,13,14} plus buy-external. It's documented as
flaky on the long single-shot agent-buy prompt at flow-13/14 step 46 (see the
retry-wrapper rationale added in the prior commit, plus
plans/inference-v1337-followup-20260514.md).
Switching the default to qwen36-deep (27B-class, also served by the same
spark1 vLLM endpoint) trades a bit of latency for much more reliable
tool-call behaviour. Operators can still pin the smaller model explicitly via
OBOL_LLM_MODEL=qwen36-fast for fast iteration on non-agent flows.
Files changed:
- flows/lib.sh, flows/release-smoke.sh, flows/flow-{03,04,11,13,14}*.sh,
flows/buy-external.sh — default value switch
- flows/lib-dual-stack.sh — WARN box in agent_buy_with_retry now
recommends checking the model is qwen36-deep first; mentions
qwen36-35b-heretic as the next escalation
- CLAUDE.md, .agents/skills/obol-stack-dev/{SKILL.md,references/*.md}
— documentation refreshed
Not changed (intentional):
- internal/{model,hermes}/*_test.go — qwen36-fast is a test fixture for
the rank parser, not a default; switching would invalidate test
expectations without changing test intent
- plans/post-490-integration-20260513.md — historical record
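A minimal sketch of how the default and the explicit pin described in this message interact. The `${OBOL_LLM_MODEL:-qwen36-deep}` expansion is the same form that appears in the flow-04 hunk; `resolve_qa_model` is a hypothetical helper introduced only for illustration and is not claimed to exist in flows/lib.sh.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_qa_model is a hypothetical helper, not part of flows/lib.sh.
resolve_qa_model() {
  # Use OBOL_LLM_MODEL when it is set and non-empty, otherwise the new default.
  printf '%s\n' "${OBOL_LLM_MODEL:-qwen36-deep}"
}

# No override in the environment: falls back to the 27B-class default.
unset OBOL_LLM_MODEL
resolve_qa_model

# Explicit pin for fast iteration on non-agent flows (subshell keeps it local).
(OBOL_LLM_MODEL=qwen36-fast; resolve_qa_model)
```

Because the expansion uses `:-` rather than `-`, an empty `OBOL_LLM_MODEL` also falls back to the default, which is usually the safer behaviour for CI environments that export blank variables.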
**Payment assertion**: don't bypass the agent buy step with a direct script exec. If the agent times out, diagnose Hermes/LiteLLM/model routing — don't relax the assertion. Required evidence: `PurchaseRequest Ready=True` + paid HTTP 200 + on-chain `Transfer` + exact balance deltas.

-**QA LLM**: full seller/buyer QA must route Alice and Bob through `OBOL_LLM_ENDPOINT` (OpenAI-compatible vLLM or llama.cpp on the QA host). Default `OBOL_LLM_MODEL=qwen36-fast`. Sequence: `obol model setup custom` → `obol model prefer` → one `obol model sync`. Local Ollama and cloud-fallback are **not** acceptable green substitutes for full-flow QA.
+**QA LLM**: full seller/buyer QA must route Alice and Bob through `OBOL_LLM_ENDPOINT` (OpenAI-compatible vLLM or llama.cpp on the QA host). Default `OBOL_LLM_MODEL=qwen36-deep` (27B-class). The smaller `qwen36-fast` (~4B) was the previous default but flakes on the long single-shot agent-buy prompt at flow-13/14 step 46 — see the retry-wrapper rationale in `flows/lib-dual-stack.sh::agent_buy_with_retry`. Sequence: `obol model setup custom` → `obol model prefer` → one `obol model sync`. Local Ollama and cloud-fallback are **not** acceptable green substitutes for full-flow QA.

**Public vs private routes**: `/services/*`, `/.well-known/agent-registration.json`, `/skill.md`, and `/` (storefront) are public via the tunnel. **NEVER** remove `hostnames: ["obol.stack"]` from frontend or eRPC HTTPRoutes — exposing them publicly is a critical security flaw.
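The QA LLM note in this hunk points at a retry wrapper whose warning recommends checking the model first. The following is a hedged sketch of that general pattern, not the body of `agent_buy_with_retry`: the function name `retry_with_model_hint`, its arguments, and the warning text are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: retry_with_model_hint is hypothetical and does not reproduce
# flows/lib-dual-stack.sh::agent_buy_with_retry.
retry_with_model_hint() {
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the wrapped command; stop at the first success.
    if "$@"; then
      return 0
    fi
    echo "WARN: attempt $attempt/$max_attempts failed (model: ${OBOL_LLM_MODEL:-qwen36-deep})" >&2
    attempt=$((attempt + 1))
  done
  # All attempts failed: point the operator at the model before anything else.
  echo "WARN: check that OBOL_LLM_MODEL is qwen36-deep before escalating" >&2
  return 1
}

# Example: a command that succeeds immediately needs only one attempt.
retry_with_model_hint 3 true
```

The wrapper retries the same command unchanged and still returns non-zero when every attempt fails, so it does not relax the payment assertion; it only absorbs transient model flakiness.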
flows/flow-04-agent.sh (2 additions, 2 deletions)
@@ -110,8 +110,8 @@ fi
 model_name=$("$OBOL" kubectl get cm hermes-config -n "$NS" -o jsonpath='{.data.config\.yaml}' 2>/dev/null | sed -n 's/^[[:space:]]*default: //p' | tr -d '"' | head -1)
 [ -n "$model_name" ] || model_name="qwen3.5:35b"
-if [ -n "${OBOL_LLM_ENDPOINT:-}" ] && [ "$model_name" != "${OBOL_LLM_MODEL:-qwen36-fast}" ]; then
-  fail "Hermes default model $model_name does not match QA LLM model ${OBOL_LLM_MODEL:-qwen36-fast}"
+if [ -n "${OBOL_LLM_ENDPOINT:-}" ] && [ "$model_name" != "${OBOL_LLM_MODEL:-qwen36-deep}" ]; then
+  fail "Hermes default model $model_name does not match QA LLM model ${OBOL_LLM_MODEL:-qwen36-deep}"
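To make the check in this hunk easier to reason about without a cluster, here is a small sketch that runs the same `sed`/`tr`/`head` pipeline against an inline string instead of the `kubectl` output. The sample YAML and the helper names `extract_default_model` and `model_matches_qa` are assumptions for illustration; the real `hermes-config` ConfigMap may be laid out differently.

```shell
#!/usr/bin/env bash
# Hypothetical sample of a Hermes config.yaml; the real ConfigMap may differ.
sample_config='model:
  default: "qwen36-deep"
  provider: custom'

# Same sed/tr/head pipeline as in the hunk, fed from a string instead of kubectl.
extract_default_model() {
  printf '%s\n' "$1" | sed -n 's/^[[:space:]]*default: //p' | tr -d '"' | head -1
}

# Returns 0 when the configured model equals the expected QA model.
model_matches_qa() {
  local model_name="$1"
  [ "$model_name" = "${OBOL_LLM_MODEL:-qwen36-deep}" ]
}

model_name=$(extract_default_model "$sample_config")
if model_matches_qa "$model_name"; then
  echo "ok: $model_name"
else
  echo "mismatch: $model_name (expected ${OBOL_LLM_MODEL:-qwen36-deep})" >&2
fi
```

Note that the pipeline keeps only the first `default:` key it finds, so a config with several `default:` entries would be matched on whichever appears first.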