Commit f148bde

unamedkr and claude committed
state: R40 meaningful prompt + thinking-mode — new attractor (requirements loop)
User insight: meaningful questions needed for meaningful long output. Tested rich structured KR ML-paradigms prompt + TQ_ENABLE_THINKING=1. Thinking mode engages correctly (English CoT starts), but hits a NEW repetition attractor at 71 tokens: "requirements/requirements/...". EOS rank profile shows this is NOT an EOS-block (rank stays ~100-222); it's a pure semantic lock — model stuck on one word, margin peaks at 7.8 logits on it. So 35B long-gen has THREE distinct failure modes that coexist: 1. EOS-block (R39): model wants to stop, top1 overrules → alphabet walk 2. Semantic lock-in (R40): model locked on one word attractor 3. Router collapse (pre-v0.28.0): 117-tok math loop, T=2.0 fixed this Brutal honesty: 1000-tok coherent not achievable today. All three modes appear regardless of prompt quality or thinking mode. T=2.0 softened one attractor but left others. Paths for future: position-aware rep-penalty, DRY sampler, upstream chat-template work, or different base model. Diagnostic suite now covers all three modes (logit-probe + residual-probe + moe-probe + delta-probe). This session's Phase 3 ends with infrastructure complete, honest null on 1000-tok ceiling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c551293 commit f148bde

1 file changed

Lines changed: 55 additions & 0 deletions

File tree

.claude/state.md

@@ -3,6 +3,61 @@
**Last updated**: 2026-04-22 (Phase 2 KV clean-bill)

**Session HEAD**: turbo_kv_4b per-arch per-layer clean-bill LANDED via chunked TQ_KV_PROBE. 7×/+0% PPL claim now validated element-by-element across Llama, Qwen3-0.6B, Qwen3.5-4B, Qwen3.6-35B.

## Phase 3 R40 — Meaningful prompt + thinking-mode still hits NEW attractor (2026-04-22)

User follow-up: "모델이 의미있는 긴 문장을 생성하도록 유의미한 질문도 생성해야 하는거 아닌가요?" (Must we also construct meaningful questions for the model to produce meaningful long output?) — correct observation, tested.

Setup:
- Q5_K_M model
- Rich structured KR prompt: "5 ML paradigms, each with (1) core idea, (2) 2 algorithms, (3) industry case, (4) pros/cons"
- `TQ_ENABLE_THINKING=1` (open `<think>\n` instead of empty)
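
The thinking-mode toggle amounts to changing the assistant preamble in the chat template. A minimal sketch of the difference, assuming Qwen-style `<|im_start|>` turn markers (the exact template strings are an assumption, not confirmed from the engine source):

```python
def assistant_preamble(enable_thinking: bool) -> str:
    """Build the assistant-turn prefix for the chat template (sketch).
    With TQ_ENABLE_THINKING=1 the <think> block is left OPEN so the
    model begins with chain-of-thought; otherwise it is opened and
    closed immediately (the "empty" default mentioned above)."""
    if enable_thinking:
        return "<|im_start|>assistant\n<think>\n"
    return "<|im_start|>assistant\n<think>\n\n</think>\n\n"
```
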

Result:
- Thinking mode engages, generates English CoT ("Here's a thinking process: 1. Analyze the request: The user wants...")
- **Hits new attractor at 71 tokens: "requirements/requirements/requirements"**
- EOS rank path (not EOS-block this time):
  - pos=90: EOS rank 2523 (normal, mid-CoT)
  - pos=120: EOS rank 178
  - pos=135: EOS rank 100, margin=7.8 (PEAKY "requirements" lock)
  - pos=150: EOS rank 222, margin=1.7 (trying to recover)
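
The rank and margin figures above come from the logit probe; conceptually they can be computed from a single logits vector like this (a NumPy sketch; `eos_id` and the example values are placeholders, not probe output):

```python
import numpy as np

def eos_rank_and_margin(logits: np.ndarray, eos_id: int):
    """rank: 1 + number of tokens scoring above EOS (rank 1 means EOS
    is the argmax); margin: top1 minus top2 logit, the 'peakiness'
    that reaches 7.8 during the requirements lock."""
    rank = int((logits > logits[eos_id]).sum()) + 1
    top2 = np.partition(logits, -2)[-2:]  # two largest logits
    margin = float(top2.max() - top2.min())
    return rank, margin
```
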

This is a DIFFERENT failure mode from R38-39:
- R38 "alphabet walk" — EOS rank climbing (model wants to stop)
- R40 "requirements loop" — EOS rank NOT particularly close; the model is semantically locked on one word and can't move forward

**Three distinct degradation modes coexist** on 35B long-gen:
1. EOS-block (R39): model signals termination but raw-completion forces more tokens → alphabet walk
2. Semantic lock-in (R40): model enters an attractor on a specific word (like "requirements") for multiple consecutive tokens
3. Router collapse (R26/pre): now default-fixed by T=2.0, but analogous attractors remain elsewhere
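
The first two modes separate cleanly on the probe signals already logged. A rough triage heuristic (thresholds are illustrative placeholders, not tuned values from this session):

```python
def classify_failure(eos_rank: int, margin: float, top1_repeats: int) -> str:
    """Heuristic triage of a degenerating long generation (sketch).
    - EOS-block: EOS is nearly top-1 but keeps being overruled.
    - Semantic lock-in: EOS is far away while one token dominates with
      a large margin for several consecutive steps ("requirements").
    Thresholds below are illustrative, not tuned."""
    if eos_rank <= 5:
        return "eos-block"        # R39 pattern: model wants to stop
    if margin > 5.0 and top1_repeats >= 3:
        return "semantic-lock"    # R40 pattern: one-word attractor
    return "healthy-or-unknown"
```
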

### Brutal honesty on the 1000-tok target

Not achievable with the current engine + weights. All three failure modes appear regardless of prompt quality or thinking-mode state. The T=2.0 spread softened one attractor but left the others intact.

Paths for future work:
- Multi-attractor suppression: stronger rep-penalty that scales with position and peakiness (not just token match)
- Fixed-k block self-attention: prevent any one token from dominating attention weight long enough to form lock-in
- DRY sampler (Don't Repeat Yourself) — exists in llama.cpp, not in ours
- Upstream Qwen3.6 chat-template work — `<|im_start|>assistant\n<think>` may need a specific non-empty placeholder to unlock trained behavior
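
The first bullet could look roughly like the sketch below: a repetition penalty that decays with the age of each occurrence and is amplified when the current distribution is peaky. All constants (`base`, `decay`, the `/10` scaling) are illustrative assumptions, not a proposed implementation:

```python
def position_aware_penalty(logits, recent_tokens, base=1.3, decay=0.95,
                           top1_margin=0.0):
    """Penalize recently generated tokens, weighting newer occurrences
    more (decay**age) and amplifying the whole penalty when the current
    distribution is peaky (large top1 margin), which is exactly when
    one-word lock-in forms. Constants are illustrative only."""
    out = list(logits)
    amp = 1.0 + top1_margin / 10.0  # peakier distribution -> harsher
    for age, tok in enumerate(reversed(recent_tokens)):
        p = 1.0 + (base - 1.0) * (decay ** age) * amp
        # CTRL-style penalty: divide positive logits, multiply negative
        out[tok] = out[tok] / p if out[tok] > 0 else out[tok] * p
    return out
```
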

Landed diagnostic infrastructure covers all three modes:
`TQ_LOGIT_PROBE` (EOS rank + margin + entropy + top5),
`TQ_RESIDUAL_PROBE` (residual rms per layer/pos — added in R39 prep),
`TQ_MOE_PROBE` (router top-K), `TQ_DELTA_PROBE` (state norm).

## ★ Phase 3 R39 — EOS rank diagnosis reframes the 1000-tok problem ★

**User insight**: "혹시 종료할 시점에 종료를 하지 못해서 발생하는건 아닌지?" (Could it be caused by the model failing to terminate at the point where it should stop?) —
