Skip to content

Commit c551293

Browse files
unamedkrclaude
andcommitted
★ debug(logit): EOS rank diagnosis reframes 35B 1000-tok problem
User asked: "Could the failure happen at termination time — EOS token issue?" Adding EOS rank to TQ_LOGIT_PROBE answered decisively. On Qwen3.6-35B, "Once upon a time in a faraway land", -n 275, T=2.0: pos=25 EOS rank 511 (mid-narrative, irrelevant) pos=100 EOS rank 65 pos=125 EOS rank 47 (Sorry! loop starts) pos=175 EOS rank 13 (alphabet walk starts) pos=250 EOS rank 6 (alphabet walk continues) EOS rank climbs 511 → 6 through the degradation. The model IS trying to terminate with increasing confidence, but top1 always wins at T=0 by 3-7 logits. So "alphabet walk" is the model stuck between wanting to stop and being forced to output another token. This reframes the 1000-tok target: - 6-word "Once upon a time" prompt naturally merits ~150-200 tokens; beyond that the model signals EOS increasingly strongly - Substantive --chat prompt with Qwen3-Thinking template emits EOS IMMEDIATELY (empty <think> block is malformed, model responds with just <|im_end|>) Neither is a quant/DeltaNet/MoE bug. The 1000-tok headline metric requires chat-template work (fill <think> block or use non-thinking branch) OR a base-completion prompt with scaffold that makes 1000 tokens in-distribution. Saved user's one-line insight as permanent memory (feedback_eos_rank_diagnosis.md + MEMORY.md index): "Before chasing residual-collapse as the cause, check EOS rank first — cheap, often decisive." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 562fa34 commit c551293

2 files changed

Lines changed: 99 additions & 2 deletions

File tree

.claude/state.md

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,62 @@
33
**Last updated**: 2026-04-22 (Phase 2 KV clean-bill)
44
**Session HEAD**: turbo_kv_4b per-arch per-layer clean-bill LANDED via chunked TQ_KV_PROBE. 7×/+0% PPL claim now validated element-by-element across Llama, Qwen3-0.6B, Qwen3.5-4B, Qwen3.6-35B.
55

6+
## ★ Phase 3 R39 — EOS rank diagnosis reframes the 1000-tok problem ★
7+
8+
**User insight**: "혹시 종료할 시점에 종료를 하지 못해서 발생하는건 아닌지?" —
9+
could the degenerate output be the model unable to emit EOS?
10+
11+
Added EOS rank to `TQ_LOGIT_PROBE` output. Qwen3.6-35B UD-IQ4_XS,
12+
"Once upon a time in a faraway land", T=2.0 (auto-default), -n 275:
13+
14+
| pos | EOS rank | top1-EOS logit gap | observable |
15+
|---:|---:|---:|:---|
16+
| 25 | 511 | 17.5 | normal narrative |
17+
| 100 | 65 | 8.1 | normal |
18+
| 125 | 47 | 12.7 | "Sorry!" loop starts |
19+
| 175 | **13** | 6.0 | alphabet walk begins |
20+
| 200 | **13** | 5.9 | alphabet walk |
21+
| 250 | **6** | 6.7 | alphabet walk continues |
22+
23+
**EOS rank climbs 511 → 6 through the degradation**. The model IS
24+
signaling termination with increasing confidence, but top1 always wins
25+
at T=0 by 3-7 logits. So the "alphabet walk" is the model **stuck
26+
between wanting to stop and being forced to output another token**.
27+
28+
### Reframing the 1000-tok problem
29+
30+
The 6-word prompt "Once upon a time in a faraway land" doesn't merit
31+
1000 coherent tokens. The model's natural answer is ~150-200 tokens of
32+
narrative then EOS. Forcing `-n 1000` on it means most of those tokens
33+
are post-EOS-attempt confusion.
34+
35+
Tested with substantive prompt + `--chat`:
36+
- Qwen3.6-Thinking-Instruct chat template primes `<think>\n\n</think>\n\n`
37+
- Result: 0 tokens generated (model emits EOS immediately because
38+
empty `<think>` block is malformed — needs actual reasoning content
39+
in thinking mode)
40+
41+
So 1000+ coherent tokens on 35B requires one of:
42+
1. Chat-template work: let model generate filled `<think>...</think>`
43+
block before the response, OR use a non-thinking branch of Qwen3.6
44+
2. A base-completion prompt with enough structural scaffolding (system
45+
+ instruction + expected format) that 1000 tokens is in-distribution
46+
47+
Neither is a quantization / DeltaNet / MoE bug. Both are product/ergonomics
48+
work on the chat pipeline.
49+
50+
### Diagnostic deliverable
51+
52+
`TQ_LOGIT_PROBE` now reports EOS rank per probe position — a cheap,
53+
decisive check. From `feedback_eos_rank_diagnosis.md`:
54+
55+
> Rule: when a model emits degenerate output at long positions, BEFORE
56+
> assuming residual-space collapse / quantization drift / KV corruption,
57+
> ask: *is EOS rank climbing toward top-1?* If yes, the model is trying
58+
> to terminate.
59+
60+
Saved as memory `feedback_eos_rank_diagnosis.md` + indexed in MEMORY.md.
61+
662
## Phase 3 R38 — 1000-tok target diagnosis — logits peaky, residual collapse suspected (2026-04-22)
763

864
User-set breakthrough metric: **coherent generation to 1000+ tokens on

src/engine/tq_transformer.c

Lines changed: 43 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3238,6 +3238,28 @@ float* tq_forward(tq_model_t* model, tq_state_t* s, int token, int pos) {
32383238
char _slot[16]; snprintf(_slot, sizeof(_slot), "h%d", l);
32393239
tq_dump_hidden(_slot, s->x, dim, pos);
32403240
}
3241+
/* TQ_RESIDUAL_PROBE=every=N prints per-layer s->x rms + max-abs at
3242+
* every N-th position. Used to localize residual-stream collapse
3243+
* that drives 35B long-gen alphabet-walk (R38 diagnosis). */
3244+
{
3245+
const char* _rp = getenv("TQ_RESIDUAL_PROBE");
3246+
if (_rp) {
3247+
int every = 0;
3248+
const char* eq = strstr(_rp, "every=");
3249+
if (eq) every = atoi(eq + 6);
3250+
if (every <= 0) every = 25;
3251+
if ((pos % every) == 0 && pos > 0) {
3252+
double ss = 0; float mx = 0;
3253+
for (int i = 0; i < dim; i++) {
3254+
ss += (double)s->x[i] * s->x[i];
3255+
float a = fabsf(s->x[i]);
3256+
if (a > mx) mx = a;
3257+
}
3258+
fprintf(stderr, "[res-probe] pos=%d L%d rms=%.3f max_abs=%.3f\n",
3259+
pos, l, (float)sqrt(ss / dim), mx);
3260+
}
3261+
}
3262+
}
32413263
/* Post-layer processing: PLE, layer_output_scale.
32423264
* GPU graph path jumps here after full-layer GPU forward. */
32433265

@@ -3388,10 +3410,29 @@ float* tq_forward(tq_model_t* model, tq_state_t* s, int token, int pos) {
33883410
double p = expf(s->logits[i] - maxl) / Z;
33893411
if (p > 1e-30) H -= p * (log(p));
33903412
}
3391-
fprintf(stderr, "[logit-probe] pos=%d top5_logits=[%.3f,%.3f,%.3f,%.3f,%.3f] top5_ids=[%d,%d,%d,%d,%d] margin_1_to_2=%.3f entropy=%.3f nats\n",
3413+
/* EOS rank + logit: is the model trying to stop but getting
3414+
* overruled by a peakier wrong token? Qwen3.6 EOS=248046,
3415+
* Qwen3.x-thinking may use <|im_end|>=151645 etc.
3416+
* We check a few common IDs and report the max-logit one. */
3417+
int eos_candidates[] = {248046, 248044, 151645, 128001, 128009, 2};
3418+
int n_eos = sizeof(eos_candidates)/sizeof(eos_candidates[0]);
3419+
float eos_logit = -1e30f; int eos_id = -1;
3420+
for (int e = 0; e < n_eos; e++) {
3421+
int id = eos_candidates[e];
3422+
if (id >= 0 && id < c->vocab_size && s->logits[id] > eos_logit) {
3423+
eos_logit = s->logits[id]; eos_id = id;
3424+
}
3425+
}
3426+
/* Compute EOS rank: how many tokens have higher logit than EOS */
3427+
int eos_rank = 0;
3428+
if (eos_id >= 0) {
3429+
for (int i = 0; i < c->vocab_size; i++)
3430+
if (s->logits[i] > eos_logit) eos_rank++;
3431+
}
3432+
fprintf(stderr, "[logit-probe] pos=%d top5_logits=[%.3f,%.3f,%.3f,%.3f,%.3f] top5_ids=[%d,%d,%d,%d,%d] margin=%.3f entropy=%.3f eos_id=%d eos_logit=%.3f eos_rank=%d\n",
33923433
pos, top[0], top[1], top[2], top[3], top[4],
33933434
top_idx[0], top_idx[1], top_idx[2], top_idx[3], top_idx[4],
3394-
top[0]-top[1], H);
3435+
top[0]-top[1], H, eos_id, eos_logit, eos_rank);
33953436
}
33963437
}
33973438
}

0 commit comments

Comments
 (0)