=== INFERENCE PIPELINE DIAGNOSTICS ===
Loading model: models--mlx-community--lille-130m-instruct-8bit/snapshots/fb2acaff28236ca4f3812032cf7f854881f45f59
Model loaded.
--- TEST 1: Basic GPU ops ---
[DIAG] matmul(ones, 2*ones) expect=8 shape=(4,4) min=8.000000 max=8.000000 mean=8.000000 |mean|=8.000000
[VALS] matmul result: [8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000]
[hipBLASLt] first call
[hipBLASLt] M=4 N=4 K=4 ta=0 tb=0 lda=4 ldb=4 ldc=4
[DIAG] bf16 matmul expect=8 shape=(4,4) min=8.000000 max=8.000000 mean=8.000000 |mean|=8.000000
--- TEST 2: quantized_matmul vs dequant ---
[DIAG] q_proj weights not found (w=0 s=0 b=0)
--- TEST 3: RMS Norm ---
[DIAG] rms_norm([1,2,3,4]) shape=(1,1,4) min=0.365148 max=1.460593 mean=0.912871 |mean|=0.912871
[VALS] rms_norm([1,2,3,4]) expect≈[.365,.730,1.095,1.461]: [0.3651, 0.7303, 1.0954, 1.4606]
[DIAG] rms_norm(rand bf16 4096) shape=(1,3,4096) min=-3.843750 max=3.906250 mean=-0.006712 |mean|=0.798703
--- TEST 4: RoPE ---
[DIAG] rope(ones, off=0) shape=(1,1,1,128) min=1.000000 max=1.000000 mean=1.000000 |mean|=1.000000
[VALS] rope(ones, off=0): [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000]
[DIAG] rope(ones, off=100) shape=(1,1,1,128) min=-1.414062 max=1.406250 mean=0.586235 |mean|=0.953492
[VALS] rope(ones, off=100): [1.3672, 1.3438, -1.3672, -1.3516, 0.7305, -1.3828, -1.4062, -0.9219, 1.3594, -1.1719, 1.3750, -1.1094, -0.5898, 1.2109, 1.1406, -0.0040, -0.9805, -1.3906, -1.3516, -1.0781]
--- TEST 5: Full forward pass ---
[DIAG] logits(token=1) shape=(1,1,32768) min=0.000000 max=0.000000 mean=0.000000 |mean|=0.000000
[VALS] logits(token=1): [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
[DIAG] Top-10:
token=32767 logit=0.0000
token=32766 logit=0.0000
token=32765 logit=0.0000
token=32764 logit=0.0000
token=32763 logit=0.0000
token=32762 logit=0.0000
token=32761 logit=0.0000
token=32760 logit=0.0000
token=32759 logit=0.0000
token=32758 logit=0.0000
[DIAG] logits(step2) shape=(1,1,32768) min=0.000000 max=0.000000 mean=0.000000 |mean|=0.000000
--- TEST 6: dequantize() sanity ---
[DIAG] dequant([0..7],s=1,b=0) shape=(1,8) min=0.000000 max=7.000000 mean=3.500000 |mean|=3.500000
[VALS] dequant expect=[0,1,2,3,4,5,6,7]: [0.0000, 1.0000, 2.0000, 3.0000, 4.0000, 5.0000, 6.0000, 7.0000]
--- TEST 6b: Warmup pass ---
[DIAG] warmup logits shape=(1,1,32768) min=0.000000 max=0.000000 mean=0.000000 |mean|=0.000000
[DIAG] Warmup complete
--- TEST 7: Token-level generation trace ---
[DIAG] encode("What is 2+2?") = [1867, 318, 362, 10, 17, 30] (6 tokens)
[DIAG] Token-by-token decode:
token 1867 -> " What"
token 318 -> " is"
token 362 -> " 2"
token 10 -> "+"
token 17 -> "2"
token 30 -> "?"
[DIAG] Chat template tokens (9): [32767, 32766, 1867, 318, 362, 10, 17, 30, 32765]
[DIAG] Chat template decoded: "<|startoftext|><|user|> What is 2+2?<|assistant|>"
[DIAG] prefill logits shape=(1,9,32768) min=0.000000 max=0.000000 mean=0.000000 |mean|=0.000000
[DIAG] Generating 20 tokens (argmax):
step=0 token=0 text="!"
step=1 token=0 text="!"
step=2 token=0 text="!"
step=3 token=0 text="!"
step=4 token=0 text="!"
step=5 token=0 text="!"
step=6 token=0 text="!"
step=7 token=0 text="!"
step=8 token=0 text="!"
step=9 token=0 text="!"
step=10 token=0 text="!"
step=11 token=0 text="!"
step=12 token=0 text="!"
step=13 token=0 text="!"
step=14 token=0 text="!"
step=15 token=0 text="!"
step=16 token=0 text="!"
step=17 token=0 text="!"
step=18 token=0 text="!"
step=19 token=0 text="!"
[DIAG] Full output (argmax): "!!!!!!!!!!!!!!!!!!!!"
[DIAG] Generating 20 tokens (categorical T=0.7):
step=0 token=893 text="ys"
step=1 token=22980 text="265"
step=2 token=28225 text=" �"
step=3 token=10011 text=" Sche"
step=4 token=5216 text="Col"
step=5 token=5724 text=" rot"
step=6 token=23778 text="hao"
step=7 token=21251 text="upiter"
step=8 token=30269 text=" lieutenant"
step=9 token=12398 text="chell"
step=10 token=27806 text=" fortunes"
step=11 token=32065 text="Language"
step=12 token=11203 text="iev"
step=13 token=15403 text=" tender"
step=14 token=22529 text=" Scout"
step=15 token=19091 text=" viewer"
step=16 token=30115 text=" ditch"
step=17 token=9403 text=" seed"
step=18 token=934 text="ular"
step=19 token=11983 text=" participating"
[DIAG] Full output (categorical): "ys265 � ScheCol rothaoupiter lieutenantchell fortunesLanguageiev tender Scout viewer ditch seedular participating"
[DIAG] Testing via generate_text (chat.cpp path):
token=23627 text=" Buc"
token=18234 text="Join"
token=22447 text=" specialists"
token=13510 text=" wooden"
token=27300 text="Defense"
token=18069 text=" mathematical"
token=673 text=" she"
token=18639 text=" hills"
token=32013 text=" finalized"
token=29209 text=" surrendered"
token=9348 text=" BE"
token=704 text="hed"
token=13620 text=" quarters"
token=23819 text="andals"
token=5755 text=" rid"
token=24000 text=" nutritional"
token=4990 text=" Ret"
token=29938 text=" neoc"
token=32171 text=" cath"
[DIAG] generate_text output: " BucJoin specialists woodenDefense mathematical she hills finalized surrendered BEhed quartersandals rid nutritional Ret neoc cath"
[DIAG] Prompt: 9 tokens, 621.521 tokens/s, 0.0144806s
Generation: 20 tokens, 275.959 tokens/s, 0.0724746s
--- TEST 8: random::categorical ---
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical(peak@17, V=151936) = 17 (expect 17)
[DIAG] categorical(peak@17, V=151936) = 17 (expect 17)
[DIAG] categorical(peak@17, V=151936) = 17 (expect 17)
[DIAG] Testing categorical with real model logits...
hip error code: 'hipErrorIllegalAddress':700 at /__w/TheRock/TheRock/rocm-libraries/projects/rocblas/library/src/rocblas_auxiliary.cpp:899
rocBLAS error: Could not initialize Tensile host:
unordered_map::at
./test-mlx.sh: line 7: 17531 Aborted (core dumped)