Commit 7dbe43e

Merge pull request #118 from AdaWorldAPI/claude/setup-embedding-pipeline-Fa65C
docs: CRITICAL knowledge sync for signed session — 33% error, not cosmetic

The signed session built i8 tables by converting u8 → i8 (subtract 128).
This RELABELS ranks but does NOT recover gate sign information.
The 33% scale error from SiLU gate nonlinearity is still baked in.

MEASURED on real Qwopus BF16:
  68.9% of gate weights near zero (decision boundary)
  99.2% of table cells change with SiLU correction
  Mean change: 84.2 u8 levels (33% of 256 scale)
  Consistent across all 64 layers

FIX: build i8 tables directly from BF16 → signed cosine → i8
     NOT from u8 table → subtract 128 → i8

Also documents:
- 7-lane calibration plan (u8/i8/γ+φ/signed/spiral)
- BF16 bucket boundary awareness (5% rank flips)
- Jina v5 as ONNX f32 ground truth (no API key)
- Why CDF makes all models look identical (by design)

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
2 parents 437892c + e8e1d63 commit 7dbe43e

1 file changed

Lines changed: 233 additions & 0 deletions

@@ -0,0 +1,233 @@
# KNOWLEDGE SYNC: What the Signed Session Needs to Know

## THE 33% ERROR — Not Cosmetic

### What happened

```
Step 1 (this session, early): Synthetic gates on Jina lens
  Result: cos(raw, corrected) = 0.999, 83% peak agreement
  Verdict: "COSMETIC"

Step 2 (this session, later): REAL Qwopus 27B BF16 gates streamed
  Result: 86% material corrections, 99.2% cells change, mean Δ = 84.2 u8
  Verdict: "33% OF THE ENTIRE SCALE IS WRONG"

The synthetic test used wide gate ranges [-0.1, 0.3].
The real Qwopus gates are concentrated at zero: 68.9% of |w| < 0.01.
SiLU's nonlinearity lives at zero. Narrow gates = big correction.
```
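
The "near zero" statistic and the SiLU behaviour it refers to can be sketched as below. This is a minimal illustration, not the session's measurement code: `silu` and `near_zero_fraction` are hypothetical helper names, and the 0.01 threshold is the one quoted above.

```rust
/// SiLU activation: x * sigmoid(x). Close to zero it behaves like x / 2,
/// so a tiny gate keeps its sign but strongly rescales whatever it multiplies.
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Fraction of weights whose magnitude is below `threshold`
/// (hypothetical helper; the text above uses threshold = 0.01).
fn near_zero_fraction(weights: &[f32], threshold: f32) -> f32 {
    if weights.is_empty() {
        return 0.0;
    }
    let near = weights.iter().filter(|w| w.abs() < threshold).count();
    near as f32 / weights.len() as f32
}
```

With gates this small, `silu(-0.005)` and `silu(0.005)` have opposite signs, which is why the gated product can differ so much from the raw `up` row.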

### The numbers (MEASURED, not estimated)

```
Source: Qwopus3.5-27B-v3 BF16 GGUF (53.8 GB), streamed via HTTP range
File: crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/layer_stats.json

ffn_gate L0:
  Weight range: [-0.109, 0.115]
  Near zero (|w| < 0.01): 68.9%
  Cosine range: [-0.23, +0.18], std=0.022

ffn_up L0 (what the table encodes):
  Raw cosine std: 0.021
  SiLU(gate)×up cosine std: 0.051 ← 2.4× MORE SPREAD

Table comparison (256×256 u8):
  Cells changed: 99.2% (64,968 / 65,536)
  Mean |Δ|: 84.2 u8 levels (33% of 256 scale)
  Max |Δ|: 254 u8 levels (nearly full range)

Consistent across ALL 64 layers:
  Layer 0:  gate_zero=69%, SiLU Δ=85
  Layer 16: gate_zero=64%, SiLU Δ=85
  Layer 32: gate_zero=66%, SiLU Δ=84
  Layer 48: gate_zero=66%, SiLU Δ=84
  Layer 63: gate_zero=57%, SiLU Δ=84
```

## THE CRITICAL BUG IN signed_engine.rs

### What the signed session built

```rust
// In dual_signed_experiment.rs line ~30:
let signed_table: Vec<i8> = table.iter()
    .map(|&v| (v as i16 - 128) as i8)
    .collect();
```

### Why this does NOT fix the 33% error

```
The u8 table was built from:
  CLAM centroids → raw cosine → CDF percentile → u8[0,255]

The gate sign information was LOST during "raw cosine":
  cos(weight_row_i, weight_row_j) treats all dimensions equally.
  It doesn't know that gate[k] = -0.005 means BLOCK
  while gate[k] = +0.005 means PASS.
  Both contribute equally to cosine.

Converting u8 → i8 by subtracting 128:
  u8[156] → i8[+28] (was above midpoint, now positive)
  u8[121] → i8[-7]  (was below midpoint, now negative)

This creates signed values from the CDF RANK, not from the WEIGHT SIGNS.
A u8 value of 121 means "roughly the 47th percentile of the cosine distribution"
NOT "the weights have opposite signs here."

The i8 conversion is a RELABELING of ranks, not a RECOVERY of signs.
```
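
To make the "relabeling" point concrete, here is a small sketch showing that the shift is strictly order-preserving and exactly invertible, so it cannot carry information the u8 table did not already have. The function names are hypothetical; only the `(v as i16 - 128) as i8` expression comes from the snippet above.

```rust
/// The conversion used by the current signed path: shift the CDF rank by 128.
fn shift_u8_to_i8(v: u8) -> i8 {
    (v as i16 - 128) as i8
}

/// Exact inverse of the shift: every i8 maps back to the u8 it came from.
fn shift_i8_to_u8(v: i8) -> u8 {
    (v as i16 + 128) as u8
}

/// True when the shift keeps every adjacent pair of u8 values in the same
/// order, i.e. it is a pure relabeling of ranks.
fn shift_is_monotone() -> bool {
    (0u8..255).all(|v| shift_u8_to_i8(v) < shift_u8_to_i8(v + 1))
}
```

Because the mapping is a bijection that preserves order, any rank-based comparison gives the same answer on the shifted table as on the original.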

### What needs to happen instead

```
WRONG (current signed_engine path):
  u8 table (gate info lost) → subtract 128 → i8 (gate info still lost)

RIGHT:
  BF16 weights → compute SIGNED cosine → encode directly as i8[-128,+127]

For gate-modulated roles (K, V, Up):
  activated = silu(gate_row) × weight_row (elementwise)
  cos(activated_i, activated_j) → i8 (SIGNED, preserves gate decisions)

For raw roles (Q, Down):
  cos(weight_row_i, weight_row_j) → i8 (still signed, preserves weight polarity)

The sign in the cosine IS the excitation/inhibition signal.
Negative cosine = opposed features = inhibition.
This is REAL — the models have negative cosines:
  Qwopus ffn_gate:    cos[-0.23, +0.18]
  Reranker:           cos[-0.886, +0.826]
  Reader-LM ffn_down: cos[-0.885, +0.188]
```
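
The "RIGHT" path above might look roughly like the following. This is a sketch under stated assumptions: rows are already decoded to `f32`, the i8 encoding is the `round(cos × 127)` rule from the action items, and all function names are hypothetical rather than taken from `signed_engine.rs`.

```rust
/// SiLU activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Elementwise gate modulation: activated[k] = silu(gate_row[k]) * weight_row[k].
fn gate_modulate(gate_row: &[f32], weight_row: &[f32]) -> Vec<f32> {
    gate_row
        .iter()
        .zip(weight_row)
        .map(|(&g, &w)| silu(g) * w)
        .collect()
}

/// Signed cosine similarity; returns 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Encode a cosine in [-1, 1] as i8 in [-127, 127], keeping its sign.
fn cosine_to_i8(cos: f32) -> i8 {
    (cos.clamp(-1.0, 1.0) * 127.0).round() as i8
}
```

As a toy case, two identical weight rows have raw cosine +1, but if their gates are +0.005 and -0.005 everywhere, the gate-modulated rows point in opposite directions and encode as -127. That difference is invisible to the raw-cosine table.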

## WHAT THE DUAL EXPERIMENT ACTUALLY TESTS

```
Current dual_signed_experiment.rs tests:
  "Does i8(u8 - 128) produce different peaks than u8?"
  Answer: Yes, but the difference is from RELABELING, not from gate recovery.

What it SHOULD test:
  "Does i8(BF16 → signed cosine) produce different peaks than u8(BF16 → CDF)?"
  Answer: unknown — not built yet.

The experiment framework (DualEngine, DualResult, comparison metrics) is CORRECT.
The TABLE CONTENT feeding into the signed engine is WRONG.
```

## THE 7-LANE CALIBRATION PLAN

### Why we went back to the design board

```
After measuring the 33% error, we realized:
1. All 7 HDR lenses have identical statistics (Mean=127.5, Std=73.6)
   because CDF encoding forces uniform distribution.
   Model-specific topology IS preserved (99.2% bytes differ between models)
   but you can't see it in the statistics.

2. γ+φ encoding (golden ratio offset) is NOT applied to any baked table.
   The code exists in bgz-tensor/gamma_phi.rs but was never wired.
   Per-role γ offsets are DOCUMENTED (Gate=1.50, Q=0.37) but NOT USED.

3. Calibrating against GGUF BF16 is calibrating against TIFF, not RAW.
   BF16 has 7-bit mantissa → ±0.008 precision → ~5% rank flips at boundaries.
   Need ONNX f32 as ground truth.

4. Jina v5 has BOTH ONNX f32 (2.4 GB) and GGUF (1.2 GB).
   No API key needed. Both verified streamable.
```
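
Point 1 follows directly from how a rank-based encoding works. The sketch below is an assumed, simplified version of "CDF percentile → u8" (the real lens builder may bin differently); `cdf_encode_u8` and `mean_u8` are hypothetical names.

```rust
/// Rank-based (CDF) encoding: each value is replaced by its rank in the
/// sorted order, scaled into 0..=255. Only the ordering of the inputs
/// matters, not their range or shape.
fn cdf_encode_u8(values: &[f32]) -> Vec<u8> {
    let n = values.len();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| values[a].total_cmp(&values[b]));
    let mut out = vec![0u8; n];
    for (rank, &idx) in order.iter().enumerate() {
        // rank < n, so rank * 256 / n is always at most 255.
        out[idx] = ((rank * 256) / n) as u8;
    }
    out
}

/// Mean of a u8 slice as f64 (helper for inspecting table statistics).
fn mean_u8(xs: &[u8]) -> f64 {
    xs.iter().map(|&x| x as f64).sum::<f64>() / xs.len() as f64
}
```

Feeding a narrow distribution and a heavily skewed one through `cdf_encode_u8` yields the same set of output bytes and the same mean of 127.5; only which cell receives which byte differs, which is where the model-specific topology lives.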

### The 7 encoding lanes to compare

```
For each model × each role:
  Lane 1: u8 linear (current 64×64 codebook tables)
  Lane 2: u8 CDF (current 256×256 HDR lenses)
  Lane 3: u8 γ+φ (gamma offset + phi redistribution)
  Lane 4: i8 from u8 (subtract 128 — what signed_engine.rs does now)
  Lane 5: i8 from BF16 (signed cosine directly — NOT BUILT YET)
  Lane 6: i8 γ+φ signed (gamma + phi on signed range)
  Lane 7: highheelbgz spiral (golden ratio stride encoding)

Ground truth: ONNX f32 forward pass via rten
Metric: Spearman ρ(lane_distances, onnx_distances)
After ICC: does correction bring all lanes to ρ > 0.998?

The lane that needs the LEAST ICC correction = the most faithful encoding.
```
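
For reference, the Spearman ρ metric named above is the Pearson correlation of the two rank vectors. The following is a dependency-free sketch, not the code in `calibrate_lenses.rs`; the function names are hypothetical, ties get average ranks, and the result is NaN if either input is constant.

```rust
/// 1-based ranks with ties sharing their average rank.
fn ranks(xs: &[f64]) -> Vec<f64> {
    let n = xs.len();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| xs[a].total_cmp(&xs[b]));
    let mut r = vec![0.0; n];
    let mut i = 0;
    while i < n {
        // Extend j over the run of values equal to xs[order[i]].
        let mut j = i;
        while j + 1 < n && xs[order[j + 1]] == xs[order[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            r[order[k]] = avg;
        }
        i = j + 1;
    }
    r
}

/// Pearson correlation of two equal-length slices.
fn pearson(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let n = a.len() as f64;
    let mean_a = a.iter().sum::<f64>() / n;
    let mean_b = b.iter().sum::<f64>() / n;
    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for (x, y) in a.iter().zip(b) {
        cov += (x - mean_a) * (y - mean_b);
        var_a += (x - mean_a) * (x - mean_a);
        var_b += (y - mean_b) * (y - mean_b);
    }
    cov / (var_a.sqrt() * var_b.sqrt())
}

/// Spearman rho: Pearson correlation computed on ranks.
fn spearman(lane_distances: &[f64], onnx_distances: &[f64]) -> f64 {
    pearson(&ranks(lane_distances), &ranks(onnx_distances))
}
```

Because only ranks enter the computation, any monotone distortion of a lane's distances leaves ρ at 1.0; the metric penalizes rank flips, which is the failure mode the lanes are being compared on.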

### BF16 bucket boundary awareness

```
When raw cosine is within ±0.008 of a HEEL bucket boundary,
BF16 truncation can flip the bucket assignment.
High precision refinement (HIP/TWIG) on the wrong bucket = confidently lost.

Fix: boundary_risk metadata per centroid pair.
  95% safe → fast cascade
  5% uncertain → skip cascade, validate at LEAF or compute directly

γ+φ golden ratio stride reduces boundary risk by placing bucket
edges at irrational positions that don't align with BF16 quant steps.
```
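
A bucket flip of this kind is easy to reproduce. The sketch below assumes BF16 conversion by plain truncation of the low 16 bits of an `f32` (real converters often round to nearest instead) and uses a made-up boundary at 0.1; `bucket` and `boundary_risk` are hypothetical helpers, not the `high_heel.rs` API.

```rust
/// Truncate an f32 to BF16 precision by zeroing the low 16 bits
/// (keeps sign, 8 exponent bits and 7 mantissa bits).
fn bf16_truncate(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_0000)
}

/// Bucket index = number of boundaries at or below the value.
/// `boundaries` is assumed to be sorted ascending.
fn bucket(cos: f32, boundaries: &[f32]) -> usize {
    boundaries.iter().filter(|&&b| cos >= b).count()
}

/// Flag a cosine that lies within `eps` of any bucket boundary, so the
/// caller can skip the fast cascade and validate the pair directly.
fn boundary_risk(cos: f32, boundaries: &[f32], eps: f32) -> bool {
    boundaries.iter().any(|&b| (cos - b).abs() <= eps)
}
```

For example, 0.10005 sits just above a boundary at 0.1, but truncates to 0.099609375 in BF16 and lands one bucket lower; `boundary_risk(0.10005, &[0.1], 0.008)` flags it ahead of time.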

## ACTION ITEMS FOR THE SIGNED SESSION

```
1. DO NOT trust the current i8 tables (u8 - 128 = relabeled ranks, not gate signs)

2. BUILD i8 tables directly from BF16 weights:
   Stream Qwopus BF16 → silu(gate) × up → cosine → round(cos × 127) → i8
   Use the existing streaming pipeline (Python scripts in this session,
   or the Rust stream_hdr_lens.rs pattern)

3. RE-RUN dual_signed_experiment with BOTH table types:
   DualEngine with:
     unsigned = u8 CDF (current, from raw cosine)
     signed = i8 from BF16 signed cosine (NEW, from silu(gate)×up)

   THEN compare. The agreement metric will be meaningful.

4. For calibration:
   Download Jina v5 ONNX (2.4 GB) — the f32 ground truth
   Download Jina v5 GGUF (1.2 GB) — our streaming source
   Run rten on ONNX → compute f32 embedding cosines for test sentences
   Compare: baked table distances vs ONNX distances → Spearman ρ
   Build ICC profile → corrected ρ should reach > 0.998

5. Temperature + nucleus sampling:
   This UNBLOCKS coherent output. Without it, even perfect tables collapse.
   See HANDOVER_MAVERICK_SESSION.md for the 10-line implementation.
   Wire INTO thinking styles (Analytical=top_p 0.3, Creative=0.95).
```
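
For item 5, a generic temperature plus nucleus (top-p) sampler can be sketched as follows. This is not the implementation in HANDOVER_MAVERICK_SESSION.md, which is not reproduced here; the function names are hypothetical, and the uniform random draw `u` in [0, 1) is passed in by the caller so the example needs no RNG crate.

```rust
/// Softmax over logits divided by `temperature` (lower = sharper).
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Indices of the smallest set of highest-probability entries whose
/// cumulative probability reaches `top_p`, in descending probability order.
fn nucleus(probs: &[f32], top_p: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    let mut cumulative = 0.0;
    let mut keep = Vec::new();
    for idx in order {
        keep.push(idx);
        cumulative += probs[idx];
        if cumulative >= top_p {
            break;
        }
    }
    keep
}

/// Pick one index from the nucleus, given a uniform draw `u` in [0, 1).
fn sample_nucleus(probs: &[f32], top_p: f32, u: f32) -> Option<usize> {
    let keep = nucleus(probs, top_p);
    let total: f32 = keep.iter().map(|&i| probs[i]).sum();
    let mut target = u * total;
    for &i in &keep {
        if target < probs[i] {
            return Some(i);
        }
        target -= probs[i];
    }
    keep.last().copied()
}
```

With the style settings quoted above, `top_p = 0.3` usually keeps only the top candidate, while `top_p = 0.95` lets most of the distribution through.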

## FILES THAT MATTER

```
MEASURED DATA (this session):
  crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/layer_stats.json
    → per-layer gate near-zero %, cosine ranges, SiLU correction stats
  crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/gate_raw_256x256.u8
    → L0 gate table WITHOUT SiLU
  crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/gate_silu_corrected_256x256.u8
    → L0 gate table WITH SiLU (compare: 99.2% cells differ, mean Δ=84.2)

SILU CORRECTION CODE:
  crates/thinking-engine/src/silu_correction.rs
    → generate_training_data(), gate_modulate_centroids(), apply_corrections()
    → May be OBSOLETE if i8-from-BF16 path works (sign preserves gate natively)
    → But the MEASUREMENT code (correction_stats()) is still valuable for analysis

CALIBRATION HARNESS:
  crates/thinking-engine/examples/calibrate_lenses.rs → Spearman + ICC builder
  crates/lance-graph-contract/src/high_heel.rs → LensProfile, LensConfig, LENS_REGISTRY

HANDOVER DOCS:
  .claude/HANDOVER_MAVERICK_SESSION.md → i8 architecture, Maverick plan, temperature fix
  .claude/HANDOVER_CALIBRATION_SESSION.md → H1-H5 hypotheses, Cronbach α protocol
```
