| 1 | +# KNOWLEDGE SYNC: What the Signed Session Needs to Know |
| 2 | + |
| 3 | +## THE 33% ERROR — Not Cosmetic |
| 4 | + |
| 5 | +### What happened |
| 6 | + |
| 7 | +``` |
| 8 | +Step 1 (this session, early): Synthetic gates on Jina lens |
| 9 | + Result: cos(raw, corrected) = 0.999, 83% peak agreement |
| 10 | + Verdict: "COSMETIC" |
| 11 | + |
| 12 | +Step 2 (this session, later): REAL Qwopus 27B BF16 gates streamed |
| 13 | + Result: 86% material corrections, 99.2% cells change, mean Δ = 84.2 u8 |
| 14 | + Verdict: "33% OF THE ENTIRE SCALE IS WRONG" |
| 15 | + |
| 16 | +The synthetic test used wide gate ranges [-0.1, 0.3]. |
| 17 | +The real Qwopus gates are concentrated near zero: 68.9% of weights have |w| < 0.01. |
| 18 | +SiLU's nonlinearity lives at zero. Narrow gates = big correction. |
| 19 | +``` |
| 20 | + |
| 21 | +### The numbers (MEASURED, not estimated) |
| 22 | + |
| 23 | +``` |
| 24 | +Source: Qwopus3.5-27B-v3 BF16 GGUF (53.8 GB), streamed via HTTP range |
| 25 | +File: crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/layer_stats.json |
| 26 | + |
| 27 | +ffn_gate L0: |
| 28 | + Weight range: [-0.109, 0.115] |
| 29 | + Near zero (|w| < 0.01): 68.9% |
| 30 | + Cosine range: [-0.23, +0.18], std=0.022 |
| 31 | + |
| 32 | +ffn_up L0 (what the table encodes): |
| 33 | + Raw cosine std: 0.021 |
| 34 | + SiLU(gate)×up cosine std: 0.051 ← 2.4× MORE SPREAD |
| 35 | + |
| 36 | + Table comparison (256×256 u8): |
| 37 | + Cells changed: 99.2% (64,968 / 65,536) |
| 38 | + Mean |Δ|: 84.2 u8 levels (33% of 256 scale) |
| 39 | + Max |Δ|: 254 u8 levels (nearly full range) |
| 40 | + |
| 41 | +Consistent across ALL 64 layers: |
| 42 | + Layer 0: gate_zero=69%, SiLU Δ=85 |
| 43 | + Layer 16: gate_zero=64%, SiLU Δ=85 |
| 44 | + Layer 32: gate_zero=66%, SiLU Δ=84 |
| 45 | + Layer 48: gate_zero=66%, SiLU Δ=84 |
| 46 | + Layer 63: gate_zero=57%, SiLU Δ=84 |
| 47 | +``` |
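The table-comparison figures above (cells changed, mean |Δ|, max |Δ|) can be recomputed from any two baked tables. This is a minimal sketch, assuming both tables are flat byte slices of equal length and that the mean is taken over all cells, not only the changed ones. `TableDelta` and `compare_tables` are hypothetical names for illustration and are not taken from `silu_correction.rs`.

```rust
/// Cell-by-cell summary of how two equally sized u8 tables differ.
/// (Hypothetical helper type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct TableDelta {
    pub cells_changed: usize,
    pub total_cells: usize,
    pub mean_abs_delta: f64,
    pub max_abs_delta: u8,
}

/// Compare two tables, e.g. a raw table and its SiLU-corrected counterpart.
/// Returns None when the lengths differ or the tables are empty.
pub fn compare_tables(raw: &[u8], corrected: &[u8]) -> Option<TableDelta> {
    if raw.len() != corrected.len() || raw.is_empty() {
        return None;
    }
    let mut changed = 0usize;
    let mut sum = 0u64;
    let mut max = 0u8;
    for (&a, &b) in raw.iter().zip(corrected) {
        let d = a.abs_diff(b); // |a - b| without overflow
        if d != 0 {
            changed += 1;
        }
        sum += d as u64;
        max = max.max(d);
    }
    Some(TableDelta {
        cells_changed: changed,
        total_cells: raw.len(),
        // Assumption: mean over ALL cells, including unchanged ones.
        mean_abs_delta: sum as f64 / raw.len() as f64,
        max_abs_delta: max,
    })
}
```

Feeding it the bytes of `gate_raw_256x256.u8` and `gate_silu_corrected_256x256.u8` (for example via `std::fs::read`) should give numbers comparable to the ones listed, provided the assumptions about layout and averaging hold.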
| 48 | + |
| 49 | +## THE CRITICAL BUG IN signed_engine.rs |
| 50 | + |
| 51 | +### What the signed session built |
| 52 | + |
| 53 | +```rust |
| 54 | +// In dual_signed_experiment.rs line ~30: |
| 55 | +let signed_table: Vec<i8> = table.iter() |
| 56 | + .map(|&v| (v as i16 - 128) as i8) |
| 57 | + .collect(); |
| 58 | +``` |
| 59 | + |
| 60 | +### Why this does NOT fix the 33% error |
| 61 | + |
| 62 | +``` |
| 63 | +The u8 table was built from: |
| 64 | + CLAM centroids → raw cosine → CDF percentile → u8[0,255] |
| 65 | + |
| 66 | +The gate sign information was LOST during "raw cosine": |
| 67 | + cos(weight_row_i, weight_row_j) treats all dimensions equally. |
| 68 | + It doesn't know that gate[k] = -0.005 means BLOCK |
| 69 | + while gate[k] = +0.005 means PASS. |
| 70 | + Both contribute equally to cosine. |
| 71 | + |
| 72 | +Converting u8 → i8 by subtracting 128: |
| 73 | + u8[156] → i8[+28] (was positive, still positive) |
| 74 | + u8[121] → i8[-7] (was below midpoint, now negative) |
| 75 | + |
| 76 | + This creates signed values from the CDF RANK, not from the WEIGHT SIGNS. |
| 77 | + A u8 value of 121 means "about the 47th percentile of the cosine distribution" (121/255 ≈ 0.47) |
| 78 | + NOT "the weights have opposite signs here." |
| 79 | + |
| 80 | + The i8 conversion is a RELABELING of ranks, not a RECOVERY of signs. |
| 81 | +``` |
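The relabeling argument can be checked mechanically: subtracting 128 is a strictly increasing map from u8 to i8, so every pairwise ordering in the table survives and no new information appears. A minimal sketch, reusing the cast from the experiment; the function names are hypothetical.

```rust
/// The conversion the signed path uses today: shift the u8 rank by 128.
pub fn relabel_u8_to_i8(v: u8) -> i8 {
    (v as i16 - 128) as i8
}

/// True when the shift is strictly increasing over the whole u8 range,
/// i.e. the i8 table ranks every cell exactly as the u8 table did.
pub fn relabel_is_monotonic() -> bool {
    (0u8..=254).all(|v| relabel_u8_to_i8(v) < relabel_u8_to_i8(v + 1))
}
```

Because the map is order-preserving, the sign of the resulting i8 only says whether a cell sits above or below the median rank. It cannot say anything about weight polarity.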
| 82 | + |
| 83 | +### What needs to happen instead |
| 84 | + |
| 85 | +``` |
| 86 | +WRONG (current signed_engine path): |
| 87 | + u8 table (gate info lost) → subtract 128 → i8 (gate info still lost) |
| 88 | + |
| 89 | +RIGHT: |
| 90 | + BF16 weights → compute SIGNED cosine → encode directly as i8[-128,+127] |
| 91 | + |
| 92 | + For gate-modulated roles (K, V, Up): |
| 93 | + activated = silu(gate_row) × weight_row (elementwise) |
| 94 | + cos(activated_i, activated_j) → i8 (SIGNED, preserves gate decisions) |
| 95 | + |
| 96 | + For raw roles (Q, Down): |
| 97 | + cos(weight_row_i, weight_row_j) → i8 (still signed, preserves weight polarity) |
| 98 | + |
| 99 | + The sign in the cosine IS the excitation/inhibition signal. |
| 100 | + Negative cosine = opposed features = inhibition. |
| 101 | + This is REAL — the models have negative cosines: |
| 102 | + Qwopus ffn_gate: cos[-0.23, +0.18] |
| 103 | + Reranker: cos[-0.886, +0.826] |
| 104 | + Reader-LM ffn_down: cos[-0.885, +0.188] |
| 105 | +``` |
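The RIGHT path above can be sketched in a few functions. This assumes the weight rows are already decoded to `f32` slices of equal length, and it uses the `round(cos × 127)` scaling from the action items. All function names here are hypothetical and not taken from `signed_engine.rs`.

```rust
/// SiLU activation: x * sigmoid(x).
pub fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Elementwise silu(gate) * weight, for gate-modulated roles (K, V, Up).
pub fn gate_modulate(gate_row: &[f32], weight_row: &[f32]) -> Vec<f32> {
    gate_row
        .iter()
        .zip(weight_row)
        .map(|(&g, &w)| silu(g) * w)
        .collect()
}

/// Signed cosine similarity; returns 0.0 if either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Encode a cosine in [-1, 1] as i8 via round(cos * 127), clamped to [-127, 127].
pub fn encode_i8(cos: f32) -> i8 {
    (cos * 127.0).round().clamp(-127.0, 127.0) as i8
}
```

For a gate-modulated pair the cell value would be `encode_i8(cosine(&gate_modulate(gate_i, up_i), &gate_modulate(gate_j, up_j)))`; for raw roles the two weight rows go into `cosine` directly. Two rows with identical `up` weights but opposite gate signs come out with a negative code, which is the information the u8 path discards.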
| 106 | + |
| 107 | +## WHAT THE DUAL EXPERIMENT ACTUALLY TESTS |
| 108 | + |
| 109 | +``` |
| 110 | +Current dual_signed_experiment.rs tests: |
| 111 | + "Does i8(u8 - 128) produce different peaks than u8?" |
| 112 | + Answer: Yes, but the difference is from RELABELING, not from gate recovery. |
| 113 | + |
| 114 | +What it SHOULD test: |
| 115 | + "Does i8(BF16 → signed cosine) produce different peaks than u8(BF16 → CDF)?" |
| 116 | + Answer: unknown — not built yet. |
| 117 | + |
| 118 | +The experiment framework (DualEngine, DualResult, comparison metrics) is CORRECT. |
| 119 | +The TABLE CONTENT feeding into the signed engine is WRONG. |
| 120 | +``` |
| 121 | + |
| 122 | +## THE 7-LANE CALIBRATION PLAN |
| 123 | + |
| 124 | +### Why we went back to the drawing board |
| 125 | + |
| 126 | +``` |
| 127 | +After measuring the 33% error, we realized: |
| 128 | +1. All 7 HDR lenses have identical statistics (Mean=127.5, Std=73.6) |
| 129 | + because CDF encoding forces uniform distribution. |
| 130 | + Model-specific topology IS preserved (99.2% bytes differ between models) |
| 131 | + but you can't see it in the statistics. |
| 132 | + |
| 133 | +2. γ+φ encoding (golden ratio offset) is NOT applied to any baked table. |
| 134 | + The code exists in bgz-tensor/gamma_phi.rs but was never wired. |
| 135 | + Per-role γ offsets are DOCUMENTED (Gate=1.50, Q=0.37) but NOT USED. |
| 136 | + |
| 137 | +3. Calibrating against GGUF BF16 is calibrating against TIFF, not RAW. |
| 138 | + BF16 has 7-bit mantissa → ±0.008 precision → ~5% rank flips at boundaries. |
| 139 | + Need ONNX f32 as ground truth. |
| 140 | + |
| 141 | +4. Jina v5 has BOTH ONNX f32 (2.4 GB) and GGUF (1.2 GB). |
| 142 | + No API key needed. Both verified streamable. |
| 143 | +``` |
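Point 1 follows directly from how rank-based encoding works: once values are replaced by their percentile, the output histogram is flat no matter what the input looked like. The sketch below is a simplified stand-in for the real CDF encoder, which is not shown in this document. It assumes distinct input values, breaks ties arbitrarily by sort order, and uses hypothetical function names.

```rust
/// Rank-based (CDF percentile) encoding to u8: the smallest value maps to 0,
/// the largest to 255. Simplified stand-in, not the project's encoder.
pub fn cdf_encode_u8(values: &[f32]) -> Vec<u8> {
    let n = values.len();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&i, &j| values[i].total_cmp(&values[j]));
    let mut out = vec![0u8; n];
    for (rank, &idx) in order.iter().enumerate() {
        // Map rank 0..n-1 onto 0..=255.
        out[idx] = if n > 1 { ((rank * 255) / (n - 1)) as u8 } else { 0 };
    }
    out
}

/// Population mean and standard deviation of the codes (expects non-empty input).
pub fn mean_std(codes: &[u8]) -> (f64, f64) {
    let n = codes.len() as f64;
    let mean = codes.iter().map(|&c| c as f64).sum::<f64>() / n;
    let var = codes.iter().map(|&c| (c as f64 - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}
```

Two inputs with very different shapes produce different code vectors but identical mean and spread, which is why summary statistics cannot distinguish the lenses while a byte-level comparison can.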
| 144 | + |
| 145 | +### The 7 encoding lanes to compare |
| 146 | + |
| 147 | +``` |
| 148 | +For each model × each role: |
| 149 | + Lane 1: u8 linear (current 64×64 codebook tables) |
| 150 | + Lane 2: u8 CDF (current 256×256 HDR lenses) |
| 151 | + Lane 3: u8 γ+φ (gamma offset + phi redistribution) |
| 152 | + Lane 4: i8 from u8 (subtract 128 — what signed_engine.rs does now) |
| 153 | + Lane 5: i8 from BF16 (signed cosine directly — NOT BUILT YET) |
| 154 | + Lane 6: i8 γ+φ signed (gamma + phi on signed range) |
| 155 | + Lane 7: highheelbgz spiral (golden ratio stride encoding) |
| 156 | + |
| 157 | +Ground truth: ONNX f32 forward pass via rten |
| 158 | +Metric: Spearman ρ(lane_distances, onnx_distances) |
| 159 | +After ICC: does correction bring all lanes to ρ > 0.998? |
| 160 | + |
| 161 | +The lane that needs the LEAST ICC correction = the most faithful encoding. |
| 162 | +``` |
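The lane metric is an ordinary Spearman rank correlation between each lane's distances and the ONNX reference distances. A dependency-free sketch is below; it is not the code in `calibrate_lenses.rs`, and it assumes tied values should share their average rank.

```rust
/// 1-based ranks; tied values share the mean of the positions they occupy.
fn ranks(xs: &[f64]) -> Vec<f64> {
    let n = xs.len();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&i, &j| xs[i].total_cmp(&xs[j]));
    let mut r = vec![0.0; n];
    let mut start = 0;
    while start < n {
        // Extend `end` over the run of values equal to xs[order[start]].
        let mut end = start;
        while end + 1 < n && xs[order[end + 1]] == xs[order[start]] {
            end += 1;
        }
        let avg = (start + end) as f64 / 2.0 + 1.0;
        for k in start..=end {
            r[order[k]] = avg;
        }
        start = end + 1;
    }
    r
}

/// Spearman rho = Pearson correlation of the two rank vectors.
/// Returns None for mismatched lengths, fewer than 2 points, or constant input.
pub fn spearman(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.len() < 2 {
        return None;
    }
    let (ra, rb) = (ranks(a), ranks(b));
    let n = ra.len() as f64;
    let ma = ra.iter().sum::<f64>() / n;
    let mb = rb.iter().sum::<f64>() / n;
    let (mut cov, mut va, mut vb) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in ra.iter().zip(&rb) {
        cov += (x - ma) * (y - mb);
        va += (x - ma).powi(2);
        vb += (y - mb).powi(2);
    }
    if va == 0.0 || vb == 0.0 {
        None
    } else {
        Some(cov / (va * vb).sqrt())
    }
}
```

Because only ranks enter the computation, any monotonic re-encoding of a lane leaves ρ unchanged. Lane 2 (u8 CDF) and Lane 4 (u8 − 128) should therefore score identically, which gives a quick sanity check on the harness.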
| 163 | + |
| 164 | +### BF16 bucket boundary awareness |
| 165 | + |
| 166 | +``` |
| 167 | +When raw cosine is within ±0.008 of a HEEL bucket boundary, |
| 168 | +BF16 truncation can flip the bucket assignment. |
| 169 | +High-precision refinement (HIP/TWIG) on the wrong bucket = confidently lost. |
| 170 | + |
| 171 | +Fix: boundary_risk metadata per centroid pair. |
| 172 | + 95% safe → fast cascade |
| 173 | + 5% uncertain → skip cascade, validate at LEAF or compute directly |
| 174 | + |
| 175 | +γ+φ golden ratio stride reduces boundary risk by placing bucket |
| 176 | +edges at irrational positions that don't align with BF16 quant steps. |
| 177 | +``` |
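The boundary-risk flag described above amounts to a distance check against the bucket edges. In the sketch below the edge list, the function names, and the use of plain bit truncation for BF16 are all assumptions for illustration. The real HEEL bucket layout is not given in this document, and some converters round to nearest instead of truncating.

```rust
/// Truncate an f32 to BF16 precision (keep the top 16 bits), then widen back.
/// Assumption: truncation, as the text says; other converters may round.
pub fn bf16_truncate(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_0000)
}

/// Index of the bucket containing `cos`, given ascending interior edges.
pub fn bucket_of(cos: f32, edges: &[f32]) -> usize {
    edges.iter().filter(|&&e| cos >= e).count()
}

/// True if `cos` lies within `eps` of any bucket edge, i.e. the pair should
/// skip the fast cascade and be validated at LEAF or computed directly.
pub fn boundary_risk(cos: f32, edges: &[f32], eps: f32) -> bool {
    edges.iter().any(|&e| (cos - e).abs() <= eps)
}
```

With hypothetical edges at -0.1, 0.0 and 0.1, a cosine of exactly 0.1 lands in the top bucket at f32 precision but falls one bucket lower after `bf16_truncate`. That is the flip the metadata is meant to catch, and `boundary_risk(0.1, &edges, 0.008)` flags it.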
| 178 | + |
| 179 | +## ACTION ITEMS FOR THE SIGNED SESSION |
| 180 | + |
| 181 | +``` |
| 182 | +1. DO NOT trust the current i8 tables (u8 - 128 = relabeled ranks, not gate signs) |
| 183 | + |
| 184 | +2. BUILD i8 tables directly from BF16 weights: |
| 185 | + Stream Qwopus BF16 → silu(gate) × up → cosine → round(cos × 127) → i8 |
| 186 | + Use the existing streaming pipeline (Python scripts in this session, |
| 187 | + or the Rust stream_hdr_lens.rs pattern) |
| 188 | + |
| 189 | +3. RE-RUN dual_signed_experiment with BOTH table types: |
| 190 | + DualEngine with: |
| 191 | + unsigned = u8 CDF (current, from raw cosine) |
| 192 | + signed = i8 from BF16 signed cosine (NEW, from silu(gate)×up) |
| 193 | + |
| 194 | + THEN compare. The agreement metric will be meaningful. |
| 195 | + |
| 196 | +4. For calibration: |
| 197 | + Download Jina v5 ONNX (2.4 GB) — the f32 ground truth |
| 198 | + Download Jina v5 GGUF (1.2 GB) — our streaming source |
| 199 | + Run rten on ONNX → compute f32 embedding cosines for test sentences |
| 200 | + Compare: baked table distances vs ONNX distances → Spearman ρ |
| 201 | + Build ICC profile → corrected ρ should reach > 0.998 |
| 202 | + |
| 203 | +5. Temperature + nucleus sampling: |
| 204 | + This UNBLOCKS coherent output. Without it, even perfect tables collapse. |
| 205 | + See HANDOVER_MAVERICK_SESSION.md for the 10-line implementation. |
| 206 | + Wire INTO thinking styles (Analytical=top_p 0.3, Creative=0.95). |
| 207 | +``` |
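For item 5, a generic temperature plus nucleus (top-p) step looks roughly like the sketch below. This is not the implementation from `HANDOVER_MAVERICK_SESSION.md`, which is not reproduced here. The function names are hypothetical, and the random draw is passed in by the caller as `u` so the example stays dependency-free.

```rust
/// Softmax over logits divided by `temperature` (assumed > 0).
pub fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the max for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Keep the smallest set of highest-probability indices whose cumulative mass
/// reaches `top_p`, then renormalize. Returns (index, probability) pairs,
/// most probable first.
pub fn nucleus(probs: &[f32], top_p: f32) -> Vec<(usize, f32)> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    let mut kept = Vec::new();
    let mut mass = 0.0f32;
    for i in idx {
        kept.push((i, probs[i]));
        mass += probs[i];
        if mass >= top_p {
            break;
        }
    }
    kept.iter().map(|&(i, p)| (i, p / mass)).collect()
}

/// Pick an index from the kept set using a caller-supplied uniform draw
/// `u` in [0, 1). Returns None when the set is empty.
pub fn sample(kept: &[(usize, f32)], u: f32) -> Option<usize> {
    let mut acc = 0.0f32;
    for &(i, p) in kept {
        acc += p;
        if u < acc {
            return Some(i);
        }
    }
    // Guard against rounding leaving `acc` slightly below 1.0.
    kept.last().map(|&(i, _)| i)
}
```

Wiring this into thinking styles would then be a matter of choosing `top_p` per style, for example 0.3 for Analytical and 0.95 for Creative as listed above. A small `top_p` keeps only the strongest peaks and a large one lets weaker candidates through.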
| 208 | + |
| 209 | +## FILES THAT MATTER |
| 210 | + |
| 211 | +``` |
| 212 | +MEASURED DATA (this session): |
| 213 | + crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/layer_stats.json |
| 214 | + → per-layer gate near-zero %, cosine ranges, SiLU correction stats |
| 215 | + crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/gate_raw_256x256.u8 |
| 216 | + → L0 gate table WITHOUT SiLU |
| 217 | + crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/gate_silu_corrected_256x256.u8 |
| 218 | + → L0 gate table WITH SiLU (compare: 99.2% cells differ, mean Δ=84.2) |
| 219 | + |
| 220 | +SILU CORRECTION CODE: |
| 221 | + crates/thinking-engine/src/silu_correction.rs |
| 222 | + → generate_training_data(), gate_modulate_centroids(), apply_corrections() |
| 223 | + → May be OBSOLETE if i8-from-BF16 path works (sign preserves gate natively) |
| 224 | + → But the MEASUREMENT code (correction_stats()) is still valuable for analysis |
| 225 | + |
| 226 | +CALIBRATION HARNESS: |
| 227 | + crates/thinking-engine/examples/calibrate_lenses.rs → Spearman + ICC builder |
| 228 | + crates/lance-graph-contract/src/high_heel.rs → LensProfile, LensConfig, LENS_REGISTRY |
| 229 | + |
| 230 | +HANDOVER DOCS: |
| 231 | + .claude/HANDOVER_MAVERICK_SESSION.md → i8 architecture, Maverick plan, temperature fix |
| 232 | + .claude/HANDOVER_CALIBRATION_SESSION.md → H1-H5 hypotheses, Cronbach α protocol |
| 233 | +``` |