@@ -231,3 +231,199 @@ HANDOVER DOCS:
231 231 .claude/HANDOVER_MAVERICK_SESSION.md → i8 architecture, Maverick plan, temperature fix
232 232 .claude/HANDOVER_CALIBRATION_SESSION.md → H1-H5 hypotheses, Cronbach α protocol
233 233 ```
234+
235+ ---
236+
237+ ## SIGNED i8 FORMULAS PER ROLE
238+
239+ ### The encoding formula
240+
241+ For each weight row, the signed i8 value preserves the ACTUAL cosine polarity:
242+
243+ ```
244+ scale_factor = 127.0 / max(|cosine_values|)
245+ i8_value = round(cosine × scale_factor).clamp(-128, +127)
246+ ```
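The two-line formula above can be sketched in Rust as follows. This is a minimal illustration, not the project's actual implementation: the function names `scale_factor` and `quantize_cosine` are hypothetical, and returning `None` for an empty or all-zero row is an assumption about how the undefined case should be handled.

```rust
/// Scale factor that maps the largest |cosine| in `cosines` to 127.
/// Returns None for an empty or all-zero slice, where 127 / max is undefined
/// (assumed behavior; the source does not specify this case).
fn scale_factor(cosines: &[f32]) -> Option<f32> {
    let max_abs = cosines.iter().fold(0.0f32, |m, c| m.max(c.abs()));
    if max_abs > 0.0 {
        Some(127.0 / max_abs)
    } else {
        None
    }
}

/// Quantize one cosine to signed i8, keeping its polarity.
/// Round first, then clamp to the i8 range, then cast.
fn quantize_cosine(cosine: f32, scale: f32) -> i8 {
    (cosine * scale).round().clamp(-128.0, 127.0) as i8
}
```

With the attn_qkv range `[-0.62, +0.69]`, the scale comes out near 184.1, so the largest cosine lands on 127 and the sign of every other value survives quantization.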
247+
248+ Per-role scale factors (from Qwopus 27B L0 measured cosine ranges):
249+
250+ ```
251+ Role       Cosine Range     max(|cos|)   Scale Factor
252+ ────       ────────────     ──────────   ────────────
253+ attn_qkv   [-0.62, +0.69]   0.69          184.1
254+ ffn_gate   [-0.23, +0.18]   0.23          552.2   ← HIGHEST RESOLUTION
255+ ffn_up     [-0.08, +0.08]   0.08         1587.5   ← (but tiny range)
256+ ffn_down   [-0.18, +0.10]   0.18          705.6
257+ ssm_out    [-0.20, +0.28]   0.28          453.6
258+ ```
259+
260+ Gate gets the most useful resolution because its range is narrow and centered near zero,
261+ exactly where the SiLU decision boundary lives. ffn_up has a larger scale factor only because its range is tiny.
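To make "resolution" concrete, the cosine width of a single i8 level is just the inverse of the scale factor. The helper below is a hypothetical sketch (the name `cosine_step` does not appear in the source); it only restates the arithmetic of the table.

```rust
/// Cosine width of one i8 level for a role whose largest |cosine| is
/// `max_abs_cos`. This is 1 / scale_factor, with scale_factor = 127 / max(|cos|).
fn cosine_step(max_abs_cos: f32) -> f32 {
    max_abs_cos / 127.0
}
```

For ffn_gate (max 0.23) one level is roughly 0.0018 in cosine, against roughly 0.0054 for attn_qkv (max 0.69), so a near-zero gate cosine is resolved about three times more finely.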
262+
263+ ### What each role's sign MEANS
264+
265+ ```
266+ Q (Query) — "what is the world asking?"
267+ EXTERN. Input-dependent. The world asks what it asks.
268+ i8 encoding: round(cos(Q_row_i, Q_row_j) × scale) → i8
269+
270+ +i8: "query i and query j ask SIMILAR things"
271+ -i8: "query i and query j ask OPPOSITE things"
272+ 0: "unrelated queries"
273+
274+ NO gate modulation. Q is raw.
275+ Formula: table_Q[i][j] = i8(cos(Q_centroid_i, Q_centroid_j) × scale_Q)
276+
277+
278+ K (Key) — "what do I know?" (gate-modulated)
279+ INTERN. Self-filtered knowledge index.
280+ i8 encoding: silu(gate) × K, THEN cosine, THEN i8
281+
282+ +i8: "knowledge i and knowledge j are CO-ACCESSIBLE through the gate"
283+ -i8: "gate opens i but BLOCKS j (or vice versa)"
284+ 0: "no gate relationship"
285+
286+ Formula:
287+ activated_K_i = silu(gate_centroid_i) ⊙ K_centroid_i (elementwise)
288+ activated_K_j = silu(gate_centroid_j) ⊙ K_centroid_j
289+ table_K[i][j] = i8(cos(activated_K_i, activated_K_j) × scale_K)
290+
291+ WHY silu(gate) × K:
292+ gate[d] = +0.3 → silu(0.3) = 0.17 → K[d] × 0.17 → feature d PASSES (reduced)
293+ gate[d] = -0.1 → silu(-0.1) = -0.048 → K[d] × -0.048 → feature d INVERTED
294+ gate[d] = 0.0 → silu(0.0) = 0.0 → K[d] × 0.0 → feature d MASKED
295+
296+ Two keys with SAME gate opening pattern → positive cosine → excitation
297+ Two keys where gate opens OPPOSITE features → negative cosine → inhibition
298+
299+
300+ V (Value) — "what do I give?" (gate-modulated)
301+ Same as K but for content:
302+ Formula:
303+ activated_V_i = silu(gate_centroid_i) ⊙ V_centroid_i
304+ table_V[i][j] = i8(cos(activated_V_i, activated_V_j) × scale_V)
305+
306+
307+ Gate — "what am I ALLOWED to activate?"
308+ The gate IS the lens. Not a codebook entry.
309+ i8 encoding: raw gate-to-gate cosine (how similar are two gate patterns?)
310+
311+ +i8: "same gate opening pattern" (same features allowed)
312+ -i8: "OPPOSITE gate patterns" (what one allows, the other blocks)
313+ 0: "unrelated gate patterns"
314+
315+ Formula: table_Gate[i][j] = i8(cos(Gate_centroid_i, Gate_centroid_j) × scale_Gate)
316+
317+ NOTE: 68.9% of gate values are near zero.
318+ This means most gate dimensions are in the SiLU decision zone.
319+ The SIGN of these near-zero values is the entire gate decision.
320+ i8 preserves this sign. u8 destroys it.
321+
322+
323+ Up — "how do I expand?" (gate × SiLU modulated)
324+ INTERN. The FFN expansion. Gate × SiLU × Up is the activation.
325+ i8 encoding: silu(gate) × up, THEN cosine, THEN i8
326+
327+ Formula:
328+ activated_Up_i = silu(gate_centroid_i) ⊙ Up_centroid_i
329+ activated_Up_j = silu(gate_centroid_j) ⊙ Up_centroid_j
330+ table_Up[i][j] = i8(cos(activated_Up_i, activated_Up_j) × scale_Up)
331+
332+ This is where the 33% error lives:
333+ Raw cos(Up_i, Up_j) std = 0.021
334+ cos(silu(gate)×Up_i, silu(gate)×Up_j) std = 0.051 ← 2.4× MORE SPREAD
335+ 99.2% of table cells change. Mean Δ = 84.2 u8 levels.
336+
337+ Without gate modulation: Up table is WRONG by 33%.
338+ With gate modulation: Up table captures actual FFN activation topology.
339+
340+
341+ Down — "how do I compress?"
342+ EXTERN (funnel). Receives gate×up result, compresses back.
343+ i8 encoding: raw cosine (no gate modulation needed)
344+
345+ Formula: table_Down[i][j] = i8(cos(Down_centroid_i, Down_centroid_j) × scale_Down)
346+
347+ NO gate modulation. Down receives the already-gated signal.
348+ Like Q, it's a raw cosine encoding.
349+ ```
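The gate-modulated formulas for K, V, and Up share one shape: apply `silu(gate)` elementwise to each centroid, take the cosine of the two activated vectors, then quantize. A minimal sketch of one table cell follows. All function names here are hypothetical, centroids are assumed to be plain `f32` slices of equal length, and returning 0.0 for a zero-norm vector is an assumption rather than something the source specifies.

```rust
/// SiLU activation: x * sigmoid(x) = x / (1 + exp(-x)).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Elementwise silu(gate) ⊙ weights for one centroid.
fn gate_activate(gate: &[f32], weights: &[f32]) -> Vec<f32> {
    assert_eq!(gate.len(), weights.len());
    gate.iter().zip(weights).map(|(g, w)| silu(*g) * w).collect()
}

/// Cosine similarity; returns 0.0 if either vector has zero norm (assumed).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// One gate-modulated table cell: i8(cos(silu(g_i)⊙w_i, silu(g_j)⊙w_j) × scale).
fn gated_cell(gate_i: &[f32], w_i: &[f32], gate_j: &[f32], w_j: &[f32], scale: f32) -> i8 {
    let a = gate_activate(gate_i, w_i);
    let b = gate_activate(gate_j, w_j);
    (cosine(&a, &b) * scale).round().clamp(-128.0, 127.0) as i8
}
```

Two identical keys seen through the same gate give the maximum positive cell; the same keys seen through gates that open opposite features give a negative cell, which is the inhibition case described above.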
350+
351+ ### The MatVec with signed tables
352+
353+ ```rust
354+ /// Signed MatVec: positive entries excite, negative entries inhibit.
355+ fn signed_matvec(table: &[i8], energy: &[f32], n: usize) -> Vec<f32> {
356+     let mut next = vec![0.0f32; n];
357+     for i in 0..n {
358+         if energy[i].abs() < 1e-8 { continue; }
359+         let row = &table[i * n..(i + 1) * n];
360+         for j in 0..n {
361+             // SIGNED: table[i][j] > 0 = excitation, < 0 = inhibition
362+             next[j] += (row[j] as f32 / 127.0) * energy[i];
363+         }
364+     }
365+     // CLAMP: inhibited atoms die (negative energy → 0)
366+     for e in &mut next {
367+         *e = e.max(0.0);
368+     }
369+     next
370+ }
371+ ```
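A tiny worked case may help show what the clamp does. The sketch below restates the signed MatVec compactly under a different, hypothetical name (`signed_matvec_sketch`) so it runs on its own, and feeds it a made-up 2-atom table in which atom 0 excites itself and inhibits atom 1. The table values and energies are illustrative only, not measured data.

```rust
/// Compact restatement of the signed MatVec, so this example is self-contained.
/// `table` is row-major n×n; entries are divided by 127 to recover a cosine-like weight.
fn signed_matvec_sketch(table: &[i8], energy: &[f32], n: usize) -> Vec<f32> {
    let mut next = vec![0.0f32; n];
    for (i, &e) in energy.iter().enumerate().take(n) {
        for j in 0..n {
            next[j] += (table[i * n + j] as f32 / 127.0) * e;
        }
    }
    // Negative accumulated energy is clamped to zero (inhibited atoms die).
    next.iter().map(|v| v.max(0.0)).collect()
}

/// Hypothetical 2-atom demo.
/// Row 0 = [127, -127]: atom 0 excites itself, inhibits atom 1.
/// Row 1 = [0, 127]:    atom 1 excites only itself.
fn demo_two_atoms() -> Vec<f32> {
    let table: [i8; 4] = [127, -127, 0, 127];
    let energy = [1.0f32, 0.5];
    signed_matvec_sketch(&table, &energy, 2)
}
```

Atom 1 accumulates `-1.0 × 1.0 + 1.0 × 0.5 = -0.5`, which the clamp turns into 0, while atom 0 keeps its full energy of 1.0.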
372+
373+ ### The complete forward pass per layer
374+
375+ ```rust
376+ fn layer_forward_signed(
377+     hidden: &mut [f32],
378+     table_q: &[i8],      // raw (extern)
379+     table_gate: &[i8],   // raw gate topology
380+     table_up: &[i8],     // silu(gate)×up (intern, gate-modulated)
381+     table_down: &[i8],   // raw (funnel)
382+     residual_scale: f32, // 0.1 typical
383+ ) {
384+     let n = hidden.len();
385+
386+     // 1. Attention sublayer (Q topology routes)
387+     let mut attn = hidden.to_vec();
388+     rms_norm(&mut attn);
389+     attn = signed_matvec(table_q, &attn, n);
390+
391+     // 2. Gate modulates attention via NARS truth
392+     // (gate topology tells us which attention paths to trust)
393+     let gate_energy = signed_matvec(table_gate, &hidden, n);
394+     for i in 0..n {
395+         // Gate as confidence: high gate energy = trust this attention path
396+         let gate_trust = gate_energy[i].max(0.0) / (gate_energy[i].abs() + 1.0);
397+         attn[i] *= gate_trust;
398+     }
399+
400+     // 3. Residual connection
401+     for i in 0..n { hidden[i] += attn[i] * residual_scale; }
402+
403+     // 4. FFN sublayer (up is gate-modulated, down is raw)
404+     let mut ffn_in = hidden.to_vec();
405+     rms_norm(&mut ffn_in);
406+     let up_out = signed_matvec(table_up, &ffn_in, n); // ALREADY gate-corrected
407+     let ffn_out = signed_matvec(table_down, &up_out, n);
408+
409+     // 5. Residual connection
410+     for i in 0..n { hidden[i] += ffn_out[i] * residual_scale; }
411+ }
412+ ```
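The forward pass above calls `rms_norm`, which is not defined in this section. One plausible shape for it is sketched below, assuming plain RMS normalization with no learned gain and a small epsilon for numerical stability. Both the missing gain and the epsilon value are assumptions; the project's real helper may differ.

```rust
/// In-place RMS normalization: x[i] /= sqrt(mean(x^2) + eps).
/// Assumptions: no learned gain vector, eps = 1e-6.
fn rms_norm(x: &mut [f32]) {
    if x.is_empty() {
        return;
    }
    const EPS: f32 = 1e-6;
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + EPS).sqrt();
    for v in x.iter_mut() {
        *v *= inv_rms;
    }
}
```

After normalization the mean square of the vector is close to 1, so the energies fed into the signed MatVec stay on a comparable scale from layer to layer. An all-zero input stays all-zero because of the epsilon.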
413+
414+ ### Summary: which roles get gate × SiLU, which don't
415+
416+ ```
417+ Role   Gate Modulation      Formula for i8 table
418+ ────   ───────────────      ────────────────────
419+ Q      NONE (extern)        i8(cos(Q_i, Q_j) × scale)
420+ Gate   NONE (IS the gate)   i8(cos(Gate_i, Gate_j) × scale)
421+ K      silu(gate) × K       i8(cos(silu(g_i)⊙K_i, silu(g_j)⊙K_j) × scale)
422+ V      silu(gate) × V       i8(cos(silu(g_i)⊙V_i, silu(g_j)⊙V_j) × scale)
423+ Up     silu(gate) × Up      i8(cos(silu(g_i)⊙Up_i, silu(g_j)⊙Up_j) × scale)
424+ Down   NONE (funnel)        i8(cos(Down_i, Down_j) × scale)
425+
426+ ⊙ = elementwise multiply
427+ silu(x) = x / (1 + exp(-x))
428+ scale = 127.0 / max(|cosine_values|)
429+ ```