Commit 03d9aa5

docs: signed i8 formulas per role — Q/K/V/Gate/Up/Down encoding + MatVec
Complete formulas for building signed i8 distance tables:

  Q:    raw cosine → i8 (extern, no gate)
  K:    silu(gate) × K → cosine → i8 (intern, gate-modulated)
  V:    silu(gate) × V → cosine → i8 (intern, gate-modulated)
  Gate: raw cosine → i8 (IS the gate, topology reference)
  Up:   silu(gate) × Up → cosine → i8 (strongest effect, 33% Δ)
  Down: raw cosine → i8 (funnel, receives gated result)

Per-role scale factors from Qwopus BF16 measured ranges.
Gate gets highest resolution (scale=552) because range is narrowest.
Signed MatVec + clamp(0) = excitation/inhibition dynamics.
Complete layer_forward_signed() showing gate as NARS trust modulator.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
1 parent e8e1d63 commit 03d9aa5

1 file changed

Lines changed: 196 additions & 0 deletions

.claude/KNOWLEDGE_SYNC_SIGNED_SESSION.md

@@ -231,3 +231,199 @@ HANDOVER DOCS:
```
.claude/HANDOVER_MAVERICK_SESSION.md    → i8 architecture, Maverick plan, temperature fix
.claude/HANDOVER_CALIBRATION_SESSION.md → H1-H5 hypotheses, Cronbach α protocol
```

---

## SIGNED i8 FORMULAS PER ROLE

### The encoding formula

For each weight row, the signed i8 value preserves the ACTUAL cosine polarity:

```
scale_factor = 127.0 / max(|cosine_values|)
i8_value     = round(cosine × scale_factor).clamp(-128, +127)
```
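
A minimal Rust sketch of the two lines above, assuming the cosines for one role are already collected in a slice. The names `scale_factor` and `encode_i8` are illustrative and not taken from the repository, and the `None` return for an empty or all-zero input is an added guard that the formula itself does not specify.

```rust
/// Scale factor that maps the largest |cosine| in `cosines` to ±127.
/// Returns `None` for an empty or all-zero slice (hypothetical guard;
/// the source formula does not say how to handle that case).
fn scale_factor(cosines: &[f32]) -> Option<f32> {
    let max_abs = cosines.iter().fold(0.0f32, |m, c| m.max(c.abs()));
    if max_abs > 0.0 {
        Some(127.0 / max_abs)
    } else {
        None
    }
}

/// Quantize one cosine to signed i8, keeping its polarity.
/// Values outside the measured range saturate at -128 / +127.
fn encode_i8(cosine: f32, scale: f32) -> i8 {
    (cosine * scale).round().clamp(-128.0, 127.0) as i8
}
```

For a role whose cosines span [-0.62, +0.69], `scale_factor` gives about 184.1, so +0.69 encodes to 127 and -0.62 encodes to -114.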

Per-role scale factors (from Qwopus 27B L0 measured cosine ranges):

```
Role       Cosine Range     max(|cos|)   Scale Factor
────       ────────────     ──────────   ────────────
attn_qkv   [-0.62, +0.69]   0.69         184.1
ffn_gate   [-0.23, +0.18]   0.23         552.2    ← HIGHEST RESOLUTION
ffn_up     [-0.08, +0.08]   0.08         1587.5   ← (but tiny range)
ffn_down   [-0.18, +0.10]   0.18         705.6
ssm_out    [-0.20, +0.28]   0.28         453.6
```
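
Gate is singled out for resolution because its range is narrow and centered at zero,
exactly where the SiLU decision boundary lives: one i8 level covers about 0.0018 of cosine.
Note that ffn_up (1587.5) and ffn_down (705.6) have numerically larger scale factors;
ffn_up's is large only because its raw range is tiny.
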
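To make "resolution" concrete, the following sketch recomputes the scale factor and the size of one i8 level in cosine units (max|cos| / 127) for each row of the table above. The constant `ROLE_RANGES` and the function names are hypothetical; the numbers are copied from the table.

```rust
/// One i8 level expressed in cosine units: max|cos| / 127 = 1 / scale.
fn cosine_step(max_abs_cos: f32) -> f32 {
    max_abs_cos / 127.0
}

/// (role, max|cos|) pairs copied from the per-role table.
const ROLE_RANGES: [(&str, f32); 5] = [
    ("attn_qkv", 0.69),
    ("ffn_gate", 0.23),
    ("ffn_up", 0.08),
    ("ffn_down", 0.18),
    ("ssm_out", 0.28),
];

/// (role, scale factor, cosine step) for every role in `ROLE_RANGES`.
fn role_resolution() -> Vec<(&'static str, f32, f32)> {
    ROLE_RANGES
        .iter()
        .map(|&(role, max_abs)| (role, 127.0 / max_abs, cosine_step(max_abs)))
        .collect()
}
```

On these numbers one i8 level is about 0.0018 of cosine for ffn_gate, versus about 0.0054 for attn_qkv.
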
### What each role's sign MEANS

```
Q (Query): "what is the world asking?"
  EXTERN. Input-dependent. The world asks what it asks.
  i8 encoding: round(cos(Q_row_i, Q_row_j) × scale) → i8

  +i8: "query i and query j ask SIMILAR things"
  -i8: "query i and query j ask OPPOSITE things"
  0:   "unrelated queries"

  NO gate modulation. Q is raw.
  Formula: table_Q[i][j] = i8(cos(Q_centroid_i, Q_centroid_j) × scale_Q)


K (Key): "what do I know?" (gate-modulated)
  INTERN. Self-filtered knowledge index.
  i8 encoding: silu(gate) × K, THEN cosine, THEN i8

  +i8: "knowledge i and knowledge j are CO-ACCESSIBLE through the gate"
  -i8: "gate opens i but BLOCKS j (or vice versa)"
  0:   "no gate relationship"

  Formula:
    activated_K_i = silu(gate_centroid_i) ⊙ K_centroid_i   (elementwise)
    activated_K_j = silu(gate_centroid_j) ⊙ K_centroid_j
    table_K[i][j] = i8(cos(activated_K_i, activated_K_j) × scale_K)

  WHY silu(gate) × K:
    gate[d] = +0.3 → silu(0.3)  =  0.172 → K[d] × 0.172  → feature d PASSES (reduced)
    gate[d] = -0.1 → silu(-0.1) = -0.048 → K[d] × -0.048 → feature d INVERTED
    gate[d] =  0.0 → silu(0.0)  =  0.0   → K[d] × 0.0    → feature d MASKED

  Two keys with SAME gate opening pattern → positive cosine → excitation
  Two keys where gate opens OPPOSITE features → negative cosine → inhibition


V (Value): "what do I give?" (gate-modulated)
  Same as K but for content:
  Formula:
    activated_V_i = silu(gate_centroid_i) ⊙ V_centroid_i
    activated_V_j = silu(gate_centroid_j) ⊙ V_centroid_j
    table_V[i][j] = i8(cos(activated_V_i, activated_V_j) × scale_V)


Gate: "what am I ALLOWED to activate?"
  The gate IS the lens. Not a codebook entry.
  i8 encoding: raw gate-to-gate cosine (how similar are two gate patterns?)

  +i8: "same gate opening pattern" (same features allowed)
  -i8: "OPPOSITE gate patterns" (what one allows, the other blocks)
  0:   "unrelated gate patterns"

  Formula: table_Gate[i][j] = i8(cos(Gate_centroid_i, Gate_centroid_j) × scale_Gate)

  NOTE: 68.9% of gate values are near zero.
  This means most gate dimensions are in the SiLU decision zone.
  The SIGN of these near-zero values is the entire gate decision.
  i8 preserves this sign. u8 destroys it.


Up: "how do I expand?" (gate × SiLU modulated)
  INTERN. The FFN expansion. Gate × SiLU × Up is the activation.
  i8 encoding: silu(gate) × up, THEN cosine, THEN i8

  Formula:
    activated_Up_i = silu(gate_centroid_i) ⊙ Up_centroid_i
    activated_Up_j = silu(gate_centroid_j) ⊙ Up_centroid_j
    table_Up[i][j] = i8(cos(activated_Up_i, activated_Up_j) × scale_Up)

  This is where the 33% error lives:
    Raw cos(Up_i, Up_j)                       std = 0.021
    cos(silu(gate_i)⊙Up_i, silu(gate_j)⊙Up_j) std = 0.051  ← 2.4× MORE SPREAD
    99.2% of table cells change. Mean Δ = 84.2 u8 levels (84.2 / 255 ≈ 33%).

  Without gate modulation: Up table is WRONG by 33%.
  With gate modulation: Up table captures actual FFN activation topology.


Down: "how do I compress?"
  EXTERN (funnel). Receives gate×up result, compresses back.
  i8 encoding: raw cosine (no gate modulation needed)

  Formula: table_Down[i][j] = i8(cos(Down_centroid_i, Down_centroid_j) × scale_Down)

  NO gate modulation. Down receives the already-gated signal.
  Like Q, it's a raw cosine encoding.
```
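
The gate-modulated formulas for K, V and Up can be sketched in Rust as follows. This is an illustration under stated assumptions, not the repository's implementation: centroids are plain `Vec<f32>` rows, the function names are hypothetical, and the scale is taken from off-diagonal cosines only, on the assumption that the measured ranges exclude self-similarity (so the diagonal saturates at +127).

```rust
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// activated_i = silu(gate_centroid_i) ⊙ centroid_i (elementwise).
fn activate(gate: &[f32], centroid: &[f32]) -> Vec<f32> {
    gate.iter().zip(centroid).map(|(g, c)| silu(*g) * c).collect()
}

/// Row-major n×n i8 table for a gate-modulated role (K, V or Up).
fn gated_table(gates: &[Vec<f32>], centroids: &[Vec<f32>]) -> Vec<i8> {
    let act: Vec<Vec<f32>> = gates
        .iter()
        .zip(centroids)
        .map(|(g, c)| activate(g, c))
        .collect();
    let n = act.len();
    let cos: Vec<f32> = (0..n * n)
        .map(|k| cosine(&act[k / n], &act[k % n]))
        .collect();
    // Assumption: the scale comes from off-diagonal entries only.
    let max_abs = cos
        .iter()
        .enumerate()
        .filter(|(k, _)| k / n != k % n)
        .fold(0.0f32, |m, (_, c)| m.max(c.abs()));
    let scale = if max_abs > 0.0 { 127.0 / max_abs } else { 127.0 };
    cos.iter()
        .map(|c| (c * scale).round().clamp(-128.0, 127.0) as i8)
        .collect()
}
```

As a worked case, two identical keys `[1.0, 0.0]` with gates `[0.3, 0.0]` and `[-0.1, 0.0]` activate to roughly `[0.172, 0.0]` and `[-0.048, 0.0]`. Their cosine is -1, so the off-diagonal table entries are -127: the same key under opposite gate signs becomes inhibitory.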

### The MatVec with signed tables

```rust
/// Signed MatVec: positive entries excite, negative entries inhibit.
/// `table` is an n×n row-major matrix of signed i8 weights; `energy` has length n.
fn signed_matvec(table: &[i8], energy: &[f32], n: usize) -> Vec<f32> {
    let mut next = vec![0.0f32; n];
    for i in 0..n {
        // Skip atoms that carry (numerically) no energy.
        if energy[i].abs() < 1e-8 { continue; }
        let row = &table[i * n..(i + 1) * n];
        for j in 0..n {
            // SIGNED: table[i][j] > 0 = excitation, < 0 = inhibition
            next[j] += (row[j] as f32 / 127.0) * energy[i];
        }
    }
    // CLAMP: inhibited atoms die (negative energy → 0)
    for e in &mut next {
        *e = e.max(0.0);
    }
    next
}
```
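
A small worked example of the excitation/inhibition dynamics. To keep the snippet compilable on its own it restates the MatVec compactly as `signed_matvec_sketch` (a hypothetical name); the two-atom table and energies are made up for illustration.

```rust
/// Compact restatement of the signed MatVec so this example stands alone.
fn signed_matvec_sketch(table: &[i8], energy: &[f32], n: usize) -> Vec<f32> {
    let mut next = vec![0.0f32; n];
    for (i, &e) in energy.iter().enumerate().take(n) {
        for (j, &w) in table[i * n..(i + 1) * n].iter().enumerate() {
            next[j] += (w as f32 / 127.0) * e;
        }
    }
    // Negative (inhibited) energy is clamped to zero.
    next.into_iter().map(|x| x.max(0.0)).collect()
}

/// Two atoms that excite themselves (+127) and inhibit each other (-127).
fn two_atom_example() -> Vec<f32> {
    let table: [i8; 4] = [127, -127, -127, 127];
    let energy = [1.0f32, 0.5];
    signed_matvec_sketch(&table, &energy, 2)
}
```

Atom 0 receives 1.0 - 0.5 = 0.5, while atom 1 receives 0.5 - 1.0 = -0.5 and is clamped to 0.0: the weaker atom is inhibited out of existence in a single step.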

### The complete forward pass per layer

```rust
// NOTE: `rms_norm` is not defined in this hunk; it is assumed to exist elsewhere.
fn layer_forward_signed(
    hidden: &mut [f32],
    table_q: &[i8],       // raw (extern)
    table_gate: &[i8],    // raw gate topology
    table_up: &[i8],      // silu(gate)×up (intern, gate-modulated)
    table_down: &[i8],    // raw (funnel)
    residual_scale: f32,  // 0.1 typical
) {
    let n = hidden.len();

    // 1. Attention sublayer (Q topology routes)
    let mut attn = hidden.to_vec();
    rms_norm(&mut attn);
    attn = signed_matvec(table_q, &attn, n);

    // 2. Gate modulates attention via NARS truth
    //    (gate topology tells us which attention paths to trust)
    let gate_energy = signed_matvec(table_gate, hidden, n);
    for i in 0..n {
        // Gate as confidence: high gate energy = trust this attention path.
        // signed_matvec already clamps to >= 0, so max/abs are defensive only;
        // the result is g / (g + 1), which lies in [0, 1).
        let gate_trust = gate_energy[i].max(0.0) / (gate_energy[i].abs() + 1.0);
        attn[i] *= gate_trust;
    }

    // 3. Residual connection
    for i in 0..n { hidden[i] += attn[i] * residual_scale; }

    // 4. FFN sublayer (up is gate-modulated, down is raw)
    let mut ffn_in = hidden.to_vec();
    rms_norm(&mut ffn_in);
    let up_out = signed_matvec(table_up, &ffn_in, n);  // ALREADY gate-corrected
    let ffn_out = signed_matvec(table_down, &up_out, n);

    // 5. Residual connection
    for i in 0..n { hidden[i] += ffn_out[i] * residual_scale; }
}
```
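
The forward pass calls `rms_norm`, which this hunk does not show. A minimal stand-in could look like the sketch below, assuming no learned gain vector and an epsilon of 1e-6; both are assumptions and the repository's own version may differ.

```rust
/// RMS-normalize `x` in place: x[i] /= sqrt(mean(x²) + eps).
/// Assumptions: no learned gain and eps = 1e-6 (the source does not
/// show its own `rms_norm`, so this is only a stand-in).
fn rms_norm(x: &mut [f32]) {
    if x.is_empty() {
        return;
    }
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + 1e-6).sqrt();
    for v in x.iter_mut() {
        *v *= inv_rms;
    }
}
```

After normalization the mean of the squared entries is approximately 1, and an all-zero input stays all-zero because of the epsilon.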

### Summary: which roles get gate × SiLU, which don't

```
Role    Gate Modulation      Formula for i8 table
────    ───────────────      ────────────────────
Q       NONE (extern)        i8(cos(Q_i, Q_j) × scale)
Gate    NONE (IS the gate)   i8(cos(Gate_i, Gate_j) × scale)
K       silu(gate) × K       i8(cos(silu(g_i)⊙K_i, silu(g_j)⊙K_j) × scale)
V       silu(gate) × V       i8(cos(silu(g_i)⊙V_i, silu(g_j)⊙V_j) × scale)
Up      silu(gate) × Up      i8(cos(silu(g_i)⊙Up_i, silu(g_j)⊙Up_j) × scale)
Down    NONE (funnel)        i8(cos(Down_i, Down_j) × scale)

g_i = gate_centroid_i
⊙ = elementwise multiply
silu(x) = x / (1 + exp(-x))
scale = 127.0 / max(|cosine_values|)   (computed per role)
```
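
The summary table can also be read as a dispatch rule: three roles pass through silu(gate) before the cosine, three do not. The sketch below encodes that rule; the `Role` enum and `table_input` are hypothetical names introduced for illustration, and for `Role::Gate` the `centroid` argument is taken to be the gate centroid itself.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role { Q, K, V, Gate, Up, Down }

fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Vector that is fed to the cosine for one centroid of `role`.
/// K, V and Up are gate-modulated; Q, Gate and Down are used raw.
fn table_input(role: Role, gate: &[f32], centroid: &[f32]) -> Vec<f32> {
    match role {
        Role::K | Role::V | Role::Up => gate
            .iter()
            .zip(centroid)
            .map(|(g, c)| silu(*g) * c)
            .collect(),
        Role::Q | Role::Gate | Role::Down => centroid.to_vec(),
    }
}
```

With a gate value of exactly 0.0, an Up feature is masked to 0.0, while the same feature in Q or Down passes through unchanged.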
