|
| 1 | +# SESSION: Reverse-Engineer Reasoning via Causal Edge Diffing |
| 2 | + |
| 3 | +## MISSION |
| 4 | + |
| 5 | +Extract the structural geometry of "how to think" from: |
| 6 | +1. Llama 4 Maverick MoE gate projections (routing topology) |
| 7 | +2. Qwen3.5 base→distilled attention diffs (reasoning circuit) |
| 8 | +3. Cross-model comparison (scale-invariant reasoning atoms) |
| 9 | + |
| 10 | +The results feed NARS truth values on causal edges: the first real |
| 11 | +training data for the NARS stack. |
| 12 | + |
| 13 | +## READ FIRST |
| 14 | + |
| 15 | +```bash |
| 16 | +cat src/hpc/gguf_indexer.rs # stream_index_gguf_bf16, classify_tensor |
| 17 | +cat src/hpc/nars.rs # TruthValue, revision, evidence |
| 18 | +cat src/hpc/bgz17_bridge.rs # Base17 type, L1 distance |
| 19 | +cat src/hpc/causality.rs # CausalEdge if it exists |
| 20 | +``` |
| 21 | + |
| 22 | +## PHASE 1: Index All Models (Q8_0, streaming) |
| 23 | + |
| 24 | +Five GGUF files, all single-shard, ~105 GB total: |
| 25 | + |
| 26 | +``` |
| 27 | +unsloth/Qwen3.5-27B-GGUF |
| 28 | + → Qwen3.5-27B-Q8_0.gguf 28.59 GB (base) |
| 29 | + |
| 30 | +Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF |
| 31 | + → Qwen3.5-27B.Q8_0.gguf 28.59 GB (distilled v1) |
| 32 | + |
| 33 | +Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF |
| 34 | + → Qwen3.5-27B.Q8_0.gguf 28.59 GB (distilled v2) |
| 35 | + |
| 36 | +unsloth/Qwen3.5-9B-GGUF |
| 37 | + → Qwen3.5-9B-Q8_0.gguf 9.52 GB (base 9B) |
| 38 | + |
| 39 | +Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-GGUF |
| 40 | + → Qwen3.5-9B.Q8_0.gguf 9.52 GB (distilled 9B) |
| 41 | +``` |
| 42 | + |
| 43 | +Use `stream_index_gguf` (f32 path — Q8_0 needs actual dequantization). |
| 44 | +Output: 5 bgz7 files with per-tensor, per-row Base17 projections. |
| 45 | + |
| 46 | +## PHASE 2: Attention Diff (the reasoning circuit) |
| 47 | + |
| 48 | +For each tensor pair (base vs distilled), matched by name: |
| 49 | + |
| 50 | +```rust |
| 51 | +// Pseudocode — actual implementation in causal_diff.rs |
| 52 | +for (name, base_rows, dist_rows) in matched_tensors(base_bgz7, dist_bgz7) { |
| 53 | + let layer_type = classify_tensor(name); |
| 54 | + |
| 55 | + for (row_idx, (b, d)) in base_rows.zip(dist_rows).enumerate() { |
| 56 | + let distance = b.l1(&d); |
| 57 | + |
| 58 | + if distance > threshold { |
| 59 | + let edge = CausalEdge64 { |
| 60 | + subject: palette_index(b), // base archetype |
| 61 | + verb: BECOMES, // structural transformation |
| 62 | + object: palette_index(d), // distilled archetype |
| 63 | + truth: TruthValue { |
| 64 | + frequency: distance as f32 / max_l1 as f32, |
| 65 | + confidence: 1.0 - 1.0 / (1.0 + row_count as f32), // NARS evidence: grows with rows observed |
| 66 | + }, |
| 67 | + }; |
| 68 | + |
| 69 | + // Tag with attention-specific metadata |
| 70 | + match classify_projection(name) { |
| 71 | + Q => emit_q_edge(edge, layer, head), |
| 72 | + K => emit_k_edge(edge, layer, head), |
| 73 | + V => emit_v_edge(edge, layer, head), |
| 74 | + O => emit_o_edge(edge, layer, head), |
| 75 | + Gate => emit_gate_edge(edge, layer), |
| 76 | + _ => emit_generic(edge), |
| 77 | + } |
| 78 | + } |
| 79 | + } |
| 80 | +} |
| 81 | +``` |
| 82 | + |
| 83 | +### What Each Projection Shift Means |
| 84 | + |
| 85 | +``` |
| 86 | +Q shifted, K stable → model asks NEW questions of SAME information |
| 87 | + = learned to LOOK for reasoning structure |
| 88 | + NARS: high frequency, high confidence |
| 89 | + |
| 90 | +K shifted → model EXPOSES different features to attention |
| 91 | + = deeper change, new token-level signals |
| 92 | + NARS: moderate frequency, lower confidence (rarer) |
| 93 | + |
| 94 | +V shifted → WHAT gets retrieved changed |
| 95 | + = content-level reasoning substrate |
| 96 | + NARS: varies by layer depth |
| 97 | + |
| 98 | +O shifted → HOW multi-head outputs COMBINE |
| 99 | + = synthesis/integration change |
| 100 | + NARS: if high → distillation core is integration |
| 101 | + |
| 102 | +Q+O shift, K stable → REASONING SCAFFOLD CIRCUIT |
| 103 | + = the minimal structural change for reasoning |
| 104 | + These heads ARE the distillation's value |
| 105 | +``` |
| 106 | + |
| 107 | +### Attention Head Clustering |
| 108 | + |
| 109 | +``` |
| 110 | +Cluster 1: Q+O shift, K stable → "reasoning scaffold" heads |
| 111 | +Cluster 2: K+V shift → "representation change" heads |
| 112 | +Cluster 3: all stable → "unchanged capability" heads |
| 113 | +Cluster 4: Q shift only → "query refinement" heads |
| 114 | + |
| 115 | +Each cluster → one Sigma concept node |
| 116 | +Cross-model same cluster → SUPPORTS edge (scale-invariant) |
| 117 | +Cross-model different cluster → CONTRADICTS edge (scale-dependent) |
| 118 | +``` |
| 119 | + |
| 120 | +## PHASE 3: MoE Gate Topology (from Maverick bgz7) |
| 121 | + |
| 122 | +The Maverick bgz7 already has gate projections indexed. |
| 123 | +Extract the gate tensor Base17 patterns separately: |
| 124 | + |
| 125 | +``` |
| 126 | +blk.{N}.ffn_gate_inp → router gate [n_experts, hidden_dim] |
| 127 | + Each ROW = one expert's activation pattern |
| 128 | + Base17 of that row = expert's structural identity |
| 129 | +``` |
| 130 | + |
| 131 | +Expert identity in Base17 space: |
| 132 | +- Experts with similar Base17 → structurally redundant (SUPPORTS) |
| 133 | +- Experts with distant Base17 → specialized (distinct concept nodes) |
| 134 | +- Cluster the 128 expert fingerprints → find natural expert groups |
| 135 | + |
| 136 | +Cross with attention: which attention heads' Q projections align |
| 137 | +with which expert gate patterns? That alignment = the routing circuit. |
| 138 | + |
| 139 | +``` |
| 140 | +head_17_Q_pattern ──CAUSES──→ expert_37_gate_pattern |
| 141 | + (this head's queries activate this expert) |
| 142 | + truth: cosine(head_Q_base17, expert_gate_base17) |
| 143 | +``` |
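
A minimal sketch of the alignment score, under stated assumptions: `Base17` is modeled here as a plain `[i16; 17]` array, which may not match the real type in `bgz17_bridge.rs`, and `cosine_to_frequency` is a hypothetical helper. Cosine similarity ranges over [-1, 1] while a NARS frequency lives in [0, 1], so some mapping is needed before the score can serve as a truth value; the linear rescale below is one possible choice, not something the brief specifies.

```rust
/// Assumed stand-in for the real Base17 type: 17 signed 16-bit components.
pub type Base17 = [i16; 17];

/// L1 (Manhattan) distance between two fingerprints.
pub fn l1(a: &Base17, b: &Base17) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}

/// Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros.
pub fn cosine(a: &Base17, b: &Base17) -> f32 {
    let dot: f64 = a.iter().zip(b.iter()).map(|(&x, &y)| x as f64 * y as f64).sum();
    let norm_a = a.iter().map(|&x| (x as f64).powi(2)).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|&x| (x as f64).powi(2)).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    (dot / (norm_a * norm_b)) as f32
}

/// Assumed mapping from cosine to a NARS frequency: -1 -> 0.0, +1 -> 1.0.
pub fn cosine_to_frequency(c: f32) -> f32 {
    ((c + 1.0) / 2.0).clamp(0.0, 1.0)
}
```

With this mapping, a head whose Q fingerprint is orthogonal to an expert's gate fingerprint gets frequency 0.5, which reads as "no evidence either way" rather than as disagreement.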
| 144 | + |
| 145 | +## PHASE 4: NARS Truth Population |
| 146 | + |
| 147 | +Every edge from phases 2-3 carries a TruthValue: |
| 148 | + |
| 149 | +```rust |
| 150 | +TruthValue { |
| 151 | + frequency: f32, // how often this transformation occurs |
| 152 | + // = proportion of rows in this tensor that shifted |
| 153 | + confidence: f32, // evidence strength |
| 154 | + // = 1 - 1/(1+n) where n = number of observed rows (NARS horizon k = 1) |
| 155 | +} |
| 156 | +``` |
| 157 | + |
| 158 | +NARS revision across models: |
| 159 | +``` |
| 160 | +evidence_27b_v1: (f=0.7, c=0.92) // 70% of Q rows shifted in 27B v1 |
| 161 | +evidence_27b_v2: (f=0.8, c=0.92) // 80% shifted in v2 (more distillation) |
| 162 | +evidence_9b: (f=0.5, c=0.88) // only 50% shifted in 9B (capacity limit) |
| 163 | + |
| 164 | +revised = nars_revision(evidence_27b_v1, evidence_9b) |
| 165 | + → (f=0.62, c=0.95) // integrated belief about reasoning scaffold |
| 166 | +``` |
| 167 | + |
| 168 | +The revised truth tells you: "reasoning scaffold changes affect ~62% of |
| 169 | +Q projection rows, with 95% confidence, scale-dependent (27B > 9B)." |
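
The revision step can be checked with a short sketch of the standard NARS revision rule, assuming an evidential horizon of 1. The `TruthValue` below is a local stand-in that borrows the field names used in this brief; it is not the struct from `src/hpc/nars.rs`, and `from_evidence`, `evidence`, and `revise` are hypothetical method names.

```rust
/// Local stand-in for a NARS truth value (not the type in nars.rs).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TruthValue {
    pub frequency: f32,
    pub confidence: f32,
}

/// Evidential horizon k; 1.0 is the conventional default.
const HORIZON: f32 = 1.0;

impl TruthValue {
    /// Build a truth value from positive and total evidence counts:
    /// f = w+ / w, c = w / (w + k).
    pub fn from_evidence(positive: f32, total: f32) -> Self {
        let frequency = if total > 0.0 { positive / total } else { 0.5 };
        TruthValue {
            frequency,
            confidence: total / (total + HORIZON),
        }
    }

    /// Invert c = w / (w + k) to recover total evidence w.
    /// Requires confidence < 1.0.
    fn evidence(&self) -> f32 {
        HORIZON * self.confidence / (1.0 - self.confidence)
    }

    /// Revision: pool the evidence of two independent observations.
    pub fn revise(&self, other: &TruthValue) -> TruthValue {
        let (w1, w2) = (self.evidence(), other.evidence());
        let positive = self.frequency * w1 + other.frequency * w2;
        TruthValue::from_evidence(positive, w1 + w2)
    }
}
```

Revising `(f=0.7, c=0.92)` with `(f=0.5, c=0.88)` under this rule gives evidence weights of 11.5 and about 7.33, so the result is roughly `(f=0.62, c=0.95)`, matching the example above. The same weights also show that a confidence of 0.92 corresponds to about 11.5 units of evidence, so per-row counts would need scaling or capping before being used as evidence if confidences in that range are the goal.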
| 170 | + |
| 171 | +## PHASE 5: Sigma Concept Nodes (Ada Integration) |
| 172 | + |
| 173 | +Each cluster of edges becomes a concept in Ada's graph: |
| 174 | + |
| 175 | +``` |
| 176 | +Σ.reasoning_scaffold = { |
| 177 | + evidence: [27b_v1_edges, 27b_v2_edges, 9b_edges], |
| 178 | + truth: revised_truth, |
| 179 | + composition: {Q_shift: 0.73, O_shift: 0.82, K_stable: 0.95}, |
| 180 | + heads: [17, 23, 24, 31], // discovered by clustering |
| 181 | + scale_invariant: false, // 9B diverges |
| 182 | + source: "Qwen3.5 → Claude-Opus distillation" |
| 183 | +} |
| 184 | +
|
| 185 | +Σ.expert_redundancy = { |
| 186 | + evidence: [maverick_gate_similarities], |
| 187 | + truth: (f=0.96, c=0.99), // 96% structurally interchangeable |
| 188 | + meaning: "MoE expert weights are commodity, routing is intelligence" |
| 189 | +} |
| 190 | +
|
| 191 | +Σ.reasoning_scaffold ──CAUSES──→ Σ.expert_redundancy |
| 192 | + // reasoning heads SHAPE what the router sees |
| 193 | + // truth: to be discovered by cross-model alignment |
| 194 | +``` |
| 195 | + |
| 196 | +## DELIVERABLES |
| 197 | + |
| 198 | +1. `causal_diff.rs` — load two bgz7 files, emit CausalEdge64 per shifted row |
| 199 | +2. `attention_cluster.rs` — cluster edges by projection type per head |
| 200 | +3. Test: `test_qwen35_reasoning_diff` — run the full 5-model pipeline |
| 201 | +4. Test: `test_maverick_gate_topology` — extract gate patterns from existing bgz7 |
| 202 | +5. Output: `.claude/knowledge/reasoning_reverse_eng_results.md` |
| 203 | + |
| 204 | +## WHY THIS MATTERS |
| 205 | + |
| 206 | +The NARS stack has: |
| 207 | +- TruthValue with frequency + confidence ✓ |
| 208 | +- Revision (evidence integration) ✓ |
| 209 | +- Inference rules ✓ |
| 210 | +- Graph storage ✓ |
| 211 | + |
| 212 | +What it's MISSING: real evidence. Every truth value is currently |
| 213 | +manufactured. This pipeline generates the first OBSERVED truth values |
| 214 | +from actual model weight differences. The NARS stack goes from |
| 215 | +theoretical to empirical in one session. |
| 216 | + |
| 217 | +The thinking orchestration atoms (mcp-orchestrator-vsa) can then |
| 218 | +CONSTRUCT reasoning patterns from the observed evidence: |
| 219 | +"To add structured reasoning to a model, shift Q+O projections |
| 220 | +in heads [17,23,24,31] by palette distance 3-7. Expected improvement: |
| 221 | +f=0.62±0.15 at c=0.95." |
| 222 | + |
| 223 | +That's not prompt engineering. That's weight-space surgery |
| 224 | +informed by causal evidence. Programming AGI by observation. |