# SESSION: Reverse-Engineer Claude 4.6 Opus Reasoning from Qwen3.5 Weight Diffs

## MISSION

Extract the structural geometry of "how Claude thinks" from weight-space
diffs between Qwen3.5 base models and their Claude-4.6-Opus distilled variants.

Five models. Four diffs. One question:
**What did the Claude reasoning distillation change in the attention heads?**

The answer populates the NARS stack with its first OBSERVED truth values.

## THE HYPOTHESIS

Claude-style structured reasoning lives in the attention routing:
- Q projections shifted → the model asks DIFFERENT questions (planning)
- O projections shifted → the model SYNTHESIZES answers differently (integration)
- K projections stable → the information landscape didn't need to change
- V projections variable → retrieval content shifted in some layers

Blocks where Q+O shifted but K stayed stable = the REASONING SCAFFOLD CIRCUIT.
If the hypothesis holds, these are the heads where the LoRA distillation
injected "Let me analyze this carefully: 1... 2... 3...".
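
The scaffold criterion above reduces to a per-block predicate. Below is a minimal sketch. It assumes each block has already been summarized as the fraction of rows that shifted in each projection. The `BlockShift` struct and the 0.3 / 0.1 cut-offs are illustrative assumptions, not the logic inside `find_reasoning_scaffold`.

```rust
/// Hypothetical per-block summary: fraction of rows (0.0..=1.0) whose L1
/// distance exceeded the diff threshold, per attention projection.
/// V is omitted on purpose: the hypothesis expects it to vary either way.
#[derive(Debug, Clone, Copy)]
struct BlockShift {
    block: u32,
    q: f32,
    k: f32,
    o: f32,
}

/// Scaffold pattern: Q and O shifted, K stable.
fn is_scaffold(b: &BlockShift, shifted_min: f32, stable_max: f32) -> bool {
    b.q >= shifted_min && b.o >= shifted_min && b.k <= stable_max
}

/// Block indices matching the scaffold pattern, in input order.
fn scaffold_blocks(blocks: &[BlockShift], shifted_min: f32, stable_max: f32) -> Vec<u32> {
    blocks
        .iter()
        .filter(|b| is_scaffold(b, shifted_min, stable_max))
        .map(|b| b.block)
        .collect()
}

fn main() {
    // Placeholder fractions, not measurements.
    let blocks = [
        BlockShift { block: 3, q: 0.45, k: 0.05, o: 0.40 }, // Q+O moved, K held
        BlockShift { block: 7, q: 0.50, k: 0.35, o: 0.60 }, // K moved too
        BlockShift { block: 9, q: 0.10, k: 0.02, o: 0.05 }, // nothing moved
    ];
    println!("scaffold blocks: {:?}", scaffold_blocks(&blocks, 0.3, 0.1));
}
```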

## READ FIRST

```bash
# The tools are already on master:
cat src/hpc/safetensors.rs   # read_safetensors_header, stream_index_safetensors_bf16
cat src/hpc/gguf_indexer.rs  # stream_index_gguf_bf16_with_header (shared core)
                             # CompressedTensor::read_from, read_bgz7_file
cat src/hpc/causal_diff.rs   # causal_diff, classify_projection, find_reasoning_scaffold
                             # cluster_by_head, revise_across_diffs
                             # extract_gate_topology, cluster_experts (for MoE if present)
cat src/hpc/nars.rs          # NarsTruth, from_evidence, revision
```
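
`read_safetensors_header` parses the safetensors container prefix. The format is documented:
- The file starts with an 8-byte little-endian `u64` giving the header length.
- That many bytes of UTF-8 JSON follow. The JSON maps each tensor name to its `dtype`, `shape` and `data_offsets`.

Below is a minimal std-only sketch that splits off that prefix. It is not the code in `safetensors.rs`, and it returns the JSON unparsed.

```rust
use std::io::{self, Read};

/// Reads the safetensors prefix: an 8-byte LE u64 length, then that many JSON bytes.
/// Returns (header_json, data_start). The `data_offsets` inside the JSON are
/// relative to data_start.
fn read_header<R: Read>(mut r: R) -> io::Result<(String, u64)> {
    let mut len_buf = [0u8; 8];
    r.read_exact(&mut len_buf)?;
    let n = u64::from_le_bytes(len_buf);
    let mut json = vec![0u8; n as usize];
    r.read_exact(&mut json)?;
    let text = String::from_utf8(json)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok((text, 8 + n))
}

fn main() -> io::Result<()> {
    // Build a tiny in-memory file: length prefix, JSON header, two BF16 values.
    let json = br#"{"w":{"dtype":"BF16","shape":[2],"data_offsets":[0,4]}}"#;
    let mut file = (json.len() as u64).to_le_bytes().to_vec();
    file.extend_from_slice(json);
    file.extend_from_slice(&[0x80, 0x3F, 0x00, 0x40]); // BF16 1.0 and 2.0, little-endian
    let (header, data_start) = read_header(&file[..])?;
    println!("{header}\ndata starts at byte {data_start}");
    Ok(())
}
```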

## MODEL MAP (all ungated, all safetensors BF16)

```
┌────────────────────────────────────────────────────────────────────┐
│  27B SCALE                                                         │
│                                                                    │
│  Qwen/Qwen3.5-27B (base)                     11 shards   ~55 GB    │
│    │                                                               │
│    ├──→ Jackrong/...-Distilled (v1)          11 shards   ~55 GB    │
│    │                                                               │
│    └──→ Jackrong/...-Distilled-v2 (v2)       11 shards   ~55 GB    │
│                                                                    │
├────────────────────────────────────────────────────────────────────┤
│  9B SCALE                                                          │
│                                                                    │
│  Qwen/Qwen3.5-9B (base)                       4 shards   ~18 GB    │
│    │                                                               │
│    └──→ Jackrong/...-9B-...-Distilled         4 shards   ~18 GB    │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

Total to stream: ~201 GB (safetensors, full BF16 precision)
```

## FOUR DIFFS: WHAT EACH REVEALS

```
Diff 1: base 27B → distilled v1
  "What does Claude-style reasoning look like in weight space?"
  THE primary signal. Controlled: same arch, one variable (LoRA).

Diff 2: base 27B → distilled v2
  "Did the second distillation round change the SAME heads?"
  If the same heads shifted MORE → the distiller was refining, not exploring.
  If DIFFERENT heads shifted → v2 found a new circuit.

Diff 3: distilled v1 → distilled v2
  "What's the iteration delta?"
  Heads that shifted v1→v2 = the optimizer was still working on these.
  Heads that REVERTED v1→v2 = overcorrections in v1
    (shifted in Diffs 1 and 3 but not in Diff 2, i.e. v2 is back near base).
  Heads shifted in Diff 1 but stable v1→v2 = converged reasoning structure.

Diff 4: base 9B → distilled 9B
  "Does the same reasoning scaffold exist at smaller scale?"
  Same blocks shifted in both 27B and 9B → SCALE-INVARIANT circuit.
  Only in 27B → capacity-dependent (9B can't represent it).
  Only in 9B → different circuit at smaller scale.
```
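
Diff 3 alone cannot separate refinement from reversion. Each 27B block's label depends on all three 27B diffs together. Below is a minimal sketch of that lookup. It assumes each diff has already been reduced to a per-block "crossed the shift threshold" flag. The `Iteration` enum and `classify` function are hypothetical names. Threshold noise can produce flag combinations that are geometrically odd, so the match assigns a label to every combination.

```rust
/// Hypothetical per-block label derived from the three 27B diffs.
#[derive(Debug, PartialEq, Eq)]
enum Iteration {
    Untouched,   // moved in neither v1 nor v2
    Converged,   // moved in v1 and v2, unchanged v1→v2
    StillMoving, // moved in v1 and v2, and again between them
    Reverted,    // moved in v1, but v2 is back near base
    NewInV2,     // only v2 moved away from base
}

/// Each flag: did this block cross the shift threshold in that diff?
fn classify(base_v1: bool, base_v2: bool, v1_v2: bool) -> Iteration {
    match (base_v1, base_v2, v1_v2) {
        (false, false, _) => Iteration::Untouched,
        (true, true, false) => Iteration::Converged,
        (true, true, true) => Iteration::StillMoving,
        (true, false, _) => Iteration::Reverted,
        (false, true, _) => Iteration::NewInV2,
    }
}

fn main() {
    // (Diff 1, Diff 2, Diff 3) flags for a few illustrative blocks.
    for flags in [(true, true, false), (true, false, true), (false, true, true)] {
        println!("{:?} -> {:?}", flags, classify(flags.0, flags.1, flags.2));
    }
}
```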

## PHASE 1: Index All 5 Models (~201 GB, ~4 hours)

Use the safetensors BF16 path (NOT GGUF Q8_0). BF16 gives cleaner fingerprints
for causal diffing: no Q8_0 quantization noise sits between the source weights and their projection.
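
For reference, BF16 is the upper 16 bits of an IEEE-754 `f32`, so widening it is exact: shift the bits left by 16. GGUF Q8_0, by contrast, stores blocks of 32 int8 values that share one scale, so every weight already carries rounding error before fingerprinting. Below is a minimal std-only sketch of decoding a little-endian BF16 tensor slice. How `stream_index_safetensors_bf16` decodes internally is not shown in this document and may differ.

```rust
/// Widen one BF16 value (raw bits) to f32. Exact: BF16 is the top half of an f32.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a little-endian BF16 byte buffer (as stored in safetensors) into f32s.
/// A trailing odd byte, if any, is ignored.
fn decode_bf16_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|c| bf16_to_f32(u16::from_le_bytes([c[0], c[1]])))
        .collect()
}

fn main() {
    // 0x3F80 = 1.0, 0xC000 = -2.0, 0x3EAB ≈ 0.333984
    let raw = [0x80, 0x3F, 0x00, 0xC0, 0xAB, 0x3E];
    println!("{:?}", decode_bf16_le(&raw));
}
```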

### Model index table

```
ID  Repo                                                           Shards Out prefix
─── ────────────────────────────────────────────────────────────── ────── ───────────────
A   Qwen/Qwen3.5-27B                                               11     qwen35_27b_base
B   Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled       11     qwen35_27b_v1
C   Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2    11     qwen35_27b_v2
D   Qwen/Qwen3.5-9B                                                4      qwen35_9b_base
E   Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled        4      qwen35_9b_dist
```

For each model, index every shard with:
```rust
// octave_stride = 16: strided + halftone, same as the Maverick pipeline
stream_index_safetensors_bf16(reader, writer, 16, callback)
```

Output: `/tmp/{prefix}_shard{NN}.bgz7`, one file per shard.

The `index_safetensors_shards()` helper in safetensors.rs handles this. It sends an
HTTP HEAD request to get each shard's size, streams the shard through
`HttpRangeReader` in 256 MB chunks, and skips any shard whose output already exists.
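
Below is a minimal sketch of the chunking and skip logic described above. It rests on three unverified assumptions (check the repos and `safetensors.rs`):
- Shards follow the usual Hugging Face naming, e.g. `model-00001-of-00011.safetensors`.
- `{NN}` in the output path is a zero-padded, 1-based shard number.
- "256 MB" means 256 MiB.

The sketch only computes URLs, output paths, and inclusive HTTP `Range` header values. It does no network I/O.

```rust
use std::path::Path;

const CHUNK: u64 = 256 * 1024 * 1024; // assumed 256 MiB per range request

/// Inclusive byte ranges covering `total` bytes, as sent in `Range: bytes=a-b`.
fn range_headers(total: u64, chunk: u64) -> Vec<String> {
    (0..total)
        .step_by(chunk as usize)
        .map(|start| format!("bytes={}-{}", start, (start + chunk).min(total) - 1))
        .collect()
}

/// Hypothetical shard URL, assuming standard HF sharded-safetensors naming.
fn shard_url(repo: &str, i: usize, n: usize) -> String {
    format!("https://huggingface.co/{repo}/resolve/main/model-{i:05}-of-{n:05}.safetensors")
}

/// Output path from the template /tmp/{prefix}_shard{NN}.bgz7 (padding assumed).
fn out_path(prefix: &str, i: usize) -> String {
    format!("/tmp/{prefix}_shard{i:02}.bgz7")
}

fn main() {
    let (repo, prefix, shards) = ("Qwen/Qwen3.5-9B", "qwen35_9b_base", 4);
    for i in 1..=shards {
        let out = out_path(prefix, i);
        if Path::new(&out).exists() {
            println!("skip {out} (already indexed)");
            continue;
        }
        println!("{} -> {}", shard_url(repo, i, shards), out);
    }
    // A 600 MiB shard needs three requests: two full chunks and an 88 MiB tail.
    println!("{:?}", range_headers(600 * 1024 * 1024, CHUNK));
}
```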

### Run order

Models A and D can run in parallel (different sizes, no conflict).
Models B, C, E were planned to run after their base so that skip-if-exists could
reuse shared tensors. In practice, though, each model has its own weights and its own
output prefix, so the order does not affect the results.

```bash
# Index all 5 (the test function does this):
cargo test test_full_reasoning_reverse_eng --release -- --ignored --nocapture
```

BUT: that test uses Q8_0 GGUF (28.59 GB each). For BF16 safetensors:

```bash
# Either modify the test to use safetensors, or run per model:
cargo test test_stream_index_qwen35_safetensors --release -- --ignored --nocapture
# Then repeat with a different repo/prefix for each model
```

## PHASE 2: Causal Diff (seconds, reads bgz7 files)

Once all 5 models are indexed, run the 4 diffs:

```rust
use crate::hpc::causal_diff::{causal_diff, print_diff_summary, find_reasoning_scaffold,
                              cluster_by_head, revise_across_diffs};

let threshold = 100; // L1 distance; tune based on results

// Phase 1 writes one file per shard: /tmp/{prefix}_shard{NN}.bgz7.
// One shard pair is shown here (NN = 01 assumed); repeat per shard and aggregate (see NOTE).
let shard = |prefix: &str| format!("/tmp/{prefix}_shard01.bgz7");

// Diff 1: base 27B → v1
let (edges_1, stats_1) = causal_diff(&shard("qwen35_27b_base"), &shard("qwen35_27b_v1"), threshold)?;
print_diff_summary("27B: base → v1", &stats_1, edges_1.len());

// Diff 2: base 27B → v2
let (edges_2, stats_2) = causal_diff(&shard("qwen35_27b_base"), &shard("qwen35_27b_v2"), threshold)?;

// Diff 3: v1 → v2
let (edges_3, stats_3) = causal_diff(&shard("qwen35_27b_v1"), &shard("qwen35_27b_v2"), threshold)?;

// Diff 4: base 9B → distilled 9B
let (edges_4, stats_4) = causal_diff(&shard("qwen35_9b_base"), &shard("qwen35_9b_dist"), threshold)?;
```

NOTE: shards need matching. Base shard 1 diffs against distilled shard 1.
The tensor names must match across models (same arch = same names).
Matching by shard index only works if both models split tensors across shards
identically; if the shard boundaries differ, match tensors by name instead.
Run causal_diff per shard pair, then aggregate edges.
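
Below is a minimal sketch of the per-shard loop, including the tensor-name check from the NOTE. It uses stand-in types. `Edge`, `Shard` and `diff_shard` are hypothetical placeholders for whatever `causal_diff` reads from the bgz7 files, and fingerprint rows are modeled as plain `Vec<u8>`. Aggregation here is simple concatenation of edges.

```rust
use std::collections::BTreeSet;

/// Hypothetical edge: one row of one tensor whose L1 distance exceeded the threshold.
#[derive(Debug, Clone)]
struct Edge {
    tensor: String,
    row: usize,
    l1: u32,
}

/// Stand-in for one shard's fingerprints: (tensor name, per-row fingerprint bytes).
type Shard = Vec<(String, Vec<Vec<u8>>)>;

fn l1(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(x, y)| x.abs_diff(*y) as u32).sum()
}

/// Diff one shard pair; refuses to proceed if the tensor names differ.
fn diff_shard(a: &Shard, b: &Shard, threshold: u32) -> Result<Vec<Edge>, String> {
    let names_a: BTreeSet<&str> = a.iter().map(|(n, _)| n.as_str()).collect();
    let names_b: BTreeSet<&str> = b.iter().map(|(n, _)| n.as_str()).collect();
    if names_a != names_b {
        let diff: Vec<_> = names_a.symmetric_difference(&names_b).collect();
        return Err(format!("tensor name mismatch: {diff:?}"));
    }
    let mut edges = Vec::new();
    for (name, rows_a) in a {
        // Safe: the name sets are equal, so the lookup always succeeds.
        let rows_b = &b.iter().find(|(n, _)| n == name).unwrap().1;
        for (row, (ra, rb)) in rows_a.iter().zip(rows_b).enumerate() {
            let d = l1(ra, rb);
            if d > threshold {
                edges.push(Edge { tensor: name.clone(), row, l1: d });
            }
        }
    }
    Ok(edges)
}

fn main() -> Result<(), String> {
    // Placeholder fingerprints for a single shard pair.
    let base: Vec<Shard> = vec![vec![("blk.0.q".into(), vec![vec![10, 10], vec![5, 5]])]];
    let dist: Vec<Shard> = vec![vec![("blk.0.q".into(), vec![vec![200, 10], vec![5, 6]])]];
    let mut all = Vec::new();
    for (i, (a, b)) in base.iter().zip(&dist).enumerate() {
        let edges = diff_shard(a, b, 100)?;
        println!("shard {}: {} edges", i + 1, edges.len());
        all.extend(edges);
    }
    println!("{all:?}");
    Ok(())
}
```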

## PHASE 3: Find Reasoning Scaffold

```rust
// Which blocks have Q+O shifted but K stable?
let scaffold_27b_v1 = find_reasoning_scaffold(&edges_1, 0.3);
let scaffold_27b_v2 = find_reasoning_scaffold(&edges_2, 0.3);
let scaffold_9b = find_reasoning_scaffold(&edges_4, 0.3);

// Scale-invariant blocks: present in BOTH 27B and 9B.
// Caveat: raw block indices are only comparable if both models have the same depth.
let scale_invariant: Vec<u32> = scaffold_27b_v1.iter()
    .filter(|b| scaffold_9b.contains(b))
    .copied()
    .collect();

// 27B-only blocks: capacity-dependent reasoning (same depth caveat)
let capacity_dependent: Vec<u32> = scaffold_27b_v1.iter()
    .filter(|b| !scaffold_9b.contains(b))
    .copied()
    .collect();

// v1/v2 convergence: blocks in both the v1 and v2 scaffolds
let converged: Vec<u32> = scaffold_27b_v1.iter()
    .filter(|b| scaffold_27b_v2.contains(b))
    .copied()
    .collect();
```
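
If the 27B and 9B models have different layer counts (check `num_hidden_layers` in each model's `config.json`), block N in one model is not block N in the other. Below is a minimal sketch that matches blocks by relative depth instead. The layer counts (64 and 32) and the 0.03 tolerance are placeholder assumptions, not Qwen3.5 values.

```rust
/// Relative depth of a block in [0, 1]: 0 = first block, 1 = last block.
fn rel_depth(block: u32, num_layers: u32) -> f64 {
    if num_layers <= 1 {
        0.0
    } else {
        block as f64 / (num_layers - 1) as f64
    }
}

/// Blocks of the larger model that have some block of the smaller model
/// within `tol` relative depth.
fn match_by_depth(large: &[u32], n_large: u32, small: &[u32], n_small: u32, tol: f64) -> Vec<u32> {
    large
        .iter()
        .copied()
        .filter(|&b| {
            let d = rel_depth(b, n_large);
            small.iter().any(|&s| (rel_depth(s, n_small) - d).abs() <= tol)
        })
        .collect()
}

fn main() {
    // Placeholder scaffolds and depths (hypothetical 64-block vs 32-block models).
    let scaffold_large = [10, 31, 50];
    let scaffold_small = [5, 25];
    let inv = match_by_depth(&scaffold_large, 64, &scaffold_small, 32, 0.03);
    println!("scale-invariant (by relative depth): {inv:?}");
}
```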

## PHASE 4: NARS Revision (Integrated Evidence)

```rust
let all_stats = vec![
    ("27B base→v1", &stats_1),
    ("27B base→v2", &stats_2),
    ("27B v1→v2", &stats_3),
    ("9B base→dist", &stats_4),
];

let revised = revise_across_diffs(&all_stats);

// Per projection type: integrated NARS truth across all model pairs.
// Band cut-offs (0.5 / 0.25) are illustrative; tune them alongside the L1 threshold.
for (proj, truth) in &revised {
    let label = if truth.frequency > 0.5 {
        "SHIFTED"
    } else if truth.frequency > 0.25 {
        "PARTIAL"
    } else {
        "STABLE"
    };
    eprintln!("  {:<12} → f={:.3} c={:.3} ({})",
              proj, truth.frequency, truth.confidence, label);
}
```

NARS revision treats its inputs as independent evidence. Diff 3 measures iteration
rather than distillation, so pooling it mixes two questions; also check the result
with Diffs 1, 2 and 4 only. Diffs 1 and 2 share the same base model, so even those
are not fully independent.

Hypothesized output (illustrative values from the hypothesis, not measurements):
```
  Q            → f=0.720 c=0.970 (SHIFTED)  ← queries changed: planning
  K            → f=0.150 c=0.960 (STABLE)   ← keys preserved: same information
  V            → f=0.450 c=0.950 (PARTIAL)  ← retrieval partially changed
  O            → f=0.680 c=0.970 (SHIFTED)  ← synthesis changed: integration
  FfnGate      → f=0.300 c=0.960 (PARTIAL)  ← some FFN rewiring
  Embedding    → f=0.080 c=0.920 (STABLE)   ← vocabulary unchanged
```
No MoE `Gate` row is expected: Qwen3.5 is dense, so there is no router gate to diff.
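
For reference, here are the standard NAL evidence-based truth functions, written as a minimal sketch with evidential horizon k = 1:
- f = w⁺ / w
- c = w / (w + k)
- Revision adds the evidence weights of two independent judgments.

This is the textbook formulation, not necessarily what `nars.rs` implements. Its `from_evidence` and `revision` may use a different k or weighting.

```rust
/// NAL truth value: frequency f in [0, 1], confidence c in [0, 1).
#[derive(Debug, Clone, Copy)]
struct Truth {
    f: f64,
    c: f64,
}

const K: f64 = 1.0; // evidential horizon (assumed; check nars.rs)

/// From evidence counts: `positive` shifted rows out of `total` observed rows.
fn from_evidence(positive: f64, total: f64) -> Truth {
    Truth { f: positive / total, c: total / (total + K) }
}

/// Revision: convert both truths to evidence weights, add them, convert back.
fn revise(a: Truth, b: Truth) -> Truth {
    let weight = |t: Truth| K * t.c / (1.0 - t.c); // total evidence w
    let (wa, wb) = (weight(a), weight(b));
    let (pos, total) = (a.f * wa + b.f * wb, wa + wb);
    Truth { f: pos / total, c: total / (total + K) }
}

fn main() {
    // Illustrative counts, not measurements.
    let d1 = from_evidence(72.0, 100.0);
    let d2 = from_evidence(30.0, 50.0);
    println!("{:?} + {:?} = {:?}", d1, d2, revise(d1, d2));
}
```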

## PHASE 5: Attention Head Cluster Analysis

```rust
let clusters = cluster_by_head(&edges_1);

// Sort by shift intensity (mean L1, descending). total_cmp avoids the panic
// that partial_cmp(..).unwrap() would hit on a NaN.
let mut sorted: Vec<_> = clusters.into_iter().collect();
sorted.sort_by(|a, b| b.1.2.total_cmp(&a.1.2));

eprintln!("Top 10 most-shifted attention components:");
for ((block, proj), (count, max_row, mean_l1)) in sorted.iter().take(10) {
    eprintln!("  Block {:>2} {:>5}: {}/{} shifted, mean_L1={:.0}",
              block, proj, count, max_row, mean_l1);
}
```

This identifies the SPECIFIC (block, projection) pairs where reasoning was injected.
Resolving individual heads inside a block requires mapping shifted rows to head indices.
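
Below is a minimal sketch of that row-to-head mapping. It assumes the usual PyTorch `nn.Linear` layout, where a Q projection weight has shape `[num_heads * head_dim, hidden]`, so row r belongs to head `r / head_dim`. Under grouped-query attention, K and V use `num_key_value_heads` instead. For O, the head axis is the input (column) dimension, so row-level edges on O cannot be attributed to heads this way. Some attention variants also pack extra outputs into `q_proj`, which changes the mapping, so verify it against the model's attention code. `head_dim` and the row list below are placeholders.

```rust
use std::collections::BTreeMap;

/// Count shifted output rows per head for a Q/K/V projection.
/// Assumes weight shape [n_heads * head_dim, hidden] with each head's rows contiguous.
fn shifted_rows_per_head(rows: &[usize], head_dim: usize) -> BTreeMap<usize, usize> {
    let mut per_head = BTreeMap::new();
    for &r in rows {
        *per_head.entry(r / head_dim).or_insert(0) += 1;
    }
    per_head
}

fn main() {
    let head_dim = 128; // placeholder; read head_dim from config.json
    let shifted = [0, 5, 127, 128, 300, 301, 1000]; // placeholder row indices
    for (head, n) in shifted_rows_per_head(&shifted, head_dim) {
        println!("head {head:>2}: {n}/{head_dim} rows shifted");
    }
}
```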

## PHASE 6: Write Results

Output to the knowledge base: `.claude/knowledge/reasoning_reverse_eng_results.md`

Contents:
- Scaffold blocks per model (27B v1, 27B v2, 9B)
- Scale-invariant vs capacity-dependent blocks
- NARS revised truth per projection type
- Top shifted heads with L1 magnitudes
- v1→v2 convergence analysis
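
Below is a minimal sketch of rendering those sections to Markdown. It assumes the scaffold lists are `Vec<u32>` block indices, as in Phase 3. The `Report` struct and its fields are hypothetical. Writing to disk is left to `std::fs::write` on the path above.

```rust
use std::fmt::Write;

/// Hypothetical container for the outputs of Phases 3 to 5 (field names are illustrative).
struct Report {
    scaffolds: Vec<(&'static str, Vec<u32>)>, // (model label, scaffold blocks)
    scale_invariant: Vec<u32>,
    capacity_dependent: Vec<u32>,
    truths: Vec<(&'static str, f64, f64)>, // (projection, frequency, confidence)
    top_heads: Vec<(u32, &'static str, f64)>, // (block, projection, mean L1)
    converged: Vec<u32>,
}

/// Render the report sections as Markdown, in the order listed above.
fn render(r: &Report) -> String {
    let mut md = String::from("# Reasoning reverse-engineering results\n\n## Scaffold blocks\n");
    for (label, blocks) in &r.scaffolds {
        writeln!(md, "- {label}: {blocks:?}").unwrap();
    }
    writeln!(md, "\n## Scale\n- Scale-invariant: {:?}\n- Capacity-dependent: {:?}",
             r.scale_invariant, r.capacity_dependent).unwrap();
    writeln!(md, "\n## NARS truth per projection").unwrap();
    for (proj, f, c) in &r.truths {
        writeln!(md, "- {proj}: f={f:.3} c={c:.3}").unwrap();
    }
    writeln!(md, "\n## Top shifted components").unwrap();
    for (block, proj, l1) in &r.top_heads {
        writeln!(md, "- Block {block} {proj}: mean_L1={l1:.0}").unwrap();
    }
    writeln!(md, "\n## v1→v2 convergence\n- Converged blocks: {:?}", r.converged).unwrap();
    md
}

fn main() {
    // Placeholder values only; real values come from Phases 3 to 5.
    let r = Report {
        scaffolds: vec![("27B v1", vec![3, 7]), ("27B v2", vec![3]), ("9B", vec![3])],
        scale_invariant: vec![3],
        capacity_dependent: vec![7],
        truths: vec![("Q", 0.5, 0.9)],
        top_heads: vec![(3, "Q", 412.0)],
        converged: vec![3],
    };
    print!("{}", render(&r));
    // To persist: std::fs::write(".claude/knowledge/reasoning_reverse_eng_results.md", render(&r))
}
```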

## WHAT THE RESULTS MEAN

### For the NARS stack
First OBSERVED truth values. Every TruthValue in the system so far was
manufactured; these are measured from actual weight transformations.
The stack goes from theoretical to empirical.

### For the reasoning orchestrator
If heads [N, M, P] form the scaffold, the orchestrator now knows:
"To add structured reasoning to a model, these attention heads must shift."
That's a structural recipe, not a training recipe.

### For Ada
The reasoning scaffold IS a concept node in the Sigma graph:
```
Σ.claude_reasoning_scaffold = {
  heads: [discovered blocks],
  pattern: Q_shift + O_shift + K_stable,
  truth: revised(all_diffs),
  scale_invariant: [subset],
  source: "Qwen3.5 → Claude-4.6-Opus distillation"
}
```

### Cross-reference with Maverick (future)
Maverick's gate topology (expert routing) + Qwen's attention scaffold
(token routing) = the complete picture of "reasoning = routing" at both
MoE and attention granularity.

## CRITICAL CONSTRAINTS

1. Use the SAFETENSORS path (BF16 precision), NOT GGUF Q8_0.
2. Match shards by index when diffing (shard 1 vs shard 1).
3. Tensor names must match across models; verify with the first shard.
4. threshold=100 is a starting point; tune it from the observed L1 distribution.
5. Qwen3.5 is DENSE (no MoE), so MoE router Gate projections won't appear.
   All signal is in attention Q/K/V/O and FFN gate/up/down.
6. Do NOT modify existing production code; only add test functions.
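
One way to act on constraint 4: collect the per-row L1 distances from a first diff, then set the threshold at a high percentile so that only the tail counts as "shifted". Below is a minimal nearest-rank percentile sketch. The 95th percentile is an arbitrary illustrative choice, and this is not how `causal_diff` picks its threshold.

```rust
/// Nearest-rank percentile (p in (0, 100]) of a set of L1 distances.
fn percentile(values: &[u32], p: f64) -> Option<u32> {
    if values.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut v = values.to_vec();
    v.sort_unstable();
    let rank = (p / 100.0 * v.len() as f64).ceil() as usize; // 1-based rank
    Some(v[rank - 1])
}

fn main() {
    // Placeholder distances; in practice gather them from one shard pair's rows.
    let l1: Vec<u32> = (1..=200).collect();
    let t = percentile(&l1, 95.0).unwrap();
    let shifted = l1.iter().filter(|&&d| d > t).count();
    println!("threshold = {t}, rows above it = {shifted}");
}
```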

## RUN COMMANDS

```bash
# Step 1: Index all 5 models (parallelizable across machines)
cargo test test_index_qwen35_27b_base --release -- --ignored --nocapture
cargo test test_index_qwen35_27b_v1   --release -- --ignored --nocapture
cargo test test_index_qwen35_27b_v2   --release -- --ignored --nocapture
cargo test test_index_qwen35_9b_base  --release -- --ignored --nocapture
cargo test test_index_qwen35_9b_dist  --release -- --ignored --nocapture

# Step 2: Run all diffs + NARS revision + scaffold detection
cargo test test_qwen35_claude_reasoning_diff --release -- --ignored --nocapture

# Step 3: Write results
# (integrated into the Step 2 test function)
```

Expected total time: ~4 hours indexing + seconds diffing.
Expected total output: ~50 MB of bgz7 files + ~100 KB of diff results.