Commit a7b1a23

session: reverse-engineer Claude 4.6 Opus reasoning from Qwen3.5 weight diffs (BF16 safetensors)
5 models, 4 diffs, NARS revision, reasoning scaffold detection. Uses safetensors BF16 path for clean fingerprints. All models ungated. ~201 GB total.
1 parent a1d5049 commit a7b1a23

1 file changed

Lines changed: 306 additions & 0 deletions

@@ -0,0 +1,306 @@
# SESSION: Reverse-Engineer Claude 4.6 Opus Reasoning from Qwen3.5 Weight Diffs

## MISSION

Extract the structural geometry of "how Claude thinks" from weight-space
diffs between Qwen3.5 base models and their Claude-4.6-Opus distilled variants.

Five models. Four diffs. One question:
**What did the Claude reasoning distillation change in the attention heads?**

The answer populates the NARS stack with its first OBSERVED truth values.

## THE HYPOTHESIS

Claude-style structured reasoning lives in the attention routing:
- Q projections shifted → the model asks DIFFERENT questions (planning)
- O projections shifted → the model SYNTHESIZES answers differently (integration)
- K projections stable → the information landscape didn't need to change
- V projections variable → retrieval content shifted in some layers

Blocks where Q+O shifted but K stayed stable = the REASONING SCAFFOLD CIRCUIT.
These heads are where "Let me analyze this carefully: 1... 2... 3..."
was injected by the LoRA distillation.

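The Q+O-shifted, K-stable pattern can be written as a simple predicate over per-block shift fractions. This is a minimal sketch, not the implementation of `find_reasoning_scaffold`: the `BlockShift` type, the `is_scaffold` and `scaffold_blocks` names, and the reading of the cutoff as "fraction of shifted rows" are all assumptions made for illustration.

```rust
/// Per-block shift fractions (0.0..=1.0): the share of rows in each
/// attention projection whose L1 distance exceeded the diff threshold.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct BlockShift {
    pub block: u32,
    pub q: f32,
    pub k: f32,
    pub v: f32,
    pub o: f32,
}

/// A block matches the hypothesis when Q and O both shifted in at least
/// `min_shift` of their rows while K stayed below that fraction.
/// V is ignored on purpose: the hypothesis treats it as variable.
pub fn is_scaffold(b: &BlockShift, min_shift: f32) -> bool {
    b.q >= min_shift && b.o >= min_shift && b.k < min_shift
}

/// Collect the indices of all blocks that match the scaffold pattern.
pub fn scaffold_blocks(shifts: &[BlockShift], min_shift: f32) -> Vec<u32> {
    shifts
        .iter()
        .filter(|b| is_scaffold(b, min_shift))
        .map(|b| b.block)
        .collect()
}
```

With a cutoff of 0.3, a block with Q=0.5, K=0.1, O=0.6 qualifies, while a block whose K also moved (K=0.5) does not.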
## READ FIRST

```bash
# The tools are already on master:
cat src/hpc/safetensors.rs    # read_safetensors_header, stream_index_safetensors_bf16
cat src/hpc/gguf_indexer.rs   # stream_index_gguf_bf16_with_header (shared core)
                              # CompressedTensor::read_from, read_bgz7_file
cat src/hpc/causal_diff.rs    # causal_diff, classify_projection, find_reasoning_scaffold
                              # cluster_by_head, revise_across_diffs
                              # extract_gate_topology, cluster_experts (for MoE if present)
cat src/hpc/nars.rs           # NarsTruth, from_evidence, revision
```

## MODEL MAP (all ungated, all safetensors BF16)

```
┌───────────────────────────────────────────────────────────────────┐
│  27B SCALE                                                        │
│                                                                   │
│  Qwen/Qwen3.5-27B (base)                   11 shards   ~55 GB     │
│    │                                                              │
│    ├──→ Jackrong/...-Distilled (v1)        11 shards   ~55 GB     │
│    │                                                              │
│    └──→ Jackrong/...-Distilled-v2 (v2)     11 shards   ~55 GB     │
│                                                                   │
├───────────────────────────────────────────────────────────────────┤
│  9B SCALE                                                         │
│                                                                   │
│  Qwen/Qwen3.5-9B (base)                     4 shards   ~18 GB     │
│    │                                                              │
│    └──→ Jackrong/...-9B-...-Distilled       4 shards   ~18 GB     │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘

Total to stream: ~201 GB (safetensors, full BF16 precision)
```

## FOUR DIFFS: WHAT EACH REVEALS

```
Diff 1: base 27B → distilled v1
  "What does Claude-style reasoning look like in weight space?"
  THE primary signal. Controlled: same arch, one variable (LoRA).

Diff 2: base 27B → distilled v2
  "Did the second distillation round change the SAME heads?"
  If same heads shifted MORE → distiller was refining, not exploring.
  If DIFFERENT heads shifted → v2 found a new circuit.

Diff 3: distilled v1 → distilled v2
  "What's the iteration delta?"
  Heads that shifted v1→v2 = the optimizer was still working on these.
  Heads that REVERTED v1→v2 = overcorrections in v1.
  Heads stable v1→v2 = converged reasoning structure.

Diff 4: base 9B → distilled 9B
  "Does the same reasoning scaffold exist at smaller scale?"
  Same blocks shifted in both 27B and 9B → SCALE-INVARIANT circuit.
  Only in 27B → capacity-dependent (9B can't represent it).
  Only in 9B → different circuit at smaller scale.
```

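The three outcomes listed under Diff 3 can be told apart by combining the three 27B distances for a given head or row. The sketch below is one possible decision rule, not something defined in `causal_diff.rs`: the `Iteration` enum, the function name, and in particular the "closer to base than v1 was" test for a revert are assumptions.

```rust
/// Possible fates of a component across the two distillation rounds.
/// Hypothetical classification, for illustration only.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Iteration {
    /// Never moved past the threshold in v1 and did not move in v1→v2.
    Untouched,
    /// Moved in v1 and stayed put in v1→v2: converged structure.
    Converged,
    /// Moved in v1→v2 and ended closer to base than v1 was: overcorrection in v1.
    Reverted,
    /// Moved in v1→v2 and ended at least as far from base: still being optimized.
    StillMoving,
}

/// Classify one component from its three L1 distances.
/// `base_v1`, `base_v2`, `v1_v2` correspond to Diff 1, Diff 2 and Diff 3.
pub fn classify_iteration(base_v1: u32, base_v2: u32, v1_v2: u32, threshold: u32) -> Iteration {
    if v1_v2 <= threshold {
        if base_v1 > threshold {
            Iteration::Converged
        } else {
            Iteration::Untouched
        }
    } else if base_v2 < base_v1 {
        Iteration::Reverted
    } else {
        Iteration::StillMoving
    }
}
```

Separating `Untouched` from `Converged` matters because both look "stable" in Diff 3 alone; only Diff 1 shows whether there was anything to converge.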
## PHASE 1: Index All 5 Models (~201 GB, ~4 hours)

Use the safetensors BF16 path (NOT GGUF Q8_0). BF16 gives cleaner fingerprints
for causal diffing: no quantization noise between source and projection.

### Model index table

```
ID   Repo                                                         Shards  Out prefix
───  ───────────────────────────────────────────────────────────  ──────  ───────────────
A    Qwen/Qwen3.5-27B                                             11      qwen35_27b_base
B    Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled     11      qwen35_27b_v1
C    Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2  11      qwen35_27b_v2
D    Qwen/Qwen3.5-9B                                              4       qwen35_9b_base
E    Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled      4       qwen35_9b_dist
```

For each model, index every shard with:
```rust
stream_index_safetensors_bf16(reader, writer, 16, callback)
// octave_stride=16: strided+halftone, same as Maverick pipeline
```

Output: `/tmp/{prefix}_shard{NN}.bgz7`, one file per shard.

The `index_safetensors_shards()` helper in safetensors.rs handles this.
It issues a HEAD request to get the shard size, reads through HttpRangeReader
in 256 MB chunks, and skips shards whose output file already exists.

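The HEAD-then-ranged-read pattern comes down to splitting a known byte length into fixed-size windows. The following is a minimal sketch of that arithmetic, not the code in `safetensors.rs`: the function names are hypothetical, and treating "256 MB" as 256 MiB is an assumption.

```rust
use std::path::Path;

/// Chunk size for ranged reads. Assumes "256 MB" means 256 MiB.
pub const CHUNK_BYTES: u64 = 256 * 1024 * 1024;

/// Split `total_bytes` (as reported by a HEAD request) into inclusive
/// `(start, end)` byte ranges of at most `chunk_bytes` each.
pub fn range_chunks(total_bytes: u64, chunk_bytes: u64) -> Vec<(u64, u64)> {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    let mut ranges = Vec::new();
    let mut start = 0u64;
    while start < total_bytes {
        // HTTP byte ranges are inclusive on both ends.
        let end = (start + chunk_bytes).min(total_bytes) - 1;
        ranges.push((start, end));
        start = end + 1;
    }
    ranges
}

/// Format one range as the value of an HTTP `Range` request header.
pub fn range_header(range: (u64, u64)) -> String {
    format!("bytes={}-{}", range.0, range.1)
}

/// Skip-if-exists: a shard only needs indexing when its output is missing.
pub fn needs_indexing(out_path: &Path) -> bool {
    !out_path.exists()
}
```

A partially written output file would pass this existence check, so a real run may want to write to a temporary name and rename on completion.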
### Run order

Models A and D can be indexed in parallel (different sizes, no conflict).
Index B, C, and E after their respective bases so that skip-if-exists can
reuse shared tensors, although in practice each model carries its own weights.

```bash
# Index all 5 (the test function does this):
cargo test test_full_reasoning_reverse_eng --release -- --ignored --nocapture
```

BUT: that test uses Q8_0 GGUF (28.59 GB each). For BF16 safetensors:

```bash
# Either modify the test to use safetensors, or run per-model:
cargo test test_stream_index_qwen35_safetensors --release -- --ignored --nocapture
# Then repeat with different repo/prefix for each model
```

## PHASE 2: Causal Diff (seconds, reads bgz7 files)

Once all 5 models are indexed, run the 4 diffs:

```rust
use crate::hpc::causal_diff::{causal_diff, print_diff_summary, find_reasoning_scaffold,
                              cluster_by_head, revise_across_diffs};

let threshold = 100; // L1 distance; tune based on results

// Paths below are shorthand. The indexer writes one file per shard
// (/tmp/{prefix}_shard{NN}.bgz7), so run each diff per shard pair.

// Diff 1: base 27B → v1
let (edges_1, stats_1) = causal_diff("base_27b.bgz7", "v1_27b.bgz7", threshold)?;
print_diff_summary("27B: base → v1", &stats_1, edges_1.len());

// Diff 2: base 27B → v2
let (edges_2, stats_2) = causal_diff("base_27b.bgz7", "v2_27b.bgz7", threshold)?;

// Diff 3: v1 → v2
let (edges_3, stats_3) = causal_diff("v1_27b.bgz7", "v2_27b.bgz7", threshold)?;

// Diff 4: base 9B → distilled 9B
let (edges_4, stats_4) = causal_diff("base_9b.bgz7", "dist_9b.bgz7", threshold)?;
```

NOTE: shards must be matched by index: base shard 1 diffs against distilled shard 1.
The tensor names must match across models (same arch = same names).
Run causal_diff per shard pair, then aggregate edges.

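Pairing shards by index and aggregating the per-shard edges can be sketched independently of the actual diff routine. In the sketch below the diff is passed in as a closure, so nothing is assumed about the real `causal_diff` signature. The helper names, the 1-based shard numbering, and the two-digit zero padding of `{NN}` are assumptions.

```rust
/// Build (base, other) file pairs for shards 1..=shards, following the
/// /tmp/{prefix}_shard{NN}.bgz7 naming. Numbering and padding are assumed.
pub fn shard_pairs(base_prefix: &str, other_prefix: &str, shards: u32) -> Vec<(String, String)> {
    (1..=shards)
        .map(|i| {
            (
                format!("/tmp/{}_shard{:02}.bgz7", base_prefix, i),
                format!("/tmp/{}_shard{:02}.bgz7", other_prefix, i),
            )
        })
        .collect()
}

/// Run `diff_one` on every shard pair and concatenate the resulting edges.
/// Stops at the first error, mirroring the `?` style used for the diffs.
pub fn diff_all_shards<E, Err, F>(
    pairs: &[(String, String)],
    mut diff_one: F,
) -> Result<Vec<E>, Err>
where
    F: FnMut(&str, &str) -> Result<Vec<E>, Err>,
{
    let mut all_edges = Vec::new();
    for (base, other) in pairs {
        all_edges.extend(diff_one(base.as_str(), other.as_str())?);
    }
    Ok(all_edges)
}
```

Because both models of a pair share the same shard count in the index table (11 vs 11, 4 vs 4), a single `shards` argument is enough here; mismatched shard layouts would need matching by tensor name instead.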
## PHASE 3: Find Reasoning Scaffold

```rust
// Which blocks have Q+O shifted but K stable?
let scaffold_27b_v1 = find_reasoning_scaffold(&edges_1, 0.3);
let scaffold_27b_v2 = find_reasoning_scaffold(&edges_2, 0.3);
let scaffold_9b = find_reasoning_scaffold(&edges_4, 0.3);

// NOTE: raw block indices are only comparable across scales if 27B and 9B
// have the same number of blocks; otherwise compare by relative depth.

// Scale-invariant blocks: present in BOTH 27B and 9B
let scale_invariant: Vec<u32> = scaffold_27b_v1.iter()
    .filter(|b| scaffold_9b.contains(b))
    .copied().collect();

// 27B-only blocks: capacity-dependent reasoning
let capacity_dependent: Vec<u32> = scaffold_27b_v1.iter()
    .filter(|b| !scaffold_9b.contains(b))
    .copied().collect();

// v1-v2 convergence: blocks in both v1 and v2 scaffolds
let converged: Vec<u32> = scaffold_27b_v1.iter()
    .filter(|b| scaffold_27b_v2.contains(b))
    .copied().collect();
```

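If the 27B and 9B models turn out to have different block counts, intersecting raw block indices would compare unrelated layers. One alternative is to match blocks by relative depth. This is a sketch under that assumption only: the helper names and the tolerance parameter are hypothetical, and the block counts must be read from each model's config rather than taken from this example.

```rust
/// Position of a block as a fraction of model depth, in 0.0..=1.0.
pub fn relative_depth(block: u32, n_blocks: u32) -> f32 {
    assert!(n_blocks > 0, "model must have at least one block");
    if n_blocks == 1 {
        0.0
    } else {
        block as f32 / (n_blocks - 1) as f32
    }
}

/// Scaffold blocks of the larger model that have a scaffold block in the
/// smaller model within `tol` relative depth.
pub fn depth_matched(
    large: &[u32],
    n_large: u32,
    small: &[u32],
    n_small: u32,
    tol: f32,
) -> Vec<u32> {
    large
        .iter()
        .copied()
        .filter(|&l| {
            let depth_l = relative_depth(l, n_large);
            small
                .iter()
                .any(|&s| (relative_depth(s, n_small) - depth_l).abs() <= tol)
        })
        .collect()
}
```

Blocks returned by `depth_matched` are candidates for the scale-invariant set; the remainder of the larger model's scaffold falls into the capacity-dependent set.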
## PHASE 4: NARS Revision (Integrated Evidence)

```rust
let all_stats = vec![
    ("27B base→v1", &stats_1),
    ("27B base→v2", &stats_2),
    ("27B v1→v2", &stats_3),
    ("9B base→dist", &stats_4),
];

let revised = revise_across_diffs(&all_stats);

// Per projection type: integrated NARS truth across all model pairs
for (proj, truth) in &revised {
    eprintln!("  {:<12} → f={:.3} c={:.3} ({})",
        proj, truth.frequency, truth.confidence,
        if truth.frequency > 0.5 { "SHIFTED" } else { "STABLE" });
}
```

Expected output (hypothesized values, not yet measured). The label follows the
f > 0.5 rule above, so V and FfnGate print STABLE even though they are expected
to move partially:
```
  Q            → f=0.720 c=0.970 (SHIFTED)   ← queries changed: planning
  K            → f=0.150 c=0.960 (STABLE)    ← keys preserved: same information
  V            → f=0.450 c=0.950 (STABLE)    ← variable: retrieval partially changed
  O            → f=0.680 c=0.970 (SHIFTED)   ← synthesis changed: integration
  Gate         → f=0.050 c=0.900 (STABLE)    ← Qwen3.5 is dense, no MoE gate; per constraint 5 this row may not appear
  FfnGate      → f=0.300 c=0.960 (STABLE)    ← moderate: some FFN rewiring
  Embedding    → f=0.080 c=0.920 (STABLE)    ← vocabulary unchanged
```

217+
## PHASE 5: Attention Head Cluster Analysis
218+
219+
```rust
220+
let clusters = cluster_by_head(&edges_1);
221+
222+
// Sort by shift intensity
223+
let mut sorted: Vec<_> = clusters.into_iter().collect();
224+
sorted.sort_by(|a, b| b.1.2.partial_cmp(&a.1.2).unwrap()); // by mean_L1
225+
226+
eprintln!("Top 10 most-shifted attention components:");
227+
for ((block, proj), (count, max_row, mean_l1)) in sorted.iter().take(10) {
228+
eprintln!(" Block {:>2} {:>5}: {}/{} shifted, mean_L1={:.0}",
229+
block, proj, count, max_row, mean_l1);
230+
}
231+
```
232+
233+
This identifies the SPECIFIC heads where reasoning was injected.
234+
235+
## PHASE 6: Write Results
236+
237+
```bash
238+
# Output to knowledge base
239+
.claude/knowledge/reasoning_reverse_eng_results.md
240+
241+
Contents:
242+
- Scaffold blocks per model (27B v1, 27B v2, 9B)
243+
- Scale-invariant vs capacity-dependent blocks
244+
- NARS revised truth per projection type
245+
- Top shifted heads with L1 magnitudes
246+
- v1→v2 convergence analysis
247+
```
248+
249+
## WHAT THE RESULTS MEAN
250+
251+
### For the NARS stack
252+
First OBSERVED truth values. Every TruthValue in the system so far was
253+
manufactured. These are measured from actual weight transformations.
254+
The stack goes from theoretical to empirical.
255+
256+
### For the reasoning orchestrator
257+
If heads [N, M, P] form the scaffold, the orchestrator now knows:
258+
"To add structured reasoning to a model, these attention heads must shift."
259+
That's a structural recipe, not a training recipe.
260+
261+
### For Ada
262+
The reasoning scaffold IS a concept node in the Sigma graph:
263+
```
264+
Σ.claude_reasoning_scaffold = {
265+
heads: [discovered blocks],
266+
pattern: Q_shift + O_shift + K_stable,
267+
truth: revised(all_diffs),
268+
scale_invariant: [subset],
269+
source: "Qwen3.5 → Claude-4.6-Opus distillation"
270+
}
271+
```
272+
273+
### Cross-reference with Maverick (future)
274+
Maverick's gate topology (expert routing) + Qwen's attention scaffold
275+
(token routing) = the complete picture of "reasoning = routing" at both
276+
MoE and attention granularity.
277+
278+
## CRITICAL CONSTRAINTS
279+
280+
1. Use SAFETENSORS path (BF16 precision), NOT GGUF Q8_0
281+
2. Match shards by index when diffing (shard 1 vs shard 1)
282+
3. Tensor names must match across models — verify with first shard
283+
4. threshold=100 is a starting point — may need tuning based on L1 distribution
284+
5. Qwen3.5 is DENSE (no MoE). Gate projections won't appear.
285+
All signal is in attention Q/K/V/O and FFN gate/up/down.
286+
6. Do NOT modify existing production code — only add test functions
287+
288+
## RUN COMMANDS
289+
290+
```bash
291+
# Step 1: Index all 5 models (parallelizable across machines)
292+
cargo test test_index_qwen35_27b_base --release -- --ignored --nocapture
293+
cargo test test_index_qwen35_27b_v1 --release -- --ignored --nocapture
294+
cargo test test_index_qwen35_27b_v2 --release -- --ignored --nocapture
295+
cargo test test_index_qwen35_9b_base --release -- --ignored --nocapture
296+
cargo test test_index_qwen35_9b_dist --release -- --ignored --nocapture
297+
298+
# Step 2: Run all diffs + NARS revision + scaffold detection
299+
cargo test test_qwen35_claude_reasoning_diff --release -- --ignored --nocapture
300+
301+
# Step 3: Write results
302+
# (integrated into step 2 test function)
303+
```
304+
305+
Expected total time: ~4 hours indexing + seconds diffing.
306+
Expected total output: ~50 MB bgz7 files + ~100 KB diff results.
