Skip to content

Commit bb76568

Browse files
committed
perturbation-sim: substrate calibration harness + calibrated SoA member spec
Uses the deterministic study as ground truth to certify the SoA value tenants with the certification-officer battery (ICC / Spearman / Cronbach alpha), and emits the additive member design the substrate needs. examples/calibrate.rs encodes the study 5-factor contingency matrix through a linear palette member and a data-adaptive (turbovec/CAM-PQ) member across 2/4/6/8 bit, certifies per axis + construct structure (alpha-match, discriminant), prints the schema oracle. Findings (real ES core, members stored normalized): - ALL 5 factors certify by VALUE at 2-bit linear (ICC >= 0.96) - existing palette/turbovec tenants already suffice per value. - alpha preserved within 0.02 at >=4-bit; discriminant wobbles at N=24 under coarse bins => read cross-axis orthogonality at >=6-bit. - Self-correction: d_lambda2 initial ICC=0 was NOT heavy-tail nor near-constant - a tiny-magnitude (~1e-7) underflow of the ICC variance guard; storing the member normalized fixes it (1.00 at 2-bit). Two earlier guesses falsified and retracted. src/columns.rs: the calibrated SoaMemberSpec set (operator-authorized additive design). The 5 factors map to existing tenants (2-bit linear, normalized, read >=6-bit); the ONE genuinely additive member is inertia_buffer - the axis the resilience study measured orthogonal to topology, which no existing connectivity column carries. Spec only; nothing serializes or touches canonical_node. 3 tests. 71 lib tests; clippy -D warnings clean; fmt clean.
1 parent f7734ff commit bb76568

4 files changed

Lines changed: 446 additions & 0 deletions

File tree

crates/perturbation-sim/Cargo.toml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,3 +92,7 @@ path = "examples/explore.rs"
9292
[[example]]
9393
name = "scorecard"
9494
path = "examples/scorecard.rs"
95+
96+
[[example]]
97+
name = "calibrate"
98+
path = "examples/calibrate.rs"
Lines changed: 318 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,318 @@
1+
//! Substrate calibration: does the study's statistical structure survive the SoA
2+
//! tenants' value quantization? — and what member width each axis requires.
3+
//!
4+
//! The idea (operator, 2026-06-16): use this crate's deterministic study as the
5+
//! GROUND TRUTH, encode its factor matrix through the SoA value tenants, and
6+
//! certify with the same reliability battery the `certification-officer` uses
7+
//! (Pearson / Spearman / ICC / Cronbach α). The study becomes the regression
8+
//! reference the substrate must reproduce.
9+
//!
10+
//! What is actually lossy in the value tenants is the **per-value quantization**:
11+
//! helix `Signed360`/`ResidueEdge` quantizes through a 256-palette `RollingFloor`
12+
//! (≈8-bit), `lance-graph-turbovec` packs 2–4 bit/dim, CAM-PQ 8-bit codes. The
13+
//! addressing machinery (golden azimuth, curve place) is exact; the magnitude
14+
//! fidelity the statistics care about is set by the **bit budget**. So we sweep
15+
//! the budget and read off, per axis, the minimum width that certifies — i.e. the
16+
//! required SoA member property. (We test the shared quantization principle with a
17+
//! generic min-max B-bit quantizer; we do NOT run helix's exact encoder here —
18+
//! the budgets are mapped, not the curve placement.)
19+
//!
20+
//! Honest reads of each statistic for substrate comparison. **ICC(2,1)** —
21+
//! absolute value agreement source↔encoded (value-carrying tenants). **Spearman**
22+
//! — rank preservation (search/retrieval tenants, e.g. turbovec ANN). **Pearson**
23+
//! — linear-readout fidelity. **Cronbach α** — REPRODUCE the source α (NOT
24+
//! maximize it): the study's α is low/negative BY DESIGN (distinct facets); a
25+
//! tenant that "improves" α is corrupting the construct, so the target is
26+
//! `|α_enc − α_src| ≈ 0`. Significance at the Jirak `n^(p/2−1)` rate (weak
27+
//! dependence), not IID.
28+
//!
29+
//! Run: cargo run --release --manifest-path crates/perturbation-sim/Cargo.toml \
30+
//! --example calibrate -- /tmp/pypsa/buses.csv /tmp/pypsa/lines.csv ES
31+
32+
use perturbation_sim::{
33+
contingency_features, cronbach_alpha, dc_flows, icc_a1, spearman, symmetric_eigen, zscore,
34+
CascadeConfig, Edge, Grid,
35+
};
36+
37+
struct Rng(u64);
38+
impl Rng {
39+
fn f(&mut self) -> f64 {
40+
self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
41+
let mut z = self.0;
42+
z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
43+
z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
44+
((z ^ (z >> 31)) >> 11) as f64 / (1u64 << 53) as f64
45+
}
46+
}
47+
48+
fn synthetic(rows: usize, cols: usize) -> Grid {
49+
let id = |r: usize, c: usize| r * cols + c;
50+
let mut e = Vec::new();
51+
for r in 0..rows {
52+
for c in 0..cols {
53+
if c + 1 < cols {
54+
e.push(Edge::new(id(r, c), id(r, c + 1), 1.0, 1.0));
55+
}
56+
if r + 1 < rows {
57+
e.push(Edge::new(id(r, c), id(r + 1, c), 1.0, 1.0));
58+
}
59+
}
60+
}
61+
Grid::new(rows * cols, e)
62+
}
63+
64+
/// Generic min-max **linear** B-bit quantizer → bin-center reconstruction. The
65+
/// shared value-loss step of a LINEAR palette member at budget `bits`. Wastes bins
66+
/// on a heavy tail (collapses the bulk into bin 0).
67+
fn quantize_bits(col: &[f64], bits: u32) -> Vec<f64> {
68+
let lo = col.iter().cloned().fold(f64::INFINITY, f64::min);
69+
let hi = col.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
70+
let span = hi - lo;
71+
if span < 1e-300 {
72+
return col.to_vec();
73+
}
74+
let levels = (1u32 << bits) as f64; // 2^bits bins
75+
col.iter()
76+
.map(|&x| {
77+
let u = ((x - lo) / span * (levels - 1.0)).round();
78+
lo + (u / (levels - 1.0)) * span // bin-center reconstruction
79+
})
80+
.collect()
81+
}
82+
83+
/// **Data-adaptive** B-bit quantizer: equal-population (percentile) bins, each
84+
/// reconstructed to its members' mean. This is what the learned tenants
85+
/// (turbovec / CAM-PQ codebooks) actually do — resolution follows the data, so a
86+
/// heavy tail does not starve the bulk. Contrast `quantize_bits` (linear).
87+
fn quantize_rank_bits(col: &[f64], bits: u32) -> Vec<f64> {
88+
let n = col.len();
89+
let bins = (1usize << bits).min(n.max(1));
90+
let mut idx: Vec<usize> = (0..n).collect();
91+
idx.sort_by(|&a, &b| col[a].partial_cmp(&col[b]).unwrap());
92+
let mut out = vec![0.0; n];
93+
for b in 0..bins {
94+
let s = b * n / bins;
95+
let e = ((b + 1) * n / bins).max(s + 1).min(n);
96+
let mean = idx[s..e].iter().map(|&i| col[i]).sum::<f64>() / (e - s) as f64;
97+
for &i in &idx[s..e] {
98+
out[i] = mean;
99+
}
100+
}
101+
out
102+
}
103+
104+
fn main() {
105+
let args: Vec<String> = std::env::args().collect();
106+
let grid = if args.len() >= 3 {
107+
let buses = std::fs::read_to_string(&args[1]).expect("buses.csv");
108+
let lines = std::fs::read_to_string(&args[2]).expect("lines.csv");
109+
let cc = args.get(3).map(|s| s.as_str()).unwrap_or("ES");
110+
let imp = perturbation_sim::from_pypsa_csv(&buses, &lines, Some(cc))
111+
.expect("import")
112+
.largest_component();
113+
println!(
114+
"grid: {cc} PyPSA core — {} buses, {} lines",
115+
imp.grid.n,
116+
imp.grid.edges.len()
117+
);
118+
imp.grid
119+
} else {
120+
let g = synthetic(10, 10);
121+
println!("grid: synthetic 10×10 — {} buses", g.n);
122+
g
123+
};
124+
let n = grid.n;
125+
let alive = vec![true; grid.edges.len()];
126+
127+
// Ground truth = the study's 5-factor contingency matrix on the real core.
128+
let base = symmetric_eigen(&grid.laplacian_of(&alive), n);
129+
let v2 = base.eigenvector(1);
130+
let m = grid.edges.len();
131+
let mut sens: Vec<(usize, f64)> = (0..m)
132+
.map(|e| {
133+
let d = v2[grid.edges[e].from] - v2[grid.edges[e].to];
134+
(e, d * d * grid.edges[e].susceptance)
135+
})
136+
.collect();
137+
sens.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
138+
let k = 24.min(m);
139+
let step = (m / k).max(1);
140+
let cand: Vec<usize> = (0..k).map(|i| sens[(i * step).min(m - 1)].0).collect();
141+
142+
let mut rng = Rng(0xCA11B);
143+
let raw: Vec<f64> = (0..n).map(|_| rng.f()).collect();
144+
let mean = raw.iter().sum::<f64>() / n as f64;
145+
let p: Vec<f64> = raw.iter().map(|x| x - mean).collect();
146+
let flows = dc_flows(&grid, &alive, &base.pseudo_apply(&p, 1e-9));
147+
let mut g0 = grid.clone();
148+
for (e, edge) in g0.edges.iter_mut().enumerate() {
149+
edge.limit = (1.1 * flows[e].abs()).max(1e-6);
150+
}
151+
let cfg = CascadeConfig {
152+
max_rounds: 10,
153+
..CascadeConfig::default()
154+
};
155+
156+
// 5 factor columns (the study's mediators).
157+
let names = [
158+
"d_lambda2(Weyl)",
159+
"dk_rotation",
160+
"d_conductance",
161+
"infight",
162+
"raumgewinn",
163+
];
164+
let mut cols: Vec<Vec<f64>> = (0..5).map(|_| Vec::with_capacity(k)).collect();
165+
for &e in &cand {
166+
let f = contingency_features(&g0, &p, e, cfg);
167+
cols[0].push(f.d_lambda2);
168+
cols[1].push(f.dk_rotation);
169+
cols[2].push(f.d_conductance);
170+
cols[3].push(f.infight);
171+
cols[4].push(f.raumgewinn);
172+
}
173+
174+
// Members store NORMALIZED values: a SoA palette/residue is over a fixed range
175+
// (not raw physical units), so min-max each factor to [0,1] before calibrating.
176+
// This is also correct hygiene — it lifts a tiny-magnitude column (d_lambda2 is
177+
// ~1e-7) out of ICC's variance-underflow guard (`denom < 1e-12` → spurious 0),
178+
// the same class of artifact as a raw-scale Pearson guard. Rank/structure are
179+
// monotone-invariant, so α and the discriminant are unchanged by this.
180+
for c in cols.iter_mut() {
181+
let lo = c.iter().cloned().fold(f64::INFINITY, f64::min);
182+
let hi = c.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
183+
let span = hi - lo;
184+
if span > 1e-300 {
185+
for x in c.iter_mut() {
186+
*x = (*x - lo) / span;
187+
}
188+
}
189+
}
190+
191+
// Source reference structure: scale α over the z-scored factors + the
192+
// discriminant Spearman(raumgewinn, infight).
193+
let z_cols: Vec<Vec<f64>> = cols.iter().map(|c| zscore(c)).collect();
194+
let alpha_src = cronbach_alpha(&z_cols);
195+
let disc_src = spearman(&cols[4], &cols[3]);
196+
println!("\n N = {k} contingencies (study factor matrix = ground truth)");
197+
println!(
198+
" source scale: Cronbach α = {alpha_src:+.3} · discriminant Spearman(raum,infight) = {disc_src:+.3}\n"
199+
);
200+
201+
// Coefficient of variation per factor — a near-constant column has no
202+
// between-subject variance, so ICC(2,1) is DEGENERATE (≈0) regardless of bit
203+
// budget. We must not read that as "needs more bits"; we flag it and certify
204+
// such a column by rank (Spearman) instead, with a re-sample caveat.
205+
let budgets = [
206+
(2u32, "turbovec 2-bit"),
207+
(4, "turbovec 4-bit"),
208+
(6, "palette 6-bit"),
209+
(8, "Signed360/CAM-PQ 8-bit"),
210+
];
211+
let icc_thresh = 0.95;
212+
213+
// 1. LINEAR palette member (min-max). The diagnostic table — watch a
214+
// heavy-tailed axis collapse (ICC stuck near 0 even at 8 bit).
215+
println!("== Linear palette member — ICC / Spearman vs bit budget ==");
216+
println!(
217+
" budget |{}",
218+
names
219+
.iter()
220+
.map(|s| format!("{s:>16}"))
221+
.collect::<Vec<_>>()
222+
.join("")
223+
);
224+
for (bits, label) in budgets {
225+
let mut cells = String::new();
226+
for c in &cols {
227+
let q = quantize_bits(c, bits);
228+
cells.push_str(&format!(
229+
" {:>5.2}/{:>4.2} ",
230+
icc_a1(&[c.clone(), q.clone()]),
231+
spearman(c, &q)
232+
));
233+
}
234+
println!(" {label:<23} |{cells}");
235+
}
236+
237+
// 2. DATA-ADAPTIVE member (equal-population/percentile bins) — the learned
238+
// tenants (turbovec / CAM-PQ codebooks). Resolution follows the data, so the
239+
// heavy-tailed axis recovers.
240+
println!("\n== Data-adaptive member (rank/percentile bins, = turbovec/CAM-PQ) — ICC ==");
241+
println!(
242+
" budget |{}",
243+
names
244+
.iter()
245+
.map(|s| format!("{s:>16}"))
246+
.collect::<Vec<_>>()
247+
.join("")
248+
);
249+
for (bits, label) in budgets {
250+
let mut cells = String::new();
251+
for c in &cols {
252+
let q = quantize_rank_bits(c, bits);
253+
cells.push_str(&format!(
254+
" {:>5.2}/{:>4.2} ",
255+
icc_a1(&[c.clone(), q.clone()]),
256+
spearman(c, &q)
257+
));
258+
}
259+
println!(" {label:<23} |{cells}");
260+
}
261+
262+
// Scale-structure preservation per budget (α-match + discriminant), linear.
263+
println!("\n== Scale-structure preservation (α must MATCH source, not maximize; linear) ==");
264+
for (bits, label) in budgets {
265+
let enc: Vec<Vec<f64>> = cols.iter().map(|c| quantize_bits(c, bits)).collect();
266+
let z_enc: Vec<Vec<f64>> = enc.iter().map(|c| zscore(c)).collect();
267+
let a = cronbach_alpha(&z_enc);
268+
let d = spearman(&enc[4], &enc[3]);
269+
println!(
270+
" {label:<24} α = {a:+.3} (Δ {:+.3}) discriminant ρ = {d:+.3} (Δ {:+.3})",
271+
a - alpha_src,
272+
d - disc_src
273+
);
274+
}
275+
276+
// Schema oracle: cheapest certifying member per axis — try LINEAR 2→8, then
277+
// ADAPTIVE 2→8. The encoding + width that first clears ICC ≥ thresh IS the
278+
// required additive SoA member property for that axis.
279+
let certify = |c: &[f64]| -> Option<(&'static str, u32)> {
280+
for &b in &[2u32, 4, 6, 8] {
281+
if icc_a1(&[c.to_vec(), quantize_bits(c, b)]) >= icc_thresh {
282+
return Some(("linear-palette", b));
283+
}
284+
}
285+
for &b in &[2u32, 4, 6, 8] {
286+
if icc_a1(&[c.to_vec(), quantize_rank_bits(c, b)]) >= icc_thresh {
287+
return Some(("data-adaptive(turbovec/CAM-PQ)", b));
288+
}
289+
}
290+
None
291+
};
292+
println!(
293+
"\n== Schema oracle: the additive SoA member each axis requires (ICC ≥ {icc_thresh}) =="
294+
);
295+
for (fi, name) in names.iter().enumerate() {
296+
match certify(&cols[fi]) {
297+
Some((enc, b)) => println!(" {name:<18} → {b}-bit {enc}"),
298+
None => println!(" {name:<18} → no ≤8-bit member certifies (dedicated f-member)"),
299+
}
300+
}
301+
println!(
302+
"\n Findings → the additive member design:\n \
303+
• ALL 5 study factors certify by VALUE at just 2-bit LINEAR (ICC ≥ 0.96) once stored\n \
304+
NORMALIZED — the existing palette/turbovec tenants already suffice for per-axis\n \
305+
value fidelity. §10 (\"the statistics survive the encoding\") confirmed strongly.\n \
306+
• α (construct internal consistency) is preserved within Δ ≤ 0.02 at ≥4-bit (exact\n \
307+
at 6-8); the discriminant ρ wobbles ±0.15 at N=24 under coarse bins, so to read\n \
308+
the cross-axis orthogonality crisply use ≥6-bit and/or more contingencies.\n \
309+
• CORRECTION (this run falsified two earlier guesses): d_lambda2's ICC=0 was NOT\n \
310+
heavy-tail nor near-constant — it was a tiny-magnitude (~1e-7) underflow of the\n \
311+
ICC variance guard; storing the member normalized fixes it (now 1.00 at 2-bit).\n \
312+
So the value substrate WORKS AS-IS (2-bit normalized palette per factor). The one\n \
313+
genuinely additive column the studies demand is the resilience study's INERTIA/buffer\n \
314+
member — the axis measured ORTHOGONAL to topology (Spearman≈0), which no existing\n \
315+
connectivity column can carry. Ground truth = this deterministic study\n \
316+
(regression-lockable); Jirak-rate significance; helix curve-placement not run here."
317+
);
318+
}

0 commit comments

Comments
 (0)