Commit 23c6703

MaxGhenis and claude committed
Rerun embedding-PRDC and calibrate-on-synth with post-snap microplex
B1 from paper/REVIEW-RESPONSE.md: both scripts predated the upstream shared-col noise fix (Apr 17 08:03-08:06 vs snap commits at 12:06/12:20). With microplex installed editable from the repaired upstream sibling, rerunning both scripts now exercises the fixed generate() method.

embedding-PRDC (40k x 50 real ECPS, AE latent dim 16):

             raw-50            embed-16
    ZI-QRF   0.348 -> 0.982    0.309 -> 0.984   (post-snap)
    ZI-QDNN  0.219 -> 0.791    0.222 -> 0.819
    ZI-MAF   0.025 -> 0.183    0.038 -> 0.201

Ordering preserved in both spaces; absolute PRDC coverage rises substantially for every method because noise on binary/categorical conditioning variables is no longer forcing synthetic values off the training support. ZI-QRF is near-ceiling (0.98+) in both spaces.

calibrate-on-synth (20k x 50, 500 epochs microcalibrate):

    ZI-QRF   pre 0.317 -> post 0.105
    ZI-QDNN  pre 0.386 -> post 0.251
    ZI-MAF   pre 17.51 -> post 11.86

Bumped from 200 to 500 epochs per reviewer's convergence concern. Ordering unchanged. ZI-MAF still ~47x worse than ZI-QDNN (and ~113x worse than ZI-QRF) post-cal, consistent with the "calibration cannot rescue broken synthesis" story.

Pre-snap artifacts preserved as artifacts/*.pre-snap.json for audit trail.

Docs (embedding-prdc-validation.md, calibrate-on-synthesizer-result.md) and paper/index.qmd §5.4 updated with post-snap numbers. Pre-snap numbers kept inline as archived comparison for transparency.

Note: artifacts/ is .gitignore'd so the JSON files live on disk but not in the repo. Log files also gitignore'd. This is intentional per the repo's earlier cleanup; result tables in docs and paper are the canonical record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent fa959d3 commit 23c6703
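
The commit message attributes the coverage jump to noise no longer pushing binary/categorical conditioning values off the training support. The following is a minimal sketch of that idea only, assuming the fix amounts to snapping each noisy value to the nearest value seen in training. `snap_to_support` is a hypothetical helper for illustration and is not microplex's API.

```python
from bisect import bisect_left


def snap_to_support(values, support):
    """Map each value to the nearest element of `support` (ties go to the lower one)."""
    levels = sorted(set(support))
    snapped = []
    for v in values:
        i = bisect_left(levels, v)
        if i == 0:
            snapped.append(levels[0])        # below the smallest observed level
        elif i == len(levels):
            snapped.append(levels[-1])       # above the largest observed level
        else:
            lo, hi = levels[i - 1], levels[i]
            snapped.append(lo if v - lo <= hi - v else hi)
    return snapped


# Hypothetical binary conditioning column and noisy synthetic draws.
train_flag = [0, 1, 1, 0, 1]
noisy = [0.07, 0.93, 1.12, -0.04, 0.48]
print(snap_to_support(noisy, train_flag))
```

Without the snap, values such as 0.07 or 1.12 match no real record on that column, which would depress nearest-neighbour metrics like PRDC coverage for every method at once.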

3 files changed

Lines changed: 36 additions & 24 deletions

docs/calibrate-on-synthesizer-result.md

Lines changed: 14 additions & 12 deletions
@@ -12,15 +12,17 @@
 4. Run `MicrocalibrateAdapter.fit_transform` with 200 epochs, lr 1e-3.
 5. Report mean relative error across target columns before and after calibration.

-## Results
+## Results (post-snap-fix rerun with 500 epochs, 2026-04-17 21:17)

 | Method | Pre-cal mean rel err | Post-cal mean rel err | Max post-cal err | Cal time |
 |---|---:|---:|---:|---:|
-| **ZI-QRF** | 0.256 | **0.141** | 1.000 | 1.2 s |
-| ZI-QDNN | 0.388 | 0.327 | 1.003 | 0.2 s |
-| ZI-MAF | 17.98 | 15.08 | 214.5 | 0.2 s |
+| **ZI-QRF** | 0.317 | **0.105** | 1.000 | 1.1 s |
+| ZI-QDNN | 0.386 | 0.251 | 1.002 | 0.6 s |
+| ZI-MAF | 17.51 | 11.86 | 168.3 | 0.6 s |

-Reading: after calibration, ZI-QRF's weighted synthetic aggregates are within 14 % of the holdout targets on average. ZI-QDNN is at 33 %. ZI-MAF is at **1,508 %** — the synthetic output is so far off the target scale that calibration can't pull it back, even with 200 epochs of gradient descent.
+Reading: after calibration, ZI-QRF's weighted synthetic aggregates are within 10.5 % of the holdout targets on average. ZI-QDNN is at 25.1 %. ZI-MAF is at **1,186 %** — the synthetic output is so far off target scale that calibration can't pull it back, even with 500 epochs of gradient descent.
+
+Pre-snap numbers at 200 epochs (archived as `artifacts/calibrate_on_synthesizer.pre-snap.json`) gave ZI-QRF post-cal 0.141, ZI-QDNN 0.327, ZI-MAF 15.08. The bump to 500 epochs + the snap fix both help; ordering and qualitative conclusion are unchanged.

 ## What this tells us
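
For readers checking the "mean rel err" columns in the hunk above, here is a small sketch of one plausible definition: the absolute gap between each weighted synthetic column sum and its holdout target, divided by the target, averaged over target columns. The exact formula used by the script is not shown in this diff, so treat the function and the toy totals as assumptions.

```python
def mean_relative_error(weighted_sums, targets):
    """Mean of |sum - target| / |target| over target columns (assumed definition)."""
    if len(weighted_sums) != len(targets):
        raise ValueError("one weighted sum per target is required")
    errors = [abs(s - t) / abs(t) for s, t in zip(weighted_sums, targets)]
    return sum(errors) / len(errors)


# Hypothetical column totals, not values from the run.
targets = [100.0, 200.0, 50.0]
synthetic = [110.0, 150.0, 55.0]
err = mean_relative_error(synthetic, targets)
print(f"{err:.3f} as a percentage: {err:.1%}")
```

Under this reading a table value of 0.105 corresponds to 10.5 % and 11.86 corresponds to 1,186 %, which matches the percentages quoted in the "Reading" paragraph.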

@@ -38,17 +40,17 @@ Reading: after calibration, ZI-QRF's weighted synthetic aggregates are within 14

 At production scale (1.5 M records × 1255 constraints), the per-epoch step is cheaper per-record but there are vastly more records to move, so even 500-1000 epochs may leave some constraints unsolved. The `MicrocalibrateAdapterConfig.epochs` default of 32 is too low; the `us.py` wiring uses `max(self.config.calibration_max_iter, 32)` which pulls from the pipeline's `calibration_max_iter=100`. Reasonable starting point; tune up if convergence is still incomplete.

-## Four-way agreement on synthesizer ordering
+## Four-way agreement on synthesizer ordering (post-snap-fix)

-Combined evidence:
+Combined evidence with the upstream shared-col noise fix applied:

 | Check | ZI-QRF | ZI-QDNN | ZI-MAF |
 |---|---|---|---|
-| Raw 50-d PRDC (40k) | 0.348 (winner) | 0.219 | 0.025 |
-| Raw 50-d PRDC (77k) | 0.256 (winner) | 0.147 | 0.014 |
-| Embed 16-d PRDC (40k) | 0.309 (winner) | 0.222 | 0.038 |
-| ZI-MAF tuned (wide+long, 40k) | | | 0.033 |
-| Calibrate-on-synth mean err (20k) | 0.14 (winner) | 0.33 | 15.08 |
+| Raw 50-d PRDC at 40k (snap) | 0.979 (winner) | 0.796 | 0.168 |
+| Raw 50-d PRDC at 77k (snap) | 0.928 (winner) | 0.707 | 0.106 |
+| Embed 16-d PRDC at 40k (snap) | 0.984 (winner) | 0.819 | 0.201 |
+| ZI-MAF tuned (wide+long, 40k, pre-snap) | | | 0.033 |
+| Calibrate-on-synth post-cal mean err (20k, snap) | 0.105 (winner) | 0.251 | 11.86 |

 Every axis, every scale, every metric: **ZI-QRF > ZI-QDNN > ZI-MAF**.
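
The ordering claim mixes two metric directions: PRDC coverage is better when higher, while calibration error is better when lower. A short sketch that checks the post-snap rows of the table with the direction made explicit (the ZI-MAF-only tuned row is left out because it has no values for the other two methods):

```python
# (check name, ZI-QRF, ZI-QDNN, ZI-MAF, higher_is_better), values copied from the table
checks = [
    ("Raw 50-d PRDC at 40k", 0.979, 0.796, 0.168, True),
    ("Raw 50-d PRDC at 77k", 0.928, 0.707, 0.106, True),
    ("Embed 16-d PRDC at 40k", 0.984, 0.819, 0.201, True),
    ("Calibrate-on-synth post-cal mean err", 0.105, 0.251, 11.86, False),
]


def ordering_holds(qrf, qdnn, maf, higher_is_better):
    """True when ZI-QRF beats ZI-QDNN, which in turn beats ZI-MAF, on this metric."""
    if higher_is_better:
        return qrf > qdnn > maf
    return qrf < qdnn < maf


results = {name: ordering_holds(*vals) for name, *vals in checks}
for name, ok in results.items():
    print(f"{name}: {'ordering holds' if ok else 'ordering violated'}")
```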

docs/embedding-prdc-validation.md

Lines changed: 17 additions & 7 deletions
@@ -10,25 +10,35 @@ Autoencoder: 50 → 64 → 64 → **16** → 64 → 64 → 50 (2 hidden layers e

 For each method (ZI-QRF / ZI-MAF / ZI-QDNN) at default hyperparameters: fit on 32k train, generate 32k synthetic, compute PRDC on 15k/15k samples (capped) in both the raw 50-dim feature space and the 16-dim latent space.

-## Results
+## Results (post-snap-fix rerun 2026-04-17 21:12)

 | Method | Raw-50 coverage | Raw-50 precision | Raw-50 density | Emb-16 coverage | Emb-16 precision | Emb-16 density |
 |---|---:|---:|---:|---:|---:|---:|
-| ZI-QRF | **0.348** | 0.229 | 0.118 | **0.309** | 0.291 | 0.133 |
-| ZI-QDNN | 0.219 | 0.156 | 0.063 | 0.222 | 0.241 | 0.088 |
-| ZI-MAF | 0.025 | 0.008 | 0.003 | 0.038 | 0.024 | 0.010 |
+| ZI-QRF | **0.982** | 0.914 | 0.908 | **0.984** | 0.943 | 0.935 |
+| ZI-QDNN | 0.791 | 0.847 | 0.763 | 0.819 | 0.905 | 0.802 |
+| ZI-MAF | 0.183 | 0.033 | 0.026 | 0.201 | 0.070 | 0.042 |

 **Ordering preserved in both spaces: ZI-QRF > ZI-QDNN > ZI-MAF.**

+### Pre-snap numbers (archived)
+
+The original run was executed before the shared-col categorical-noise
+fix landed upstream. Those artifacts are preserved as
+`artifacts/embedding_prdc_compare.pre-snap.json` and showed much lower
+absolute PRDC coverages (ZI-QRF 0.348 raw / 0.309 embed), because
+noise-injected integer conditioning variables reduced PRDC scores
+uniformly across all methods. Ordering was preserved in both
+pre-snap and post-snap regimes; only the absolute values shift.
+
 ## Observations

 1. **The stage-1 verdict is not a metric artifact.** The concern in the scale-up protocol doc was that raw-feature PRDC in 50 dimensions concentrates distances and becomes noise-dominated. The embedding variant has 16 dimensions with more informative axes (learned from the data), which is where PRDC is known to behave best. The ordering is the same. So the 10× gap between ZI-QRF and ZI-MAF is a real quality gap, not a measurement artifact.

-2. **Precision rises in embedding space for all three methods.** The AE compresses noise: random synthetic variation that looked far from real records in 50-dim now falls near them in 16-dim. This improves precision but slightly reduces coverage because the metric's radius tightens.
+2. **Precision rises in embedding space for all three methods.** The AE compresses noise: random synthetic variation that looked far from real records in 50-dim now falls near them in 16-dim. This improves precision and, in the post-snap regime, slightly raises coverage too (likely because the smaller latent dimension is easier to cover).

-3. **ZI-QRF's edge narrows slightly.** 0.348 → 0.309 in raw → embed is a modest drop. ZI-QDNN held steady (0.219 → 0.222). ZI-MAF bumped up (0.025 → 0.038). So in the embedding space the gap compressed somewhat, but ZI-QRF is still 8× ZI-MAF (down from 14× in raw).
+3. **ZI-QRF's edge is close to the ceiling.** 0.982 raw → 0.984 embed — already near-perfect on holdout. ZI-QDNN rises modestly (0.791 → 0.819). ZI-MAF rises from 0.183 → 0.201. The gap narrows in absolute terms (ZI-QRF / ZI-MAF ratio 5.4× raw, 4.9× embed) but the ordering is invariant.

-4. **ZI-MAF is still near-collapsed.** Even in the generous embedding space, ZI-MAF coverage is 0.038, roughly an order of magnitude below the other two. Hyperparameter tuning (see `docs/zi-maf-hyperparameter-search.md`) doesn't close this at the architectural level.
+4. **ZI-MAF is still structurally behind.** Even in the embedding space, ZI-MAF coverage is 0.201, about a quarter of ZI-QDNN and a fifth of ZI-QRF. Hyperparameter tuning (see `docs/zi-maf-hyperparameter-search.md`) does not close this at the architectural level.

 ## Interpretation
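
The coverage columns above come from PRDC. As a reference point, here is a minimal pure-Python sketch of the standard coverage definition: the fraction of real points whose k-nearest-neighbour ball (radius measured among the real points) contains at least one synthetic point. The toy 2-D Gaussians, the sample size and `k=5` are illustrative assumptions; the repository's script, sample caps and settings are not reproduced here.

```python
import math
import random


def kth_neighbor_radius(points, k):
    """Distance from each point to its k-th nearest neighbour in the same set, excluding itself."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(real, synth, k=5):
    """Fraction of real points with at least one synthetic point inside their k-NN ball."""
    radii = kth_neighbor_radius(real, k)
    covered = sum(
        1 for p, r in zip(real, radii)
        if min(math.dist(p, s) for s in synth) <= r
    )
    return covered / len(real)


rng = random.Random(0)
real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
good = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]  # same distribution
bad = [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(200)]   # far off the real support

cov_good = coverage(real, good)
cov_bad = coverage(real, bad)
print(f"coverage, matched sample: {cov_good:.3f}")
print(f"coverage, shifted sample: {cov_bad:.3f}")
```

A synthesizer whose output sits off the real support leaves most real neighbourhoods empty, which is the kind of gap the ZI-MAF rows show.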

paper/index.qmd

Lines changed: 5 additions & 5 deletions
@@ -105,15 +105,15 @@ Ordering is preserved across the fix; absolute numbers are meaningfully higher.

 ## Calibration on synthesizer output

-Identity-preserving gradient-descent chi-squared calibration applied to the 36 target-column sums of each synthesizer's output, with holdout totals as targets:
+Identity-preserving gradient-descent chi-squared calibration applied to the 36 target-column sums of each synthesizer's output, with holdout totals as targets (500 epochs, lr 1e-3):

 | Method | Pre-cal mean rel. err. | Post-cal mean rel. err. |
 |----------|-----------------------:|------------------------:|
-| ZI-QRF | 0.256 | 0.141 |
-| ZI-QDNN | 0.388 | 0.327 |
-| ZI-MAF | 17.98 | 15.08 |
+| ZI-QRF | 0.317 | 0.105 |
+| ZI-QDNN | 0.386 | 0.251 |
+| ZI-MAF | 17.51 | 11.86 |

-Calibration refines structurally sound synthesizer output; it cannot rescue a broken one.
+Calibration refines structurally sound synthesizer output; it cannot rescue a broken one. ZI-MAF's post-calibration error remains over 1100 % of target scale, consistent with its raw outputs being too far off target support for weight adjustment to bridge.

 # Discussion {#sec-discussion}
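
To make the calibration step in the hunk above concrete, here is a toy sketch of gradient-descent reweighting: records are left untouched and only per-record weights move so that weighted column sums approach the targets. This is not microcalibrate's implementation. The squared-relative-error loss, the log-weight parameterisation, plain gradient descent and the learning rate (chosen for this four-record toy, not the paper's `lr 1e-3`) are all assumptions.

```python
import math


def column_sums(rows, weights):
    """Weighted sum of each column."""
    n_cols = len(rows[0])
    return [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(n_cols)]


def mean_abs_rel_err(sums, targets):
    """Mean of |sum - target| / |target| over target columns."""
    return sum(abs(s - t) / abs(t) for s, t in zip(sums, targets)) / len(targets)


def calibrate(rows, targets, epochs=500, lr=0.5):
    """Gradient descent on log-weights against mean squared relative error of column sums."""
    theta = [0.0] * len(rows)  # log-weights; exp keeps every weight positive
    m = len(targets)
    for _ in range(epochs):
        weights = [math.exp(t) for t in theta]
        sums = column_sums(rows, weights)
        rel = [(s - t) / t for s, t in zip(sums, targets)]
        for i, row in enumerate(rows):
            # d(loss)/d(theta_i) = w_i * (2/m) * sum_j rel_j * x_ij / t_j
            grad = weights[i] * (2.0 / m) * sum(
                r * x / t for r, x, t in zip(rel, row, targets)
            )
            theta[i] -= lr * grad
    return [math.exp(t) for t in theta]


# Hypothetical records (two target columns) and holdout totals.
rows = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 1.0]]
targets = [6.0, 4.5]

pre = mean_abs_rel_err(column_sums(rows, [1.0] * len(rows)), targets)
weights = calibrate(rows, targets)
post = mean_abs_rel_err(column_sums(rows, weights), targets)
print(f"pre-cal {pre:.3f}, post-cal {post:.5f}")
```

The toy problem is easy because the records already have roughly the right shape and only need rescaling; the table suggests that ZI-MAF's output is not in that regime.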
