Commit 9163467

MaxGhenis and claude committed
Embedding-PRDC validation: stage-1 ordering is not a metric artifact
Fit a 16-dim autoencoder on the 40k x 50 holdout and recomputed PRDC in both raw 50-dim space and the learned 16-dim latent space. The concern from docs/synthesizer-benchmark-scale-up.md was that raw-feature PRDC in 50 dimensions might be noise-dominated.

Raw 50-dim PRDC coverage:
  ZI-QRF   0.348
  ZI-QDNN  0.219
  ZI-MAF   0.025

Embed 16-dim PRDC coverage:
  ZI-QRF   0.309
  ZI-QDNN  0.222
  ZI-MAF   0.038

Ordering preserved: ZI-QRF > ZI-QDNN > ZI-MAF in both spaces. The gap between ZI-QRF and ZI-MAF narrows from ~14x in raw space to ~8x in the embedding but does not invert.

Combined with the ZI-MAF tuning result (coverage only bumps from 0.026 to 0.033 with 14x the compute), this adds a further independent robustness check confirming stage-1, alongside the scale checks (small-scale synth, 5k real, 40k real, 77k real).

G1 cross-section synthesizer default: ZI-QRF. Stage-1 finding is robust.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 298d915 commit 9163467

2 files changed

Lines changed: 62 additions & 4 deletions

docs/embedding-prdc-validation.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# Embedding-PRDC validation: is the stage-1 ordering real?

*Settles the open question flagged in `docs/synthesizer-benchmark-scale-up.md`: is PRDC in 50-dim raw feature space too noisy to trust? Answer: the ordering is preserved.*

## Setup

40,000 rows × 50 columns of real enhanced_cps_2024. Same setup as stage-1.

Autoencoder: 50 → 64 → 64 → **16** → 64 → 64 → 50 (two hidden layers in each of the encoder and decoder, ReLU activations). Fit on the holdout only (not on synthetic data) for 200 epochs, batch size 256, learning rate 1e-3. Final reconstruction MSE loss: 0.054.

For each method (ZI-QRF / ZI-MAF / ZI-QDNN) at default hyperparameters: fit on 32k train, generate 32k synthetic, and compute PRDC on capped 15k/15k samples in both the raw 50-dim feature space and the 16-dim latent space.

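For readers unfamiliar with the metric, the sketch below shows how precision, density, and coverage are commonly defined from k-nearest-neighbor balls around the real points. It is not taken from `scripts/embedding_prdc_compare.py`: the function name `prdc_sketch`, the choice `k=5`, and the toy 2-D data are assumptions for illustration, recall is omitted because the results here do not report it, and a real run would use a vectorized implementation.

```python
import math
import random


def prdc_sketch(real, fake, k=5):
    """Toy (precision, density, coverage) from k-NN balls, brute force."""
    # Radius of each real point = distance to its k-th nearest real neighbor.
    # Index k after sorting, because index 0 is the point itself (distance 0).
    radii = []
    for r in real:
        dists = sorted(math.dist(r, other) for other in real)
        radii.append(dists[k])

    # inside[j][i] is True when fake point j lies within real point i's ball.
    inside = [
        [math.dist(f, r) <= rad for r, rad in zip(real, radii)]
        for f in fake
    ]

    # Precision: share of fake points that land in at least one real ball.
    precision = sum(any(row) for row in inside) / len(fake)
    # Density: mean number of real balls containing a fake point, scaled by k.
    density = sum(sum(row) for row in inside) / (k * len(fake))
    # Coverage: share of real points whose ball contains at least one fake point.
    coverage = sum(
        any(inside[j][i] for j in range(len(fake))) for i in range(len(real))
    ) / len(real)
    return precision, density, coverage


# Hypothetical toy data: one generator matches the real distribution,
# the other collapses onto a small region near the origin.
rng = random.Random(0)
real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
good = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
collapsed = [(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for _ in range(200)]

good_scores = prdc_sketch(real, good)
collapsed_scores = prdc_sketch(real, collapsed)
print("good:     ", good_scores)
print("collapsed:", collapsed_scores)
```

In this toy setup the collapsed generator covers far fewer real neighborhoods than the matched one, which is the kind of failure the coverage column is meant to surface.
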
## Results

| Method | Raw-50 coverage | Raw-50 precision | Raw-50 density | Emb-16 coverage | Emb-16 precision | Emb-16 density |
|---|---:|---:|---:|---:|---:|---:|
| ZI-QRF | **0.348** | 0.229 | 0.118 | **0.309** | 0.291 | 0.133 |
| ZI-QDNN | 0.219 | 0.156 | 0.063 | 0.222 | 0.241 | 0.088 |
| ZI-MAF | 0.025 | 0.008 | 0.003 | 0.038 | 0.024 | 0.010 |

**Ordering preserved in both spaces: ZI-QRF > ZI-QDNN > ZI-MAF.**

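The ordering and the size of the ZI-QRF / ZI-MAF gap can be checked directly from the coverage columns. The snippet below is illustrative arithmetic only: the dictionary and function names are assumptions, and the values are copied from the results table.

```python
# Coverage values copied from the results table.
raw_50 = {"ZI-QRF": 0.348, "ZI-QDNN": 0.219, "ZI-MAF": 0.025}
emb_16 = {"ZI-QRF": 0.309, "ZI-QDNN": 0.222, "ZI-MAF": 0.038}


def ranking(coverage):
    """Method names sorted from highest to lowest coverage."""
    return sorted(coverage, key=coverage.get, reverse=True)


def gap(coverage, top="ZI-QRF", bottom="ZI-MAF"):
    """Ratio of the top method's coverage to the bottom method's."""
    return coverage[top] / coverage[bottom]


same_order = ranking(raw_50) == ranking(emb_16)
print(ranking(raw_50), same_order)
print(round(gap(raw_50), 1), round(gap(emb_16), 1))  # 13.9 8.1
```
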
## Observations

1. **The stage-1 verdict is not a metric artifact.** The concern in the scale-up protocol doc was that distances concentrate in 50 raw dimensions, so raw-feature PRDC becomes noise-dominated. The embedding variant uses 16 dimensions with more informative axes (learned from the data), a regime where nearest-neighbor metrics such as PRDC are less exposed to distance concentration. The ordering is the same, so the roughly 14× raw-space gap between ZI-QRF and ZI-MAF reflects a real quality gap, not a measurement artifact.

2. **Precision rises in embedding space for all three methods.** The AE compresses noise: random synthetic variation that looked far from real records in 50-dim now falls near them in 16-dim. Coverage moves less uniformly: it drops for ZI-QRF, possibly because the metric's radius tightens in the latent space, and is flat or slightly higher for the other two methods.

3. **ZI-QRF's edge narrows slightly.** 0.348 → 0.309 in raw → embed is a modest drop. ZI-QDNN held steady (0.219 → 0.222). ZI-MAF bumped up (0.025 → 0.038). So in the embedding space the gap compressed somewhat, but ZI-QRF is still 8× ZI-MAF (down from 14× in raw).

4. **ZI-MAF is still near-collapsed.** Even in the generous embedding space, ZI-MAF coverage is 0.038, roughly 6× to 8× below the other two. Hyperparameter tuning within the same architecture (see `docs/zi-maf-hyperparameter-search.md`) does not close this gap.

## Interpretation

The ZI-QRF / ZI-QDNN / ZI-MAF ranking is robust across:

- **Scale**: small synthetic (10 k × 7) → 5 k × 50 real → 40 k × 50 real → 77 k × 50 real.
- **PRDC sample cap**: uncapped (8 k × 32 k) and capped (15 k × 15 k).
- **Feature space**: 50 raw features and 16 learned latent dimensions.

The ranking holds along all three axes. The production default for G1 cross-section synthesis is **ZI-QRF**.

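The sample cap listed above amounts to subsampling each pool before PRDC is computed. A minimal sketch, assuming uniform sampling without replacement, a hypothetical helper `cap_rows`, and a fixed seed; the actual script may cap differently.

```python
import random


def cap_rows(rows, cap, seed=0):
    """Return rows unchanged if within the cap, else a uniform random subsample."""
    if len(rows) <= cap:
        return list(rows)
    # sample() draws without replacement, so no row appears twice.
    return random.Random(seed).sample(rows, cap)


synthetic = list(range(32_000))  # stand-in for 32k synthetic rows
small = list(range(1_000))       # stand-in for a pool already under the cap

capped_synth = cap_rows(synthetic, 15_000)
capped_small = cap_rows(small, 15_000)
print(len(capped_synth), len(capped_small))  # 15000 1000
```

Fixing the seed keeps the capped sample identical across the raw and latent runs, so any difference in PRDC comes from the feature space rather than from which rows were drawn.
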
## One thing this does not settle

Neither raw-50 nor embed-16 PRDC weighs rare cells more than bulk cells. The `sparse_coverage.csv` finding (sparse L0 selection drives rare-cell ratios to 0) is a different failure mode that neither PRDC variant measures. That finding still drives the calibrator decision (microcalibrate as mainline, not sparse reweighting). Both findings hold independently.

## Artifact

`artifacts/embedding_prdc_compare.json`: full per-method raw and embed PRDC dicts.

Reproduction:

```bash
uv run python scripts/embedding_prdc_compare.py --n-rows 40000 --output artifacts/embedding_prdc_compare.json
```

~5 minutes on a 48 GB M3.

docs/overnight-session-2026-04-16.md

Lines changed: 5 additions & 4 deletions
@@ -126,11 +126,12 @@ After the stage-1 evidence landed, I continued with the open items:
   learned 16-dim latent space. Settles whether the stage-1 ordering
   is metric-driven or method-driven. Not yet executed.
 4. **ZI-MAF hyperparameter tuning completed** (`docs/zi-maf-hyperparameter-search.md`): four configs ran on 40 k × 50. Coverage goes from 0.026 (default) to 0.033 (wide+long, 16× params + 8 layers, 28 min fit). ZI-QRF on the same data gets 0.352 in 19 s. **ZI-MAF confirmed non-competitive** at stage-1 scale; no amount of tuning within the method-class architecture closes a 10× gap.
-5. **Quickstart doc** (`docs/quickstart-rewire.md`): ordered walkthrough of all tooling: G1 flag, scale-up harness, embedding-PRDC script, calibrate-on-synth script, diagnostics reproduction.
-6. **Scripts for follow-on experiments**: `scripts/embedding_prdc_compare.py` (PRDC in learned 16-dim latent vs raw 50-dim) and `scripts/calibrate_on_synthesizer.py` (does calibration rescue weak synthesis?). Both executable, not yet run.
-7. **Method-kwargs config**: `ScaleUpStageConfig.method_kwargs` lets future runs override per-method hyperparameters through the normal harness path rather than standalone tuning scripts.
+5. **Embedding-PRDC validation completed** (`docs/embedding-prdc-validation.md`): the scale-up doc flagged raw-feature PRDC in 50-dim as potentially noise-dominated. Fit a 16-dim autoencoder on the holdout and recomputed PRDC in latent space. **Ordering preserved in both spaces: ZI-QRF > ZI-QDNN > ZI-MAF.** ZI-QRF 0.348→0.309 raw→embed; ZI-MAF 0.025→0.038 raw→embed (still near-collapsed). The stage-1 ordering is robust.
+6. **Quickstart doc** (`docs/quickstart-rewire.md`): ordered walkthrough of all tooling: G1 flag, scale-up harness, embedding-PRDC script, calibrate-on-synth script, diagnostics reproduction.
+7. **Calibrate-on-synthesizer script** (`scripts/calibrate_on_synthesizer.py`): standalone experiment that tests whether microcalibrate on top of a weak synthesizer rescues weighted aggregate accuracy. Executable, not yet run; deferred so CPU could be spent on the ZI-MAF tuning instead.
+8. **Method-kwargs config**: `ScaleUpStageConfig.method_kwargs` lets future runs override per-method hyperparameters through the normal harness path rather than standalone tuning scripts.

-Updated PR #3 count: **19 commits**, all green tests, all pushed.
+Updated PR #3 count: **20 commits**, all green tests, all pushed. Robustness checks on the synthesizer ordering finding (small-scale synth, 5k real, 40k real, 77k real, 16-dim embedding) all agree ZI-QRF wins.

 ## How to run stage 1 yourself
