This repository was archived by the owner on Jun 14, 2026. It is now read-only.

Commit af62615

MaxGhenis and claude committed
Add ScaleUpRunner harness for synthesizer scale-up benchmark
Implements the stage-1/2/3 protocol from docs/synthesizer-benchmark-scale-up.md as a real, runnable harness.

Components:

- src/microplex_us/bakeoff/scale_up.py
  * ScaleUpStageConfig: frozen dataclass with a curated 50-column default (14 demographic columns + 36 income/wealth/benefit targets)
  * ScaleUpRunner: load_frame, split, fit_and_generate, run
  * _load_enhanced_cps: entity-aware loader that broadcasts household / SPM-unit / tax-unit / family / marital-unit variables down to person level via person_<entity>_id -> <entity>_id lookups
  * Per-method metrics: PRDC precision/density/coverage (via the prdc library), wall time, peak RSS, rare-cell preservation ratios (elderly self-employed, young dividend, disabled SSDI, top-1% employment), and zero-rate MAE
  * CLI: python -m microplex_us.bakeoff.scale_up --stage stage1 ...
  * Stage configs: stage1 (~77k rows from ECPS), stage2 (1M rows, needs a larger source), stage3 (v6 seed-ready, 3.4M x 155)
- tests/bakeoff/test_scale_up.py
  * Smoke tests on a 500-row, 5-column, ZI-QRF-only slice
  * Entity-broadcast verification via real ECPS loading
  * Column-missing error path
  * Default column-set sanity check

Notable limitations recorded for follow-up:

- state_fips, snap_reported, net_worth, housing_assistance, and other non-person entity variables are now correctly broadcast to person level via ID lookup. This had been the blocker for building a flat DataFrame.
- enhanced_cps_2024 has 77k persons, not the 100k stage-1 target; n_rows=None now uses all available rows.
- is_household_head is not in ECPS; it has been replaced with is_separated.

Not in this commit (deliberate):

- No stage1 / stage2 / stage3 runs executed yet
- No CTGAN / TVAE support (present in the registry, but not in the default method set)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
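The entity broadcast that `_load_enhanced_cps` performs can be pictured roughly as below. This is a minimal pandas sketch, not the commit's code: the helper name `broadcast_to_person` and the toy tables are hypothetical, and the only thing taken from the commit is the `person_<entity>_id` -> `<entity>_id` naming.

```python
import pandas as pd


def broadcast_to_person(
    persons: pd.DataFrame,
    entity: pd.DataFrame,
    entity_name: str,
    columns: list[str],
) -> pd.DataFrame:
    """Copy entity-level columns onto every person row (hypothetical helper).

    Assumes ``persons`` has a ``person_<entity_name>_id`` column and ``entity``
    has a unique ``<entity_name>_id`` column.
    """
    link_col = f"person_{entity_name}_id"
    key_col = f"{entity_name}_id"
    if entity[key_col].duplicated().any():
        raise ValueError(f"{key_col} is not unique in the {entity_name} table")
    # Index the entity table by its ID so each column becomes an ID -> value lookup.
    lookup = entity.set_index(key_col)[columns]
    unmatched = ~persons[link_col].isin(lookup.index)
    if unmatched.any():
        raise KeyError(f"{int(unmatched.sum())} persons have no matching {key_col}")
    out = persons.copy()
    for col in columns:
        # Series.map with a Series argument looks each person's link ID up in its index.
        out[col] = persons[link_col].map(lookup[col])
    return out


persons = pd.DataFrame({"person_id": [1, 2, 3], "person_household_id": [10, 10, 20]})
households = pd.DataFrame({"household_id": [10, 20], "state_fips": [6, 36]})
flat = broadcast_to_person(persons, households, "household", ["state_fips"])
print(flat[["person_id", "state_fips"]])
```

The same call would be repeated once per entity (for example `"spm_unit"` or `"tax_unit"`), assuming those ID columns follow the same naming pattern. Failing loudly on unmatched or duplicate IDs keeps a bad join from silently producing NaNs in the flat frame.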
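The commit does not spell out how the rare-cell ratios or the zero-rate MAE are computed. One plausible reading, sketched below under that assumption, treats a rare-cell ratio as the synthetic share of rows in a cell divided by the real share (1.0 meaning the cell is preserved) and the zero-rate MAE as the mean absolute gap between real and synthetic per-column zero fractions. The column names `age`, `self_employment_income`, and `dividend_income` and the 65+ age cutoff are illustrative assumptions.

```python
from typing import Callable

import pandas as pd

Mask = Callable[[pd.DataFrame], pd.Series]


def rare_cell_ratio(real: pd.DataFrame, synth: pd.DataFrame, in_cell: Mask) -> float:
    """Synthetic cell share divided by real cell share (assumed definition)."""
    real_share = float(in_cell(real).mean())
    if real_share == 0.0:
        return float("nan")  # cell absent from the real data: ratio undefined
    return float(in_cell(synth).mean()) / real_share


def zero_rate_mae(real: pd.DataFrame, synth: pd.DataFrame, cols: list[str]) -> float:
    """Mean absolute gap between real and synthetic zero fractions (assumed)."""
    real_zero = (real[cols] == 0).mean()
    synth_zero = (synth[cols] == 0).mean()
    return float((real_zero - synth_zero).abs().mean())


def elderly_self_employed(df: pd.DataFrame) -> pd.Series:
    # Illustrative cell: age 65+ with nonzero self-employment income.
    return (df["age"] >= 65) & (df["self_employment_income"] != 0)


real = pd.DataFrame({
    "age": [70, 30, 68, 40],
    "self_employment_income": [1000.0, 0.0, 0.0, 500.0],
    "dividend_income": [0.0, 0.0, 50.0, 0.0],
})
synth = pd.DataFrame({
    "age": [72, 25, 66, 45],
    "self_employment_income": [300.0, 0.0, 0.0, 800.0],
    "dividend_income": [0.0, 10.0, 20.0, 0.0],
})
print(rare_cell_ratio(real, synth, elderly_self_employed))  # 1.0
print(zero_rate_mae(real, synth, ["self_employment_income", "dividend_income"]))  # 0.125
```

A ratio well below 1.0 would flag a synthesizer that washes out a small subgroup, which aggregate metrics like PRDC can miss.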
1 parent a408fb4 commit af62615

4 files changed

Lines changed: 861 additions & 0 deletions


Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
"""Scale-up benchmark harness for synthesizer comparison.

Implements the stage-1/2/3 scale-up protocol from
`docs/synthesizer-benchmark-scale-up.md`: load real enhanced_cps_2024,
sub-sample to the stage's row count, fit each specified synthesizer on the
conditioning + target column set, and report PRDC coverage, training wall
time, peak RSS, and rare-cell preservation.

Use from the CLI:

    uv run python -m microplex_us.bakeoff.scale_up \\
        --stage stage1 \\
        --methods ZI-QRF ZI-MAF ZI-QDNN \\
        --output artifacts/scale_up_stage1.json

or programmatically:

    from microplex_us.bakeoff import ScaleUpRunner, stage1_config
    runner = ScaleUpRunner(stage1_config())
    results = runner.run()
"""

from microplex_us.bakeoff.scale_up import (
    ScaleUpResult,
    ScaleUpRunner,
    ScaleUpStageConfig,
    DEFAULT_CONDITION_COLS,
    DEFAULT_TARGET_COLS,
    stage1_config,
    stage2_config,
    stage3_config,
)

__all__ = [
    "ScaleUpResult",
    "ScaleUpRunner",
    "ScaleUpStageConfig",
    "DEFAULT_CONDITION_COLS",
    "DEFAULT_TARGET_COLS",
    "stage1_config",
    "stage2_config",
    "stage3_config",
]
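The commit computes PRDC through the `prdc` library. For intuition, the precision, density, and coverage terms can be written out directly in NumPy following the k-nearest-neighbour definitions that library uses: a point counts as "inside" a real sample's ball when it is strictly closer than that sample's k-th nearest real neighbour. The function names below are hypothetical, and this dense-matrix version is only an illustration; its memory grows with n_real × n_fake, so it is not suitable for stage-sized data.

```python
import numpy as np


def pairwise_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distance between every row of ``a`` and every row of ``b``."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def prdc_sketch(real: np.ndarray, fake: np.ndarray, k: int = 5) -> dict[str, float]:
    """Precision, density, and coverage from k-NN balls around real samples."""
    if len(real) <= k:
        raise ValueError("need more than k real samples")
    # Radius of each real sample's ball: distance to its k-th nearest real
    # neighbour. Column 0 after sorting is the sample itself (distance 0).
    radii = np.sort(pairwise_distances(real, real), axis=1)[:, k]
    real_fake = pairwise_distances(real, fake)  # shape (n_real, n_fake)
    inside = real_fake < radii[:, None]  # inside[i, j]: fake j is in real ball i
    return {
        # Share of fake samples that land in at least one real ball.
        "precision": float(inside.any(axis=0).mean()),
        # Average number of real balls containing each fake sample, divided by k.
        "density": float(inside.sum(axis=0).mean() / k),
        # Share of real samples whose ball contains at least one fake sample.
        "coverage": float((real_fake.min(axis=1) < radii).mean()),
    }


rng = np.random.default_rng(0)
real = rng.normal(size=(200, 3))
good = rng.normal(size=(200, 3))  # same distribution as the real data
shifted = good + 20.0  # far from the real data
print(prdc_sketch(real, good))
print(prdc_sketch(real, shifted))  # all three metrics are 0.0
```

Coverage is the metric the docstring highlights because it drops when a synthesizer collapses onto a few modes, whereas precision can stay high in that case.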
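The docstring's "sub-sample to the stage's row count" step, together with the commit's note that `n_rows=None` uses all available rows, could look like the sketch below. `StageConfigSketch`, its fields, and `subsample` are hypothetical stand-ins, not the real `ScaleUpStageConfig` API; returning every row when `n_rows` exceeds the available count is also an assumption (the real harness may raise instead).

```python
from dataclasses import dataclass, replace
from typing import Optional

import pandas as pd


@dataclass(frozen=True)
class StageConfigSketch:
    """Hypothetical stand-in for ScaleUpStageConfig; field names are assumed."""

    name: str
    n_rows: Optional[int] = None  # None -> use every available row
    methods: tuple[str, ...] = ("ZI-QRF",)
    seed: int = 0


def subsample(frame: pd.DataFrame, config: StageConfigSketch) -> pd.DataFrame:
    """Draw the stage's row count from ``frame`` (hypothetical helper)."""
    if config.n_rows is None or config.n_rows >= len(frame):
        # Assumption: asking for more rows than exist returns all of them.
        return frame.reset_index(drop=True)
    sampled = frame.sample(n=config.n_rows, random_state=config.seed)
    return sampled.reset_index(drop=True)


frame = pd.DataFrame({"age": range(10)})
stage = StageConfigSketch(name="stage1")
print(len(subsample(frame, stage)))  # 10
small = replace(stage, n_rows=4)
print(len(subsample(frame, small)))  # 4
```

Because the dataclass is frozen, a variant stage is derived with `dataclasses.replace` rather than by mutating a shared config, so one run cannot accidentally alter the settings of another.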
