Commit cdd0057

Add benchmark scaffold with shared-matrix and IPF conversion paths
Introduce a paper benchmarking scaffold that compares L0 and GREG on the same exported calibration matrix while routing IPF through a separate automatic preprocessing step that reconstructs IPF-ready unit and target inputs from the saved package metadata. The scaffold includes two R runners, manifest-driven bundle export, common scoring against the shared matrix, environment setup helpers, and end-to-end tests for the runner schemas.
1 parent 063b8fc commit cdd0057

15 files changed

Lines changed: 1850 additions & 1 deletion

Makefile

Lines changed: 29 additions & 1 deletion
@@ -1,4 +1,4 @@
1-
.PHONY: all format test test-unit test-integration install download upload docker documentation data validate-data calibrate calibrate-build publish-local-area upload-calibration upload-dataset push-to-modal build-data-modal build-matrices calibrate-modal calibrate-modal-national calibrate-both stage-h5s stage-national-h5 stage-all-h5s pipeline validate-staging validate-staging-full upload-validation check-staging check-sanity clean build paper clean-paper presentations database database-refresh promote-dataset promote build-h5s validate-local refresh-soi-targets push-pr-branch
1+
.PHONY: all format test test-unit test-integration install download upload docker documentation data validate-data calibrate calibrate-build publish-local-area upload-calibration upload-dataset push-to-modal build-data-modal build-matrices calibrate-modal calibrate-modal-national calibrate-both stage-h5s stage-national-h5 stage-all-h5s pipeline validate-staging validate-staging-full upload-validation check-staging check-sanity clean build paper clean-paper presentations database database-refresh promote-dataset promote build-h5s validate-local refresh-soi-targets push-pr-branch benchmarking-install-python benchmarking-install-r benchmarking-export benchmarking-run-l0 benchmarking-run-greg benchmarking-run-ipf
22

33
SOI_SOURCE_YEAR ?= 2021
44
SOI_TARGET_YEAR ?= 2023
@@ -13,6 +13,8 @@ BRANCH ?= $(shell git rev-parse --abbrev-ref HEAD)
1313
NUM_WORKERS ?= 8
1414
N_CLONES ?= 430
1515
VERSION ?=
16+
MANIFEST ?=
17+
RUN_DIR ?=
1618
SOI_SOURCE_YEAR ?= 2021
1719
SOI_TARGET_YEAR ?= 2023
1820

@@ -37,6 +39,32 @@ install:
3739
pip install policyengine-us
3840
pip install -e ".[dev]" --config-settings editable_mode=compat
3941

42+
benchmarking-install-python:
43+
pip install -r paper-l0/benchmarking/requirements-python.txt
44+
45+
benchmarking-install-r:
46+
Rscript paper-l0/benchmarking/install_r_packages.R
47+
48+
benchmarking-export:
49+
python paper-l0/benchmarking/benchmark_cli.py export \
50+
--manifest $(MANIFEST) \
51+
--output-dir $(RUN_DIR)
52+
53+
benchmarking-run-l0:
54+
python paper-l0/benchmarking/benchmark_cli.py run \
55+
--method l0 \
56+
--run-dir $(RUN_DIR)
57+
58+
benchmarking-run-greg:
59+
python paper-l0/benchmarking/benchmark_cli.py run \
60+
--method greg \
61+
--run-dir $(RUN_DIR)
62+
63+
benchmarking-run-ipf:
64+
python paper-l0/benchmarking/benchmark_cli.py run \
65+
--method ipf \
66+
--run-dir $(RUN_DIR)
67+
4068
changelog:
4169
python .github/bump_version.py
4270
towncrier build --yes --version $$(python -c "import re; print(re.search(r'version = \"(.+?)\"', open('pyproject.toml').read()).group(1))")

paper-l0/benchmarking/README.md

Lines changed: 240 additions & 0 deletions
@@ -0,0 +1,240 @@
1+
# Benchmarking Scaffold
2+
3+
This directory contains the implementation scaffold for benchmarking the
4+
`L0` calibration pipeline against:
5+
6+
- `GREG` via R's `survey` package
7+
- `IPF` via R's `surveysd` package
8+
9+
## Experimental Setup
10+
11+
The benchmark is organized around one shared exported bundle and multiple
12+
method adapters.
13+
14+
- `L0` and `GREG` are compared on the shared calibration representation:
15+
a sparse target-by-unit matrix, the selected target table, and
16+
initial .npy weights.
17+
- `IPF` is benchmarked from the same target selection, but it requires a
18+
conversion step because `surveysd::ipf` consumes a microdata table plus
19+
IPF constraints rather than a generic sparse linear system.
20+
- The intended benchmark tiers are:
21+
- a practical reduced-size comparison tier, used for like-for-like `L0`
22+
versus `GREG` runs that are small enough to execute routinely during
23+
development
24+
- an IPF-focused reduced-size tier on count-style targets, used because
25+
classical `IPF` is most naturally evaluated on count or indicator margins
26+
rather than the full arbitrary target set
27+
- a scaling ladder over increasing target counts, used to show how runtime,
28+
memory use, convergence, and outright failure change as the benchmark moves
29+
from small target subsets toward the full calibration problem
30+
- a production-feasibility tier, used to test which methods can still run at
31+
something close to the full production clone count and target volume
32+
33+
Methodologically, the benchmark treats the methods as related but not
34+
identical:
35+
36+
- `L0` and `GREG` can consume arbitrary linear calibration targets.
37+
- `IPF` is most natural for count-style or indicator-style targets, so the
38+
current automatic conversion path supports `person_count` and
39+
`household_count`.
40+
41+
The core workflow is:
42+
43+
1. select a benchmark target subset with a manifest
44+
2. export a shared benchmark bundle from a saved calibration package
45+
3. auto-convert the bundle to IPF inputs when needed
46+
4. run `L0`, `GREG`, or `IPF`
47+
5. score all fitted weights against the same shared target matrix
48+
49+
## Layout
50+
51+
- `benchmark_cli.py`
52+
Main CLI for exporting benchmark bundles and running methods.
53+
- `benchmark_manifest.py`
54+
Manifest schema and target-filter logic.
55+
- `benchmark_export.py`
56+
Export utilities for shared benchmark artifacts.
57+
- `ipf_conversion.py`
58+
Automatic conversion from the saved calibration package to IPF-ready
59+
unit and target metadata.
60+
- `benchmark_metrics.py`
61+
Common diagnostics and summary generation.
62+
- `runners/greg_runner.R`
63+
R backend for `survey`-based GREG.
64+
- `runners/ipf_runner.R`
65+
R backend for `surveysd`-based IPF.
66+
- `runners/read_npy.R`
67+
Minimal `.npy` reader used by the R scripts.
68+
- `requirements-python.txt`
69+
Python dependencies for the benchmarking scaffold.
70+
- `install_r_packages.R`
71+
Installs the required R packages for the benchmark runners.
72+
- `manifests/*.example.json`
73+
Example benchmark manifests.
74+
75+
## Environment Setup
76+
77+
Python:
78+
79+
```bash
pip install -r paper-l0/benchmarking/requirements-python.txt
```
82+
83+
R:
84+
85+
```bash
Rscript paper-l0/benchmarking/install_r_packages.R
```
88+
89+
Or, from the repo root:
90+
91+
```bash
make benchmarking-install-python
make benchmarking-install-r
```
95+
96+
## Chosen Interchange Formats
97+
98+
- sparse matrix: Matrix Market `.mtx`
99+
- target metadata: `.csv`
100+
- unit metadata: `.csv`
101+
- initial weights: `.npy`
102+
- benchmark manifest: `.json`
103+
- method result summary: `.json`
104+
- fitted weights: `.npy`
105+
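Matrix Market coordinate files are plain text, which is part of what makes them convenient to exchange between Python and R. The sketch below parses a small `coordinate real general` file using only the standard library. The helper name is hypothetical and the parser ignores symmetric or pattern variants; in practice a library reader such as `scipy.io.mmread` or R's `Matrix::readMM` would normally do this work.

```python
import io
from typing import List, TextIO, Tuple


def read_mtx_coordinate(
    handle: TextIO,
) -> Tuple[Tuple[int, int], List[Tuple[int, int, float]]]:
    """Parse a 'coordinate real general' Matrix Market file.

    Returns the (rows, cols) shape and a list of 0-based
    (row, col, value) triplets.
    """
    header = handle.readline().strip().lower()
    if not header.startswith("%%matrixmarket matrix coordinate"):
        raise ValueError("only coordinate-format files are handled here")

    # Drop comment lines (leading '%') and blank lines.
    stripped = (line.strip() for line in handle)
    data_lines = [line for line in stripped if line and not line.startswith("%")]

    # The first data line is the size line: rows, columns, stored entries.
    n_rows, n_cols, n_entries = (int(tok) for tok in data_lines[0].split())

    triplets = []
    for line in data_lines[1 : 1 + n_entries]:
        row, col, value = line.split()
        # Matrix Market indices are 1-based; convert to 0-based.
        triplets.append((int(row) - 1, int(col) - 1, float(value)))
    return (n_rows, n_cols), triplets


sample = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "% 2 targets by 3 units\n"
    "2 3 4\n"
    "1 1 1.0\n"
    "1 2 1.0\n"
    "2 2 2.0\n"
    "2 3 1.0\n"
)
shape, triplets = read_mtx_coordinate(sample)
print(shape, triplets)
```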
106+
## Notes
107+
108+
### Shared calibration package
109+
110+
The exporter reads the saved calibration package directly from pickle rather
111+
than importing the full calibration CLI. This keeps the benchmark I/O path
112+
lightweight.
113+
114+
### IPF inputs
115+
116+
The exporter auto-generates IPF inputs when the manifest includes `ipf`
117+
and no external overrides are supplied. It reconstructs an IPF microdata table
118+
from:
119+
120+
- the saved calibration package
121+
- the package metadata's `dataset_path`
122+
- the package metadata's `db_path`
123+
- the selected count-like targets and their stratum constraints
124+
125+
The generated `unit_metadata.csv` is currently built for `person_count` and
126+
`household_count` targets. It expands cloned households to a person-level table
127+
when person targets are present, carries a repeated household `unit_index`, and
128+
adds one derived indicator column per selected target. The generated
129+
`ipf_target_metadata.csv` then references those indicator columns as numerical
130+
IPF totals.
131+
132+
External CSVs are still supported through `external_inputs.*` and override the
133+
automatic conversion path when provided.
134+
135+
### IPF conversion step by step
136+
137+
The IPF conversion is implemented in
138+
[ipf_conversion.py](ipf_conversion.py)
139+
and runs during `benchmark_cli.py export`. Steps 10 and 11 below complete the
path when `benchmark_cli.py run --method ipf` executes.
140+
141+
1. Load the saved calibration package and apply the manifest target filters.
142+
2. Read `dataset_path`, `db_path`, and `n_clones` from the package metadata.
143+
3. Query `stratum_constraints` for the selected targets from the target DB.
144+
4. Identify the source variables needed to evaluate those constraints, such as
145+
`age`, `snap`, or `medicaid_enrolled`.
146+
5. Reconstruct the cloned household universe from `initial_weights`,
147+
`block_geoid`, and `cd_geoid`. This yields one benchmark unit per matrix
148+
column.
149+
6. If any selected IPF target is `person_count`, expand that cloned household
150+
universe to a person-level table using the source dataset's person-to-
151+
household links. Multiple person rows may therefore share the same
152+
household-clone `unit_index`.
153+
7. Calculate the needed source variables from the dataset and attach them to
154+
the IPF unit table.
155+
8. For each selected target, evaluate its original stratum logic row by row and
156+
materialize the result as a derived indicator column such as
157+
`ipf_indicator_00000`.
158+
9. Write `ipf_target_metadata.csv` so each selected target becomes a
159+
`numeric_total` IPF constraint over one of those derived indicator columns.
160+
10. Run `surveysd::ipf` on the generated unit table and target metadata.
161+
11. Collapse the fitted IPF row weights back to one weight per shared benchmark
162+
`unit_index`, so the fitted result can be scored against the same sparse
163+
calibration matrix used by `L0` and `GREG`.
164+
165+
This means the benchmark uses one common scoring space even though `IPF`
166+
requires a richer input representation than `L0` and `GREG`.
167+
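Steps 8 and 11 are the two places where the IPF path departs from and then rejoins the shared representation, so a small sketch may help. Everything below is illustrative rather than taken from `ipf_conversion.py`: the `(variable, operator, value)` constraint tuples, the helper names, and in particular the rule of averaging row weights within a `unit_index` are assumptions.

```python
import operator
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical constraint encoding: (variable, operator symbol, value).
Constraint = Tuple[str, str, float]

OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def indicator_column(rows: Sequence[dict], constraints: Sequence[Constraint]) -> List[int]:
    """Step 8 sketch: 1 where a row satisfies every constraint, else 0."""
    return [
        int(all(OPS[op](row[var], val) for var, op, val in constraints))
        for row in rows
    ]


def collapse_to_units(unit_index: Sequence[int], row_weights: Sequence[float]) -> Dict[int, float]:
    """Step 11 sketch: one weight per unit_index.

    Assumption: rows sharing a unit_index are averaged. The real
    collapse rule may differ.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for unit, weight in zip(unit_index, row_weights):
        sums[unit] += weight
        counts[unit] += 1
    return {unit: sums[unit] / counts[unit] for unit in sums}


# Person-level table: two people in household-clone 0, one in clone 1.
persons = [
    {"unit_index": 0, "age": 10},
    {"unit_index": 0, "age": 40},
    {"unit_index": 1, "age": 70},
]

# A made-up target: count of persons aged 18 to 64.
indicators = indicator_column(persons, [("age", ">=", 18), ("age", "<", 65)])
for person, flag in zip(persons, indicators):
    person["ipf_indicator_00000"] = flag

# Pretend these are fitted IPF row weights.
collapsed = collapse_to_units([p["unit_index"] for p in persons], [2.0, 2.0, 5.0])
print(indicators, collapsed)
```

The collapsed dictionary has one entry per matrix column, which is what allows the IPF result to be scored with the same `X w` product as the other methods.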
168+
### Why the IPF conversion exists
169+
170+
`L0` and `GREG` can work directly with a sparse linear system of the form
171+
`X w = t`.
172+
173+
Classical `IPF` does not start from that object. It expects:
174+
175+
- a unit-record table
176+
- categorical or indicator variables on that table
177+
- target totals over those variables
178+
179+
So the benchmark exporter translates selected count-style calibration targets
180+
into that IPF-friendly representation instead of trying to feed the sparse
181+
matrix directly into `surveysd::ipf`.
182+
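To make the contrast concrete, here is a textbook raking loop over categorical margins. It is a sketch of the classical algorithm only, not of how `surveysd::ipf` is implemented, and the function name, argument layout, and toy data are assumptions. Note that the inputs are a unit-record table and per-variable totals, with no matrix `X` in sight.

```python
from typing import Dict, List, Sequence


def rake(
    records: Sequence[dict],
    weights: Sequence[float],
    margins: Dict[str, Dict[str, float]],
    n_iter: int = 50,
    tol: float = 1e-9,
) -> List[float]:
    """Classical iterative proportional fitting over categorical margins.

    margins maps each variable name to {category: target_total}.
    Assumes every category in the data has a positive weighted total.
    """
    w = list(weights)
    for _ in range(n_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total for each category of this variable.
            totals: Dict[str, float] = {}
            for rec, wi in zip(records, w):
                totals[rec[var]] = totals.get(rec[var], 0.0) + wi
            # Rescale each row so the category totals hit their targets.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w


records = [
    {"sex": "f", "age": "young"},
    {"sex": "f", "age": "old"},
    {"sex": "m", "age": "young"},
    {"sex": "m", "age": "old"},
]
margins = {
    "sex": {"f": 60.0, "m": 40.0},
    "age": {"young": 70.0, "old": 30.0},
}
fitted = rake(records, [1.0, 1.0, 1.0, 1.0], margins)
print(fitted)
```

Each pass touches one margin at a time, which is why count-style or indicator-style targets map onto IPF naturally while arbitrary linear targets do not.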
183+
### IPF target metadata schema
184+
185+
`ipf_runner.R` supports two target metadata encodings:
186+
187+
- `numeric_total`
188+
One row per target with:
189+
- `scope`: `person` or `household`
190+
- `target_type`: `numeric_total`
191+
- `value_column`: unit-data column to calibrate
192+
- `variables`: grouping variables used to wrap the numeric total in a one-cell
193+
or multi-cell array
194+
- `cell`: pipe-separated assignments for the target cell
195+
- `target_value`: numeric total
196+
- `categorical_margin`
197+
One row per margin cell with:
198+
- `scope`: `person` or `household`
199+
- `target_type`: `categorical_margin`
200+
- `margin_id`: identifier for a margin table
201+
- `variables`: pipe-separated variable names, e.g. `district_id|age_bin`
202+
- `cell`: pipe-separated assignments, e.g.
203+
`district_id=0601|age_bin=18_24`
204+
- `target_value`: numeric target
205+
206+
The automatic conversion path currently emits `numeric_total` rows.
207+
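The pipe-separated `variables` and `cell` fields can be decoded with a few lines of Python. The sketch below is an assumption about how such a row might be read, not the logic in `ipf_runner.R`: the helper names are hypothetical, the consistency check between `variables` and `cell` is an invented validation rule, and the `target_value` in the example row is a made-up number.

```python
from typing import Dict


def parse_cell(cell: str) -> Dict[str, str]:
    """Split 'a=1|b=2' into {'a': '1', 'b': '2'}.

    Values stay as strings so codes such as '0601' keep leading zeros.
    """
    assignments: Dict[str, str] = {}
    for part in cell.split("|"):
        name, sep, value = part.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"malformed cell assignment: {part!r}")
        assignments[name.strip()] = value.strip()
    return assignments


def parse_margin_row(row: Dict[str, str]) -> dict:
    """Decode one metadata row and check that cell keys match variables."""
    variables = row["variables"].split("|")
    cell = parse_cell(row["cell"])
    if set(cell) != set(variables):
        raise ValueError("cell assignments do not match the declared variables")
    return {
        "scope": row["scope"],
        "target_type": row["target_type"],
        "variables": variables,
        "cell": cell,
        "target_value": float(row["target_value"]),
    }


example_row = {
    "scope": "person",
    "target_type": "categorical_margin",
    "margin_id": "district_by_age",  # hypothetical identifier
    "variables": "district_id|age_bin",
    "cell": "district_id=0601|age_bin=18_24",
    "target_value": "1234",  # illustrative value only
}
parsed = parse_margin_row(example_row)
print(parsed)
```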
208+
## Example Commands
209+
210+
Export a benchmark bundle:
211+
212+
```bash
python paper-l0/benchmarking/benchmark_cli.py export \
  --manifest paper-l0/benchmarking/manifests/greg_demo_small.example.json \
  --output-dir paper-l0/benchmarking/runs/greg_demo_small
```
217+
218+
Run a GREG benchmark from an exported bundle:
219+
220+
```bash
python paper-l0/benchmarking/benchmark_cli.py run \
  --method greg \
  --run-dir paper-l0/benchmarking/runs/greg_demo_small
```
225+
226+
Run `L0` on an exported bundle:
227+
228+
```bash
python paper-l0/benchmarking/benchmark_cli.py run \
  --method l0 \
  --run-dir paper-l0/benchmarking/runs/greg_demo_small
```
233+
234+
Equivalent root Make targets:
235+
236+
```bash
make benchmarking-export MANIFEST=paper-l0/benchmarking/manifests/greg_demo_small.example.json RUN_DIR=paper-l0/benchmarking/runs/greg_demo_small
make benchmarking-run-greg RUN_DIR=paper-l0/benchmarking/runs/greg_demo_small
make benchmarking-run-l0 RUN_DIR=paper-l0/benchmarking/runs/greg_demo_small
```
