| 1 | +# Benchmarking Scaffold |
| 2 | + |
| 3 | +This directory contains the implementation scaffold for benchmarking the |
| 4 | +`L0` calibration pipeline against: |
| 5 | + |
| 6 | +- `GREG` via R's `survey` package |
| 7 | +- `IPF` via R's `surveysd` package |
| 8 | + |
| 9 | +## Experimental Setup |
| 10 | + |
| 11 | +The benchmark is organized around one shared exported bundle and multiple |
| 12 | +method adapters. |
| 13 | + |
| 14 | +- `L0` and `GREG` are compared on the shared calibration representation: |
| 15 | + a sparse target-by-unit matrix, the selected target table, and |
| 16 | +  the initial weights (stored as `.npy`). |
| 17 | +- `IPF` is benchmarked from the same target selection, but it requires a |
| 18 | + conversion step because `surveysd::ipf` consumes a microdata table plus |
| 19 | + IPF constraints rather than a generic sparse linear system. |
| 20 | +- The intended benchmark tiers are: |
| 21 | + - a practical reduced-size comparison tier, used for like-for-like `L0` |
| 22 | + versus `GREG` runs that are small enough to execute routinely during |
| 23 | + development |
| 24 | + - an IPF-focused reduced-size tier on count-style targets, used because |
| 25 | + classical `IPF` is most naturally evaluated on count or indicator margins |
| 26 | + rather than the full arbitrary target set |
| 27 | + - a scaling ladder over increasing target counts, used to show how runtime, |
| 28 | + memory use, convergence, and outright failure change as the benchmark moves |
| 29 | + from small target subsets toward the full calibration problem |
| 30 | + - a production-feasibility tier, used to test which methods can still run at |
| 31 | + something close to the full production clone count and target volume |
| 32 | + |
| 33 | +Methodologically, the benchmark treats the methods as related but not |
| 34 | +identical: |
| 35 | + |
| 36 | +- `L0` and `GREG` can consume arbitrary linear calibration targets. |
| 37 | +- `IPF` is most natural for count-style or indicator-style targets, so the |
| 38 | + current automatic conversion path supports `person_count` and |
| 39 | + `household_count`. |
| 40 | + |
| 41 | +The core workflow is: |
| 42 | + |
| 43 | +1. select a benchmark target subset with a manifest |
| 44 | +2. export a shared benchmark bundle from a saved calibration package |
| 45 | +3. auto-convert the bundle to IPF inputs when needed |
| 46 | +4. run `L0`, `GREG`, or `IPF` |
| 47 | +5. score all fitted weights against the same shared target matrix |
| 48 | + |
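Step 5 is what keeps the comparison like-for-like: every method's fitted weights are pushed through the same target matrix and compared with the same target values. The following is a minimal sketch of that idea, not the code in `benchmark_metrics.py`. It assumes the matrix is held as zero-based `(target_index, unit_index, value)` triplets and that relative error is the metric of interest; the function name `score_weights` and the choice of metric are illustrative assumptions.

```python
def score_weights(triplets, weights, targets):
    """Return per-target relative errors of X @ w against t.

    triplets: iterable of (target_index, unit_index, value) entries of the
              sparse target-by-unit matrix X (zero-based indices).
    weights:  fitted weight per unit (one per matrix column).
    targets:  target value per matrix row.
    """
    estimates = [0.0] * len(targets)
    for row, col, value in triplets:
        estimates[row] += value * weights[col]
    # Relative error is undefined for a zero target, so report None there.
    return [
        (est - t) / t if t != 0 else None
        for est, t in zip(estimates, targets)
    ]


# Two targets over three units (made-up numbers).
X = [(0, 0, 1.0), (0, 1, 1.0), (1, 1, 2.0), (1, 2, 1.0)]
t = [30.0, 50.0]
w = [10.0, 20.0, 5.0]
errors = score_weights(X, w, t)
```

Because the function only needs a weight vector with one entry per matrix column, the same call can score `L0`, `GREG`, and collapsed `IPF` weights alike.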
| 49 | +## Layout |
| 50 | + |
| 51 | +- `benchmark_cli.py` |
| 52 | + Main CLI for exporting benchmark bundles and running methods. |
| 53 | +- `benchmark_manifest.py` |
| 54 | + Manifest schema and target-filter logic. |
| 55 | +- `benchmark_export.py` |
| 56 | + Export utilities for shared benchmark artifacts. |
| 57 | +- `ipf_conversion.py` |
| 58 | + Automatic conversion from the saved calibration package to IPF-ready |
| 59 | + unit and target metadata. |
| 60 | +- `benchmark_metrics.py` |
| 61 | + Common diagnostics and summary generation. |
| 62 | +- `runners/greg_runner.R` |
| 63 | + R backend for `survey`-based GREG. |
| 64 | +- `runners/ipf_runner.R` |
| 65 | + R backend for `surveysd`-based IPF. |
| 66 | +- `runners/read_npy.R` |
| 67 | + Minimal `.npy` reader used by the R scripts. |
| 68 | +- `requirements-python.txt` |
| 69 | + Python dependencies for the benchmarking scaffold. |
| 70 | +- `install_r_packages.R` |
| 71 | + Installs the required R packages for the benchmark runners. |
| 72 | +- `manifests/*.example.json` |
| 73 | + Example benchmark manifests. |
| 74 | + |
| 75 | +## Environment Setup |
| 76 | + |
| 77 | +Python: |
| 78 | + |
| 79 | +```bash |
| 80 | +pip install -r paper-l0/benchmarking/requirements-python.txt |
| 81 | +``` |
| 82 | + |
| 83 | +R: |
| 84 | + |
| 85 | +```bash |
| 86 | +Rscript paper-l0/benchmarking/install_r_packages.R |
| 87 | +``` |
| 88 | + |
| 89 | +Or, from the repo root: |
| 90 | + |
| 91 | +```bash |
| 92 | +make benchmarking-install-python |
| 93 | +make benchmarking-install-r |
| 94 | +``` |
| 95 | + |
| 96 | +## Chosen Interchange Formats |
| 97 | + |
| 98 | +- sparse matrix: Matrix Market `.mtx` |
| 99 | +- target metadata: `.csv` |
| 100 | +- unit metadata: `.csv` |
| 101 | +- initial weights: `.npy` |
| 102 | +- benchmark manifest: `.json` |
| 103 | +- method result summary: `.json` |
| 104 | +- fitted weights: `.npy` |
| 105 | + |
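For readers unfamiliar with Matrix Market, the sketch below parses the coordinate layout using only the standard library. It is meant to show what the `.mtx` file contains, under the assumption that the exported matrix uses the `coordinate real general` variant; the function name `read_mtx_coordinate` is hypothetical. In practice a library reader such as `scipy.io.mmread` in Python or `Matrix::readMM` in R is the more likely choice.

```python
import io


def read_mtx_coordinate(handle):
    """Parse a real-valued Matrix Market coordinate file into triplets.

    Returns (n_rows, n_cols, entries), where entries holds zero-based
    (row, col, value) tuples. Only 'coordinate real general' is handled.
    """
    header = handle.readline().lower().split()
    expected = ["%%matrixmarket", "matrix", "coordinate", "real", "general"]
    if header != expected:
        raise ValueError("unsupported Matrix Market header")
    # Drop comment lines (leading '%') and blank lines.
    body = [ln.split() for ln in handle if ln.strip() and not ln.startswith("%")]
    if not body:
        raise ValueError("missing size line")
    n_rows, n_cols, n_entries = (int(x) for x in body[0])
    # Indices in the file are one-based; convert to zero-based.
    entries = [(int(i) - 1, int(j) - 1, float(v)) for i, j, v in body[1:]]
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


sample = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "% 2 targets by 3 units\n"
    "2 3 3\n"
    "1 1 1.0\n"
    "1 2 1.0\n"
    "2 3 2.5\n"
)
n_rows, n_cols, entries = read_mtx_coordinate(sample)
```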
| 106 | +## Notes |
| 107 | + |
| 108 | +### Shared calibration package |
| 109 | + |
| 110 | +The exporter reads the saved calibration package directly from pickle rather |
| 111 | +than importing the full calibration CLI. This keeps the benchmark I/O path |
| 112 | +lightweight. |
| 113 | + |
| 114 | +### IPF inputs |
| 115 | + |
| 116 | +The exporter now auto-generates IPF inputs when the manifest includes `ipf` |
| 117 | +and no external overrides are supplied. It reconstructs an IPF microdata table |
| 118 | +from: |
| 119 | + |
| 120 | +- the saved calibration package |
| 121 | +- the package metadata's `dataset_path` |
| 122 | +- the package metadata's `db_path` |
| 123 | +- the selected count-like targets and their stratum constraints |
| 124 | + |
| 125 | +The generated `unit_metadata.csv` is currently built for `person_count` and |
| 126 | +`household_count` targets. It expands cloned households to a person-level table |
| 127 | +when person targets are present, carries a repeated household `unit_index`, and |
| 128 | +adds one derived indicator column per selected target. The generated |
| 129 | +`ipf_target_metadata.csv` then references those indicator columns as |
| 130 | +`numeric_total` IPF constraints. |
| 131 | + |
| 132 | +External CSVs are still supported through `external_inputs.*` and override the |
| 133 | +automatic conversion path when provided. |
| 134 | + |
| 135 | +### IPF conversion step by step |
| 136 | + |
| 137 | +The IPF conversion is implemented in |
| 138 | +[ipf_conversion.py](ipf_conversion.py) |
| 139 | +and runs during `benchmark_cli.py export`. |
| 140 | + |
| 141 | +1. Load the saved calibration package and apply the manifest target filters. |
| 142 | +2. Read `dataset_path`, `db_path`, and `n_clones` from the package metadata. |
| 143 | +3. Query `stratum_constraints` for the selected targets from the target DB. |
| 144 | +4. Identify the source variables needed to evaluate those constraints, such as |
| 145 | + `age`, `snap`, or `medicaid_enrolled`. |
| 146 | +5. Reconstruct the cloned household universe from `initial_weights`, |
| 147 | + `block_geoid`, and `cd_geoid`. This yields one benchmark unit per matrix |
| 148 | + column. |
| 149 | +6. If any selected IPF target is `person_count`, expand that cloned household |
| 150 | + universe to a person-level table using the source dataset's person-to- |
| 151 | + household links. Multiple person rows may therefore share the same |
| 152 | + household-clone `unit_index`. |
| 153 | +7. Calculate the needed source variables from the dataset and attach them to |
| 154 | + the IPF unit table. |
| 155 | +8. For each selected target, evaluate its original stratum logic row by row and |
| 156 | + materialize the result as a derived indicator column such as |
| 157 | + `ipf_indicator_00000`. |
| 158 | +9. Write `ipf_target_metadata.csv` so each selected target becomes a |
| 159 | + `numeric_total` IPF constraint over one of those derived indicator columns. |
| 160 | +10. Run `surveysd::ipf` on the generated unit table and target metadata. |
| 161 | +11. Collapse the fitted IPF row weights back to one weight per shared benchmark |
| 162 | + `unit_index`, so the fitted result can be scored against the same sparse |
| 163 | + calibration matrix used by `L0` and `GREG`. |
| 164 | + |
| 165 | +This means the benchmark uses one common scoring space even though `IPF` |
| 166 | +requires a richer input representation than `L0` and `GREG`. |
| 167 | + |
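Steps 6, 8, and 11 carry most of the logic, so a toy version may help. The sketch below is not the code in `ipf_conversion.py`. The `household_id` field, the `(variable, operator, value)` constraint tuples, and the helper names are hypothetical, and averaging row weights in the collapse step is an assumption that holds only if all rows of a household clone receive the same fitted weight. Only `unit_index` and the `ipf_indicator_00000` naming come from the description above.

```python
import operator

# Comparison operators a stratum constraint might use (illustrative subset).
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def expand_to_persons(household_units, persons_by_household):
    """Step 6: emit one row per (household clone, person) pair."""
    rows = []
    for unit in household_units:
        for person in persons_by_household[unit["household_id"]]:
            rows.append({"unit_index": unit["unit_index"], **person})
    return rows


def add_indicator(rows, column, constraints):
    """Step 8: set column to 1 where a row meets every constraint, else 0."""
    for row in rows:
        row[column] = int(all(OPS[op](row[var], val)
                              for var, op, val in constraints))


def collapse_weights(rows, row_weights, n_units):
    """Step 11: average row weights per unit_index (assumed rule).

    Assumes every unit_index has at least one row.
    """
    totals = [0.0] * n_units
    counts = [0] * n_units
    for row, weight in zip(rows, row_weights):
        totals[row["unit_index"]] += weight
        counts[row["unit_index"]] += 1
    return [total / count for total, count in zip(totals, counts)]


# Household "A" is cloned twice, so it backs unit_index 0 and 2.
units = [{"unit_index": 0, "household_id": "A"},
         {"unit_index": 1, "household_id": "B"},
         {"unit_index": 2, "household_id": "A"}]
persons = {"A": [{"age": 34}, {"age": 6}], "B": [{"age": 70}]}

rows = expand_to_persons(units, persons)
add_indicator(rows, "ipf_indicator_00000", [("age", ">=", 18)])
unit_weights = collapse_weights(rows, [2.0, 2.0, 3.0, 4.0, 4.0], n_units=3)
```

The collapsed `unit_weights` list has one entry per matrix column, which is what lets the `IPF` result be scored against the same sparse matrix as `L0` and `GREG`.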
| 168 | +### Why the IPF conversion exists |
| 169 | + |
| 170 | +`L0` and `GREG` can work directly with a sparse linear system of the form |
| 171 | +`X w = t`. |
| 172 | + |
| 173 | +Classical `IPF` does not start from that object. It expects: |
| 174 | + |
| 175 | +- a unit-record table |
| 176 | +- categorical or indicator variables on that table |
| 177 | +- target totals over those variables |
| 178 | + |
| 179 | +So the benchmark exporter translates selected count-style calibration targets |
| 180 | +into that IPF-friendly representation instead of trying to feed the sparse |
| 181 | +matrix directly into `surveysd::ipf`. |
| 182 | + |
| 183 | +### IPF target metadata schema |
| 184 | + |
| 185 | +`ipf_runner.R` supports two target metadata encodings: |
| 186 | + |
| 187 | +- `numeric_total` |
| 188 | + One row per target with: |
| 189 | + - `scope`: `person` or `household` |
| 190 | + - `target_type`: `numeric_total` |
| 191 | + - `value_column`: unit-data column to calibrate |
| 192 | + - `variables`: grouping variables used to wrap the numeric total in a one-cell |
| 193 | + or multi-cell array |
| 194 | + - `cell`: pipe-separated assignments for the target cell |
| 195 | + - `target_value`: numeric total |
| 196 | +- `categorical_margin` |
| 197 | + One row per margin cell with: |
| 198 | + - `scope`: `person` or `household` |
| 199 | + - `target_type`: `categorical_margin` |
| 200 | + - `margin_id`: identifier for a margin table |
| 201 | + - `variables`: pipe-separated variable names, e.g. `district_id|age_bin` |
| 202 | + - `cell`: pipe-separated assignments, e.g. |
| 203 | + `district_id=0601|age_bin=18_24` |
| 204 | + - `target_value`: numeric target |
| 205 | + |
| 206 | +The automatic conversion path currently emits `numeric_total` rows. |
| 207 | + |
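To make the `variables` and `cell` encoding concrete, here is a small Python sketch that splits the two pipe-separated fields into a mapping. The actual parsing lives in `ipf_runner.R`; `parse_cell` is a hypothetical helper written only to illustrate the format, and the check that every listed variable has an assignment is an assumed validation rule.

```python
def parse_cell(variables, cell):
    """Turn pipe-separated `variables` and `cell` fields into a dict.

    Values stay as strings so codes such as '0601' keep leading zeros.
    """
    names = variables.split("|") if variables else []
    assignments = {}
    for part in cell.split("|") if cell else []:
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed assignment: {part!r}")
        assignments[name] = value
    missing = [name for name in names if name not in assignments]
    if missing:
        raise ValueError(f"cell lacks assignments for: {missing}")
    return assignments


parsed = parse_cell("district_id|age_bin", "district_id=0601|age_bin=18_24")
```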
| 208 | +## Example Commands |
| 209 | + |
| 210 | +Export a benchmark bundle: |
| 211 | + |
| 212 | +```bash |
| 213 | +python paper-l0/benchmarking/benchmark_cli.py export \ |
| 214 | + --manifest paper-l0/benchmarking/manifests/greg_demo_small.example.json \ |
| 215 | + --output-dir paper-l0/benchmarking/runs/greg_demo_small |
| 216 | +``` |
| 217 | + |
| 218 | +Run a GREG benchmark from an exported bundle: |
| 219 | + |
| 220 | +```bash |
| 221 | +python paper-l0/benchmarking/benchmark_cli.py run \ |
| 222 | + --method greg \ |
| 223 | + --run-dir paper-l0/benchmarking/runs/greg_demo_small |
| 224 | +``` |
| 225 | + |
| 226 | +Run `L0` on an exported bundle: |
| 227 | + |
| 228 | +```bash |
| 229 | +python paper-l0/benchmarking/benchmark_cli.py run \ |
| 230 | + --method l0 \ |
| 231 | + --run-dir paper-l0/benchmarking/runs/greg_demo_small |
| 232 | +``` |
| 233 | + |
| 234 | +Equivalent root Make targets: |
| 235 | + |
| 236 | +```bash |
| 237 | +make benchmarking-export MANIFEST=paper-l0/benchmarking/manifests/greg_demo_small.example.json RUN_DIR=paper-l0/benchmarking/runs/greg_demo_small |
| 238 | +make benchmarking-run-greg RUN_DIR=paper-l0/benchmarking/runs/greg_demo_small |
| 239 | +make benchmarking-run-l0 RUN_DIR=paper-l0/benchmarking/runs/greg_demo_small |
| 240 | +``` |