
Commit 6de667d

Make IPF benchmarking fail closed and runnable as one coherent problem
Replace the old order-dependent sequential IPF flow with a validated one-call path. The converter now keeps only closed categorical systems, derives complements only from authored parent totals, reports explicit drop reasons for non-runnable target families, and surfaces incompatible-total failures through structured diagnostics instead of silently chaining margin blocks. Update the benchmark harness to export retained-authored IPF scoring artifacts, validate the external-IPF input contract, and let L0/GREG opt into scoring on the same retained-authored subset for apples-to-apples comparisons. Refresh the walkthrough notebook, README, example manifest, and add regression coverage for closure logic, runner behavior, export validation, and the new scoring contract.
1 parent 5642a62 commit 6de667d

11 files changed

Lines changed: 2221 additions & 659 deletions

paper-l0/benchmarking/README.md

Lines changed: 81 additions & 46 deletions
```diff
@@ -44,7 +44,7 @@ The core workflow is:
 2. export a shared benchmark bundle from a saved calibration package
 3. auto-convert the bundle to IPF inputs when needed
 4. run `L0`, `GREG`, or `IPF`
-5. score all fitted weights against the same shared target matrix
+5. score each method against the target set that matches its benchmark contract

 ## Layout

```
```diff
@@ -113,30 +113,65 @@ lightweight.

 ### IPF inputs

-The exporter now auto-generates IPF inputs when the manifest includes `ipf`
-and no external overrides are supplied. It reconstructs an IPF microdata table
-from:
+The exporter auto-generates IPF inputs when the manifest includes `ipf` and no
+external overrides are supplied. It reconstructs an IPF microdata table from:

 - the saved calibration package
 - the package metadata's `dataset_path`
 - the package metadata's `db_path`
 - the selected count-like targets and their stratum constraints

-The generated `unit_metadata.csv` is currently built for `person_count` and
+The generated `unit_metadata.csv` is built for `person_count` and
 `household_count` targets. It expands cloned households to a person-level table
-when person targets are present, carries a repeated household `unit_index`, and
-adds one derived indicator column per selected target. The generated
-`ipf_target_metadata.csv` then references those indicator columns as numerical
-IPF totals.
+when person targets are present, carries a repeated household `unit_index` so
+per-person weights collapse cleanly back to per-household, and adds one
+string-valued derived category column per declared bucket schema (e.g.
+`age_bracket`, `agi_bracket_district`, `snap_positive`).
+
+The generated `ipf_target_metadata.csv` contains one `categorical_margin` row
+per retained IPF cell after validation. That means:
+
+- authored cells that belong to a closed categorical system are kept
+- binary subset families may gain exactly-derived complement cells when an
+  authored parent total exists on the exact reduced key
+- open subset families are dropped rather than emitted as 1-cell margins
+
+The exporter also writes:
+
+- `ipf_scoring_target_metadata.csv`
+- `ipf_scoring_X_targets_by_units.mtx`
+
+These score IPF on its retained authored targets only. Derived complements are
+recorded for transparency in `ipf_conversion_diagnostics.json`, but they are
+not part of the main benchmark metric set.
+
+When comparing `L0` or `GREG` against that same subset, pass:
+
+```bash
+python paper-l0/benchmarking/benchmark_cli.py run \
+  --method l0 \
+  --run-dir <bundle> \
+  --score-on ipf_retained_authored
+```

 External CSVs are still supported through `external_inputs.*` and override the
-automatic conversion path when provided.
+automatic conversion path when provided. The external-IPF contract is strict:
+
+- `external_inputs.ipf_unit_metadata_csv`
+- `external_inputs.ipf_target_metadata_csv`
+- `external_inputs.ipf_scoring_target_metadata_csv`
+- `external_inputs.ipf_scoring_matrix_mtx`
+
+must be provided together. An optional
+`external_inputs.ipf_conversion_diagnostics_json` can also be supplied and will
+be copied through for reporting. External CSVs must also follow the
+`categorical_margin` schema below; the runner rejects `numeric_total` rows.

 ### IPF conversion step by step

 The IPF conversion is implemented in
-[ipf_conversion.py](/Users/movil1/Desktop/PYTHONJOBS/PolicyEngine/policyengine-us-data/paper-l0/benchmarking/ipf_conversion.py)
-and runs during `benchmark_cli.py export`.
+[ipf_conversion.py](./ipf_conversion.py) and runs during
+`benchmark_cli.py export`.

 1. Load the saved calibration package and apply the manifest target filters.
 2. Read `dataset_path`, `db_path`, and `n_clones` from the package metadata.
```
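The README requires the four external-IPF inputs to arrive as a set, with the diagnostics JSON optional. A minimal sketch of that all-or-nothing check, assuming the manifest's `external_inputs` section is a flat dict keyed by the names listed above; `validate_external_ipf_inputs` and the two key tuples are hypothetical and not taken from the commit:

```python
# Hypothetical helper sketching the all-or-nothing external-IPF contract.
REQUIRED_EXTERNAL_IPF_KEYS = (
    "ipf_unit_metadata_csv",
    "ipf_target_metadata_csv",
    "ipf_scoring_target_metadata_csv",
    "ipf_scoring_matrix_mtx",
)
# Optional: copied through for reporting when present, never required.
OPTIONAL_EXTERNAL_IPF_KEYS = ("ipf_conversion_diagnostics_json",)


def validate_external_ipf_inputs(external_inputs: dict) -> bool:
    """Return True when external IPF inputs override automatic conversion.

    Returns False when none of the required keys are set, so the exporter
    can fall back to automatic conversion. Raises ValueError when only some
    of the required keys are set (the "must be provided together" rule).
    """
    provided = {key for key in REQUIRED_EXTERNAL_IPF_KEYS if external_inputs.get(key)}
    if not provided:
        return False
    missing = [key for key in REQUIRED_EXTERNAL_IPF_KEYS if key not in provided]
    if missing:
        raise ValueError(
            "external IPF inputs must be provided together; missing: "
            + ", ".join(f"external_inputs.{key}" for key in missing)
        )
    return True
```

Returning `False` for an empty section keeps the automatic conversion path available, matching the README's statement that external CSVs override conversion only when provided.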
```diff
@@ -152,18 +187,25 @@ and runs during `benchmark_cli.py export`.
    household-clone `unit_index`.
 7. Calculate the needed source variables from the dataset and attach them to
    the IPF unit table.
-8. For each selected target, evaluate its original stratum logic row by row and
-   materialize the result as a derived indicator column such as
-   `ipf_indicator_00000`.
-9. Write `ipf_target_metadata.csv` so each selected target becomes a
-   `numeric_total` IPF constraint over one of those derived indicator columns.
-10. Run `surveysd::ipf` on the generated unit table and target metadata.
-11. Collapse the fitted IPF row weights back to one weight per shared benchmark
-    `unit_index`, so the fitted result can be scored against the same sparse
-    calibration matrix used by `L0` and `GREG`.
-
-This means the benchmark uses one common scoring space even though `IPF`
-requires a richer input representation than `L0` and `GREG`.
+8. Materialize the string-valued derived category columns the margins cover
+   (e.g. `age_bracket`, `snap_positive`) on that unit table.
+9. Group the resolved targets into margin families, validate them against the
+   observed unit-table support, and keep only families that are already closed
+   or can be closed exactly from authored parent totals.
+10. Emit one `categorical_margin` row per retained authored or exactly-derived
+    cell, sharing a `margin_id` within each family.
+11. Write diagnostics (`dropped_targets`, retained-authored counts, derived
+    complements, and any coherence issues) to
+    `inputs/ipf_conversion_diagnostics.json`.
+12. Run `surveysd::ipf` once on the generated unit table and full validated
+    IPF target metadata.
+13. Collapse the fitted IPF row weights back to one weight per shared
+    benchmark `unit_index`, so the fitted result can be scored against the
+    retained-authored sparse target subset used for the IPF benchmark.
+
+This means the benchmark keeps a shared requested target space for the export,
+but an IPF-specific retained-authored scoring space for the actual IPF
+comparison.

 ### Why the IPF conversion exists

```
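Step 9 and the commit message describe family closure in words: keep closed systems, derive a complement only from an authored parent total, otherwise drop with a reason. A minimal sketch of that decision for a single family, under stated assumptions: a family is one variable's authored cells on one reduced key (for example one `congressional_district_geoid`), the category domain comes from the observed unit table, and the `MarginFamily`/`ClosureResult` types and drop-reason strings are hypothetical, not the shapes `ipf_conversion.py` actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MarginFamily:
    """Authored cells for one categorical variable on one reduced key (hypothetical)."""

    variable: str
    categories: tuple[str, ...]  # full category domain observed on the unit table
    authored: dict[str, float]  # category -> authored target value
    parent_total: Optional[float] = None  # authored total on the exact reduced key


@dataclass
class ClosureResult:
    status: str  # "closed", "derived_complement", or "dropped"
    cells: dict[str, float] = field(default_factory=dict)
    derived: dict[str, float] = field(default_factory=dict)
    drop_reason: Optional[str] = None


def close_family(family: MarginFamily) -> ClosureResult:
    missing = [c for c in family.categories if c not in family.authored]
    if not missing:
        # Every category is authored: already a closed categorical system.
        return ClosureResult("closed", cells=dict(family.authored))
    if len(family.categories) == 2 and len(missing) == 1:
        # Binary subset family: closable only from an authored parent total.
        if family.parent_total is None:
            return ClosureResult("dropped", drop_reason="open_subset_without_parent_total")
        complement = family.parent_total - sum(family.authored.values())
        if complement < 0:
            return ClosureResult("dropped", drop_reason="parent_total_below_subset")
        cells = {**family.authored, missing[0]: complement}
        return ClosureResult("derived_complement", cells=cells, derived={missing[0]: complement})
    # Open family that cannot be closed exactly: never emitted as a 1-cell margin.
    return ClosureResult("dropped", drop_reason="open_non_binary_subset")
```

For example, an authored `snap_positive=true` total of 30 with an authored parent total of 100 on the same district key yields a derived `snap_positive=false` cell of 70, which would be recorded as a derived complement rather than scored.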
```diff
@@ -182,28 +224,21 @@ matrix directly into `surveysd::ipf`.

 ### IPF target metadata schema

-`ipf_runner.R` supports two target metadata encodings:
-
-- `numeric_total`
-  One row per target with:
-  - `scope`: `person` or `household`
-  - `target_type`: `numeric_total`
-  - `value_column`: unit-data column to calibrate
-  - `variables`: grouping variables used to wrap the numeric total in a one-cell
-    or multi-cell array
-  - `cell`: pipe-separated assignments for the target cell
-  - `target_value`: numeric total
-- `categorical_margin`
-  One row per margin cell with:
-  - `scope`: `person` or `household`
-  - `target_type`: `categorical_margin`
-  - `margin_id`: identifier for a margin table
-  - `variables`: pipe-separated variable names, e.g. `district_id|age_bin`
-  - `cell`: pipe-separated assignments, e.g.
-    `district_id=0601|age_bin=18_24`
-  - `target_value`: numeric target
-
-The automatic conversion path currently emits `numeric_total` rows.
+`ipf_runner.R` accepts one encoding: `categorical_margin`. One row per
+authored margin cell:
+
+- `scope`: `person` or `household`
+- `target_type`: `categorical_margin`
+- `margin_id`: identifier for a margin block. Rows sharing a `margin_id` are
+  grouped into one `surveysd::ipf` constraint (via `xtabs`).
+- `variables`: pipe-separated variable names, e.g.
+  `congressional_district_geoid|age_bracket`
+- `cell`: pipe-separated assignments, e.g.
+  `congressional_district_geoid=0601|age_bracket=0-4`
+- `target_value`: numeric target
+
+Open subset systems are not exported. If a subset family cannot be closed from
+an authored parent total, it is dropped before the R call.

 ## Example Commands

```
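`ipf_runner.R` groups rows by `margin_id` and builds each constraint with `xtabs`. The Python sketch below covers only the equivalent parsing and grouping step for the `categorical_margin` schema, plus the rejection of any other `target_type`; `load_categorical_margins` and its return shape are hypothetical. One practical detail: the standard `csv` module keeps `0601` as the string `"0601"`, which matters because district codes would lose their leading zero if parsed as numbers.

```python
import csv
import io


def load_categorical_margins(csv_text: str) -> dict:
    """Group categorical_margin rows by margin_id (hypothetical helper).

    Returns {margin_id: {"scope": ..., "variables": (...), "cells": {key: value}}},
    where each cell key is a tuple of category labels in `variables` order.
    """
    margins: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["target_type"] != "categorical_margin":
            # Mirrors the README: numeric_total (or any other type) is rejected.
            raise ValueError(f"unsupported target_type {row['target_type']!r}")
        variables = tuple(row["variables"].split("|"))
        # "var_a=x|var_b=y" -> {"var_a": "x", "var_b": "y"}; values stay strings.
        assignments = dict(part.split("=", 1) for part in row["cell"].split("|"))
        if set(assignments) != set(variables):
            raise ValueError(f"cell {row['cell']!r} does not match {row['variables']!r}")
        margin = margins.setdefault(
            row["margin_id"],
            {"scope": row["scope"], "variables": variables, "cells": {}},
        )
        if margin["variables"] != variables or margin["scope"] != row["scope"]:
            raise ValueError(f"inconsistent rows in margin {row['margin_id']!r}")
        key = tuple(assignments[v] for v in variables)
        margin["cells"][key] = float(row["target_value"])
    return margins
```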
paper-l0/benchmarking/benchmark_cli.py

Lines changed: 118 additions & 38 deletions
```diff
@@ -101,10 +101,51 @@ def _run_greg(run_dir: Path):
     return weights_path, elapsed


+def _collapse_ipf_rows_to_unit_weights(
+    raw_weights: pd.DataFrame, n_units: int
+) -> np.ndarray:
+    """Validate a per-row IPF output and collapse it to a length-`n_units` vector.
+
+    surveysd::ipf with `meanHH = TRUE` guarantees every row that shares a
+    `unit_index` carries the same fitted weight; the spread check keeps that
+    assumption honest.
+    """
+    if "unit_index" not in raw_weights.columns:
+        raise RuntimeError("IPF runner output must include a unit_index column")
+    if raw_weights["unit_index"].isna().any():
+        raise RuntimeError("IPF runner output contains missing unit_index values")
+
+    raw_weights = raw_weights.copy()
+    raw_weights["unit_index"] = raw_weights["unit_index"].astype(np.int64)
+    if (raw_weights["unit_index"] < 0).any() or (
+        raw_weights["unit_index"] >= n_units
+    ).any():
+        raise RuntimeError("IPF runner output contains out-of-range unit_index values")
+
+    per_unit_spread = raw_weights.groupby("unit_index", sort=True)["fitted_weight"].agg(
+        lambda series: float(series.max() - series.min())
+    )
+    if (per_unit_spread > 1e-9).any():
+        raise RuntimeError(
+            "IPF runner produced inconsistent fitted weights within the same unit_index"
+        )
+
+    weights_by_unit = (
+        raw_weights.groupby("unit_index", sort=True)["fitted_weight"]
+        .first()
+        .reindex(np.arange(n_units, dtype=np.int64))
+    )
+    if weights_by_unit.isna().any():
+        raise RuntimeError(
+            "Aggregated IPF weights do not cover the full benchmark unit range"
+        )
+    return weights_by_unit.to_numpy(dtype=np.float64)
+
+
 def _run_ipf(run_dir: Path):
+    """Run one coherent IPF problem in a single `surveysd::ipf` call."""
     inputs = run_dir / "inputs"
     outputs = run_dir / "outputs"
-    temp_csv = outputs / "_ipf_weights.csv"

     with open(inputs / "benchmark_manifest.json") as f:
         manifest = json.load(f)
```
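The docstring of `_collapse_ipf_rows_to_unit_weights` leans on `meanHH = TRUE`, which, as we read the `surveysd::ipf` documentation, assigns every person in a household the mean of that household's person weights. A toy, pure-Python sketch of why the collapse is then well-defined: averaging within each `unit_index` makes all rows of a unit agree, so one value per unit can be read off, and rows that were not averaged trip the consistency check. `household_mean` and `collapse_rows` are hypothetical names; this illustrates the idea only and is not surveysd's implementation.

```python
from __future__ import annotations

from collections import defaultdict


def household_mean(rows: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Replace each (unit_index, weight) row's weight with its unit's arithmetic mean."""
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for unit_index, weight in rows:
        totals[unit_index] += weight
        counts[unit_index] += 1
    return [(u, totals[u] / counts[u]) for u, _ in rows]


def collapse_rows(rows: list[tuple[int, float]], n_units: int, tol: float = 1e-9) -> list[float]:
    """Collapse per-row weights to one weight per unit, failing closed like the CLI."""
    weights: list[float | None] = [None] * n_units
    for unit_index, weight in rows:
        if not 0 <= unit_index < n_units:
            raise RuntimeError("out-of-range unit_index")
        seen = weights[unit_index]
        if seen is None:
            weights[unit_index] = weight
        elif abs(seen - weight) > tol:
            raise RuntimeError("inconsistent fitted weights within the same unit_index")
    if any(w is None for w in weights):
        raise RuntimeError("weights do not cover the full unit range")
    return weights
```

With rows `[(0, 1.5), (0, 0.5), (1, 3.0)]`, averaging gives both members of unit 0 a weight of 1.0, and the collapse returns `[1.0, 3.0]`; collapsing the un-averaged rows raises instead.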
```diff
@@ -116,69 +157,97 @@ def _run_ipf(run_dir: Path):
             "IPF run requires inputs/ipf_target_metadata.csv. "
             "Provide external_inputs.ipf_target_metadata_csv in the manifest."
         )
+    unit_metadata_path = inputs / "unit_metadata.csv"
+    if not unit_metadata_path.exists():
+        raise FileNotFoundError("IPF run requires inputs/unit_metadata.csv.")
+
+    full_targets = pd.read_csv(target_metadata_path)
+    if full_targets.empty:
+        raise RuntimeError("IPF target metadata is empty; nothing to run.")
+    unit_metadata = pd.read_csv(unit_metadata_path)
+    if "unit_index" not in unit_metadata.columns:
+        raise RuntimeError("Unit metadata must include a unit_index column for IPF")
+
+    weight_col = str(options.get("weight_col", "base_weight"))
+    household_id_col = str(options.get("household_id_col", "household_id"))
+
+    initial_weights = np.load(inputs / "initial_weights.npy").astype(np.float64)
+    n_units = len(initial_weights)
+    unit_indices = unit_metadata["unit_index"].astype(np.int64).to_numpy()
+    if unit_indices.min() < 0 or unit_indices.max() >= n_units:
+        raise RuntimeError(
+            "Unit metadata unit_index values fall outside the initial weight vector"
+        )
+    temp_csv = outputs / "_ipf_weights.csv"
+    unit_with_weights = unit_metadata.copy()
+    unit_with_weights[weight_col] = initial_weights[unit_indices]
+    temp_unit_csv = outputs / "_ipf_unit_metadata.csv"
+    unit_with_weights.to_csv(temp_unit_csv, index=False)

     cmd = [
         "Rscript",
         str(RUNNERS_DIR / "ipf_runner.R"),
-        str(inputs / "unit_metadata.csv"),
+        str(temp_unit_csv),
         str(target_metadata_path),
         str(inputs / "initial_weights.npy"),
         str(temp_csv),
         str(int(options.get("max_iter", 200))),
         str(float(options.get("bound", 4.0))),
         str(float(options.get("epsP", 1e-6))),
         str(float(options.get("epsH", 1e-2))),
-        str(options.get("household_id_col", "household_id")),
-        str(options.get("weight_col", "base_weight")),
+        household_id_col,
+        weight_col,
     ]
     proc, elapsed = _run_subprocess(cmd)
     if proc.returncode != 0:
         raise RuntimeError(f"IPF runner failed with exit code {proc.returncode}")

     raw_weights = pd.read_csv(temp_csv)
-    if "unit_index" not in raw_weights.columns:
-        raise RuntimeError("IPF runner output must include a unit_index column")
-    if raw_weights["unit_index"].isna().any():
-        raise RuntimeError("IPF runner output contains missing unit_index values")
-
-    raw_weights["unit_index"] = raw_weights["unit_index"].astype(np.int64)
-    n_units = len(np.load(inputs / "initial_weights.npy"))
-    if (raw_weights["unit_index"] < 0).any() or (
-        raw_weights["unit_index"] >= n_units
-    ).any():
-        raise RuntimeError("IPF runner output contains out-of-range unit_index values")
-
-    per_unit_spread = raw_weights.groupby("unit_index", sort=True)["fitted_weight"].agg(
-        lambda series: float(series.max() - series.min())
-    )
-    inconsistent_units = per_unit_spread[per_unit_spread > 1e-9]
-    if not inconsistent_units.empty:
-        raise RuntimeError(
-            "IPF runner produced inconsistent fitted weights within the same unit_index"
-        )
-
-    weights_by_unit = (
-        raw_weights.groupby("unit_index", sort=True)["fitted_weight"]
-        .first()
-        .reindex(np.arange(n_units, dtype=np.int64))
-    )
-    if weights_by_unit.isna().any():
-        raise RuntimeError(
-            "Aggregated IPF weights do not cover the full benchmark unit range"
-        )
-    weights = weights_by_unit.to_numpy(dtype=np.float64)
+    current_weights = _collapse_ipf_rows_to_unit_weights(raw_weights, n_units)
     weights_path = outputs / "fitted_weights.npy"
-    np.save(weights_path, weights)
+    np.save(weights_path, current_weights)
     temp_csv.unlink(missing_ok=True)
+    temp_unit_csv.unlink(missing_ok=True)
     return weights_path, elapsed


+def _select_scoring_inputs(
+    run_dir: Path, method: str, score_on: str
+) -> tuple[Path, Path, str]:
+    inputs = run_dir / "inputs"
+    ipf_targets = inputs / "ipf_scoring_target_metadata.csv"
+    ipf_matrix = inputs / "ipf_scoring_X_targets_by_units.mtx"
+    has_ipf_scoring = ipf_targets.exists() and ipf_matrix.exists()
+
+    if score_on == "ipf_retained_authored":
+        if not has_ipf_scoring:
+            raise FileNotFoundError(
+                "Requested score_on=ipf_retained_authored, but "
+                "inputs/ipf_scoring_target_metadata.csv and "
+                "inputs/ipf_scoring_X_targets_by_units.mtx are not both present."
+            )
+        return ipf_targets, ipf_matrix, "ipf_retained_authored"
+
+    if score_on == "auto" and method == "ipf" and has_ipf_scoring:
+        return ipf_targets, ipf_matrix, "ipf_retained_authored"
+    return (
+        inputs / "target_metadata.csv",
+        inputs / "X_targets_by_units.mtx",
+        "shared_requested",
+    )
+
+
 def cmd_run(args):
     run_dir = Path(args.run_dir)
     inputs = run_dir / "inputs"
     outputs = run_dir / "outputs"
     outputs.mkdir(parents=True, exist_ok=True)
-    targets_df = load_targets_csv(inputs / "target_metadata.csv")
+    targets_path, matrix_path, scoring_target_set = _select_scoring_inputs(
+        run_dir,
+        args.method,
+        getattr(args, "score_on", "auto"),
+    )
+    targets_df = load_targets_csv(targets_path)

     started = time.time()
     if args.method == "l0":
```
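`_select_scoring_inputs` mixes file-existence checks with the selection rule. Factoring the rule into a pure function (the name `resolve_scoring_target_set` is hypothetical, not part of the commit) makes its branch order easy to check in isolation:

```python
def resolve_scoring_target_set(method: str, score_on: str, has_ipf_scoring: bool) -> str:
    """Mirror the branch order of `_select_scoring_inputs` without touching the filesystem.

    `has_ipf_scoring` stands in for "both IPF scoring artifacts exist".
    """
    if score_on == "ipf_retained_authored":
        # Explicit request: fail closed when the artifacts are missing.
        if not has_ipf_scoring:
            raise FileNotFoundError("IPF scoring artifacts are not both present")
        return "ipf_retained_authored"
    if score_on == "auto" and method == "ipf" and has_ipf_scoring:
        return "ipf_retained_authored"
    # Everything else, including auto for l0/greg, scores on the shared set.
    return "shared_requested"
```

One consequence of that branch order: with the default `--score-on auto`, an IPF run whose bundle lacks either scoring artifact falls back to `shared_requested` without raising, whereas an explicit `--score-on ipf_retained_authored` fails closed.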
```diff
@@ -195,11 +264,12 @@ def cmd_run(args):
     summary = compute_common_metrics(
         weights=weights,
         targets_df=targets_df,
-        matrix_path=inputs / "X_targets_by_units.mtx",
+        matrix_path=matrix_path,
     )
     summary["method"] = args.method
     summary["run_dir"] = str(run_dir.resolve())
     summary["runtime_seconds"] = elapsed
+    summary["scoring_target_set"] = scoring_target_set
     write_method_summary(summary, outputs / f"{args.method}_summary.json")
     print(json.dumps(summary, indent=2, sort_keys=True))
     return 0
```
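This hunk only changes which `matrix_path` reaches `compute_common_metrics`; the metrics it computes are not shown in the diff. As a hypothetical, stdlib-only illustration of what scoring against a targets-by-units Matrix Market file involves: parse the coordinate entries, form per-target estimates `X @ w`, and compare them with the target values. Relative absolute error is an assumed metric here, not necessarily what `compute_common_metrics` reports; the parser handles only real-valued `coordinate ... general` files, and the sketch assumes nonzero targets.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def read_mtx_coordinate(lines: Iterable[str]):
    """Parse a real, general Matrix Market coordinate file into (shape, entries)."""
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a coordinate-format Matrix Market header")
    # Skip comment lines; the first non-comment line holds "rows cols nnz".
    size_line = next(line for line in it if line.strip() and not line.startswith("%"))
    n_rows, n_cols, nnz = (int(tok) for tok in size_line.split())
    entries = []
    for line in it:
        if not line.strip():
            continue
        i, j, value = line.split()
        entries.append((int(i) - 1, int(j) - 1, float(value)))  # 1-based -> 0-based
    if len(entries) != nnz:
        raise ValueError("entry count does not match the size line")
    return (n_rows, n_cols), entries


def relative_errors(lines: Iterable[str], weights: Sequence[float], targets: Sequence[float]) -> list[float]:
    """Return |(X @ w)[i] - t[i]| / |t[i]| for each target row (hypothetical metric)."""
    (n_rows, n_cols), entries = read_mtx_coordinate(lines)
    if len(weights) != n_cols or len(targets) != n_rows:
        raise ValueError("weights/targets do not match the matrix shape")
    estimates = [0.0] * n_rows
    for i, j, value in entries:
        estimates[i] += value * weights[j]  # accumulate row i of X @ w
    return [abs(est - t) / abs(t) for est, t in zip(estimates, targets)]
```

Passing an open file object works as well as a list of strings, since both iterate line by line.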
```diff
@@ -225,6 +295,16 @@ def build_parser():
     run_parser.add_argument(
         "--run-dir", required=True, help="Exported benchmark bundle directory"
     )
+    run_parser.add_argument(
+        "--score-on",
+        default="auto",
+        choices=["auto", "shared_requested", "ipf_retained_authored"],
+        help=(
+            "Scoring target set. 'auto' uses IPF-retained-authored targets only "
+            "for method=ipf when available; the other methods default to the "
+            "shared requested target set unless explicitly overridden."
+        ),
+    )
     run_parser.set_defaults(func=cmd_run)

     return parser
```
