
Commit 420deb4

juaristi22 and claude committed
Document IPF's two-entity-level constraint as a hard limit
Replaces the deferred-future framing ("`tax_unit_count` and `spm_unit_count` remain outside the core household/person IPF path *in this pass*") with the methodological reason: `surveysd::ipf` supports exactly two entity levels natively (`conP` for row-level / person-style constraints, `conH` for constraints aggregated by `hid`), and there is no generalised mechanism for additional counted entities such as `tax_unit` or `spm_unit`.

Producing a single weight vector that simultaneously satisfies targets at three or more entity levels is not possible with classical IPF. Running IPF separately per scope and aggregating would give it more degrees of freedom than L0 / GREG (each pass solves a smaller subproblem with its own freedom) and would not be a like-for-like comparison.

The benchmark therefore restricts IPF to `person_count` and `household_count` (together or alone) and drops other count families at the count check. Those targets remain in the shared sparse system that L0 and GREG fit, so the cross-method comparison on the IPF-feasible subset stays apples-to-apples via `--score-on ipf_retained_authored`.

Updated the README's methodology section and the IPF inputs subsection to articulate the constraint, and tightened the `_target_scope` error message in `ipf_conversion.py` to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent eae0ee7 commit 420deb4

2 files changed

Lines changed: 33 additions & 12 deletions

paper-l0/benchmarking/README.md

Lines changed: 26 additions & 8 deletions
@@ -34,9 +34,23 @@ Methodologically, the benchmark treats the methods as related but not
 identical:

 - `L0` and `GREG` can consume arbitrary linear calibration targets.
-- `IPF` is most natural for count-style or indicator-style targets, so the
-current automatic conversion path supports `person_count` and
-`household_count`.
+- `IPF` is most natural for count-style or indicator-style targets, and is
+additionally limited to **at most two entity levels per run**. `surveysd::ipf`
+has built-in handles only for the person/household pair via `conP` (row-level
+constraints) and `conH` (constraints aggregated by `hid`); there is no
+generalised mechanism for additional counted entities such as `tax_unit` or
+`spm_unit`. The benchmark therefore restricts IPF to `person_count` and
+`household_count` targets — together or alone — and drops other count
+families (`tax_unit_count`, `spm_unit_count`, `family_count`,
+`marital_unit_count`) at the count check with explicit diagnostics. Those
+targets remain in the shared sparse system that L0 and GREG fit, so the
+cross-method comparison on the IPF-feasible subset is still apples-to-apples
+via `--score-on ipf_retained_authored`.
+
+Producing a single weight vector that simultaneously satisfies targets at
+three or more entity levels is not possible with classical IPF; running IPF
+separately per scope and aggregating would give it more degrees of freedom
+than L0 / GREG and would not be a like-for-like comparison.

 The core workflow is:
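
The count check described in this hunk can be pictured as a partition of the authored targets into an IPF-retained set and a dropped set. The following is a minimal sketch under stated assumptions, not the repository's code: `IPF_SCOPES` and `partition_targets` are hypothetical names, and only the target-variable names and the `non_count_style` diagnostic label come from this commit.

```python
# Hypothetical sketch: IPF_SCOPES and partition_targets are illustrative
# names, not functions taken from ipf_conversion.py.
IPF_SCOPES = {"person_count": "person", "household_count": "household"}


def partition_targets(target_variables):
    """Split target variables into those IPF keeps and those it drops."""
    retained, dropped = [], []
    for name in target_variables:
        if name in IPF_SCOPES:
            retained.append(name)
        else:
            # Dropped targets are recorded rather than silently discarded.
            dropped.append({"target": name, "reason": "non_count_style"})
    return retained, dropped


retained, dropped = partition_targets(
    ["person_count", "tax_unit_count", "household_count", "spm_unit_count"]
)
```

Under this reading, the dropped entries stay available to L0 and GREG; only the IPF run ignores them.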

@@ -122,11 +136,15 @@ external overrides are supplied. It reconstructs an IPF microdata table from:
 - the selected count-like targets and their stratum constraints

 The generated `unit_metadata.csv` is built for `person_count` and
-`household_count` targets. It expands cloned households to a person-level table
-when person targets are present, carries a repeated household `unit_index` so
-per-person weights collapse cleanly back to per-household, and adds one
-string-valued derived category column per declared bucket schema (e.g.
-`age_bracket`, `agi_bracket_district`, `snap_positive`).
+`household_count` targets only — the two entity levels `surveysd::ipf` supports
+natively via `conP` and `conH`. It expands cloned households to a person-level
+table when person targets are present, carries a repeated household
+`unit_index` so per-person weights collapse cleanly back to per-household, and
+adds one string-valued derived category column per declared bucket schema
+(e.g. `age_bracket`, `agi_bracket_district`, `snap_positive`). Targets at
+other entity levels (e.g. `tax_unit_count`, `spm_unit_count`) are dropped at
+the count check with `non_count_style` diagnostics; they remain in the shared
+sparse target matrix that L0 and GREG fit.

 The generated `ipf_target_metadata.csv` contains one `categorical_margin` row
 per retained IPF cell after validation. That means:
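
The expand-then-collapse step that this hunk describes for `unit_metadata.csv` might look roughly like the sketch below. It is illustrative only: the helper names and the `n_persons` / `person_in_unit` fields are assumptions, `unit_index` is the only column name taken from the README, and the collapse assumes the solver returns the same weight for every person in a household.

```python
# Hypothetical sketch; only `unit_index` is a name from the README text.
households = [
    {"unit_index": 0, "n_persons": 2},
    {"unit_index": 1, "n_persons": 3},
]


def expand_to_persons(households):
    """Emit one row per person, each repeating its household's unit_index."""
    return [
        {"unit_index": hh["unit_index"], "person_in_unit": i}
        for hh in households
        for i in range(hh["n_persons"])
    ]


def collapse_weights(person_rows, person_weights):
    """Recover one weight per household from per-person weights."""
    by_unit = {}
    for row, weight in zip(person_rows, person_weights):
        # setdefault stores the first weight seen for a household and
        # returns the stored value on every later person of that household.
        previous = by_unit.setdefault(row["unit_index"], weight)
        if previous != weight:
            raise ValueError("weights differ within a household")
    return by_unit


person_rows = expand_to_persons(households)
weights = collapse_weights(person_rows, [1.5, 1.5, 0.8, 0.8, 0.8])
```

Raising on mismatched weights is a design choice of this sketch: it makes a non-clean collapse visible instead of averaging it away.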

paper-l0/benchmarking/ipf_conversion.py

Lines changed: 7 additions & 4 deletions
@@ -353,10 +353,13 @@ def _target_scope(target_variable: str) -> str:
     except KeyError as exc:
         raise ValueError(
             f"IPF conversion does not support target variable "
-            f"'{target_variable}'. Currently supported: "
-            f"{sorted(_SCOPE_BY_VARIABLE)}. "
-            "`tax_unit_count` and `spm_unit_count` remain outside the core "
-            "household/person IPF path in this pass."
+            f"'{target_variable}'. Supported: "
+            f"{sorted(_SCOPE_BY_VARIABLE)}. Classical IPF in this benchmark "
+            "is limited to person and household scopes — the two entity "
+            "levels surveysd::ipf supports natively via `conP` and `conH`. "
+            "Other count families (e.g. `tax_unit_count`, `spm_unit_count`) "
+            "are dropped from the IPF run with explicit diagnostics; they "
+            "remain in the shared sparse system that L0 and GREG fit."
         ) from exc

