Summary
The released enhanced_cps_2024.h5 contains the PUF clone half, but the final calibrated household_weight assigns it exactly zero total weight. This means the PUF-imputed half is effectively unused in the final ECPS, despite being present in the file.
Live artifact checked
- policyengine-us-data package version: 1.115.4
- HuggingFace artifact: policyengine/policyengine-us-data/enhanced_cps_2024.h5
- HF snapshot: 0c1409119fe197f4604a0e125999f8ebd3c73a21
Result
total_household_weight=161,309,969.280334
cps_household_weight=161,309,969.280334
puf_clone_household_weight=0.000000
cps_share=100.000000%
puf_clone_share=0.000000%
household_rows=41,314
cps_rows=20,657
puf_clone_rows=20,657
cps_positive_weight_rows=9,343
puf_positive_weight_rows=0
puf_max_weight=0.000000
Reproduction
from importlib.metadata import version

import h5py
from huggingface_hub import hf_hub_download

print(version("policyengine-us-data"))

# Download the released artifact, pinned to the snapshot listed above.
path = hf_hub_download(
    repo_id="policyengine/policyengine-us-data",
    filename="enhanced_cps_2024.h5",
    revision="0c1409119fe197f4604a0e125999f8ebd3c73a21",
)


def read_period(f, var, period="2024"):
    # A variable is stored either as a flat dataset or as a group keyed by period.
    obj = f[var]
    if isinstance(obj, h5py.Dataset):
        return obj[:]
    return obj[period][:]


with h5py.File(path, "r") as f:
    weight = read_period(f, "household_weight").astype(float)
    clone = read_period(f, "household_is_puf_clone").astype(bool)

print("total", weight.sum())
print("cps", weight[~clone].sum())
print("puf_clone", weight[clone].sum())
print("puf positive rows", (weight[clone] > 0).sum(), "/", clone.sum())
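The Result block reports more fields than the script above prints (shares, row counts, positive-weight rows). A minimal sketch of a helper that derives all of them from the same two arrays follows. The function name summarize_weights is hypothetical and not part of policyengine-us-data, and the toy arrays stand in for the real file contents.

```python
import numpy as np


def summarize_weights(weight, clone):
    """Split household weight into CPS and PUF-clone totals, shares, and row counts."""
    weight = np.asarray(weight, dtype=float)
    clone = np.asarray(clone, dtype=bool)
    total = weight.sum()
    cps = weight[~clone].sum()
    puf = weight[clone].sum()
    return {
        "total_household_weight": total,
        "cps_household_weight": cps,
        "puf_clone_household_weight": puf,
        "cps_share": cps / total if total else float("nan"),
        "puf_clone_share": puf / total if total else float("nan"),
        "household_rows": int(weight.size),
        "cps_rows": int((~clone).sum()),
        "puf_clone_rows": int(clone.sum()),
        "cps_positive_weight_rows": int((weight[~clone] > 0).sum()),
        "puf_positive_weight_rows": int((weight[clone] > 0).sum()),
        "puf_max_weight": float(weight[clone].max()) if clone.any() else 0.0,
    }


# Toy arrays: three CPS rows (one with zero weight) and two zero-weight clone rows.
toy = summarize_weights(
    [10.0, 0.0, 5.0, 0.0, 0.0],
    [False, False, False, True, True],
)
for key, value in toy.items():
    print(f"{key}={value}")
```

Passing the weight and clone arrays from the reproduction script to the same function should yield the fields listed under Result.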
Why this matters
This likely explains why PUF-only or PUF-heavy tax variables can remain far below administrative targets in the final ECPS. For example, local diagnostics showed that the PUF source contains donors with high long-term capital gains (LTCG), but the final calibrated ECPS places zero weight on the PUF clone half.
Suspected cause
puf_clone_dataset() intentionally creates the clone half with zero household weight. initialize_weight_priors() then turns zero weights into near-zero priors (~1e-6). Since reweighting optimizes weights in log space, those rows appear to be effectively unable to gain meaningful national weight.
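One way to see how a near-zero prior could stall a row: if weights are parameterized as w = exp(theta), the chain rule gives dL/dtheta = w * dL/dw, so the gradient a row receives scales with its current weight. The sketch below illustrates this on a toy problem. It assumes plain gradient descent and a single squared-relative-error target, neither of which is confirmed to match the actual reweighting code. An adaptive optimizer such as Adam would partly normalize this scaling away, so treat this as a possible mechanism rather than a diagnosis.

```python
import numpy as np


def log_space_gradient(theta, values, target):
    """Gradient of ((sum(w * values) - target) / target) ** 2 w.r.t. theta, with w = exp(theta)."""
    w = np.exp(theta)
    residual = (w @ values - target) / target
    grad_w = 2.0 * residual * values / target  # dL/dw, identical for both toy rows
    return grad_w * w  # chain rule: dL/dtheta = dL/dw * w


# Two rows with identical values: one starts at a normal prior, one at ~1e-6.
values = np.array([1.0, 1.0])
target = 2000.0
theta = np.log(np.array([1000.0, 1e-6]))

grad = log_space_gradient(theta, values, target)
grad_ratio = grad[1] / grad[0]  # equals w[1] / w[0]

# Plain gradient descent: the large row absorbs the whole adjustment.
lr = 0.5
for _ in range(200):
    theta -= lr * log_space_gradient(theta, values, target)
final_w = np.exp(theta)

print("gradient ratio (tiny row / normal row):", grad_ratio)
print("final weights:", final_w)
```

In this toy setup the first row moves to meet the target on its own, while the second row stays at roughly its initial value.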
Fix direction
Zero-weight clone households need meaningful positive prior mass before reweighting, while retaining calibration constraints strong enough to prevent PUF-heavy variables from exploding. A simple local diagnostic that split initial prior mass 50% CPS / 50% PUF-clone did make clone rows usable, but produced an unstable full rebuild: clone rows received 59.8% of final household weight and 2024 long-term capital gains rose to about 87x the current SOI target. So the fix should combine positive clone priors with validation/guardrails for aggregate targets, especially capital gains.
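The two pieces of this direction, positive clone priors plus aggregate guardrails, can be sketched as follows. Both function names, the uniform spreading of clone mass, and every threshold here are hypothetical illustrations. None of this is the existing initialize_weight_priors() behavior or a tested configuration.

```python
import numpy as np


def split_prior_mass(weight, clone, clone_share):
    """Return priors where clone rows hold `clone_share` of total mass, spread uniformly."""
    weight = np.asarray(weight, dtype=float)
    clone = np.asarray(clone, dtype=bool)
    if not 0.0 <= clone_share < 1.0:
        raise ValueError("clone_share must be in [0, 1)")
    if not clone.any():
        return weight.copy()
    total = weight.sum()
    cps_total = weight[~clone].sum()
    prior = np.empty_like(weight)
    # Rescale CPS rows proportionally so the total mass is preserved.
    prior[~clone] = weight[~clone] * (1.0 - clone_share) * total / cps_total
    # Assumption: clone mass is spread evenly across clone rows.
    prior[clone] = total * clone_share / clone.sum()
    return prior


def check_guardrails(weight, clone, values, target, max_clone_share, max_target_ratio):
    """Return a list of guardrail violations (empty if all checks pass)."""
    weight = np.asarray(weight, dtype=float)
    clone = np.asarray(clone, dtype=bool)
    problems = []
    share = weight[clone].sum() / weight.sum()
    if share > max_clone_share:
        problems.append(f"clone share {share:.1%} exceeds {max_clone_share:.1%}")
    ratio = (weight * np.asarray(values, dtype=float)).sum() / target
    if ratio > max_target_ratio:
        problems.append(f"aggregate is {ratio:.1f}x target, limit {max_target_ratio:.1f}x")
    return problems


# Toy data: two CPS rows, two zero-weight clone rows carrying all capital gains.
weight = [100.0, 300.0, 0.0, 0.0]
clone = [False, False, True, True]
prior = split_prior_mass(weight, clone, clone_share=0.1)

# Apply the checks to the priors here; in practice they would run on final weights.
problems = check_guardrails(
    prior, clone, values=[0.0, 0.0, 50.0, 50.0], target=1000.0,
    max_clone_share=0.25, max_target_ratio=1.5,
)
print(prior)
print(problems)
```

A check of this kind would have flagged the 50/50 diagnostic run on both counts: the 59.8% clone share and the roughly 87x capital gains aggregate.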