Problem
The HF Hub dataset policyengine/policyengine-us-data/districts/{XX-NN}.h5
doesn't declare which Congress's boundaries were used to allocate
households to each district. Users can't tell whether pairing the h5
with a particular geojson (118th, 119th, 120th) is internally
consistent.
Concrete failure mode: a dashboard renders a 119th-Congress
choropleth on top of impact numbers sampled against 118th boundaries
(or vice versa). District labels match, but the underlying populations
don't.
Asks
- Add a boundary_year / congress_session attribute to each h5 output (e.g.,
  "congress": 119, "tiger_year": 2024, "genz_vintage": "GENZ2024").
- Document the boundary source in the dataset card (Census GENZ / TIGER
  vintage, plus the Hugging Face release tag that produced the current h5 set).
- Record the vintage of the existing files where it can be determined;
  otherwise mark them as unknown and document going forward.
Why this matters now
We're staring down a 119th → 120th transition that touches 8 states
(see #1144). Without boundary-vintage metadata, dashboards will
silently mix vintages during the rollout window.
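For illustration, a consumer-side guard could look roughly like the sketch below. It assumes the metadata ends up under a "congress" key (as in the example under Asks) and that undeclared or backfilled files carry the string "unknown". The function names declared_congress and check_vintage are hypothetical, not part of policyengine-us-data.

```python
from typing import Any, Mapping, Optional

UNKNOWN = "unknown"  # assumed sentinel for files whose vintage was never recorded


def declared_congress(attrs: Mapping[str, Any]) -> Optional[int]:
    """Return the Congress number declared in the attributes, or None if absent/unknown."""
    value = attrs.get("congress", UNKNOWN)  # "congress" key is an assumption
    if isinstance(value, bytes):
        # Some HDF5 writers store strings as bytes.
        value = value.decode()
    if isinstance(value, str) and value.strip().lower() == UNKNOWN:
        return None
    return int(value)


def check_vintage(attrs: Mapping[str, Any], geojson_congress: int) -> str:
    """Compare the h5 boundary vintage with the Congress the geojson was drawn for."""
    declared = declared_congress(attrs)
    if declared is None:
        return "unknown"
    return "match" if declared == geojson_congress else "mismatch"


# With h5py, the same check would take the file-level attributes, e.g.:
#   with h5py.File(path, "r") as f:
#       status = check_vintage(f.attrs, geojson_congress=119)
# (assumes the attribute is written at the file root, which is not yet decided)
print(check_vintage({"congress": 118, "tiger_year": 2024}, geojson_congress=119))
```

A dashboard could refuse to render on "mismatch" and show a warning on "unknown", which would make mixed vintages visible rather than silent during the rollout window.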
Related