
Document the boundary vintage of districts/*.h5 files #1143

@DTrim99

Problem

The Hugging Face Hub dataset policyengine/policyengine-us-data/districts/{XX-NN}.h5
doesn't declare which Congress's boundaries were used to allocate
households to each district. Users therefore can't tell whether pairing
an h5 file with a particular district geojson (118th, 119th, or 120th
Congress) is internally consistent.

Concrete failure mode: a dashboard renders a 119th-Congress
choropleth on top of impact numbers sampled against 118th boundaries
(or vice versa). District labels match, but the underlying populations
don't.
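A dashboard could guard against this failure mode by refusing to pair an h5 file with a geojson of a different vintage. The sketch below assumes the h5 metadata has already been read into a plain dict (for example, from the file's HDF5 attributes) and that the geojson's Congress number is known to the caller. The names `check_vintage_match` and `VintageMismatchError` are hypothetical, and the `congress` key mirrors the attribute proposed under Asks; it does not exist on the current artifacts.

```python
from typing import Mapping, Optional


class VintageMismatchError(ValueError):
    """Raised when the h5 and geojson boundary vintages disagree."""


def check_vintage_match(
    h5_attrs: Mapping[str, object], geojson_congress: int
) -> Optional[int]:
    """Return the h5 file's Congress number if it matches the geojson.

    Hypothetical helper: assumes h5_attrs may carry a "congress" key as
    proposed in this issue. Returns None (without raising) when the
    vintage is missing or marked "unknown", so the caller can decide
    whether to warn or block.
    """
    congress = h5_attrs.get("congress")
    if congress is None or congress == "unknown":
        return None
    if int(congress) != geojson_congress:
        raise VintageMismatchError(
            f"h5 was sampled against Congress {congress} boundaries, "
            f"but the geojson is for Congress {geojson_congress}"
        )
    return int(congress)
```

With this guard, `check_vintage_match({"congress": 118}, 119)` raises instead of silently rendering 118th-boundary numbers on a 119th-Congress map, while legacy files without the attribute fall through as `None`.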

Asks

  • Add a boundary_year / congress_session attribute to each h5
    output (e.g., "congress": 119, "tiger_year": 2024,
    "genz_vintage": "GENZ2024").
  • Document the build script's geographic input source in the
    dataset card (Census GENZ / TIGER vintage, plus the Hugging Face
    release tag that produced the current h5 set).
  • Backfill the attribute on the existing artifacts if practical;
    otherwise mark them as unknown and document going forward.
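The first and third asks could share one helper that builds the attribute dict for each h5 output and falls back to an explicit "unknown" marker for artifacts whose vintage can't be recovered. This is only a sketch: the function name `boundary_vintage_attrs` and the `UNKNOWN` constant are hypothetical, and the keys simply copy the example in the first bullet. Writing the dict into the file (for instance through h5py's `File.attrs`) is left out so the snippet stays dependency-free.

```python
from typing import Dict, Optional, Union

AttrValue = Union[int, str]

# Hypothetical sentinel recorded when a vintage field can't be determined.
UNKNOWN = "unknown"


def boundary_vintage_attrs(
    congress: Optional[int] = None,
    tiger_year: Optional[int] = None,
    genz_vintage: Optional[str] = None,
) -> Dict[str, AttrValue]:
    """Build boundary-vintage attributes for one districts/{XX-NN}.h5 file.

    Hypothetical helper. Any field passed as None (e.g. when backfilling
    old artifacts) is stored as "unknown" rather than omitted, so readers
    can distinguish "vintage not known" from "file never annotated".
    """
    values = {
        "congress": congress,
        "tiger_year": tiger_year,
        "genz_vintage": genz_vintage,
    }
    return {key: (UNKNOWN if value is None else value) for key, value in values.items()}
```

For a new build this would be called as `boundary_vintage_attrs(119, 2024, "GENZ2024")`; for a backfill where nothing is recoverable, `boundary_vintage_attrs()` marks every field as unknown. Storing an explicit sentinel rather than skipping the key keeps the schema uniform across old and new files.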

Why this matters now

We're staring down a 119th → 120th transition that touches 8 states
(see #1144). Without boundary-vintage metadata, dashboards will
silently mix vintages during the rollout window.

Related
