Add source-package facts for PE parity targets #3

Draft
MaxGhenis wants to merge 14 commits into main from codex/source-packages-pe-parity

@MaxGhenis
Contributor

@MaxGhenis MaxGhenis commented May 22, 2026

Summary

  • adds the source-package harness and consumer fact contract needed by Microplex target adapters
  • adds PE parity source packages and raw/source manifests across US and UK publisher sources
  • extends high-value packages for the current parity pass, including TANF state family counts, LIHEAP households served, SOI Historic Table 2 counts, state AGI amounts, ISC private-school pupil counts, and the HMT Budget 2025 salary-sacrifice contribution amount
  • redacts the public Mapbox token embedded in the CMS page from the committed raw HTML, while preserving the source-package parse path
  • extends voa-council-tax-bands-2025 with 2,563 source-numeric local-authority band facts from CTSOP1.0, omitting suppressed cells rather than fabricating values
  • extends PDF/HTML number parsing to preserve currency-prefixed compact amounts such as £32bn, and updates existing PDF guards to match the source text
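
For reviewers, the currency-prefixed compact-amount handling in the last bullet can be pictured roughly as follows. This is a minimal sketch, not the parser in this branch: `parse_compact_amount`, `CompactAmount`, and the suffix table are hypothetical names, and the set of suffixes the real parser recognizes is an assumption.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional

# Hypothetical suffix table; the real parser may recognize a different set.
_SCALE = {
    "k": Decimal(10) ** 3,
    "m": Decimal(10) ** 6,
    "bn": Decimal(10) ** 9,
    "tn": Decimal(10) ** 12,
}

# Currency symbol, a plain or comma-grouped number, then a compact suffix.
_COMPACT = re.compile(
    r"(?P<currency>[£$€])\s?(?P<number>\d[\d,]*(?:\.\d+)?)\s?(?P<suffix>bn|tn|m|k)\b",
    re.IGNORECASE,
)


class CompactAmount(NamedTuple):
    currency: str   # symbol kept as written, e.g. "£"
    value: Decimal  # fully scaled amount
    text: str       # matched source text, kept for provenance checks


def parse_compact_amount(text: str) -> Optional[CompactAmount]:
    """Return the first currency-prefixed compact amount in text, or None."""
    match = _COMPACT.search(text)
    if match is None:
        return None
    number = Decimal(match["number"].replace(",", ""))
    scale = _SCALE[match["suffix"].lower()]
    return CompactAmount(match["currency"], number * scale, match.group(0))
```

Keeping the matched text next to the scaled value lets a guard compare the parsed fact against the exact source string, which is the kind of check the updated PDF guards appear to rely on.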

Paired work

Validation

  • uv run arch build-suite hmt-budget-policy-costings-2025-salary-sacrifice --year 2025 --out /tmp/arch-hmt-budget-policy-costings-2025-salary-sacrifice --replace -> valid, 1 fact
  • uv run pytest tests/test_arch_source_package.py -q -k "hmt_budget_salary_sacrifice or hmt_budget_policy_costings or hmrc_salary_sacrifice_reform or hmrc_salary_sacrifice_relief" -> 6 passed
  • uv run pytest tests/test_arch_source_package.py tests/test_arch_suite.py -q -k "soi_table_1_2_facts or isc_census_pupil_count or voa_council_tax_band or cms_medicare_trustees_part_b_premium or treasury_eitc_outlay or new_us_source_counts or build_source_suite_supports_soi_table_1_4" -> 59 passed
  • uv run pytest tests/test_arch_source_package.py -q -> 227 passed
  • uv run pytest tests/test_arch_suite.py -q -> 13 passed
  • uv run ruff check arch/sources/cells.py arch/source_package.py arch/suite.py tests/test_arch_source_package.py tests/test_arch_suite.py -> passed
  • git diff --check -> passed
  • uv run pytest tests/test_arch_source_package.py -q -k "isc_census or isc_census_pupil" -> 2 passed
  • uv run arch validate-package packages/isc/census_2024 --year 2024 -> valid, 1 source record / 1 measure
  • uv run arch build-suite packages/isc/census_2024 --year 2024 --out /tmp/arch-suite-isc-census-2024-current --replace -> valid, 1 consumer fact with concept alignment
  • uv run pytest tests/test_arch_source_package.py -q -k "hhs_acf_tanf_caseload or hhs_acf_liheap_profile or soi_historic_table_2_state_agi or new_us_source_counts" -> 55 passed
  • uv run pytest tests/test_arch_source_package.py::test_source_package_alias_builds_cms_medicare_state_payment_facts -q -> 1 passed
  • uv run arch build-suite soi-historic-table-2-state-agi-2022 --year 2022 --out /tmp/arch-suite-soi-historic-table-2-state-agi-2022 --replace -> valid, 918 facts
  • uv run arch build-suite hhs-acf-liheap-fy2024-national-profile --year 2024 --out /tmp/arch-suite-hhs-acf-liheap-fy2024-national-profile --replace -> valid, 1 fact
  • uv run arch build-suite hhs-acf-tanf-caseload-2024 --year 2024 --out /tmp/arch-suite-hhs-acf-tanf-caseload-2024 --replace -> valid, 58 facts
  • uv run arch build-suite soi-historic-table-2 --year 2022 --out /tmp/arch-suite-soi-historic-table-2-2022 --replace -> valid, 143 facts
  • uv run arch validate-package voa-council-tax-bands-2025 --year 2025 -> valid, 2,653 source records / 2,653 consumer facts after the local-authority extension
  • uv run arch build-suite voa-council-tax-bands-2025 --year 2025 --out /tmp/arch-suite-voa-council-tax-bands-2025-current --replace -> valid, 2,653 consumer facts, 0 agent acceptance errors
  • uv run arch validate-package scotgov-council-tax-bands-2025 --year 2025 -> valid, 9 source records
  • uv run arch build-suite scotgov-council-tax-bands-2025 --year 2025 --out /tmp/arch-suite-scotgov-council-tax-bands-2025-current --replace -> valid, 9 consumer facts, 0 agent acceptance errors

@MaxGhenis
Contributor Author

Cycle pass update (2026-05-27):

  • Merged the stacked storage-config PR into this branch earlier.
  • Removed stale Cosilico package/env/schema references from this draft branch.
  • Redacted token-like values embedded in mirrored HTML source files and updated the affected source manifests/checksums.
  • Local validation passed: ruff check README.md db/supabase_client.py tests/test_supabase_client.py docs/architecture.md docs/repository-model.md, git diff --check, focused source-package redaction checks (16 passed, 1 skipped), and the broader source-package/suite/Supabase set (250 passed, 1 skipped).

I am keeping this PR as draft for now. The remaining blocker is reviewability/repo hygiene: it still bundles core harness/schema work with about 155 MB of raw publisher files and generated-scale source-package YAML. Recommended next step is to split this into: (1) core harness/schema/tests/docs, (2) compact source package specs, and (3) raw artifact/R2 registration or fixtures, keeping large raw bytes/generated outputs out of the main merge path where possible.
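
The redaction-plus-checksum step mentioned above can be sketched as below. This is an illustration under stated assumptions, not the code in this branch: `redact_tokens`, `sha256_hex`, and the placeholder string are hypothetical, and the pattern assumes Mapbox-style public tokens that start with `pk.` followed by two dot-separated base64url segments.

```python
import hashlib
import re
from typing import Tuple

# Assumption: public tokens look like "pk.<segment>.<segment>".
# The pattern is illustrative and may not match every token-like value.
_TOKEN = re.compile(rb"pk\.[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{10,}")
_PLACEHOLDER = b"REDACTED_PUBLIC_TOKEN"  # hypothetical placeholder


def redact_tokens(raw: bytes) -> Tuple[bytes, int]:
    """Replace token-like values; return the new bytes and the match count."""
    return _TOKEN.subn(_PLACEHOLDER, raw)


def sha256_hex(raw: bytes) -> str:
    """Checksum to record in the manifest after redaction."""
    return hashlib.sha256(raw).hexdigest()


# Usage: redact a mirrored page, then recompute the checksum for the manifest.
html = b'<script>accessToken = "pk.' + b"a" * 60 + b"." + b"b" * 22 + b'";</script>'
clean, count = redact_tokens(html)
new_checksum = sha256_hex(clean)
```

Because redaction changes the committed bytes, any manifest checksum has to be recomputed from the redacted file rather than from the publisher's original download.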
