Commit 1ae870a

Merge pull request #9 from PolicyEngine/codex/arch-target-parity-coverage
Add Arch-backed PE target parity adapters
2 parents 1a9a1eb + 95e4a64 commit 1ae870a

30 files changed

Lines changed: 21381 additions & 122 deletions

.github/workflows/site-snapshot.yml

Lines changed: 7 additions & 0 deletions
@@ -29,6 +29,13 @@ jobs:
           ref: main
           path: microplex
 
+      - name: Check out microunit
+        uses: actions/checkout@v4
+        with:
+          repository: CosilicoAI/microunit
+          ref: main
+          path: microunit
+
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
# ACA PTC Multiplier Source Choice

This records the first Microplex-US reconstruction of
`policyengine-us-data`'s `aca_ptc_multipliers_2022_2024.csv` from Arch
publisher-source consumer facts.

## Recipe

Inputs:

- KFF full-year average marketplace effectuated enrollment, 2022 and 2024
- CMS 2022 OEP state-level average monthly APTC
- CMS 2024 OEP state-level average monthly APTC
- CMS full-year 2022 effectuated-enrollment workbook average monthly APTC

Source selection:

- `enroll_2022` and `enroll_2024`: KFF full-year effectuated enrollment
- `aptc_2024`: CMS 2024 OEP average monthly APTC
- `aptc_2022`: CMS 2022 OEP average monthly APTC where published, with CMS
  full-year 2022 average monthly APTC as fallback

Derived columns:

- `vol_mult = enroll_2024 / enroll_2022`
- `val_mult = aptc_2024 / aptc_2022`
- PE's state `tax_unit_count` factor uses `vol_mult`
- PE's state `aca_ptc` amount factor uses `vol_mult * val_mult`
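
The source-selection and derived-column rules above can be sketched in a few lines of Python. This is only an illustration, not the code behind `microplex-us-build-aca-ptc-multipliers`: the `StateInputs` class, the function names, and the enrollment figures are all hypothetical, and only the column names and formulas come from the recipe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateInputs:
    """Hypothetical per-state container; field names mirror the recipe's columns."""

    enroll_2022: float
    enroll_2024: float
    aptc_2024: float
    aptc_2022_oep: Optional[float]        # None when CMS OEP publishes no value
    aptc_2022_full_year: Optional[float]  # CMS full-year workbook fallback


def select_aptc_2022(row: StateInputs) -> float:
    """Prefer the CMS 2022 OEP value; fall back to the CMS full-year value."""
    if row.aptc_2022_oep is not None:
        return row.aptc_2022_oep
    if row.aptc_2022_full_year is not None:
        return row.aptc_2022_full_year
    raise ValueError("no 2022 APTC source available for this state")


def multipliers(row: StateInputs) -> dict[str, float]:
    """Apply the derived-column formulas from the recipe."""
    aptc_2022 = select_aptc_2022(row)
    vol_mult = row.enroll_2024 / row.enroll_2022
    val_mult = row.aptc_2024 / aptc_2022
    return {
        "aptc_2022": aptc_2022,
        "vol_mult": vol_mult,
        "val_mult": val_mult,
        "tax_unit_count_factor": vol_mult,
        "aca_ptc_factor": vol_mult * val_mult,
    }


# Illustrative inputs only: the enrollment figures are made up, and the state
# has no OEP value, so the full-year fallback is used.
example = StateInputs(
    enroll_2022=1000.0,
    enroll_2024=1200.0,
    aptc_2024=438.0,
    aptc_2022_oep=None,
    aptc_2022_full_year=429.75,
)
result = multipliers(example)
```

Keeping the fallback in its own function makes the one non-obvious source choice easy to audit separately from the arithmetic.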

## Reproduction

Build the five Arch source-package suites, then run:

```bash
uv run microplex-us-build-aca-ptc-multipliers \
  /tmp/mp-aca-ptc-arch-sources/kff-2022/consumer_facts.jsonl \
  /tmp/mp-aca-ptc-arch-sources/kff-2024/consumer_facts.jsonl \
  /tmp/mp-aca-ptc-arch-sources/cms-oep-2022/consumer_facts.jsonl \
  /tmp/mp-aca-ptc-arch-sources/cms-oep-2024/consumer_facts.jsonl \
  /tmp/mp-aca-ptc-arch-sources/cms-effectuated-2022/consumer_facts.jsonl \
  --out /tmp/mp-aca-ptc-arch-sources/aca_ptc_multipliers_2022_2024.csv
```

The 2026-05-12 run wrote 51 rows. Compared with PE's incumbent
`policyengine_us_data/storage/aca_ptc_multipliers_2022_2024.csv`:

- state set matches
- `enroll_2022` matches for all 51 states
- `enroll_2024` matches for all 51 states
- `vol_mult` matches for all 51 states
- `aptc_2024` matches for all 51 states
- `aptc_2022` differs for 22 states
- `val_mult` differs for the same 22 states
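
A column-by-column comparison like the one summarized above could be run with the standard library alone. The sketch below is an assumption about how such a check might look, not the tool that produced these counts; `differing_states` is a hypothetical helper, the `state` key column name is assumed, and the two-row CSV snippets use `aptc_2022` values from this document with `aptc_2024` back-derived as `aptc_2022 * val_mult`.

```python
import csv
import io
import math


def differing_states(ours_csv, theirs_csv, key, columns, rel_tol=1e-9):
    """Return {column: sorted states whose values differ} for two CSV texts."""
    ours = {row[key]: row for row in csv.DictReader(io.StringIO(ours_csv))}
    theirs = {row[key]: row for row in csv.DictReader(io.StringIO(theirs_csv))}
    if ours.keys() != theirs.keys():
        raise ValueError("state sets differ")
    report = {}
    for column in columns:
        report[column] = sorted(
            state
            for state in ours
            if not math.isclose(
                float(ours[state][column]),
                float(theirs[state][column]),
                rel_tol=rel_tol,
            )
        )
    return report


# Two-row excerpts; aptc_2024 is back-derived, not read from either CSV.
MICROPLEX_CSV = "state,aptc_2022,aptc_2024\nNevada,429.75,438\nNew York,363,455\n"
PE_CSV = "state,aptc_2022,aptc_2024\nNevada,435,438\nNew York,364,455\n"

report = differing_states(
    MICROPLEX_CSV, PE_CSV, key="state", columns=["aptc_2022", "aptc_2024"]
)
```

Comparing with a relative tolerance rather than string equality avoids flagging rows that differ only in float formatting.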

## PE Incumbent Provenance Trace

The local `policyengine-us-data` history does not contain a generator for the
incumbent CSV. `git log --follow` shows the file first appearing at its current
path in `8d2c49fa15a515e2379d1b4b5e2c1856a1d4ebe9` on 2026-02-11:
`Add hierarchical uprating notebook, fix verification, move ACA PTC
multipliers`. The commit adds
`policyengine_us_data/storage/aca_ptc_multipliers_2022_2024.csv` directly, plus
notebooks which document that ACA PTC factors are loaded from the CSV and
described as CMS/KFF enrollment data. Those notebooks do not show row-level
source derivation.

Spot checks against the raw CMS 2022 OEP state-level source support the
Microplex-US source choice for the mismatching states where OEP publishes a
number. For example, current Arch-selected OEP values are New Jersey `489`, New
Mexico `460`, and Virginia `506`, matching the CMS OEP
`APTC_Cnsmr_Avg_APTC` column. The PE incumbent has `504`, `534`, and `407` for
those states, respectively. Nevada remains the explicit fallback case because
the CMS 2022 OEP state-level file reports no Nevada average monthly APTC fact;
Microplex-US uses the CMS full-year effectuated-enrollment value `429.75`.

## Reconciliation Queue

States not listed matched PE's incumbent CSV exactly. For listed states, the
Microplex-US value is the Arch publisher-source value selected by the recipe
above. Nevada is the known CMS full-year fallback case because the CMS 2022 OEP
state-level source package has no Nevada average monthly APTC fact.

| State | PE aptc_2022 | Microplex-US aptc_2022 | PE val_mult | Microplex-US val_mult |
| --- | ---: | ---: | ---: | ---: |
| Nevada | 435 | 429.75 | 1.006896551724138 | 1.019197207678883 |
| New Jersey | 504 | 489 | 1.0337301587301588 | 1.065439672801636 |
| New Mexico | 534 | 460 | 1.0318352059925093 | 1.1978260869565218 |
| New York | 364 | 363 | 1.25 | 1.2534435261707988 |
| North Carolina | 583 | 579 | 0.9571183533447685 | 0.9637305699481865 |
| North Dakota | 436 | 452 | 0.9931192660550459 | 0.9579646017699115 |
| Ohio | 479 | 437 | 1.0396659707724425 | 1.139588100686499 |
| Oklahoma | 577 | 558 | 0.9965337954939342 | 1.0304659498207884 |
| Oregon | 503 | 489 | 1.0417495029821073 | 1.0715746421267893 |
| Pennsylvania | 523 | 501 | 1.0133843212237095 | 1.0578842315369261 |
| Rhode Island | 427 | 403 | 1.063231850117096 | 1.1265508684863523 |
| South Carolina | 566 | 512 | 0.9770318021201413 | 1.080078125 |
| South Dakota | 649 | 640 | 0.9414483821263482 | 0.9546875 |
| Tennessee | 572 | 543 | 1.013986013986014 | 1.0681399631675874 |
| Texas | 539 | 502 | 0.9944341372912802 | 1.0677290836653386 |
| Utah | 385 | 370 | 1.0935064935064935 | 1.1378378378378378 |
| Vermont | 620 | 566 | 1.132258064516129 | 1.2402826855123674 |
| Virginia | 407 | 506 | 0.995085995085995 | 0.8003952569169961 |
| Washington | 438 | 437 | 1.0342465753424657 | 1.036613272311213 |
| West Virginia | 1057 | 1002 | 0.97918637653737 | 1.032934131736527 |
| Wisconsin | 562 | 530 | 1.0177935943060499 | 1.079245283018868 |
| Wyoming | 873 | 812 | 0.9885452462772051 | 1.062807881773399 |
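
One way to sanity-check the table is to use the recipe's definition `val_mult = aptc_2024 / aptc_2022`: multiplying each side's `aptc_2022` by its `val_mult` should recover the same `aptc_2024`, since that column matches for every state. The snippet below is a minimal sketch of that check on three rows copied from the table; the helper name is hypothetical.

```python
import math

# (state, PE aptc_2022, Microplex-US aptc_2022, PE val_mult, Microplex-US val_mult)
ROWS = [
    ("Nevada", 435, 429.75, 1.006896551724138, 1.019197207678883),
    ("New Jersey", 504, 489, 1.0337301587301588, 1.065439672801636),
    ("Virginia", 407, 506, 0.995085995085995, 0.8003952569169961),
]


def implied_aptc_2024(aptc_2022: float, val_mult: float) -> float:
    """Invert val_mult = aptc_2024 / aptc_2022."""
    return aptc_2022 * val_mult


implied = {}
for state, pe_aptc, mp_aptc, pe_mult, mp_mult in ROWS:
    pe_2024 = implied_aptc_2024(pe_aptc, pe_mult)
    mp_2024 = implied_aptc_2024(mp_aptc, mp_mult)
    # Both sides should imply the same 2024 value if only aptc_2022 differs.
    assert math.isclose(pe_2024, mp_2024, rel_tol=1e-9), state
    implied[state] = mp_2024
```

If the assertion ever failed for a row, that row's disagreement would not be explained by the `aptc_2022` source choice alone.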

Open reconciliation decision:

- Treat the Microplex-US output as the publisher-source reconstruction.
- Treat PE byte parity as a separate legacy-compatibility target. Do not add
  overrides unless a row-level legacy source or intentional source-choice table
  is supplied.

docs/arch-target-gap-queue.md

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# Arch Target Gap Queue

The Arch target gap queue is a Microplex-side review tool. It compares a
Microplex target profile to a queryable Arch target DB and emits rows that help
humans or agents decide what Arch source work is missing.

The queue does not make Arch own Microplex target selection. Profile membership,
source aging, reconciliation, activation, and model-variable aliases remain in
`microplex-us`.

## Boundary Rules

- Arch stores publisher/source facts with provenance, constraints, periods,
  geography, and source lineage.
- Arch should not duplicate a source fact only because Microplex names a model
  variable differently.
- Microplex adapters may map one Arch source fact into simulator-specific target
  semantics. For example, Arch
  `irs_soi.returns_with_income_tax_after_credits` can satisfy the
  PolicyEngine `income_tax_positive` count target because SOI Table 1.1 reports
  the count of returns with positive income tax after credits.
- A gap row is an authoring hint, not proof that a source exists.
- Rows marked as source-mapping review or deprioritized must be reviewed before
  assigning loader work to agents.
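
The adapter rule above, where one Arch source fact serves a differently named simulator target without being duplicated in Arch, might be represented as a small alias table on the Microplex side. The table and function below are hypothetical and are not the structure used in `microplex_us.targets.arch`; only the two variable names come from the example in the list.

```python
# Hypothetical alias table: Arch source variable -> simulator-specific targets.
# Each target is (PolicyEngine variable, measure).
ARCH_TO_PE_TARGETS: dict[str, list[tuple[str, str]]] = {
    "irs_soi.returns_with_income_tax_after_credits": [
        ("income_tax_positive", "count"),
    ],
}


def pe_targets_for(arch_variable: str) -> list[tuple[str, str]]:
    """Return the PolicyEngine targets one Arch fact can satisfy, if any."""
    return ARCH_TO_PE_TARGETS.get(arch_variable, [])


mapped = pe_targets_for("irs_soi.returns_with_income_tax_after_credits")
unmapped = pe_targets_for("irs_soi.some_unmapped_variable")  # made-up name
```

Keeping the alias on the adapter side means the Arch record stays a single publisher fact, in line with the second boundary rule.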

## Categories

`gap_category` is the high-level agent-readiness taxonomy:

| Category | Meaning | Default action |
| --- | --- | --- |
| `covered` | An Arch target record already satisfies the target cell. | No task. |
| `ready_primary_loader` | The expected publisher source and Arch variable shape are known, but the record is missing. | Assign source-loader/spec work. |
| `ready_rollup_or_geography` | The Arch variable exists but not at the requested geography. | Add rollup/geography records or review source geography. |
| `adapter_or_constraint_review` | The Arch variable exists at the geography, but filters or adapter matching do not cover the cell. | Review constraints and adapter mapping. |
| `source_mapping_review` | The queue cannot identify a defensible source fact or Arch variable shape. | Human source-mapping review first. |
| `survey_or_model_input_deprioritized` | The cell is currently treated as a survey/model-input proxy rather than a primary administrative source task. | Defer unless a primary source is identified. |

`loader_status` is the lower-level diagnostic used to derive the category. Use
`gap_category` for agent routing and `loader_status` for debugging why a cell
landed there.
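
As a rough illustration of routing on `gap_category` while keeping `loader_status` for debugging, a consumer of the gap rows could do something like the following. The row shape (a plain dict), the `route` function, and the sample `loader_status` value are assumptions; the categories and default actions are copied from the table.

```python
DEFAULT_ACTIONS = {
    "covered": "No task.",
    "ready_primary_loader": "Assign source-loader/spec work.",
    "ready_rollup_or_geography": (
        "Add rollup/geography records or review source geography."
    ),
    "adapter_or_constraint_review": "Review constraints and adapter mapping.",
    "source_mapping_review": "Human source-mapping review first.",
    "survey_or_model_input_deprioritized": (
        "Defer unless a primary source is identified."
    ),
}

# Per the boundary rules, these must be reviewed before assigning loader work.
REVIEW_FIRST = frozenset(
    {"source_mapping_review", "survey_or_model_input_deprioritized"}
)


def route(gap_row: dict) -> dict:
    """Pick the default action from gap_category; keep loader_status for debugging."""
    category = gap_row["gap_category"]
    if category not in DEFAULT_ACTIONS:
        raise ValueError(f"unknown gap_category: {category!r}")
    return {
        "action": DEFAULT_ACTIONS[category],
        "needs_human_review": category in REVIEW_FIRST,
        "debug_status": gap_row.get("loader_status"),
    }


# "missing_record" is a made-up loader_status value for illustration.
decision = route(
    {"gap_category": "ready_primary_loader", "loader_status": "missing_record"}
)
held = route({"gap_category": "source_mapping_review"})
```

Raising on an unknown category, rather than defaulting to an action, keeps a new taxonomy value from silently becoming agent work.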

## Current PolicyEngine Profile Boundary

`pe_native_broad` keeps the raw PolicyEngine parity surface intact. It includes
all currently tracked broad target cells, including survey/model-input rows and
cells whose publisher-source semantics still need review.

`pe_native_broad_source_backed` is the Arch-backed calibration/profile boundary.
It excludes only cells with explicit reasons in
`src/microplex_us/policyengine/target_profiles.py`, such as:

- SOI multi-domain cells that would require joint AGI, filing status, and
  positive income-tax-before-credits facts not currently published by the loaded
  SOI packages
- survey-heavy or model-input cells such as rent, child support,
  non-Part-B medical premium/expense components, SPM capped expenses, and
  `ssn_card_type`
- source-near but non-equivalent rows such as `childcare_expenses`, where IRS
  credit expenses and W-2 dependent-care benefits are narrower tax concepts
- pregnancy stock by state, where live births are a flow rather than a direct
  source fact for the PolicyEngine target

## Current Local Snapshot

Snapshot date: 2026-05-22.

Inputs:

- `/Users/maxghenis/CosilicoAI/arch/arch/fixtures/consumer_facts.jsonl`
- `/Users/maxghenis/CosilicoAI/arch/macro/targets.db`
- `/tmp/arch-suite-hhs-acf-tanf-caseload-2024/consumer_facts.jsonl`
- `/tmp/arch-suite-soi-historic-table-2-2022/consumer_facts.jsonl`
- `/tmp/arch-suite-hhs-acf-liheap-fy2024-national-profile/consumer_facts.jsonl`
- `/tmp/arch-suite-soi-historic-table-2-state-agi-2022/consumer_facts.jsonl`
- `/tmp/arch-suite-soi-w2-statistics-2020/consumer_facts.jsonl`
- `/tmp/arch-suite-soi-table-1-4-2023/consumer_facts.jsonl`
- `/tmp/arch-suite-federal-reserve-z1-household-net-worth/consumer_facts.jsonl`
- `/tmp/arch-suite-cms-medicare-trustees-report-2025-part-b-premium-income/consumer_facts.jsonl`

Command:

```bash
uv run --extra policyengine microplex-us-arch-target-refresh \
  --arch-targets-db /Users/maxghenis/CosilicoAI/arch/arch/fixtures/consumer_facts.jsonl \
  --arch-targets-db /Users/maxghenis/CosilicoAI/arch/macro/targets.db \
  --arch-targets-db /tmp/arch-suite-hhs-acf-tanf-caseload-2024/consumer_facts.jsonl \
  --arch-targets-db /tmp/arch-suite-soi-historic-table-2-2022/consumer_facts.jsonl \
  --arch-targets-db /tmp/arch-suite-hhs-acf-liheap-fy2024-national-profile/consumer_facts.jsonl \
  --arch-targets-db /tmp/arch-suite-soi-historic-table-2-state-agi-2022/consumer_facts.jsonl \
  --arch-targets-db /tmp/arch-suite-soi-w2-statistics-2020/consumer_facts.jsonl \
  --arch-targets-db /tmp/arch-suite-soi-table-1-4-2023/consumer_facts.jsonl \
  --arch-targets-db /tmp/arch-suite-federal-reserve-z1-household-net-worth/consumer_facts.jsonl \
  --arch-targets-db /tmp/arch-suite-cms-medicare-trustees-report-2025-part-b-premium-income/consumer_facts.jsonl \
  --period 2024 \
  --profile pe_native_broad_source_backed \
  --output-dir artifacts/arch-target-coverage-source-backed
```

Coverage:

- 174 target cells in `pe_native_broad_source_backed`
- 174 covered
- 0 uncovered
- 100.0% coverage

The raw `pe_native_broad` profile is at 174 of 189 covered with 15 explicitly
reviewed rows outside the source-backed boundary. Federal Reserve Z.1 household
net worth and CMS Medicare Trustees Report Part B premium income are now
source-backed.

| Category | Rows |
| --- | ---: |
| `adapter_or_constraint_review` | 3 |
| `source_mapping_review` | 2 |
| `survey_or_model_input_deprioritized` | 10 |
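
The coverage figures can be cross-checked with a little arithmetic. The sketch below assumes the category table lists the 15 raw `pe_native_broad` rows that fall outside the source-backed boundary (3 + 2 + 10 = 15), which the text implies but does not state outright. `coverage_summary` is a hypothetical helper, not part of the `microplex-us` CLI.

```python
from collections import Counter


def coverage_summary(categories: list[str]) -> dict:
    """Summarize coverage from one gap_category label per target cell."""
    counts = Counter(categories)
    total = len(categories)
    covered = counts["covered"]
    return {
        "total": total,
        "covered": covered,
        "uncovered": total - covered,
        "coverage_pct": 100.0 * covered / total if total else 0.0,
    }


# Raw profile: 174 covered cells plus the 15 reviewed rows from the table.
raw_profile = (
    ["covered"] * 174
    + ["adapter_or_constraint_review"] * 3
    + ["source_mapping_review"] * 2
    + ["survey_or_model_input_deprioritized"] * 10
)
raw = coverage_summary(raw_profile)

# Source-backed profile: the same cells minus the excluded rows.
source_backed = coverage_summary([c for c in raw_profile if c == "covered"])
```

Under that assumption, the raw profile comes out at 174 of 189 cells and the source-backed profile at 174 of 174, matching the numbers reported above.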

Generated outputs:

- `artifacts/arch-target-coverage-source-backed/pe_native_broad_source_backed_2024_coverage.json`
- `artifacts/arch-target-coverage-source-backed/pe_native_broad_source_backed_2024_gaps.json`
- `artifacts/arch-target-coverage-source-backed/pe_native_broad_source_backed_2024_gaps.csv`
- `artifacts/arch-target-coverage-source-backed/pe_native_broad_source_backed_2024_summary.md`
- `artifacts/arch-target-coverage-broad-plus/pe_native_broad_2024_coverage.json`
- `artifacts/arch-target-coverage-broad-plus/pe_native_broad_2024_gaps.json`
- `artifacts/arch-target-coverage-broad-plus/pe_native_broad_2024_gaps.csv`
- `artifacts/arch-target-coverage-broad-plus/pe_native_broad_2024_summary.md`

Remaining work is concentrated in:

- the raw `pe_native_broad` cells excluded from the source-backed profile, if a
  future primary publisher source can support them without changing semantics
- keeping the UK source-backed/raw boundary aligned with the same rule: leave
  raw PE target rows visible, and exclude only rows where source equivalence is
  not defensible

pyproject.toml

Lines changed: 19 additions & 4 deletions
@@ -13,7 +13,7 @@ authors = [
 ]
 requires-python = ">=3.13"
 dependencies = [
-    "microplex[calibrate]",
+    "microplex[calibrate] @ git+https://github.com/PolicyEngine/microplex.git@1e0627182f9df40aacd7043c96956c2895bf9d30",
     "duckdb>=1.2",
     "requests>=2.31",
 ]
@@ -23,25 +23,43 @@ dev = [
     "pytest>=7.0",
     "ruff>=0.1",
 ]
+r2 = [
+    "boto3>=1.34",
+]
 policyengine = [
     "microimpute==1.15.1 ; python_full_version >= '3.12' and python_full_version < '3.15'",
     "policyengine-us==1.587.0; python_version >= '3.11' and python_version < '3.15'",
+    "spm-calculator>=0.3.1",
 ]
 
 [project.urls]
 Repository = "https://github.com/PolicyEngine/microplex-us"
 
 [project.scripts]
+microplex-us-arch-target-coverage = "microplex_us.targets.arch:main_coverage"
+microplex-us-arch-target-gaps = "microplex_us.targets.arch:main_gaps"
+microplex-us-arch-target-parity = "microplex_us.targets.arch:main_parity"
+microplex-us-arch-target-refresh = "microplex_us.targets.arch:main_refresh"
+microplex-us-arch-target-smoke = "microplex_us.targets.arch:main_smoke"
+microplex-us-build-aca-ptc-multipliers = "microplex_us.targets.aca_ptc:main"
 microplex-us-backfill-pe-native-audit = "microplex_us.pipelines.backfill_pe_native_audit:main"
 microplex-us-backfill-pe-native-scores = "microplex_us.pipelines.backfill_pe_native_scores:main"
 microplex-us-check-site-snapshot = "microplex_us.pipelines.check_site_snapshot:main"
+microplex-us-pe-dataset-readiness = "microplex_us.pipelines.pe_us_dataset_readiness:main"
+microplex-us-dashboard = "microplex_us.pipelines.dashboard:main"
+microplex-us-pe-native-calibration-benchmark = "microplex_us.pipelines.pe_native_calibration_benchmark:main"
 microplex-us-pe-native-target-diagnostics = "microplex_us.pipelines.pe_native_scores:main_target_diagnostics"
+microplex-us-r2-archive-artifact = "microplex_us.pipelines.r2_artifacts:main"
+microplex-us-reweight-cd-age-targets = "microplex_us.pipelines.cd_age_reweighting:main"
 microplex-us-score-pe-native-loss = "microplex_us.pipelines.pe_native_scores:main"
 microplex-us-version-bump-benchmark = "microplex_us.pipelines.version_benchmark:main"
 
 [tool.hatch.build.targets.wheel]
 packages = ["src/microplex_us"]
 
+[tool.hatch.metadata]
+allow-direct-references = true
+
 [tool.hatch.build.targets.wheel.force-include]
 "src/microplex_us/pipelines/pe_native_scores.py" = "microplex_us/pipelines/pe_native_scores.py"
 
@@ -65,6 +83,3 @@ ignore = [
 [tool.ruff.lint.per-file-ignores]
 "examples/**/*.py" = ["E402"]
 "tests/**/*.py" = ["E402", "N802"]
-
-[tool.uv.sources]
-microplex = { path = "../microplex", editable = true }
