Merged
20 commits

- `976e616` Add target DB diagnostics dashboard (MaxGhenis, Apr 26, 2026)
- `6d53b24` Add ACA PTC multiplier source builder (MaxGhenis, May 13, 2026)
- `343830a` Add Arch-backed PE target parity adapters (MaxGhenis, May 22, 2026)
- `7e22bb9` Map SOI retirement contribution targets (MaxGhenis, May 22, 2026)
- `b4140de` Add source-backed PE target profile (MaxGhenis, May 22, 2026)
- `9205d25` Merge remote-tracking branch 'origin/main' into HEAD (MaxGhenis, May 22, 2026)
- `e25df05` Fix site snapshot microunit checkout (MaxGhenis, May 22, 2026)
- `12d2e8d` Cover wealth and Part B Arch targets (MaxGhenis, May 22, 2026)
- `a87212c` Cover local SOI Arch target rows (MaxGhenis, May 27, 2026)
- `78672e5` Merge remote-tracking branch 'origin/main' into HEAD (MaxGhenis, May 27, 2026)
- `cc55ad4` Map state broad SOI Arch targets (#21) (MaxGhenis, May 28, 2026)
- `54acbd9` Treat SOI medical dental amounts as itemized targets (#22) (MaxGhenis, May 28, 2026)
- `621bfae` Map Arch QBI consumer facts (#23) (MaxGhenis, May 28, 2026)
- `6c568a5` Map Arch rental royalty consumer facts (MaxGhenis, May 28, 2026)
- `7c3edb5` Merge pull request #24 from PolicyEngine/codex/arch-rental-consumer-f… (MaxGhenis, May 28, 2026)
- `4623a52` Map Arch child tax credit facts (MaxGhenis, May 28, 2026)
- `33c1cd2` Merge pull request #25 from PolicyEngine/codex/arch-child-tax-credit-… (MaxGhenis, May 28, 2026)
- `17342f2` Accept Arch EITC child-count amount targets (MaxGhenis, May 28, 2026)
- `be82c64` Merge pull request #26 from PolicyEngine/codex/microplex-arch-eitc-ch… (MaxGhenis, May 28, 2026)
- `95e4a64` Pin microplex core dependency to PolicyEngine ref (MaxGhenis, May 28, 2026)

7 changes: 7 additions & 0 deletions .github/workflows/site-snapshot.yml
@@ -29,6 +29,13 @@ jobs:

```diff
           ref: main
           path: microplex
 
+      - name: Check out microunit
+        uses: actions/checkout@v4
+        with:
+          repository: CosilicoAI/microunit
+          ref: main
+          path: microunit
+
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
```
113 changes: 113 additions & 0 deletions docs/aca-ptc-multiplier-source-choice.md
@@ -0,0 +1,113 @@
# ACA PTC Multiplier Source Choice

This records the first Microplex-US reconstruction of
`policyengine-us-data`'s `aca_ptc_multipliers_2022_2024.csv` from Arch
publisher-source consumer facts.

## Recipe

Inputs:

- KFF full-year average marketplace effectuated enrollment, 2022 and 2024
- CMS 2022 OEP state-level average monthly APTC
- CMS 2024 OEP state-level average monthly APTC
- CMS full-year 2022 effectuated-enrollment workbook average monthly APTC

Source selection:

- `enroll_2022` and `enroll_2024`: KFF full-year effectuated enrollment
- `aptc_2024`: CMS 2024 OEP average monthly APTC
- `aptc_2022`: CMS 2022 OEP average monthly APTC where published, with CMS
full-year 2022 average monthly APTC as fallback

Derived columns:

- `vol_mult = enroll_2024 / enroll_2022`
- `val_mult = aptc_2024 / aptc_2022`
- PE's state `tax_unit_count` factor uses `vol_mult`
- PE's state `aca_ptc` amount factor uses `vol_mult * val_mult`
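The source selection and derived columns above can be sketched per state as follows. This is a hypothetical illustration, not the `microplex-us-build-aca-ptc-multipliers` implementation: the `StateInputs` container, the `build_row` helper, the `*_factor` output keys, and the use of `None` for an unpublished OEP value are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StateInputs:
    """Hypothetical per-state inputs gathered from the five source packages."""

    enroll_2022: float  # KFF full-year effectuated enrollment, 2022
    enroll_2024: float  # KFF full-year effectuated enrollment, 2024
    aptc_2024: float  # CMS 2024 OEP average monthly APTC
    aptc_2022_oep: Optional[float]  # CMS 2022 OEP value; None if unpublished
    aptc_2022_full_year: Optional[float]  # CMS full-year 2022 workbook value


def build_row(inputs: StateInputs) -> dict[str, float]:
    """Apply the source-selection rule and derive the multipliers."""
    # Prefer the 2022 OEP value; fall back to the full-year workbook.
    aptc_2022 = (
        inputs.aptc_2022_oep
        if inputs.aptc_2022_oep is not None
        else inputs.aptc_2022_full_year
    )
    if aptc_2022 is None:
        raise ValueError("no 2022 average monthly APTC for this state")

    vol_mult = inputs.enroll_2024 / inputs.enroll_2022
    val_mult = inputs.aptc_2024 / aptc_2022
    return {
        "enroll_2022": inputs.enroll_2022,
        "enroll_2024": inputs.enroll_2024,
        "aptc_2022": aptc_2022,
        "aptc_2024": inputs.aptc_2024,
        "vol_mult": vol_mult,
        "val_mult": val_mult,
        # Hypothetical names for the two factors PE applies per state.
        "tax_unit_count_factor": vol_mult,
        "aca_ptc_factor": vol_mult * val_mult,
    }
```

For a state like Nevada, where the OEP value is `None`, the full-year value (`429.75`) becomes `aptc_2022`, so `val_mult` is `aptc_2024 / 429.75`.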

## Reproduction

Build the five Arch source-package suites, then run:

```bash
uv run microplex-us-build-aca-ptc-multipliers \
/tmp/mp-aca-ptc-arch-sources/kff-2022/consumer_facts.jsonl \
/tmp/mp-aca-ptc-arch-sources/kff-2024/consumer_facts.jsonl \
/tmp/mp-aca-ptc-arch-sources/cms-oep-2022/consumer_facts.jsonl \
/tmp/mp-aca-ptc-arch-sources/cms-oep-2024/consumer_facts.jsonl \
/tmp/mp-aca-ptc-arch-sources/cms-effectuated-2022/consumer_facts.jsonl \
--out /tmp/mp-aca-ptc-arch-sources/aca_ptc_multipliers_2022_2024.csv
```

The 2026-05-12 run wrote 51 rows. Compared with PE's incumbent
`policyengine_us_data/storage/aca_ptc_multipliers_2022_2024.csv`:

- state set matches
- `enroll_2022` matches for all 51 states
- `enroll_2024` matches for all 51 states
- `vol_mult` matches for all 51 states
- `aptc_2024` matches for all 51 states
- `aptc_2022` differs for 22 states
- `val_mult` differs for the same 22 states
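A column-by-column comparison like the one above can be sketched with the standard library. This assumes both CSVs have a `state` key column plus the six columns named in the recipe; the `state` column name, the `compare_csvs` helper, and the relative tolerance are assumptions, not part of the builder.

```python
import csv
import io
import math

COLUMNS = ["enroll_2022", "enroll_2024", "vol_mult", "aptc_2022", "aptc_2024", "val_mult"]


def load_by_state(text: str, key: str = "state") -> dict[str, dict[str, float]]:
    """Parse CSV text into {state: {column: value}} (assumed schema)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: {col: float(row[col]) for col in COLUMNS} for row in reader}


def compare_csvs(incumbent: str, rebuilt: str, rel_tol: float = 1e-9) -> dict[str, list[str]]:
    """Return, per column, the sorted states whose values differ beyond rel_tol."""
    old = load_by_state(incumbent)
    new = load_by_state(rebuilt)
    if old.keys() != new.keys():
        raise ValueError(f"state sets differ: {sorted(old.keys() ^ new.keys())}")
    return {
        col: sorted(
            state
            for state in old
            if not math.isclose(old[state][col], new[state][col], rel_tol=rel_tol)
        )
        for col in COLUMNS
    }
```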

## PE Incumbent Provenance Trace

The local `policyengine-us-data` history does not contain a generator for the
incumbent CSV. `git log --follow` shows the file first appearing at its current
path in `8d2c49fa15a515e2379d1b4b5e2c1856a1d4ebe9` on 2026-02-11:
`Add hierarchical uprating notebook, fix verification, move ACA PTC
multipliers`. The commit adds
`policyengine_us_data/storage/aca_ptc_multipliers_2022_2024.csv` directly, along
with notebooks that document loading the ACA PTC factors from the CSV and
describe them as CMS/KFF enrollment data. Those notebooks do not show how each
row was derived from source data.

Spot checks against the raw CMS 2022 OEP state-level source support the
Microplex-US source choice for the mismatching states where OEP publishes a
number. For example, current Arch-selected OEP values are New Jersey `489`, New
Mexico `460`, and Virginia `506`, matching the CMS OEP
`APTC_Cnsmr_Avg_APTC` column. The PE incumbent has `504`, `534`, and `407` for
those states, respectively. Nevada remains the explicit fallback case because
the CMS 2022 OEP state-level file reports no Nevada average monthly APTC fact;
Microplex-US uses the CMS full-year effectuated-enrollment value `429.75`.

## Reconciliation Queue

States not listed matched PE's incumbent CSV exactly. For listed states, the
Microplex-US value is the Arch publisher-source value selected by the recipe
above. Nevada is the known CMS full-year fallback case because the CMS 2022 OEP
state-level source package has no Nevada average monthly APTC fact.

| State | PE aptc_2022 | Microplex-US aptc_2022 | PE val_mult | Microplex-US val_mult |
| --- | ---: | ---: | ---: | ---: |
| Nevada | 435 | 429.75 | 1.006896551724138 | 1.019197207678883 |
| New Jersey | 504 | 489 | 1.0337301587301588 | 1.065439672801636 |
| New Mexico | 534 | 460 | 1.0318352059925093 | 1.1978260869565218 |
| New York | 364 | 363 | 1.25 | 1.2534435261707988 |
| North Carolina | 583 | 579 | 0.9571183533447685 | 0.9637305699481865 |
| North Dakota | 436 | 452 | 0.9931192660550459 | 0.9579646017699115 |
| Ohio | 479 | 437 | 1.0396659707724425 | 1.139588100686499 |
| Oklahoma | 577 | 558 | 0.9965337954939342 | 1.0304659498207884 |
| Oregon | 503 | 489 | 1.0417495029821073 | 1.0715746421267893 |
| Pennsylvania | 523 | 501 | 1.0133843212237095 | 1.0578842315369261 |
| Rhode Island | 427 | 403 | 1.063231850117096 | 1.1265508684863523 |
| South Carolina | 566 | 512 | 0.9770318021201413 | 1.080078125 |
| South Dakota | 649 | 640 | 0.9414483821263482 | 0.9546875 |
| Tennessee | 572 | 543 | 1.013986013986014 | 1.0681399631675874 |
| Texas | 539 | 502 | 0.9944341372912802 | 1.0677290836653386 |
| Utah | 385 | 370 | 1.0935064935064935 | 1.1378378378378378 |
| Vermont | 620 | 566 | 1.132258064516129 | 1.2402826855123674 |
| Virginia | 407 | 506 | 0.995085995085995 | 0.8003952569169961 |
| Washington | 438 | 437 | 1.0342465753424657 | 1.036613272311213 |
| West Virginia | 1057 | 1002 | 0.97918637653737 | 1.032934131736527 |
| Wisconsin | 562 | 530 | 1.0177935943060499 | 1.079245283018868 |
| Wyoming | 873 | 812 | 0.9885452462772051 | 1.062807881773399 |
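Because `aptc_2024` matches for all 51 states and `val_mult = aptc_2024 / aptc_2022`, each row's `val_mult` mismatch should be explained entirely by `aptc_2022`: multiplying `aptc_2022` by `val_mult` must recover the same `aptc_2024` from both sides. A quick check over a few rows copied from the table (the helper name is illustrative):

```python
import math

# (state, PE aptc_2022, Microplex-US aptc_2022, PE val_mult, Microplex-US val_mult)
ROWS = [
    ("Nevada", 435, 429.75, 1.006896551724138, 1.019197207678883),
    ("New Jersey", 504, 489, 1.0337301587301588, 1.065439672801636),
    ("New Mexico", 534, 460, 1.0318352059925093, 1.1978260869565218),
    ("North Dakota", 436, 452, 0.9931192660550459, 0.9579646017699115),
    ("Virginia", 407, 506, 0.995085995085995, 0.8003952569169961),
    ("West Virginia", 1057, 1002, 0.97918637653737, 1.032934131736527),
]


def implied_aptc_2024(rows):
    """Return {state: (PE-implied, Microplex-US-implied)} aptc_2024 values."""
    return {
        state: (pe_aptc * pe_mult, mp_aptc * mp_mult)
        for state, pe_aptc, mp_aptc, pe_mult, mp_mult in rows
    }


for state, (pe_2024, mp_2024) in implied_aptc_2024(ROWS).items():
    # Both columns should recover the shared CMS 2024 OEP value.
    assert math.isclose(pe_2024, mp_2024, rel_tol=1e-9), state
    print(f"{state}: aptc_2024 = {mp_2024:.2f}")
```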

Open reconciliation decision:

- Treat the Microplex-US output as the publisher-source reconstruction.
- Treat PE byte parity as a separate legacy-compatibility target. Do not add
overrides unless a row-level legacy source or intentional source-choice table
is supplied.
135 changes: 135 additions & 0 deletions docs/arch-target-gap-queue.md
@@ -0,0 +1,135 @@
# Arch Target Gap Queue

The Arch target gap queue is a Microplex-side review tool. It compares a
Microplex target profile to a queryable Arch target DB and emits rows that help
humans or agents decide what Arch source work is missing.

The queue does not make Arch own Microplex target selection. Profile membership,
source aging, reconciliation, activation, and model-variable aliases remain in
`microplex-us`.

## Boundary Rules

- Arch stores publisher/source facts with provenance, constraints, periods,
geography, and source lineage.
- Arch should not duplicate a source fact only because Microplex names a model
variable differently.
- Microplex adapters may map one Arch source fact into simulator-specific target
semantics. For example, Arch
`irs_soi.returns_with_income_tax_after_credits` can satisfy the
PolicyEngine `income_tax_positive` count target because SOI Table 1.1 reports
the count of returns with positive income tax after credits.
- A gap row is an authoring hint, not proof that a source exists.
- Rows marked as source-mapping review or deprioritized must be reviewed before
assigning loader work to agents.
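The adapter rule above, where one Arch source fact satisfies a differently named PolicyEngine target without being duplicated in Arch, can be sketched as a Microplex-side alias table. The `ArchFact` and `PeTarget` types, the `PE_TARGET_ALIASES` mapping, and `to_pe_target` are hypothetical; only the `irs_soi.returns_with_income_tax_after_credits` to `income_tax_positive` pairing comes from this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchFact:
    """Hypothetical minimal view of an Arch publisher/source fact."""

    variable: str
    geography: str
    period: int
    value: float


@dataclass(frozen=True)
class PeTarget:
    """Hypothetical PolicyEngine target cell produced by an adapter."""

    name: str
    geography: str
    period: int
    value: float


# Microplex-owned aliases: Arch variable -> PolicyEngine target name.
# Arch keeps one source fact; naming differences live on the Microplex side.
PE_TARGET_ALIASES: dict[str, str] = {
    "irs_soi.returns_with_income_tax_after_credits": "income_tax_positive",
}


def to_pe_target(fact: ArchFact) -> Optional[PeTarget]:
    """Map an Arch fact to a PE target cell, or None if no adapter applies."""
    name = PE_TARGET_ALIASES.get(fact.variable)
    if name is None:
        return None
    return PeTarget(name, fact.geography, fact.period, fact.value)
```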

## Categories

`gap_category` is the high-level agent-readiness taxonomy:

| Category | Meaning | Default action |
| --- | --- | --- |
| `covered` | An Arch target record already satisfies the target cell. | No task. |
| `ready_primary_loader` | The expected publisher source and Arch variable shape are known, but the record is missing. | Assign source-loader/spec work. |
| `ready_rollup_or_geography` | The Arch variable exists but not at the requested geography. | Add rollup/geography records or review source geography. |
| `adapter_or_constraint_review` | The Arch variable exists at the geography, but filters or adapter matching do not cover the cell. | Review constraints and adapter mapping. |
| `source_mapping_review` | The queue cannot identify a defensible source fact or Arch variable shape. | Human source-mapping review first. |
| `survey_or_model_input_deprioritized` | The cell is currently treated as a survey/model-input proxy rather than a primary administrative source task. | Defer unless a primary source is identified. |

`loader_status` is the lower-level diagnostic used to derive the category. Use
`gap_category` for agent routing and `loader_status` for debugging why a cell
landed there.
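One way to read the category table is as a decision list over a few diagnostic facts about each target cell. This is a hypothetical derivation, not the queue's actual `loader_status` logic: the `CellDiagnostics` fields and the order in which the checks run are assumptions chosen to match the table's meanings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellDiagnostics:
    """Hypothetical per-cell facts a coverage run could collect."""

    covered: bool  # an Arch record already satisfies the cell
    deprioritized: bool  # treated as a survey/model-input proxy
    variable_exists: bool  # the Arch variable exists at some geography
    variable_at_geography: bool  # ...and at the requested geography
    source_shape_known: bool  # publisher source and variable shape identified


DEFAULT_ACTIONS = {
    "covered": "No task.",
    "ready_primary_loader": "Assign source-loader/spec work.",
    "ready_rollup_or_geography": "Add rollup/geography records or review source geography.",
    "adapter_or_constraint_review": "Review constraints and adapter mapping.",
    "source_mapping_review": "Human source-mapping review first.",
    "survey_or_model_input_deprioritized": "Defer unless a primary source is identified.",
}


def gap_category(d: CellDiagnostics) -> str:
    """Route one cell to a gap_category (assumed precedence order)."""
    if d.covered:
        return "covered"
    if d.deprioritized:
        return "survey_or_model_input_deprioritized"
    if d.variable_at_geography:
        # The variable is present where needed, so filters/adapters are the gap.
        return "adapter_or_constraint_review"
    if d.variable_exists:
        return "ready_rollup_or_geography"
    if d.source_shape_known:
        return "ready_primary_loader"
    return "source_mapping_review"
```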

## Current PolicyEngine Profile Boundary

`pe_native_broad` keeps the raw PolicyEngine parity surface intact. It includes
all currently tracked broad target cells, including survey/model-input rows and
cells whose publisher-source semantics still need review.

`pe_native_broad_source_backed` is the Arch-backed calibration/profile boundary.
It excludes only cells with explicit reasons in
`src/microplex_us/policyengine/target_profiles.py`, such as:

- SOI multi-domain cells that would require joint AGI, filing status, and
positive income-tax-before-credits facts not currently published by the loaded
SOI packages
- survey-heavy or model-input cells such as rent, child support,
non-Part-B medical premium/expense components, SPM capped expenses, and
`ssn_card_type`
- source-near but non-equivalent rows such as `childcare_expenses`, where IRS
credit expenses and W-2 dependent-care benefits are narrower tax concepts
- pregnancy stock by state, where live births are a flow rather than a direct
source fact for the PolicyEngine target
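The boundary rule (keep every raw cell unless an explicit reason excludes it) can be sketched as a filter over the raw profile. The string cell keys, the `EXCLUSION_REASONS` mapping, and `source_backed_profile` are hypothetical illustrations; the real exclusions and their reasons live in `src/microplex_us/policyengine/target_profiles.py`.

```python
# Hypothetical exclusion table: target cell -> documented reason.
EXCLUSION_REASONS: dict[str, str] = {
    "childcare_expenses": (
        "IRS credit expenses and W-2 dependent-care benefits are narrower tax concepts"
    ),
    "ssn_card_type": "survey/model-input cell",
}


def source_backed_profile(raw: list[str], reasons: dict[str, str]) -> list[str]:
    """Return raw cells minus explicitly excluded ones, preserving raw order."""
    stale = sorted(set(reasons) - set(raw))
    if stale:
        # An exclusion that matches nothing suggests the raw profile changed.
        raise ValueError(f"exclusions not in raw profile: {stale}")
    unexplained = sorted(cell for cell, reason in reasons.items() if not reason.strip())
    if unexplained:
        raise ValueError(f"exclusions without a reason: {unexplained}")
    return [cell for cell in raw if cell not in reasons]
```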

## Current Local Snapshot

Snapshot date: 2026-05-22.

Inputs:

- `/Users/maxghenis/CosilicoAI/arch/arch/fixtures/consumer_facts.jsonl`
- `/Users/maxghenis/CosilicoAI/arch/macro/targets.db`
- `/tmp/arch-suite-hhs-acf-tanf-caseload-2024/consumer_facts.jsonl`
- `/tmp/arch-suite-soi-historic-table-2-2022/consumer_facts.jsonl`
- `/tmp/arch-suite-hhs-acf-liheap-fy2024-national-profile/consumer_facts.jsonl`
- `/tmp/arch-suite-soi-historic-table-2-state-agi-2022/consumer_facts.jsonl`
- `/tmp/arch-suite-soi-w2-statistics-2020/consumer_facts.jsonl`
- `/tmp/arch-suite-soi-table-1-4-2023/consumer_facts.jsonl`
- `/tmp/arch-suite-federal-reserve-z1-household-net-worth/consumer_facts.jsonl`
- `/tmp/arch-suite-cms-medicare-trustees-report-2025-part-b-premium-income/consumer_facts.jsonl`

Command:

```bash
uv run --extra policyengine microplex-us-arch-target-refresh \
--arch-targets-db /Users/maxghenis/CosilicoAI/arch/arch/fixtures/consumer_facts.jsonl \
--arch-targets-db /Users/maxghenis/CosilicoAI/arch/macro/targets.db \
--arch-targets-db /tmp/arch-suite-hhs-acf-tanf-caseload-2024/consumer_facts.jsonl \
--arch-targets-db /tmp/arch-suite-soi-historic-table-2-2022/consumer_facts.jsonl \
--arch-targets-db /tmp/arch-suite-hhs-acf-liheap-fy2024-national-profile/consumer_facts.jsonl \
--arch-targets-db /tmp/arch-suite-soi-historic-table-2-state-agi-2022/consumer_facts.jsonl \
--arch-targets-db /tmp/arch-suite-soi-w2-statistics-2020/consumer_facts.jsonl \
--arch-targets-db /tmp/arch-suite-soi-table-1-4-2023/consumer_facts.jsonl \
--arch-targets-db /tmp/arch-suite-federal-reserve-z1-household-net-worth/consumer_facts.jsonl \
--arch-targets-db /tmp/arch-suite-cms-medicare-trustees-report-2025-part-b-premium-income/consumer_facts.jsonl \
--period 2024 \
--profile pe_native_broad_source_backed \
--output-dir artifacts/arch-target-coverage-source-backed
```

Coverage:

- 174 target cells in `pe_native_broad_source_backed`
- 174 covered
- 0 uncovered
- 100.0% coverage

The raw `pe_native_broad` profile has 174 of its 189 cells covered; the remaining
15 rows were explicitly reviewed and fall outside the source-backed boundary.
Federal Reserve Z.1 household net worth and CMS Medicare Trustees Report Part B
premium income are now source-backed.

| Category | Rows |
| --- | ---: |
| `adapter_or_constraint_review` | 3 |
| `source_mapping_review` | 2 |
| `survey_or_model_input_deprioritized` | 10 |

Generated outputs:

- `artifacts/arch-target-coverage-source-backed/pe_native_broad_source_backed_2024_coverage.json`
- `artifacts/arch-target-coverage-source-backed/pe_native_broad_source_backed_2024_gaps.json`
- `artifacts/arch-target-coverage-source-backed/pe_native_broad_source_backed_2024_gaps.csv`
- `artifacts/arch-target-coverage-source-backed/pe_native_broad_source_backed_2024_summary.md`
- `artifacts/arch-target-coverage-broad-plus/pe_native_broad_2024_coverage.json`
- `artifacts/arch-target-coverage-broad-plus/pe_native_broad_2024_gaps.json`
- `artifacts/arch-target-coverage-broad-plus/pe_native_broad_2024_gaps.csv`
- `artifacts/arch-target-coverage-broad-plus/pe_native_broad_2024_summary.md`

Remaining work is concentrated in:

- the raw `pe_native_broad` cells excluded from the source-backed profile, if a
future primary publisher source can support them without changing semantics
- keeping the UK source-backed/raw boundary aligned with the same rule: leave
raw PE target rows visible, and exclude only rows where source equivalence is
not defensible
23 changes: 19 additions & 4 deletions pyproject.toml
@@ -13,7 +13,7 @@ authors = [

```diff
 ]
 requires-python = ">=3.13"
 dependencies = [
-    "microplex[calibrate]",
+    "microplex[calibrate] @ git+https://github.com/PolicyEngine/microplex.git@1e0627182f9df40aacd7043c96956c2895bf9d30",
     "duckdb>=1.2",
     "requests>=2.31",
 ]
```
@@ -23,25 +23,43 @@ dev = [

```toml
    "pytest>=7.0",
    "ruff>=0.1",
]
r2 = [
    "boto3>=1.34",
]
policyengine = [
    "microimpute==1.15.1 ; python_full_version >= '3.12' and python_full_version < '3.15'",
    "policyengine-us==1.587.0; python_version >= '3.11' and python_version < '3.15'",
    "spm-calculator>=0.3.1",
]

[project.urls]
Repository = "https://github.com/PolicyEngine/microplex-us"

[project.scripts]
microplex-us-arch-target-coverage = "microplex_us.targets.arch:main_coverage"
microplex-us-arch-target-gaps = "microplex_us.targets.arch:main_gaps"
microplex-us-arch-target-parity = "microplex_us.targets.arch:main_parity"
microplex-us-arch-target-refresh = "microplex_us.targets.arch:main_refresh"
microplex-us-arch-target-smoke = "microplex_us.targets.arch:main_smoke"
microplex-us-build-aca-ptc-multipliers = "microplex_us.targets.aca_ptc:main"
microplex-us-backfill-pe-native-audit = "microplex_us.pipelines.backfill_pe_native_audit:main"
microplex-us-backfill-pe-native-scores = "microplex_us.pipelines.backfill_pe_native_scores:main"
microplex-us-check-site-snapshot = "microplex_us.pipelines.check_site_snapshot:main"
microplex-us-pe-dataset-readiness = "microplex_us.pipelines.pe_us_dataset_readiness:main"
microplex-us-dashboard = "microplex_us.pipelines.dashboard:main"
microplex-us-pe-native-calibration-benchmark = "microplex_us.pipelines.pe_native_calibration_benchmark:main"
microplex-us-pe-native-target-diagnostics = "microplex_us.pipelines.pe_native_scores:main_target_diagnostics"
microplex-us-r2-archive-artifact = "microplex_us.pipelines.r2_artifacts:main"
microplex-us-reweight-cd-age-targets = "microplex_us.pipelines.cd_age_reweighting:main"
microplex-us-score-pe-native-loss = "microplex_us.pipelines.pe_native_scores:main"
microplex-us-version-bump-benchmark = "microplex_us.pipelines.version_benchmark:main"

[tool.hatch.build.targets.wheel]
packages = ["src/microplex_us"]

[tool.hatch.metadata]
allow-direct-references = true

[tool.hatch.build.targets.wheel.force-include]
"src/microplex_us/pipelines/pe_native_scores.py" = "microplex_us/pipelines/pe_native_scores.py"
```
@@ -65,6 +83,3 @@ ignore = [

```diff
 [tool.ruff.lint.per-file-ignores]
 "examples/**/*.py" = ["E402"]
 "tests/**/*.py" = ["E402", "N802"]
-
-[tool.uv.sources]
-microplex = { path = "../microplex", editable = true }
```