This repository was archived by the owner on Jun 14, 2026. It is now read-only.

Commit 343830a

Add Arch-backed PE target parity adapters
1 parent 6d53b24 commit 343830a

28 files changed

Lines changed: 19466 additions & 116 deletions

docs/arch-target-gap-queue.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# Arch Target Gap Queue

The Arch target gap queue is a Microplex-side review tool. It compares a
Microplex target profile to a queryable Arch target DB and emits rows that help
humans or agents decide what Arch source work is missing.

The queue does not make Arch own Microplex target selection. Profile membership,
source aging, reconciliation, activation, and model-variable aliases remain in
`microplex-us`.

## Boundary Rules

- Arch stores publisher/source facts with provenance, constraints, periods,
  geography, and source lineage.
- Arch should not duplicate a source fact only because Microplex names a model
  variable differently.
- Microplex adapters may map one Arch source fact into simulator-specific target
  semantics. For example, Arch
  `irs_soi.returns_with_income_tax_after_credits` can satisfy the
  PolicyEngine `income_tax_positive` count target because SOI Table 1.1 reports
  the count of returns with positive income tax after credits.
- A gap row is an authoring hint, not proof that a source exists.
- Rows marked as source-mapping review or deprioritized must be reviewed before
  assigning loader work to agents.
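
The adapter rule above (one Arch source fact, mapped into simulator-specific target semantics) could be sketched roughly as follows. This is a minimal illustration, not the `microplex-us` implementation: the `ArchFact` dataclass, the `PE_TARGET_TO_ARCH_VARIABLE` table, and `find_fact` are hypothetical names, and the value in the usage lines is a placeholder rather than a real SOI figure. Only the two variable names come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ArchFact:
    """Hypothetical shape for one publisher/source fact held by Arch."""

    variable: str
    period: int
    geography: str
    value: float


# Hypothetical adapter table: PolicyEngine target name -> Arch variable.
# The alias lives on the Microplex side, so Arch stores the fact only once.
PE_TARGET_TO_ARCH_VARIABLE = {
    "income_tax_positive": "irs_soi.returns_with_income_tax_after_credits",
}


def find_fact(
    facts: Iterable[ArchFact], pe_target: str, period: int, geography: str
) -> Optional[ArchFact]:
    """Return the Arch fact that satisfies a PE target cell, or None if uncovered."""
    arch_variable = PE_TARGET_TO_ARCH_VARIABLE.get(pe_target)
    if arch_variable is None:
        return None  # no adapter mapping yet: a candidate gap row
    for fact in facts:
        if (fact.variable, fact.period, fact.geography) == (
            arch_variable,
            period,
            geography,
        ):
            return fact
    return None  # mapping exists, but no record at this period/geography


# Placeholder value, for illustration only.
example_facts = [
    ArchFact("irs_soi.returns_with_income_tax_after_credits", 2024, "US", 1.0)
]
match = find_fact(example_facts, "income_tax_positive", 2024, "US")
```

Keeping the alias in a Microplex-side table is what lets Arch avoid duplicating the source fact under a second name.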

## Categories

`gap_category` is the high-level agent-readiness taxonomy:

| Category | Meaning | Default action |
| --- | --- | --- |
| `covered` | An Arch target record already satisfies the target cell. | No task. |
| `ready_primary_loader` | The expected publisher source and Arch variable shape are known, but the record is missing. | Assign source-loader/spec work. |
| `ready_rollup_or_geography` | The Arch variable exists but not at the requested geography. | Add rollup/geography records or review source geography. |
| `adapter_or_constraint_review` | The Arch variable exists at the geography, but filters or adapter matching do not cover the cell. | Review constraints and adapter mapping. |
| `source_mapping_review` | The queue cannot identify a defensible source fact or Arch variable shape. | Human source-mapping review first. |
| `survey_or_model_input_deprioritized` | The cell is currently treated as a survey/model-input proxy rather than a primary administrative source task. | Defer unless a primary source is identified. |

`loader_status` is the lower-level diagnostic used to derive the category. Use
`gap_category` for agent routing and `loader_status` for debugging why a cell
landed there.
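
As a rough sketch of routing on `gap_category`, the table above can be encoded as a lookup. The names `DEFAULT_ACTION`, `REVIEW_BEFORE_AGENT_WORK`, and `route` are hypothetical and are not taken from `microplex_us`. The review set follows the boundary rule that source-mapping-review and deprioritized rows must be reviewed before loader work is assigned to agents.

```python
# Default action per gap_category, copied from the table above.
DEFAULT_ACTION = {
    "covered": "No task.",
    "ready_primary_loader": "Assign source-loader/spec work.",
    "ready_rollup_or_geography": (
        "Add rollup/geography records or review source geography."
    ),
    "adapter_or_constraint_review": "Review constraints and adapter mapping.",
    "source_mapping_review": "Human source-mapping review first.",
    "survey_or_model_input_deprioritized": (
        "Defer unless a primary source is identified."
    ),
}

# Categories that must be reviewed before loader work goes to agents.
REVIEW_BEFORE_AGENT_WORK = frozenset(
    {"source_mapping_review", "survey_or_model_input_deprioritized"}
)


def route(gap_category: str) -> tuple[str, bool]:
    """Return (default action, needs review before agent assignment)."""
    if gap_category not in DEFAULT_ACTION:
        # Fail loudly rather than silently routing an unknown category.
        raise ValueError(f"unknown gap_category: {gap_category!r}")
    return DEFAULT_ACTION[gap_category], gap_category in REVIEW_BEFORE_AGENT_WORK


action, needs_review = route("source_mapping_review")
```

Raising on an unknown category is a design choice for this sketch: a new taxonomy value should surface as an error instead of defaulting to an action.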

## Current PolicyEngine Broad Profile Boundary

The current Arch-backed PE broad profile coverage intentionally stops before
survey-heavy or model-input cells such as rent, net worth, child support,
medical-premium subcomponents, SPM expenses, and `ssn_card_type`. Those rows are
not ready for automated source-loader agents under the primary-source-first
policy.

## Current Local Snapshot

Snapshot date: 2026-05-19.

Inputs:

- `/Users/maxghenis/CosilicoAI/arch/arch/fixtures/consumer_facts.jsonl`
- `/Users/maxghenis/CosilicoAI/arch/macro/targets.db`

Command:

```bash
uv run microplex-us-arch-target-refresh \
  --artifact-root /Users/maxghenis/CosilicoAI/arch \
  --period 2024 \
  --profile pe_native_broad \
  --output-dir artifacts/arch-target-coverage
```

Coverage:

- 189 target cells in `pe_native_broad`
- 138 covered
- 51 uncovered
- 73.0% coverage
- national: 79 of 116 covered
- state: 59 of 73 covered

Gap categories:

| Category | Rows |
| --- | ---: |
| `source_mapping_review` | 26 |
| `survey_or_model_input_deprioritized` | 12 |
| `adapter_or_constraint_review` | 10 |
| `ready_rollup_or_geography` | 3 |
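
The snapshot figures above are internally consistent, which a few lines of arithmetic can confirm. This is only a consistency check on the numbers quoted in this document. It does not read the generated artifacts, and the variable names are illustrative.

```python
# (covered, total) target cells per geography level, from the Coverage list.
by_geography = {"national": (79, 116), "state": (59, 73)}

# Rows per gap category, from the Gap categories table.
gap_rows = {
    "source_mapping_review": 26,
    "survey_or_model_input_deprioritized": 12,
    "adapter_or_constraint_review": 10,
    "ready_rollup_or_geography": 3,
}

covered = sum(c for c, _ in by_geography.values())
total = sum(t for _, t in by_geography.values())
uncovered = total - covered
coverage_pct = round(100 * covered / total, 1)

# Every uncovered cell should appear in exactly one gap category.
assert sum(gap_rows.values()) == uncovered

print(f"{covered} of {total} covered ({coverage_pct}%), {uncovered} uncovered")
# → 138 of 189 covered (73.0%), 51 uncovered
```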

Generated outputs:

- `artifacts/arch-target-coverage/pe_native_broad_2024_coverage.json`
- `artifacts/arch-target-coverage/pe_native_broad_2024_gaps.json`
- `artifacts/arch-target-coverage/pe_native_broad_2024_gaps.csv`
- `artifacts/arch-target-coverage/pe_native_broad_2024_summary.md`

Remaining work is concentrated in:

- source-mapping review for the newly expanded PE parity cells, especially
  domains whose expected Arch concept is not yet encoded in the gap taxonomy
- adapter or constraint review where Arch has the variable at the right
  geography but the Microplex adapter does not yet match the PE target cell
- a small rollup/geography queue for variables loaded in Arch but not at the
  requested national or state target geography
- survey/model-input proxy cells that remain deprioritized until a primary
  publisher source is identified

pyproject.toml

Lines changed: 17 additions & 1 deletion
@@ -13,7 +13,8 @@ authors = [
 ]
 requires-python = ">=3.13"
 dependencies = [
-    "microplex[calibrate]",
+    "microplex[calibrate]>=0.2.0",
+    "microunit>=0.1.0",
     "duckdb>=1.2",
     "requests>=2.31",
 ]
@@ -23,20 +24,34 @@ dev = [
     "pytest>=7.0",
     "ruff>=0.1",
 ]
+r2 = [
+    "boto3>=1.34",
+]
 policyengine = [
     "microimpute==1.15.1 ; python_full_version >= '3.12' and python_full_version < '3.15'",
     "policyengine-us==1.587.0; python_version >= '3.11' and python_version < '3.15'",
+    "spm-calculator>=0.3.1",
 ]
 
 [project.urls]
 Repository = "https://github.com/CosilicoAI/microplex-us"
 
 [project.scripts]
+microplex-us-arch-target-coverage = "microplex_us.targets.arch:main_coverage"
+microplex-us-arch-target-gaps = "microplex_us.targets.arch:main_gaps"
+microplex-us-arch-target-parity = "microplex_us.targets.arch:main_parity"
+microplex-us-arch-target-refresh = "microplex_us.targets.arch:main_refresh"
+microplex-us-arch-target-smoke = "microplex_us.targets.arch:main_smoke"
 microplex-us-build-aca-ptc-multipliers = "microplex_us.targets.aca_ptc:main"
 microplex-us-backfill-pe-native-audit = "microplex_us.pipelines.backfill_pe_native_audit:main"
 microplex-us-backfill-pe-native-scores = "microplex_us.pipelines.backfill_pe_native_scores:main"
 microplex-us-check-site-snapshot = "microplex_us.pipelines.check_site_snapshot:main"
+microplex-us-pe-dataset-readiness = "microplex_us.pipelines.pe_us_dataset_readiness:main"
+microplex-us-dashboard = "microplex_us.pipelines.dashboard:main"
+microplex-us-pe-native-calibration-benchmark = "microplex_us.pipelines.pe_native_calibration_benchmark:main"
 microplex-us-pe-native-target-diagnostics = "microplex_us.pipelines.pe_native_scores:main_target_diagnostics"
+microplex-us-r2-archive-artifact = "microplex_us.pipelines.r2_artifacts:main"
+microplex-us-reweight-cd-age-targets = "microplex_us.pipelines.cd_age_reweighting:main"
 microplex-us-score-pe-native-loss = "microplex_us.pipelines.pe_native_scores:main"
 microplex-us-version-bump-benchmark = "microplex_us.pipelines.version_benchmark:main"
 
@@ -69,3 +84,4 @@ ignore = [
 
 [tool.uv.sources]
 microplex = { path = "../microplex", editable = true }
+microunit = { path = "../microunit", editable = true }
