# Arch Target Gap Queue

The Arch target gap queue is a Microplex-side review tool. It compares a
Microplex target profile to a queryable Arch target DB and emits rows that help
humans or agents decide what Arch source work is missing.

The queue does not transfer ownership of Microplex target selection to Arch.
Profile membership, source aging, reconciliation, activation, and
model-variable aliases remain in `microplex-us`.

## Boundary Rules

- Arch stores publisher/source facts with provenance, constraints, periods,
  geography, and source lineage.
- Arch should not duplicate a source fact merely because Microplex gives the
  corresponding model variable a different name.
- Microplex adapters may map one Arch source fact onto simulator-specific target
  semantics. For example, Arch
  `irs_soi.returns_with_income_tax_after_credits` can satisfy the
  PolicyEngine `income_tax_positive` count target because SOI Table 1.1 reports
  the count of returns with positive income tax after credits.
- A gap row is an authoring hint, not proof that a source exists.
- Rows in the `source_mapping_review` or `survey_or_model_input_deprioritized`
  categories must be reviewed before loader work is assigned to agents.
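
The adapter rule above amounts to a Microplex-side lookup from simulator target names to Arch source facts. The sketch below assumes a hypothetical `ADAPTER_MAP` dictionary and a `resolve_arch_variable` helper. Neither is the actual `microplex-us` API. Only the `income_tax_positive` / `irs_soi.returns_with_income_tax_after_credits` pair comes from this document.

```python
# Hypothetical sketch: NOT the actual microplex-us adapter API.
# Maps simulator-specific target names to Arch source facts, so Arch
# stores each publisher fact once and Microplex owns the aliasing.
from typing import Optional

# Only this pair is documented; any other entries would be assumptions.
ADAPTER_MAP: dict[str, str] = {
    # SOI Table 1.1 reports the count of returns with positive income
    # tax after credits, which is what the PE count target measures.
    "income_tax_positive": "irs_soi.returns_with_income_tax_after_credits",
}


def resolve_arch_variable(target_name: str) -> Optional[str]:
    """Return the Arch source fact backing a Microplex target, if mapped.

    A missing mapping is not evidence that no source exists; it only
    means the target still needs source-mapping review.
    """
    return ADAPTER_MAP.get(target_name)
```

Because the alias lives in the adapter, renaming a PolicyEngine variable changes only this map. The SOI fact stored in Arch stays as it is.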

## Categories

`gap_category` is the high-level agent-readiness taxonomy:

| Category | Meaning | Default action |
| --- | --- | --- |
| `covered` | An Arch target record already satisfies the target cell. | No task. |
| `ready_primary_loader` | The expected publisher source and Arch variable shape are known, but the record is missing. | Assign source-loader/spec work. |
| `ready_rollup_or_geography` | The Arch variable exists but not at the requested geography. | Add rollup/geography records or review source geography. |
| `adapter_or_constraint_review` | The Arch variable exists at the geography, but filters or adapter matching do not cover the cell. | Review constraints and adapter mapping. |
| `source_mapping_review` | The queue cannot identify a defensible source fact or Arch variable shape. | Human source-mapping review first. |
| `survey_or_model_input_deprioritized` | The cell is currently treated as a survey/model-input proxy rather than a primary administrative source task. | Defer unless a primary source is identified. |

`loader_status` is the lower-level diagnostic used to derive the category. Use
`gap_category` for agent routing and `loader_status` for debugging why a cell
landed there.
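
The table implies an order of checks that runs from per-cell diagnostics to a category. The actual `loader_status` values are not listed here. This hypothetical sketch therefore uses invented boolean fields on a `CellDiagnostics` dataclass and an assumed precedence: covered first, then deprioritized survey/model-input cells, then the most specific Arch match. The real derivation in `microplex-us` may differ.

```python
# Hypothetical sketch of deriving gap_category from per-cell diagnostics.
# Field names and check order are assumptions, not the microplex-us schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class CellDiagnostics:
    covered: bool                # an Arch record already satisfies the cell
    survey_or_model_input: bool  # cell is treated as a survey/model-input proxy
    variable_at_geography: bool  # Arch variable exists at the requested geography
    variable_in_arch: bool       # Arch variable exists at some geography
    source_shape_known: bool     # expected publisher source and variable shape known


def derive_gap_category(d: CellDiagnostics) -> str:
    if d.covered:
        return "covered"
    if d.survey_or_model_input:
        # Deferred unless a primary administrative source is identified.
        return "survey_or_model_input_deprioritized"
    if d.variable_at_geography:
        # Right variable and geography, but filters/adapter do not match.
        return "adapter_or_constraint_review"
    if d.variable_in_arch:
        # Loaded in Arch, just not at the requested geography.
        return "ready_rollup_or_geography"
    if d.source_shape_known:
        # Source and shape are known; the record itself is missing.
        return "ready_primary_loader"
    # No defensible source fact or Arch variable shape yet.
    return "source_mapping_review"
```

Checking the most specific Arch match first keeps a cell whose variable already exists from being routed to new loader work.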

## Current PolicyEngine Broad Profile Boundary

The current Arch-backed coverage of the PolicyEngine (PE) broad profile
intentionally does not extend to survey-heavy or model-input cells such as
rent, net worth, child support, medical-premium subcomponents, SPM expenses,
and `ssn_card_type`. Under the primary-source-first policy, those rows are not
ready for automated source-loader agents.
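
One way to apply this policy, together with the review rule in Boundary Rules, is to partition gap rows by `gap_category` before any work goes to agents. This is a minimal sketch. It assumes each row is a dict with a `gap_category` key. The `route_rows` function, the bucket names, and the mapping of categories to buckets are hypothetical readings of the "Default action" column.

```python
# Hypothetical routing of gap rows by gap_category. Bucket names and the
# category-to-bucket assignment are assumptions, not a microplex-us API.
from collections import defaultdict

ROUTES = {
    "covered": None,  # no task
    "ready_primary_loader": "agent",
    "ready_rollup_or_geography": "agent",
    "adapter_or_constraint_review": "review",
    "source_mapping_review": "review",
    "survey_or_model_input_deprioritized": "deferred",
}


def route_rows(rows: list[dict]) -> dict[str, list[dict]]:
    buckets: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        # Unknown categories fall through to human review, never to agents.
        bucket = ROUTES.get(row["gap_category"], "review")
        if bucket is not None:
            buckets[bucket].append(row)
    return dict(buckets)
```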

## Current Local Snapshot

Snapshot date: 2026-05-19.

Inputs:

- `/Users/maxghenis/CosilicoAI/arch/arch/fixtures/consumer_facts.jsonl`
- `/Users/maxghenis/CosilicoAI/arch/macro/targets.db`

Command:

```bash
uv run microplex-us-arch-target-refresh \
  --artifact-root /Users/maxghenis/CosilicoAI/arch \
  --period 2024 \
  --profile pe_native_broad \
  --output-dir artifacts/arch-target-coverage
```

Coverage:

- 189 target cells in `pe_native_broad`
- 138 covered
- 51 uncovered
- 73.0% coverage
- national: 79 of 116 covered
- state: 59 of 73 covered
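
The coverage figures are internally consistent. The national and state splits sum to the profile totals, and 138 / 189 rounds to 73.0%. Here is a quick check that uses only the numbers listed above:

```python
# Coverage counts copied from the 2026-05-19 snapshot above.
national_covered, national_total = 79, 116
state_covered, state_total = 59, 73

covered = national_covered + state_covered
total = national_total + state_total
uncovered = total - covered
coverage_pct = round(100 * covered / total, 1)

print(covered, total, uncovered, coverage_pct)  # 138 189 51 73.0
```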

Gap categories:

| Category | Rows |
| --- | ---: |
| `source_mapping_review` | 26 |
| `survey_or_model_input_deprioritized` | 12 |
| `adapter_or_constraint_review` | 10 |
| `ready_rollup_or_geography` | 3 |

These four categories account for all 51 uncovered cells; no cells are
currently `ready_primary_loader`.

Generated outputs:

- `artifacts/arch-target-coverage/pe_native_broad_2024_coverage.json`
- `artifacts/arch-target-coverage/pe_native_broad_2024_gaps.json`
- `artifacts/arch-target-coverage/pe_native_broad_2024_gaps.csv`
- `artifacts/arch-target-coverage/pe_native_broad_2024_summary.md`
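
To re-derive the category counts from the generated CSV, a tally over its category column should be enough. This is a minimal sketch. It assumes `pe_native_broad_2024_gaps.csv` has a header row with a `gap_category` column; the exact CSV schema is not documented here. `count_gap_categories` is a hypothetical helper.

```python
# Hypothetical helper: tally gap rows per category from the gaps CSV.
# Assumes a header row that contains a "gap_category" column.
import csv
from collections import Counter
from collections.abc import Iterable


def count_gap_categories(lines: Iterable[str]) -> Counter:
    # csv.DictReader accepts any iterable of text lines, including an
    # open file handle.
    reader = csv.DictReader(lines)
    return Counter(row["gap_category"] for row in reader)
```

Usage: open the file in text mode with `newline=""`, as the `csv` module recommends, then pass the handle to `count_gap_categories`. If the CSV holds one row per uncovered cell, the counts should match the table above and sum to 51.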

Remaining work is concentrated in:

- source-mapping review (26 rows) for the newly expanded PE parity cells,
  especially domains whose expected Arch concept is not yet encoded in the gap
  taxonomy
- adapter or constraint review (10 rows) where Arch has the variable at the
  right geography but the Microplex adapter does not yet match the PE target
  cell
- a small rollup/geography queue (3 rows) for variables loaded in Arch but not
  at the requested national or state target geography
- survey/model-input proxy cells (12 rows) that remain deprioritized until a
  primary publisher source is identified