| 1 | +# Imputation Conditioning Contract |
| 2 | + |
| 3 | +This document states the current execution rule for donor conditioning in |
| 4 | +`microplex-us`. |
| 5 | + |
| 6 | +It is meant to answer three questions: |
| 7 | + |
| 8 | +1. Which parts of donor conditioning are conceptually required? |
| 9 | +2. Which parts are still experimental tuning choices? |
| 10 | +3. Which artifact files should we read to evaluate those choices? |
| 11 | + |
| 12 | +## Core rule |
| 13 | + |
| 14 | +Keep three layers separate: |
| 15 | + |
| 16 | +1. Structural contract |
| 17 | + - what the donor block is trying to represent |
| 18 | + - which entity the block lives on |
| 19 | + - which variables are allowed to define support |
| 20 | +2. Predictor-surface choice |
| 21 | + - which compatible conditioning variables are actually used for one block |
| 22 | +3. Downstream evaluation |
| 23 | + - how the imputation choice propagates through synthesis, calibration, and |
| 24 | + the PolicyEngine oracle |
| 25 | + |
| 26 | +Those layers interact, but they are not the same decision. |
| 27 | + |
| 28 | +## Conceptually required structure |
| 29 | + |
| 30 | +These are not optional shortcuts. They are the current conceptual contract. |
| 31 | + |
| 32 | +- Donor integration is block-based, not one flat shared-variable imputer. |
| 33 | +- Each block has a native entity and an allowed conditioning-entity policy. |
| 34 | +- Zero-inflated positive variables should preserve support, not just totals. |
| 35 | +- Structural tax-unit roles matter. |
| 36 | + - `is_tax_unit_head` |
| 37 | + - `is_tax_unit_spouse` |
| 38 | + - `is_tax_unit_dependent` |
| 39 | + - `tax_unit_is_joint` |
| 40 | + - `tax_unit_count_dependents` |
| 41 | +- Variable semantics decide whether a quantity is atomic, derived, signed, |
| 42 | + zero-inflated, or share-like. |
| 43 | + |
| 44 | +This is the layer where we should encode ideas like "dependents are a distinct |
| 45 | +role in the tax-unit support process" or "dividend components should not be |
| 46 | +treated as unrelated continuous totals." |
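The "preserve support, not just totals" point can be illustrated with a small sketch. The helper names (`weighted_total`, `weighted_support_rate`) and the toy data are assumptions made for illustration; they are not part of `microplex-us`. The example shows two imputations with the same weighted total but very different positive support.

```python
from typing import Sequence


def weighted_total(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of an imputed variable."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_support_rate(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted share of records holding a strictly positive value."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    positive_weight = sum(w for v, w in zip(values, weights) if v > 0)
    return positive_weight / total_weight


# Toy data (hypothetical): four equally weighted records.
weights = [1.0, 1.0, 1.0, 1.0]
donor_like = [0.0, 0.0, 0.0, 100.0]  # one holder, three true zeros
smeared = [25.0, 25.0, 25.0, 25.0]   # same total, but everyone is a holder

same_total = weighted_total(donor_like, weights) == weighted_total(smeared, weights)
donor_support = weighted_support_rate(donor_like, weights)
smeared_support = weighted_support_rate(smeared, weights)
```

A check that looks only at `weighted_total` cannot tell these two imputations apart, which is why support has to be tracked as its own quantity for zero-inflated variables.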
| 47 | + |
| 48 | +## Current production modes |
| 49 | + |
| 50 | +The current donor-conditioning modes are: |
| 51 | + |
| 52 | +- `all_shared` |
| 53 | + - use every compatible shared predictor |
| 54 | +- `top_correlated` |
| 55 | + - score compatible shared predictors and keep the strongest subset |
| 56 | +- `pe_prespecified` |
| 57 | + - use a PE-style structural predictor backbone declared in variable semantics |
| 58 | + - optionally admit a narrow supplemental shared set from the *actual* |
| 59 | + compatible overlap |
| 60 | + |
| 61 | +For the current PUF IRS tax-leaf family, PE alignment means the structural-only |
| 62 | +path. The local `policyengine-us-data` |
| 63 | +`policyengine_us_data/calibration/puf_impute.py` implementation trains the PUF |
| 64 | +clone QRF on demographic / tax-unit-role predictors only, and the PUF source |
| 65 | +capability policy intentionally blocks derived convenience columns like |
| 66 | +`income`, `employment_status`, and synthetic `state_fips` from entering donor |
| 67 | +conditioning. |
| 68 | + |
| 69 | +The important practical point is that `pe_prespecified` is not "use some |
| 70 | +hard-coded list no matter what." It still depends on what survives source |
| 71 | +capabilities, semantic compatibility, entity projection, and prepared condition |
| 72 | +surface construction. |
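The three modes can be sketched as one selection function. This is a minimal illustration, not the `microplex-us` implementation: the function name `select_predictors`, its parameters, and the `top_k` cutoff are hypothetical, and the several survival gates (source capabilities, semantic compatibility, entity projection, prepared surface construction) are collapsed into a single `surviving` list for brevity.

```python
from typing import Mapping, Optional, Sequence


def select_predictors(
    mode: str,
    surviving: Sequence[str],
    scores: Optional[Mapping[str, float]] = None,
    structural_backbone: Sequence[str] = (),
    supplemental: Sequence[str] = (),
    top_k: int = 3,
) -> list:
    """Pick conditioning variables for one donor block (illustrative only)."""
    # De-duplicate while keeping the original order.
    candidates = list(dict.fromkeys(surviving))

    if mode == "all_shared":
        # Use every compatible shared predictor.
        return candidates

    if mode == "top_correlated":
        # Rank by absolute score and keep the strongest subset.
        score_map = scores or {}
        ranked = sorted(
            candidates, key=lambda name: abs(score_map.get(name, 0.0)), reverse=True
        )
        return ranked[:top_k]

    if mode == "pe_prespecified":
        # The declared backbone is filtered by what actually survived,
        # so it is not an unconditional hard-coded list.
        allowed = set(candidates)
        backbone = [name for name in structural_backbone if name in allowed]
        extra = [
            name for name in supplemental if name in allowed and name not in backbone
        ]
        return backbone + extra

    raise ValueError(f"unknown conditioning mode: {mode!r}")
```

Under this sketch, a supplemental variable that the source policy has already blocked never appears in the result, regardless of what the variable semantics request.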
| 73 | + |
| 74 | +## What is structural vs experimental |
| 75 | + |
| 76 | +Structural: |
| 77 | + |
| 78 | +- donor block boundaries |
| 79 | +- support family and donor match strategy |
| 80 | +- native entity |
| 81 | +- role-aware PE structural predictors |
| 82 | +- semantic transforms/checks that prevent category errors |
| 83 | + |
| 84 | +Experimental: |
| 85 | + |
| 86 | +- whether `all_shared`, `top_correlated`, or `pe_prespecified` wins for a given |
| 87 | + block family |
| 88 | +- whether a particular variable should admit a |
| 89 | + `supplemental_shared_condition_vars` set |
| 90 | +- which compatible shared predictors should be let back into a PE-structured |
| 91 | + block |
| 92 | +- whether a condition surface should be widened upstream or left narrow |
| 93 | + |
| 94 | +The usual failure mode has been treating an experimental choice as if it |
| 95 | +were a structural truth, or vice versa. |
| 96 | + |
| 97 | +## What is not a real fix |
| 98 | + |
| 99 | +These can still be useful probes, but they should not be mistaken for upstream |
| 100 | +imputation repairs: |
| 101 | + |
| 102 | +- late export-layer patches |
| 103 | +- post-donor clipping/zeroing guards |
| 104 | +- calibration-only improvements that hide unrealistic pre-calibration support |
| 105 | + |
| 106 | +The current working rule is: |
| 107 | + |
| 108 | +- if a patch improves results only after calibration but worsens the pre-calibration |
| 109 | + imputation evidence or the mission metric, it is not a clean imputation win |
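The working rule can be written as a small predicate. This is a sketch under stated assumptions: `PatchDeltas` and `fails_working_rule` are hypothetical names, each delta is "candidate minus incumbent", and every metric is treated as an error where lower is better. The document does not define these conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchDeltas:
    """Candidate-minus-incumbent changes; negative means the error went down."""

    post_calibration: float  # change in post-calibration error
    pre_calibration: float   # change in pre-calibration imputation error
    mission_metric: float    # change in the mission metric (assumed lower is better)


def fails_working_rule(deltas: PatchDeltas) -> bool:
    """True when a patch looks better only after calibration.

    The patch is flagged if it improves the post-calibration error while
    worsening either the pre-calibration evidence or the mission metric.
    """
    improves_after_calibration = deltas.post_calibration < 0
    worsens_upstream = deltas.pre_calibration > 0 or deltas.mission_metric > 0
    return improves_after_calibration and worsens_upstream
```

A patch flagged by this predicate may still be a useful probe, but by the rule above it should not be recorded as a clean imputation win.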
| 110 | + |
| 111 | +## Evidence contract |
| 112 | + |
| 113 | +We read four artifact layers for imputation questions. |
| 114 | + |
| 115 | +### 1. Block-level conditioning evidence |
| 116 | + |
| 117 | +- `manifest.json` |
| 118 | + - `synthesis.donor_conditioning_diagnostics` |
| 119 | +- `python -m microplex_us.pipelines.summarize_donor_conditioning <artifact>` |
| 120 | + |
| 121 | +Use this first when the question is: |
| 122 | + |
| 123 | +- Which predictors did this donor block actually use? |
| 124 | +- Which shared predictors were available but dropped? |
| 125 | +- Did the block use a prepared PE-style condition surface? |
| 126 | +- Did a requested predictor fail at raw overlap, projection, or prepared |
| 127 | + compatibility? |
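For quick inspection without the summarizer, the diagnostics key can be pulled straight from the manifest. This sketch assumes that `manifest.json` sits at the root of the artifact directory and that `synthesis.donor_conditioning_diagnostics` is a dotted path through nested JSON objects; the helper names are hypothetical, and the inner layout of the diagnostics is not assumed.

```python
import json
from pathlib import Path
from typing import Any


def read_nested(doc: Any, dotted_key: str, default: Any = None) -> Any:
    """Walk a dotted key through nested dicts, returning default if absent."""
    node = doc
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def load_donor_diagnostics(artifact_dir: str) -> Any:
    """Return the donor-conditioning diagnostics, or None if not recorded."""
    manifest_path = Path(artifact_dir) / "manifest.json"  # assumed location
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return read_nested(manifest, "synthesis.donor_conditioning_diagnostics")
```

Returning `None` for a missing key keeps older artifacts readable, at the cost of not distinguishing "absent" from an explicit JSON `null`.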
| 128 | + |
| 129 | +### 2. Pre-calibration imputation evidence |
| 130 | + |
| 131 | +- `imputation_ablation.json` |
| 132 | + |
| 133 | +Use this when the question is: |
| 134 | + |
| 135 | +- Which variant wins support realism? |
| 136 | +- Which variant wins weighted MAE? |
| 137 | +- Are we trading support realism against MAE? |
| 138 | + |
| 139 | +### 3. Full checkpoint parity evidence |
| 140 | + |
| 141 | +- `pe_us_data_rebuild_parity.json` |
| 142 | +- `pe_us_data_rebuild_native_audit.json` |
| 143 | + |
| 144 | +Use these when the question is: |
| 145 | + |
| 146 | +- Did the candidate beat the incumbent on harness slices? |
| 147 | +- Did it beat the incumbent on the native broad loss? |
| 148 | +- Which target families regressed? |
| 149 | + |
| 150 | +### 4. Calibration trajectory evidence |
| 151 | + |
| 152 | +- `manifest.json` |
| 153 | + - `calibration.full_oracle_capped_mean_abs_relative_error` |
| 154 | + - `calibration.active_solve_capped_mean_abs_relative_error` |
| 155 | + - deferred-stage summaries |
| 156 | + |
| 157 | +Use this when the question is: |
| 158 | + |
| 159 | +- Did calibration rescue the candidate? |
| 160 | +- Did the change make the solve harder before any rescue happened? |
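The two calibration questions can be answered by comparing the same manifest keys across a candidate and an incumbent run. This is a sketch only: it assumes each key holds a single number under a top-level `calibration` object, which the document does not confirm, and `calibration_deltas` is a hypothetical helper operating on already-parsed manifests.

```python
from typing import Any, Dict, Mapping, Optional

CALIBRATION_KEYS = (
    "full_oracle_capped_mean_abs_relative_error",
    "active_solve_capped_mean_abs_relative_error",
)


def calibration_deltas(
    candidate_manifest: Mapping[str, Any],
    incumbent_manifest: Mapping[str, Any],
) -> Dict[str, Optional[float]]:
    """Candidate minus incumbent for each calibration error key.

    A negative delta means the candidate has the lower error. A key missing
    from either manifest yields None instead of a guessed value.
    """
    candidate = candidate_manifest.get("calibration", {})
    incumbent = incumbent_manifest.get("calibration", {})
    deltas: Dict[str, Optional[float]] = {}
    for key in CALIBRATION_KEYS:
        cand_value = candidate.get(key)
        inc_value = incumbent.get(key)
        if cand_value is None or inc_value is None:
            deltas[key] = None
        else:
            deltas[key] = float(cand_value) - float(inc_value)
    return deltas
```

A candidate whose full-oracle delta is negative while its active-solve delta is positive would be one possible signature of a harder solve that calibration later rescued, though that reading depends on how the two errors are defined in the pipeline.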
| 161 | + |
| 162 | +## Current read as of 2026-04-14 |
| 163 | + |
| 164 | +- Post-hoc dependent tax-leaf guards are not a satisfactory repair. |
| 165 | + - they regress the mission metric |
| 166 | +- A narrow PE-structured supplemental shared patch also failed as a real fix. |
| 167 | + - the raw-gate diagnostics now show why: the PUF source policy blocks |
| 168 | + `income`, `employment_status`, and synthetic `state_fips` from donor |
| 169 | + conditioning before they ever reach live overlap for these tax-leaf blocks |
| 170 | +- The local `policyengine-us-data` read resolves the PE-alignment question. |
| 171 | + - PE's PUF clone QRF uses the structural demographic / tax-unit-role |
| 172 | + predictors only for this family |
| 173 | +- That means the next question is not "why did compatible overlap lose these |
| 174 | + vars?" |
| 175 | + - the real question is whether we want a challenger path with source-native |
| 176 | + PUF predictors that survive source policy, or whether we keep the current |
| 177 | + structural-only PE-aligned contract |
| 178 | + |
| 179 | +This is a better next question than "which post-hoc guard should we try next," |
| 180 | +because it targets the actual modeling choice instead of clipping the output |
| 181 | +after the fact or chasing a nonexistent overlap bug. |