This document is the living methods record for microplex-us.
It is not the paper. It is the shortest place in the repo that should answer:
- what the current canonical pipeline is
- what PolicyEngine is doing in that pipeline
- which methodological choices are considered canonical today
- which choices are explicitly provisional or challenger-only
- where the evidence for those choices is stored
microplex-us is not trying to literally recreate policyengine-us-data.
Current framing:
- policyengine-us is the shared measurement operator
- the active PE-US targets DB is the truth surface we score against
- policyengine-us-data is the incumbent comparator and interface reference
- microplex-us is an independent US data-construction runtime
That means incumbent-compatibility work exists to improve attribution and interface confidence, not to define the project as a wrapper around PE-US-data.
We keep four claims separate:
- Architecture claim
- microplex-us is a cleaner, more modular, more auditable runtime.
- Oracle-compatibility claim
- where important, Microplex matches or intentionally departs from incumbent PE-US-data construction behavior.
- Benchmark claim
- Microplex produces a better PE-ingestable dataset than the incumbent on the active target estate.
- Paper claim
- a stable narrative about methodology, evidence, and novelty that can be defended externally.
The first three live in code and artifacts now. The fourth should be written from them later, not invented separately.
This is the current working methods snapshot, not a claim of finality.
| Area | Current reading | Status | Main evidence |
|---|---|---|---|
| Measurement contract | policyengine-us plus the active targets DB are the oracle. policyengine-us-data is the incumbent comparator. | Canonical | benchmarking.md |
| Runtime boundary | Microplex owns source loading, donor integration, synthesis, entity build, export, artifacts, and experiment tracking. PolicyEngine owns measurement/materialization at eval time. | Canonical | architecture.md |
| Incumbent-compatibility work | PE-style modes are used where they improve attribution or interface confidence, but they do not define the whole project. | Canonical | policyengine-oracle-compatibility.md |
| Construction parity claim | Some construction layers are close or compatible, but general PE-construction parity is not yet established. | Canonical | pe-construction-parity.md |
| Imputation evaluation | We currently track both support realism and MAE. Neither should be collapsed into a single unqualified "best" method. | Canonical | pe_us_data_rebuild_parity.json, pe_us_data_rebuild_native_audit.json |
| Current production imputation reading | structured_pe_conditioning is the support winner on the current checkpoint ablation; top_correlated_qrf is the MAE winner. | Provisional | pe_us_data_rebuild_parity.json, pe_us_data_rebuild_native_audit.json |
| Broad mission metric | The mission metric is PE-native broad loss frontier, but pre-calibration support evidence is retained so unrealistic imputations do not hide behind later weighting. | Canonical | superseding-policyengine-us-data.md, pe_us_data_rebuild_native_audit.json |
| Full-oracle loss accounting | full_oracle_* metrics now score the entire active targets DB, including explicit penalty mass for unsupported rows. Supported-only diagnostics remain separate. | Canonical | policyengine-oracle-compatibility.md, manifest.json |
| Calibration target planning | The active targets DB is one catalog, but calibration is staged and support-aware: rows are classified into solve_now, solve_later, or audit_only instead of forcing one flat solve. | Canonical | policyengine-oracle-compatibility.md, manifest.json |
| Current deferred calibration policy | Default PE-oracle rebuilds use a dense first pass plus two deferred passes at support 10 and 1, each capped to 24 constraints, always consider those passes, and narrow them to the top 7 deferred families and top 4 deferred geographies. Within that focus, deferred stages spend capacity by row-level capped error first, then family/geography loss share, and each deferred pass is only kept if it improves capped full-oracle loss. After correcting the upstream EITC-recipient oracle semantics, the support-10 pass improved the matched 2000/2000 large no-donor run from 0.9729 to 0.9498, the matched donor-inclusive large run from 0.9730 to 0.9502, and the medium no-donor run from 1.0298 to 1.0291. With the row-aware selector in place, the support-1 pass further improves the broader donor-inclusive run from 0.8783 to 0.8213, the matched broader no-donor run from 0.8908 to 0.8362, and the medium no-donor run from 1.0291 to 1.0029. Widening deferred family focus from 3 to 4 then improves the broader donor-inclusive run again from 0.8213 to 0.7909, the matched broader no-donor run from 0.8362 to 0.7996, and the medium no-donor run from 1.0029 to 0.9969. A fresh broader donor-inclusive checkpoint through the unmodified default entrypoint reproduces that 0.7909 result exactly. Widening deferred geographies from 4 to 8 on the same broader donor run then regresses capped full-oracle loss from 0.7909 to 0.7992, so the geography focus should stay at 4. Fixing raw PUF checkpoint sampling to respect S006 weights then improves the broader donor-inclusive default again from 0.7909 to 0.7682 and the matched broader no-donor default from 0.7996 to 0.7683 without any calibration-policy change. 
After promoting the earnsplit-only PUF person-expansion default, widening deferred family focus from 4 to 7 improves the broader donor-inclusive run again from 0.7176 to 0.7045, and the matched donor-free broader run from 0.7171 to 0.7040, with the same focused family set including aca_ptc and rental_income. |
Provisional |
manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, manifest.json |
| Current checkpoint PUF sampling reading | Checkpoint-scale PUF sampling should respect raw S006 weights before variable mapping rather than uniformly sampling raw PUF records. This is incumbent-alignment work, not a challenger method: it changes the checkpoint source sample so it better reflects the PUF weighting surface before any Microplex-specific synthesis or calibration logic. | Provisional | manifest.json, manifest.json, manifest.json, manifest.json |
| Current checkpoint CPS age-support sampling reading | Checkpoint-scale CPS sampling should guarantee at least one sampled household per observed state x 5-year age-band cell. This is also checkpoint-only incumbent-compatibility work: it does not change the full-data runtime, only the sampled source surface used in checkpoint experiments. On the matched broader donor run it improves capped full-oracle loss from 0.7682 to 0.7329, and on the matched broader no-donor run from 0.7683 to 0.7368. | Provisional | manifest.json, manifest.json, manifest.json, manifest.json |
| Current checkpoint donor age-support sampling reading | On donor-inclusive checkpoints, donor survey sampling should also guarantee at least one sampled household per observed state x 5-year age-band cell when a donor source exposes both state and age. This stays in the same checkpoint-only incumbent-compatibility bucket as the CPS age floor, but the effect is much smaller: on the matched broader donor run it improves capped full-oracle loss from 0.7329149849 to 0.7327632809 with the same selected-constraint count. | Provisional | manifest.json, manifest.json |
| Current checkpoint CPS income-support sampling reading | Do not promote checkpoint CPS income-support floors yet. The household-income analogue clearly regressed the matched broader donor run from 0.7329 to 0.7554, and the more PE-aligned tax-unit-income analogue was a near miss but still regressed the frontier metric from 0.7329 to 0.7372 even while improving uncapped full-oracle and active-solve loss. The accepted upstream checkpoint support change therefore remains the CPS state x age-band floor only. | Provisional | manifest.json, manifest.json, manifest.json |
| Current PUF person-expansion reading | Keep PE-style EARNSPLIT randomization in the PUF PE-demographics branch, but do not promote PE-style age-bin and spouse/dependent-sex randomization into the default path yet. The winning split-only version improves the matched broader donor checkpoint from 0.7327632809 to 0.7176041064, while the age/sex-only version regresses it to 0.7463902007. A later retest of the full age/sex path on top of the stronger family-7 broader donor default still regresses the mission metric from 0.7044626415 to 0.7111876263, so this remains a rejected lane rather than an unresolved default question. This keeps the upstream income-splitting alignment that helps the frontier metric without forcing the age/sex piece that currently hurts checkpoint performance. | Provisional | manifest.json, manifest.json, manifest.json, manifest.json, manifest.json, tmp_puf_source_stage_parity_personexpansion_20260412.json |
| Current post-fix residual reading | After the raw PUF weighting fix, the checkpoint CPS state x age-band floor, the earnsplit-only PUF person-expansion default, and the wider deferred family gate, ACA PTC and rental mass drop sharply, but the remaining capped-error mass is now led again by age, person AGI, tax-unit AGI, and EITC child-count families. The worst individual rows are still dominated by ACA amount and ACA-eligibility cells, with a thinner stored-input tail now mostly in tax-exempt interest and a few rental states. That keeps the next upstream lane on age/AGI/EITC structure rather than another broad calibration-policy sweep. | Provisional | tmp_broader_puf_personexpansion_family7_donor_drilldown_20260412.json, manifest.json, manifest.json |
| Current stored-input tail reading | Keep the accepted interest/rental donor-conditioning change, reject the property-cost extension, and reject both export-side rental normalization and direct zero-support-mask propagation in zero-inflated donor rank matching. Each looked locally plausible, but on fresh 2000/2000 large no-donor source checkpoints the rental normalization regressed capped full-oracle loss from 1.3274 to 1.3874 and the zero-support-mask propagation regressed it from 1.3274 to 1.9223, so the default path stays conservative here. | Provisional | manifest.json, manifest.json, tmp_policyengine_oracle_target_drilldown_asset_tail_smoke_current_20260411.json, tmp_policyengine_oracle_target_drilldown_asset_tail_smoke_old_20260411.json, manifest.json, manifest.json, manifest.json |
| Current interest-family reading | Do not promote the interest_income + tax_exempt_interest_share decomposition into the default path yet. It looked strong on the 400/400 medium no-donor run, but the matched 2000/2000 no-donor confirmation regressed capped full-oracle loss from 1.3274 to 1.3555, so the default remains separate taxable_interest_income and tax_exempt_interest_income lanes. | Provisional | manifest.json, manifest.json, manifest.json |
| Current donor-support sampling reading | Keep donor-support sampling with replacement. Forcing no-replacement support sampling looked cleaner mechanically but made the matched smoke run materially worse on both capped full-oracle and active-solve loss. | Provisional | manifest.json, manifest.json |
| Current benchmark reading | On the current checkpoint artifact, harness metrics improved versus the incumbent comparator, but native broad loss is still much worse than enhanced_cps_2024. | Canonical | pe_us_data_rebuild_parity.json, pe_us_data_rebuild_native_audit.json |
| Current cross-run regression reading | Across 66 scored modelpass checkpoint runs, national_irs_other appears in the top 3 every time, state_agi_distribution in 63/66, and state_aca_spending in 54/66. Near-term model work should target those recurring families directly rather than broad tuning. | Provisional | live_pe_us_data_rebuild_checkpoint_modelpass_regression_summary_20260410.json |
| Current national_irs_other drilldown reading | The audited national_irs_other failures are concentrated in filing-status-sensitive IRS cells and coincide with large SINGLE and JOINT overcounts plus SEPARATE undercounts. The first remediation step is to preserve source-authoritative filing-status inputs into the PE construction path. | Provisional | live_pe_us_data_rebuild_checkpoint_national_irs_other_drilldown_20260410.json |
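The checkpoint age-support rows in the table describe a sampling floor: at least one sampled household per observed state x 5-year age-band cell. A minimal sketch of that idea follows. It is not the Microplex sampler; the function name, the person-level frame, and the `household_id`, `state`, and `age` columns are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sample_households_with_age_floor(persons, n_households, seed=0):
    """Draw a household sample, then top it up so every observed
    (state, 5-year age band) cell has at least one sampled household.

    `persons` is a hypothetical person-level frame with columns
    household_id, state, and age.
    """
    rng = np.random.default_rng(seed)
    persons = persons.assign(age_band=(persons["age"] // 5) * 5)

    # Base draw: a simple random sample of households without replacement.
    household_ids = persons["household_id"].unique()
    n = min(n_households, len(household_ids))
    chosen = set(rng.choice(household_ids, size=n, replace=False).tolist())

    # Floor: for each observed cell with no sampled household, add one.
    cells = persons.groupby(["state", "age_band"])["household_id"].unique()
    for _, cell_households in cells.items():
        options = cell_households.tolist()
        if not chosen.intersection(options):
            chosen.add(options[rng.integers(len(options))])
    return sorted(chosen)
```

The floor can push the sample above `n_households`; whether the real checkpoint sampler trims back to a fixed size is not stated here, so the sketch leaves the extra households in.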
The current broad US pipeline is:
- Load raw survey/tax sources into canonical observation frames.
- Apply source semantics and variable semantics.
- Build donor blocks and donor-condition surfaces.
- Impute donor-only variables into the scaffold population.
- Synthesize a candidate population.
- Build PolicyEngine-ingestable entity tables.
- Export final H5.
- Run PolicyEngine materialization and compare implied aggregates to the active target DB.
- Save artifact bundles, sidecars, and registry/index records.
This is a fresh Microplex pipeline with a PolicyEngine evaluation boundary, not an attempt to make PE-US-data the runtime architecture.
- Source and variable semantics are declared in Microplex-owned registries and manifests.
- Final evaluation uses the shared PE-US runtime and active targets DB.
- Artifact discipline is required for serious runs:
  - manifest.json
  - data_flow_snapshot.json
  - policyengine_harness.json when harness evaluation runs
  - policyengine_native_scores.json when PE-native broad loss runs
  - pe_us_data_rebuild_parity.json for incumbent-compatibility checkpoints
  - pe_us_data_rebuild_native_audit.json for target/family/support audit
  - run_registry.jsonl
  - run_index.duckdb
- Incumbent-compatibility modes are allowed when they improve attribution.
- Materially different model choices should be explicit challenger variants.
- The default imputation stack is still under active evaluation.
- Support realism vs MAE tradeoffs are still live methodological questions.
- Full-support candidate construction and selector design are not settled.
- Calibration is still operationally important, but it is not the only or even always the dominant methodological lever.
- Held-out evaluation is not yet the default outer loop.
These should not be written up later as if they were settled all along.
- Should runtime imputation selection prioritize support realism, weighted MAE, or a gated combination of the two?
- How much conditioning structure should be imposed before flexible donor/QRF prediction begins?
- How much of the remaining broad-loss gap is record construction versus selection/calibration?
- Should deferred calibration eligibility stay at a single scalar trigger (full_oracle_capped_mean_abs_relative_error > 2.45), or should it become family-aware once larger source runs accumulate?
- Which incumbent-compatible modes are worth keeping as long-run options, and which should remain diagnostic-only?
- When should held-out evaluation become a required gate rather than an optional extra?
Use these surfaces when writing claims down later:
- benchmarking.md for the truth/comparator/operator contract
- policyengine-oracle-compatibility.md for incumbent-compatibility rules
- pe-construction-parity.md for audited construction-layer matching vs intentional difference
- saved artifact bundles for actual run-level evidence
- tests for the code-enforced contract behind those claims
For the current checkpoint-style evidence bundle, the most useful files are:
- manifest.json
- data_flow_snapshot.json
- policyengine_harness.json
- policyengine_native_scores.json
- pe_us_data_rebuild_parity.json
- pe_us_data_rebuild_native_audit.json
- imputation_ablation.json
- live_pe_us_data_rebuild_checkpoint_modelpass_regression_summary_20260410.json
- live_pe_us_data_rebuild_checkpoint_national_irs_other_drilldown_20260410.json
- Decision:
- describe policyengine-us as the oracle/evaluator and policyengine-us-data as the incumbent comparator
- Why:
- this matches how the system is actually being used
- it avoids understating the novelty of the Microplex runtime
- it keeps incumbent-compatibility work from swallowing the whole project
- Evidence:
- Decision:
- keep support realism and MAE as separate evidence channels
- do not summarize imputation quality using post-calibration loss alone
- Why:
- the current checkpoint artifact shows a real tradeoff: structured_pe_conditioning wins support, while top_correlated_qrf wins weighted MAE
- collapsing the two too early would hide methodology risk
- Evidence:
- Decision:
- treat sidecars and registry metadata as part of the methodology, not just engineering exhaust
- Why:
- paper-facing claims will need reproducible evidence with exact configs, metrics, and comparison slices
- the artifact bundle is now the canonical storage layer for that evidence
- Evidence:
- Decision:
- prioritize targeted fixes for national_irs_other, state_agi_distribution, and then state_aca_spending
- Why:
- across recent modelpass checkpoint families, the same regressions recur even when total loss improves substantially
- national_irs_other appears in the top 3 for all 66 scored runs
- state_agi_distribution appears in the top 3 for 63/66 runs and is the largest regressing family in 34 runs
- state_aca_spending appears in the top 3 for 54/66 runs but is more often a secondary or tertiary regression
- Evidence:
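Recurrence counts like the ones above can be reproduced from per-run family regressions. A minimal sketch follows, assuming a hypothetical mapping from run id to `{family: regression size}`. The real summary JSON may be laid out differently.

```python
from collections import Counter


def top_k_recurrence(run_family_regressions, k=3):
    """Count how often each family lands in the k largest regressions of a run.

    `run_family_regressions` is a hypothetical dict:
    run id -> {family name: regression size}.
    """
    counts = Counter()
    for family_scores in run_family_regressions.values():
        ranked = sorted(family_scores, key=family_scores.get, reverse=True)
        counts.update(ranked[:k])
    return counts
```

Dividing each count by the number of scored runs gives the `63/66`-style ratios quoted in this entry.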
- Decision:
- first fix the preservation of source-authoritative filing-status inputs in the PE-oracle rebuild path before attempting more downstream status tuning
- Why:
- audited national_irs_other lead runs show repeated IRS target failures in filing-status-sensitive cells, especially Single, Joint, and high-AGI bins
- those same audited runs show large SINGLE and JOINT count surpluses, large SEPARATE deficits, and missing or distorted MFS support bins
- the saved candidate seed/synthetic/calibrated rows for leading runs retain marital_status but not filing_status_code, so the authoritative PUF tax filing code is disappearing before tax-unit construction
- Evidence:
- Decision:
- score full_oracle_* metrics over the full active targets DB, not just the supported subset
- penalize unsupported rows explicitly rather than letting them disappear from the scalar objective
- keep supported-only summaries as separate diagnostics
- Why:
- "measure everything, optimize the feasible subset" only works if the measurement metric actually reflects unsupported misses
- otherwise frontier selection and deferred-stage triggers can be gamed by leaving hard rows unsupported
- Evidence:
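The difference between full-oracle and supported-only scoring can be sketched as follows. This is an illustration, not the Microplex implementation: the function names, the cap value, and the choice to charge each unsupported row the cap itself as its penalty are all assumptions.

```python
import numpy as np


def full_oracle_capped_mare(targets, estimates, supported, cap=10.0,
                            unsupported_penalty=None):
    """Capped mean absolute relative error over ALL target rows.

    Unsupported rows are charged an explicit penalty (assumed here to
    default to the cap) instead of being dropped from the mean.
    """
    targets = np.asarray(targets, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    supported = np.asarray(supported, dtype=bool)
    penalty = cap if unsupported_penalty is None else unsupported_penalty

    denom = np.maximum(np.abs(targets), 1e-9)  # guard against zero targets
    rel_err = np.minimum(np.abs(estimates - targets) / denom, cap)
    per_row = np.where(supported, rel_err, penalty)
    return float(per_row.mean())


def supported_only_capped_mare(targets, estimates, supported, cap=10.0):
    """The separate diagnostic: the same capped error, supported rows only."""
    targets = np.asarray(targets, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    supported = np.asarray(supported, dtype=bool)
    denom = np.maximum(np.abs(targets[supported]), 1e-9)
    rel_err = np.abs(estimates[supported] - targets[supported]) / denom
    return float(np.minimum(rel_err, cap).mean())
```

Under this sketch, leaving a hard row unsupported can only raise the full-oracle score, which is the property the decision above relies on.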
- Decision:
- keep the full active targets DB as one measurement catalog
- classify rows into solve_now, solve_later, or audit_only
- use a dense first pass plus at most one deferred pass by default on the incumbent-compatible PE-oracle rebuild path
- Why:
- one flat broad solve is not numerically credible on thinner artifacts
- the right execution rule is support-aware staging, not shadow target CSVs or pretending all DB rows belong in the same solve
- Evidence:
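Support-aware staging amounts to a per-row classification. A minimal sketch follows, assuming classification depends only on a row's support count. The function name and `dense_floor` are placeholders, and `deferred_floor=10` borrows the support floor quoted for the deferred pass elsewhere in this record.

```python
def classify_target(support_count, dense_floor=50, deferred_floor=10):
    """Assign a target row to a calibration stage by its support count.

    dense_floor is a placeholder value, not the project's setting.
    """
    if support_count >= dense_floor:
        return "solve_now"
    if support_count >= deferred_floor:
        return "solve_later"
    return "audit_only"
```

Rows classified as `audit_only` stay in the measurement catalog and are still scored. They are simply kept out of the solve.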
- Decision:
- default deferred calibration on the incumbent-compatible PE-oracle rebuild path uses:
  - one deferred pass at support floor 10
  - deferred-pass cap 24
  - trigger threshold full_oracle_capped_mean_abs_relative_error > 2.45
- Why:
- tiny-source evidence still benefits from the deferred pass
- medium, donor-inclusive, and larger replayed/source artifacts do not justify attempting it below that threshold
- Evidence:
- Decision:
- support person -> tax_unit/family/spm_unit boolean target filters in the PE household-constraint compiler using group-membership .any() semantics
- Why:
- broad-oracle runs were carrying an artificial unsupported wall across 11 whole tax_unit_count families such as dividend_income, taxable_interest_income, and unemployment_compensation
- those targets are defined as tax_unit_count with person-entity domain filters like dividend_income > 0 plus tax-unit filters like tax_unit_is_filer == 1
- removing that structural limitation dropped unsupported targets on the large no-donor replay from 572 to 0, and the fresh source rerun improved capped full-oracle loss from 2.4329 to 1.3274
- Evidence:
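The group-membership `.any()` rule can be illustrated with pandas. This is a sketch rather than the household-constraint compiler itself; the function name and the frame layout are assumptions.

```python
import pandas as pd


def tax_units_matching(persons, person_mask, tax_unit_mask):
    """Lift a person-level boolean filter to tax units with .any() semantics,
    then combine it with a tax-unit-level filter.

    persons:       frame with a tax_unit_id column (one row per person)
    person_mask:   boolean Series aligned with persons
    tax_unit_mask: boolean Series indexed by tax_unit_id
    """
    # A tax unit passes the person filter if ANY member passes it.
    any_member = person_mask.groupby(persons["tax_unit_id"]).any()
    unit_level = tax_unit_mask.reindex(any_member.index, fill_value=False)
    return any_member & unit_level


persons = pd.DataFrame({
    "tax_unit_id": [1, 1, 2, 3],
    "dividend_income": [0.0, 50.0, 0.0, 10.0],
})
is_filer = pd.Series({1: True, 2: True, 3: False})
mask = tax_units_matching(persons, persons["dividend_income"] > 0, is_filer)
```

In this toy frame only tax unit 1 satisfies both `dividend_income > 0` for some member and `tax_unit_is_filer == 1`, so a `tax_unit_count` target would sum the weights of that unit alone.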
- Decision:
- prioritize post-fix model and construction work against the remaining large-run oracle leaders rather than more deferred-stage tuning
- Why:
- fresh donor and no-donor 2000/2000 source runs now share the same top full-oracle residual families and geographies
- the largest remaining families are age counts, tax_unit_count for eitc_child_count, and AGI count families; the leading geographies are state:OR, state:GA, and state:MO
- within those geographies, the worst cells are concentrated in ACA PTC, AGI counts, SALT, rental income, tax-exempt interest income, and pass-through income
- Evidence:
- Decision:
- keep the richer interest/rental donor-conditioning semantics
- do not promote the property-cost semantic extension into the default pipeline
- Why:
- on matched 200/200 smoke checkpoints, the accepted interest/rental change slightly improves capped full-oracle loss from 1.4417803 to 1.4414441 and lowers active-solve capped loss from 1.8878380 to 1.8829362
- the accepted change cuts the capped stored-input mass attributed to tax_exempt_interest_income in the top drilldown from 40 to 20
- the follow-on property-cost extension made capped full-oracle loss worse (1.4489770) and doubled property-side capped mass in the top drilldown, so it was reverted
- Evidence:
- Decision:
- do not rebuild net rental_income at PolicyEngine export from rental_income_positive - rental_income_negative
- keep exporting the observed net rental_income directly in the default path
- Why:
- a saved-seed replay looked promising and improved capped full-oracle loss from 1.3274 to 1.3169, which made the export-side normalization look like a clean way to use donor-integrated rental components
- the fresh 2000/2000 large no-donor source checkpoint contradicted that replay: capped full-oracle loss worsened from 1.3274 to 1.3874
- active-solve capped loss also worsened from 2.6923 to 2.7722, and the number of active constraints fell from 540 to 522
- source checkpoints decide default-path changes; replay-only wins are not sufficient
- Evidence:
- Decision:
- do not make zero-inflated donor rank matching honor the generated support mask directly by replacing the donor positive-rate count with scores > 0
- keep the existing donor-rate-based positive count in the default path
- Why:
- the idea was structurally coherent: the QRF path already trains a zero model, so propagating its zero mask through final donor assignment looked like a way to stop rank matching from reintroducing positive tail support
- the fresh 2000/2000 large no-donor source checkpoint failed badly: capped full-oracle loss worsened from 1.3274 to 1.9223
- active-solve capped loss worsened from 2.6923 to 4.3296, and active constraints rose from 540 to 703, so the change was not merely trading one metric for another
- again, source checkpoints decide default-path changes
- Evidence:
- Decision:
- do not promote the interest_income + tax_exempt_interest_share donor block into the default pipeline
- keep taxable_interest_income and tax_exempt_interest_income on separate donor lanes for now
- Why:
- the medium no-donor checkpoint was promising: capped full-oracle loss fell from 2.3931 to 1.3644
- the matched large no-donor confirmation did not hold: capped full-oracle loss worsened from 1.3274 to 1.3555
- raw full-oracle loss also worsened sharply on the large run, from 2256.6 to 16980.7, and active-solve capped loss worsened from 2.6923 to 2.8229
- the default path should follow the larger, more representative no-donor run, not the thinner medium win
- Evidence:
- Decision:
- keep donor-support sampling with replacement in the default donor path
- Why:
- a no-replacement support sampler sounds cleaner, but the matched smoke run was worse on the only metrics that matter here
- capped full-oracle loss worsened from 1.4414 to 1.6369
- active-solve capped loss worsened from 1.8829 to 2.7402
1.8829to2.7402 - this should remain a rejected experiment unless a stronger construction change makes it worthwhile to revisit
- Evidence:
- Decision:
- treat IRS SOI EITC child-count targets as recipient strata that require eitc > 0, not just filer strata split by eitc_child_count
- keep Microplex compatible with the corrected DB by treating domain_variable as a set-membership field when target rows carry multiple domain constraints such as eitc,eitc_child_count
- Why:
- the active targets DB guide already described eitc_child_count as EITC recipient strata, and policyengine-us-data's own loss code evaluates those cells as (eitc > 0) * meets_child_criteria
- the ETL was the inconsistent layer: it created child-count strata under filer strata without the positive-EITC condition
- after correcting the DB and rerunning the matched 2000/2000 large no-donor source checkpoint, capped full-oracle loss fell from 1.0149 to 0.9718 on an apples-to-apples corrected-oracle comparison
- the same comparison moved tax_unit_count|domain=eitc_child_count out of the top-3 residual families, so this was a real oracle bug, not just a cosmetic target renaming
- Evidence:
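Both halves of this decision can be sketched briefly: reading `domain_variable` as a set, and evaluating a child-count cell as a recipient stratum. The helper names are hypothetical, and treating `meets_child_criteria` as an exact match on the child count is an assumption made for the example.

```python
import numpy as np


def parse_domain_variables(domain_variable):
    """Read a comma-separated domain_variable field as a set of names."""
    return frozenset(
        part.strip() for part in domain_variable.split(",") if part.strip()
    )


def eitc_recipient_indicator(eitc, eitc_child_count, child_count):
    """Per-tax-unit indicator for an EITC child-count cell, evaluated as
    (eitc > 0) * meets_child_criteria.

    meets_child_criteria is assumed here to be an exact child-count match.
    """
    eitc = np.asarray(eitc, dtype=float)
    children = np.asarray(eitc_child_count)
    return ((eitc > 0) * (children == child_count)).astype(int)
```

With set membership, a compiler can test `"eitc" in parse_domain_variables(row_value)` instead of comparing the raw string, so rows that carry `eitc,eitc_child_count` still match.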
- Decision:
- default PE-oracle rebuilds should always consider one deferred support-10 calibration pass
- keep that pass narrow by default: top 3 deferred families, top 4 deferred geographies, and at most 24 constraints
- let the existing capped full-oracle accept/reject rule decide whether the stage is retained, instead of gating the attempt behind a hard trigger
- Why:
- after the EITC-recipient oracle fix, the old 2.45 trigger became the brittle heuristic rather than the principled part of the policy
- on matched 2000/2000 large no-donor and donor-inclusive source runs, the same narrow stage-2 pass improved capped full-oracle loss from 0.9729 to 0.9498 and from 0.9730 to 0.9502
- the same narrow pass also improved the medium no-donor run slightly, from 1.0298 to 1.0291, so the accept/reject rule is carrying the right burden and the hard trigger is not buying us much
- Evidence:
- Decision:
- keep the default deferred-stage family focus at 3 rather than widening it to 4 just to admit aca_ptc|domain=aca_ptc
- Why:
- the broader no-donor row-level drilldown made ACA look like the next plausible family to admit into stage 2, but the matched 5000/5000 checkpoint with top_family_count = 4 produced the exact same final result as top_family_count = 3
- capped full-oracle loss stayed at 0.8908588019931089
- active-solve capped loss stayed at 0.8950141021216582
- the stage-2 cap remained 24, so widening the family focus did not meaningfully change which cells won capacity
- Evidence:
- Decision:
- keep the row-aware deferred selector
- within the existing top-3-family / top-4-geography focus and 24-constraint cap, rank candidate stage-2 rows by capped target error plus family and geography loss share rather than family/geography share alone
- Why:
- widening the focused family set did nothing because the bottleneck is the 24-slot cap, not admission into the focused set
- the row-aware ranking is neutral on the medium no-donor checkpoint, slightly better on the broader no-donor checkpoint, and materially better on the broader donor-inclusive checkpoint
- that is the right direction for the actual objective, capped full-oracle loss, without changing the surrounding stage-2 policy
- Evidence:
- matched medium no-donor row-aware rerun (manifest.json): unchanged from the prior medium default, 1.0298017982 -> 1.0291445335
- matched broader no-donor row-aware rerun (manifest.json): improves capped full-oracle loss from 0.8908588020 to 0.8907527501
- matched broader donor-inclusive row-aware rerun (manifest.json): improves capped full-oracle loss from 0.8932869027 to 0.8782556650
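A row-aware selector of this kind can be sketched as follows. It is an illustration under stated assumptions, not the shipped selector: the column names are hypothetical, the family and geography focus are combined with AND, and "capped error plus loss share" is read as a lexicographic ordering (row error first, then family share, then geography share).

```python
import pandas as pd


def select_deferred_rows(candidates, top_families=3, top_geographies=4,
                         max_constraints=24):
    """Pick deferred-stage rows inside a family/geography focus.

    `candidates` is a hypothetical frame with columns family, geography,
    and capped_error (one row per deferred target row).
    """
    total = candidates["capped_error"].sum()
    if total == 0:
        return candidates.head(0)

    family_loss = candidates.groupby("family")["capped_error"].sum()
    geo_loss = candidates.groupby("geography")["capped_error"].sum()
    focus_families = family_loss.nlargest(top_families).index
    focus_geos = geo_loss.nlargest(top_geographies).index

    focused = candidates[
        candidates["family"].isin(focus_families)
        & candidates["geography"].isin(focus_geos)
    ]
    focused = focused.assign(
        family_share=focused["family"].map(family_loss) / total,
        geography_share=focused["geography"].map(geo_loss) / total,
    )
    ranked = focused.sort_values(
        ["capped_error", "family_share", "geography_share"], ascending=False
    )
    return ranked.head(max_constraints)


def keep_deferred_pass(loss_before, loss_after):
    """Accept/reject rule: keep a pass only if capped full-oracle loss improves."""
    return loss_after < loss_before
```

Because the cap is applied after ranking, widening the focused family set changes the outcome only when a newly admitted row outranks one of the rows already filling the 24 slots.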
- Decision:
- change the canonical PE-oracle rebuild default from one deferred support-10 pass to two deferred passes at support 10 and 1
- keep the same 24-constraint cap and top-3-family / top-4-geography focus on each deferred pass
- Why:
- the support-1 stage is now solving the right residual class: mostly ultra-thin age and AGI rows that remain after the row-aware support-10 pass
- it improves the actual objective, capped full-oracle loss, on broader donor-inclusive, broader no-donor, and medium no-donor reruns
- the existing accept/reject rule already prevents the stage from sticking if it ever becomes harmful on another run
- Evidence:
- matched broader donor-inclusive rerun with an extra support-1 stage (manifest.json): improves capped full-oracle loss from 0.8782556650 to 0.8212707783
- matched broader no-donor rerun with the same extra support-1 stage (manifest.json): improves capped full-oracle loss from 0.8907527501 to 0.8362042462
- matched medium no-donor rerun with the same extra support-1 stage (manifest.json): improves capped full-oracle loss from 1.0291445335 to 1.0028694956
- fresh medium no-donor checkpoint through the default entrypoint (manifest.json): reproduces the same three-stage result exactly, confirming the default schedule is now (10, 1) in the real entrypoint path
- Decision:
  - change the canonical PE-oracle rebuild default from top-3 deferred families to top-4 deferred families, keeping the same top-4 geographies and 24-constraint cap
- Why:
  - after the row-aware selector and the extra support-1 stage, ACA PTC becomes the fourth-largest deferred family by capped loss mass and still has many cells with support in the teens
  - that means it is being excluded by family admission, not by impossible support, and letting it into the focused set materially improves the full-oracle objective
- Evidence:
  - matched broader donor-inclusive rerun with top-4 deferred families: `manifest.json`
    - improves capped full-oracle loss from `0.8212707783` to `0.7908917500`
  - matched broader no-donor rerun with top-4 deferred families: `manifest.json`
    - improves capped full-oracle loss from `0.8362042462` to `0.7995775732`
  - matched medium no-donor rerun with top-4 deferred families: `manifest.json`
    - improves capped full-oracle loss from `1.0028694956` to `0.9968822972`
  - fresh medium no-donor checkpoint through the default entrypoint: `manifest.json`
    - reproduces the same top-4-family result exactly, confirming the default family focus is now `4` in the real entrypoint path
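
The family-admission argument above can be illustrated with a minimal sketch. It assumes each deferred row carries a family label and a capped loss value, and that families are ranked by summed capped loss; the function name `top_deferred_families`, the row shape, and the toy numbers are hypothetical and are not the selector in `pe_us_data_rebuild.py`.

```python
from collections import defaultdict


def top_deferred_families(rows, top_family_count):
    """Rank families by summed capped loss and keep the largest ones.

    `rows` is an iterable of (family, capped_loss) pairs. Both the row shape
    and this helper are illustrative assumptions, not the repo's selector.
    """
    mass = defaultdict(float)
    for family, capped_loss in rows:
        mass[family] += capped_loss
    # Sort by descending loss mass; break ties by name so the result is stable.
    ranked = sorted(mass.items(), key=lambda item: (-item[1], item[0]))
    return [family for family, _ in ranked[:top_family_count]]


# Toy numbers only; they are not taken from any manifest.
rows = [
    ("age", 3.0), ("age", 2.0),
    ("adjusted_gross_income", 4.0),
    ("eitc", 2.5),
    ("aca_ptc", 1.0), ("aca_ptc", 1.2),
    ("rental_income", 0.4),
]
top3 = top_deferred_families(rows, 3)
top4 = top_deferred_families(rows, 4)
```

In this toy setup `aca_ptc` has real loss mass but sits just outside a three-family cutoff, so it only reaches the focused set once the family count is raised to four. That mirrors the "excluded by family admission, not by impossible support" reading.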
- Decision:
  - keep the canonical PE-oracle rebuild default at top-4 deferred geographies rather than widening the geography focus further
- Why:
  - the fresh broader donor-inclusive default-entrypoint rerun reproduces the existing top-4-family/top-4-geography result exactly, so the default path is already stable on the current broader donor benchmark
  - the fresh residual drilldown does show age and AGI pressure spread across several states, but widening geography focus to `8` on the same matched broader donor run worsens the real objective instead of helping
- Evidence:
  - fresh broader donor-inclusive checkpoint through the unmodified default entrypoint: `manifest.json`
    - reproduces capped full-oracle loss `0.7908917500` with the default top-4-family/top-4-geography policy
  - matched broader donor-inclusive rerun with top-8 deferred geographies: `manifest.json`
    - regresses capped full-oracle loss from `0.7908917500` to `0.7991939177`
  - fresh broader donor default drilldown: `tmp_broader_default_top4family_donor_drilldown_20260412.json`
    - confirms the remaining capped-error mass is still led by age, AGI, ACA, and EITC families, so the next work should move upstream rather than continuing to widen deferred geography focus
- Decision:
  - reject both tested versions of the CPS AGI-alignment hypothesis:
    - do not materialize PE-style interest/dividend/pension leaf inputs inside the CPS source provider for the mixed-source rebuild path
    - do not apply the same split inside the default PolicyEngine export builder either
- Why:
  - `policyengine-us-data` does use fixed CPS split assumptions for those leaf inputs, but Microplex is not a single-source CPS build; it is a mixed-source fusion path where early promotion of estimated tax leaf inputs can distort donor integration and downstream calibration
  - the source-side version confirmed that concern directly by creating a large new tax-exempt-interest residual family on the broader donor benchmark
  - moving the split later to the export boundary avoids the catastrophic source distortion, but it still does not beat the incumbent default on the frontier metric
- Evidence:
  - matched broader donor incumbent baseline: `manifest.json`
    - capped full-oracle loss `0.7329149849`
  - source-side CPS leaf-input candidate: `manifest.json`
    - regresses capped full-oracle loss to `0.9164981002`
    - introduces large new interest-family residuals, especially `tax_unit_count|domain=tax_exempt_interest_income`
  - export-side candidate: `manifest.json`
    - improves on the source-side candidate but still regresses capped full-oracle loss to `0.7998451134`
- Read:
  - the direct PE CPS split assumptions are not plug-compatible with the current Microplex broader rebuild path
  - this lane should be treated as explored and rejected for the current frontier objective, not as an untested TODO
  - next upstream AGI work should look for better alignment boundaries than copying PE CPS tax-leaf splits wholesale
- Decision:
  - keep the donor-side analogue of the accepted CPS checkpoint `state x age-band` floor in the default sampled-query path for donor-inclusive checkpoints
- Why:
  - the current checkpoint asymmetry was real: CPS sampling guaranteed `state x 5-year age-band` coverage, while donor survey sampling still only applied a plain state floor
  - donor survey providers already carry household state and person age for the sources where this matters, so the cleanest test was to mirror the CPS checkpoint floor there and keep it only if the full-oracle metric moved
  - the improvement is small, but the run is deterministic and the code surface is narrow, so this is still worth keeping as a low-risk checkpoint-default refinement
- Evidence:
  - matched broader donor baseline with the accepted CPS age floor only: `manifest.json`
    - capped full-oracle loss `0.7329149849`
    - active-solve capped loss `0.8498782563`
    - selected constraints `1059`
  - matched broader donor rerun with donor-side `state x age-band` floor: `manifest.json`
    - capped full-oracle loss `0.7327632809`
    - active-solve capped loss `0.8495978941`
    - selected constraints `1059`
- Read:
  - this is not a large methodological change and should not be described that way
  - it is a small but real upstream support improvement on the big metric, and it keeps the donor-inclusive checkpoint path more symmetric with the accepted CPS checkpoint support rule
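
As a rough illustration of what a `state x 5-year age-band` floor means for checkpoint sampling, the sketch below first guarantees a minimum number of records per cell and then fills the rest of the sample at random. The function name `sample_with_cell_floor`, the record layout, and the fill strategy are assumptions made for this example; the actual sampled-query path may differ.

```python
import random
from collections import defaultdict


def sample_with_cell_floor(records, sample_n, floor, seed=0):
    """Pick about `sample_n` record indices, keeping at least `floor`
    records in every (state, 5-year age band) cell that exists.

    Hypothetical helper: `records` is a list of dicts with "state" and "age".
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for idx, rec in enumerate(records):
        cells[(rec["state"], rec["age"] // 5)].append(idx)

    chosen = set()
    # Step 1: satisfy the floor cell by cell (sorted for determinism).
    for key in sorted(cells):
        members = cells[key]
        chosen.update(rng.sample(members, min(floor, len(members))))

    # Step 2: fill up to sample_n from the records not already chosen.
    remaining = [i for i in range(len(records)) if i not in chosen]
    extra = max(0, sample_n - len(chosen))
    chosen.update(rng.sample(remaining, min(extra, len(remaining))))
    return sorted(chosen)


# Toy population: 2 states x ages 0..39 gives 16 cells.
records = [{"state": s, "age": a} for s in ("CA", "TX") for a in range(40)]
picked = sample_with_cell_floor(records, sample_n=20, floor=1, seed=42)
covered = {(records[i]["state"], records[i]["age"] // 5) for i in picked}
```

A plain state floor would only guarantee that both states appear; the cell floor guarantees every state-by-age-band cell appears, which is the asymmetry the donor-side change removes.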
- Code:
  - keep PE-style random-in-bin decoding for `_puf_agerange`, `_puf_agedp*`, and `_puf_earnsplit` in `src/microplex_us/data_sources/puf.py`
  - keep PE-style spouse/dependent sex draws in the same PE-demographics branch
  - keep the seeded PE-demographics regression in `tests/test_puf_source_provider.py`
- Why:
  - the previous implementation was a direct parity bug, not a modeling choice: it decoded PE demographic helper bins to fixed midpoints, while `policyengine-us-data` samples within those coded intervals and uses randomized spouse/dependent sex assignment
  - this is upstream alignment work on the exact PUF construction boundary, which is a better next step than inventing a new AGI heuristic
- Focused verification:
  - `python -m py_compile src/microplex_us/data_sources/puf.py tests/test_puf_source_provider.py`
  - `uv run pytest tests/test_puf_source_provider.py -q -k 'expand_to_persons or sample_tax_units'`
  - `uv run pytest tests/test_puf_source_provider.py -q -k 'not pre_tax_contributions_via_policyengine_subprocess'`
- Artifacts:
  - source-stage parity candidate: `artifacts/tmp_puf_source_stage_parity_personexpansion_20260412.json`
  - legacy source-stage parity reference: `artifacts/source_stage_parity_20260408/puf_2024_raw_source_stage_parity.json`
  - matched broader donor checkpoint: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_donors/broader-donors-puf-personexpansion-v1`
  - matched broader no-donor checkpoint: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_nodonors/broader-nodonors-puf-personexpansion-v1`
- Read:
  - raw PUF source-stage parity moves materially closer to PolicyEngine on the most relevant variables:
    - age weighted-mean ratio: `1.0367 -> 1.0275`
    - employment-income weighted-mean ratio: `1.2196 -> 0.9996`
    - taxable-interest weighted-mean ratio: `2.2495 -> 1.1774`
  - matched broader no-donor checkpoint:
    - baseline capped full-oracle loss: `0.7368409543`
    - candidate capped full-oracle loss: `0.7336528770`
    - delta: `-0.0031880773`
    - active-solve capped loss: `0.8497778115 -> 0.8005940161`
  - matched broader donor checkpoint:
    - baseline capped full-oracle loss: `0.7327632809`
    - candidate capped full-oracle loss: `0.7342149723`
    - delta: `+0.0014516915` worse
    - active-solve capped loss: `0.8495978941 -> 0.8037192584`
  - conclusion:
    - keep the upstream parity fix
    - do not overclaim it as an unconditional frontier win
    - treat the donor-path regression as the next interaction to investigate, rather than reverting a real PE-alignment correction
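
The difference between midpoint decoding and random-in-bin decoding can be shown with a small seeded sketch. The bin table `AGE_BINS` and both helper names are hypothetical placeholders, not the real PUF code tables or the functions in `puf.py`.

```python
import random

# Hypothetical bin table: code -> (low, high) inclusive bounds.
# These bounds are placeholders, not the real PUF coding.
AGE_BINS = {1: (0, 4), 2: (5, 12), 3: (13, 16), 4: (17, 18)}


def decode_midpoint(code):
    """Old behavior in spirit: every record in a bin gets the same value."""
    low, high = AGE_BINS[code]
    return (low + high) // 2


def decode_random_in_bin(code, rng):
    """Parity-style behavior in spirit: draw uniformly inside the bin."""
    low, high = AGE_BINS[code]
    return rng.randint(low, high)  # randint is inclusive on both ends


rng = random.Random(2024)  # seeded so the decode is reproducible
codes = [2] * 1000
midpoint_ages = [decode_midpoint(c) for c in codes]
sampled_ages = [decode_random_in_bin(c, rng) for c in codes]
```

Midpoint decoding collapses a whole bin onto one value, which removes within-bin variation in the decoded variable; random-in-bin decoding keeps that variation while a fixed seed keeps the run deterministic, which is what makes a seeded regression test possible.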
- Code:
  - keep PE-style `EARNSPLIT` sampling in `src/microplex_us/data_sources/puf.py`
  - revert default PE-demographics age-bin and spouse/dependent-sex randomization in the same file
  - keep the updated PE-demographics regression in `tests/test_puf_source_provider.py`
- Why:
  - the first bundled parity fix mixed two conceptually separate changes:
    - age/sex randomization
    - income-split randomization
  - the only clean way to decide what belongs in the default path was a matched ablation on the broader donor checkpoint
- Focused verification:
  - `python -m py_compile src/microplex_us/data_sources/puf.py tests/test_puf_source_provider.py`
  - `uv run pytest tests/test_puf_source_provider.py -q -k 'expand_to_persons or sample_tax_units'`
  - `uv run pytest tests/test_puf_source_provider.py -q -k 'not pre_tax_contributions_via_policyengine_subprocess'`
- Artifacts:
  - donor baseline: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_donor_stateage1_donors/broader-donors-donor-stateage1-v1`
  - age/sex-only ablation: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_ageonly_donors/broader-donors-puf-personexpansion-ageonly-v1`
  - earnsplit-only ablation: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_earnsplitonly_donors/broader-donors-puf-personexpansion-earnsplitonly-v1`
  - real code-path confirmation: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_default_donors/broader-donors-puf-personexpansion-default-v2`
- Read:
  - age/sex-only is clearly the wrong half for the current frontier objective:
    - baseline capped full-oracle loss: `0.7327632809`
    - candidate: `0.7463902007`
    - delta: `+0.0136269199` worse
  - earnsplit-only is clearly the right half:
    - candidate: `0.7176041064`
    - delta vs baseline: `-0.0151591745`
    - active-solve capped loss: `0.8495978941 -> 0.7726915403`
  - the real code-path rerun matches the winning ablation exactly
  - conclusion:
    - default to PE-style `EARNSPLIT` randomization
    - do not default to PE-style age/sex randomization yet
    - treat age-bin randomization as an open parity lane rather than a settled improvement
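
The matched-ablation read above reduces to comparing each candidate's capped full-oracle loss with the same baseline. A minimal sketch of that bookkeeping follows; the `verdict` helper and its `tolerance` parameter are hypothetical, and the losses are the rounded values recorded in this entry, so a recomputed delta can differ from the recorded one in the last digit.

```python
def verdict(baseline, candidate, tolerance=0.0):
    """Return (delta, label) for a lower-is-better loss.

    Hypothetical helper for illustration; `tolerance` is an assumed knob
    for treating tiny moves as neutral.
    """
    delta = candidate - baseline
    if delta < -tolerance:
        label = "improves"
    elif delta > tolerance:
        label = "regresses"
    else:
        label = "neutral"
    return round(delta, 10), label


BASELINE = 0.7327632809  # donor baseline loss from this entry
ablations = {
    "age_sex_only": 0.7463902007,
    "earnsplit_only": 0.7176041064,
}
results = {name: verdict(BASELINE, loss) for name, loss in ablations.items()}

for name, (delta, label) in results.items():
    print(f"{name}: {delta:+.10f} ({label})")
```

Keeping the two halves as separate one-axis runs against one baseline is what lets the ledger keep `EARNSPLIT` randomization while reverting the age/sex half.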
- Code:
  - `src/microplex_us/pipelines/pe_us_data_rebuild.py`
  - `tests/pipelines/test_pe_us_data_rebuild.py`
  - `tests/pipelines/test_pe_us_data_rebuild_checkpoint.py`
  - `artifacts/experiment_index.jsonl`
  - `docs/methodology-ledger.md`
- Why:
  - after the accepted `EARNSPLIT` fix, the sharpest surviving rows were no longer mostly age/AGI; the worst individual cells were now concentrated in `aca_ptc` and `rental_income`
  - the staged selector was still spending its family slots on AGI and EITC pairs, so ACA and rental were being excluded from deferred consideration even when they were among the highest-error rows
- Focused verification:
  - matched broader donor checkpoint with `top_family_count = 7`
  - donor-free broader confirmation with `top_family_count = 7`
  - `uv run pytest tests/pipelines/test_pe_us_data_rebuild.py tests/pipelines/test_pe_us_data_rebuild_checkpoint.py -q`
- Artifacts:
  - donor baseline: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_default_donors/broader-donors-puf-personexpansion-default-v2`
  - donor family-7 rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_family7_donors/broader-donors-puf-personexpansion-family7-v1`
  - donor-free baseline: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_default_nodonors/broader-nodonors-puf-personexpansion-default-v2`
  - donor-free confirmation: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_family7_nodonors/broader-nodonors-puf-personexpansion-family7-v1`
- Read:
  - on the broader donor run, widening deferred family focus from `4` to `7` improves capped full-oracle loss from `0.7176041064` to `0.7044626415`
  - the selected deferred families now explicitly include:
    - `aca_ptc|domain=aca_ptc`
    - `rental_income|domain=rental_income`
  - the matched donor-free broader run also improves from `0.7170633141` to `0.7039665310` with the same focused family set
  - conclusion:
    - promote `top_family_count = 7` into the default rebuild policy
    - keep geography focus at `4`
    - treat ACA/rental as active deferred-calibration families rather than residuals that should stay outside the search surface
- Code:
  - `src/microplex_us/data_sources/puf.py` was restored to the earnsplit-only default after the retest
  - `tests/test_puf_source_provider.py` was restored to the incumbent earnsplit-only regression expectations
  - `artifacts/experiment_index.jsonl`
  - `docs/methodology-ledger.md`
- Why:
  - revisiting upstream person structure was reasonable, but this specific PE-style age/sex path had already lost once and needed to beat the current stronger family-7 default, not the older top-family-4 baseline
  - the clean test was a one-axis donor rerun with the current default config, not another parity argument in the abstract
- Focused verification:
  - `uv run pytest tests/test_puf_source_provider.py -q -k 'expand_to_persons_uses_pe_demographic_helpers_when_present or expand_to_persons_preserves_joint_tax_unit_monetary_totals or expand_to_persons_splits_negative_joint_self_employment_losses or expand_to_persons_clears_status_flags_for_non_head_members'`
- Artifacts:
  - current donor incumbent: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_family7_donors/broader-donors-puf-personexpansion-family7-v1`
  - full-rng retest: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_rng_donors/broader-donors-puf-personexpansion-rng-v1`
- Read:
  - donor incumbent capped full-oracle loss: `0.7044626415`
  - full-rng retest: `0.7111876263`
  - delta: `+0.0067249848` worse
  - conclusion:
    - keep the earnsplit-only default
    - treat full PE-style age/sex randomization as re-rejected for the current frontier objective
    - move the next upstream work to AGI or EITC structure, not back into this same person-expansion branch
- Code:
  - `src/microplex_us/data_sources/cps.py`
  - `tests/test_cps_source_provider.py`
  - `artifacts/experiment_index.jsonl`
  - `docs/methodology-ledger.md`
- Why:
  - a direct code review against `policyengine-us-data` showed the main CPS structural gap was that source tax-unit semantics were still too flat in Microplex even when later pipeline stages could reconstruct similar roles
  - the clean fix was to derive tax-unit head/spouse/dependent roles, jointness, and dependent counts from raw `TAX_ID` in the CPS source layer instead of leaving that work implicit downstream
- Verification:
  - `python -m py_compile src/microplex_us/data_sources/cps.py tests/test_cps_source_provider.py`
  - `uv run pytest tests/test_cps_source_provider.py -q -k 'derives_tax_unit_roles_from_tax_id or caches_household_geography_on_persons or derives_survivor_and_dependent_social_security or loads_observation_frame or canonical_income_alias'`
- Artifacts:
  - donor incumbent: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_family7_donors/broader-donors-puf-personexpansion-family7-v1`
  - source-structure rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_cps_taxunit_structure_donors/broader-donors-cps-taxunit-structure-v1`
- Read:
  - frontier metric is neutral: `0.7044626415 -> 0.7044626415`
  - conclusion:
    - keep the source-layer CPS tax-unit derivation
    - treat it as architecture cleanup and PE-boundary alignment, not as an independent frontier gain
- Code:
  - `src/microplex_us/data_sources/cps.py` was restored after the test
  - `tests/test_cps_source_provider.py` was restored after the test
  - `artifacts/experiment_index.jsonl`
  - `docs/methodology-ledger.md`
- Why:
  - after moving tax-unit structure to the source boundary, the next narrow EITC-side parity hypothesis was to expose `is_full_time_college_student` directly from CPS `A_HSCOL`, because `policyengine-us` uses that input in qualifying-child logic
  - the clean test was a one-axis broader donor rerun, not an argument from policy parity alone
- Verification:
  - `python -m py_compile src/microplex_us/data_sources/cps.py tests/test_cps_source_provider.py`
  - `uv run pytest tests/test_cps_source_provider.py -q -k 'derives_tax_unit_roles_from_tax_id or caches_household_geography_on_persons or derives_survivor_and_dependent_social_security or loads_observation_frame or canonical_income_alias'`
- Artifacts:
  - donor incumbent: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_family7_donors/broader-donors-puf-personexpansion-family7-v1`
  - student-input rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_cps_student_donors/broader-donors-cps-student-v1`
- Read:
  - direct CPS student input is strongly harmful on the broader donor frontier: `0.7044626415 -> 0.7815651801`
  - conclusion:
    - do not promote `is_full_time_college_student` into the current mixed-source broader default
    - treat this as another case where direct PE CPS inputs are not automatically plug-compatible with the broader Microplex path
- Code:
  - `src/microplex_us/pipelines/us.py`
  - `tests/pipelines/test_us.py`
  - `artifacts/experiment_index.jsonl`
  - `docs/methodology-ledger.md`
- Why:
  - after the CPS tax-unit structure cleanup, the strongest remaining direct alignment hypothesis was to keep authoritative source tax-unit IDs for households that already have them and only optimize donor households with missing tax-unit IDs
  - that is a coherent architectural boundary, but it still had to beat the broader donor frontier metric rather than just look more PE-like on paper
- Verification:
  - `python -m py_compile src/microplex_us/pipelines/us.py tests/pipelines/test_us.py`
  - `uv run pytest tests/pipelines/test_us.py -q -k 'preserve_existing_tax_unit_ids or falls_back_when_existing_tax_unit_ids_cross_households or partially_preserves_existing_tax_unit_ids'`
- Artifacts:
  - donor incumbent: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_family7_donors/broader-donors-puf-personexpansion-family7-v1`
  - partial-preservation rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_partial_preserve_taxunits_donors/broader-donors-partial-preserve-taxunits-v1`
- Read:
  - capped full-oracle loss regresses slightly: `0.7044626415 -> 0.7055670761`
  - active-solve capped loss improves materially: `0.7909211525 -> 0.7648463685`
  - conclusion:
    - keep the mixed-preservation code path as an optional capability
    - do not promote `policyengine_prefer_existing_tax_unit_ids=True` into the current broader default
    - move the next upstream work off this boundary and back to the remaining AGI and EITC input/eligibility lanes
- implemented PE-style CPS `ssn_card_type` derivation in `src/microplex_us/data_sources/cps.py`
  - use the raw CPS immigration, benefits, work, and housing-assistance fields to assign:
    - `CITIZEN`
    - `NON_CITIZEN_VALID_EAD`
    - `OTHER_NON_CITIZEN`
    - `NONE`
  - added a safe fallback so if a future CPS extract is missing one of the raw helper fields, Microplex still emits `ssn_card_type = CITIZEN` rather than silently dropping the column
- allowed `ssn_card_type` into the PE export surface in `src/microplex_us/policyengine/us.py`
  - mixed-source missing rows now backfill to `CITIZEN` at export time
- focused verification:
  - `python -m py_compile src/microplex_us/data_sources/cps.py src/microplex_us/policyengine/us.py tests/test_cps_source_provider.py tests/policyengine/test_us.py`
  - `uv run pytest tests/test_cps_source_provider.py -q -k 'ssn_card_type or derives_tax_unit_roles_from_tax_id'`
  - `uv run pytest tests/policyengine/test_us.py -q -k 'default_policyengine_us_export_surface or defaults_missing_ssn_card_type_to_citizen'`
- artifact comparison:
  - incumbent broader donor default: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_personexpansion_family7_donors/broader-donors-puf-personexpansion-family7-v1`
  - `ssn_card_type` rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_ssn_card_type_donors/broader-donors-ssn-card-type-v1`
- read:
  - capped full-oracle loss improves: `0.7044626415 -> 0.6955460`
  - active-solve capped loss also improves: `0.7909211525 -> 0.7813926586`
  - the direct `ssn_card_type` family improves sharply:
    - `person_count|domain=ssn_card_type`: `1.0000 -> 0.3786`
  - EITC child-count families improve:
    - `eitc|domain=eitc,eitc_child_count`: `0.8283 -> 0.7499`
    - `tax_unit_count|domain=eitc,eitc_child_count`: `0.8154 -> 0.7408`
  - the aggregate `eitc` row itself gets worse: `0.1066 -> 0.2954`
  - conclusion:
    - keep this change because it clears the frontier bar and the direction of movement is specifically consistent with the intended EITC-identification lane
    - describe it narrowly: it improves the full-oracle metric and the identification / child-count families, not “all EITC targets”
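
The export-time backfill described above can be sketched as follows. This assumes rows are plain dicts and that "missing" means the key is absent or the value is Python `None`; the helper name `backfill_ssn_card_type` is hypothetical and is not the function in `src/microplex_us/policyengine/us.py`.

```python
DEFAULT_SSN_CARD_TYPE = "CITIZEN"


def backfill_ssn_card_type(rows):
    """Return copies of `rows` where a missing ssn_card_type becomes CITIZEN.

    Hypothetical helper. Note that the string "NONE" is a real category and
    is kept as-is; only an absent key or a Python None counts as missing.
    """
    filled_rows = []
    for row in rows:
        filled = dict(row)  # copy so the caller's rows are not mutated
        if filled.get("ssn_card_type") is None:
            filled["ssn_card_type"] = DEFAULT_SSN_CARD_TYPE
        filled_rows.append(filled)
    return filled_rows


# Toy mixed-source rows: CPS rows carry the field, donor rows may not.
rows = [
    {"person_id": 1, "ssn_card_type": "NON_CITIZEN_VALID_EAD"},
    {"person_id": 2, "ssn_card_type": "NONE"},
    {"person_id": 3, "ssn_card_type": None},
    {"person_id": 4},
]
exported = backfill_ssn_card_type(rows)
```

Backfilling rather than dropping the column keeps the export surface stable across sources, at the cost of treating every row without source information as `CITIZEN`.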
- implemented a PE-style `takes_up_eitc` / `would_file_taxes_voluntarily` tax-unit input path in `src/microplex_us/pipelines/us.py`
  - the prototype used materialized `eitc_child_count` to assign PE-style take-up rates and voluntary-filing draws before export
  - a review pass also hardened the prototype so materialization failures fell back explicitly instead of silently dropping the new columns
- temporarily exposed those variables in `src/microplex_us/policyengine/us.py` so the PE export surface could carry them
- focused verification before the checkpoint:
  - `python -m py_compile src/microplex_us/pipelines/us.py src/microplex_us/policyengine/us.py tests/pipelines/test_us.py tests/policyengine/test_us.py`
  - `uv run pytest tests/pipelines/test_us.py -q -k 'build_policyengine_entity_tables'`
  - `uv run pytest tests/policyengine/test_us.py -q -k 'default_policyengine_us_export_surface or defaults_missing_ssn_card_type_to_citizen'`
- artifact comparison:
  - incumbent broader donor default: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_ssn_card_type_donors/broader-donors-ssn-card-type-v1`
  - take-up rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_takeup_donors/broader-donors-takeup-v1`
- read:
  - capped full-oracle loss regresses: `0.6955460 -> 0.7041134`
  - active-solve capped loss regresses: `0.7813927 -> 0.7896826`
  - EITC child-count families improve:
    - `eitc|domain=eitc,eitc_child_count`: `0.7499 -> 0.7030`
    - `tax_unit_count|domain=eitc,eitc_child_count`: `0.7408 -> 0.6757`
  - but the aggregate `eitc` family gets worse: `0.2954 -> 0.4010`
  - ACA amount and count families also get worse:
    - `aca_ptc|domain=aca_ptc`: `2.3488 -> 2.5737`
    - `tax_unit_count|domain=aca_ptc`: `1.1521 -> 1.3708`
  - conclusion:
    - reject the change on the current broader donor frontier metric
    - revert the code path and keep the broader runtime at the `ssn_card_type` incumbent
    - do not interpret this as rejecting the conceptual separation between:
      - filing because required
      - filing voluntarily for non-credit reasons
      - filing to claim refundable credits
      - taking up EITC conditional on filing / eligibility
    - the rejection is narrower: the current late export-layer port of `takes_up_eitc` and `would_file_taxes_voluntarily` is not yet the right implementation in the broader mixed-source runtime
    - if this lane is revisited later, treat it as a challenger path that needs upstream filer / take-up calibration evidence rather than another direct PE-input port
- tested a matched broader donor checkpoint with stronger upstream checkpoint sampling support:
  - CPS `state_age_floor = 2`
  - donor `state_age_floor = 2`
- artifact comparison:
  - incumbent broader donor default: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_ssn_card_type_donors/broader-donors-ssn-card-type-v1`
  - stronger-floor rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_stateage2_donors/broader-donors-stateage2-v1`
- read:
  - capped full-oracle loss regresses sharply: `0.6955460 -> 0.7361964`
  - active-solve capped loss also regresses: `0.7813927 -> 0.8371045`
  - the target family that motivated the run does improve:
    - `person_count|domain=age`: `0.4681 -> 0.4480`
  - but the broader frontier gets worse because AGI, EITC-child-count, and ACA families all move in the wrong direction:
    - `person_count|domain=adjusted_gross_income`: `0.7119 -> 0.7553`
    - `tax_unit_count|domain=adjusted_gross_income`: `0.6372 -> 0.6618`
    - `eitc|domain=eitc,eitc_child_count`: `0.7499 -> 0.8880`
    - `tax_unit_count|domain=eitc,eitc_child_count`: `0.7408 -> 0.8755`
    - `aca_ptc|domain=aca_ptc`: `2.3488 -> 2.9982`
  - conclusion:
    - reject stronger checkpoint age-floor heuristics
    - keep the accepted `state_age_floor = 1` incumbent
    - move the next parity work to upstream PUF age/AGI construction rather than stronger checkpoint support heuristics
- tested a matched broader donor checkpoint with a checkpoint-only PUF sampling change:
  - preserve the top raw PUF AGI tail whenever `sample_n` is active
  - keep the rest of the broader donor runtime unchanged
- artifact comparison:
  - incumbent: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_ssn_card_type_donors/broader-donors-ssn-card-type-v1`
  - candidate: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_puf_agi_tail_donors/broader-donors-puf-agi-tail-v1`
- metric read:
  - capped full-oracle loss: `0.6955460 -> 1.1132009`
  - active-solve capped loss: `0.7813927 -> 1.9290`
  - selected constraints: `1031 -> 1163`
  - a fast raw PUF source-stage proxy did improve taxable-interest and dividend parity, but it simultaneously worsened self-employment and rental structure enough that the real broader checkpoint failed outright
- action:
  - reject high-AGI-preserving checkpoint PUF sampling
  - revert the checkpoint-only sampler code path completely
  - keep the broader donor incumbent on the accepted `ssn_card_type` runtime
  - continue the next parity work in upstream construction/imputation rather than checkpoint-only tail heuristics
Update this document when any of the following changes:
- the canonical measurement contract
- the default runtime pipeline shape
- the default imputation or selection method family
- the meaning of the parity/audit sidecars
- the set of artifacts required for a headline claim
- the boundary between incumbent-compatibility work and challenger work
When writing the eventual paper:
- Start from this ledger, not from memory.
- Pull claims only from code-backed docs and artifact-backed evidence.
- Preserve the distinction between canonical, provisional, and open items.
- Cite the exact artifact family that supported each headline claim.
- Avoid rewriting temporary engineering names like `pe_us_data_rebuild` into misleading methodological claims.
Some internal module names still say `pe_us_data_rebuild`.
Treat that as historical naming, not as the canonical project description. The canonical description is:
- Microplex is the runtime
- PolicyEngine is the oracle/evaluator
- PE-US-data is the incumbent comparator
- traced the ACA residual lane and confirmed that `takes_up_aca_if_eligible` is a real PE construction-stage input rather than a made-up Microplex feature
  - PE-US-data assigns it during CPS construction
  - PE-US uses it directly in the ACA PTC formula
- implemented the narrowest plausible version in `src/microplex_us/pipelines/us.py` and `src/microplex_us/policyengine/us.py` as a direct probe:
  - add a deterministic PE-style `takes_up_aca_if_eligible` draw during tax-unit construction
  - expose that variable on the PE export surface
- verification before evaluation:
  - `python -m py_compile src/microplex_us/pipelines/us.py src/microplex_us/policyengine/us.py tests/pipelines/test_us.py tests/policyengine/test_us.py`
  - `uv run pytest tests/pipelines/test_us.py -q -k 'aca_takeup or export_policyengine_dataset or derives_tax_input_columns'`
  - `uv run pytest tests/policyengine/test_us.py -q -k 'default_policyengine_us_export_surface_avoids_formula_aggregates'`
- evaluation method:
  - reevaluated the incumbent broader donor synthetic population in memory against the shared oracle instead of running a fresh saved checkpoint, because disk pressure made a large rerun unreliable
  - baseline: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_ssn_card_type_donors/broader-donors-ssn-card-type-v1`
  - saved readout: `artifacts/tmp_broader_aca_takeup_recalibration_20260412.json`
- metric read:
  - capped full-oracle loss regresses: `0.6955460 -> 0.8211989`
  - active-solve capped loss improves: `0.7813927 -> 0.7013644`
  - the intended ACA families improve sharply:
    - `aca_ptc|domain=aca_ptc`: `2.3488 -> 0.5529`
    - `tax_unit_count|domain=aca_ptc`: `1.1521 -> 0.7112`
    - `person_count|domain=aca_ptc,is_aca_ptc_eligible`: `1.0994 -> 0.7771`
- action:
  - reject this implementation from the default broader runtime and revert it
  - keep the concept in scope as required upstream parity work
  - interpret the result narrowly:
    - this is not evidence against separate ACA take-up behavior
    - it is evidence that a standalone tax-unit/export-boundary patch is the wrong implementation boundary in the current mixed-source runtime
- ACA-specific review conclusion:
  - beyond raw `has_marketplace_health_coverage` / `has_esi`, the only real ACA-specific upstream input is `takes_up_aca_if_eligible`
  - there is no large hidden ACA-specific construction surface still missing from Microplex before export
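
For readers unfamiliar with what a "deterministic draw" of a take-up flag looks like, here is one generic way to do it: derive a uniform number from a hash of the tax-unit ID and compare it with a take-up rate. This is a swapped-in illustration, not the reverted probe and not how PolicyEngine assigns the flag; the helper names, the seed handling, and the `0.7` rate are all placeholders.

```python
import hashlib


def uniform_from_id(tax_unit_id, seed=0):
    """Map (seed, tax_unit_id) to a reproducible float in [0, 1).

    Uses 48 bits of a SHA-256 digest so the result is exactly representable
    as a float and strictly below 1.0. Hypothetical helper.
    """
    digest = hashlib.sha256(f"{seed}:{tax_unit_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def takes_up(tax_unit_id, rate, seed=0):
    """True when the unit's reproducible draw falls below `rate`."""
    return uniform_from_id(tax_unit_id, seed) < rate


PLACEHOLDER_RATE = 0.7  # illustrative only, not a documented take-up rate
flags = {tu: takes_up(tu, PLACEHOLDER_RATE, seed=2024) for tu in range(1000)}
share = sum(flags.values()) / len(flags)
```

Because each draw depends only on the ID and the seed, the flag for a given tax unit does not change when other units are added, removed, or reordered, which keeps matched reruns comparable.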
- diagnostic comparison:
  - compared the incumbent broader donor artifact `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_ssn_card_type_donors/broader-donors-ssn-card-type-v1/policyengine_us.h5` against PE's `enhanced_cps_2024.h5`
  - saved readout: `artifacts/tmp_broader_aca_eligibility_decomposition_20260412.json`
- read:
  - the incumbent has higher under-20 Medicaid/CHIP eligibility than the PE baseline:
    - `eligible_share_under20`: `0.4909 -> 0.6094`
    - `medicaid_share_under20`: `0.3930 -> 0.5278`
  - the dominant driver is much lower child-unit `medicaid_income_level` in the incumbent:
    - median under-20 `medicaid_income_level`: `15.1512 -> 1.6054`
    - p75 under-20 `medicaid_income_level`: `364.3831 -> 3.9464`
  - child filing-status mix is not the main failure mode:
    - the incumbent actually places more under-20s in `JOINT` units than the PE baseline
  - current interpretation:
    - the next lane is AGI / tax-unit construction and imputation for child units
    - ACA should no longer be treated as primarily an ACA-specific export/input problem
- hypothesis:
  - because the seeded integrated microdata already has near-PE under-20 singleton-tax-unit structure, preserving source `tax_unit_id` values in the PE rebuild path might be a direct parity win and should beat the current optimizer-driven rebuild on the big metric
- code path under test:
  - flipped `policyengine_prefer_existing_tax_unit_ids` to `True` only in `src/microplex_us/pipelines/pe_us_data_rebuild.py`
  - left the generic `USMicroplexBuildConfig` default unchanged; this was only a PE rebuild / checkpoint default probe
  - updated the default-config assertions in `tests/pipelines/test_pe_us_data_rebuild.py` and `tests/pipelines/test_pe_us_data_rebuild_checkpoint.py`
- verification:
  - focused config tests passed
  - an explorer review found no concrete code-level regression path from the default flip
  - matched broader donor source rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_preserve_taxunits_default_donors/broader-donors-preserve-taxunits-default-v1`
- read:
  - the synthetic-data proxy was slightly positive:
    - optimizer: `0.63654`
    - preserve existing IDs: `0.63583`
  - but the real broader donor checkpoint still loses on the mission metric:
    - incumbent: `artifacts/live_pe_us_data_rebuild_checkpoint_20260412_broader_ssn_card_type_donors/broader-donors-ssn-card-type-v1`
    - candidate: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_preserve_taxunits_default_donors/broader-donors-preserve-taxunits-default-v1`
    - capped full-oracle loss: `0.6955 -> 0.6977`
    - active-solve capped loss: `0.7814 -> 0.7624`
    - selected constraints: `1031 -> 1019`
- decision:
  - reject the default flip and revert it from the canonical PE rebuild path
  - keep source-tax-unit preservation as an optional structural probe rather than the default
- interpretation:
  - this is another case where a promising structural parity clue clears a local or proxy test but still misses on the real broader frontier metric
  - the child-unit AGI / Medicaid-income miss is still best treated as an upstream construction / source-impute problem, not as a rebuild-default switch we can justify today
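The decision above can be read as a simple gate: a candidate is kept only if it improves the capped full-oracle loss, even when secondary metrics move the right way. The sketch below encodes that reading. `CheckpointScore` and `beats_incumbent` are hypothetical names, and treating the capped full-oracle loss as the sole deciding metric is an assumption inferred from this entry, not a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointScore:
    """Hypothetical container for the three numbers reported per checkpoint."""

    capped_full_oracle_loss: float
    active_solve_capped_loss: float
    selected_constraints: int


def beats_incumbent(candidate: CheckpointScore, incumbent: CheckpointScore) -> bool:
    """Accept only when the capped full-oracle loss strictly improves (assumed gate)."""
    return candidate.capped_full_oracle_loss < incumbent.capped_full_oracle_loss


# Numbers taken from the read above.
incumbent = CheckpointScore(0.6955, 0.7814, 1031)
candidate = CheckpointScore(0.6977, 0.7624, 1019)

accepted = beats_incumbent(candidate, incumbent)
```

Here the candidate improves the active-solve capped loss but still fails the gate, which matches the reject decision recorded for this probe.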
- hypothesis:
  - if full source-tax-unit preservation is too broad, preserve source `tax_unit_id` values only in households with minors and let the optimizer rebuild adult-only households
- code path under test:
  - added an opt-in experiment flag in `src/microplex_us/pipelines/us.py` so preserved tax units applied only to households with at least one person under age 20
  - added a focused household-level regression in `tests/pipelines/test_us.py`
- verification:
  - focused `py_compile` and preservation tests passed before the real run
  - matched broader donor source rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_minorhousehold_preserve_taxunits_donors/broader-donors-minorhousehold-preserve-taxunits-v1`
- read:
  - it materially fixes the exact child-structure symptom:
    - under-20 singleton-tax-unit share: `0.1538 -> 0.0345`
    - under-20 mean `medicaid_income_level`: `2.7279 -> 3.0408`
    - under-20 median `medicaid_income_level`: `1.5131 -> 1.8068`
  - but it still loses on the broader donor mission metric:
    - capped full-oracle loss: `0.6955 -> 0.6985`
    - active-solve capped loss: `0.7814 -> 0.7614`
    - selected constraints: `1031 -> 1031`
- decision:
  - reject the experiment and revert the code path
- interpretation:
  - tax-unit assignment is only part of the child-lane miss
  - the remaining gap is in child-linked AGI component construction, not just which adults children are attached to
- diagnostic comparison:
  - compared the PE baseline, the broader donor incumbent, and the rejected minor-household-preservation rerun on person-mapped under-20 tax-unit aggregates
- read:
  - the rejected preservation rerun raises under-20 mapped AGI and Medicaid MAGI, but both remain far below the PE baseline:
    - under-20 mapped `adjusted_gross_income`:
      - PE baseline: `137623.5`
      - incumbent: `85755.2`
      - minor-preserve rerun: `98230.0`
    - under-20 mapped `medicaid_magi`:
      - PE baseline: `140533.9`
      - incumbent: `86338.8`
      - minor-preserve rerun: `98586.5`
  - the surviving gap looks like AGI composition, not simple child attachment:
    - under-20 mapped `tax_unit_partnership_s_corp_income`:
      - PE baseline: `23323.0`
      - incumbent: `9568.7`
      - minor-preserve rerun: `10710.1`
    - under-20 mapped `net_capital_gains`:
      - PE baseline: `3200.0`
      - incumbent: `534.3`
      - minor-preserve rerun: `945.7`
    - under-20 mapped `qualified_dividend_income`:
      - PE baseline: `47.2`
      - incumbent: `0.0`
      - minor-preserve rerun: `0.0`
    - under-20 mapped `tax_exempt_interest_income`:
      - PE baseline: `4.68`
      - incumbent: `0.0`
      - minor-preserve rerun: `0.0`
- action:
  - move the next direct-path lane to AGI component construction / source-impute parity for child-linked tax units
  - stop spending more effort on source-tax-unit preservation variants
- hypothesis:
  - the child-linked AGI miss might be coming from a real architecture gap: PE imputes PUF tax variables with one sequential QRF over a joint block, while Microplex currently donor-imputes those leaves mostly as independent blocks
  - a PE-like grouped sequential-QRF challenger for the main PUF AGI leaves could therefore be a more direct parity move than more tax-unit heuristics
- code path under test:
  - added a non-default `sequential_qrf` donor-imputer backend
  - grouped the main PUF AGI component leaves into one joint donor block when that backend was selected
  - added focused regressions, then ran matched medium and broader donor checkpoints
- verification:
  - focused `py_compile` and the new block/backend regression slice passed before the real runs
  - matched medium donor rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_sequential_puf_joint_medium/medium-donors-sequential-puf-joint-v1`
  - matched broader donor rerun: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_sequential_puf_joint_donors/broader-donors-sequential-puf-joint-v1`
- read:
  - the broader donor frontier metric regresses:
    - capped full-oracle loss: `0.6955 -> 0.7190`
    - active-solve capped loss: `0.7814 -> 0.7757`
    - selected constraints: `1031 -> 999`
  - the medium donor rerun is also not attractive:
    - capped full-oracle loss: `0.9426`
    - active-solve capped loss: `0.6618`
  - a direct matched CPS+PUF stage probe on a `1000/1000` sample shows the PE-like backend changes the child-linked AGI composition aggressively, but not in a clearly correct direction:
    - under-20 linked `qualified_dividend_income`: `40.0 -> 1199.0`
    - under-20 linked `taxable_interest_income`: `507.2 -> 1634.6`
    - under-20 linked `tax_exempt_interest_income`: `4.66 -> 249.4`
    - under-20 linked `taxable_pension_income`: `9118.5 -> 19317.6`
- decision:
  - reject the challenger and revert the experiment code
- interpretation:
  - the parity observation is still useful: PE really does use a more joint QRF architecture for this lane
  - but a direct port into the current donor/rank-match runtime is not numerically safe enough to keep
  - keep the next lane on narrower upstream AGI construction / source-impute parity for child-linked units, not on a wholesale donor-backend swap
- diagnosis:
  - the child-linked AGI misallocation is not coming from raw PUF person expansion
  - direct inspection of `PUFSourceProvider(..., expand_persons=True)` on a matched sample showed under-20 dependent rows carry zero `partnership_s_corp_income`, `taxable_pension_income`, `taxable_interest_income`, `qualified_dividend_income`, and `tax_exempt_interest_income`
  - the incumbent broader donor seed artifact instead carried large dependent mass on some of those leaves, especially:
    - under-20 `partnership_s_corp_income`: `4.09M`
    - under-20 `taxable_pension_income`: `17.77M`
    - under-20 `taxable_interest_income`: `33.98k`
  - so the structural clue was real: donor integration is creating dependent-row mass that is not present in raw expanded PUF
- tested:
  - added a post-donor semantic guard that zeroed the affected PUF tax leaves on rows with `is_tax_unit_dependent > 0`
  - verified locally that the guard nearly removed the seeded child mass:
    - under-20 `partnership_s_corp_income`: `4.09M -> 87.3k`
    - under-20 `taxable_pension_income`: `17.77M -> 172.6k`
    - under-20 `taxable_interest_income`: `33.98k -> 3.28k`
  - ran a matched broader donor checkpoint: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_dependent_zero_tax_leaves_donors/broader-donors-dependent-zero-tax-leaves-v1`
- read:
  - the real frontier result is decisively worse:
    - capped full-oracle loss: `0.6955 -> 1.1372`
    - active-solve capped loss: `0.7814 -> 1.6581`
  - the first calibration stage was already much worse than the incumbent:
    - post-stage-1 capped full-oracle loss: `1.3660`
  - later deferred stages improved on that bad starting point, but still never recovered:
    - post-stage-2 capped full-oracle loss: `1.2460`
    - final capped full-oracle loss: `1.1372`
- decision:
  - reject the guard and revert the code
- interpretation:
  - the structural diagnosis still holds: donor integration is where the dependent-row mass is being created
  - but a blunt post-donor zeroing rule destroys too much signal elsewhere and is not a valid repair
  - the next lane should target narrower donor-impute/source-impute parity for these leaves, not post-hoc dependent suppression
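For reference, a minimal sketch of what a post-donor zeroing guard of this kind might look like. It is illustrative only: the function name, the row-as-dict layout, and the exact leaf list are assumptions (the list reuses the five leaves named in the diagnosis), and the rejected experiment code may have differed.

```python
# Assumed leaf list, borrowed from the diagnosis above.
AFFECTED_LEAVES = (
    "partnership_s_corp_income",
    "taxable_pension_income",
    "taxable_interest_income",
    "qualified_dividend_income",
    "tax_exempt_interest_income",
)


def zero_dependent_tax_leaves(rows, leaves=AFFECTED_LEAVES):
    """Return copies of `rows` with tax leaves zeroed on dependent rows."""
    guarded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        if new_row.get("is_tax_unit_dependent", 0) > 0:
            for leaf in leaves:
                if leaf in new_row:
                    new_row[leaf] = 0.0
        guarded.append(new_row)
    return guarded


# Toy person rows; values are made up.
rows = [
    {"age": 9, "is_tax_unit_dependent": 1, "taxable_pension_income": 5000.0},
    {"age": 18, "is_tax_unit_dependent": 0, "taxable_pension_income": 300.0},
    {"age": 45, "is_tax_unit_dependent": 0, "taxable_pension_income": 12000.0},
]

guarded = zero_dependent_tax_leaves(rows)
under20_before = sum(r["taxable_pension_income"] for r in rows if r["age"] < 20)
under20_after = sum(r["taxable_pension_income"] for r in guarded if r["age"] < 20)
```

In the toy data the under-20 total drops but does not reach zero, because an under-20 row that is not flagged as a dependent keeps its value. That is one way a guard keyed on `is_tax_unit_dependent` can leave a small under-20 residual; whether it explains the residuals reported above is not established here.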
- hypothesis:
  - the blunt post-donor zeroing guard failed because it acted too late
  - a narrower parity move would be to keep the donor-impute path but partition fitting and matching by `is_tax_unit_dependent` for the leaves that were actually exploding on child-linked rows:
    - `partnership_s_corp_income`
    - `taxable_pension_income`
    - `taxable_interest_income`
- tested:
  - added a block-level exact-match partition on `is_tax_unit_dependent` for those singleton donor blocks
  - verified the block-planning assertions locally, then ran a matched broader donor checkpoint: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_dependent_partition_tax_leaves_donors/broader-donors-dependent-partition-tax-leaves-v1`
  - also requested an independent code review of the partition implementation
- read:
  - the frontier result is again decisively worse:
    - capped full-oracle loss: `0.6955 -> 1.2406`
    - active-solve capped loss: `0.7814 -> 1.6943`
  - the seeded child-dependent mass is still strongly suppressed:
    - under-20 `partnership_s_corp_income`: `74.5k`
    - under-20 `taxable_pension_income`: `257.4k`
    - under-20 `taxable_interest_income`: `3.33k`
  - so the narrower support change did move the child rows, but still did not improve the real oracle objective
- review findings:
  - null partition keys would fall through to the global donor fallback instead of staying partitioned
  - `is_tax_unit_dependent` partition labels were lossy after entity projection because the projected value could come from a `FIRST`-style collapse rather than the unit’s real dependent composition
  - empty donor partitions also fell back silently to the global donor pool, which weakened the exact-match semantics
- decision:
  - reject the experiment and revert the code
- interpretation:
  - the structural clue is still right: donor integration is the failure point
  - but neither blunt post-donor zeroing nor this first exact-partition repair is a safe or effective solution
  - the next lane should move closer to PE source-impute structure itself: leaf-specific block design and condition-surface parity for these AGI components, rather than more role-suppression heuristics
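Two of the review findings (null keys and empty partitions silently widening to the global pool) describe a fallback hazard that is easy to show in isolation. The sketch below is a hypothetical strict variant that raises instead of falling back. `PartitionError`, `build_partitions`, and `donor_pool_for` are invented names, not functions from the reverted implementation, and the sketch does not address the lossy `FIRST`-style projection of the partition label.

```python
from collections import defaultdict


class PartitionError(ValueError):
    """Raised instead of silently widening to the global donor pool."""


def build_partitions(donors, key):
    """Group donor rows by `key`, refusing rows whose key is null."""
    pools = defaultdict(list)
    for donor in donors:
        value = donor.get(key)
        if value is None:
            raise PartitionError(f"donor row has null partition key {key!r}")
        pools[value].append(donor)
    return dict(pools)


def donor_pool_for(recipient, pools, key):
    """Return the exact-match donor pool for a recipient, or raise."""
    value = recipient.get(key)
    if value is None:
        raise PartitionError(f"recipient row has null partition key {key!r}")
    pool = pools.get(value)
    if not pool:
        raise PartitionError(f"no donors in partition {key}={value!r}")
    return pool


# Toy donors: only non-dependent rows are available.
donors = [
    {"is_tax_unit_dependent": 0, "taxable_interest_income": 120.0},
    {"is_tax_unit_dependent": 0, "taxable_interest_income": 0.0},
]
pools = build_partitions(donors, "is_tax_unit_dependent")
adult_pool = donor_pool_for({"is_tax_unit_dependent": 0}, pools, "is_tax_unit_dependent")

try:
    donor_pool_for({"is_tax_unit_dependent": 1}, pools, "is_tax_unit_dependent")
    empty_partition_error = None
except PartitionError as exc:
    empty_partition_error = str(exc)
```

Raising is only one option; recording the miss in a diagnostic and skipping the recipient would also keep the exact-match semantics visible rather than implicit.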
- hypothesis:
  - the previous parity attempts may have failed because the current `pe_prespecified` donor path was forcing these sparse PUF leaves onto a demographic-only condition surface
  - a narrower repair would keep the existing donor backend and singleton block structure, but enrich the preferred condition surface for `partnership_s_corp_income`, `taxable_interest_income`, and `taxable_pension_income` with current income state
- code path under test:
  - expanded the preferred condition vars for those leaves to include `income`, `employment_income`, `self_employment_income`, and for pension also `social_security`
  - added focused regressions confirming that only those leaves changed their preferred-condition surface and that the pipeline resolved the extra income predictor when it was available
  - ran a matched broader donor checkpoint: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_income_aware_puf_tax_leaves_donors/broader-donors-income-aware-puf-tax-leaves-v1`
- verification:
  - focused `py_compile` passed
  - focused `tests/test_variables.py` and `tests/pipelines/test_us.py` slices passed before the real rerun
- read:
  - the broader donor frontier metric still regresses:
    - capped full-oracle loss: `0.6955 -> 0.7420`
    - active-solve capped loss: `0.7814 -> 0.8499`
    - selected constraints: `1031 -> 1027`
  - staged calibration improves the candidate internally, but the final result still loses to the incumbent:
    - post-stage-1 capped full-oracle loss: `0.8326`
    - post-stage-2 capped full-oracle loss: `0.7879`
    - final capped full-oracle loss: `0.7420`
- PE code read:
  - PolicyEngine does not solve this lane with richer singleton donor surfaces
  - these leaves sit inside one sequential PUF QRF pass, with `partnership_s_corp_income` also included in the override pass
  - the only donor-survey block directly touching one of them is the ACS path for `taxable_pension_income`
- decision:
  - reject the richer singleton condition-surface patch and revert the code
- interpretation:
  - this was a reasonable approximation attempt, but it still tried to emulate a joint sequential-QRF lane with a patched singleton-donor runtime
  - local code read also confirms the ownership seam: provider order is `CPS -> PUF -> ACS -> SIPP -> SCF`, these leaves are mapped directly by the PUF adapter before person expansion, and the current rebuild does not treat them as explicit direct-override variables
  - the next lane should stop broadening singleton condition surfaces and move toward the actual structure gap: how these PUF leaves enter the build before donor integration and how much of that lane should remain PUF-native rather than generic donor-imputed
- hypothesis:
  - the richer singleton-condition experiment lost because it was still trying to fix a PUF-owned lane inside the generic donor runtime
  - a narrower and more PE-aligned repair would move these leaves into a provider-owned QRF hook at PUF tax-unit load time for `partnership_s_corp_income`, `taxable_interest_income`, and `taxable_pension_income`, then let the normal donor integration stack use the rebuilt PUF support
- code path under test:
  - added a temporary PE-style QRF hook in `map_puf_variables()` / `_build_puf_tax_units()` for exactly those three leaves
  - trained the temporary models from the PE extended CPS artifact and passed them through the PUF provider only; no calibration defaults or donor-engine logic changed
  - ran a matched broader donor checkpoint: `artifacts/live_pe_us_data_rebuild_checkpoint_20260413_puf_tax_leaf_qrf_donors/broader-donors-puf-tax-leaf-qrf-v1`
- verification:
  - focused `py_compile` passed
  - focused `tests/test_puf_source_provider.py` slices passed before the real rerun
- read:
  - the broader donor frontier metric regresses sharply:
    - capped full-oracle loss: `0.6955 -> 0.8729`
    - active-solve capped loss: `0.7814 -> 1.1545`
    - selected constraints: `1031 -> 1064`
  - the run completes cleanly, so this is a real model loss rather than a harness artifact
- decision:
  - reject the standalone PUF-native QRF hook and revert the code
- interpretation:
  - this confirms that the structure problem is not just “put a QRF on the PUF side”
  - moving the hook to the provider boundary without also reproducing the rest of PolicyEngine’s sequential clone/impute shape still gives the wrong runtime behavior
  - the next lane should stay structural, but it needs to revisit the ownership boundary more carefully than “PUF provider QRF for three leaves”
- motivation:
  - the sequential PUF joint experiment surfaced large child-linked AGI shifts that were hard to isolate from the full-oracle metrics
  - we need a repeatable summary to compare child vs adult income components across seed, calibrated, and synthetic stages before touching calibration
- tool:
  - `python -m microplex_us.pipelines.summarize_child_tax_unit_agi_drift <artifact>`
  - summarizes per-person subsets (all, under-20, dependents-under-20, adults) and per-tax-unit subsets (all, with-children, without-children)
  - uses the income variables that exist in the current artifact surfaces (total/income/employment/wage/self-employment/social-security/SSI/public-assistance/pension/dividend/rental/tax-leaf components)
- initial read:
  - wrote the latest summary to `artifacts/tmp_child_tax_unit_agi_drift_20260413.json`
  - this will be the baseline diagnostic for upcoming PUF AGI ownership experiments before we touch calibration boundaries
- scope:
  - compared calibrated-stage child/adult income shares for three artifacts:
    - `broader-donors-ssn-card-type-v1`
    - `broader-donors-puf-personexpansion-family7-v1`
    - `broader-donors-sequential-puf-joint-v1`
  - metric: dependents-under-20 sum divided by adult sum for each variable
- read (dependents-under-20 sum share; calibrated stage):
  - broader donors ssn-card-type:
    - taxable interest: `0.0085`
    - taxable pension: `0.8507`
    - dividends: `0.0000`
    - partnership/S-corp: `0.9633`
    - rental: `0.0009`
    - wage: `0.0046`
    - employment: `0.0126`
  - broader donors puf-personexpansion family7:
    - taxable interest: `0.0000`
    - taxable pension: `0.0000`
    - dividends: `0.0000`
    - partnership/S-corp: `0.0000`
    - rental: `0.0000`
    - wage: `0.0000`
    - employment: `0.0000`
  - broader donors sequential PUF joint:
    - taxable interest: `0.3036`
    - taxable pension: `0.0960`
    - dividends: `0.1239`
    - partnership/S-corp: `0.2482`
    - rental: `0.0031`
    - wage: `0.0040`
    - employment: `0.0085`
- interpretation:
  - the sequential PUF joint path shifts significant child-linked mass into the interest/dividend/partnership lanes relative to the family7 baseline, while the SSN-card-type baseline already shows outsized child shares for pension and partnership components
  - the next structural fixes should aim to move child-linked mass away from these PUF tax leaves without collapsing legitimate child wage/employment mass
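The metric in this entry (dependents-under-20 sum divided by adult sum) can be sketched as follows. This is an illustration under stated assumptions: adults are taken to be persons aged 20 or older, sums are weighted by a `weight` field, and the row layout is hypothetical. The actual summarizer may define the subsets differently.

```python
def child_to_adult_sum_ratio(persons, variable, weight="weight"):
    """Weighted dependents-under-20 sum of `variable` divided by the adult sum.

    Assumes "adult" means age >= 20; under-20 non-dependents are in neither sum.
    """
    child_sum = 0.0
    adult_sum = 0.0
    for person in persons:
        value = person[weight] * person.get(variable, 0.0)
        if person["age"] < 20 and person["is_tax_unit_dependent"]:
            child_sum += value
        elif person["age"] >= 20:
            adult_sum += value
    return child_sum / adult_sum if adult_sum else float("nan")


# Toy rows; values are made up.
persons = [
    {"age": 8, "is_tax_unit_dependent": 1, "weight": 1.0,
     "wage_income": 0.0, "taxable_pension_income": 400.0},
    {"age": 17, "is_tax_unit_dependent": 1, "weight": 2.0,
     "wage_income": 500.0, "taxable_pension_income": 0.0},
    {"age": 42, "is_tax_unit_dependent": 0, "weight": 1.0,
     "wage_income": 60000.0, "taxable_pension_income": 0.0},
    {"age": 70, "is_tax_unit_dependent": 0, "weight": 1.0,
     "wage_income": 0.0, "taxable_pension_income": 1600.0},
]

wage_ratio = child_to_adult_sum_ratio(persons, "wage_income")
pension_ratio = child_to_adult_sum_ratio(persons, "taxable_pension_income")
```

A small wage ratio next to a large pension ratio is the kind of contrast the read above flags: modest child earnings are plausible, while pension income concentrated on dependents is not.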
- goal:
  - reduce dependent-row tax-leaf spikes by softly capping PUF tax leaves on dependents at a fraction of base earned income
- configuration:
  - `dependent_tax_leaf_soft_cap_multiplier=0.1`, base variables `employment_income`, `wage_income`, `self_employment_income`
  - capped variables: `taxable_interest_income`, `tax_exempt_interest_income`, `taxable_pension_income`, `dividend_income`, `qualified_dividend_income`, `non_qualified_dividend_income`, `partnership_s_corp_income`, `rental_income`
- run:
  - `artifacts/live_pe_us_data_rebuild_checkpoint_20260414_dependent_tax_leaf_soft_cap/broader-donors-dependent-tax-leaf-softcap-v1`
- result:
  - full-oracle capped loss: `0.6955 -> 1.1498`
  - active-solve capped loss: `0.7814 -> 1.6832`
  - candidate beats harness MAE and composite parity loss but still loses the native broad loss check
- decision:
  - reject the dependent tax-leaf soft cap guard
- interpretation:
  - the soft cap removes too much mass in the dependent tail without improving the full-oracle fit; this needs a structural donor/conditioning fix rather than a post-hoc clip
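One simple reading of the configuration above is a per-row clip at `multiplier * base earned income` on dependent rows. The sketch below implements that reading only. How the base variables are combined (summed here), whether the real cap was a plain clip or something smoother, and how negative leaf values were handled are all assumptions; the function name is hypothetical.

```python
BASE_VARIABLES = ("employment_income", "wage_income", "self_employment_income")
CAPPED_VARIABLES = (
    "taxable_interest_income",
    "tax_exempt_interest_income",
    "taxable_pension_income",
    "dividend_income",
    "qualified_dividend_income",
    "non_qualified_dividend_income",
    "partnership_s_corp_income",
    "rental_income",
)


def cap_dependent_tax_leaves(row, multiplier=0.1):
    """Clip tax leaves on a dependent row at `multiplier` times base income.

    Assumption: base income is the sum of the non-negative base variables,
    and only values above the cap are changed.
    """
    capped = dict(row)  # copy so the input row is not mutated
    if capped.get("is_tax_unit_dependent", 0) > 0:
        base = sum(max(capped.get(name, 0.0), 0.0) for name in BASE_VARIABLES)
        cap = multiplier * base
        for name in CAPPED_VARIABLES:
            if name in capped and capped[name] > cap:
                capped[name] = cap
    return capped


# Toy rows; values are made up.
earner = cap_dependent_tax_leaves(
    {"is_tax_unit_dependent": 1, "employment_income": 2000.0,
     "taxable_interest_income": 500.0}
)
no_earnings = cap_dependent_tax_leaves(
    {"is_tax_unit_dependent": 1, "taxable_pension_income": 3000.0}
)
adult = cap_dependent_tax_leaves(
    {"is_tax_unit_dependent": 0, "taxable_pension_income": 3000.0}
)
```

Under this reading, a dependent with no earned income has a cap of zero, so the clip behaves like the earlier zeroing guard on those rows. That would be consistent with the interpretation that the cap removes too much mass, though the sketch does not prove that this is what happened in the real run.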
- motivation:
  - the dependent soft-cap failure reinforced that the problem is in donor conditioning structure, not in post-hoc clipping
  - we needed artifact-level evidence for which predictors the `pe_prespecified` lane actually keeps and which shared predictors it drops
- instrumentation:
  - artifacts now carry `synthesis.donor_conditioning_diagnostics`
  - added `python -m microplex_us.pipelines.summarize_donor_conditioning <artifact>` to inspect selected vs dropped donor predictors by block
- current structural hypothesis:
  - keep the PE-style structural predictor backbone for the problematic zero-inflated PUF tax leaves
  - admit a narrow supplemental shared set (`employment_status`, `income`, `state_fips`) instead of reopening the full broad-common predictor surface
- status:
  - checkpoint run in progress; do not treat this as accepted or rejected yet
- run:
  - `artifacts/live_pe_us_data_rebuild_checkpoint_20260414_structured_puf_shared_supplement/broader-donors-structured-puf-shared-supplement-v1`
- result:
  - full-oracle capped loss: `0.6955 -> 1.1739`
  - active-solve capped loss: `0.7814 -> 1.7118`
  - native broad loss: `0.0202 -> 9.6703`
  - harness MAE/composite parity still beat the incumbent slice, but the run failed the native broad loss gate again
- diagnostic read:
  - the new donor-conditioning diagnostics show that for the four problematic PUF tax-leaf blocks in this run (qualified/non-qualified dividend, `partnership_s_corp_income`, `taxable_interest_income`, `taxable_pension_income`), the selected condition vars remained the pure PE structural set
  - the intended supplemental shared vars did not enter those blocks on the real artifact because they were not in the actual compatible shared overlap for those runs
- decision:
  - reject this exact supplement patch as a real fix
- interpretation:
  - this was more diagnostic than corrective: the structured lane is still too narrow in practice, but the immediate blocker is not just "allow three more vars in semantics metadata"
  - the next experiment needs to inspect why those income/state/employment features are absent from compatible overlap on the live PUF blocks, rather than assuming they can simply be appended to the preferred list
- run:
  - `artifacts/live_pe_us_data_rebuild_checkpoint_20260414_structured_puf_shared_supplement_diag_smoke/broader-donors-structured-puf-shared-supplement-diagnostic-smoke-v1`
- question:
  - for the problematic PUF tax-leaf blocks, why do the requested supplemental shared predictors fail to enter the live `pe_prespecified` condition set?
- read:
  - `employment_status` failed with `incompatible_condition_support`
  - `state_fips` failed with `incompatible_condition_support`
  - `income` failed with `excluded_from_block_shared_overlap`
  - this pattern repeated across the four main problematic blocks: dividend split, `partnership_s_corp_income`, `taxable_interest_income`, and `taxable_pension_income`
- interpretation:
  - the main blocker is upstream of the preferred-list merge
  - `income` appears to be dropped before block-level shared-overlap selection
  - `employment_status` and `state_fips` survive as columns but fail the live compatibility check on the prepared donor/current condition frames
- status:
  - superseded by the raw-overlap confirmation below
- immediate next step at the time:
  - instrument the block-preparation path itself so we can distinguish a true overlap / compatibility failure from an earlier source-capability gate
- run:
  - `artifacts/live_pe_us_data_rebuild_checkpoint_20260414_structured_puf_shared_supplement_diag_smoke/broader-donors-structured-puf-shared-supplement-diagnostic-smoke-v2`
- question:
  - after instrumenting raw overlap, are these supplemental PUF tax-leaf vars really failing in block preparation, or are they blocked earlier by source capability policy?
- read:
  - across all four problematic PUF tax-leaf blocks, the raw supplemental statuses for `employment_status`, `income`, and `state_fips` are all `donor_source_disallows_conditioning`
  - the prepared-stage readout remains:
    - `employment_status` -> `incompatible_condition_support`
    - `income` -> `excluded_from_block_shared_overlap`
    - `state_fips` -> `incompatible_condition_support`
  - so the raw overlap never actually admitted those vars into the PUF donor conditioning pool in the first place
- alignment read:
  - local `policyengine-us-data` evidence resolves the PE question:
    - `policyengine_us_data/calibration/puf_impute.py` trains the PUF clone QRF on `DEMOGRAPHIC_PREDICTORS` only, which matches the structural `age` / tax-unit-role backbone and does not use `income`, `employment_status`, or `state_fips`
- interpretation:
  - the prior supplemental-shared experiment was not just ineffective; it was also off the PE-aligned path
  - the PUF source policy is doing the right thing by blocking those derived / non-geographic convenience columns as donor conditions
- action:
  - keep the instrumentation and summarizer
  - revert the PUF IRS tax-leaf semantics back to structural-only PE-style conditioning
  - treat any future widening as an explicit challenger experiment using source-native PUF predictors, not as a PE-alignment patch
- run:
  - `artifacts/live_pe_us_data_rebuild_checkpoint_20260414_pe_plus_puf_native_challenger_diag_smoke/puf-native-challenger-diag-smoke-v1`
- question:
  - if we add an explicit non-default challenger lane that keeps the PE structural backbone but appends a narrow source-native PUF overlap, do those vars actually enter the four problematic tax-leaf blocks on a live artifact?
- setup:
  - `donor_imputer_condition_selection = pe_plus_puf_native_challenger`
  - keep the PE structural predictors for the PUF IRS tax-leaf family
  - append only explicit source-native challengers:
    - dividend / taxable-interest blocks: `self_employment_income`, `rental_income`, `social_security_retirement`
    - taxable-pension block: `social_security_retirement`, `social_security_disability`, `unemployment_compensation`
    - partnership block: `self_employment_income`, `rental_income`, `alimony_income`
- read:
  - the challenger vars now enter the live artifact for all four targeted blocks
  - selected sets were:
    - dividend split: PE structural backbone + `self_employment_income`, `rental_income`, `social_security_retirement`
    - `taxable_interest_income`: PE structural backbone + `self_employment_income`, `rental_income`, `social_security_retirement`
    - `taxable_pension_income`: PE structural backbone + `social_security_retirement`, `social_security_disability`, `unemployment_compensation`
    - `partnership_s_corp_income`: PE structural backbone + `self_employment_income`, `rental_income`, while `alimony_income` failed with `incompatible_condition_support`
- interpretation:
  - this clears the immediate blocker from the earlier failed supplement patch: we now have a real opt-in challenger lane whose native PUF predictors are visible in live `donor_conditioning_diagnostics`
  - the next real question is no longer "can the vars get in?" but "does this challenger help or hurt the PE-oracle losses once we run a full checkpoint"
- next step:
  - run one matched broader checkpoint with this challenger mode and compare it against the structural-only PE-aligned default