|
| 1 | +Add a chunked mixed-geography matrix builder for memory-bounded national |
| 2 | +calibration (`--chunked-matrix`) that streams matrix columns in clone-household |
| 3 | +chunks with resumable per-chunk COO shards, progress logging (running average, |
| 4 | +elapsed, ETA), and a shared `entity_clone` module for household-subset |
| 5 | +materialization. |
| 6 | + |
| 7 | +Fix three target-input integrity bugs surfaced by a new |
| 8 | +`analyze_target_consistency` diagnostic that flags cross-level and |
| 9 | +AGI-bucket-coverage inconsistencies: |
| 10 | + |
| 11 | +- Drop the IRS workbook override for `total_self_employment_income`, |
| 12 | + `tax_unit_partnership_s_corp_income`, and `net_capital_gains`. The workbook |
| 13 | + columns `business_net_profits` / `partnership_and_s_corp_income` / |
| 14 | + `capital_gains_gross` are gross-only, while the geography-file line codes |
| 15 | + 00900 / 26270 / 01000 already report net-of-loss amounts. The override |
| 16 | + inflated these national targets by 40.7% / 26.1% / 3.1%, respectively, |
| 17 | + at 2023 values. After the fix, all three reconcile to the penny across |
| 18 | + national, state, and district levels. |
| 19 | +- Remove the self-employment QRF winsorization in `puf_impute.py`. QRF |
| 20 | + predictions are already bounded by training support; the 0.5/99.5 percentile |
| 21 | + clip was discarding the top 0.5% of legitimate signal and truncating imputed |
| 22 | + self-employment income at ~$1.1M versus the PUF training max of $74.6M. |
| 23 | +- Replace percentile-based top selection in `create_stratified_cps` with |
| 24 | + per-bracket caps (400/400/400/300/300 for the $500k-$1M through $10M+ |
| 25 | + bands). This stops PUF templates from piling up above $10M and starving |
| 26 | + the middle-high $1M-$10M range. |
| 27 | + |
| 28 | +Split calibration checkpoint signature validation into fatal structural |
| 29 | +mismatches and soft hyperparameter mismatches, letting callers tune |
| 30 | +`lambda_l0`, `beta`, `lambda_l2`, and `learning_rate` across resume phases. |
| 31 | + |
| 32 | +Add `income_tax` national and state SOI targets, drop the unachievable |
| 33 | +JCT `deductible_mortgage_interest` target, and preserve positive mortgage |
| 34 | +interest inputs through structural conversion. |
| 35 | + |
| 36 | +Retune the national Modal calibration to `lambda_l0=2e-2` at 1000 epochs |
| 37 | +and align `modal_app/pipeline.py` `log_freq` to 100. |
| 38 | + |
| 39 | +Harden `make clean` so its ignored-CSV cleanup skips local environment and |
| 40 | +dependency directories such as `.venv/`, `venv/`, `env/`, `.tox/`, `.nox/`, |
| 41 | +and `node_modules/`, avoiding accidental deletion of package data inside local |
| 42 | +virtual environments. |
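
The fatal/soft split for checkpoint signatures described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: only the four soft hyperparameter names (`lambda_l0`, `beta`, `lambda_l2`, `learning_rate`) come from the changelog. The function name `compare_signatures`, the `CheckpointMismatchError` class, and the structural keys `n_targets` and `n_households` are hypothetical.

```python
# Soft keys are taken from the changelog entry; every other name in this
# sketch (function, exception, structural keys) is a hypothetical stand-in.
SOFT_KEYS = {"lambda_l0", "beta", "lambda_l2", "learning_rate"}


class CheckpointMismatchError(ValueError):
    """Raised when a checkpoint differs structurally from the current run."""


def compare_signatures(saved: dict, current: dict) -> list[str]:
    """Return warnings for soft mismatches; raise on structural ones."""
    fatal, soft = [], []
    for key in sorted(set(saved) | set(current)):
        old, new = saved.get(key), current.get(key)
        if old == new:
            continue
        message = f"{key}: checkpoint={old!r}, current={new!r}"
        # Hyperparameters may change between resume phases; anything else
        # is treated as a structural difference that makes resuming unsafe.
        (soft if key in SOFT_KEYS else fatal).append(message)
    if fatal:
        raise CheckpointMismatchError("; ".join(fatal))
    return soft


# Resuming with a retuned lambda_l0 only produces a warning.
saved = {"n_targets": 3, "n_households": 10, "lambda_l0": 1e-2}
current = {"n_targets": 3, "n_households": 10, "lambda_l0": 2e-2}
warnings = compare_signatures(saved, current)
```

Under these assumptions, a caller would log the returned warnings and continue, while a change to any non-hyperparameter key stops the resume before stale weights are loaded.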