Commit 8a99769

Tighten /diagnose-issue: input parity in Step 6, verify reporter's claims in Step 1 (#897)
* /diagnose-issue: require full TAXSIM input mapping when using direct Simulation

  Lesson from re-examining taxsim #882: forgetting to pass tax_unit_childcare_expenses in a direct Simulation situation zeroed the federal CDCC, which shifted tax_liability_if_not_itemizing by ~$300 and made it look like Microsim and Simulation produced different answers. They actually agreed; I was comparing apples to oranges because the inputs weren't identical.

  Step 6 now mandates a TAXSIM-to-PE variable cross-walk before running a direct Simulation, with a table of the easy-to-miss mappings (childcare → tax_unit_childcare_expenses, proptax → real_estate_taxes, mortgage → deductible_mortgage_interest, rentpaid → rent). Debugging checklist updated to match.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* /diagnose-issue Step 1: treat reporter's claims as hypothesis, verify against output.txt

  If the reporter cites a specific PE value, confirm it appears in the bundle's output.txt before building a diagnosis around it. Reporters sometimes paste values from a different case; without this check you can construct a wrong narrative around a wrong number.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 143227c commit 8a99769

2 files changed

Lines changed: 47 additions & 10 deletions

.claude/commands/diagnose-issue.md

Lines changed: 46 additions & 10 deletions
@@ -99,6 +99,8 @@ Only if all three checks say "still relevant, fresh problem" do you go to Step 1
9999
- Read the description carefully for diagnostic hints
100100
- Note any specific numbers mentioned (PE vs TaxAct values)
101101

102+
Treat the issue body as a *hypothesis*, not as fact. If the reporter cites a specific PE value, confirm it appears in the bundled `output.txt` before building a diagnosis around it. Reporters sometimes mix up cases. If the cited value isn't there, work from what `output.txt` actually shows.
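The Step 1 check can be sketched as below. This is a minimal illustration rather than code from the commit: it assumes `output.txt` is plain text containing numeric values, and `cited_value_in_output` is a hypothetical helper name.

```python
import re


def cited_value_in_output(cited: float, output_text: str, tol: float = 0.01) -> bool:
    """Return True if any number found in output_text is within tol of cited."""
    # Match plain or comma-grouped numbers such as 300, 1,234.56 or -42.5.
    for raw in re.findall(r"-?\d[\d,]*\.?\d*", output_text):
        value = float(raw.replace(",", ""))
        if abs(value - cited) <= tol:
            return True
    return False


# Illustrative text only; the real layout of output.txt is an assumption here.
sample_output = "fiitax: 1,234.56\nsiitax: 300.00\n"
found = cited_value_in_output(1234.56, sample_output)
missing = cited_value_in_output(999.0, sample_output)
```

In practice you would pass the contents of the bundle's `output.txt` as `output_text`. A `False` result suggests the reporter's number may come from a different case.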
103+
102104
### Step 2: Verify the Input Parameters
103105
**CRITICAL**: Before deep-diving into code, verify the TAXSIM input parameters are correct!
104106

@@ -166,27 +168,61 @@ for path in sorted(glob.glob('/tmp/taxsim_$ISSUE/*.pdf')):
166168
**If the reporter's claim and the PDF differ, the PDF is correct.** Numeric claims in issue bodies are sometimes paraphrased or based on a stale PE run.
167169

168170
### Step 6: Deep Dive if Needed
169-
If the basic test shows incorrect values:
171+
172+
If the basic test shows incorrect values, drop into a direct `Simulation` to inspect individual variables.
173+
174+
**WARNING — input parity is critical.** If you write a `Simulation` situation that omits one of the TAXSIM inputs, the simulation will compute different intermediates than the emulator and you'll mis-attribute a real bug to a "framework difference." (Real example: omitting `tax_unit_childcare_expenses` zeroed out the federal CDCC in direct `Simulation`, which changed `tax_liability_if_not_itemizing` by ~$300 and made me think Microsim vs Simulation diverged when they actually agreed.)
175+
176+
**Mandatory TAXSIM-input → PE-variable cross-walk before running:**
177+
178+
| TAXSIM input | PE-US variable | Entity |
179+
|---|---|---|
180+
| `pwages`, `swages` | `employment_income` | person |
181+
| `intrec` | `taxable_interest_income` | person |
182+
| `pensions` | `taxable_pension_income` | person |
183+
| `gssi` | `social_security` | person |
184+
| `proptax` | `real_estate_taxes` | person |
185+
| `mortgage` | `deductible_mortgage_interest` | person |
186+
| `rentpaid` | `rent` | person |
187+
| `childcare` | `tax_unit_childcare_expenses` | **tax_unit** ← easy to miss |
188+
| `dividends` | `qualified_dividend_income` | person |
189+
| `stcg` | `short_term_capital_gains` | person |
190+
| `ltcg` | `long_term_capital_gains` | person |
191+
192+
Always look at the bundle's `txpydata.csv` and map **every non-zero column** before writing the situation. The canonical mapping is in `policyengine_taxsim/config/variable_mappings.yaml` if a column isn't in the table above.
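One way to make the "map every non-zero column" rule mechanical is sketched below. It assumes `txpydata.csv` has a header row of TAXSIM column names. The `CROSSWALK` dict copies only the table above, and `HANDLED_ELSEWHERE` is an assumed list of columns that are set through ages, filing status, or state rather than an income variable. `split_nonzero_columns` is a hypothetical helper, not part of `policyengine_taxsim`.

```python
import csv
import io

# Copied from the cross-walk table above; not the full canonical mapping.
CROSSWALK = {
    "pwages": "employment_income",
    "swages": "employment_income",
    "intrec": "taxable_interest_income",
    "pensions": "taxable_pension_income",
    "gssi": "social_security",
    "proptax": "real_estate_taxes",
    "mortgage": "deductible_mortgage_interest",
    "rentpaid": "rent",
    "childcare": "tax_unit_childcare_expenses",
    "dividends": "qualified_dividend_income",
    "stcg": "short_term_capital_gains",
    "ltcg": "long_term_capital_gains",
}

# Assumption: these columns describe the case itself, not a dollar input.
HANDLED_ELSEWHERE = {"taxsimid", "year", "state", "mstat", "page", "sage", "depx"}


def split_nonzero_columns(row):
    """Split non-zero columns into mapped {column: (pe_variable, value)} and unmapped names."""
    mapped, unmapped = {}, []
    for column, raw in row.items():
        if column in HANDLED_ELSEWHERE:
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # skip blanks and non-numeric cells
        if value == 0:
            continue
        if column in CROSSWALK:
            mapped[column] = (CROSSWALK[column], value)
        else:
            unmapped.append(column)
    return mapped, unmapped


# Illustrative row; a real bundle's columns may differ.
sample_csv = (
    "taxsimid,year,state,mstat,pwages,intrec,childcare,pui,ltcg\n"
    "1,2025,6,1,1571.43,36.44,3000,500,0\n"
)
row = next(csv.DictReader(io.StringIO(sample_csv)))
mapped, unmapped = split_nonzero_columns(row)
```

Anything left in `unmapped` (here `pui`) is a column to look up in `variable_mappings.yaml` before writing the situation.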
170193

171194
```python
172-
# Test with Simulation directly
173195
from policyengine_us import Simulation
174196

197+
# Example — map EVERY non-zero TAXSIM input from txpydata.csv
175198
situation = {
176199
"people": {
177-
"person1": {"age": {"2024": 70}, "taxable_interest_income": {"2024": 64500}},
178-
"person2": {"age": {"2024": 70}, "taxable_interest_income": {"2024": 64500}},
200+
"head": {"age": {"2025": 65},
201+
"employment_income": {"2025": 1571.43}, # pwages
202+
"taxable_pension_income": {"2025": 46265.95}, # pensions
203+
"taxable_interest_income": {"2025": 36.44}, # intrec
204+
"social_security": {"2025": 30000}, # gssi
205+
"real_estate_taxes": {"2025": 30000}, # proptax
206+
"deductible_mortgage_interest": {"2025": 20000}, # mortgage
207+
},
208+
"k1": {"age": {"2025": 11}},
209+
"k2": {"age": {"2025": 2}},
179210
},
180-
"tax_units": {"tax_unit": {"members": ["person1", "person2"]}},
181-
"households": {"household": {"members": ["person1", "person2"], "state_fips": {"2024": 34}}},
182-
# ... other units
211+
"tax_units": {"tu": {"members": ["head", "k1", "k2"],
212+
"tax_unit_childcare_expenses": {"2025": 3000}}}, # childcare
213+
"households": {"hh": {"members": ["head", "k1", "k2"], "state_fips": {"2025": 8}}},
214+
"marital_units": {"m": {"members": ["head"]}, "m2": {"members": ["k1"]}, "m3": {"members": ["k2"]}},
215+
"families": {"f": {"members": ["head", "k1", "k2"]}},
216+
"spm_units": {"s": {"members": ["head", "k1", "k2"]}},
183217
}
184218

185219
sim = Simulation(situation=situation)
186-
print("State AGI:", sim.calculate("{state}_agi", 2024))
187-
print("Exclusion:", sim.calculate("{state}_retirement_exclusion", 2024))
220+
print("federal AGI:", sim.calculate("adjusted_gross_income", 2025))
221+
print("State AGI:", sim.calculate("co_agi", 2025))
188222
```
189223

224+
**Verify input parity before drawing conclusions**: after building the situation, run the emulator (`policyengine_taxsim/cli.py policyengine ...`) on the same row and check that key intermediates (`adjusted_gross_income`, `cdcc`, `ctc`, federal `income_tax`) match. If they don't, you're missing a TAXSIM input mapping — fix that before going further.
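The parity check can be sketched as a plain comparison of the two sets of intermediates. This assumes you have already collected both runs' values into dicts by hand. The numbers below are made up for illustration, and `find_mismatches` is a hypothetical helper.

```python
KEY_INTERMEDIATES = ["adjusted_gross_income", "cdcc", "ctc", "income_tax"]


def find_mismatches(direct, emulator, tol=1.0):
    """Return {variable: (direct_value, emulator_value)} where the runs disagree."""
    mismatches = {}
    for name in KEY_INTERMEDIATES:
        a, b = direct.get(name), emulator.get(name)
        # A missing value counts as a mismatch so it cannot pass silently.
        if a is None or b is None or abs(a - b) > tol:
            mismatches[name] = (a, b)
    return mismatches


# Made-up values: the direct run omitted childcare, so its cdcc is zero.
direct_run = {"adjusted_gross_income": 50000.0, "cdcc": 0.0, "ctc": 2000.0, "income_tax": 1500.0}
emulator_run = {"adjusted_gross_income": 50000.0, "cdcc": 300.0, "ctc": 2000.0, "income_tax": 1200.0}
mismatches = find_mismatches(direct_run, emulator_run)
```

A non-empty result points at the first credit or deduction whose input is likely missing from the situation. In this example a zero `cdcc` points back to `tax_unit_childcare_expenses`.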
225+
190226
### Step 7: Research Legal Documentation
191227

192228
When PE and TaxAct disagree on a specific credit, deduction, or line item, **fetch the primary sources** — don't rely on web-search summaries. Search summaries are routinely wrong or stale (e.g., a search may claim "State X does not offer credit Y" when the statute clearly establishes it). Use search only to *find* the right primary-source URL, then fetch the document.
@@ -336,7 +372,7 @@ When an issue doesn't reproduce as expected:
336372
- [ ] **Filing status inference?** (`mstat=1 + depx≥1` → HoH, not single)
337373
- [ ] **Ages set correctly?** (Many provisions are age-gated)
338374
- [ ] **Income assigned to right person?** (Joint filers: check both)
339-
- [ ] **Test with Simulation directly?** (Bypasses taxsim mapping)
375+
- [ ] **Test with Simulation directly?** When you do, **map every non-zero TAXSIM input** (especially `tax_unit_childcare_expenses` from `childcare`, `real_estate_taxes` from `proptax`, `deductible_mortgage_interest` from `mortgage`) — missing inputs will make Simulation diverge from the emulator and you'll mis-attribute the gap to a framework difference. See Step 6.
340376
- [ ] **Check existing tests in policyengine-us?** (May show expected behavior)
341377
- [ ] **PDFs extracted and analyzed?** (Reporter's expected values may be wrong!)
342378
- [ ] **Compared current PE vs TaxAct?** Every PE value queried directly (no inference from gaps between variables).
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Add Step 6 input-parity warning to /diagnose-issue skill: when running direct Simulation, map every non-zero TAXSIM input from txpydata.csv before drawing conclusions. Includes the TAXSIM-to-PE variable cross-walk.