Commit 2e1d2d5

Audit and tighten /diagnose-issue slash command (#895)
* Audit and tighten /diagnose-issue skill

  Based on recent issue runs (#877, #879, #884, #885, #886, #889, #890):

  - Add Step 0 pre-triage with three short-circuit checks: (a) bug claim vs informational question, (b) compare reporter's PE version to local + CHANGELOG grep, (c) gh search for existing PE-US tracking issue/PR.
  - Reframe Critical Rule #1 from "PE / TAXSIM / TaxAct" to "current PE / reporter's PE / TaxAct", which matches what the bundle actually contains.
  - Add Critical Rule #4: verify against primary sources, not search summaries (search results on state tax law are frequently stale or wrong).
  - Expand Step 7: when PE and TaxAct disagree, fetch all three (statute text, current-year form PDF, instructions booklet) and cross-reference before concluding.
  - Document mstat=1 + depx>=1 -> HoH inference in Step 2.
  - Step 5 PDF snippet iterates all bundle PDFs instead of hardcoded form.pdf.
  - Step 9 drops stale issue_analysis tracker; adds cross-link step.
  - Remove the always-404 <issue>.yaml from the batch download.
  - Fix v37/v38/v39 variable-table placeholders that lost their state-substitution braces.
  - Fix self-referencing example link in Common Root Causes.

  Tested on issue #886 (WV CDCC): new Step 0c surfaced PR #3019 quickly; new Step 7 forced fetching the 2025 IT-140 booklet (page 7 recap + page 17 description), confirming PE correctly implements W.Va. Code §11-21-26.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skill audit: query PE directly; drop reporter's PE column

  Two adjustments based on testing on issue #883:

  - Critical Rule #1: comparison is now "current PE vs TaxAct only". Reporter's PE version is still useful for Step 0b triage (version comparison + changelog grep) but never for the actual diagnosis comparison. Avoids confusion about which PE values to use.
  - Step 4: comparison table now has just two PE columns and an explicit rule that every PE value MUST come from a direct query (Step 3 CSV or `Simulation.calculate(...)`). Never infer a PE value from a gap between other variables.
  - Debugging checklist updated to match.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add changelog fragment for diagnose-issue skill audit

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 8f9519c commit 2e1d2d5

2 files changed

Lines changed: 115 additions & 56 deletions

.claude/commands/diagnose-issue.md

Lines changed: 114 additions & 56 deletions
@@ -4,9 +4,10 @@ You are helping diagnose discrepancies between PolicyEngine and TAXSIM tax calcu
44

55
## Critical Rules
66

7-
1. **Always compare ALL THREE systems**: PolicyEngine, TAXSIM, and TaxAct. Never conclude based on just two.
7+
1. **Compare current PE vs TaxAct.** Only the current PE-US install matters for diagnosis; the reporter's PE version in `output.txt` is useful for triage (Step 0b) but never for the actual comparison. True TAXSIM-proper output is rarely in the bundle — don't pretend to compare against it if it isn't there.
88
2. **NEVER post GitHub issues, comments, or PRs without explicit user confirmation.** Always show draft content first and wait for approval.
99
3. **Phrase TAXSIM issues as questions** (e.g., "Does TAXSIM-35 incorrectly apply...?" not "TAXSIM-35 incorrectly applies...").
10+
4. **Verify against primary sources, not search summaries.** When PE and TaxAct disagree on a specific credit or deduction, fetch the actual statute text + current-year form PDF + instructions booklet (see Step 7). Web-search summaries about state tax law are routinely wrong or stale.
1011

1112
## Repositories
1213

@@ -23,9 +24,8 @@ GitHub:
2324
Each issue at https://github.com/PolicyEngine/policyengine-taxsim/issues contains:
2425
1. **Title pattern**: `{STATE} {filing_status} {year} {income_description}` (e.g., "NJ joint 2024 elderly 129Kintrec")
2526
2. **Description/Comments** - Often contains diagnostic hints, suspected root cause, or specific observations from the filer (READ THIS CAREFULLY - it often points to the problem)
26-
3. **YAML test file** - PolicyEngine situation and expected outputs (attached)
27-
4. **TaxAct PDF** - The ground truth tax forms (attached)
28-
5. **TAXSIM reference files** - Available at `taxsim.nber.org/out2psl/{issue_number}/`
27+
3. **TaxAct PDFs** - Federal 1040, the state form, and any relevant schedules
28+
4. **TAXSIM reference files** - Available at `taxsim.nber.org/out2psl/{issue_number}/`. Typical contents: `txpydata.csv` (input), `output.txt` (PE emulator output with version banner), `<issue>.txt` (full run log), and one or more PDFs. There is **no YAML file** — don't expect one.
2929

3030
**IMPORTANT**: The issue description often contains the filer's analysis of what's wrong. Start by reading the description carefully before diving into code.
3131

@@ -39,7 +39,6 @@ mkdir -p /tmp/taxsim_$ISSUE && cd /tmp/taxsim_$ISSUE
3939

4040
# Batch download all files
4141
curl -sL "http://taxsim.nber.org/out2psl/$ISSUE/" -o index.html
42-
curl -sL "http://taxsim.nber.org/out2psl/$ISSUE/$ISSUE.yaml" -o $ISSUE.yaml &
4342
curl -sL "http://taxsim.nber.org/out2psl/$ISSUE/$ISSUE.txt" -o $ISSUE.txt &
4443
curl -sL "http://taxsim.nber.org/out2psl/$ISSUE/output.txt" -o output.txt &
4544
curl -sL "http://taxsim.nber.org/out2psl/$ISSUE/txpydata.csv" -o txpydata.csv &
@@ -56,6 +55,45 @@ Then work ENTIRELY from local files - no more network calls to TAXSIM.
5655

5756
## Diagnostic Steps
5857

58+
### Step 0: Pre-diagnosis triage (DO THIS FIRST — it short-circuits a lot of work)
59+
60+
Three quick checks before any file downloads or PE runs. If any of them fires, you may be done in 5 minutes instead of an hour.
61+
62+
**0a. Is this a bug report or an informational question?**
63+
64+
Read the issue body. If the reporter is asking *what PE does* or *how PE handles X* (e.g., "What does PE use for fuel cost?") without claiming a specific wrong value, this is **informational**. Skip the full diagnosis — answer their question directly with code references, draft a comment, and confirm before posting. Don't file a PE-US issue.
65+
66+
If the body claims a specific discrepancy ("PE returns $X, TaxAct returns $Y, should be $Z"), it's a **bug claim** — continue.
67+
68+
**0b. Is this already fixed in current PE?**
69+
70+
```bash
71+
# Reporter's PE-US version (printed in <issue>.txt near the top)
72+
grep -A1 "policyengine-us" /tmp/taxsim_$ISSUE/$ISSUE.txt | head -3
73+
74+
# Local PE-US version
75+
pip show policyengine-us | grep -i version
76+
```
77+
78+
If the local version is much newer than the reporter's, grep the changelog for relevant state/feature work in the gap:
79+
80+
```bash
81+
cd /Users/pavelmakarchuk/policyengine-us && grep -in "<state-name>\|<state-abbrev>" CHANGELOG.md | head -20
82+
```
83+
84+
If a relevant fix landed between the reporter's version and yours, run a quick PE-direct test with the current version (Step 3) to confirm the issue no longer reproduces. If it doesn't reproduce, close with a brief "after model adjustments, the values now align" comment and stop.
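The "is the local version newer?" check in 0b is easy to get wrong with plain string comparison (`"1.9.0"` sorts after `"1.10.0"`). A minimal sketch of a numeric comparison follows; the function names and the sample version strings are hypothetical, and it assumes PE-US versions are plain dotted integers.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted-integer version from a string such as
    'Version: 1.150.0' and return it as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def reporter_is_behind(reporter_version: str, local_version: str) -> bool:
    """True when the reporter ran an older release than the local install."""
    return parse_version(reporter_version) < parse_version(local_version)


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    print(reporter_is_behind("1.9.0", "1.10.0"))
```

Tuple comparison treats each component as a number, so the changelog grep is only triggered when the reporter's release really is older.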
85+
86+
**0c. Is there already an open PE-US issue or PR tracking this?**
87+
88+
```bash
89+
gh search issues --repo PolicyEngine/policyengine-us "<state> <variable-or-symptom>" --include-prs \
90+
--json number,title,state,url
91+
```
92+
93+
Try a few queries (state name, specific PE variable, taxsim issue number). If there's an existing tracking issue or open PR that addresses this, just cross-link with a short comment ("Will be addressed here: <PR link>") and stop.
94+
95+
Only if all three checks say "still relevant, fresh problem" do you go to Step 1.
96+
5997
### Step 1: Read the Issue
6098
- Fetch the issue from GitHub (`gh issue view {number} --repo PolicyEngine/policyengine-taxsim`)
6199
- Read the description carefully for diagnostic hints
@@ -66,7 +104,7 @@ Then work ENTIRELY from local files - no more network calls to TAXSIM.
66104

67105
Common data entry errors to check:
68106
- **State code**: TAXSIM uses alphabetical numbering (1-51), NOT FIPS codes!
69-
- **Filing status (mstat)**: 1=single, 2=joint, 6=dependent
107+
- **Filing status (mstat)**: `1=single`, `2=joint`, `6=dependent`. Note: TAXSIM has no separate HoH code — **PE infers HoH from `mstat=1` with `depx≥1`**. So `mstat=1, depx=0` is true single; `mstat=1, depx≥1` is HoH. Most recent issues are HoH despite `mstat=1`.
70108
- **Ages (page/sage)**: Required for age-based provisions
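The state-code and filing-status checks above can be sketched as two small lookups. This is an illustration only: the state list is a reconstruction of the alphabetical (by state name, DC included) ordering, the function names and returned labels are hypothetical, and the mapping in `policyengine_taxsim/core/utils.py` remains the authoritative one.

```python
# TAXSIM state codes: 1-based position in the list of state names sorted
# alphabetically, with DC included (assumed ordering; verify against utils.py).
TAXSIM_STATES = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA",
    "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY",
    "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX",
    "UT", "VT", "VA", "WA", "WV", "WI", "WY",
]


def state_from_taxsim_code(code: int) -> str:
    """Map a TAXSIM state code (1-51) to a postal abbreviation."""
    if not 1 <= code <= len(TAXSIM_STATES):
        raise ValueError(f"TAXSIM state code out of range: {code}")
    return TAXSIM_STATES[code - 1]


def infer_filing_status(mstat: int, depx: int) -> str:
    """Apply the inference described above: mstat=1 with dependents is HoH."""
    if mstat == 1:
        return "head_of_household" if depx >= 1 else "single"
    if mstat == 2:
        return "joint"
    if mstat == 6:
        return "dependent"
    raise ValueError(f"unhandled mstat value: {mstat}")


if __name__ == "__main__":
    print(state_from_taxsim_code(34))       # NC, not the FIPS-37 state
    print(infer_filing_status(1, depx=2))   # head_of_household
```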
71109

72110
### Step 3: Test with PolicyEngine Directly
@@ -82,34 +120,36 @@ python policyengine_taxsim/cli.py policyengine /tmp/test.csv -o /tmp/output.csv
82120
cat /tmp/output.csv
83121
```
84122

85-
### Step 4: Three-Way Comparison
86-
Compare ALL THREE values for each key variable:
123+
### Step 4: Comparison table
124+
125+
Compare current PE values against the TaxAct PDF. **Every PE value in the table must come from a direct PE query (Step 3 CSV output or a `Simulation.calculate(...)` call)** — never infer a PE value from gaps between other variables. If you want the pension deduction, query `me_pension_income_deduction` directly; don't subtract AGI − federal AGI.
87126

88-
| Variable | PolicyEngine | TAXSIM | TaxAct (PDF) | Who's Right? |
89-
|----------|-------------|--------|--------------|--------------|
90-
| siitax | | | | |
91-
| fiitax | | | | |
92-
| v10 | | | | |
93-
| v32 | | | | |
94-
| v36 | | | | |
127+
| Variable | Current PE (queried) | TaxAct (PDF) | Who's right? |
128+
|----------|----------------------|--------------|--------------|
129+
| siitax | | | |
130+
| fiitax | | | |
131+
| v10 | | | |
132+
| v32 | | | |
133+
| v36 | | | |
95134

96135
**If v32=0 for a state tax issue**: The state isn't being set correctly. Check the state code!
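One way to fill in the Step 4 table without inferring anything is to generate it from two dictionaries: values queried from PE and values read off the PDF. The sketch below is hypothetical (the helper name, the tolerance, and the sample numbers are all made up), and it reports a signed difference rather than deciding who is right, since that judgement still needs Step 7.

```python
def comparison_table(pe_values: dict, taxact_values: dict, tolerance: float = 1.0) -> str:
    """Render a Markdown table of queried PE values against TaxAct PDF values.

    Rows whose absolute difference exceeds `tolerance` are flagged "(check)".
    """
    lines = [
        "| Variable | Current PE (queried) | TaxAct (PDF) | Difference |",
        "|----------|----------------------|--------------|------------|",
    ]
    for name, pe in pe_values.items():
        taxact = taxact_values.get(name)
        if taxact is None:
            # Never guess a missing PDF value; leave it visibly empty.
            lines.append(f"| {name} | {pe:,.2f} | (not on PDF) | n/a |")
            continue
        diff = pe - taxact
        flag = "" if abs(diff) <= tolerance else " (check)"
        lines.append(f"| {name} | {pe:,.2f} | {taxact:,.2f} | {diff:+,.2f}{flag} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers purely to show the output shape.
    pe = {"siitax": 1200.0, "v32": 0.0}
    taxact = {"siitax": 1350.0}
    print(comparison_table(pe, taxact))
```

A `v32` of zero shows up immediately in such a table, which is the cue to recheck the state code.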
97136

98137
### Step 5: Extract and Analyze TaxAct PDF Forms (MANDATORY)
99138

100139
**THIS STEP IS CRITICAL AND MUST NOT BE SKIPPED.**
101140

102-
The TaxAct PDF contains the actual filled-out tax forms - this is the ground truth. The YAML expected values may be incorrect, so always verify against the actual PDF.
141+
The TaxAct PDFs contain the actual filled-out tax forms; this is the ground truth. Issues usually bundle multiple PDFs (federal 1040, the state form, and any relevant schedules). Iterate them:
103142

104143
```bash
105-
# Extract text using PyMuPDF (from local downloaded file)
106144
python3 -c "
107-
import fitz
108-
doc = fitz.open('/tmp/taxsim_$ISSUE/form.pdf')
109-
for page_num in range(len(doc)):
110-
    page = doc[page_num]
111-
    print(f'=== Page {page_num + 1} ===')
112-
    print(page.get_text())
145+
import fitz, glob
146+
for path in sorted(glob.glob('/tmp/taxsim_$ISSUE/*.pdf')):
147+
    print(f'=== FILE: {path} ===')
148+
    doc = fitz.open(path)
149+
    for page_num in range(len(doc)):
150+
        page = doc[page_num]
151+
        print(f'--- Page {page_num + 1} ---')
152+
        print(page.get_text())
113153
"
114154
```
115155

@@ -123,7 +163,7 @@ for page_num in range(len(doc)):
123163
6. **Tax Due** - The actual tax calculated on the form
124164
7. **Credits** - Which credits were claimed and amounts
125165

126-
**If YAML expected and PDF differ, the PDF is correct.** The YAML may have been generated incorrectly.
166+
**If the reporter's claim and the PDF differ, the PDF is correct.** Numeric claims in issue bodies are sometimes paraphrased or based on a stale PE run.
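A quick sanity check on a reporter's number is to see whether it appears anywhere in the extracted PDF text. The following is a rough sketch under the assumption that the text has already been dumped to a string by the extraction step; the helper names and the sample text are hypothetical, and real form text will also contain line numbers and years that match as amounts.

```python
import re

# Matches "1,893.00", "1,893", "502", "0.5"; thousands separators optional.
_AMOUNT = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def amounts_in_text(text: str) -> set[float]:
    """Collect every number-like token in the extracted PDF text."""
    return {float(token.replace(",", "")) for token in _AMOUNT.findall(text)}


def claim_appears_in_pdf(text: str, claimed: float, tolerance: float = 0.5) -> bool:
    """True if some amount in the text is within `tolerance` of the claim."""
    return any(abs(value - claimed) <= tolerance for value in amounts_in_text(text))


if __name__ == "__main__":
    # Made-up snippet standing in for extracted form text.
    sample = "Premium tax credit 0\nTotal tax 1,893.00"
    print(claim_appears_in_pdf(sample, 502))
    print(claim_appears_in_pdf(sample, 1893))
```

A miss does not prove the claim wrong (the schedule may be absent from the bundle), but it is a prompt to reread the form before trusting the issue body.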
127167

128168
### Step 6: Deep Dive if Needed
129169
If the basic test shows incorrect values:
@@ -148,10 +188,18 @@ print("Exclusion:", sim.calculate("{state}_retirement_exclusion", 2024))
148188
```
149189

150190
### Step 7: Research Legal Documentation
151-
If PE logic appears wrong, verify against official sources:
152-
- State tax form instructions (primary source)
153-
- State tax code/statutes
154-
- Tax Foundation / Tax Policy Center summaries
191+
192+
When PE and TaxAct disagree on a specific credit, deduction, or line item, **fetch the primary sources** — don't rely on web-search summaries. Search summaries are routinely wrong or stale (e.g., a search may claim "State X does not offer credit Y" when the statute clearly establishes it). Use search only to *find* the right primary-source URL, then fetch the document.
193+
194+
For state credits, verify all three of:
195+
196+
1. **Statute text**: `WebFetch` the actual statute (e.g., `https://code.<state>legislature.gov/<section>/`). Quote the relevant language verbatim. Confirms whether the credit exists in law.
197+
2. **Current-year form PDF** — fetch the current-year state return PDF and check whether the credit actually appears as a line. A statutory credit that hasn't been operationalized on the form may not be claimable in practice for that year.
198+
3. **Current-year instructions booklet** — fetch the instructions PDF and look for eligibility criteria, income caps, age requirements, or filing prerequisites that PE may not model.
199+
200+
Cross-reference: statute → form line → instructions eligibility. If any of the three says something different, **hedge in your write-up** — don't claim PE is correct just because the statute is on the books, and don't claim PE is wrong just because TaxAct's PDF didn't apply a credit (TaxAct can miss things, too; the bundled PDFs sometimes omit schedules).
201+
202+
For federal items, use IRS publications and Form 1040 instructions directly.
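The statute, form line, and instructions cross-reference can be recorded as three explicit findings so that a disagreement is hard to overlook. This is only an illustrative sketch: the class, its field names, and the wording of the result are hypothetical and not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class PrimarySourceCheck:
    """What each fetched primary source says about one credit or deduction."""
    statute_establishes_it: bool      # statute text read and quoted
    form_has_line_for_it: bool        # current-year form PDF shows a line
    instructions_match_pe_rules: bool  # booklet eligibility matches PE's logic

    def disagreements(self) -> list[str]:
        """Names of the sources that do not support the provision as modelled."""
        findings = {
            "statute": self.statute_establishes_it,
            "form": self.form_has_line_for_it,
            "instructions": self.instructions_match_pe_rules,
        }
        return [name for name, supports in findings.items() if not supports]

    def write_up_guidance(self) -> str:
        missing = self.disagreements()
        if not missing:
            return "all three sources agree"
        return "hedge: check " + ", ".join(missing)


if __name__ == "__main__":
    check = PrimarySourceCheck(True, False, True)
    print(check.write_up_guidance())
```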
155203

156204
### Step 8: Check policyengine-us Implementation
157205
```bash
@@ -162,16 +210,16 @@ ls /Users/pavelmakarchuk/policyengine-us/policyengine_us/variables/gov/states/{s
162210
grep -r "variable_name" /Users/pavelmakarchuk/policyengine-us/policyengine_us/
163211
```
164212

165-
### Step 9: Document Finding
166-
Update the tracker in `issue_analysis/README.md`.
213+
### Step 9: Draft and confirm
167214

168215
If PE needs a fix, **draft** an issue for policyengine-us with:
169216
1. Summary of the problem
170-
2. Root cause analysis with code references
171-
3. Suggested fix
172-
4. Integration test with correct expected values (from TaxAct PDF, not PE's buggy output)
217+
2. Link back to the originating taxsim issue
218+
3. Root cause analysis with code references
219+
4. Suggested fix
220+
5. Integration test with correct expected values (from TaxAct PDF, not PE's buggy output)
173221

174-
**Show the draft to the user and wait for approval before posting.**
222+
**Show the draft to the user and wait for approval before posting.** After posting, cross-link the PE-US issue back from the taxsim issue with a short comment.
175223

176224
---
177225

@@ -214,57 +262,65 @@ print(get_state_code(34)) # Should print "NC"
214262
| Var | Description | PE Variable |
215263
|-----|-------------|-------------|
216264
| fiitax | Federal income tax | income_tax |
217-
| siitax | State income tax | state_income_tax |
265+
| siitax | State income tax | `<state>_income_tax` |
218266
| v10 | Federal AGI | adjusted_gross_income |
219267
| v13 | Standard Deduction | standard_deduction |
220268
| v18 | Taxable Income | taxable_income |
221269
| v22 | Child Tax Credit | ctc_value |
222270
| v25 | EITC | eitc |
223-
| v32 | State AGI | {state}_agi |
224-
| v34 | State Std Deduction | {state}_standard_deduction |
225-
| v36 | State Taxable Income | {state}_taxable_income |
271+
| v32 | State AGI | `<state>_agi` |
272+
| v34 | State Std Deduction | `<state>_standard_deduction` |
273+
| v36 | State Taxable Income | `<state>_taxable_income` |
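The `<state>` placeholders in the table are meant to be replaced with the lowercase state abbreviation. A minimal sketch of that substitution follows; the dictionary simply copies the table above, the function name is hypothetical, and whether a given state actually defines the resulting variable still has to be checked in policyengine-us.

```python
# TAXSIM output variable -> PE variable template, copied from the table above.
TAXSIM_TO_PE = {
    "fiitax": "income_tax",
    "siitax": "<state>_income_tax",
    "v10": "adjusted_gross_income",
    "v13": "standard_deduction",
    "v18": "taxable_income",
    "v22": "ctc_value",
    "v25": "eitc",
    "v32": "<state>_agi",
    "v34": "<state>_standard_deduction",
    "v36": "<state>_taxable_income",
}


def pe_variable_name(taxsim_var: str, state: str) -> str:
    """Resolve a TAXSIM variable to a PE variable name for one state."""
    template = TAXSIM_TO_PE[taxsim_var]
    return template.replace("<state>", state.lower())


if __name__ == "__main__":
    print(pe_variable_name("v32", "NJ"))     # nj_agi
    print(pe_variable_name("fiitax", "NJ"))  # income_tax (no substitution needed)
```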
226274

227275
---
228276

229277
## Common Root Causes
230278

231-
### 1. YAML Expected Value Incorrect
232-
- YAML expected value doesn't match actual TaxAct PDF form
233-
- Always verify by extracting and reading the PDF
234-
- Example: Issue #657 YAML said $147.97 but PDF showed $0
279+
### 1. Reporter's claim based on stale PE version
280+
- Reporter ran an older PE-US; the issue is already fixed.
281+
- Caught by Step 0b (version comparison + changelog grep).
282+
283+
### 2. Already-tracked in policyengine-us
284+
- An open PE-US issue or PR is in flight covering the same root cause.
285+
- Caught by Step 0c (`gh search issues`).
286+
287+
### 3. YAML / Reporter's Expected Value Incorrect
288+
- Numbers cited in the issue body don't match the actual TaxAct PDF form
289+
- Always verify by extracting and reading the PDFs
290+
- Example: reporter said "PE allows $502" but PDF showed PTC = $0 and PE returned $0 (the $502 was a max-table value, not what PE returned)
235291

236-
### 2. Data Entry Errors in Issue
292+
### 4. Data Entry Errors in Issue
237293
- Wrong state code (FIPS vs TAXSIM alphabetical)
238294
- Missing or incorrect age (breaks age-based provisions)
239-
- Wrong filing status
295+
- Wrong filing status (most commonly: assuming `mstat=1` is single when it's HoH with dependents)
240296

241-
### 3. TAXSIM Bug
297+
### 5. TAXSIM Bug
242298
- TAXSIM source code has known bugs (SALT add-back, DTC phaseout, EITC age limits)
243-
- Compare all three systems to identify
299+
- Compare current PE vs TaxAct (plus TAXSIM output, when the bundle actually includes it) to identify
244300

245-
### 4. Missing State Provisions in PE
301+
### 6. Missing State Provisions in PE
246302
- State credit/deduction not implemented
247303
- Year-specific parameter not updated
248304

249-
### 5. Missing Optimization in PE
305+
### 7. Missing Optimization in PE
250306
- PE uses fixed 50/50 splits for exemptions/deductions
251307
- Some states (e.g., MS) allow optimal allocation between spouses
252308
- TaxAct optimizes these allocations automatically
253309

254-
### 6. Input Mapping Issues (policyengine-taxsim)
310+
### 8. Input Mapping Issues (policyengine-taxsim)
255311
- Income not being split correctly between spouses
256312
- Variable not mapped from TAXSIM input to PE situation
257313

258-
### 7. Output Mapping Issues (policyengine-taxsim)
314+
### 9. Output Mapping Issues (policyengine-taxsim)
259315
- State variable name not being substituted correctly
260316
- Variable exists but not in output mapping
261317

262-
### 8. Calculation Logic Issues (policyengine-us)
318+
### 10. Calculation Logic Issues (policyengine-us)
263319
- Eligibility criteria wrong
264320
- Formula doesn't match tax form worksheet
265321
- Parameter values outdated
266322

267-
### 9. Negative Income Edge Cases
323+
### 11. Negative Income Edge Cases
268324
- Credits/taxes not handling negative AGI or capital losses correctly
269325
- Phantom credits when income is negative
270326

@@ -274,15 +330,17 @@ print(get_state_code(34)) # Should print "NC"
274330

275331
When an issue doesn't reproduce as expected:
276332

333+
- [ ] **Did Step 0 (triage) short-circuit?** Already fixed in newer PE, already tracked, or informational only?
277334
- [ ] **State code correct?** (TAXSIM alphabetical, not FIPS)
278335
- [ ] **v32 (State AGI) non-zero?** (If 0, state setup is wrong)
336+
- [ ] **Filing status inference?** (`mstat=1 + depx≥1` → HoH, not single)
279337
- [ ] **Ages set correctly?** (Many provisions are age-gated)
280338
- [ ] **Income assigned to right person?** (Joint filers: check both)
281339
- [ ] **Test with Simulation directly?** (Bypasses taxsim mapping)
282340
- [ ] **Check existing tests in policyengine-us?** (May show expected behavior)
283-
- [ ] **PDF form extracted and analyzed?** (YAML expected values may be wrong!)
284-
- [ ] **PDF tax matches YAML expected?** (If not, use PDF as ground truth)
285-
- [ ] **All three systems compared?** (PE, TAXSIM, and TaxAct)
341+
- [ ] **PDFs extracted and analyzed?** (Reporter's expected values may be wrong!)
342+
- [ ] **Compared current PE vs TaxAct?** Every PE value queried directly (no inference from gaps between variables).
343+
- [ ] **For credit/deduction disagreements: did you fetch the statute + current-year form + instructions booklet?** (Search summaries are not authoritative.)
286344

287345
---
288346

@@ -293,12 +351,12 @@ When an issue doesn't reproduce as expected:
293351
- `policyengine_taxsim/core/input_mapper.py` - Converts TAXSIM input to PE situations
294352
- `policyengine_taxsim/core/output_mapper.py` - Extracts PE results to TAXSIM format
295353
- `policyengine_taxsim/core/utils.py` - State code mappings (SOI_TO_FIPS_MAP)
296-
- `issue_analysis/` - Diagnosis tracking and findings
297354

298355
### policyengine-us
299356
- `policyengine_us/variables/gov/states/{state}/tax/income/` - State tax variables
300357
- `policyengine_us/parameters/gov/states/{state}/tax/income/` - State tax parameters
301358
- `policyengine_us/tests/policy/baseline/gov/states/{state}/` - Existing tests
359+
- `CHANGELOG.md` - Use to compare reporter's PE version to current and find relevant recent fixes
302360

303361
---
304362

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Audit and tighten the /diagnose-issue slash command: add Step 0 pre-triage (Q-vs-bug / version compare / existing PE-US tracking), require primary-source fetches for credit/deduction disagreements, mandate direct PE queries (no inference), and drop stale references.
