Commit 674f566

MaxGhenis and claude authored
Derive partnership_se_income from PUF source columns (#485)
* Derive partnership_se_income from PUF instead of looking for missing columns

The raw IRS PUF doesn't contain k1bx14p/k1bx14s columns - these are derived by PSLmodels/taxdata from the total SE income (E30400/E30500) minus Schedule C (E00900) and Schedule F (E02100) income.

This fix implements the same derivation logic from taxdata's finalprep.py split_earnings_variables function. The formula is:

    partnership_se = (E30400 + E30500) - E00900 - E02100

This ensures partnership_se_income has non-zero values in the PUF-based datasets, enabling accurate SE tax calculations for general partners.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Use Yale Budget Lab's gross-up approach for partnership_se_income

The E30400/E30500 PUF columns are already TAXABLE SE income (post-0.9235 deduction factor). Since PolicyEngine applies the 0.9235 factor itself in taxable_self_employment_income, we need to provide GROSS partnership SE income.

Changes:
- Gross up E30400+E30500 by dividing by 0.9235 before subtracting Sch C/F
- Only compute when partnership activity exists (E25940+E25980-E25920-E25960 != 0)

This aligns with Yale Budget Lab's Tax-Data approach in process_puf.R:

    part_se = if_else(E25940 + E25980 - E25920 - E25960 != 0,
                      (E30400 + E30500) / 0.9235 - E00900 - E02100, 0)

Weighted sum increases from $12.7B to $55.7B, which is more realistic given total SE income of ~$400B.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add changelog entry

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
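To make the difference between the two formulas in this message concrete, here is a minimal arithmetic sketch. The function names and the filer's dollar amounts are hypothetical and chosen only for illustration; the 0.9235 factor and the two formulas come from the commit message itself.

```python
SE_DEDUCTION_FACTOR = 0.9235  # 1 - 0.5 * 0.153, as stated in the commit


def partnership_se_naive(e30400, e30500, e00900, e02100):
    """First commit's formula: treats E30400/E30500 as if they were gross."""
    return (e30400 + e30500) - e00900 - e02100


def partnership_se_grossed_up(e30400, e30500, e00900, e02100):
    """Second commit's formula: undo the 0.9235 factor before subtracting."""
    return (e30400 + e30500) / SE_DEDUCTION_FACTOR - e00900 - e02100


# Hypothetical filer: $60,000 Schedule C profit plus $40,000 of partnership
# SE income, no farm income. Gross SE income is therefore $100,000.
gross_total = 100_000.0
taxable_total = gross_total * SE_DEDUCTION_FACTOR  # what E30400 would hold

naive = partnership_se_naive(taxable_total, 0.0, 60_000.0, 0.0)
grossed = partnership_se_grossed_up(taxable_total, 0.0, 60_000.0, 0.0)

print(round(naive, 2))    # 32350.0: understates the partnership share
print(round(grossed, 2))  # 40000.0: recovers the assumed $40,000
```

Because the Schedule C and F amounts are gross while E30400/E30500 are post-factor, the naive subtraction understates the residual, which is consistent with the weighted sum rising once the gross-up is applied.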
1 parent c1d5cd0 commit 674f566

2 files changed

Lines changed: 29 additions & 5 deletions

changelog_entry.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+- bump: patch
+  changes:
+    fixed:
+    - Derive partnership_se_income from PUF source columns using Yale Budget Lab's gross-up approach instead of looking for non-existent k1bx14 columns.

policyengine_us_data/datasets/puf/puf.py

Lines changed: 25 additions & 5 deletions
@@ -383,11 +383,31 @@ def preprocess_puf(puf: pd.DataFrame) -> pd.DataFrame:
     # Ignore cmbtp (estimate of AMT income not in AGI)
 
     # Partnership self-employment income from Schedule K-1 Box 14
-    # This is the portion of partnership income subject to SE tax (general partners only)
-    # k1bx14p = taxpayer, k1bx14s = spouse
-    k1bx14p = puf["k1bx14p"] if "k1bx14p" in puf.columns else 0
-    k1bx14s = puf["k1bx14s"] if "k1bx14s" in puf.columns else 0
-    puf["partnership_se_income"] = k1bx14p + k1bx14s
+    # This is the portion of partnership income subject to SE tax (general partners)
+    # Derived from total SE income minus Schedule C and Schedule F income
+    # Based on Yale Budget Lab's Tax-Data process_puf.R approach:
+    # E30400 = taxpayer's TAXABLE SE income (already * 0.9235)
+    # E30500 = spouse's TAXABLE SE income (already * 0.9235)
+    # E00900 = Schedule C net profit/loss (gross)
+    # E02100 = Schedule F farm income (gross)
+    # Since E30400/E30500 are post-deduction (taxable), we gross them up
+    # by dividing by 0.9235 before subtracting Sch C/F.
+    # PolicyEngine applies the 0.9235 factor itself in taxable_self_employment_income.
+    SE_DEDUCTION_FACTOR = 0.9235  # 1 - 0.5 * 0.153 (half of SE tax rate)
+    taxable_se = puf["E30400"].fillna(0) + puf["E30500"].fillna(0)
+    gross_se = taxable_se / SE_DEDUCTION_FACTOR
+    schedule_c_f_income = puf["E00900"].fillna(0) + puf["E02100"].fillna(0)
+    # Only compute when there's partnership activity (net partnership income != 0)
+    has_partnership = (
+        puf["E25940"].fillna(0)
+        + puf["E25980"].fillna(0)
+        - puf["E25920"].fillna(0)
+        - puf["E25960"].fillna(0)
+    ) != 0
+    partnership_se = np.where(
+        has_partnership, gross_se - schedule_c_f_income, 0
+    )
+    puf["partnership_se_income"] = partnership_se
 
     # --- Qualified Business Income Deduction (QBID) simulation ---
     w2, ubia = simulate_w2_and_ubia_from_puf(puf, seed=42)
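The added hunk can be exercised in isolation with a small sketch like the one below. The wrapper name `derive_partnership_se_income`, the `col` helper, and the three-row toy frame are hypothetical and not part of the commit; only the column names, the 0.9235 factor, and the masking logic are taken from the diff. It assumes `numpy` and `pandas` are installed.

```python
import numpy as np
import pandas as pd

SE_DEDUCTION_FACTOR = 0.9235  # same constant as in the diff


def derive_partnership_se_income(puf: pd.DataFrame) -> np.ndarray:
    """Hypothetical wrapper around the logic added to preprocess_puf."""

    def col(name: str) -> pd.Series:
        # Missing values are treated as zero, as in the diff.
        return puf[name].fillna(0)

    gross_se = (col("E30400") + col("E30500")) / SE_DEDUCTION_FACTOR
    schedule_c_f_income = col("E00900") + col("E02100")
    has_partnership = (
        col("E25940") + col("E25980") - col("E25920") - col("E25960")
    ) != 0
    return np.where(has_partnership, gross_se - schedule_c_f_income, 0)


# Toy records (made-up values):
#   row 0: partnership activity, Schedule C profit of 60,000
#   row 1: same SE amounts but no partnership activity, so the mask zeroes it
#   row 2: partnership activity but missing SE income (NaN filled with 0)
toy = pd.DataFrame(
    {
        "E30400": [92_350.0, 92_350.0, np.nan],
        "E30500": [0.0, 0.0, 0.0],
        "E00900": [60_000.0, 60_000.0, 0.0],
        "E02100": [0.0, 0.0, 0.0],
        "E25940": [40_000.0, 0.0, 5_000.0],
        "E25980": [0.0, 0.0, 0.0],
        "E25920": [0.0, 0.0, 0.0],
        "E25960": [0.0, 0.0, 0.0],
    }
)

result = derive_partnership_se_income(toy)
print(result.round(2))  # values are approximately 40000, 0 and 0
```

Row 1 shows why the mask matters: without it, the residual of grossed-up SE income over Schedule C income would be attributed to a partnership even though the record reports no partnership income or loss.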
