Commit fc2608b

vahid-ahmadi and claude committed
Merge origin/main into targets-registry
Resolve conflict in utils/loss.py (keep registry delegation). Incorporate salary sacrifice headcount targets from PR #268 into the structured registry (obr.py, compute/income.py, build_loss_matrix.py).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2 parents 07e3452 + 4aa0077 commit fc2608b

9 files changed

Lines changed: 227 additions & 20 deletions

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.37.0] - 2026-02-16 10:16:06
+
+### Added
+
+- Calibrate salary sacrifice population to HMRC/ASHE headcount targets (7.7mn total, 3.3mn above 2k cap, 4.3mn below 2k cap). Two-stage imputation in salary_sacrifice.py converts pension contributors to below-cap SS users, and three new headcount calibration targets in loss.py.
+
 ## [1.36.2] - 2026-01-21 19:06:42
 
 ### Fixed
@@ -592,6 +598,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 
 
+[1.37.0]: https://github.com/PolicyEngine/policyengine-us-data/compare/1.36.2...1.37.0
 [1.36.2]: https://github.com/PolicyEngine/policyengine-us-data/compare/1.36.1...1.36.2
 [1.36.1]: https://github.com/PolicyEngine/policyengine-us-data/compare/1.36.0...1.36.1
 [1.36.0]: https://github.com/PolicyEngine/policyengine-us-data/compare/1.35.1...1.36.0

changelog.yaml

Lines changed: 8 additions & 0 deletions
@@ -517,3 +517,11 @@
     - Use region.values for Scotland comparisons in loss function to ensure consistent
       behavior with StringArray types
   date: 2026-01-21 19:06:42
+- bump: minor
+  changes:
+    added:
+    - Calibrate salary sacrifice population to HMRC/ASHE headcount targets (7.7mn
+      total, 3.3mn above 2k cap, 4.3mn below 2k cap). Two-stage imputation in salary_sacrifice.py
+      converts pension contributors to below-cap SS users, and three new headcount
+      calibration targets in loss.py.
+  date: 2026-02-16 10:16:06

policyengine_uk_data/datasets/imputations/salary_sacrifice.py

Lines changed: 54 additions & 19 deletions
@@ -1,20 +1,19 @@
 """
 Salary sacrifice imputation for pension contributions.
 
-This module imputes salary sacrifice pension amounts using QRF trained on
-FRS respondents who were asked the SALSAC question. The model predicts
-the continuous amount (pension_contributions_via_salary_sacrifice), with
-non-participants naturally having 0.
+Two-stage imputation:
 
-Training data (FRS 2023-24):
-- SALSAC='1' (Yes): ~224 jobs with reported SPNAMT amounts
-- SALSAC='2' (No): ~3,803 jobs with SPNAMT=0
+1. QRF trained on FRS respondents who were asked SALSAC (~224 yes,
+   ~3,803 no). Predicts SS amounts for ~13,265 jobs where SALSAC was
+   not asked.
 
-Imputation candidates:
-- SALSAC=' ' (skip/not asked): ~13,265 jobs
+2. Headcount-targeted imputation: converts a fraction of pension
+   contributors without SS into below-cap (≤£2,000) SS users, moving
+   employee pension contributions to salary sacrifice. Targets the
+   OBR/ASHE estimate of ~4.3mn below-cap SS users.
 
-Targeting to HMRC totals (~24bn SS contributions) happens via weight
-calibration, not in this imputation step.
+Exact monetary totals (~£24bn SS contributions) and final headcount
+calibration happen via weight optimisation in a subsequent step.
 """
 
 import pandas as pd
@@ -124,13 +123,10 @@ def impute_salary_sacrifice(
     """
     Impute salary sacrifice pension amounts for FRS non-respondents.
 
-    For respondents not asked about salary sacrifice (SALSAC=' '), uses
-    a QRF model trained on those who were asked to predict the SS pension
-    contribution amount directly. The model naturally predicts 0 for
-    non-participants and positive amounts for likely participants.
-
-    Note: This imputation does NOT target any specific total. Targeting
-    to HMRC figures happens via weight calibration in a subsequent step.
+    Stage 1: QRF predicts SS amounts for respondents not asked SALSAC.
+    Stage 2: Converts a fraction of pension contributors to below-cap
+    SS users, targeting ~4.3mn (OBR/ASHE). Moves employee pension
+    contributions to salary sacrifice to keep total pension consistent.
 
     Args:
         dataset: PolicyEngine UK dataset with salary_sacrifice_asked
@@ -183,7 +179,46 @@ def impute_salary_sacrifice(
         imputed_ss,  # Use imputed for non-respondents
     )
 
-    # Update dataset
+    # Stage 2: Headcount-targeted imputation for below-cap SS users.
+    # ASHE data shows many more SS users than the FRS captures due to
+    # self-reporting bias in auto-enrolment. Impute additional SS users
+    # from pension contributors to create enough records for calibration
+    # to hit OBR headcount targets (7.7mn total, 4.3mn below £2,000).
+    person_weight = sim.calculate("person_weight").values
+    employee_pension = dataset.person[
+        "employee_pension_contributions"
+    ].values.copy()
+    has_ss = final_ss > 0
+    below_cap_ss = has_ss & (final_ss <= 2000)
+
+    # Donor pool: employed pension contributors not already SS users
+    is_donor = (employee_pension > 0) & ~has_ss & (employment_income > 0)
+
+    # Target ~4.3mn below-cap SS users (HMRC/ASHE estimate)
+    TARGET_BELOW_CAP = 4_300_000
+    current_below_cap = (person_weight * below_cap_ss).sum()
+    shortfall = max(0, TARGET_BELOW_CAP - current_below_cap)
+
+    if shortfall > 0:
+        donor_weighted = (person_weight * is_donor).sum()
+        if donor_weighted > 0:
+            imputation_rate = min(0.8, shortfall / donor_weighted)
+            rng = np.random.default_rng(seed=2024)
+            newly_imputed = is_donor & (
+                rng.random(len(final_ss)) < imputation_rate
+            )
+
+            # Move up to £2,000 of employee pension to SS
+            ss_new = np.minimum(employee_pension, 2000.0)
+            final_ss = np.where(newly_imputed, ss_new, final_ss)
+
+            # Reduce employee pension correspondingly
+            dataset.person["employee_pension_contributions"] = np.where(
+                newly_imputed,
+                employee_pension - ss_new,
+                employee_pension,
+            )
+
     dataset.person["pension_contributions_via_salary_sacrifice"] = final_ss
 
     return dataset
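
Reviewer note: the Stage 2 step in the hunk above can be tried in isolation on toy arrays. The sketch below is not the module's code. The function name `impute_below_cap_ss`, its keyword arguments, and the five-person example data are assumptions made for illustration; only the masks, the 0.8 rate ceiling, the £2,000 cap, and the seed mirror the diff.

```python
import numpy as np


def impute_below_cap_ss(
    final_ss,
    employee_pension,
    employment_income,
    person_weight,
    target=4_300_000,
    cap=2000.0,
    max_rate=0.8,
    seed=2024,
):
    """Toy Stage 2 step (hypothetical helper); returns (new_ss, new_pension)."""
    final_ss = np.asarray(final_ss, dtype=float)
    employee_pension = np.asarray(employee_pension, dtype=float)
    employment_income = np.asarray(employment_income, dtype=float)
    person_weight = np.asarray(person_weight, dtype=float)

    has_ss = final_ss > 0
    below_cap_ss = has_ss & (final_ss <= cap)
    # Donors: employed pension contributors who are not already SS users.
    is_donor = (employee_pension > 0) & ~has_ss & (employment_income > 0)

    shortfall = max(0.0, target - (person_weight * below_cap_ss).sum())
    donor_weighted = (person_weight * is_donor).sum()
    if shortfall == 0 or donor_weighted == 0:
        return final_ss.copy(), employee_pension.copy()

    # Each donor is converted independently with the same probability.
    rate = min(max_rate, shortfall / donor_weighted)
    rng = np.random.default_rng(seed)
    newly_imputed = is_donor & (rng.random(len(final_ss)) < rate)

    # Move up to `cap` of employee pension into salary sacrifice.
    ss_new = np.minimum(employee_pension, cap)
    new_ss = np.where(newly_imputed, ss_new, final_ss)
    new_pension = np.where(
        newly_imputed, employee_pension - ss_new, employee_pension
    )
    return new_ss, new_pension


# Made-up data: two donors, one below-cap user, one above-cap user, one non-worker.
ss = np.array([0.0, 0.0, 500.0, 3000.0, 0.0])
pension = np.array([1500.0, 4000.0, 0.0, 0.0, 0.0])
income = np.array([30_000.0, 60_000.0, 25_000.0, 80_000.0, 0.0])
weights = np.full(5, 1_000.0)

new_ss, new_pension = impute_below_cap_ss(
    ss, pension, income, weights, target=3_000
)
# Total pension saving per person is unchanged by the transfer.
print(np.allclose(new_ss + new_pension, ss + pension))
```

Because every donor is drawn with the same probability, the expected weighted number of converted donors equals the rate times the weighted donor pool, that is, the shortfall unless the 0.8 ceiling binds. The realised count depends on the seed, which is presumably why the docstring leaves final headcounts to weight calibration.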

policyengine_uk_data/targets/build_loss_matrix.py

Lines changed: 5 additions & 0 deletions
@@ -38,6 +38,7 @@
     compute_scotland_uc_child,
     compute_scottish_child_payment,
     compute_ss_contributions,
+    compute_ss_headcount,
     compute_ss_it_relief,
     compute_ss_ni_relief,
     compute_tenure,
@@ -341,6 +342,10 @@ def _compute_column(
     if name == "hmrc/salary_sacrifice_contributions":
         return compute_ss_contributions(target, ctx)
 
+    # Salary sacrifice headcount
+    if name.startswith("obr/salary_sacrifice_users_"):
+        return compute_ss_headcount(target, ctx)
+
     # Salary sacrifice NI relief
     if name in (
         "hmrc/salary_sacrifice_employee_nics_relief",

policyengine_uk_data/targets/compute/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -29,6 +29,7 @@
     compute_esa,
     compute_income_band,
     compute_ss_contributions,
+    compute_ss_headcount,
     compute_ss_it_relief,
     compute_ss_ni_relief,
 )
@@ -55,6 +56,7 @@
     "compute_scotland_uc_child",
     "compute_scottish_child_payment",
     "compute_ss_contributions",
+    "compute_ss_headcount",
     "compute_ss_it_relief",
     "compute_ss_ni_relief",
     "compute_tenure",

policyengine_uk_data/targets/compute/income.py

Lines changed: 13 additions & 0 deletions
@@ -74,6 +74,19 @@ def compute_ss_ni_relief(target, ctx) -> np.ndarray:
     return ctx.household_from_person(ni_cf - ni_base)
 
 
+def compute_ss_headcount(target, ctx) -> np.ndarray:
+    """Compute salary sacrifice user headcounts."""
+    ss = ctx.sim.calculate("pension_contributions_via_salary_sacrifice")
+    name = target.name
+    if "below_cap" in name:
+        mask = (ss > 0) & (ss <= 2000)
+    elif "above_cap" in name:
+        mask = ss > 2000
+    else:
+        mask = ss > 0
+    return ctx.household_from_person(mask)
+
+
 def compute_esa(target, ctx) -> np.ndarray:
     """Compute ESA (combined income-related + contributory)."""
     return ctx.household_from_person(
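
Reviewer note: a minimal sketch of what `compute_ss_headcount` appears to do, assuming `ctx.household_from_person` sums person-level values within each household. Its real implementation is not shown in this diff, so the `household_from_person` stand-in below (built on `np.bincount`) and the `ss_headcount_mask` helper are hypothetical.

```python
import numpy as np

CAP = 2000.0  # below-cap means 0 < ss <= CAP, as in the diff


def ss_headcount_mask(target_name, ss):
    """Pick the person-level mask from the target name (hypothetical helper)."""
    ss = np.asarray(ss, dtype=float)
    if "below_cap" in target_name:
        return (ss > 0) & (ss <= CAP)
    if "above_cap" in target_name:
        return ss > CAP
    return ss > 0


def household_from_person(values, household_index, n_households):
    """Assumed stand-in: sum person values into their household's row."""
    return np.bincount(
        household_index,
        weights=np.asarray(values, dtype=float),
        minlength=n_households,
    )


# Five people in three households, with made-up SS amounts.
ss = np.array([0.0, 1500.0, 2000.0, 2500.0, 0.0])
hh = np.array([0, 0, 1, 1, 2])

below = household_from_person(
    ss_headcount_mask("obr/salary_sacrifice_users_below_cap", ss), hh, 3
)
above = household_from_person(
    ss_headcount_mask("obr/salary_sacrifice_users_above_cap", ss), hh, 3
)
total = household_from_person(
    ss_headcount_mask("obr/salary_sacrifice_users_total", ss), hh, 3
)
print(below, above, total)
```

Under that assumption each household row holds the number of matching members, so multiplying by household weights and summing yields a weighted headcount. The below-cap and above-cap masks partition the users, so their columns add up to the total column.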

policyengine_uk_data/targets/sources/obr.py

Lines changed: 56 additions & 0 deletions
@@ -434,6 +434,22 @@ def _parse_tv_licence(wb: openpyxl.Workbook) -> list[Target]:
     y: 2.9e9 * 1.03 ** max(0, y - 2024) for y in range(2024, 2032)
 }
 
+# Salary sacrifice headcount: 7.7m total (3.3m above £2k, 4.3m below)
+# OBR para 1.7: SS population grows 0.9% faster than employees (~2.4%/yr)
+_SS_HEADCOUNT_GROWTH = 1.024
+_SS_TOTAL_USERS = {
+    y: 7_700_000 * _SS_HEADCOUNT_GROWTH ** max(0, y - 2024)
+    for y in range(2024, 2032)
+}
+_SS_BELOW_CAP_USERS = {
+    y: 4_300_000 * _SS_HEADCOUNT_GROWTH ** max(0, y - 2024)
+    for y in range(2024, 2032)
+}
+_SS_ABOVE_CAP_USERS = {
+    y: 3_300_000 * _SS_HEADCOUNT_GROWTH ** max(0, y - 2024)
+    for y in range(2024, 2032)
+}
+
 
 def get_targets() -> list[Target]:
     config = load_config()
@@ -487,4 +503,44 @@ def get_targets() -> list[Target]:
         )
     )
 
+    # Salary sacrifice headcount targets
+    _SS_REF = (
+        "https://www.gov.uk/government/publications/"
+        "salary-sacrifice-reform-for-pension-contributions"
+        "-effective-from-6-april-2029"
+    )
+    targets.append(
+        Target(
+            name="obr/salary_sacrifice_users_total",
+            variable="pension_contributions_via_salary_sacrifice",
+            source="obr",
+            unit=Unit.COUNT,
+            values=_SS_TOTAL_USERS,
+            is_count=True,
+            reference_url=_SS_REF,
+        )
+    )
+    targets.append(
+        Target(
+            name="obr/salary_sacrifice_users_below_cap",
+            variable="pension_contributions_via_salary_sacrifice",
+            source="obr",
+            unit=Unit.COUNT,
+            values=_SS_BELOW_CAP_USERS,
+            is_count=True,
+            reference_url=_SS_REF,
+        )
+    )
+    targets.append(
+        Target(
+            name="obr/salary_sacrifice_users_above_cap",
+            variable="pension_contributions_via_salary_sacrifice",
+            source="obr",
+            unit=Unit.COUNT,
+            values=_SS_ABOVE_CAP_USERS,
+            is_count=True,
+            reference_url=_SS_REF,
+        )
+    )
+
     return targets
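
Reviewer note: the three target dictionaries in obr.py share one compounding rule, which the sketch below factors into a single function to make the arithmetic easy to check. `project_headcount` is a hypothetical name, not something this commit defines; the base figures and the 1.024 growth factor are taken from the diff.

```python
def project_headcount(
    base, growth=1.024, base_year=2024, years=range(2024, 2032)
):
    """Compound a base-year headcount forward (hypothetical helper).

    Years at or before base_year keep the base value, matching the
    max(0, y - 2024) exponent used in the diff.
    """
    return {y: base * growth ** max(0, y - base_year) for y in years}


total = project_headcount(7_700_000)
below = project_headcount(4_300_000)
above = project_headcount(3_300_000)

for year in (2024, 2025, 2031):
    # Gap between the total target and the sum of its two components.
    gap = total[year] - (below[year] + above[year])
    print(year, round(total[year]), round(gap))
```

Two things follow from the numbers as published. The rounded components sum to 7.6mn rather than 7.7mn, so the three targets are not exactly additive and calibration cannot hit all of them at once. The registry value for 2025 is 7.7mn × 1.024 = 7,884,800, whereas the tests in this commit compare period 2025 against the 2024 base of 7.7mn, a difference well inside their 15% tolerance.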
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+"""Test salary sacrifice headcount calibration targets.
+
+Source: HMRC, "Salary sacrifice reform for pension contributions"
+https://www.gov.uk/government/publications/salary-sacrifice-reform-for-pension-contributions-effective-from-6-april-2029
+7.7mn total SS users (3.3mn above 2k cap, 4.3mn below 2k cap)
+"""
+
+import pytest
+
+TOLERANCE = 0.15  # 15% relative tolerance
+
+
+@pytest.mark.xfail(
+    reason="Will pass after recalibration with new headcount targets"
+)
+def test_salary_sacrifice_total_users(baseline):
+    """Test that total SS user count is close to 7.7mn."""
+    ss = baseline.calculate(
+        "pension_contributions_via_salary_sacrifice",
+        map_to="person",
+        period=2025,
+    )
+    person_weight = baseline.calculate(
+        "person_weight", map_to="person", period=2025
+    ).values
+
+    total_users = (person_weight * (ss.values > 0)).sum()
+    TARGET = 7_700_000
+
+    assert abs(total_users / TARGET - 1) < TOLERANCE, (
+        f"Expected ~{TARGET/1e6:.1f}mn SS users, "
+        f"got {total_users/1e6:.1f}mn ({total_users/TARGET*100:.0f}% of target)"
+    )
+
+
+@pytest.mark.xfail(
+    reason="Will pass after recalibration with new headcount targets"
+)
+def test_salary_sacrifice_below_cap_users(baseline):
+    """Test that below-cap (<=2k) SS users are close to 4.3mn."""
+    ss = baseline.calculate(
+        "pension_contributions_via_salary_sacrifice",
+        map_to="person",
+        period=2025,
+    )
+    person_weight = baseline.calculate(
+        "person_weight", map_to="person", period=2025
+    ).values
+
+    below_cap = (ss.values > 0) & (ss.values <= 2000)
+    total_below_cap = (person_weight * below_cap).sum()
+    TARGET = 4_300_000
+
+    assert abs(total_below_cap / TARGET - 1) < TOLERANCE, (
+        f"Expected ~{TARGET/1e6:.1f}mn below-cap SS users, "
+        f"got {total_below_cap/1e6:.1f}mn ({total_below_cap/TARGET*100:.0f}% of target)"
+    )
+
+
+@pytest.mark.xfail(
+    reason="Will pass after recalibration with new headcount targets"
+)
+def test_salary_sacrifice_above_cap_users(baseline):
+    """Test that above-cap (>2k) SS users are close to 3.3mn."""
+    ss = baseline.calculate(
+        "pension_contributions_via_salary_sacrifice",
+        map_to="person",
+        period=2025,
+    )
+    person_weight = baseline.calculate(
+        "person_weight", map_to="person", period=2025
+    ).values
+
+    above_cap = ss.values > 2000
+    total_above_cap = (person_weight * above_cap).sum()
+    TARGET = 3_300_000
+
+    assert abs(total_above_cap / TARGET - 1) < TOLERANCE, (
+        f"Expected ~{TARGET/1e6:.1f}mn above-cap SS users, "
+        f"got {total_above_cap/1e6:.1f}mn ({total_above_cap/TARGET*100:.0f}% of target)"
+    )
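
Reviewer note: the three tests above repeat the same weighted-count and relative-tolerance check. The sketch below shows that check in isolation on made-up arrays; `weighted_headcount` and `within_tolerance` are hypothetical helpers, not part of the commit, and the weights are invented for the example.

```python
import numpy as np

TOLERANCE = 0.15  # same 15% relative tolerance as the tests


def weighted_headcount(weights, mask):
    """Sum of person weights where the boolean mask is true."""
    return float((np.asarray(weights, dtype=float) * np.asarray(mask)).sum())


def within_tolerance(actual, target, tolerance=TOLERANCE):
    """True when actual is within the relative tolerance of target."""
    return abs(actual / target - 1) < tolerance


# Made-up sample: four people, each standing for 1.75mn people.
ss = np.array([0.0, 800.0, 2500.0, 1200.0])
weights = np.full(4, 1_750_000.0)

below_cap = weighted_headcount(weights, (ss > 0) & (ss <= 2000))
print(below_cap, within_tolerance(below_cap, 4_300_000))
```

With a 15% tolerance the below-cap test accepts weighted counts strictly between 3.655mn and 4.945mn, so the toy value of 3.5mn would fail it.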

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "policyengine_uk_data"
-version = "1.36.2"
+version = "1.37.0"
 description = "A package to create representative microdata for the UK."
 readme = "README.md"
 authors = [
