Problem
The current dataset uses only the raw FRS SPNAMT (salary sacrifice pension amount) field, which has non-zero values for just 371 of ~36,000 persons. Weighted, this comes to £5bn, against an HMRC target of ~£24bn.
PR #216 attempted to add calibration targets for salary sacrifice, but hitting them requires ~5x weight scaling, which inflates the population from 68M to 74M (6% over target vs 2% tolerance).
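The ~5x figure follows directly from the two totals quoted above. A minimal arithmetic sketch (variable names are illustrative and not taken from the codebase):

```python
# Totals quoted in the problem statement, in £bn.
current_weighted_total_bn = 5.0   # weighted SPNAMT total in the raw FRS
hmrc_target_bn = 24.0             # HMRC salary sacrifice target

# Uniform factor by which the weights of the non-zero SPNAMT records would
# need to grow if calibration alone had to close the gap.
required_scale = hmrc_target_bn / current_weighted_total_bn
print(f"Required weight scaling: {required_scale:.1f}x")
```

Because so few records carry a non-zero amount, the whole adjustment lands on their weights, which is consistent with the population overshoot described above.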
Root Cause
Salary sacrifice is NOT imputed, unlike consumption, wealth, VAT, services, income, and capital gains, which all have imputation steps in create_datasets.py. The raw FRS severely under-reports SS participation.
Proposed Solution
Implement ML-based imputation for salary sacrifice participation, similar to how other variables are imputed.
Key Finding: We CAN Distinguish Non-Response from Zero
The FRS SALSAC variable is a routing question that asks "Does your employer offer a salary sacrifice scheme for pension contributions?":
| SALSAC Value | Meaning | Count |
|---|---|---|
| '1' | Yes, participates in SS | 224 jobs |
| '2' | No, doesn't participate | 3,803 jobs |
| ' ' (blank) | Skip/not asked | 13,265 jobs |
This provides:
- Training data: 4,027 observations with definite Yes/No responses
- Imputation candidates: 13,265 observations where the question was skipped
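The split into training data and imputation candidates could look roughly like the sketch below. It assumes job records are available as dictionaries with a string `SALSAC` field; the `job_id` key, the `ss_participant` label, and the function name are hypothetical and only serve the illustration.

```python
# SALSAC codes as listed in the table above (stored as strings).
SALSAC_YES = "1"
SALSAC_NO = "2"


def split_by_salsac(jobs):
    """Split job records into labelled training rows and imputation candidates."""
    training, candidates = [], []
    for job in jobs:
        code = job["SALSAC"].strip()  # a blank response (' ') becomes ''
        if code in (SALSAC_YES, SALSAC_NO):
            # Attach a binary label: 1 = participates, 0 = does not.
            training.append({**job, "ss_participant": int(code == SALSAC_YES)})
        else:
            # Question skipped or not asked: participation is unknown.
            candidates.append(job)
    return training, candidates


# Toy records; real rows would come from the FRS job-level table.
jobs = [
    {"job_id": 1, "SALSAC": "1"},
    {"job_id": 2, "SALSAC": "2"},
    {"job_id": 3, "SALSAC": " "},
]
training, candidates = split_by_salsac(jobs)
print(len(training), len(candidates))
```

Keeping the blank responses separate, rather than coding them as zero, is what allows non-response to be treated differently from a definite "No".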
External Validation Target
Per HMRC surveys, approximately 30% of private sector employees use salary sacrifice for pension contributions. This can be used to validate imputation results.
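A validation check against the ~30% figure might be sketched as follows. The function names and the 5 percentage point tolerance are assumptions made for illustration; the issue does not specify a tolerance, and the inputs would need to be restricted to private sector employees to match the HMRC definition.

```python
def weighted_participation_rate(participates, weights):
    """Share of total survey weight held by salary sacrifice participants."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(w for p, w in zip(participates, weights) if p) / total


def close_to_target(rate, target=0.30, tolerance=0.05):
    """True if the rate is within `tolerance` of the target (tolerance is hypothetical)."""
    return abs(rate - target) <= tolerance


# Toy data: four employees with survey weights.
flags = [True, False, False, True]
weights = [1200.0, 3000.0, 4000.0, 1800.0]
rate = weighted_participation_rate(flags, weights)
print(f"Weighted participation: {rate:.1%}, within tolerance: {close_to_target(rate)}")
```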
HMRC Table 6.2 Targets (2023-24)
- Total SS pension contributions: ~£24bn
- IT relief from SS: ~£7.2bn
  - Basic rate: £1.6bn
  - Higher rate: £4.4bn
  - Additional rate: £1.2bn
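As a quick consistency check on these figures, the three rate bands sum to the quoted IT relief total, and relief comes to about 30% of contributions. The sketch below uses only the numbers listed above; the variable names are illustrative.

```python
# Figures quoted above from HMRC Table 6.2 (2023-24), in £bn.
total_contributions_bn = 24.0
relief_by_band_bn = {"basic": 1.6, "higher": 4.4, "additional": 1.2}

total_relief_bn = sum(relief_by_band_bn.values())
implied_average_relief_rate = total_relief_bn / total_contributions_bn
band_shares = {band: amount / total_relief_bn for band, amount in relief_by_band_bn.items()}

print(f"Total IT relief: £{total_relief_bn:.1f}bn")
print(f"Implied average relief rate: {implied_average_relief_rate:.0%}")
```

The band shares give a distributional check in addition to the aggregate total: imputed amounts should land mostly on higher rate taxpayers.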
Implementation Steps
- Create imputation model using SALSAC='1'/'2' as training labels
- Predict SS participation probability for SALSAC=' ' (skipped) observations
- Impute SS amounts based on participation probability and employee characteristics
- Validate against HMRC 30% participation rate target
- Remove calibration targets from loss function (or reduce their weight significantly)
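The first three steps can be sketched end to end as below. This is not the proposed ML model: it substitutes simple per-cell participation rates as a stand-in classifier so that the example stays self-contained. The feature name `sector`, the `earnings` field, and the flat `amount_share` of 5% are all hypothetical placeholders.

```python
import random
from collections import defaultdict


def fit_cell_rates(training, features):
    """Estimate P(participates) for each feature cell from labelled jobs."""
    if not training:
        raise ValueError("no labelled training rows")
    counts = defaultdict(lambda: [0, 0])  # cell -> [participants, total]
    for row in training:
        cell = tuple(row[f] for f in features)
        counts[cell][0] += row["ss_participant"]
        counts[cell][1] += 1
    overall = sum(c[0] for c in counts.values()) / len(training)
    rates = {cell: yes / n for cell, (yes, n) in counts.items()}
    return rates, overall


def impute(candidates, rates, overall, features, amount_share=0.05, seed=0):
    """Draw participation for skipped jobs and assign an amount to participants."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    imputed = []
    for row in candidates:
        cell = tuple(row[f] for f in features)
        p = rates.get(cell, overall)  # unseen cells fall back to the overall rate
        participates = rng.random() < p
        # Placeholder amount rule: a flat share of earnings (an assumption).
        amount = row["earnings"] * amount_share if participates else 0.0
        imputed.append({**row, "ss_probability": p,
                        "ss_participant": int(participates), "ss_amount": amount})
    return imputed


# Toy data; real inputs would be the SALSAC '1'/'2' rows and the blank rows.
training = [
    {"sector": "private", "ss_participant": 1},
    {"sector": "private", "ss_participant": 1},
    {"sector": "public", "ss_participant": 0},
]
candidates = [
    {"sector": "private", "earnings": 40000.0},
    {"sector": "public", "earnings": 30000.0},
]
rates, overall = fit_cell_rates(training, ["sector"])
imputed = impute(candidates, rates, overall, ["sector"])
```

Drawing participation from the predicted probability, rather than thresholding it, keeps the imputed participation rate in line with the model's average prediction, which is what the validation step against the HMRC figure would then test.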
Related Issues/PRs
Files to Modify
- policyengine_uk_data/datasets/frs.py - Add SALSAC extraction
- policyengine_uk_data/datasets/create_datasets.py - Add SS imputation step
- New file: policyengine_uk_data/datasets/imputations/salary_sacrifice.py