|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Student loan repayment validation\n", |
| 8 | + "\n", |
| 9 | + "This notebook compares PolicyEngine UK's calculated student loan repayments against reported repayments from the Family Resources Survey (FRS) microdata. Understanding the alignment between modelled and reported values helps assess model accuracy and identify areas for improvement." |
| 10 | + ] |
| 11 | + }, |
| 12 | + { |
| 13 | + "cell_type": "markdown", |
| 14 | + "metadata": {}, |
| 15 | + "source": [ |
| 16 | + "## Background\n", |
| 17 | + "\n", |
| 18 | + "Student loan repayments in the UK are calculated as a percentage of income above a threshold, varying by loan plan:\n", |
| 19 | + "\n", |
| 20 | + "- **Plan 1** (pre-2012 England/Wales; Northern Ireland): 9% of income above £24,990 (2024-25)\n", |
| 21 | + "- **Plan 2** (post-2012 England/Wales): 9% of income above £27,295 (2024-25)\n", |
| 22 | + "- **Plan 4** (Scotland): 9% of income above £31,395 (2024-25)\n", |
| 23 | + "- **Plan 5** (England post-2023): 9% of income above £25,000 (2024-25)\n", |
| 24 | + "- **Postgraduate**: 6% of income above £21,000 (2024-25)\n", |
| 25 | + "\n", |
| 26 | + "The FRS captures reported student loan repayments, while PolicyEngine calculates repayments based on income and loan plan type." |
| 27 | + ] |
| 28 | + }, |
| 29 | + { |
| 30 | + "cell_type": "code", |
| 31 | + "execution_count": null, |
| 32 | + "metadata": {}, |
| 33 | + "outputs": [], |
| 34 | + "source": [ |
| 35 | + "from policyengine_uk import Microsimulation\n", |
| 36 | + "import numpy as np\n", |
| 37 | + "import pandas as pd\n", |
| 38 | + "\n", |
| 39 | + "sim = Microsimulation()\n", |
| 40 | + "year = 2025" |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "code", |
| 45 | + "execution_count": null, |
| 46 | + "metadata": {}, |
| 47 | + "outputs": [], |
| 48 | + "source": [ |
| 49 | + "# Pull FRS-reported and PolicyEngine-modelled repayments, plus loan plan, income and person weights\n", |
| 50 | + "reported = sim.calculate(\"student_loan_repayments\", year).values\n", |
| 51 | + "modelled = sim.calculate(\"student_loan_repayment\", year).values\n", |
| 52 | + "plan = sim.calculate(\"student_loan_plan\", year).values\n", |
| 53 | + "income = sim.calculate(\"adjusted_net_income\", year).values\n", |
| 54 | + "weight = sim.calculate(\"person_weight\", year).values" |
| 55 | + ] |
| 56 | + }, |
| 57 | + { |
| 58 | + "cell_type": "markdown", |
| 59 | + "metadata": {}, |
| 60 | + "source": [ |
| 61 | + "## Student loan plan distribution\n", |
| 62 | + "\n", |
| 63 | + "First, let's examine the distribution of student loan plans in the weighted population:" |
| 64 | + ] |
| 65 | + }, |
| 66 | + { |
| 67 | + "cell_type": "code", |
| 68 | + "execution_count": null, |
| 69 | + "metadata": {}, |
| 70 | + "outputs": [], |
| 71 | + "source": [ |
| 72 | + "# Plan distribution (weighted)\n", |
| 73 | + "plan_names = {0: \"None\", 1: \"Plan 1\", 2: \"Plan 2\", 3: \"Postgraduate\", 4: \"Plan 4\", 5: \"Plan 5\"}\n", |
| 74 | + "for plan_id, name in plan_names.items():\n", |
| 75 | + " count = weight[plan == plan_id].sum() / 1e6\n", |
| 76 | + " print(f\"{name}: {count:.2f}m people\")" |
| 77 | + ] |
| 78 | + }, |
| 79 | + { |
| 80 | + "cell_type": "markdown", |
| 81 | + "metadata": {}, |
| 82 | + "source": [ |
| 83 | + "## Aggregate comparison\n", |
| 84 | + "\n", |
| 85 | + "Comparing total reported vs modelled repayments:" |
| 86 | + ] |
| 87 | + }, |
| 88 | + { |
| 89 | + "cell_type": "code", |
| 90 | + "execution_count": null, |
| 91 | + "metadata": {}, |
| 92 | + "outputs": [], |
| 93 | + "source": [ |
| 94 | + "total_reported = (reported * weight).sum() / 1e9\n", |
| 95 | + "total_modelled = (modelled * weight).sum() / 1e9\n", |
| 96 | + "\n", |
| 97 | + "print(f\"Total reported repayments: £{total_reported:.2f}bn\")\n", |
| 98 | + "print(f\"Total modelled repayments: £{total_modelled:.2f}bn\")\n", |
| 99 | + "print(f\"Ratio (modelled/reported): {total_modelled/total_reported:.2f}\")" |
| 100 | + ] |
| 101 | + }, |
| 102 | + { |
| 103 | + "cell_type": "markdown", |
| 104 | + "metadata": {}, |
| 105 | + "source": [ |
| 106 | + "## Individual-level alignment\n", |
| 107 | + "\n", |
| 108 | + "For people who report making student loan repayments, how well do our calculations align?" |
| 109 | + ] |
| 110 | + }, |
| 111 | + { |
| 112 | + "cell_type": "code", |
| 113 | + "execution_count": null, |
| 114 | + "metadata": {}, |
| 115 | + "outputs": [], |
| 116 | + "source": [ |
| 117 | + "# Filter to people with reported repayments > 0\n", |
| 118 | + "has_reported = reported > 0\n", |
| 119 | + "\n", |
| 120 | + "if has_reported.sum() > 0:\n", |
| 121 | + " # Pearson correlation between reported and modelled amounts (unweighted)\n", |
| 122 | + " correlation = np.corrcoef(reported[has_reported], modelled[has_reported])[0, 1]\n", |
| 123 | + " print(f\"Correlation (people with reported > 0): {correlation:.3f}\")\n", |
| 124 | + " \n", |
| 125 | + " # Share of reporters for whom the model also produces a positive repayment (unweighted)\n", |
| 126 | + " both_positive = (reported > 0) & (modelled > 0)\n", |
| 127 | + " match_rate = both_positive.sum() / has_reported.sum() * 100\n", |
| 128 | + " print(f\"People with both reported & modelled > 0: {match_rate:.1f}% of reporters\")\n", |
| 129 | + " \n", |
| 130 | + " # Unweighted means among reporters\n", |
| 131 | + " print(f\"\\nMean reported (reporters): £{reported[has_reported].mean():,.0f}\")\n", |
| 132 | + " print(f\"Mean modelled (reporters): £{modelled[has_reported].mean():,.0f}\")\n", |
| 133 | + " print(f\"Mean income (reporters): £{income[has_reported].mean():,.0f}\")" |
| 134 | + ] |
| 135 | + }, |
| 136 | + { |
| 137 | + "cell_type": "markdown", |
| 138 | + "metadata": {}, |
| 139 | + "source": [ |
| 140 | + "## Analysis of discrepancies\n", |
| 141 | + "\n", |
| 142 | + "Several factors may explain a relatively low individual-level correlation:\n", |
| 143 | + "\n", |
| 144 | + "1. **Timing differences**: Reported repayments reflect actual payments made during the tax year, which may include voluntary overpayments or vary based on pay frequency and employment changes.\n", |
| 145 | + "\n", |
| 146 | + "2. **Employment variation**: Someone may have had periods below or above the repayment threshold during the year, while our model assumes constant annual income.\n", |
| 147 | + "\n", |
| 148 | + "3. **Multiple loan plans**: Some individuals may have both Plan 1 and Plan 2 loans, complicating the calculation.\n", |
| 149 | + "\n", |
| 150 | + "4. **Study status**: Current students may have different repayment patterns not fully captured in the model.\n", |
| 151 | + "\n", |
| 152 | + "5. **Plan misclassification**: The loan plan imputation in the microdata may not perfectly match individuals' actual loan types.\n", |
| 153 | + "\n", |
| 154 | + "Despite individual-level variation, the aggregate totals are reasonably aligned, suggesting the model captures the overall scale of student loan repayments in the UK economy." |
| 155 | + ] |
| 156 | + }, |
| 157 | + { |
| 158 | + "cell_type": "markdown", |
| 159 | + "metadata": {}, |
| 160 | + "source": [ |
| 161 | + "## Conclusion\n", |
| 162 | + "\n", |
| 163 | + "PolicyEngine UK's student loan repayment model produces aggregate totals within a reasonable range of reported values. The individual-level correlation is lower than for income tax calculations, reflecting the complexity of student loan timing and the limitations of annual income-based calculations. For microsimulation purposes, the model provides a reasonable approximation of student loan repayment flows, but users should be aware of these limitations when analysing individual-level impacts." |
| 164 | + ] |
| 165 | + } |
| 166 | + ], |
| 167 | + "metadata": { |
| 168 | + "kernelspec": { |
| 169 | + "display_name": "Python 3", |
| 170 | + "language": "python", |
| 171 | + "name": "python3" |
| 172 | + }, |
| 173 | + "language_info": { |
| 174 | + "name": "python", |
| 175 | + "version": "3.10.0" |
| 176 | + } |
| 177 | + }, |
| 178 | + "nbformat": 4, |
| 179 | + "nbformat_minor": 4 |
| 180 | +} |
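
The repayment rule described in the notebook's Background cell (a fixed percentage of income above a plan-specific threshold) can be sketched as below. This is a minimal illustration, not PolicyEngine UK's implementation: `PLAN_RULES` and `annual_repayment` are hypothetical names, and only three of the listed plans are included, with thresholds and rates copied from the Background cell.

```python
# Illustrative only: thresholds (GBP per year) and rates are copied from the
# notebook's Background cell (2024-25). The names here are hypothetical and
# are not part of the policyengine_uk API.
PLAN_RULES = {
    "Plan 1": (24_990, 0.09),
    "Plan 2": (27_295, 0.09),
    "Postgraduate": (21_000, 0.06),
}


def annual_repayment(income: float, plan: str) -> float:
    """Return rate * max(income - threshold, 0) for the given plan."""
    threshold, rate = PLAN_RULES[plan]
    return rate * max(income - threshold, 0.0)


# Someone on Plan 2 earning £30,000 repays 9% of the £2,705 above the threshold.
print(f"£{annual_repayment(30_000, 'Plan 2'):,.2f}")
# Income below the threshold gives no repayment.
print(f"£{annual_repayment(20_000, 'Plan 1'):,.2f}")
```

Because the rule is applied to annual income, it cannot reproduce within-year effects such as months spent above or below the threshold, which is one of the discrepancy sources the notebook lists.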
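
The individual-level cell computes its correlation, match rate and means without survey weights, whereas the plan distribution and aggregate totals are weighted. A weighted variant could look like the sketch below. The helper names `weighted_corr` and `weighted_match_rate` are hypothetical, and the arrays are toy values for illustration rather than FRS data.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Pearson correlation of x and y using observation weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    mean_x = np.average(x, weights=w)
    mean_y = np.average(y, weights=w)
    cov = np.average((x - mean_x) * (y - mean_y), weights=w)
    var_x = np.average((x - mean_x) ** 2, weights=w)
    var_y = np.average((y - mean_y) ** 2, weights=w)
    return cov / np.sqrt(var_x * var_y)


def weighted_match_rate(reported, modelled, w):
    """Weighted share of reporters whose modelled repayment is also positive."""
    reported, modelled, w = (
        np.asarray(a, dtype=float) for a in (reported, modelled, w)
    )
    has_reported = reported > 0
    both_positive = has_reported & (modelled > 0)
    return w[both_positive].sum() / w[has_reported].sum()


# Toy inputs (not survey data): four people with different weights.
reported = np.array([0.0, 100.0, 200.0, 300.0])
modelled = np.array([0.0, 0.0, 250.0, 280.0])
weight = np.array([1.0, 2.0, 1.0, 1.0])

reporters = reported > 0
print(weighted_corr(reported[reporters], modelled[reporters], weight[reporters]))
print(weighted_match_rate(reported, modelled, weight))
```

In the notebook, the same functions could be called with the `reported`, `modelled` and `weight` arrays from the data-loading cell, so that the individual-level statistics describe the population rather than the sample.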