Commit 0720fc5

Move all randomness to data package for deterministic country package
This change moves ALL random number generation from policyengine-uk into the dataset generation in policyengine-uk-data. The country package is now a purely deterministic rules engine.

## Key Changes

### policyengine-uk-data:
- Add take-up rate YAML parameter files in `parameters/take_up/`
- Generate all stochastic decisions in FRS dataset using these rates
- Generate boolean would_claim variables directly in dataset
- Generate random draws for variables that need them (tie-breaking, etc.)
- Use seeded RNG (seed=100) for full reproducibility

### Stochastic variables generated:

**Take-up decisions (boolean):**
- would_claim_child_benefit
- child_benefit_opts_out
- would_claim_pc (Pension Credit)
- would_claim_uc (Universal Credit)
- would_claim_marriage_allowance
- would_claim_tfc (Tax-Free Childcare)
- would_claim_extended_childcare
- would_claim_universal_childcare
- would_claim_targeted_childcare

**Other stochastic variables (boolean):**
- household_owns_tv
- would_evade_tv_licence_fee
- main_residential_property_purchased_is_first_home
- is_disabled_for_benefits (based on reported benefits)

**Random draws (float [0,1)):**
- is_higher_earner_random_draw (for tie-breaking)
- attends_private_school_random_draw (for income-conditional probability)

## Trade-offs

**IMPORTANT**: Take-up rates can no longer be adjusted dynamically via policy reforms or in the web app. They are fixed in the microdata. This is an acceptable trade-off for the cleaner architecture of keeping the country package purely deterministic. To adjust take-up rates, the microdata must be regenerated.

Related: policyengine-uk PR (must be merged after this)
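The take-up mechanism described in the commit message can be sketched in a few lines: draw one uniform number per unit from a seeded generator and compare it with the take-up rate. This is a minimal illustration, not the package's code; the helper name `draw_take_up`, the sample size and the 0.89 rate are assumptions chosen for the example.

```python
import numpy as np


def draw_take_up(
    generator: np.random.Generator, n_units: int, rate: float
) -> np.ndarray:
    """Return a boolean array in which each unit claims with probability `rate`.

    Hypothetical helper for illustration only.
    """
    return generator.random(n_units) < rate


# Same seed as the commit; the size and rate are illustrative assumptions.
generator = np.random.default_rng(seed=100)
would_claim = draw_take_up(generator, 10_000, 0.89)

# The realised share is close to the rate, and identical on every rerun
# that starts from the same seed.
print(would_claim.mean())
```

Because the booleans are stored in the dataset, a downstream rules engine only reads them and never needs a random number generator of its own.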
1 parent b0814b9 commit 0720fc5

17 files changed

Lines changed: 258 additions & 28 deletions

changelog_entry.yaml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+- bump: minor
+  changes:
+    added:
+    - Take-up rate parameters in YAML files for stochastic simulation
+    - Parameter loader for take-up rates
+    - Generation of all stochastic boolean variables in FRS dataset
+    - Random draws for tie-breaking and conditional probabilities
+    changed:
+    - Moved all randomness from policyengine-uk to policyengine-uk-data
+    - Country package is now purely deterministic
+    - All stochastic decisions generated once during dataset creation
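The changelog mentions random draws for conditional probabilities. One way a deterministic rules engine could consume such a stored draw is sketched below: the draw comes from the dataset, and the formula only compares it with a probability that depends on income. The function name, the income threshold and both probabilities are hypothetical values for illustration, not figures from policyengine-uk.

```python
import numpy as np


def attends_private_school(
    random_draw: np.ndarray,
    household_income: np.ndarray,
    income_threshold: float = 100_000.0,  # hypothetical threshold
    p_low: float = 0.02,  # hypothetical probability below the threshold
    p_high: float = 0.30,  # hypothetical probability at or above it
) -> np.ndarray:
    """Deterministic given its inputs: no random numbers are generated here."""
    probability = np.where(household_income >= income_threshold, p_high, p_low)
    return random_draw < probability


# Stored draws (as they would arrive from the dataset) and incomes.
draws = np.array([0.01, 0.25, 0.25, 0.50])
incomes = np.array([20_000.0, 20_000.0, 150_000.0, 150_000.0])
result = attends_private_school(draws, incomes)
print(result)
```

Storing the draw rather than the final boolean keeps the probability adjustable downstream while the outcome stays reproducible.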

policyengine_uk_data/datasets/frs.py

Lines changed: 81 additions & 26 deletions
@@ -19,6 +19,7 @@
     fill_with_mean,
     STORAGE_FOLDER,
 )
+from policyengine_uk_data.parameters import load_take_up_rate, load_parameter
 
 
 def create_frs(
@@ -751,35 +752,89 @@ def determine_education_level(fted_val, typeed2_val, age_val):
         paragraph_3 | paragraph_4 | paragraph_5
     )
 
-    # Add random seed variables for stochastic simulation
-    # Each seed is for a specific independent random decision to avoid artificial correlations
-    # Random seeds are generated once during dataset creation and stored
+    # Generate stochastic take-up decisions
+    # All randomness is generated here in the data package using take-up rates
+    # stored in YAML parameter files. This keeps the country package purely deterministic.
 
     generator = np.random.default_rng(seed=100)
 
-    # Person-level seeds
-    pe_person["is_disabled_for_benefits_seed"] = generator.random(len(pe_person))
-    pe_person["marriage_allowance_take_up_seed"] = generator.random(len(pe_person))
-    pe_person["is_higher_earner_seed"] = generator.random(len(pe_person))
-    pe_person["attends_private_school_seed"] = generator.random(len(pe_person))
-
-    # Benefit unit-level seeds
-    pe_benunit["child_benefit_take_up_seed"] = generator.random(len(pe_benunit))
-    pe_benunit["child_benefit_opts_out_seed"] = generator.random(len(pe_benunit))
-    pe_benunit["pension_credit_take_up_seed"] = generator.random(len(pe_benunit))
-    pe_benunit["universal_credit_take_up_seed"] = generator.random(len(pe_benunit))
-
-    # Household-level seeds
-    pe_household["first_home_purchase_seed"] = generator.random(len(pe_household))
-    pe_household["household_owns_tv_seed"] = generator.random(len(pe_household))
-    pe_household["tv_licence_evasion_seed"] = generator.random(len(pe_household))
-
-    # Add childcare take-up seeds
-    # These will be used by the formulas in policyengine-uk with parameters
-    pe_benunit["tax_free_childcare_take_up_seed"] = generator.random(len(pe_benunit))
-    pe_benunit["extended_childcare_take_up_seed"] = generator.random(len(pe_benunit))
-    pe_benunit["universal_childcare_take_up_seed"] = generator.random(len(pe_benunit))
-    pe_benunit["targeted_childcare_take_up_seed"] = generator.random(len(pe_benunit))
+    # Load take-up rates from parameter files
+    child_benefit_rate = load_take_up_rate("child_benefit", year)
+    pension_credit_rate = load_take_up_rate("pension_credit", year)
+    universal_credit_rate = load_take_up_rate("universal_credit", year)
+    marriage_allowance_rate = load_take_up_rate("marriage_allowance", year)
+    child_benefit_opts_out_rate = load_take_up_rate(
+        "child_benefit_opts_out_rate", year
+    )
+    tfc_rate = load_take_up_rate("tax_free_childcare", year)
+    extended_childcare_rate = load_take_up_rate("extended_childcare", year)
+    universal_childcare_rate = load_take_up_rate("universal_childcare", year)
+    targeted_childcare_rate = load_take_up_rate("targeted_childcare", year)
+
+    # Generate take-up decisions by comparing random draws to take-up rates
+    # Person-level
+    pe_person["would_claim_marriage_allowance"] = (
+        generator.random(len(pe_person)) < marriage_allowance_rate
+    )
+
+    # Benefit unit-level
+    pe_benunit["would_claim_child_benefit"] = (
+        generator.random(len(pe_benunit)) < child_benefit_rate
+    )
+    pe_benunit["child_benefit_opts_out"] = (
+        generator.random(len(pe_benunit)) < child_benefit_opts_out_rate
+    )
+    pe_benunit["would_claim_pc"] = (
+        generator.random(len(pe_benunit)) < pension_credit_rate
+    )
+    pe_benunit["would_claim_uc"] = (
+        generator.random(len(pe_benunit)) < universal_credit_rate
+    )
+    pe_benunit["would_claim_tfc"] = (
+        generator.random(len(pe_benunit)) < tfc_rate
+    )
+    pe_benunit["would_claim_extended_childcare"] = (
+        generator.random(len(pe_benunit)) < extended_childcare_rate
+    )
+    pe_benunit["would_claim_universal_childcare"] = (
+        generator.random(len(pe_benunit)) < universal_childcare_rate
+    )
+    pe_benunit["would_claim_targeted_childcare"] = (
+        generator.random(len(pe_benunit)) < targeted_childcare_rate
+    )
+
+    # Generate other stochastic variables using rates from parameter files
+    # These are also generated in the dataset to keep the country package deterministic
+    tv_ownership_rate = load_parameter("stochastic", "tv_ownership_rate", year)
+    tv_evasion_rate = load_parameter(
+        "stochastic", "tv_licence_evasion_rate", year
+    )
+    first_time_buyer_rate = load_parameter(
+        "stochastic", "first_time_buyer_rate", year
+    )
+
+    # Household-level: TV ownership
+    pe_household["household_owns_tv"] = (
+        generator.random(len(pe_household)) < tv_ownership_rate
+    )
+
+    # Household-level: TV licence evasion
+    pe_household["would_evade_tv_licence_fee"] = (
+        generator.random(len(pe_household)) < tv_evasion_rate
+    )
+
+    # Household-level: First home purchase
+    pe_household["main_residential_property_purchased_is_first_home"] = (
+        generator.random(len(pe_household)) < first_time_buyer_rate
+    )
+
+    # Person-level: Tie-breaking for higher earner (uniform random for tie-breaking)
+    pe_person["higher_earner_tie_break"] = generator.random(len(pe_person))
+
+    # Person-level: Private school attendance random draw
+    pe_person["attends_private_school_random_draw"] = generator.random(
+        len(pe_person)
+    )
 
     # Generate extended childcare hours usage values with mean 15.019 and sd 4.972
     extended_hours_values = generator.normal(15.019, 4.972, len(pe_benunit))
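Because every variable in this hunk draws from one shared `generator`, the realised values depend on the order of the calls as well as on the seed. The sketch below, using a hypothetical helper `draws`, shows that consuming one extra batch of numbers first changes what a later call returns, while rerunning the same sequence reproduces it exactly.

```python
import numpy as np


def draws(extra_first: bool, n: int = 5) -> np.ndarray:
    """Return n uniform draws, optionally after another variable has
    already consumed n draws from the same seeded stream.

    Hypothetical helper for illustration only.
    """
    generator = np.random.default_rng(seed=100)
    if extra_first:
        generator.random(n)  # stands in for a variable generated earlier
    return generator.random(n)


baseline = draws(extra_first=False)
shifted = draws(extra_first=True)

print(np.array_equal(baseline, draws(extra_first=False)))  # same order, same values
print(np.array_equal(baseline, shifted))  # different order, different values
```

Under this assumption, reordering or inserting draws changes the values assigned to every later variable, so a regenerated dataset matches an earlier one only if the call order is unchanged.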
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+"""
+Take-up rate parameters for stochastic simulation.
+
+These parameters are stored in the data package to keep the country package
+as a purely deterministic rules engine.
+"""
+
+import yaml
+from pathlib import Path
+
+PARAMETERS_DIR = Path(__file__).parent
+
+
+def load_take_up_rate(variable_name: str, year: int = 2015) -> float:
+    """Load take-up rate from YAML parameter files.
+
+    Args:
+        variable_name: Name of the take-up parameter file (without .yaml)
+        year: Year for which to get the rate
+
+    Returns:
+        Take-up rate as a float between 0 and 1
+    """
+    yaml_path = PARAMETERS_DIR / "take_up" / f"{variable_name}.yaml"
+
+    with open(yaml_path) as f:
+        data = yaml.safe_load(f)
+
+    # Find the applicable value for the year
+    values = data["values"]
+    applicable_value = None
+
+    for date_key, value in sorted(values.items()):
+        # Handle both string and datetime.date objects from YAML
+        if hasattr(date_key, "year"):
+            # It's a datetime.date object
+            date_year = date_key.year
+        else:
+            # It's a string
+            date_year = int(date_key.split("-")[0])
+
+        if date_year <= year:
+            applicable_value = value
+        else:
+            break
+
+    if applicable_value is None:
+        raise ValueError(
+            f"No take-up rate found for {variable_name} in {year}"
+        )
+
+    return applicable_value
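The year-selection rule in `load_take_up_rate` is a step function: it returns the value attached to the latest date at or before the requested year, with no interpolation. The sketch below isolates that rule without reading any files. The helper name `rate_for_year` is hypothetical, and it reduces every key to a year before sorting, which is an assumption made here so that a mix of `datetime.date` and string keys cannot fail to compare. The sample values are the TV licence evasion rates from this commit.

```python
from datetime import date


def rate_for_year(values: dict, year: int) -> float:
    """Return the value for the latest key year that is <= `year`.

    Hypothetical helper for illustration only.
    """
    by_year = {}
    for key, value in values.items():
        # Keys may be datetime.date objects or "YYYY-MM-DD" strings.
        key_year = key.year if hasattr(key, "year") else int(str(key).split("-")[0])
        by_year[key_year] = value

    applicable_years = [y for y in sorted(by_year) if y <= year]
    if not applicable_years:
        raise ValueError(f"No rate found for {year}")
    return by_year[applicable_years[-1]]


# One key is deliberately a string to show the mixed-type case.
tv_evasion = {
    date(2015, 1, 1): 0.05,
    date(2018, 1, 1): 0.0657,
    "2022-01-01": 0.1058,
    date(2024, 1, 1): 0.1252,
}

print(rate_for_year(tv_evasion, 2020))  # falls back to the 2018 value
print(rate_for_year(tv_evasion, 2024))
```

A year earlier than the first entry raises `ValueError`, mirroring the behaviour of the loader above.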
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+description: Percentage of residential property purchases that are by first-time buyers
+metadata:
+  unit: /1
+  label: First-time buyer rate
+  reference:
+    - title: ONS First-time buyer mortgage sales by local authority
+      href: https://www.ons.gov.uk/releases/firsttimebuyermortgagesalesbylocalauthorityuk2006to2023
+    - title: Uswitch First-Time Buyer Statistics 2024
+      href: https://www.uswitch.com/mortgages/first-time-buyer-statistics/
+values:
+  2013-01-01: 0.280 # ONS data
+  2023-01-01: 0.384 # 38.4% of property sales were first-time buyers
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+description: Percentage of TV-owning households that evade the TV licence fee
+metadata:
+  unit: /1
+  label: TV licence evasion rate
+  reference:
+    - title: TV Licensing annual evader statistics
+      href: https://www.tvlicensing.co.uk/about/media-centre/news/tv-licensing-publishes-annual-evader-statistics-NEWS31
+    - title: House of Commons Library - TV licence fee statistics
+      href: https://commonslibrary.parliament.uk/research-briefings/cbp-8101/
+values:
+  2015-01-01: 0.05 # Historical low point
+  2018-01-01: 0.0657 # Official BBC estimate
+  2022-01-01: 0.1058 # Significant increase
+  2024-01-01: 0.1252 # Current BBC estimate
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+description: Percentage of households that own a functioning colour TV
+metadata:
+  unit: /1
+  label: TV ownership rate
+  reference:
+    - title: Ofcom - 95% of UK homes had at least one TV set in 2020
+      href: https://www.statista.com/statistics/269969/number-of-tv-households-in-the-uk/
+values:
+  2015-01-01: 0.96
+  2020-01-01: 0.95
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+description: Share of eligible children that participate in Child Benefit
+metadata:
+  unit: /1
+  reference:
+    - title: "Child Benefit statistics: 2022 annual release"
+      href: https://www.gov.uk/government/statistics/child-benefit-statistics-annual-release-august-2022/child-benefit-statistics-annual-release-data-at-august-2022#:~:text=since%202012%20the%20take%2Dup,level%20in%202022%20of%2089%25.
+values:
+  2012-01-01: 0.97
+  2022-01-01: 0.89
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+description: Percentage of fully High Income Child Benefit Charge-liable families who opt out of Child Benefit.
+metadata:
+  unit: /1
+  label: Child Benefit HITC-liable opt-out rate
+  reference:
+    - title: "Child Benefit Statistics: Annual Release, August 2022"
+      href: https://www.gov.uk/government/statistics/child-benefit-statistics-annual-release-august-2022/child-benefit-statistics-annual-release-data-at-august-2022
+values:
+  2019-01-01: 0.23 # 3m families have ANI over £60k in the 2023 FRS, 683k families opt out of CB.
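The 0.23 opt-out rate can be checked against the two figures quoted in the inline comment. The short calculation below takes those figures at face value (3m families with ANI over £60k, 683k opting out); the variable names are illustrative.

```python
# Figures as quoted in the parameter file's inline comment.
families_over_threshold = 3_000_000
families_opting_out = 683_000

opt_out_rate = families_opting_out / families_over_threshold
print(round(opt_out_rate, 2))  # 0.23
```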
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+description: Extended childcare entitlement take-up rate
+metadata:
+  unit: /1
+  period: year
+  reference:
+    - title: Empirical estimate from FRS data
+      href: https://github.com/PolicyEngine/policyengine-uk-data
+values:
+  2015-01-01: 0.812
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+description: Percentage of eligible couples who claim Marriage Allowance.
+metadata:
+  unit: /1
+  label: Marriage Allowance take-up rate
+values:
+  2000-01-01: 1
