Skip to content

Add synthetic FRS#133

Closed
nikhilwoodruff wants to merge 2 commits into
mainfrom
nikhilwoodruff/issue132
Closed

Add synthetic FRS#133
nikhilwoodruff wants to merge 2 commits into
mainfrom
nikhilwoodruff/issue132

Conversation

@nikhilwoodruff

Copy link
Copy Markdown
Contributor

Fixes #132

@nikhilwoodruff nikhilwoodruff self-assigned this Jun 17, 2025
@nikhilwoodruff nikhilwoodruff added the enhancement New feature or request label Jun 17, 2025

@anth-volk anth-volk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @nikhilwoodruff. I am not 100% sure on what the EULA says around this sort of data usage, but will trust your judgment on it. I highlighted one potential issue. If this dataset were public-facing, I'd probably also have some code questions, but don't think they're necessary for this.

)

data = {}
for variable in simulation.tax_benefit_system.variables:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: The EULA seems unclear to me on whether this is valid usage, but I am willing to trust your interpretation here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're looking at the same text. @MaxGhenis could you weigh in?

import numpy as np


def create_synth_frs():

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment, non-blocking: I think the code generating this is sufficient for a non-public dataset like this, but would have some questions around a couple parts of this were it public-facing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is public-facing (we want to use it for tests)

]

for file_path in dataset_files:
public_dataset_files = [

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: On line 7 above (couldn't link here), I would rename dataset_files to private_dataset_files to make this clearer


upload_data_files(
files=public_dataset_files,
hf_repo_name="policyengine/policyengine-uk-data-public",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue, blocking?: This repo already exists with files from February; do we need to delete those before proceeding here for versioning clarity?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave them there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add synthetic FRS

2 participants