Convert dataset generation to function-based, not class-based #161
Merged
anth-volk (Contributor) reviewed on Jul 14, 2025:
I really loved reading through this @nikhilwoodruff. There are so many parts of this that are a real upgrade to what we currently do. I did a lighter review, given that this is not my area of expertise and I know you're looking to move fast. I left a few non-blocking nits that you might want to check out, but otherwise, love this trajectory.
This PR modernises the dataset generation architecture by replacing the class-based approach with a simpler function-based system. The key changes include consolidating all dataset creation logic into a single create_datasets.py script, moving from Python 3.12 to 3.13, and simplifying the build pipeline.
The new architecture eliminates the complex class hierarchy for dataset generation in favour of straightforward functions that produce the same output. This makes the codebase easier to maintain and understand whilst preserving all existing functionality. The enhanced FRS dataset generation now follows a linear process: create base FRS, add imputations, uprate to 2025, calibrate with targets, then downrate back to 2023.
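The linear process described above can be pictured as a chain of plain functions, each taking a dataset and returning a new one. The following is only a toy sketch of that shape, not the code in create_datasets.py: every name here (`Dataset`, `create_base_frs`, `add_imputations`, `rescale`, `calibrate`, `create_enhanced_frs`), the growth factor, and the sample values are hypothetical, and the real imputation and calibration steps are far more involved than the mean-fill and single weight rescaling used here.

```python
from dataclasses import dataclass, replace

# Hypothetical growth factor between 2023 and 2025; not taken from the PR.
GROWTH_2023_TO_2025 = 1.10


@dataclass(frozen=True)
class Dataset:
    year: int
    incomes: tuple  # one value per household; None marks a missing value
    weights: tuple  # one survey weight per household


def create_base_frs() -> Dataset:
    """Toy stand-in for building the base FRS dataset."""
    return Dataset(year=2023, incomes=(100.0, None, 300.0), weights=(1.0, 1.0, 1.0))


def add_imputations(ds: Dataset) -> Dataset:
    """Fill missing incomes with the mean of the observed ones (illustrative only)."""
    observed = [x for x in ds.incomes if x is not None]
    mean = sum(observed) / len(observed)
    return replace(ds, incomes=tuple(mean if x is None else x for x in ds.incomes))


def rescale(ds: Dataset, year: int, factor: float) -> Dataset:
    """Uprate (factor > 1) or downrate (factor < 1) incomes to the given year."""
    return replace(ds, year=year, incomes=tuple(x * factor for x in ds.incomes))


def calibrate(ds: Dataset, target_total: float) -> Dataset:
    """Scale all weights so the weighted income total matches a single target."""
    total = sum(w * x for w, x in zip(ds.weights, ds.incomes))
    scale = target_total / total
    return replace(ds, weights=tuple(w * scale for w in ds.weights))


def create_enhanced_frs(target_total: float) -> Dataset:
    """Run the steps in the order given in the PR description."""
    ds = create_base_frs()
    ds = add_imputations(ds)
    ds = rescale(ds, 2025, GROWTH_2023_TO_2025)      # uprate to 2025
    ds = calibrate(ds, target_total)                 # calibrate against 2025 targets
    ds = rescale(ds, 2023, 1 / GROWTH_2023_TO_2025)  # downrate back to 2023
    return ds


enhanced = create_enhanced_frs(target_total=1320.0)
```

Because each step is a pure function, any stage can be run or tested on its own without instantiating a class hierarchy, which is the maintainability benefit the description points to.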
Additional improvements include simplified dependency installation using uv, streamlined CI workflows, and better organisation of local area data files.