New data loading #307
First pass at implementing a new way to store and read in data files. This method stores each variable as its own CSV file, and users are expected to edit the CSV files directly. The CSVs are then read straight into the model, removing the xlsx->csv step. Polars is also used in place of Pandas for speed gains (it is not yet clear how much of the speed-up is due to this swap). Also included is a script (convert_inputs_to_new_format.py) that converts old-style files to the new format. Model output has been tested for Tr, P, Fr, and H; the shares results are the same as before.
no longer needed as part of new data loading
Removing some old text left over from prototyping
@copilot: can you review all the .py files? No need to look at the file changes in the Inputs folder.
Agent-Logs-Url: https://github.com/cpmodel/FTT_StandAlone/sessions/c7808ca3-18df-4bbb-b204-6c66dc5ff412 Co-authored-by: Femkemilene <26096675+Femkemilene@users.noreply.github.com>
Reviewed all changed |
@copilot: can you explain what you did here and why? (1) removed sys.path mutation and restored package import in SourceCode/support/convert_masterfiles_to_csv.py
In |
@copilot: I've recently merged a PR into main. Can you merge main into this branch, so it's easier to check that everything works together before I start reviewing the PR?
# Conflicts: # SourceCode/model_class.py # SourceCode/support/input_functions.py Co-authored-by: Femkemilene <26096675+Femkemilene@users.noreply.github.com>
Done — I merged |
@cormacmlynch: In the previous version, we had a lot of error messages for when things went wrong. The data loading was more finicky back then, so maybe this is not as urgent, but I was wondering whether it was a conscious choice to remove this functionality. Otherwise, it's lovely. I've tidied up slightly and hopefully increased the stability of the GUI (it gave a weird warning that Claude fixed). I think the types of data errors we need to check for are:
I think that should be fairly clean to check for. My first attempt with Claude led to low-quality vibe code, so I have not committed it. Would you have time to reintroduce these two checks?
This pull request restructures how data is stored and read into the model. The main changes are in input_functions.py.
Excel-based master files and per-region, per-variable CSV files have been replaced by a single CSV file for each input variable. The benefits are faster data loading (no xlsx-to-csv step and significantly fewer files to read) and a more machine-friendly data format (better suited to version control). The Polars library is also used in place of Pandas for speed gains.
The new data loading is only implemented for the power, heat, transport, and freight models. Only the S0 data files are included for now. Scenarios work as they did before: a scenario does not need a full set of variable CSVs, only the files that differ from S0 (e.g., MEWR in a scenario with regulation on coal).
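For reviewers, the scenario fallback described above can be pictured with a minimal sketch. This is not the code in input_functions.py: the function name `resolve_variable_file`, the `<inputs_dir>/<scenario>/<variable>.csv` layout, and the scenario name `S1` are assumptions made purely for illustration.

```python
from pathlib import Path


def resolve_variable_file(inputs_dir: Path, scenario: str, variable: str,
                          baseline: str = "S0") -> Path:
    """Return the CSV path for `variable`, preferring the scenario's own file.

    Hypothetical layout (an assumption, not the repository's actual one):
        <inputs_dir>/<scenario>/<variable>.csv
    """
    # Look in the scenario folder first, then fall back to the baseline.
    for folder in (scenario, baseline):
        candidate = inputs_dir / folder / f"{variable}.csv"
        if candidate.is_file():
            return candidate
    # Neither the scenario nor the baseline provides this variable.
    raise FileNotFoundError(
        f"No CSV found for variable '{variable}' in scenario "
        f"'{scenario}' or baseline '{baseline}' under {inputs_dir}"
    )
```

With this layout, a hypothetical scenario folder `S1` that contains only `MEWR.csv` would resolve MEWR to its own file and every other variable to the S0 copy. Raising an explicit `FileNotFoundError` when neither file exists is one possible place to surface a clear error message for missing inputs.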